Monday, July 17, 2006

Offspring matters ....

As you know, XSD files have an attribute that marks an element as abstract. This means that other elements will extend it. The typical example is Figure, with Circle, Square, ... as children.

OK, that's perfect. Now imagine you are using XmlBeans (XML binding). You have a list of FigureType and want to iterate over all the figures and, depending on each one's type, do one thing or another (for example, draw the figure). In this case an instanceof approach won't work, because in XmlBeans we work with interfaces, not with implementation objects, so for example the SquareType interface doesn't extend the FigureType interface. So how can we implement the previous example?

By using two things: first, working with the DOM representation instead of the object representation, and second, using the type attribute that all extended tags require. An example:

<figure xsi:type="SquareType" area="c*c" sides="4">

Note that the tag is figure, but the square's information is present.

Now let's see the Java code that handles this example. It is as simple as you can imagine.

First we get the DOM representation:

final Node node = figureType.getDomNode();

Next we get the type attribute using the DOM library:

final NamedNodeMap nodeAttMap = node.getAttributes();

final String type = nodeAttMap.getNamedItemNS("http://www.w3.org/2001/XMLSchema-instance", "type").getNodeValue();

Remember to change the first parameter of getNamedItemNS if the namespace of the type attribute changes.
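As a self-contained illustration of the lookup above, here is a minimal sketch that uses only the JDK's DOM parser instead of XmlBeans. The class name XsiTypeReader and the method readXsiType are made up for this example. It also handles two cases the snippet above does not: a missing attribute (where getNamedItemNS would return null) and a prefixed value such as f:SquareType.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XmlBeans: reads xsi:type from the root element.
class XsiTypeReader {
    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    /** Returns the local part of the root element's xsi:type, or null if it has none. */
    static String readXsiType(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for namespace-based lookups
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Attr attr = root.getAttributeNodeNS(XSI_NS, "type");
            if (attr == null) {
                return null; // element carries no xsi:type
            }
            String value = attr.getValue();
            // xsi:type holds a QName, so drop an optional "prefix:" part.
            return value.substring(value.indexOf(':') + 1);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<figure xmlns:xsi=\"" + XSI_NS
                + "\" xsi:type=\"SquareType\" area=\"c*c\" sides=\"4\"/>";
        System.out.println(readXsiType(xml)); // prints SquareType
    }
}
```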

And finally, simply a hellish if-then-else sequence:

if("SquareType".equals(type)) {

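    // changeType returns the same element viewed as the requested schema type
    final SquareType square = (SquareType) figureType.changeType(SquareType.type);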

} else {
...
}

Look at the line inside the if: we must convert from FigureType to SquareType using the changeType method.

And that's all. Simple and easy, only tedious because of the if-then-else chain.
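One possible way to tame the if-then-else chain is a lookup table keyed by the type name. The sketch below is only an illustration under assumptions: FigureDispatch, its handlers and the returned strings are invented, and the handlers take the sides attribute as a plain string. In real XmlBeans code each handler would call changeType and then draw the figure.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatcher: maps an xsi:type value to the code that handles it.
class FigureDispatch {
    private static final Map<String, Function<String, String>> HANDLERS = new HashMap<>();

    static {
        // Each handler stands in for "changeType(...) and draw" in real code.
        HANDLERS.put("SquareType", sides -> "drawing square with " + sides + " sides");
        HANDLERS.put("CircleType", sides -> "drawing circle");
    }

    /** Looks up the handler for the given type name and runs it. */
    static String handle(String type, String sides) {
        Function<String, String> handler = HANDLERS.get(type);
        if (handler == null) {
            throw new IllegalArgumentException("Unknown figure type: " + type);
        }
        return handler.apply(sides);
    }

    public static void main(String[] args) {
        System.out.println(handle("SquareType", "4")); // prints drawing square with 4 sides
    }
}
```

Adding a new figure then means registering one more entry instead of extending the chain.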

Tuesday, July 11, 2006

Debianizing Java


When you go to java.sun.com to download the JDK, you find a disagreeable surprise: only an RPM or a BIN file is available. What happens if you have a Debian/Ubuntu distribution? You would probably like to work with a DEB file, but one is not available on the Sun Java site. There is a simple solution.

First, download the BIN file of the JDK you want to install. Then, using Synaptic, install the fakeroot application and the make-jpkg application. Finally, run the following command: fakeroot make-jpkg .bin.

After a few moments you will have a .deb file, which you can install using the dpkg tool. After that, because we want this new Java virtual machine/compiler to run by default, execute update-alternatives --config java and choose the newly installed JDK.

And that's all, as simple as you can imagine.

Friday, March 17, 2006

Cramer Versus Cramer

Cramer vs Cramer, Java Virtual Machine vs Java Virtual Machine. This article tries to give information about which Java virtual machine is best for a given problem. To do this we have chosen two benchmarks, so we can give an objective opinion on which virtual machine is better. There are a lot of benchmarks and a lot of Java virtual machines, but we have focused on two virtual machines, Java HotSpot(TM) Client VM (build 1.5.0_05-b05) and IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Linux x86-32 j9vmxi3223-20051103 (JIT enabled)), and two benchmarks, the JGF Benchmark http://www.epcc.ed.ac.uk/javagrande/javag.html and the Richards and DeltaBlue Benchmark http://research.sun.com/people/mario/java_benchmarking/download2.html.

I will try to explain both benchmarks, but for more information please visit their sites.

JGF is composed of three big groups of benchmarks: Sequential Benchmark, Multi-Threaded Benchmark and MPJ Benchmark.

  • The sequential benchmarks, suitable for single processor execution.
  • The multi-threaded benchmarks, suitable for parallel execution on shared memory multiprocessors.
  • The MPJ benchmarks, suitable for parallel execution on distributed memory multiprocessors.

For our purposes, only the first and second groups of benchmarks will be executed.

The Sequential Benchmark has the following subsections:

  • Arith: execution of arithmetic operations
  • Assign: variable assignment
  • Cast: casting
  • Create: creating objects and arrays
  • Loop: Loop overheads
  • Math: execution of maths library operations
  • Method: method invocation
  • Serial: Serialisation
  • Exception: Exception handling
  • Serial: Read/Write Lists
  • Method: executing methods in same object or foreign object

  • Series: Fourier coefficient analysis
  • LUFact: LU Factorisation
  • SOR: Successive over-relaxation
  • HeapSort: Integer sorting
  • Crypt: IDEA encryption
  • FFT: FFT
  • Sparse: Sparse Matrix multiplication

  • Search: Alpha-beta pruned search
  • Euler: Computational Fluid Dynamics
  • MD: Molecular Dynamics simulation
  • MC: Monte Carlo simulation
  • Ray Tracer: 3D Ray Tracer

The Multi-Threaded Benchmark has the following subsections:

  • ForkJoin: forking and joining threads
  • Barrier: barrier synchronisation
  • Sync: synchronized blocks and methods
  • Series: Fourier coefficient analysis
  • LUFact: LU Factorisation
  • SOR: Successive over-relaxation
  • Crypt: IDEA encryption
  • Sparse: Sparse Matrix multiplication

  • MolDyn: Molecular Dynamics simulation
  • MonteCarlo: Monte Carlo simulation
  • RayTracer: 3D Ray Tracer
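To make concrete what kind of work the ForkJoin and Sync entries in the list above refer to, here is a minimal sketch, not the JGF code itself: it forks several threads that all call one synchronized method, joins them, and reports the total. The class SyncCounter and its method names are assumptions made for this example.

```java
// Hypothetical micro-benchmark sketch: threads contend on one synchronized method.
class SyncCounter {
    private long count;

    synchronized void increment() {
        count++;
    }

    synchronized long get() {
        return count;
    }

    /** Forks the given number of threads, each calling increment() 'iterations' times, then joins them. */
    static long forkJoinAndCount(int threads, int iterations) {
        final SyncCounter counter = new SyncCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while joining", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long total = forkJoinAndCount(4, 1_000_000);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.println(total + " synchronized calls in " + seconds + " s");
    }
}
```

Because every increment goes through the same lock, the total is always threads times iterations; only the elapsed time differs between virtual machines.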


At the heart of Richards is a task dispatcher (Richards.schedule). Tasks come in 4 different flavors, each represented by a class (DeviceTask, HandlerTask, IdleTask, WorkTask, all subclasses of Task). Each kind of task has an associated work function (fn). At startup (Richards.run), a particular task mix is created, and then the tasks are scheduled, each having its work function invoked. The work functions manipulate work packets and packet queues. At the end of the benchmark (in Richards.run) the number of queued and held packets is checked against the correct value to assist in verifying that the benchmark ran correctly.

For our purposes we have executed all the sequential benchmarks, and only the first group of the multi-threaded benchmarks. We have also run the Richards Benchmark.

All these benchmarks have been run on a Pentium IV 3 GHz with 512 MB of RAM, on Ubuntu Linux 5.10. The optional arguments for the java command were -Xms256m -Xmx512m, because some tests need as much memory as possible or a RuntimeException would be thrown.
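If you want to check that heap options like these are actually picked up, a small sketch such as the following can help. HeapInfo is a name made up for this example; Runtime.maxMemory() is the standard JDK call, and the value it reports is usually close to, but not necessarily exactly, the -Xmx setting.

```java
// Hypothetical helper: reports the maximum heap the running JVM will try to use.
class HeapInfo {
    /** Maximum heap size in MiB, as reported by the JVM. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run as: java -Xms256m -Xmx512m HeapInfo
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```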

Let's see the results of the first section of the sequential benchmark.


In this graph we can see that Java HotSpot is faster than IBM for mathematical operations. The IBM Java Virtual Machine only wins in object creation and rounding numbers; in all other operations, like managing lists, arrays or casting, Java HotSpot is better.

In section 2 of the sequential benchmark, we have the same tests but with three different results; that is because the data size increases in every test. See the JGF homepage to see how the values increase, because there is no standard growth between the tests.


We can say that in all tests Java HotSpot is much faster than the IBM JVM; this is because HotSpot is better at numerical problems than the IBM Java Virtual Machine.


Although the differences have narrowed, Java HotSpot is still faster than the IBM JVM.



When the load is heavy, the IBM JVM is faster than Java HotSpot in some tests; this suggests that the IBM JVM manages in-memory data better than Java HotSpot.

Now let's do the same with the second group of benchmarks. Remember that we run only the first group, because we are not interested in how fast a JVM executes mathematical operations in parallel (we have already measured that in the sequential benchmarks); what we really want to know is which Java Virtual Machine manages threads faster.


As we can see, with only one thread, in this case (and all future cases), the IBM Virtual Machine is faster than Sun Java HotSpot. Note that synchronization on IBM is spectacular in comparison to Java HotSpot.

Note that in ForkJoin a negative value is reported. I ran the same test three times with one thread, and the result was always negative. I think an overflow has occurred; of course, a negative count of fork-joins is impossible.
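To show how such a negative value can come from an overflow, here is a small sketch. It is only an assumption about the kind of arithmetic involved, not the actual JGF code: the class OverflowDemo and the numbers are invented. Scaling an operation count to operations per second in 32-bit int arithmetic wraps around once the intermediate product exceeds 2,147,483,647.

```java
// Hypothetical illustration of a rate computation that overflows a 32-bit int.
class OverflowDemo {
    /** Rate computed in int arithmetic: ops * 1000 may wrap around to a negative number. */
    static int opsPerSecondInt(int ops, int millis) {
        return ops * 1000 / millis;
    }

    /** Same rate computed in long arithmetic: the 1000L literal widens the product. */
    static long opsPerSecondLong(int ops, int millis) {
        return ops * 1000L / millis;
    }

    public static void main(String[] args) {
        System.out.println(opsPerSecondInt(3_000_000, 10));  // prints -129496729
        System.out.println(opsPerSecondLong(3_000_000, 10)); // prints 300000000
    }
}
```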

With two threads, the difference between both JVMs is reduced in the first three tests, but in synchronization IBM is still the best one by far.


With 5, 10 and 20 threads the same happens as with 2 threads: IBM is still faster than Java HotSpot.

Finally, although the Sun VM is faster than IBM in the first tests, we can say that with a lot of threads synchronizing, the IBM VM is faster than the Sun VM.

In summary, in threaded environments like web servers, the IBM JVM seems to be better than the Sun VM. On the other hand, the Sun VM offers better throughput in standard applications.

Thursday, December 15, 2005

... and winter arrives to woodchuck.

In this post, I will try to justify why to use Hibernate instead of plain JDBC.

What is Hibernate? Hibernate is a powerful, high-performance object/relational persistence and query service. What does Hibernate offer? What makes it better than other solutions?

  • First of all, Hibernate is considered a professional open-source project.
  • It supports all Object-Oriented Programming features, like associations, inheritance, polymorphism, composition and collections.
  • It is transparent with respect to JDBC connections, but doesn't hide them.
  • It has its own query engines: HQL, Criteria and Query by Example, as well as native SQL.
  • Hibernate decouples business objects from the RDBMS, thanks to dialects. You can change your RDBMS without changing the business database logic.
  • Hibernate lets you write quality code, because it leads you to use good practices from the patterns world. It is easy to develop a Generic DAO pattern.
  • Easy to use. It is easy to implement the typical operations of the data layer. For example, inserting a tuple into the database is as easy as calling session.save(Object obj).
  • The model objects (Value Objects or Transfer Objects) have no dependency on Hibernate; you need not extend them from any Hibernate class. Hibernate uses reflection and configuration files to know which classes and fields must be supported by database operations.
  • Hibernate can work with cache services and transactional services, typically JTA and JBossCache, but you can use any other.
  • Model objects have to be configured in the Hibernate engine. For this configuration you have three possible strategies: XML files, JDK 5.0 (EJB3) annotations or Hibernate XDoclet, so the developer has a choice.
  • Hibernate Validator is an annotation-based solution for validating model objects: you can validate fields of your object model without implementing any conditionals, only using field annotations.
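To give an idea of the Generic DAO pattern and of model objects without framework dependencies mentioned in the list above, here is a minimal sketch in plain Java. Everything in it (Customer, GenericDao, InMemoryDao) is invented for illustration and none of it is Hibernate API. The in-memory map only stands in for the database; a Hibernate-backed implementation would presumably delegate to the session instead, for example to session.save as mentioned above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain model object: it extends no framework class and implements no framework interface.
class Customer {
    final long id;
    final String name;

    Customer(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical generic DAO contract, written once and reused for every entity type.
interface GenericDao<T, ID> {
    void save(ID id, T entity);

    T findById(ID id);

    List<T> findAll();
}

// In-memory stand-in used only to keep the sketch runnable without a database.
class InMemoryDao<T, ID> implements GenericDao<T, ID> {
    private final Map<ID, T> rows = new LinkedHashMap<>();

    public void save(ID id, T entity) {
        rows.put(id, entity);
    }

    public T findById(ID id) {
        return rows.get(id); // null when no row has this id
    }

    public List<T> findAll() {
        return new ArrayList<>(rows.values());
    }
}
```

A caller would write something like GenericDao<Customer, Long> dao = new InMemoryDao<>(); and never see which storage sits behind the interface.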

Tuesday, December 06, 2005

Cross-Validation VS Bootstrap

When you develop a Machine Learning technique, you need to know how much better your solution is compared with other solutions.
There are a lot of methods, but the most used are K-Fold Cross-Validation and Bootstrap. Both are commonly used in classifier systems. In my thesis, "Rule Induction Using Ants", we found ourselves not knowing which of the two techniques to use.

In the paper "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection", written by Ron Kohavi, there are some experiments with C4.5 and Naive-Bayes classifiers. The results with those algorithms and six datasets are:

  • Bootstrap has low variance, but large bias in some problems.
  • K-Fold Cross-Validation with moderate values of K (10-20) reduces the variance but increases the bias.
  • Using a stratified strategy is better in terms of variance and bias, compared with regular Cross-Validation.

So it seems that 10-Fold Cross-Validation is the best strategy to use. But is there any technique better than Cross-Validation and Bootstrap? And more importantly, that study used only six datasets; does anybody know of a study that says in which cases Bootstrap is better and in which cases K-Fold Cross-Validation is?
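For readers who have not used either technique, the two resampling schemes discussed here can be sketched as follows. This is a minimal illustration, not the procedure from Kohavi's paper: the class Resampling and its methods are made up, the folds are regular (not stratified), and only the index selection is shown, not the training or the accuracy estimate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper showing how the two schemes choose example indices.
class Resampling {
    /** Splits indices 0..n-1 into k shuffled folds whose sizes differ by at most one. */
    static List<List<Integer>> kFolds(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        // Deal the shuffled indices round-robin; each fold is the test set once.
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(indices.get(i));
        }
        return folds;
    }

    /** Draws n indices with replacement: one bootstrap training sample. */
    static List<Integer> bootstrapSample(int n, long seed) {
        Random random = new Random(seed);
        List<Integer> sample = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            sample.add(random.nextInt(n));
        }
        return sample;
    }
}
```

In cross-validation every example is tested exactly once, while in a bootstrap sample some examples repeat and the ones never drawn can serve as the test set.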

That's all folks, I hope this post can help someone.

Hello Everyone

Hello, this is my first blog. I don't know when I will be able to write another post, but I promise to write new entries whenever I can.

This blog aims to be a discussion blog about Artificial Intelligence and computing in general, but of course I am open to talking about any topic that could be interesting.

I will also try to publish my research results in Artificial Intelligence, and specifically in Emergent Intelligence.

I hope everyone will post comments and different points of view.