Tuesday, December 06, 2005

Cross-Validation VS Bootstrap

When you develop a machine learning technique, you need to know how much better your solution is than the alternatives.
There are many ways to measure this, but the most widely used are k-fold cross-validation and the bootstrap. Both are commonly used to evaluate classifier systems. While working on my thesis, "Rule Induction Using Ants", we found ourselves not knowing which of the two techniques to use.

The paper "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection", written by Ron Kohavi, reports experiments with C4.5 and Naive-Bayes classifiers. The results with those algorithms and six datasets are:

  • Bootstrap has low variance, but large bias on some problems.
  • K-fold cross-validation with moderate values of k (10-20) reduces the variance but increases the bias.
  • Using a stratified strategy is better in terms of both variance and bias, compared with regular cross-validation.
So it seems that 10-fold cross-validation is the best strategy to use. But is there any technique better than cross-validation and bootstrap? And, more importantly, that study used only six datasets: does anybody know of a study that says in which cases bootstrap is better and in which cases k-fold cross-validation is?
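
To make the two estimators concrete, here is a minimal Python sketch of stratified k-fold cross-validation and of a bootstrap accuracy estimate. Everything in it is an assumption for illustration: the toy one-dimensional dataset, the nearest-class-mean classifier (standing in for C4.5 or Naive-Bayes), the function names, and the choice of the .632 bootstrap variant, which is, as far as I recall, the one used in the paper. It is not the experimental setup of the study.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, rng):
    """Deal the indices of each class round-robin into k folds."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    pos = 0  # running position keeps the fold sizes balanced across classes
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    return folds


def fit_nearest_mean(xs, ys):
    """Toy classifier (an assumption): remember the mean feature per class."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(xs, ys):
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in counts}


def predict(model, x):
    """Return the class whose mean is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))


def accuracy(model, xs, ys):
    return sum(predict(model, x) == y for x, y in zip(xs, ys)) / len(xs)


def cross_val_accuracy(xs, ys, k, rng):
    """Stratified k-fold: every instance is tested exactly once."""
    correct = 0
    for test_idx in stratified_k_fold(ys, k, rng):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit_nearest_mean([xs[i] for i in train], [ys[i] for i in train])
        correct += sum(predict(model, xs[i]) == ys[i] for i in test_idx)
    return correct / len(xs)


def bootstrap_632_accuracy(xs, ys, b, rng):
    """.632 bootstrap: mix out-of-bag accuracy with resubstitution accuracy."""
    n = len(xs)
    resub = accuracy(fit_nearest_mean(xs, ys), xs, ys)
    estimates = []
    for _ in range(b):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n with replacement
        in_bag = set(sample)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:  # no held-out instances in this round, skip it
            continue
        model = fit_nearest_mean([xs[i] for i in sample], [ys[i] for i in sample])
        e0 = accuracy(model, [xs[i] for i in oob], [ys[i] for i in oob])
        estimates.append(0.632 * e0 + 0.368 * resub)
    return sum(estimates) / len(estimates)


# Toy data (an assumption): two overlapping Gaussian classes, 60/40 split.
rng = random.Random(0)
ys = [0] * 60 + [1] * 40
xs = [rng.gauss(0.0 if y == 0 else 2.0, 1.0) for y in ys]
print("10-fold stratified CV accuracy:", cross_val_accuracy(xs, ys, 10, rng))
print(".632 bootstrap accuracy:", bootstrap_632_accuracy(xs, ys, 50, rng))
```

Running both estimators many times with different seeds, and comparing the spread and the average of the results, is one simple way to see the variance and bias behaviour from the list above on your own data.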

That's all, folks. I hope this post helps someone.
