For the training process I mainly followed this blog post:
http://blog.csdn.net/wuxiaoyao12/article/details/39227189
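I ran into quite a few problems during training.
1. Training stopped shortly after it started. I forgot to take a screenshot, but the message was roughly: "parameters cannot be written, data/param.xml cannot be opened". From what I read, in this situation you have to create the data folder yourself, even though in many cases it is created automatically. Sure enough, once I created a data folder by hand, the problem went away.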
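Creating the folder can also be done from a small script before training is launched. The following is only a sketch: the helper name `ensure_output_dir` is made up for illustration, and it assumes the output folder is the relative path `data`, as in the error message above.

```python
from pathlib import Path


def ensure_output_dir(path="data"):
    """Create the cascade output folder if it is missing and return it.

    `path` is assumed to be the folder passed to the trainer as its
    output directory; "data" matches the error message quoted above.
    """
    out = Path(path)
    # parents=True also creates missing parent folders;
    # exist_ok=True makes the call harmless if the folder already exists.
    out.mkdir(parents=True, exist_ok=True)
    return out
```

Calling `ensure_output_dir()` from the working directory where training is started should be enough to avoid the "cannot be opened" failure described in item 1.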
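2. Training ended at stage 7. After reading a lot of material, my impression is that the classifier had finished training and that this was not an error: because the -minHitRate I set was not high, the requirement was met early and training stopped, although the resulting classifier may not detect very well. As far as I can tell, the check that actually ends training early is on the false alarm rate: training stops once the accumulated false alarm rate falls below -maxFalseAlarmRate raised to the power of -numStages. The training set was also small.
I found the following passage on a website, and it makes a lot of sense to me: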
If you have a small number of data, you need less number of stages to achieve the required false alarm rate you set up. This means that the cascade classifier is “good enough” so it doesn’t have to grow further. The total false positive ratio is actually multiplied by every stage’s ratio, so after a point, the value is achieved.
In your options you set it up to 0.9. Consider making it higher, like 0.95 or more.
Apart from that, your datasets are small, so it’s easier for the algorithm to get good results when validating on them during training. The smaller the dataset, the easier for the classifier to be trained, so less stages are required. But this doesn’t mean that it’s better when running on real data. Also, if you keep the training size low and set a higher ratio, consider that the classifier will need more stages to finish and will be more complicated, but it’s very possible that it will be over-trained on the training set.
To conclude, if the nature of the positives and negatives that you have makes them easy to separate, then you don’t need so many samples. Of course that depends on what you are training the algorithm for. With your amount of samples, the 10 stages you put are a lot, so the algorithm terminates earlier (it’s not necessarily bad).
When I was training faces, I think I had around 1 thousand of positive (including all the rotations/deviations) and 2-3 thousands of negatives, to need a classifier of around 11-13 levels, if I remember correctly.
The tutorial of Naotoshi Neo had helped me a lot.
Also, what I noticed now, as Safir mentioned, you have too few negative samples compared to the positive ones. They should be at least equal in number, preferably around 1.5 - 2 times more than the positives.
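The point in the quote about per-stage ratios being multiplied can be checked with a little arithmetic. This is a rough sketch, not the trainer's actual code: the function names are invented for illustration, and the 0.5 false alarm target, the 0.35 measured rate and the 0.995 hit rate are assumed values, not settings taken from my run.

```python
def overall_rates(min_hit_rate, max_false_alarm_rate, num_stages):
    """Overall hit rate and false alarm rate of a cascade.

    Each stage's rate is multiplied in, so both shrink as stages are added.
    """
    return min_hit_rate ** num_stages, max_false_alarm_rate ** num_stages


def stages_needed(per_stage_false_alarm, target_false_alarm):
    """Count stages until the multiplied false alarm rate meets the target."""
    if not 0.0 < per_stage_false_alarm < 1.0:
        raise ValueError("per-stage false alarm rate must be between 0 and 1")
    stages, overall = 0, 1.0
    while overall > target_false_alarm:
        overall *= per_stage_false_alarm
        stages += 1
    return stages


# Assumed settings: 10 stages requested, at most 0.5 false alarm per stage.
_, target = overall_rates(0.995, 0.5, 10)
# Hypothetical: on a small dataset each stage actually reaches 0.35.
print(stages_needed(0.35, target))
```

Under these assumed numbers the overall target is already met after 7 stages, which is the kind of early stop described in item 2. The same multiplication applies to the hit rate: 0.9 per stage over 10 stages leaves only about 0.35 overall, while 0.995 per stage leaves about 0.95.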
3. Another case: training got stuck at stage 6, as if it were in an infinite loop. I quit, added more negative samples, and after that it ran fine.
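A quick way to see whether the negative set is large enough is to compare it with the ratio recommended in the quoted passage (negatives at 1.5 to 2 times the positives). The sketch below uses hypothetical helper names, and plugs in the sample counts of this training run (819 positives, 1060 negatives).

```python
import math


def negative_ratio(num_pos, num_neg):
    """Ratio of negative to positive samples."""
    if num_pos <= 0:
        raise ValueError("num_pos must be positive")
    return num_neg / num_pos


def extra_negatives_needed(num_pos, num_neg, target_ratio=1.5):
    """Negatives still missing to reach target_ratio times the positives."""
    required = math.ceil(target_ratio * num_pos)
    return max(0, required - num_neg)


# Sample counts of this training run.
print(round(negative_ratio(819, 1060), 2))
print(extra_negatives_needed(819, 1060))
```

With these counts the ratio is about 1.29, below the suggested 1.5, and roughly 169 more negatives would be needed to reach it. My guess, not verified, is that the trainer hangs in later stages because it cannot collect enough negative windows that still pass the stages trained so far.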
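When I finally ran detection with the trained cascade.xml, the results were poor.
Possible reasons: (1) There are probably still not enough training samples. I used 819 positive and 1060 negative samples, with numStages = 10.
(2) I still need to work out how to use detectMultiScale() properly.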
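One way to explore detectMultiScale() is to run the same image through a few combinations of scaleFactor and minNeighbors and compare how many boxes come back. This is a sketch under assumptions: it uses the OpenCV Python bindings (cv2), the helper names are made up, the cascade path `data/cascade.xml` and the 24x24 minimum size are guesses that should be replaced with the real output path and training window size, and the grid values are only starting points.

```python
from itertools import product


def parameter_grid(scale_factors=(1.05, 1.1, 1.2), min_neighbors=(3, 5, 8)):
    """All combinations of the two detectMultiScale settings to try."""
    return [
        {"scaleFactor": s, "minNeighbors": n}
        for s, n in product(scale_factors, min_neighbors)
    ]


def count_detections(image_path, cascade_path="data/cascade.xml",
                     min_size=(24, 24)):
    """Return (params, number of detections) for every grid entry."""
    # Imported here so parameter_grid() works without OpenCV installed.
    import cv2

    cascade = cv2.CascadeClassifier(cascade_path)
    if cascade.empty():
        raise IOError("could not load cascade: " + cascade_path)
    image = cv2.imread(image_path)
    if image is None:
        raise IOError("could not read image: " + image_path)
    # Cascades work on grayscale; equalizing the histogram evens out lighting.
    gray = cv2.equalizeHist(cv2.cvtColor(image, cv2.COLOR_BGR2GRAY))

    results = []
    for params in parameter_grid():
        boxes = cascade.detectMultiScale(gray, minSize=min_size, **params)
        results.append((params, len(boxes)))
    return results
```

A smaller scaleFactor scans more scales and is slower but misses fewer objects, and a larger minNeighbors removes more false positives at the cost of some true detections, so comparing the counts across the grid gives a first idea of which direction to tune.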