We converted the training dataset from au to wav (16 kHz, mono), extracted the same features, and used the same classifier. The accuracy decreased slightly (from 80.6% to 78.9%). This may be because some data was lost during the au-to-wav conversion...
The highest accuracy (82.0333%) was achieved by splitting each training waveform into 3 segments (10 s each)...
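
The segmentation step can be sketched roughly as follows. This is only an illustration under stated assumptions, not the exact code we ran: the helper name `split_waveform` is hypothetical, the clip is assumed to be already loaded as a 1-D (mono) NumPy array at 16 kHz, the segments do not overlap, and any leftover samples at the end of a clip are dropped.

```python
import numpy as np

def split_waveform(samples, sample_rate, segment_seconds=10.0):
    """Split a 1-D waveform into consecutive, non-overlapping segments.

    Hypothetical helper for illustration. Samples left over after the
    last full segment are dropped (an assumption, not from the notes).
    """
    segment_len = int(round(segment_seconds * sample_rate))
    n_segments = len(samples) // segment_len
    return [
        samples[i * segment_len:(i + 1) * segment_len]
        for i in range(n_segments)
    ]

# Placeholder data: a silent 30 s mono clip at 16 kHz stands in for a real file.
sample_rate = 16000
clip = np.zeros(30 * sample_rate, dtype=np.float32)

segments = split_waveform(clip, sample_rate)
print(len(segments), [len(s) for s in segments])
```

With segments like these, features would be extracted per segment instead of per clip, so one 30 s clip yields three training examples. We assume here that each segment simply keeps the label of the clip it came from.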