Hello, the instructions are quite long, so I'll try pasting them here.
Instructions
--------------------------------------
Project
You should see usps_main.mat and usps_benchmark.mat. These are samples of US Postal Service handwritten digit data – the usps_main set contains 500 examples of each of the digits 0-9 (5,000 digits in total). The usps_benchmark set is a different 500.
The aim of the project is to correctly recognise the “benchmark” examples of digits 3, 6, and 8. You should use only data from the “main” set for the training of your KNN rule.
COPY THESE .MAT FILES TO YOUR DIRECTORY, BUT LEAVE THE OTHER FILES
You are provided with an implementation of a K-nearest neighbour classifier. At the Matlab command line, type help knearest to find out how to use it. You also have the following utility functions:
showdigit - displays image for a single digit
getonedigit - retrieves pixels for a single digit
showdata - displays a group of digit images
crossfold - partitions the data
shufflerows - shuffles the rows
extractfeatures - transforms the raw pixels
Use the help command again to find out how they work – be sure to know what arguments they take, and what they return, so you don’t bother the demonstrators with something that you could solve by just reading the helpfile.
You load the digit data into memory with load usps_main.mat. After that, you can use the various functions listed above. As a starter, try the following:
getonedigit(3, 123, maindata)
Notice that we passed the variable maindata as an argument – this is the variable that should have been loaded when you typed load usps_main.mat. Now pass the output of that function to the function showdigit. A number ‘3’ should be displayed on the screen. This is the 123rd example of a number ‘3’. There are 500 examples of each of the digits 0-9 in total. Try others for yourself.
Since Matlab indexes arrays from 1 (I hope you have all realised that by now), to get examples of the digit ‘0’ you use array index 10, as follows:
showdigit( getonedigit(10, 312, maindata) )
Take note, all that the getonedigit function is doing is pulling out columns of the variable maindata. You could do this equally well yourself, and probably write something to make it more flexible. Look at the code by typing edit getonedigit.
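Because the digit ‘0’ sits at index 10 while the digits 1-9 sit at their own value, a small mapping can save some confusion. The following is only a sketch; digitToIndex is a hypothetical helper name and is not one of the provided functions.

```matlab
% Hypothetical helper (not part of the provided files): map a digit 0-9
% to the index used by the data, where the digit '0' is stored at index 10.
digitToIndex = @(d) d + 10 * (d == 0);

idx3 = digitToIndex(3);   % digits 1-9 map to themselves
idx0 = digitToIndex(0);   % digit 0 maps to index 10
```

The result could then be passed as the first argument to getonedigit in place of a hard-coded index.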
-------------------------------------------------------------------------------------------------------
PART 1 – Easy Questions to get you started
1. Write a script that constructs a training dataset of just ‘3’ and ‘8’ digits. To start, pick 100 of each randomly. Your matrix should end up as 2D, 200 rows by 256 columns. Remember to include the true label for each digit, in another array, called labels (or whatever you want).
2. Use the KNN rule to classify each of the digits in your training set, and report the accuracy. Plot a graph to display the accuracy as you vary K.
3. Break your training set randomly into 2 parts, one part you will use for training, and one part for testing. Plot a testing accuracy graph, again varying K.
4. Repeat the random split and reclassify. Do you get the same behaviour? Plot an average, and standard deviation as error bars. Remember all graphs should have axis labels and a title. If you don’t know what Matlab commands to use, try Googling.
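The overall shape of the split-and-repeat experiment in the questions above might look something like the sketch below. It makes several assumptions: the data is synthetic stand-in data rather than the real digits, a hand-written nearest-neighbour vote is used in place of the provided knearest function (whose arguments are not shown in this text), and the values of Ks, nReps and the 50/50 split are arbitrary choices. The distance line relies on implicit expansion, so it assumes Matlab R2016b or later.

```matlab
% Sketch only: synthetic stand-in data, and a hand-written KNN vote in place
% of the provided knearest function.
rng(0);                                   % fixed seed so runs are repeatable
nPer = 100;  nPix = 256;                  % 100 examples per class, 256 pixels
data   = [randn(nPer, nPix); randn(nPer, nPix) + 0.3];   % 200-by-256 matrix
labels = [3 * ones(nPer, 1); 8 * ones(nPer, 1)];         % true labels

Ks     = 1:2:15;                          % odd K avoids ties with two classes
nReps  = 10;                              % number of random splits
nTrain = 100;                             % half for training, half for testing
acc    = zeros(nReps, numel(Ks));

for r = 1:nReps
    order = randperm(size(data, 1));      % shuffle the row indices
    trIdx = order(1:nTrain);
    teIdx = order(nTrain+1:end);
    Xtr = data(trIdx, :);  ytr = labels(trIdx);
    Xte = data(teIdx, :);  yte = labels(teIdx);

    % Squared Euclidean distances: one row per test point, one column per
    % training point.
    D = sum(Xte.^2, 2) + sum(Xtr.^2, 2)' - 2 * (Xte * Xtr');
    [~, nearest] = sort(D, 2);            % nearest training points first

    for k = 1:numel(Ks)
        votes = ytr(nearest(:, 1:Ks(k))); % labels of the K nearest neighbours
        pred  = mode(votes, 2);           % majority vote for each test row
        acc(r, k) = mean(pred == yte);    % fraction classified correctly
    end
end

meanAcc = mean(acc, 1);                   % average over the random splits
stdAcc  = std(acc, 0, 1);                 % spread over the random splits

errorbar(Ks, meanAcc, stdAcc);
xlabel('K');
ylabel('Test accuracy');
title('KNN test accuracy over random splits (synthetic data)');
```

With the real data, the synthetic data and labels arrays would be replaced by the matrix and labels built in question 1, and the inner vote by a call to knearest once its help text has been read.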
You can visualise your classifications using the showdata function provided. When you are comfortable with all this, extend the above to load and predict digits ‘3’, ‘6’, and ‘8’.
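For the three-digit extension, one possible way to assemble the matrix and labels is sketched here. The names digitList, getPixels, data and labels are illustrative assumptions, and getPixels is a random stand-in so the snippet runs on its own; with the real data it would be replaced by a call to getonedigit, whose output is flattened to a row in case it is not already 1-by-256.

```matlab
% Sketch only: build a dataset of randomly chosen examples of several digits.
digitList = [3 6 8];                      % digits to include
nPer      = 100;                          % examples to pick per digit
nTotal    = 500;                          % examples available per digit

% Hypothetical stand-in so the sketch is self-contained. With the real data,
% swap this for a call to getonedigit with the loaded data variable.
getPixels = @(d, n) rand(1, 256);

data   = zeros(numel(digitList) * nPer, 256);
labels = zeros(numel(digitList) * nPer, 1);

row = 0;
for d = digitList
    picks = randperm(nTotal, nPer);       % nPer distinct example numbers
    for n = picks
        row = row + 1;
        pix = getPixels(d, n);
        data(row, :) = pix(:)';           % flatten to a 1-by-256 row
        labels(row)  = d;                 % true label for this row
    end
end
```

Using randperm with two arguments picks without repetition, so no example is selected twice for the same digit.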