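Abstract: This article is the original video transcript of Lesson 39, "Moving Data Around", from Chapter 6, "Octave Tutorial", of Andrew Ng's Machine Learning course. I wrote it down while studying the videos and edited it to make it more concise and easier to read, so that I can refer back to it later. I am now sharing it with everyone. If you find any mistakes, corrections are welcome, and I sincerely thank you for them. I also hope it helps with your own studies.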
In this second tutorial video/article on Octave, I'd like to start to tell you how to move data around in Octave. So, if you have data for a machine learning problem, how do you load that data into Octave? How do you put it into a matrix? How do you manipulate these matrices? How do you save the results? How do you move data around and operate with data?
>> A
A =
1 2
3 4
5 6
>> size(A)
ans =
3 2
>> sz = size(A)
sz =
3 2
>> size(sz)
ans =
1 2
>>
Here's my Octave window as before, picking up from where we left off in the last video/article. If I type A, that is the matrix we generated earlier, right? The size command in Octave tells you the size of a matrix. So size(A) returns 3 2. It turns out that the size command itself actually returns a 1x2 matrix. So you can set sz=size(A), and sz is now a 1x2 matrix whose first element is 3 and whose second element is 2. So, if you type size(sz), you get 1 2, because sz is a 1x2 matrix whose 2 elements contain the dimensions of the matrix A.
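As a small sketch of the same idea, size can also hand back the row and column counts as two separate outputs. The variable names m and n are my own choice for illustration and do not come from the video.

```octave
A = [1 2; 3 4; 5 6];   % the 3x2 matrix used above
sz = size(A);          % 1x2 row vector holding [rows, columns]
[m, n] = size(A);      % the same information as two separate scalars
printf('%d rows, %d columns\n', m, n);
```

The two-output form is convenient when the row count and column count are needed under separate names later on.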
>> size(A, 1)
ans = 3
>> size(A, 2)
ans = 2
>> v = [1 2 3 4]
v =
1 2 3 4
>> length(v)
ans = 4
>>
You can also type size(A,1) to get back the size of the first dimension of A, which is the number of rows, and size(A,2) to get back 2, which is the number of columns in the matrix A. If you have a vector v, let's say v=[1 2 3 4], and you type length(v), this gives you the size of the longest dimension. You can also type length(A), and because A is a 3x2 matrix, the longer dimension is of size 3, so this prints out 3. But usually we apply length only to vectors, as in length([1;2;3;4;5]), rather than to matrices, because that's more confusing.
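To make the "longest dimension" behaviour concrete, here is a minimal sketch comparing length with max(size(...)) and with numel, which counts all elements. The variable names on the left-hand sides are only for illustration.

```octave
A = [1 2; 3 4; 5 6];     % 3x2 matrix
v = [1 2 3 4];           % 1x4 row vector
len_A = length(A);       % size of the longest dimension of A
max_A = max(size(A));    % the same value for a non-empty matrix
num_A = numel(A);        % total number of elements in A
len_v = length(v);       % for a vector this is simply its element count
```

For a matrix, length hides which dimension it measured, which is one reason to prefer size(A,1) or size(A,2) there.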
>> load featuresX.dat
>> load('featuresX.dat')
Now, let's look at how to load data and find data on the file system. When we start Octave, we're usually in a path that is the location of the Octave program. The pwd command shows the current directory, or the current path, that Octave is in. The cd command stands for change directory. The ls command lists the files and directories in the current path. On my Desktop are two files, featuresX and priceY, that come from a machine learning problem I want to solve. The file featuresX has two columns of data. This is actually my housing prices data, and I think I have forty-seven rows in this dataset. So the first house has a size of 2104 square feet and has 3 bedrooms; the second house has 1600 square feet and 3 bedrooms, and so on. And priceY is the file that has the prices of the houses in my training set. So featuresX and priceY are just text files with my data. How do I load this data into Octave? I just type the command load featuresX.dat, and that loads featuresX; likewise, I can load priceY.dat. By the way, there are multiple ways to do this. If you put featuresX.dat in a string and load it like so, load('featuresX.dat'), this is an equivalent command. Octave uses single quotes to represent strings.
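The following sketch shows one way to try this without the original files. It writes a tiny two-row text file (only the two houses quoted above) to a temporary location and reads it back with the functional form of load, which returns the matrix instead of creating a variable named after the file. The temporary file and the variable names tmp, fid, and X are assumptions made for the example.

```octave
tmp = [tempname() '.dat'];            % hypothetical temporary file name
fid = fopen(tmp, 'w');
fprintf(fid, '2104 3\n1600 3\n');     % size in square feet, bedrooms
fclose(fid);

X = load(tmp);                        % numeric text file comes back as a matrix
disp(size(X));                        % rows and columns of the loaded data

delete(tmp);                          % remove the temporary file again
```

Assigning the result of load to a variable keeps the workspace name under your control, which can be handy when file names are not valid identifiers.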
>> who
Variables in the current scope:
featuresX priceY
>> size(featuresX)
ans =
47 2
>> size(priceY)
ans =
47 1
>>
Now the who command shows me what variables I have in my Octave workspace. So who shows me the variables that Octave currently has in memory; featuresX and priceY are among them, as well as the variables that we created earlier in this session. I can type featuresX to display featuresX. And I can type size(featuresX), and that's my 47 by 2 matrix. Similarly, size(priceY) gives me my 47 by 1 vector.
There's also the whos command, which gives you a more detailed view.
>> whos
Variables in the current scope:
Attr Name Size Bytes Class
==== ==== ==== ===== =====
ans 1x2 16 double
featuresX 47x2 752 double
priceY 47x1 376 double
Total is 143 elements using 1144 bytes
>> clear featuresX
>> whos
Variables in the current scope:
Attr Name Size Bytes Class
==== ==== ==== ===== =====
ans 1x2 16 double
priceY 47x1 376 double
Total is 49 elements using 392 bytes
>>
Now if you want to get rid of a variable, you can use the clear command. So type clear featuresX and then whos again. You'll notice that the featuresX variable has now disappeared.
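If you would rather check programmatically than read the whos listing, a minimal sketch using exist looks like this. The variable tmp_var is hypothetical and only exists for the demonstration.

```octave
tmp_var = [1 2 3];                  % hypothetical variable for this demo
before = exist('tmp_var', 'var');   % nonzero while the variable is defined
clear tmp_var                       % remove just this one variable
after = exist('tmp_var', 'var');    % zero once the variable is gone
printf('before: %d, after: %d\n', before, after);
```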
>> v = priceY(1:10)
v =
10000
30000
50000
20000
55000
10000
30000
50000
20000
55000
>> who
Variables in the current scope:
ans priceY v
>> whos
Variables in the current scope:
Attr Name Size Bytes Class
==== ==== ==== ===== =====
ans 1x21 21 char
priceY 47x1 376 double
v 10x1 80 double
Total is 78 elements using 477 bytes
>> save hello.mat v;
>> ls
Volume in drive C is VtasSOE
Volume Serial Number is E269-0541
Directory of c:\ecosys\toolkits\prog\Octave
[.] [..] 1.dat example1.m featuresX.dat hello.mat priceY.dat
5 File(s) 895 bytes
2 Dir(s) 45,769,801,728 bytes free
>> clear
>> who
>> load hello.mat
>> save hello.txt v -ascii
>> ls
Volume in drive C is VtasSOE
Volume Serial Number is E269-0541
Directory of c:\ecosys\toolkits\prog\Octave
[.] [..] 1.dat example1.m featuresX.dat hello.mat hello.txt priceY.dat
6 File(s) 1,065 bytes
2 Dir(s) 45,771,878,400 bytes free
>>
And how do we save data? Let's take the variable v and set it to priceY(1:10). This sets v to be the first 10 elements of the vector priceY. So let's type who or whos. Whereas priceY was a 47 by 1 vector, v is now 10 by 1. Let's say I want to save this to disk. The command save hello.mat v; will save the variable v into a file called hello.mat. And let's say I clear all my variables. If you type clear without anything else, this deletes all of the variables in your workspace. So type who, and there's now nothing left in the workspace. And if I load hello.mat, I load back my variable v, which is the data that I previously saved into the hello.mat file. What we did just now with save hello.mat v saved the data in a binary format, a somewhat more compressed binary format. So if v is a lot of data, this will be somewhat more compact and will take less space. If you want to save your data in a human-readable format, then you type save hello.txt v -ascii. This will save it as text, in ASCII format. And once I've done that, a file hello.txt has just appeared on my desktop. So that's how you load and save data.
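Here is a hedged round-trip sketch of both formats. It passes -binary explicitly because, as far as I know, the format Octave uses when no option is given depends on its save_default_options setting, so being explicit avoids relying on the default. The temporary file names and the variables f_bin, f_txt, s, v_bin, and v_txt are assumptions for the example; the three values are the first three prices shown above.

```octave
v = [10000; 30000; 50000];        % first three prices from the listing above
f_bin = [tempname() '.mat'];      % hypothetical temporary file names
f_txt = [tempname() '.txt'];

save('-binary', f_bin, 'v');      % Octave binary format
save('-ascii', f_txt, 'v');       % plain numbers, no header, human readable

s = load(f_bin);                  % with an output, load returns a struct
v_bin = s.v;                      % the saved variable is a field of that struct
v_txt = load(f_txt);              % a numeric text file comes back as a matrix

delete(f_bin);
delete(f_txt);
```

Note that the -ascii file stores only the numbers, not the variable name, so the name is chosen again at load time.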
>> A = [1 2; 3 4; 5 6]
A =
1 2
3 4
5 6
>> A(3,2)
ans = 6
>> A(2,:)
ans =
3 4
>> A(:,2)
ans =
2
4
6
>>
Now let's talk a bit about how to manipulate data. Let's set A to that matrix again, and let's talk about indexing. If I type A(3,2), this indexes into the (3,2) element of the matrix A. Normally, we write this as A_{3,2} or A_32. I can also type A(2,:) to fetch everything in the second row; the colon means every element along that row or column. Similarly, if I do A(:,2), this means get everything in the second column of A.
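The same lookups can be collected in a short sketch. The last line uses the end keyword, which is not covered in the video; it stands for the last index along that dimension. The names on the left-hand sides are only illustrative.

```octave
A = [1 2; 3 4; 5 6];
elem = A(3, 2);         % single element in row 3, column 2
row2 = A(2, :);         % everything in the second row
col2 = A(:, 2);         % everything in the second column
last_row = A(end, :);   % "end" refers to the last row here
```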
>> A([1 3],:)
ans =
1 2
5 6
>>
Now, you can also use some of the more sophisticated indexing operations. For example, A([1 3],:) gets everything from the first and third rows of A, so it returns [1 2; 5 6].
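A brief sketch of what an index vector can do beyond this case: the order of the indices sets the order of the result, and an index vector can be combined with a single column. The variable names are assumptions for illustration.

```octave
A = [1 2; 3 4; 5 6];
rows13 = A([1 3], :);   % rows 1 and 3, all columns
rows31 = A([3 1], :);   % same rows, returned in the order 3 then 1
corner = A([1 3], 2);   % rows 1 and 3, second column only
```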
>> A
A =
1 2
3 4
5 6
>> A(:,2) = [10; 11; 12]
A =
1 10
3 11
5 12
>>
You can also use indexing to do assignment. A(:,2) = [10; 11; 12] takes the second column of A and assigns these new values to it. Now A is the matrix [1 10; 3 11; 5 12].
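As a sketch, the same pattern works for a row or a single element, as long as the right-hand side has the same shape as the indexed region. The extra assignments and their values are my own examples, not from the video.

```octave
A = [1 2; 3 4; 5 6];
A(:, 2) = [10; 11; 12];   % replace the whole second column
A(1, :) = [7 8];          % replace the whole first row (example values)
A(3, 2) = 0;              % replace a single element (example value)
disp(A);
```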
>> A = [A, [101; 102; 103]]
A =
1 10 101
3 11 102
5 12 103
>>
Another example: A = [A, [101; 102; 103]] appends another column vector to the right of A.
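The same bracket syntax can put a column on the left as well. As an illustrative sketch, the snippet below prepends a column of ones whose height is taken from the matrix itself; the name X and the choice of a ones column are assumptions for the example.

```octave
A = [1 10; 3 11; 5 12];
X = [ones(size(A, 1), 1), A];   % new first column of ones, then A
disp(size(X));                  % one more column than A
```

Taking the row count from size(A, 1) keeps the new column the right height even if A changes.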
>> A = [1 10 100; 3 11 101; 5 12 102]
A =
1 10 100
3 11 101
5 12 102
>> A(:)
ans =
1
3
5
10
11
12
100
101
102
>>
And finally, one neat trick that I sometimes use. If you just do A(:), that is a somewhat special case syntax. It puts all elements of A into a single column vector, and this gives me a 9x1 vector.
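A small sketch of the trick and its inverse: A(:) stacks the columns of A one after another, and reshape can rebuild the original shape from that column. The names col, B, and same are hypothetical.

```octave
A = [1 10 100; 3 11 101; 5 12 102];
col = A(:);                   % 9x1 vector, columns stacked top to bottom
B = reshape(col, size(A));    % rebuild a 3x3 matrix from the column
same = isequal(A, B);         % true when the round trip preserves A
```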
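A few more examples, this time concatenating two matrices. [A B], or equivalently [A, B], puts A and B side by side, while the semicolon in [A; B] means go to the next line, so B is placed below A and C becomes a 6x2 matrix.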
>> A
A =
1 2
3 4
5 6
>> B
B =
11 12
13 14
15 16
>> C = [A B]
C =
1 2 11 12
3 4 13 14
5 6 15 16
>> [A, B]
ans =
1 2 11 12
3 4 13 14
5 6 15 16
>> C = [A; B]
C =
1 2
3 4
5 6
11 12
13 14
15 16
>> size(C)
ans =
6 2
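For reference, the bracket forms correspond to the functions horzcat and vertcat. The sketch below checks that correspondence; the names side, stacked, same_h, and same_v are assumptions for illustration.

```octave
A = [1 2; 3 4; 5 6];
B = [11 12; 13 14; 15 16];
side = [A B];                               % 3x4, B to the right of A
stacked = [A; B];                           % 6x2, B below A
same_h = isequal(side, horzcat(A, B));      % bracket form vs. function form
same_v = isequal(stacked, vertcat(A, B));
```

Side-by-side concatenation needs equal row counts, and stacking needs equal column counts; A and B here satisfy both.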