1. Comparison of Python List, NumPy Array, Pandas Series, and DataFrame
Python List | Numpy Array | Pandas Series | DataFrame |
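Access by position (Pandas: by index label or by position). loc selects by index label; iloc selects by integer position. For a data frame, the index labels the rows and the columns label the columns. In many functions, axis=0 (or axis='index') means the operation runs along the index, which gives one result per column; axis=1 (or axis='columns') is the reverse and gives one result per row. | |||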
Access a range of elements: a[:3] | Implemented in C: faster | ||
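Use loops: for item in list/array/series. NumPy has no apply(); use vectorized operations, numpy.vectorize(function), or numpy.apply_along_axis(). Pandas: apply() and applymap(). applymap() is applied to every element of a data frame (it is named DataFrame.map() since pandas 2.1); apply() is applied to each column or each row. | |||
Each element should have the same type. | Adding two series matches elements by index. | Adding a data frame and a series matches the index of the series against the columns of the data frame. |
Convenient functions | |||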
Vectorized operations | |||
List of lists | Multi-dimensional | ||
A NumPy array is a souped-up Python list. | Each column of a data frame may have a different type, whereas a NumPy array has a single type for all of its elements. | ||
Series: a cross between a NumPy array and a dictionary |
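
To make the table's points about label vs. position access, the axis argument, and index alignment concrete, here is a minimal sketch. All data and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical data, invented for this illustration.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_label = s.loc["b"]      # loc: select by index label
by_position = s.iloc[0]    # iloc: select by integer position

# Adding two Series aligns on index labels, not on position.
t = pd.Series([1, 2, 3], index=["b", "c", "d"])
total = s + t              # labels present on only one side become NaN

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r1", "r2"])
col_sums = df.sum(axis=0)  # along the index: one result per column
row_sums = df.sum(axis=1)  # along the columns: one result per row

# DataFrame + Series: the Series index is matched to the column names.
shifted = df + pd.Series({"x": 100, "y": 200})
```

Note that `total` contains NaN for the labels "a" and "d", because each appears in only one of the two Series.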
2. Hint:
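Booleans can be added in Python: True counts as 1 and False as 0, so summing a boolean array counts its True values.
NumPy index arrays filter data: a[b], where b is a boolean array of the same length as a, keeps the elements of a where b is True.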
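
A small sketch of both hints, using invented values:

```python
import numpy as np

a = np.array([1, 5, 2, 8, 3])    # hypothetical sample data
b = a > 2                        # boolean array marking elements greater than 2
filtered = a[b]                  # keeps only the elements where b is True
count = b.sum()                  # True counts as 1, so this counts the matches
plain_sum = True + True + False  # plain Python booleans add the same way
```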
Standardize the data for comparison: (x - xbar) / standard_deviation.
Pearson's r: the correlation coefficient, computed with numpy.corrcoef().
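
As a rough illustration of how these two hints connect: with the population standard deviation (NumPy's default for std()), Pearson's r equals the mean of the product of the standardized values. The helper name `standardize` and the data are assumptions made for this sketch.

```python
import numpy as np

# Hypothetical data: y is exactly 2 * x, so the correlation should be 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

def standardize(values):
    """Convert values to standard units: (x - mean) / standard deviation."""
    # ndarray.std() uses the population standard deviation (ddof=0) by default.
    return (values - values.mean()) / values.std()

zx = standardize(x)
zy = standardize(y)

r_manual = (zx * zy).mean()        # Pearson's r from standardized values
r_numpy = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 matrix
```

numpy.corrcoef() returns a correlation matrix, so the coefficient between x and y is read from position [0, 1].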
'+=' vs '+': a = a + ... creates a new object and rebinds the name a to it, while a += ... on a NumPy array is an in-place operation, meaning it modifies the existing object without creating a new one.
b = a[:3] (a is a NumPy array): this slice is a view of the array rather than a copy. If an element of b is modified, the original array a is modified as well, because b shares memory with a. Use a[:3].copy() to get an independent copy.
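
The difference is easiest to see when a second name refers to the same array. A minimal sketch with invented values:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
alias = a
a += 1             # in place: the array that alias refers to changes too

c = np.array([1, 2, 3, 4])
alias_c = c
c = c + 1          # builds a new array and rebinds c; alias_c is untouched

d = np.array([1, 2, 3, 4, 5])
view = d[:3]       # a view that shares memory with d
view[0] = 100      # also changes d[0]

independent = d[:3].copy()  # an explicit copy with its own memory
independent[1] = -1         # d[1] is unaffected
```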
3. Some convenient functions: just search for what you need, for example: numpy shift element
groupby(): splits a data frame into smaller groups of rows that share the same value of a key (key1: rainy/fine). For example: compare the data on rainy days with the data on fine days.
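
A minimal sketch of the rainy/fine comparison, assuming a hypothetical data frame with a "weather" key column and a "riders" value column (both names and all values are invented):

```python
import pandas as pd

# Hypothetical data, made up for this illustration.
df = pd.DataFrame({
    "weather": ["rainy", "fine", "rainy", "fine"],
    "riders": [100, 300, 120, 280],
})

# Group the rows by the key, then aggregate each group separately.
mean_by_weather = df.groupby("weather")["riders"].mean()
```

The result is a Series indexed by the key values, so `mean_by_weather["rainy"]` and `mean_by_weather["fine"]` can be compared directly.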
merge(): merges two data frames by matching rows on shared key columns or indexes.
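
As a sketch, assuming two hypothetical data frames that share a "station" column (all names and values are invented):

```python
import pandas as pd

# Hypothetical data, made up for this illustration.
rides = pd.DataFrame({"station": ["A", "B", "C"], "riders": [100, 200, 300]})
weather = pd.DataFrame({"station": ["A", "B", "D"], "rain": [True, False, True]})

# Inner join: keep only the stations present in both data frames.
inner = rides.merge(weather, on="station", how="inner")

# Left join: keep every row of rides; missing weather values become NaN.
left = rides.merge(weather, on="station", how="left")
```

The `how` argument decides which keys survive, so it is worth checking the row count after a merge.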
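shift(): shifts the values of a Pandas Series or data frame by a number of positions (for a NumPy array, numpy.roll() is the closest equivalent, although it wraps values around). diff(): calculates the difference between consecutive values, which recovers per-step values from cumulative totals; cumsum() does the reverse and calculates the cumulative values.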
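
A short sketch of how these functions relate, using an invented Series of cumulative totals. In pandas, diff() takes the difference between consecutive values, which is the same as subtracting a shifted copy, and cumsum() goes in the opposite direction.

```python
import numpy as np
import pandas as pd

cumulative = pd.Series([10, 15, 25, 40])  # hypothetical running totals

shifted = cumulative.shift(1)     # values move down one position; first is NaN
per_step = cumulative.diff()      # difference between consecutive values
by_hand = cumulative - shifted    # the same result, built from shift()

restored = pd.Series([10, 5, 10, 15]).cumsum()  # cumulative sum rebuilds totals

# NumPy arrays have no shift(); np.roll() is similar but wraps around.
rolled = np.roll(np.array([1, 2, 3, 4]), 1)
```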
4. The procedure of data analysis
First, take a shot at structuring the data and ask questions about it: which part are we interested in? What problem would we like to solve? Indulge your curiosity! Give it free rein, and do not pass up any chance to explore the data.
Then wrangle the data, that is, clean it and make it ready for analysis.
Analyze: build intuition and find patterns in the data.
Draw a conclusion or make a prediction: do not mistake correlation for causation!
Communicate the results to your peers. Share them via a blog, a forum, etc.
In fact, throughout the process, we may need to go back, wrangle the data again, and refine our questions as we become more familiar with our dataset.
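5. Be proactive.
That is, when learning something unfamiliar, be an active learner and listener: take the material in with questions in mind, reason about it, and anticipate the next step all the way to the final result. This insight came from doing listening exercises and from an online course teaching NumPy usage. It matters a great deal: if you only receive passively and just copy down notes, you soon forget, your attention drifts, and your thinking gets no exercise. Active listening is itself a process of building a knowledge map and continuously feeding back on your own thinking.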