Data Analysis (Introductory): NumPy & Pandas

1. Comparison of Python lists, NumPy arrays, Pandas Series, and DataFrames.

 
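Compared: Python list, NumPy array, Pandas Series, Pandas DataFrame.

Access by position (Pandas: by index or by position): iloc selects by position, loc selects by index label. For a data frame, the index labels the rows and the columns label the columns. In functions that take an axis argument, axis=0 (or axis='index') means the operation runs along the index, so it produces one result per column; axis=1 (or axis='columns') is the reverse and produces one result per row.

Access a range of elements: a[:3]. NumPy is implemented in C, so it is faster than a plain list.

Use loops: for item in list/array/series. NumPy: apply a function to the array. Pandas: apply() and applymap(). applymap() is applied to every element of a data frame; apply() is applied to each column or each row.

Types and arithmetic: every element of a NumPy array should have the same type. Two Series are added by index. When a data frame and a Series are added, the index of the Series is matched against the columns of the data frame.

NumPy and Pandas offer convenient functions and vectorized operations.

Dimensions: a Python list becomes a list of lists; a NumPy array is multi-dimensional.

A NumPy array is a souped-up Python list. Each column of a data frame may have a different type, whereas the columns of a NumPy array may not.

Series: a cross between a NumPy array and a dictionary.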
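The points about loc/iloc, the axis argument, and index matching are easier to see on a tiny example. The following is only a sketch with made-up data; every variable name is hypothetical. It uses apply() but not applymap(), because recent pandas releases deprecate applymap() in favour of DataFrame.map().

```python
import pandas as pd

# Hypothetical Series whose integer labels differ from the positions.
s = pd.Series([10, 20, 30], index=[2, 0, 1])
by_position = s.iloc[0]   # first element by position
by_label = s.loc[0]       # element whose index label is 0

# Hypothetical data frame with row labels x, y, z.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}, index=["x", "y", "z"])
col_sums = df.sum(axis=0)   # runs down the index: one result per column
row_sums = df.sum(axis=1)   # runs across the columns: one result per row

# Two Series are added by index label, not by position.
s1 = pd.Series([1, 2], index=["p", "q"])
s2 = pd.Series([10, 20], index=["q", "p"])
aligned_sum = s1 + s2

# Data frame + Series: the Series index is matched against the column names.
offsets = pd.Series({"a": 100, "b": 200})
shifted = df + offsets

# apply() hands a whole column (axis=0) or a whole row (axis=1) to the function.
col_range = df.apply(lambda col: col.max() - col.min(), axis=0)
```

Here by_position is 10 while by_label is 20, which is the difference between iloc and loc in one line.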

 

2. Hints:

Booleans can be added in Python: True counts as 1 and False counts as 0.

A NumPy index array can filter data: in a[b], where b is a boolean array of the same length as a, only the elements of a where b is True are kept.
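A short sketch of both hints, with made-up values and hypothetical variable names:

```python
import numpy as np

# Adding booleans: True counts as 1, False as 0.
flags = [True, False, True, True]
count_true = sum(flags)

# Boolean index array: b marks which elements of a to keep.
a = np.array([1, 5, 2, 8, 3])
b = a > 2              # boolean array, same length as a
filtered = a[b]        # only the elements where b is True
n_selected = b.sum()   # booleans add up here too
```

Combining the two hints, b.sum() counts how many elements pass the filter without building the filtered array.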

Standardize the data for comparison: (x - xbar) / standard_deviation, where xbar is the mean of x.
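As a minimal sketch of the formula on made-up numbers (the names x and z are hypothetical), note that a NumPy array's std() uses the population standard deviation (ddof=0) by default, whereas a Pandas Series' std() defaults to ddof=1, so the two can give slightly different results:

```python
import numpy as np

# Hypothetical sample with mean 5 and population standard deviation 2.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Vectorized standardization: no explicit loop is needed.
z = (x - x.mean()) / x.std()
```

After standardizing, z has mean 0 and standard deviation 1, so values from differently scaled data sets can be compared directly.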

Pearson's r (the correlation coefficient) can be computed with numpy.corrcoef.
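Pearson's r can also be written as the mean of the product of the two standardized variables, which ties it to the standardization hint. The sketch below, on made-up data with a hypothetical helper standardize(), computes it both ways:

```python
import numpy as np

# Hypothetical paired observations.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

def standardize(v):
    """Return v in standard units, using the population standard deviation."""
    return (v - v.mean()) / v.std()

# Pearson's r as the average product of standardized values.
r_manual = (standardize(x) * standardize(y)).mean()

# numpy.corrcoef returns a 2x2 correlation matrix; r is the off-diagonal entry.
r_numpy = np.corrcoef(x, y)[0, 1]
```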

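'+=' vs '+': a = a + ... versus a += ... For a NumPy array the latter is an in-place operation, meaning it modifies the existing object instead of creating a new one; the former creates a new array and rebinds the name a to it.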
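The difference only becomes visible when a second name refers to the same array. A small sketch with hypothetical names:

```python
import numpy as np

a = np.array([1, 2, 3])
alias_a = a      # second name for the same array object
a += 1           # in place: the shared array itself changes

b = np.array([1, 2, 3])
alias_b = b      # second name for the same array object
b = b + 1        # creates a new array and rebinds b; alias_b keeps the old one
```

After this runs, alias_a shows the incremented values while alias_b still holds the original ones.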

b = a[:3] (where a is a NumPy array): this slice is a view of the array rather than a copy. If an element of b is modified, the original array a is modified as well, because the view shares its data with a.
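A minimal sketch contrasting a NumPy slice with an explicit copy and with a plain list slice (all names are hypothetical):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = a[:3]          # view: shares data with a
b[0] = 100         # also changes a[0]

c = a[:3].copy()   # explicit copy: independent of a
c[1] = -1          # a[1] is untouched

lst = [1, 2, 3, 4, 5]
lst_slice = lst[:3]   # slicing a Python list makes a new list
lst_slice[0] = 100    # lst[0] is untouched
```

When an independent slice is needed, calling .copy() on it avoids accidental changes to the original array.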

3. Some convenient functions. When you need one, just search for it, for example: numpy shift element

groupby(): splits a data frame into smaller groups of rows that share a certain key (key1: rainy/fine). For example: compare the data for rainy days with the data for fine days.
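A sketch of the rainy/fine comparison on made-up data; the column names weather and riders are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical daily records.
df = pd.DataFrame({
    "weather": ["rainy", "fine", "rainy", "fine", "fine"],
    "riders": [100, 300, 120, 280, 320],
})

grouped = df.groupby("weather")

# One summary value per group, ready to compare.
mean_by_weather = grouped["riders"].mean()

# The smaller data frame holding only the rows of one group.
rainy_rows = grouped.get_group("rainy")
```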

merge(): merges two data frames on shared columns or indexes.
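A minimal sketch of merging two made-up data frames on a shared column; the frame and column names are hypothetical, and an inner join is chosen here only as one possible option:

```python
import pandas as pd

# Hypothetical tables that share a "date" column.
riders = pd.DataFrame({"date": ["d1", "d2", "d3"], "riders": [100, 300, 120]})
weather = pd.DataFrame({"date": ["d1", "d2", "d4"],
                        "weather": ["rainy", "fine", "fine"]})

# Inner join: keep only the dates that appear in both tables.
merged = riders.merge(weather, on="date", how="inner")
```

Passing how="left", how="right", or how="outer" instead keeps the unmatched rows from one or both sides, with missing values filled in as NaN.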

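shift(): shifts the values of a Pandas Series or data frame by a given number of positions. diff(): calculates the difference between consecutive elements, which is useful for turning accumulated values back into per-step values.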
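The two functions are closely related, as this sketch on a made-up running total shows (all names are hypothetical):

```python
import pandas as pd

# Hypothetical running total, e.g. a counter read at regular intervals.
cumulative = pd.Series([10, 25, 45, 50])

# shift(1) moves every value one position down; the first slot becomes NaN.
previous = cumulative.shift(1)
per_step_manual = cumulative - previous

# diff() does the same subtraction in one call.
per_step = cumulative.diff()

# cumsum() undoes diff() once the first value is restored.
rebuilt = per_step.fillna(cumulative.iloc[0]).cumsum()
```

The first element of per_step is NaN because there is no earlier reading to subtract from it.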

4. The procedure of data analysis

First, take a shot at structuring the data and ask questions about it: which part are we interested in? What problem would we like to solve? Indulge your curiosity! Give it free rein and leave no possible way of exploring the data untried.

Then wrangle (or clean) the data, making it ready for analysis.

Analyze: build intuition and find patterns in the data.

Draw a conclusion or make a prediction: do not mistake correlation for causation!

Communicate the results to your peers. Share them via a blog, a forum, etc.

In fact, throughout the process we may need to go back to wrangle the data again and refine our questions as we become more familiar with our dataset.

5. Be proactive.

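That is, when learning something unfamiliar, be an active learner and listener: take in the material with questions in mind, reason about it, and anticipate the next step all the way to the final result. This insight came from doing listening exercises and from following an online course that taught NumPy. It matters a great deal: if you only receive passively and are busy copying notes, you soon forget, lose focus, and your thinking gets no exercise. Active listening is itself a process of building a knowledge graph and of continually feeding back into your own thinking.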

 

 
