Python data-collection programs: learning Python, data collection, and data analysis

This post collects links to good materials and articles I have come across. It is updated continually.

1.http://www.cnblogs.com/ming5536/archive/2012/11/21/2781062.html

How do you become a great data analyst?

Two classes of skills are needed to be a successful data analyst: soft skills and technical skills. The core workflow for a data analyst has several stages. Once a problem has been defined and a hypothesis is ready to be tested, the data must be extracted and then analyzed. The resulting analysis is written up and communicated to the interested stakeholders. Doing this requires several hard and soft skills.

Technical Skills:

Statistical knowledge, ranging from the basics to a rigorous understanding of machine learning. Most consumers of analysis will not look beyond descriptive statistics (means, medians, significance).

Useful computer skills include a query language (SQL, Hive, Pig), a scripting language (Python, Matlab), a statistical language (R, SAS, SPSS), and a spreadsheet (Excel).
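As a small illustration of the descriptive statistics mentioned above, here is a minimal sketch using Python's standard `statistics` module. The sample numbers and variable names are made up for illustration and do not come from the quoted article.

```python
import statistics

# Hypothetical sample: daily order counts (made-up numbers, one outlier).
daily_orders = [12, 15, 11, 14, 90, 13, 16]

# The mean is pulled upward by the outlier (90).
mean_orders = statistics.mean(daily_orders)

# The median is the middle value after sorting, so the outlier barely matters.
median_orders = statistics.median(daily_orders)

print("mean:", mean_orders)
print("median:", median_orders)
```

Reporting both numbers side by side is often enough to show a reader that the data is skewed.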

Soft Skills:

Defining the problem and narrowing down the analysis often requires a lot of soft skills. Balancing the demands on your time, cutting down the infinite what-if scenarios, and understanding the requester's needs all require good communication and an understanding of the business. Avoid agreeing to deliver more information than is useful for solving the core issues.

Knowing the audience. A PM and a CEO require different presentations, and as a data analyst you will often have to answer to both. A typical PM will want a more collaborative interaction, with more scenarios spelled out and a less polished presentation. A CEO will often be looking for a specific recommendation in a short, polished presentation.

Delivery. A wonderfully accurate predictive model that has been backtested to deliver a low RMSE, or an A/B test that can increase conversion by 15% without reducing the sales price, is a great result. However, without a great presentation, key findings may be left out of product road maps and sit in the backlog for months or years.

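After reading it, I would sum up the author's point in two layers: you need solid skills, and you need the right mindset. The solid skills are an understanding of machine learning plus command of a query language (SQL, Hive, or Pig), a scripting language (Python or Matlab), a statistical language (R, SAS, or SPSS), and a piece of software (Excel). The mindset includes

identifying the core of the problem and understanding what the client really wants.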

A very good article on programmers' work and income. Worth thinking about.

http://blog.csdn.net/justjavac/article/details/8686805

3. Learning Python

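I really like the following lines, each of which represents a different level:

When you find it beautiful, you have gotten started.

When you find it handy, you have mastered it.

When you discover how much you still do not know, you are an expert.

When you know that this question has no answer, you have made the grade.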

3-1: Strings and their encodings

1.http://www.cnblogs.com/pylemon/archive/2011/05/18/2050179.html

How to use strip, lstrip, and rstrip in Python (removing specified characters)
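A minimal sketch of the three methods, using a made-up sample string. Note that the argument is treated as a set of characters to remove, not as a literal prefix or suffix.

```python
# Made-up sample string for illustration.
raw = "  xx-hello-xx  "

# With no argument, strip() removes whitespace from both ends.
trimmed = raw.strip()

# With an argument, every leading/trailing character found in the
# given set ("x" and "-") is removed, in any order.
both = trimmed.strip("x-")
left = trimmed.lstrip("x-")    # left side only
right = trimmed.rstrip("x-")   # right side only

print(repr(trimmed), repr(both), repr(left), repr(right))
```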

2.http://www.rmi.net/~lutz/strings30.html        String encoding in Python 3.x: unicode and bytes

3.http://woodpecker.org.cn/diveintopython3/strings.html      Strings and bytes: the strings chapter of Dive Into Python 3.

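A bytes object has a decode() method that takes a character encoding as its argument and converts the bytes object to a string using that encoding. Correspondingly, a string has an encode() method that also takes a character encoding and converts the string to a bytes object.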
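The round trip described above can be sketched as follows in Python 3. The sample text and the choice of UTF-8 and GBK are assumptions made for illustration.

```python
# Sample text chosen for illustration (four Chinese characters).
text = "数据分析"

# str -> bytes: encode() needs a character encoding.
utf8_bytes = text.encode("utf-8")   # 3 bytes per character here
gbk_bytes = text.encode("gbk")      # 2 bytes per character here

# bytes -> str: decode() must use the same encoding that produced the bytes.
restored = utf8_bytes.decode("utf-8")

print(type(utf8_bytes), len(utf8_bytes), len(gbk_bytes))
print(restored == text)
```

The same string yields different byte sequences under different encodings, which is why the encoding always has to be known when decoding.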

3-2: Regular expressions

1.http://www.cnblogs.com/coderzh/archive/2008/05/06/1185755.html    Regular expressions (used in web scrapers to match specific content; import re)
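As a rough sketch of how a scraper might use `re` to pull specific content out of a page, the example below extracts link URLs and link text from a hard-coded HTML snippet. The snippet, the example.com URLs, and the pattern are all made up for illustration. A pattern this simple only handles markup of exactly this shape, so real pages usually call for an HTML parser.

```python
import re

# Hypothetical HTML snippet standing in for a downloaded page.
html = (
    '<a href="http://example.com/a">A</a> '
    '<a href="http://example.com/b">B</a>'
)

# Group 1 captures the href value, group 2 captures the link text.
link_pattern = re.compile(r'<a\s+href="([^"]+)">([^<]*)</a>')

# findall() returns one (url, text) tuple per match when there are two groups.
links = link_pattern.findall(html)

for url, label in links:
    print(label, "->", url)
```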

3-3: A collection of Python interview questions

http://www.reddit.com/r/Python/comments/1knw7z/python_interview_questions/

3-4: Character encoding issues

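Python 2.x has many encoding pitfalls, and they are especially awkward when writing scrapers to collect data. A systematic understanding lets us write code with confidence: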

http://www.cnblogs.com/huxi/archive/2010/12/05/1897271.html
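The kind of problem a scraper typically runs into can be sketched as follows. This is written for Python 3 rather than the Python 2.x discussed above, and the sample bytes are an assumption: they stand in for a page that was served in GBK while the code expected UTF-8.

```python
# Stand-in for raw bytes a scraper might receive from a GBK-encoded page.
page_bytes = "编码".encode("gbk")

# Decoding with the wrong encoding fails loudly.
try:
    page_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Decoding with the encoding the page actually uses recovers the text.
text = page_bytes.decode("gbk")

# errors="replace" avoids the exception but substitutes U+FFFD for bad bytes,
# so the original text is lost.
lossy = page_bytes.decode("utf-8", errors="replace")

print(decoded_ok, text, lossy)
```

In practice the right encoding has to come from somewhere, such as the HTTP headers or the page's own declaration. Guessing and replacing errors only hides the mismatch.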

4. Social network analysis

5. Machine learning + recommender systems

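[How to install the scikit-learn package on Python 3.2 win32. The official build does not support it yet. This is an unofficial build collected from the web, which I hope helps everyone learn: http://download.csdn.net/detail/database_zbye/5258021]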

For using the scikit-learn package, refer mainly to the official reference. Here is a simple example: http://www.shahuwang.com/?p=1018

6. Data mining

七、C/C++

1.http://www.cnblogs.com/zjfdbz/archive/2011/12/17/2291233.html     EOF: end of file
