Python data collection programs: Python, data collection, data analysis

This post collects links to good materials and articles I have come across. It is updated continuously.

1.http://www.cnblogs.com/ming5536/archive/2012/11/21/2781062.html

How do you become a great data analyst?

Being a successful data analyst requires two classes of skills: soft and technical.  The core workflow for a data analyst has several steps.  Once a problem has been defined and a hypothesis is ready to be tested, the data must be extracted and then analyzed.  The resulting analysis is written up and communicated to the interested stakeholders.  Doing this requires several hard and soft skills.

Technical Skills:

Anything from a basic knowledge of statistics to a rigorous understanding of machine learning.  Most consumers of analysis will not look beyond descriptive analysis (means, medians, significance).

Useful computer skills are a query language (SQL, Hive, Pig), a scripting language (Python, Matlab), a statistical language (R, SAS, SPSS), and a spreadsheet (Excel).

Soft Skills:

Defining the problem and narrowing the analysis down often requires a lot of soft skills.  Balancing the demands on your time, cutting down the endless what-if scenarios, and understanding the requester's needs all require good communication and an understanding of the business.  Avoid agreeing to deliver too much information that will not help solve the core issues.

Knowing the audience.  A PM and a CEO require different presentations.  As a data analyst, you will often be required to answer to both.  A typical PM will want a more collaborative interaction with more scenarios spelled out and a less polished presentation.  A CEO will often be looking for a specific recommendation in a short, polished presentation.

Delivery.  A wonderfully accurate predictive model that has been backtested to deliver a low RMSE, or an A/B test that can increase conversion by 15% without reducing the sales price, is a great result.  However, without a great presentation, key findings may be left off product road maps and sit in the backlog for months or years.

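Having read it, I would sum up the author's point as two things: you need solid skills, and you need the right way of thinking. The solid skills include an understanding of machine learning and fluency in one query language (SQL, Hive, or Pig), one scripting language (Python or Matlab), one statistical language (R, SAS, or SPSS), and one piece of software (Excel). The way of thinking includes identifying the core of the problem and understanding what the client really needs.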

A very good article on programmers' work and income; worth thinking about.

http://blog.csdn.net/justjavac/article/details/8686805

III. Learning Python

3-1: Strings and their encodings

1.http://www.cnblogs.com/pylemon/archive/2011/05/18/2050179.html

How to use strip, lstrip, and rstrip in Python (removing specified characters)
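As a quick illustration of the three methods named above, here is a minimal sketch. The sample strings are made up for the example. Note that the optional argument is treated as a set of characters to remove, not as a literal prefix or suffix.

```python
# Sample strings are hypothetical, chosen only to illustrate the behavior.
text = "  hello world \n"

stripped_both = text.strip()    # whitespace removed from both ends
stripped_left = text.lstrip()   # whitespace removed from the left end only
stripped_right = text.rstrip()  # whitespace removed from the right end only

# With an argument, every leading/trailing character found in the set
# "w.moc" is removed; it is not matched as the substring "www." or ".com".
url = "www.example.com"
trimmed = url.strip("w.moc")

print(repr(stripped_both))
print(repr(stripped_left))
print(repr(stripped_right))
print(repr(trimmed))
```

Because the argument is a character set, stripping stops at the first character on each end that is not in the set, which is why the inner "e" of "example" survives on both sides.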

2.http://www.rmi.net/~lutz/strings30.html        String encoding in Python 3.x: Unicode and bytes

3.http://woodpecker.org.cn/diveintopython3/strings.html      str and bytes, the strings chapter of Dive Into Python 3.

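A bytes object has a decode() method that takes a character encoding as its argument and converts the bytes object to a string according to that encoding. Correspondingly, a string has an encode() method that also takes a character encoding as its argument and converts the string to a bytes object.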
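The round trip described above can be sketched as follows. The sample text and the choice of UTF-8 and GBK are assumptions made for the example; any codec that can represent the characters would work the same way.

```python
# Hypothetical sample text; four CJK characters.
text = "数据分析"

# str -> bytes: the same text yields different byte sequences per encoding.
utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

# bytes -> str: decoding must use the encoding the bytes were produced with.
round_trip = utf8_bytes.decode("utf-8")

# Decoding with the wrong codec fails (or silently yields garbage).
try:
    utf8_bytes.decode("ascii")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

print(len(text), len(utf8_bytes), len(gbk_bytes))
print(round_trip == text, wrong_codec_failed)
```

The length of the str counts characters, while the length of each bytes object counts encoded bytes, so the three numbers printed differ.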

3-2: Regular expressions

1.http://www.cnblogs.com/coderzh/archive/2008/05/06/1185755.html    Regular expressions (used in web crawlers to match specific content; import re)
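To show what "matching specific content" in a crawler might look like, here is a minimal sketch using the re module. The HTML snippet and the pattern are hypothetical and only handle this simple, well-formed case; for real pages a proper HTML parser is usually more robust.

```python
import re

# Hypothetical page fragment standing in for downloaded HTML.
html = (
    '<a href="http://example.com/a">First</a> '
    '<a href="http://example.com/b">Second</a>'
)

# Group 1 captures the href value, group 2 captures the link text.
link_pattern = re.compile(r'<a\s+href="([^"]+)">([^<]*)</a>')

# findall returns one (href, text) tuple per match.
links = link_pattern.findall(html)

for href, label in links:
    print(label, href)
```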

IV. Social network analysis

V. Machine learning + recommender systems

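[How to install the scikit-learn package on Python 3.2 win32. The official release does not support this yet; this is an unofficial build collected online, which I hope helps everyone learn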

http://download.csdn.net/detail/database_zbye/5258021]

For using the scikit-learn package, mainly consult the official reference. Here is a simple example:

http://www.shahuwang.com/?p=1018

VI. Data mining

VII. C/C++

1.http://www.cnblogs.com/zjfdbz/archive/2011/12/17/2291233.html     EOF: end of file
