Python Data Analysis Tools

I. Overview of the Libraries

1. NumPy: array support
http://www.numpy.org/
http://reverland.org/python/2012/08/22/numpy/
2. SciPy: matrix support
http://www.scipy.org/
http://reverland.org/python/2012/08/24/scipy/
3. Matplotlib: visualization and plotting
http://matplotlib.org/
http://reverland.org/python/2012/09/17/matplotlib-tutorial/
4. Pandas: data analysis and exploration
http://pandas.pydata.org/pandas-docs/stable/
http://jingyan.baidu.com/season/43456
5. StatsModels: statistical models
http://statsmodels.sourceforge.net/stable/index.html
http://jingyan.baidu.com/season/43456
6. Scikit-learn: machine learning library for regression, classification, clustering, and more
http://scikit-learn.org/
7. Keras: deep learning library for building neural network and deep learning models.
http://deeplearning.net/software/theano/install.html#install
https://github.com/fchollet/keras
8. Gensim: text mining.
http://radimrehurek.com/gensim/
http://www.52nlp.cn
Pillow: image processing.
OpenCV: video processing.


II. Basic Usage of Each Library

1.Numpy

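import numpy

a = numpy.array([2, 0, 1, 5])
print(a)
print(a[:3])        # first three elements
print(a.min())
a.sort()            # sorts the array in place
b = numpy.array([[1, 2, 3], [4, 5, 6]])
print(b * b)        # element-wise product, not a matrix product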

[2 0 1 5]
[2 0 1]
0
[[ 1  4  9]
 [16 25 36]]
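
The output above shows that `b*b` multiplies element by element. As a minimal sketch of the difference from a matrix product (the `@` operator and the transpose are additions for illustration and are not part of the original example):

```python
import numpy

b = numpy.array([[1, 2, 3], [4, 5, 6]])

# Element-wise product: every entry is squared, the shape stays (2, 3).
elementwise = b * b

# Matrix product of b (2x3) with its transpose (3x2): the result is 2x2.
matrix_product = b @ b.T

print(elementwise)
print(matrix_product)
```

The element-wise form is what NumPy arithmetic operators do by default; a true matrix product has to be requested explicitly.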

2.Scipy

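from scipy.optimize import fsolve
from scipy import integrate

# Solve the nonlinear system 2*x1 - x2^2 = 1, x1^2 - x2 = 2.
def f(x):
    x1 = x[0]
    x2 = x[1]
    return [2*x1 - x2**2 - 1, x1**2 - x2 - 2]

result = fsolve(f, [1, 1])   # [1, 1] is the initial guess
print(result)

# Integrate the upper half of the unit circle; its area is pi/2.
def g(x):
    return (1 - x**2)**0.5

pi_2, err = integrate.quad(g, -1, 1)
print(pi_2 * 2)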

[ 1.91963957 1.68501606]
3.14159265359
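
One way to sanity-check results like these is to substitute the root back into the equations and to compare the integral against `math.pi`. The sketch below does that; the names `residual` and `half_pi` are chosen here for illustration only.

```python
import math

from scipy import integrate
from scipy.optimize import fsolve


def f(x):
    x1, x2 = x
    return [2 * x1 - x2 ** 2 - 1, x1 ** 2 - x2 - 2]


def g(x):
    return (1 - x ** 2) ** 0.5


root = fsolve(f, [1, 1])
# Plug the root back in: both entries should be close to zero.
residual = f(root)

# quad returns the integral estimate and an estimate of the absolute error.
half_pi, err = integrate.quad(g, -1, 1)

print(residual)
print(abs(2 * half_pi - math.pi), err)
```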

3.Matplotlib

import numpy
import matplotlib.pyplot as plt

x = numpy.linspace(0, 10, 1000)
y = numpy.sin(x) + 1
z = numpy.cos(x**2) + 1

plt.figure(figsize=(8, 4))
# Raw strings keep the LaTeX backslashes from being read as escape sequences.
plt.plot(x, y, label=r'$\sin x+1$', color='red', linewidth=2)
plt.plot(x, z, 'b--', label=r'$\cos x^2+1$')
plt.xlabel('Time(s)')
plt.ylabel('Volt')
plt.title('A Simple Example')
plt.ylim(0, 2.2)
plt.legend()
plt.show()


4.Pandas

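import pandas

s = pandas.Series([1, 2, 3], index=['a', 'b', 'c'])
d = pandas.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])

print(d.head())       # first rows (up to 5)
print(d.describe())   # summary statistics for each column

# Reading files: 'data.xls' and 'data.csv' must exist in the working directory.
pandas.read_excel('data.xls')
pandas.read_csv('data.csv', encoding='utf-8')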
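
The `read_csv` call above needs a file on disk. As a sketch that runs without one, the example below feeds an in-memory string to `read_csv`; the CSV content is hypothetical and simply mirrors the small DataFrame used above.

```python
import io

import pandas

# Hypothetical CSV content standing in for 'data.csv'.
csv_text = "a,b,c\n1,2,3\n4,5,6\n"
d = pandas.read_csv(io.StringIO(csv_text))

print(d.head())
print(d.describe())

# Column-wise means, one of the values describe() also reports.
col_means = d.mean()
print(col_means)
```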

5.StatsModels

import numpy
from statsmodels.tsa.stattools import adfuller as ADF

# Augmented Dickey-Fuller unit root test on 100 random values.
print(ADF(numpy.random.rand(100)))

(-10.225790909870486, 5.1813110351579693e-18, 0, 99, {'5%': -2.8912082118604681, '1%': -3.4981980821890981, '10%': -2.5825959973472097}, 30.807254539403033)

6.Scikit-learn

from sklearn.linear_model import LinearRegression

model = LinearRegression()
print(model)

from sklearn import datasets

iris = datasets.load_iris()
print(iris.data.shape)

from sklearn import svm

# Train a linear SVM classifier on the iris data, then predict one sample.
clf = svm.LinearSVC()
clf.fit(iris.data, iris.target)
print(clf.predict([[5.0, 3.6, 1.3, 0.25]]))
print(clf.coef_)
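
The example above trains and predicts on the same data. A common next step is to hold out part of the data and measure accuracy on it, sketched below; the 70/30 split, `random_state=0`, and `max_iter=10000` are choices made here and are not from the original.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# Hold out 30% of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# A larger max_iter gives the solver more room to converge.
clf = svm.LinearSVC(max_iter=10000)
clf.fit(X_train, y_train)

# score() returns the mean accuracy on the held-out samples.
accuracy = clf.score(X_test, y_test)
print(accuracy)
```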

7.Keras

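# Written for recent Keras releases; the original used the Keras 0.x form Dense(input_dim, output_dim).
from keras.models import Sequential
from keras.optimizers import SGD
from keras.layers import Input, Dense, Dropout, Activation

model = Sequential()
model.add(Input(shape=(20,)))   # 20 input features
model.add(Dense(64))
model.add(Activation("tanh"))
model.add(Dropout(0.5))
model.add(Dense(64))
model.add(Activation("tanh"))
model.add(Dense(1))
model.add(Activation("sigmoid"))

# The original also passed decay=1e-6; recent Keras optimizers no longer accept that argument.
sgd = SGD(learning_rate=0.1, momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)

# X_train, y_train, X_test and y_test are assumed to be defined elsewhere.
model.fit(X_train, y_train, epochs=20, batch_size=16)
score = model.evaluate(X_test, y_test, batch_size=16)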

8.Gensim

import gensim, logging
logging.basicConfig(format='%(asctime)s:%(levelname)s:%(message)s', level=logging.INFO)

# Each sentence is a list of tokens.
sentences = [['first', 'sentence'], ['second', 'sentence']]

model = gensim.models.Word2Vec(sentences, min_count=1)
# In current gensim versions, word vectors are accessed through model.wv.
print(model.wv['sentence'])

III. Problems Encountered and Solutions

1. Installing the scipy package with pip install scipy fails with:

Command /usr/bin/python -c "import setuptools,
tokenize;__file__='/tmp/pip_build_root/scipy/setup.py';
exec(compile(getattr(tokenize, 'open',
open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))"
install --record /tmp/pip-8xMOAE-record/install-record.txt
--single-version-externally-managed --compile failed with error code 1 in /tmp/pip_build_root/scipy Storing debug log for failure in
/root/.pip/pip.log

Solution: install it with yum install scipy instead.
2. Installing the Matplotlib package with pip install matplotlib fails with:

Command python setup.py egg_info failed with error code 1 in
/tmp/pip_build_root/matplotlib Storing debug log for failure in
/root/.pip/pip.log

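Solution: install the FreeType and libpng development packages first:
yum install freetype-devel
yum install libpng-devel
Then run pip install matplotlib again.
3. Running a matplotlib program fails with: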

(.:4109): Gdk-CRITICAL **: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
Traceback (most recent call last):
  File "", line 1, in <module>
  File "/usr/lib64/python2.7/site-packages/matplotlib/pyplot.py", line 115, in <module>
    _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
  File "/usr/lib64/python2.7/site-packages/matplotlib/backends/__init__.py", line 32, in pylab_setup
    globals(),locals(),[backend_name],0)
  File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_gtk3agg.py", line 11, in <module>
    from . import backend_gtk3
  File "/usr/lib64/python2.7/site-packages/matplotlib/backends/backend_gtk3.py", line 54, in <module>
    cursors.MOVE : Gdk.Cursor.new(Gdk.CursorType.FLEUR),
TypeError: constructor returned NULL

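Solution:
Set the environment variable MPLBACKEND=Agg for the Python process that runs the script. This tells matplotlib not to try to load GTK. The error occurs because the script is run over ssh, where no display is available.

In PyCharm, for example, the variable can be added to the environment variables of the run configuration.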
4. matplotlib figures cannot be displayed
Solution: save the figure to a local file instead:
plt.savefig("/home/yourname/picFaster.png")
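
Problems 3 and 4 can also be handled from inside the script instead of through an environment variable. The sketch below selects the Agg backend with `matplotlib.use` before pyplot is imported and then saves the figure; the temporary output path is a stand-in chosen here, not the path from the original.

```python
import os
import tempfile

import matplotlib
# Select the non-interactive Agg backend before importing pyplot,
# so no display (and no GTK) is needed.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy

x = numpy.linspace(0, 10, 1000)
plt.figure(figsize=(8, 4))
plt.plot(x, numpy.sin(x) + 1)

# Hypothetical output location; replace with the path you need.
out_path = os.path.join(tempfile.gettempdir(), "picFaster.png")
plt.savefig(out_path)
plt.close()

print(out_path)
```

Setting the backend in code keeps the script self-contained, while the MPLBACKEND variable is handy when the script itself should stay unchanged.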
5. Bad md5 hash for package...
Solution:
pip install --upgrade pip
6. ReadTimeoutError: HTTPSConnectionPool(host='pypi.python.org', port=443): Read timed out.
Solution:
Increase the timeout, for example: pip --default-timeout=100 install -U pip
Alternatively, download the package first and install it from the local file:
pip install tensorflow-0.5.0-cp27-none-linux_x86_64.whl

A different package index (mirror) can also be used:
pip install -i https://pypi.douban.com/simple gensim
7. Upgrading numpy
Download the numpy source code from:
http://sourceforge.net/projects/numpy/files/NumPy/

Then, inside the unpacked source directory, run:
numpy-1.11.2]# python setup.py install
