NumPy and SciPy
Matrix and vector processing.
NumPy provides a high-performance multidimensional array and basic tools to compute with and manipulate these arrays.
SciPy builds on this, and provides a large number of functions that operate on NumPy arrays and are useful for different types of scientific and engineering applications.
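As a minimal sketch of how the two fit together, the snippet below builds a small array with NumPy and hands it to a SciPy routine. The matrix and vector values are made up for illustration.

```python
import numpy as np
from scipy import linalg

# A 2x2 matrix and a right-hand-side vector (illustrative values).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# NumPy: matrix-vector product and elementwise arithmetic on arrays.
Ab = A @ b
scaled = A * 2.0

# SciPy: solve the linear system A x = b, operating directly on NumPy arrays.
x = linalg.solve(A, b)

print(Ab)
print(x)
```

scipy.linalg.solve takes and returns plain NumPy arrays, so no conversion step is needed between the two libraries.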
Reference:
http://cs231n.github.io/python-numpy-tutorial/ (covers Python basics, numpy, scipy, and matplotlib)
Scikit-learn
Data modeling and analysis.
scikit-learn is a Python module for machine learning built on top of SciPy and distributed under the 3-Clause BSD license.
Updating scikit-learn with conda:
conda update scikit-learn
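The documentation is quite detailed. The official home page lists many machine learning topics, and the user guide lists everything the package includes.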
Install:
pip install -U scikit-learn (numpy and scipy must be installed beforehand)
After installing this way, running
from sklearn.ensemble import RandomForestClassifier
may raise the error ImportError: cannot import name check_arrays.
Fix:
conda update scikit-learn
The sklearn model_selection module includes grid search functionality (GridSearchCV).
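A rough sketch of grid search with sklearn.model_selection.GridSearchCV follows. The dataset (iris), the estimator, and the parameter grid are illustrative choices and do not come from these notes.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Small built-in dataset, used here only for illustration.
X, y = load_iris(return_X_y=True)

# Candidate hyperparameter values to try (assumed values, 2 x 2 = 4 combinations).
param_grid = {
    "n_estimators": [10, 50],
    "max_depth": [2, None],
}

# Evaluate every combination with 3-fold cross-validation.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_   # dict with the best combination found
best_score = search.best_score_     # mean cross-validated accuracy of that combination
print(best_params, best_score)
```

After fitting, search.best_estimator_ holds a model refit on the full data with the best parameters, since refit=True is the default.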
Pandas
Data reading and writing.
pandas is a powerful Python data analysis toolkit.
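A small sketch of reading and writing tabular data with pandas. The CSV content and the column names are made up for the example, and an in-memory buffer stands in for a file on disk.

```python
import io

import pandas as pd

# Illustrative CSV text; in practice this would usually be a file path.
csv_text = "name,score\nalice,90\nbob,85\n"

# Read the CSV into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Add a derived column (the threshold of 88 is arbitrary).
df["passed"] = df["score"] >= 88

# Write the DataFrame back out as CSV, without the row index.
out = io.StringIO()
df.to_csv(out, index=False)
csv_out = out.getvalue()
print(csv_out)
```

pd.read_csv and DataFrame.to_csv accept file paths as well as file-like objects, so the same calls work with real files.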
Gensim
Gensim is a specialized Python toolkit for topic modeling.
Gensim is an open-source vector space modeling and topic modeling toolkit, implemented in the Python programming language. It uses NumPy, SciPy and optionally Cython for performance. It is specifically intended for handling large text collections, using efficient online, incremental algorithms. Gensim is commercially supported by the startup RaRe Technologies.
Gensim includes implementations of tf-idf, random projections, word2vec and doc2vec algorithms, hierarchical Dirichlet processes (HDP), latent semantic analysis (LSA) and latent Dirichlet allocation (LDA), including distributed parallel versions.
Install: pip install gensim