1. Download
https://github.com/scikit-learn/scikit-learn
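Official site: http://scikit-learn.org/stable/
2. Installation
According to the official documentation, scikit-learn requires numpy and scipy. I first tried running the following directly in the source directory: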
sudo python setup.py install
The install did not give me a working package. Importing sklearn produced the following error:
>>> import sklearn
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sklearn/__init__.py", line 37, in <module>
    from . import __check_build
  File "sklearn/__check_build/__init__.py", line 46, in <module>
    raise_build_error(e)
  File "sklearn/__check_build/__init__.py", line 41, in raise_build_error
    %s""" % (e, local_dir, ''.join(dir_content).strip(), msg))
ImportError: No module named _check_build
___________________________________________________________________________
Contents of sklearn/__check_build:
__init__.py __init__.pyc _check_build.c
_check_build.pyx setup.py setup.pyc
___________________________________________________________________________
It seems that scikit-learn has not been built correctly.
If you have installed scikit-learn from source, please do not forget
to build the package before using it: run `python setup.py install` or
`make` in the source directory.
If you have used an installer, please check that it is suited for your
Python version, your operating system and your platform.
While trying to reinstall numpy and scipy, I discovered that the Mac system already ships with quite a few Python libraries, listed below:
CoreGraphics/
OpenSSL/
PyObjC/
Twisted-12.2.0-py2.7.egg-info/
altgraph/
altgraph-0.10.1-py2.7.egg-info/
bdist_mpkg/
bdist_mpkg-0.4.4-py2.7.egg-info/
bonjour/
dateutil/
macholib/
macholib-1.5-py2.7.egg-info/
matplotlib/
modulegraph/
modulegraph-0.10.1-py2.7.egg-info/
mpl_toolkits/
numpy/
py2app/
py2app-0.7.1-py2.7.egg-info/
python_dateutil-1.5-py2.7.egg-info/
pytz/
pytz-2012d-py2.7.egg-info/
scipy/
setuptools/
setuptools-0.6c12dev_r88846-py2.7.egg-info/
twisted/
xattr/
xattr-0.6.4-py2.7.egg-info/
zope/
zope.interface-3.8.0-py2.7.egg-info/
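After that I tried several other approaches, including pip and easy_install, and each one reported errors. In the end I deleted the original files under site-packages and reinstalled, and that worked. (The initial failure may have been because I had not restarted the terminal and started a fresh python session.)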
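One more possible cause is worth checking. The relative paths in the traceback above (sklearn/__init__.py) suggest that Python may have picked up the unbuilt source tree in the current directory instead of an installed copy. The sketch below is one way to check where each package would be imported from. It assumes Python 3 (the session in this post used Python 2.7), and the helper names `locate` and `is_shadowed_by_cwd` are made up for illustration. The "shadowed" check is only a heuristic: it would also fire for a virtualenv that lives under the current directory.

```python
import importlib.util
import os


def locate(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    return spec.origin


def is_shadowed_by_cwd(path, cwd=None):
    """Heuristic: True if the module file lives under the current working directory."""
    if path is None:
        return False
    cwd = os.path.realpath(cwd or os.getcwd())
    return os.path.realpath(path).startswith(cwd + os.sep)


# Report where each dependency would be loaded from, without importing it.
for name in ("numpy", "scipy", "sklearn"):
    path = locate(name)
    if path is None:
        print("%s: not found" % name)
    elif is_shadowed_by_cwd(path):
        print("%s: %s (inside the current directory, possibly an unbuilt source tree)" % (name, path))
    else:
        print("%s: %s" % (name, path))
```

If sklearn is reported as living inside the current directory, changing to another directory (for example the home directory) before starting python is a quick way to see whether the installed copy imports cleanly.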
3. Testing and first steps
➜ ~ python
Python 2.7.5 (default, Sep 12 2013, 21:33:34)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sklearn
>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> digits = datasets.load_digits()
>>> print(digits.data)
[[ 0. 0. 5. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 10. 0. 0.]
[ 0. 0. 0. ..., 16. 9. 0.]
...,
[ 0. 0. 1. ..., 6. 0. 0.]
[ 0. 0. 2. ..., 12. 0. 0.]
[ 0. 0. 10. ..., 12. 1. 0.]]
>>>
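As a small next step beyond printing the raw data, the sketch below looks at the shape of the digits dataset and fits a classifier on it. It follows the general pattern of the introductory scikit-learn tutorial, but the choice of `SVC` with `gamma=0.001` and the simple half/half split are just assumptions made for illustration, not tuned settings.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()

# Each sample is an 8x8 grayscale image, flattened into 64 features.
print(digits.data.shape)    # (number of samples, 64)
print(digits.images.shape)  # (number of samples, 8, 8)
print(digits.target[:10])   # the digit labels of the first ten samples

# Train on the first half of the samples and evaluate on the second half.
half = len(digits.data) // 2
clf = svm.SVC(gamma=0.001)
clf.fit(digits.data[:half], digits.target[:half])

accuracy = clf.score(digits.data[half:], digits.target[half:])
print("accuracy on the held-out half: %.3f" % accuracy)
```

A shuffled split would usually be a better way to evaluate; slicing by index simply keeps the example short.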
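4. Next steps
I plan to work through the examples that come with scikit-learn and write follow-up summaries of the commonly used machine learning algorithms. The examples are good learning material.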
http://scikit-learn.org/stable/auto_examples/feature_selection_pipeline.html