Spark Machine Learning Algorithms in Python: Configuring and Testing IPython Notebook

_飞奔的蜗牛_

Article link: https://blog.csdn.net/dataningwei/article/details/64583468

First, a note on my environment:

Operating system: Ubuntu 14.04, 64-bit
Spark 2.0.0
Hadoop 2.7.1
Scala 2.11.8
Python 2.7.6
Java 1.7.0

1. Installing IPython Notebook

Installation steps:

1) Install pip

sudo apt-get install python-pip

2) Install IPython

sudo apt-get install ipython

3) Install IPython Notebook

sudo apt-get install ipython-notebook

4) Start IPython Notebook

ipython notebook

2. Other Python environment setup

1) Install matplotlib for plotting

sudo apt-get install python-matplotlib

2) Install NumPy

sudo apt-get install python-numpy

3) Install SciPy and nose (a Python test runner)

sudo apt-get install python-scipy
sudo apt-get install python-nose
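
Before moving on, it can help to confirm that the packages installed above are importable from the Python interpreter you intend to use. The following is a minimal sketch, not part of the original setup; the helper name check_modules is hypothetical, and the list of module names simply mirrors the packages installed in sections 1 and 2.

```python
import importlib


def check_modules(names):
    """Map each module name to its version string, "unknown" if the module
    imports but exposes no __version__, or None if the import fails."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    # Module names assumed from the apt packages installed above.
    wanted = ["IPython", "matplotlib", "numpy", "scipy", "nose"]
    for name, version in sorted(check_modules(wanted).items()):
        status = version if version is not None else "NOT INSTALLED"
        print("%-12s %s" % (name, status))
```

Any module reported as NOT INSTALLED points to an apt-get step that did not complete, or to a different interpreter being on the PATH.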

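3. Launching IPython Notebook from pyspark

Once Spark is configured correctly, running pyspark opens Spark's Python shell directly.
To have it start IPython Notebook instead, use one of the following configurations.

1) Method 1: set the variables for a single invocation, from the Spark installation directory

PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS='notebook' ./bin/pyspark

2) Method 2: edit ~/.bashrc and add the following lines

export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

Omit the PYSPARK_DRIVER_PYTHON_OPTS line to start the plain IPython shell instead of the notebook.

Then run source ~/.bashrc; from then on, starting pyspark launches IPython Notebook.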
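
The same two variables can also be set from a small Python launcher instead of the shell. This is only a sketch under assumptions: the function names pyspark_env and launch_pyspark are hypothetical, and spark_home must be replaced with the directory where Spark is actually installed. The environment variable names are the ones used above.

```python
import os
import subprocess


def pyspark_env(use_notebook=True, base_env=None):
    """Build the environment for launching pyspark with IPython as the driver."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_DRIVER_PYTHON"] = "ipython"
    if use_notebook:
        env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
    else:
        # Without this option pyspark starts the plain IPython shell.
        env.pop("PYSPARK_DRIVER_PYTHON_OPTS", None)
    return env


def launch_pyspark(spark_home, use_notebook=True):
    """Run <spark_home>/bin/pyspark and return its exit code.

    spark_home is an assumed path; it is not given in the article.
    """
    pyspark = os.path.join(spark_home, "bin", "pyspark")
    return subprocess.call([pyspark], env=pyspark_env(use_notebook))
```

Calling launch_pyspark with use_notebook=False corresponds to removing the PYSPARK_DRIVER_PYTHON_OPTS setting.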

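4. Testing the environment

The MovieLens 100k dataset

Download the test data from http://files.grouplens.org/datasets/movielens/ml-100k.zip, which gives you ml-100k.zip.
Unzip it:

unzip ml-100k.zip

The README describes what each column in each file means.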
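
According to the README, u.user holds one user per line with the fields user id, age, gender, occupation and zip code separated by "|". A minimal sketch of how such records could be parsed follows. The function names and the default path are assumptions rather than the author's code, and the sample lines are illustrative records in the u.user layout so that the parser can be tried without Spark.

```python
def parse_user_line(line):
    """Split one u.user record into: user id, age, gender, occupation, zip code."""
    return line.strip().split("|")


def load_user_fields(sc, path="ml-100k/u.user"):
    """Return an RDD of split u.user records.

    sc is an existing SparkContext (the pyspark shell provides one as `sc`).
    The path is an assumption; with Hadoop configured, a bare path may resolve
    against HDFS, in which case a file:// prefix may be needed.
    """
    return sc.textFile(path).map(parse_user_line)


# Illustrative records in the u.user layout.
sample_lines = ["1|24|M|technician|85711", "2|53|F|other|94043"]
sample_fields = [parse_user_line(line) for line in sample_lines]
sample_ages = [int(fields[1]) for fields in sample_fields]  # age is field 1
print(sample_ages)
```

In the pyspark notebook, user_fields = load_user_fields(sc) would produce an RDD on which per-field maps such as int(x[1]) can then be run.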

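Next, use matplotlib's hist function to draw a histogram of the distribution of user ages:

import matplotlib.pyplot as plt
# user_fields is not defined in the text above; it is assumed to be an RDD of
# u.user records split on '|', in which field 1 is the age.
ages = user_fields.map(lambda x: int(x[1])).collect()
# normed=True suits the matplotlib of this setup; newer releases use density=True instead.
plt.hist(ages, bins=20, color='lightblue', normed=True)
fig = plt.gcf()  # get the current figure so it can be resized
fig.set_size_inches(16, 10)
plt.show()  # display the figure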
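
The normed=True argument means the bar heights are densities rather than raw counts: each height is scaled so that the total area of the bars equals 1. The pure-Python sketch below illustrates that scaling for equal-width bins. It is an approximation for explanation only, not matplotlib's implementation, and the function name normalized_histogram is hypothetical.

```python
def normalized_histogram(values, bins=20):
    """Return (edges, densities) for equal-width bins over [min, max].

    Densities are scaled so that sum(density * bin_width) equals 1.
    """
    lo, hi = float(min(values)), float(max(values))
    if hi == lo:
        hi = lo + 1.0  # simplification of this sketch: avoid zero-width bins
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin, not a bin past the end.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    total = float(len(values))
    densities = [c / (total * width) for c in counts]
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, densities


edges, densities = normalized_histogram([20, 20, 30, 40], bins=2)
area = sum(d * (edges[i + 1] - edges[i]) for i, d in enumerate(densities))
print(edges, densities, area)
```

Because the heights are densities, their values depend on the bin width, so they should not be read as the fraction of users in each bin.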
