Rambling on Data Visualization (Part 3)

Notice: All rights reserved. To repost, please contact the author and credit the source: http://blog.csdn.net/u013719780?viewmode=contents


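About the author: 风雪夜归子 (Allen) is a machine learning algorithm engineer who enjoys digging into Machine Learning techniques, is keenly interested in Deep Learning and Artificial Intelligence, and regularly follows the Kaggle data mining competition platform. Readers interested in data, Machine Learning, and Artificial Intelligence are welcome to join the discussion. Personal CSDN blog: http://blog.csdn.net/u013719780?viewmode=contents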



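Data visualization helps us understand data, and it also plays an important role in the feature engineering stage of machine learning projects, so it is a tool well worth mastering. This series of posts offers a brief exploration of data visualization. This post uses Python's seaborn library to visualize data.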


In [1]:
%matplotlib inline

# standard
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# I've got style,
# miles and miles
import seaborn as sns
sns.set()
sns.set_context('notebook', font_scale=1.5)
cp = sns.color_palette()


Thing 1: Line Chart (with many lines)


In [2]:
ts = pd.read_csv('data/ts.csv')

# casting to datetime is important for
# ensuring plots "just work"
ts = ts.assign(dt = pd.to_datetime(ts.dt))
ts.head()
Out[2]:
          dt kind     value
0 2000-01-01    A  1.442521
1 2000-01-02    A  1.981290
2 2000-01-03    A  1.586494
3 2000-01-04    A  1.378969
4 2000-01-05    A -0.277937
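`data/ts.csv` is not included in the text of this post. For readers who want to run the cells that follow, here is a minimal sketch that builds a stand-in frame with the same three columns (`dt`, `kind`, `value`). The helper name `make_ts`, the random-walk generator, and the seed are all assumptions made for illustration; the values will not match the output shown above.

```python
import numpy as np
import pandas as pd


def make_ts(n_days=100, kinds=("A", "B", "C", "D"), seed=0):
    """Hypothetical stand-in for data/ts.csv: one random walk per kind."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2000-01-01", periods=n_days, freq="D")
    frames = [
        pd.DataFrame({
            "dt": dates,
            "kind": kind,
            # cumulative sum of Gaussian noise gives a random walk
            "value": rng.normal(size=n_days).cumsum(),
        })
        for kind in kinds
    ]
    # stack the per-kind frames into one tidy (long) frame
    return pd.concat(frames, ignore_index=True)


ts_demo = make_ts()
```

Assigning `ts = make_ts()` in place of the `pd.read_csv` call should let the remaining time-series cells run unchanged.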
In [3]:
# in matplotlib-land, the notion of a "tidy"
# dataframe matters not
dfp = ts.pivot(index='dt', columns='kind', values='value')
dfp.head()
Out[3]:
kind               A         B         C         D
dt
2000-01-01  1.442521  1.808741  0.437415  0.096980
2000-01-02  1.981290  2.277020  0.706127 -1.523108
2000-01-03  1.586494  3.474392  1.358063 -3.100735
2000-01-04  1.378969  2.906132  0.262223 -2.660599
2000-01-05 -0.277937  3.489553  0.796743 -3.417402
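The wide frame `dfp` is not used again in the cells that follow, but it is the layout that pandas' own matplotlib-backed plotting expects: one column per line. As a hedged illustration, the sketch below pivots a tiny made-up frame (the values are invented, not taken from `data/ts.csv`) and plots it directly, without seaborn.

```python
import matplotlib.pyplot as plt
import pandas as pd

# tiny made-up tidy frame with the same columns as `ts`
tidy = pd.DataFrame({
    "dt": pd.to_datetime(["2000-01-01", "2000-01-02"] * 2),
    "kind": ["A", "A", "B", "B"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# long -> wide: one row per date, one column per kind
wide = tidy.pivot(index="dt", columns="kind", values="value")

fig, ax = plt.subplots(figsize=(8, 5))
# DataFrame.plot draws one line per column and labels it in the legend
wide.plot(ax=ax)
ax.set(xlabel="Date", ylabel="Value", title="Wide-format plot")
fig.autofmt_xdate()
```

The seaborn cells go the other way: they keep the tidy frame and let `hue='kind'` split it into lines.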
In [4]:
# `height` is the argument name since seaborn 0.9 (older releases called it `size`)
g = sns.FacetGrid(ts, hue='kind', height=5, aspect=1.5)
g.map(plt.plot, 'dt', 'value').add_legend()
g.ax.set(xlabel='Date',
         ylabel='Value',
         title='Random Timeseries')
g.fig.autofmt_xdate()

In [5]:
g = sns.FacetGrid(ts, row='kind', hue='kind', height=5, aspect=1.5)
g.map(plt.plot, 'dt', 'value').add_legend()

g.fig.autofmt_xdate()


Thing 2: Scatter


In [6]:
df = pd.read_csv('data/iris.csv')
df.head()
Out[6]:
   petalLength  petalWidth  sepalLength  sepalWidth species
0          1.4         0.2          5.1         3.5  setosa
1          1.4         0.2          4.9         3.0  setosa
2          1.3         0.2          4.7         3.2  setosa
3          1.5         0.2          4.6         3.1  setosa
4          1.4         0.2          5.0         3.6  setosa
In [7]:
g = sns.FacetGrid(df, hue='species', height=7.5)
g.map(plt.scatter, 'petalLength', 'petalWidth').add_legend()
g.ax.set_title('Petal Width v. Length -- by Species')
Out[7]:
<matplotlib.text.Text at 0x1186b99d0>


Thing 3: Trellising the Above


In [8]:
g = sns.FacetGrid(ts, hue='kind',
                  col='kind', col_wrap=2, height=5)

g.map(plt.plot, 'dt', 'value')
g.fig.autofmt_xdate()
g.fig.suptitle('Random Timeseries', y=1.01)
Out[8]:
<matplotlib.text.Text at 0x11819ead0>

In [9]:
g = sns.FacetGrid(df, col='species', hue='species', height=5)
g.map(plt.scatter, 'petalLength', 'petalWidth')
Out[9]:
<seaborn.axisgrid.FacetGrid at 0x1187474d0>

In [10]:
# integer division keeps tmp_n an int, so it can be used to repeat a list
tmp_n = df.shape[0] - df.shape[0] // 2

df['random_factor'] = np.random.permutation(['A'] * tmp_n + ['B'] * (df.shape[0] - tmp_n))
df.head()
Out[10]:
   petalLength  petalWidth  sepalLength  sepalWidth species random_factor
0          1.4         0.2          5.1         3.5  setosa             A
1          1.4         0.2          4.9         3.0  setosa             A
2          1.3         0.2          4.7         3.2  setosa             B
3          1.5         0.2          4.6         3.1  setosa             A
4          1.4         0.2          5.0         3.6  setosa             B
In [11]:
g = sns.FacetGrid(df.assign(tmp=df.species + df.random_factor)
                    .sort_values(['species', 'random_factor']),
                  col='species', row='random_factor', hue='tmp', height=5)
g.map(plt.scatter, 'petalLength', 'petalWidth')
Out[11]:
<seaborn.axisgrid.FacetGrid at 0x117ad90d0>


Thing 4: Visualizing Distributions (Boxplot and Histogram)


In [12]:
fig, ax = plt.subplots(1, 1, figsize=(10, 10))

# recent seaborn releases require x and y to be passed by keyword
g = sns.boxplot(x='species', y='petalWidth', data=df, ax=ax)
g.set(title='Distribution of Petal Width by Species')
Out[12]:
[<matplotlib.text.Text at 0x11969c510>]

In [13]:
g = sns.FacetGrid(df, hue='species', height=7.5)

# note: sns.distplot is deprecated as of seaborn 0.11
g.map(sns.distplot, 'petalWidth', bins=10,
      kde=False, rug=True).add_legend()

g.set(xlabel='Petal Width',
      ylabel='Frequency',
      title='Distribution of Petal Width by Species')
Out[13]:
<seaborn.axisgrid.FacetGrid at 0x11819e710>
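Because `FacetGrid.map` calls the plotting function once per `hue` level, `bins=10` is computed separately for each species, so the bars of different species generally do not share bin edges. One way to make the histograms directly comparable, sketched below with made-up petal widths (not the real iris values), is to compute one set of edges from the pooled data and bin every group against it.

```python
import numpy as np

# made-up petal widths for two groups, for illustration only
widths = {
    "setosa": np.array([0.1, 0.2, 0.2, 0.3, 0.4]),
    "virginica": np.array([1.5, 1.8, 2.0, 2.1, 2.5]),
}

# one set of 10 equal-width bins spanning the pooled data
pooled = np.concatenate(list(widths.values()))
edges = np.histogram_bin_edges(pooled, bins=10)

# count each group against the same edges
counts = {name: np.histogram(vals, bins=edges)[0]
          for name, vals in widths.items()}
```

Passing `bins=edges` instead of `bins=10` to the `g.map(...)` call should then give every species the same bin edges.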


Thing 5: Bar Chart


In [14]:
df = pd.read_csv('data/titanic.csv')
df.head()
Out[14]:
   survived  pclass     sex   age  sibsp  parch     fare  embarked  class    who  adult_male  deck  embark_town  alive  alone
0         0       3    male  22.0      1      0   7.2500         S  Third    man        True   NaN  Southampton     no  False
1         1       1  female  38.0      1      0  71.2833         C  First  woman       False     C    Cherbourg    yes  False
2         1       3  female  26.0      0      0   7.9250         S  Third  woman       False   NaN  Southampton    yes   True
3         1       1  female  35.0      1      0  53.1000         S  First  woman       False     C  Southampton    yes  False
4         0       3    male  35.0      0      0   8.0500         S  Third    man        True   NaN  Southampton     no   True
In [15]:
dfg = df.groupby(['survived', 'pclass']).agg({'fare': 'mean'})
dfg
Out[15]:
                      fare
survived pclass
0        1       64.684008
         2       19.412328
         3       13.669364
1        1       95.608029
         2       22.055700
         3       13.694887
In [16]:
died = dfg.loc[0, :]
print(died)

survived = dfg.loc[1, :]
print(survived)
             fare
pclass           
1       64.684008
2       19.412328
3       13.669364
             fare
pclass           
1       95.608029
2       22.055700
3       13.694887
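These per-group means are the numbers that the bar chart in the next cell draws, since seaborn's bar plots aggregate with the mean by default. A small sketch on a made-up frame (the fares below are invented, not taken from `data/titanic.csv`) shows the same aggregation reshaped into a class-by-survival table, which can be easier to read than slicing one index level at a time.

```python
import pandas as pd

# made-up passengers, for illustration only
toy = pd.DataFrame({
    "survived": [0, 0, 1, 1, 0, 1],
    "pclass":   [1, 1, 1, 1, 3, 3],
    "fare":     [60.0, 70.0, 90.0, 100.0, 8.0, 14.0],
})

# mean fare for every (survived, pclass) pair
means = toy.groupby(["survived", "pclass"])["fare"].mean()

# move `survived` from the row index into the columns:
# rows are classes, columns are survival outcomes
table = means.unstack("survived")
```

Each cell of `table` corresponds to one bar: the row picks the class on the x axis and the column picks the hue.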
In [17]:
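# sns.catplot is the current name for factorplot (renamed in seaborn 0.9,
# which also renamed the `size` argument to `height`)
g = sns.catplot(x='class', y='fare', hue='survived',
                data=df, kind='bar',
                order=['First', 'Second', 'Third'],
                height=7.5, aspect=1.5, ci=None)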
g.ax.set_title('Fare by survival and class')
Out[17]:
<matplotlib.text.Text at 0x11a987b10>

In [18]:
g = sns.catplot(x='class', y='fare', hue='survived',
                data=df, kind='bar',
                order=['First', 'Second', 'Third'],
                height=7.5, aspect=1.5)
g.ax.set_title('Fare by survival and class')
Out[18]:
<matplotlib.text.Text at 0x11abfaa50>
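With `ci=None` dropped, seaborn draws its default error bars on each bar: a 95% confidence interval for the mean, estimated by bootstrapping. The sketch below is a simplified percentile bootstrap written in NumPy to show the idea; it is not seaborn's actual implementation, and the helper name `bootstrap_ci` and its defaults are assumptions. The sample values are the five fares from the `df.head()` output above.

```python
import numpy as np


def bootstrap_ci(values, n_boot=1000, ci=95, seed=0):
    """Hypothetical helper: percentile bootstrap interval for the mean."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # draw n_boot resamples (with replacement), each the size of the data
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    # keep the central `ci` percent of the resampled means
    half = (100 - ci) / 2
    low, high = np.percentile(boot_means, [half, 100 - half])
    return low, high


fares = np.array([7.25, 71.2833, 7.925, 53.1, 8.05])
low, high = bootstrap_ci(fares)
```

The interval is wide here because the sample is tiny and skewed; on the full dataset each bar gets its own interval from the rows in that class and survival group.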

