Notice: All rights reserved. To repost, please contact the author and cite the source: http://blog.csdn.net/u013719780?viewmode=contents
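About the author: 风雪夜归子 (Allen) is a machine learning engineer who enjoys digging into Machine Learning techniques, is keenly interested in Deep Learning and Artificial Intelligence, and regularly follows the Kaggle data mining competition platform. Readers interested in data, Machine Learning, and Artificial Intelligence are welcome to join the discussion. Personal CSDN blog: http://blog.csdn.net/u013719780?viewmode=contents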
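Data visualization helps in understanding data, and it also plays an important role in the feature engineering stage of a machine learning project, so it is a tool well worth mastering. This series of posts takes a brief look at data visualization. This post uses Python's Altair library to visualize data.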
In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
sns.set()
sns.set_context('notebook', font_scale=1.5)
cp = sns.color_palette()
In [3]:
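# Note: the cells below use the Altair 1.x API (e.g. configure_cell, LayeredChart),
# which was removed in Altair 2.0 and later.
from altair import *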
In [55]:
ts = pd.read_csv('data/ts.csv')
ts = ts.assign(dt = pd.to_datetime(ts.dt))
ts.head()
Out[55]:
In [56]:
dfp = ts.pivot(index='dt', columns='kind', values='value')
dfp.head()
Out[56]:
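To make the reshaping step concrete, here is a minimal sketch of what `pivot` does to a long-format table. The rows below are made-up values, not the contents of `data/ts.csv`; only the column names `dt`, `kind`, and `value` come from the code above.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, series) pair.
long_df = pd.DataFrame({
    'dt': pd.to_datetime(['2017-01-01', '2017-01-01',
                          '2017-01-02', '2017-01-02']),
    'kind': ['A', 'B', 'A', 'B'],
    'value': [1.0, 10.0, 2.0, 20.0],
})

# pivot() turns each distinct `kind` into its own column, indexed by `dt`.
wide_df = long_df.pivot(index='dt', columns='kind', values='value')
print(wide_df)
```

The Altair charts in this post take the long-format `ts` directly; the wide `dfp` is convenient when each series should be its own column.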
In [6]:
c = Chart(ts).mark_line().encode(
x='dt',
y='value',
color='kind'
)
c
In [57]:
c = Chart(ts).mark_line().encode(
x='dt',
y='value',
color=Color('kind', scale=Scale(range=cp.as_hex()))
)
c
In [7]:
df = pd.read_csv('data/iris.csv')
df.head()
Out[7]:
In [8]:
c = Chart(df).mark_point(filled=True).encode(
x='petalLength',
y='petalWidth',
color='species'
)
c
In [9]:
c = Chart(ts).mark_line().encode(
x='dt',
y='value',
color='kind',
column='kind'
)
c.configure_cell(height=200, width=200)
In [10]:
c = Chart(df).mark_point().encode(
x='petalLength',
y='petalWidth',
color='species',
column=Column('species',
title='Petal Width v. Length by Species')
)
c.configure_cell(height=300, width=300)
In [11]:
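# Randomly assign each row to one of two (nearly) equal groups, 'A' and 'B'.
# Integer division (//) keeps tmp_n an int, which list repetition requires in Python 3.
tmp_n = df.shape[0] - df.shape[0] // 2
labels = ['A'] * tmp_n + ['B'] * (df.shape[0] - tmp_n)
df['random_factor'] = np.random.permutation(labels)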
df.head()
Out[11]:
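The random assignment above changes on every run. As a rough sketch, the same split can be made reproducible by drawing from a seeded generator. The row count of 7 and the seed of 0 are arbitrary assumptions for illustration, not values from the post.

```python
import numpy as np

n_rows = 7                    # hypothetical row count, odd on purpose
n_a = n_rows - n_rows // 2    # the larger half goes to 'A'
n_b = n_rows - n_a            # the remainder goes to 'B'

# Seeded generator (assumption): the same shuffle is produced on every run.
rng = np.random.RandomState(0)
labels = rng.permutation(['A'] * n_a + ['B'] * n_b)

# Group sizes are fixed by construction; only the order is random.
counts = {label: int((labels == label).sum()) for label in ('A', 'B')}
print(counts)
```

Because the labels are built first and then shuffled, the group sizes differ by at most one whatever the shuffle produces.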
In [12]:
c = Chart(df).mark_point().encode(
x='petalLength',
y='petalWidth',
color='species',
column=Column('species',
title='Petal Width v. Length by Species'),
row='random_factor'
)
c.configure_cell(height=200, width=200)
In [49]:
# please note: this code is super speculative -- I'm
# assuming there's a better way to do this and I just
# don't know it
c = Chart(df).mark_point(opacity=.5).encode(
x='species',
y='petalWidth'
)
c25 = Chart(df).mark_tick(tickThickness=3.0,
tickSize=20.0,
color='red').encode(
x='species',
y='q1(petalWidth)'
)
c50 = Chart(df).mark_tick(tickThickness=3.0,
tickSize=20.0,
color='red').encode(
x='species',
y='median(petalWidth)'
)
c75 = Chart(df).mark_tick(tickThickness=3.0,
tickSize=20.0,
color='red').encode(
x='species',
y='q3(petalWidth)'
)
LayeredChart(data=df, layers=[c, c25, c50, c75])
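One way to sanity-check where the quartile ticks should land is to compute the same statistics in pandas. This is only a sketch on made-up numbers: the `toy` frame and its values are hypothetical, and the charting backend may interpolate quantiles differently from pandas, so small discrepancies are possible.

```python
import pandas as pd

# Hypothetical measurements for two species (not the real iris data).
toy = pd.DataFrame({
    'species': ['setosa'] * 5 + ['virginica'] * 5,
    'petalWidth': [0.1, 0.2, 0.2, 0.3, 0.4,
                   1.8, 1.9, 2.0, 2.1, 2.3],
})

# Per-species quartiles; unstack() spreads the three quantiles into columns.
quartiles = (toy.groupby('species')['petalWidth']
                .quantile([0.25, 0.5, 0.75])
                .unstack())
quartiles.columns = ['q1', 'median', 'q3']
print(quartiles)
```

Running the same `groupby` on the real `df` gives one row per species to compare against the layered chart.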
In [50]:
c = Chart(df).mark_bar(opacity=.75).encode(
x=X('petalWidth', bin=Bin(maxbins=30)),
y='count(*)',
color=Color('species', scale=Scale(range=cp.as_hex()))
)
c
In [51]:
df = pd.read_csv('data/titanic.csv')
df.head()
Out[51]:
In [52]:
dfg = df.groupby(['survived', 'pclass']).agg({'fare': 'mean'})
dfg
Out[52]:
In [53]:
died = dfg.loc[0, :]
survived = dfg.loc[1, :]
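Grouping by two columns produces a two-level row index, which is why the first level can be used to split the result. As an illustration, one explicit way to make that selection is `DataFrame.xs`. The passenger rows below are invented for the example; only the column names `survived`, `pclass`, and `fare` come from the code above.

```python
import pandas as pd

# Hypothetical passengers (not the real Titanic data).
toy = pd.DataFrame({
    'survived': [0, 0, 1, 1, 1],
    'pclass':   [1, 3, 1, 1, 3],
    'fare':     [50.0, 8.0, 80.0, 100.0, 10.0],
})

# Mean fare per (survived, pclass) pair; the row index has two levels.
toy_g = toy.groupby(['survived', 'pclass']).agg({'fare': 'mean'})

# xs() selects one value of the 'survived' level and drops that level,
# leaving a frame indexed by pclass only.
toy_died = toy_g.xs(0, level='survived')
toy_survived = toy_g.xs(1, level='survived')
print(toy_died)
print(toy_survived)
```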
In [54]:
c = Chart(df).mark_bar().encode(
x='survived:N',
y='mean(fare)',
color='survived:N',
column='class')
c.configure(
facet=FacetConfig(cell=CellConfig(strokeWidth=0, height=250))
)