Data Processing with Seaborn

Latest recommended article published on 2024-10-03 09:00:56

2401_83835232

Tags: python

Article link: https://blog.csdn.net/2401_83835232/article/details/141474659

I. Theme styles

darkgrid

whitegrid

dark

white

ticks
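The five names above are the built-in values accepted by sns.set_style(). The following is a minimal sketch, assuming seaborn and matplotlib are installed, that inspects what each theme changes (grid on or off, background color) and then applies one of them. The variable names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import seaborn as sns

styles = ["darkgrid", "whitegrid", "dark", "white", "ticks"]

# axes_style() returns the rc parameters a theme would set, without applying them
summary = {}
for name in styles:
    params = sns.axes_style(name)
    summary[name] = (params["axes.grid"], params["axes.facecolor"])
    print(name, summary[name])

# Apply one theme globally for all figures created afterwards
sns.set_style("whitegrid")
```

The two grid themes turn the grid on; dark, white and ticks leave it off.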

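II. Detail settings

sns.despine() removes the top and right spines

sns.despine(offset=10) moves the remaining spines 10 points away from the plotted data

sns.despine(left=True) also hides the left spine

A style set through a with block, for example with sns.axes_style("darkgrid"):, applies only to axes created inside that block; axes created outside it keep the surrounding style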
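The scoping rule for with blocks and the effect of despine() can be checked on a small figure. This is a sketch under the assumption that no global seaborn theme has been set beforehand; the axes names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

x = np.linspace(0, 14, 100)
fig = plt.figure(figsize=(8, 3))

# Axes created inside the with block use the darkgrid theme
with sns.axes_style("darkgrid"):
    ax_inside = fig.add_subplot(1, 2, 1)
    ax_inside.plot(x, np.sin(x))

# Axes created outside keep whatever style was active before the block
ax_outside = fig.add_subplot(1, 2, 2)
ax_outside.plot(x, np.cos(x))

# Hide the top and right spines of the second axes and move the others 10 pt outward
sns.despine(ax=ax_outside, offset=10)
```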

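sns.set_context("paper")

sns.set_context("notebook")

sns.set_context("talk")

sns.set_context("poster") in this order (paper, notebook, talk, poster) the plot elements get progressively larger and the lines thicker

sns.set_context("notebook",font_scale=1.5) scales the size of tick labels and other text

sns.set_context("notebook",font_scale=1.5,rc={"lines.linewidth":1.5}) additionally sets the line width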
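To see how the contexts scale, sns.plotting_context() returns the parameters a context would set without applying them. A small sketch, assuming a recent seaborn release:

```python
import seaborn as sns

order = ["paper", "notebook", "talk", "poster"]

# Font size and line width that each context would apply
font_sizes = [sns.plotting_context(name)["font.size"] for name in order]
line_widths = [sns.plotting_context(name)["lines.linewidth"] for name in order]
print(font_sizes)
print(line_widths)

# font_scale enlarges text only; rc overrides individual parameters
base = sns.plotting_context("notebook")
custom = sns.plotting_context("notebook", font_scale=1.5,
                              rc={"lines.linewidth": 1.5})
print(base["axes.labelsize"], custom["axes.labelsize"], custom["lines.linewidth"])
```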

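III. Color palettes

1. Discrete palettes

(1) color_palette() returns the current default palette: 10 colors in seaborn 0.9 and later, 6 in earlier versions

(2) sns.palplot(sns.color_palette("hls",8)) shows 8 evenly spaced hues, one per category

sns.boxplot(data=data,palette=sns.color_palette("hls",8)) where data is, for example, an array with 8 columns

l: lightness

s: saturation

sns.palplot(sns.hls_palette(8,l=.3,s=.8))

(3) sns.palplot(sns.color_palette("Paired",8)) 8 colors in 4 pairs; the two colors in each pair are similar

(4) Using xkcd color names

plt.plot([0,1],[0,1],sns.xkcd_rgb["pale red"],lw=3) lw is the line width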
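The discrete palettes above are plain lists of RGB tuples, so they can be inspected without drawing anything. A minimal sketch (variable names are illustrative):

```python
import seaborn as sns

default_palette = sns.color_palette()         # current default palette
hls8 = sns.color_palette("hls", 8)            # 8 evenly spaced hues
dark_hls8 = sns.hls_palette(8, l=0.3, s=0.8)  # same hues, lower lightness, higher saturation
paired8 = sns.color_palette("Paired", 8)      # 4 pairs of similar colors
pale_red = sns.xkcd_rgb["pale red"]           # xkcd names map to hex strings

print(len(default_palette), len(hls8), len(paired8))
print(hls8[0], dark_hls8[0])
print(pale_red)
```

Any of these lists can be passed as the palette argument of a seaborn plotting function.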

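2. Sequential palettes

(1) sns.palplot(sns.color_palette("Blues")) from light to dark

sns.palplot(sns.color_palette("Blues_r")) the _r suffix reverses the gradient, from dark to light

(2) cubehelix_palette()

sns.palplot(sns.color_palette("cubehelix",8)) brightness changes linearly while the hue varies

sns.palplot(sns.cubehelix_palette(8,start=.5,rot=-.75)) start and rot select a different range of hues

(3) light_palette() and dark_palette() build custom sequential palettes

sns.palplot(sns.light_palette("green")) from light to the named color

sns.palplot(sns.dark_palette("purple")) from dark to the named color

sns.palplot(sns.light_palette("green",reverse=True)) reversed, from the named color to light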
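The direction of each sequential palette can be verified numerically. The sketch below uses the sum of the RGB channels as a rough brightness measure; the helper brightness() is hypothetical and not part of seaborn.

```python
import seaborn as sns

def brightness(rgb):
    """Rough brightness: sum of the red, green and blue channels."""
    return sum(rgb[:3])

blues = sns.color_palette("Blues")        # light -> dark
blues_r = sns.color_palette("Blues_r")    # dark -> light
helix = sns.cubehelix_palette(8, start=0.5, rot=-0.75)
light_green = sns.light_palette("green")  # light -> green
dark_purple = sns.dark_palette("purple")  # dark -> purple
light_green_rev = sns.light_palette("green", reverse=True)  # green -> light

for name, pal in [("Blues", blues), ("Blues_r", blues_r),
                  ("light green", light_green), ("dark purple", dark_purple)]:
    print(name, round(brightness(pal[0]), 2), "->", round(brightness(pal[-1]), 2))
```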

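IV. Univariate analysis

1. Use a histogram for a single feature

sns.histplot() draws a histogram

sns.kdeplot() draws a kernel density estimate curve

sns.histplot(x,bins=20,kde=False) bins=20 splits the range into 20 bins

sns.histplot(x,kde=False,stat="density") shows the shape of the distribution; histplot() has no fit parameter (fit=stats.gamma belonged to the deprecated sns.distplot()), so to compare the data with a distribution, fit it with stats.gamma.fit(x) and draw stats.gamma.pdf over the density histogram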

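2. To examine the joint distribution of two variables, a scatter plot works best

(1) Small data sets: scatter plot

sns.jointplot(x="x",y="y",data=df) draws the scatter plot together with a histogram of each variable

(2) Large data sets: hexbin plot

sns.jointplot(x=x,y=y,kind="hex",color="k") the depth of color shows where the data are densest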

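3. Plotting four dimensions at once

iris=sns.load_dataset("iris")

sns.pairplot(iris)

The diagonal shows the histogram of each feature; the other cells are pairwise scatter plots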

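V. Regression plots

regplot() and lmplot() both draw a scatter plot with a fitted regression line. regplot() is recommended here because it accepts x and y in more input formats (arrays, Series or column names)

tips=sns.load_dataset("tips")

sns.regplot(x="total_bill",y="tip",data=tips)

sns.regplot(x="size",y="tip",data=tips,x_jitter=.05) adds a small random offset to the plotted x values so that overlapping points become visible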

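VI. Multivariate plots

1. In a categorical scatter plot the points overlap and collapse into a line

(1) sns.stripplot(x="day",y="total_bill",data=tips,jitter=True) spreads the points sideways at random

(2) sns.swarmplot(x="day",y="total_bill",data=tips) spreads the points sideways evenly, without overlap (the result looks like a tree)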

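2. Box plot

IQR: the interquartile range, the distance between the first quartile (Q1) and the third quartile (Q3)

N=1.5*IQR: a value greater than Q3+N or smaller than Q1-N is treated as an outlier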
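The outlier rule can be worked through by hand with NumPy. This sketch uses a made-up sample and NumPy's default (linear interpolation) quartiles.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)  # made-up sample

q1, q3 = np.percentile(values, [25, 75])  # first and third quartiles
iqr = q3 - q1                             # interquartile range
n = 1.5 * iqr

lower, upper = q1 - n, q3 + n
outliers = values[(values < lower) | (values > upper)]

print(q1, q3, iqr)   # 3.25 7.75 4.5
print(lower, upper)  # -3.5 14.5
print(outliers)      # [100.]
```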

sns.boxplot(x="day",y="total_bill",hue="time",data=tips)

Horizontal orientation:

sns.boxplot(data=iris,orient="h")

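3. Violin plot

sns.violinplot(x="day",y="total_bill",hue="time",data=tips) each violin is symmetric; the time values are drawn as separate violins

sns.violinplot(x="day",y="total_bill",hue="sex",data=tips,split=True) the left and right halves of each violin show the two values of sex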

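4. Bar plot

titanic=sns.load_dataset("titanic")

sns.barplot(x="sex",y="survived",hue="class",data=titanic)

5. A point plot shows differences between groups more clearly

sns.pointplot(x="class",y="survived",hue="sex",data=titanic,palette={"male":"g","female":"m"},markers=["^","o"],linestyles=["-","--"]) the palette keys, markers and line styles must match the levels of the hue variable, here sex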

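VII. Plotting categorical attributes

Multi-panel categorical plots. factorplot() was renamed catplot() in seaborn 0.9 and the old name is deprecated, so the calls below use catplot()

1. Line plot

sns.catplot(x="day",y="total_bill",hue="smoker",data=tips,kind="point")

2. Bar plot

sns.catplot(x="day",y="total_bill",hue="smoker",data=tips,kind="bar")

sns.catplot(x="day",y="total_bill",hue="smoker",col="time",data=tips,kind="swarm") splits the figure by time into two swarm plots (Lunch and Dinner)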
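The col argument produces one panel per category value. The sketch below shows this on a synthetic table shaped like the tips dataset; the DataFrame tips_like and its values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(2)
n = 120
tips_like = pd.DataFrame({
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], size=n),
    "time": rng.choice(["Lunch", "Dinner"], size=n),
    "smoker": rng.choice(["Yes", "No"], size=n),
    "total_bill": rng.gamma(4.0, 5.0, size=n),
})

# One bar-plot panel per value of "time", bars colored by "smoker"
g = sns.catplot(x="day", y="total_bill", hue="smoker", col="time",
                data=tips_like, kind="bar")
print(g.axes.shape)
```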

VIII. Using FacetGrid

1. Histogram

g=sns.FacetGrid(tips,col="time")

g.map(plt.hist,"tip")

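2. Scatter plot

pal=dict(Yes="seagreen",No="gray") the keys must match the values of smoker, which are "Yes" and "No"

g=sns.FacetGrid(tips,col="sex",hue="smoker",palette=pal,hue_kws={"marker":["^","v"]})

g.map(plt.scatter,"total_bill","tip",s=50,alpha=.7) s is the marker size, alpha the transparency

g.add_legend() adds a legend

g.set_axis_labels("Total bill(US Dollars)","Tip") sets the axis labels

g.fig.subplots_adjust(wspace=.02,hspace=.02) sets the spacing between subplots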

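For pairwise plots use PairGrid; map_diag() and map_offdiag() are PairGrid methods

g=sns.PairGrid(iris)

g.map_diag(plt.hist)

g.map_offdiag(plt.scatter) sets the diagonal and off-diagonal plot types separately

g=sns.PairGrid(iris,vars=["sepal_length","sepal_width"],hue="species")

g.map(plt.scatter) plots only the selected features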

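IX. Heat maps

Here uniform_data and normal_data are 2-D arrays, for example np.random.rand(3,3) and np.random.randn(3,3)

ax=sns.heatmap(uniform_data,vmin=0.2,vmax=0.5) the data values mapped to the two ends of the color scale

ax=sns.heatmap(normal_data,center=0) the data value placed at the middle of the color scale

ax=sns.heatmap(normal_data,annot=True,fmt=".2f") annot writes the value in each cell, fmt is its format string; fmt="d" works only for integer data, and without a suitable fmt the numbers can be hard to read

ax=sns.heatmap(normal_data,linewidths=.5) width of the lines separating the cells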
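The interaction between annot and fmt depends on the data type. A minimal sketch with made-up arrays, one integer and one floating point:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

rng = np.random.default_rng(4)
counts = rng.integers(0, 100, size=(3, 4))  # integer data: fmt="d" is valid
values = rng.normal(size=(3, 4))            # float data: use a float format

fig, (ax_int, ax_float) = plt.subplots(1, 2, figsize=(10, 3))
sns.heatmap(counts, annot=True, fmt="d", linewidths=0.5, ax=ax_int)
sns.heatmap(values, annot=True, fmt=".2f", center=0, ax=ax_float)
```

Each cell gets one text annotation, so a 3 x 4 array produces 12 labels per heat map.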

Practice: analyzing the population exposed to nuclear power plants

import numpy as np

import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt

df = pd.read_csv('nuclear.csv', delimiter=',')

# Replace long official country names with short ones
countries_shortNames = [['UNITED STATES OF AMERICA', 'USA'],
                        ['RUSSIAN FEDERATION', 'RUSSIA'],
                        ['IRAN, ISLAMIC REPUBLIC OF', 'IRAN'],
                        ['KOREA, REPUBLIC OF', 'SOUTH KOREA'],
                        ['TAIWAN, CHINA', 'CHINA']]

for shortName in countries_shortNames:
    df = df.replace(shortName[0], shortName[1])

import folium  # folium: a Python library for map visualization

import matplotlib.cm as cm  # colormap utilities: map numbers to colors (used for heat maps, contour plots, scatter plots)

import matplotlib.colors as colors  # functions and classes for color handling and conversion

latitude, longitude = 40, 10.0
map_world_NPP = folium.Map(location=[latitude, longitude], zoom_start=2)

# Get the 'viridis' colormap with as many colors as the largest 'NumReactor' value
# (plt.get_cmap is used because cm.get_cmap has been removed from recent matplotlib releases)
viridis = plt.get_cmap('viridis', int(df['NumReactor'].max()))

# One color per reactor count, from the smallest count to the largest
colors_array = viridis(np.arange(int(df['NumReactor'].min()) - 1, int(df['NumReactor'].max())))

# Convert each color to a hexadecimal color code and store the codes in the list rainbow
rainbow = [colors.rgb2hex(i) for i in colors_array]

# zip combines the 'NumReactor', 'Latitude', 'Longitude' and 'Plant' columns row by row;
# borough holds the plant name and neighborhood the reactor count
for nReactor, lat, lng, borough, neighborhood in zip(df['NumReactor'].astype(int), df['Latitude'].astype(float), df['Longitude'].astype(float), df['Plant'], df['NumReactor']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    # folium.CircleMarker draws a circular marker with the given position, radius, popup label and colors
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color=rainbow[nReactor - 1],
        fill=True,
        fill_color=rainbow[nReactor - 1],
        fill_opacity=0.5).add_to(map_world_NPP)

# Save the map as an HTML file
map_world_NPP.save('world_map.html')

# Then open world_map.html in a browser to see the map
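The color lookup used for the markers can be tested on its own, without folium or the CSV file. The sketch below assumes matplotlib only; reactor_counts is a made-up list standing in for the NumReactor column.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import numpy as np

reactor_counts = [1, 2, 4, 8]   # hypothetical reactor counts per plant
n_colors = max(reactor_counts)

# Resample viridis to n_colors discrete colors and convert each one to a hex code
viridis = plt.get_cmap("viridis", n_colors)
palette = [colors.rgb2hex(rgba) for rgba in viridis(np.arange(n_colors))]

# A plant with n reactors gets the color at index n - 1
color_for = {n: palette[n - 1] for n in reactor_counts}
print(color_for)
```

Integer inputs index the colormap's color table directly, which is why the counts are converted to integers before the lookup.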

# Unique country names in the 'Country' column
countries = df['Country'].unique()

# For each country: name, total number of reactors, and region
df_count_reactor = [[i, df[df['Country'] == i]['NumReactor'].sum(), df[df['Country'] == i]['Region'].iloc[0]] for i in countries]
df_count_reactor = pd.DataFrame(df_count_reactor, columns=['Country', 'NumReactor', 'Region'])
df_count_reactor = df_count_reactor.set_index('Country').sort_values(by='NumReactor', ascending=False)[:20]

ax = df_count_reactor.plot(kind='bar', stacked=True, figsize=(10, 3), title='The 20 Countries With The Most Nuclear Reactors in 2010')
ax.set_ylim((0, 150))

# Write the reactor count above each bar
for p in ax.patches:
    ax.annotate(str(p.get_height()), xy=(p.get_x(), p.get_height() + 2))

df_count_reactor['Country'] = df_count_reactor.index
sns.set(rc={'figure.figsize': (11.7, 8.27)})
sns.set_style("whitegrid")
plt.rcParams['font.sans-serif'] = ['SimHei']  # only needed when labels contain Chinese text

ax = sns.barplot(x="NumReactor", y="Country", hue="Region", data=df_count_reactor, dodge=False, orient='h')
ax.set_title('The 20 Countries With The Most Nuclear Reactors in 2010', fontsize=16)
ax.set_xlabel('Reactors', fontsize=16)
ax.set_ylabel('')
ax.legend(fontsize=14)
plt.show()

def getMostExposedNPP(Exposedradius):
    # Column names are assumed to follow the pattern p90_30, p00_30, p10_30
    # (population in 1990, 2000 and 2010 within the given radius in km)
    radius = str(Exposedradius)
    # Ten plants with the largest exposed population in 2010
    df_pop_sort = df.sort_values(by='p10_' + radius, ascending=False)[:10].copy()
    df_pop_sort['Country'] = df_pop_sort['Plant'] + ',\n' + df_pop_sort['Country']
    df_pop_sort = df_pop_sort.set_index('Country')
    df_pop_sort = df_pop_sort.rename(columns={'p90_' + radius: '1990', 'p00_' + radius: '2000', 'p10_' + radius: '2010'})
    df_pop_sort = df_pop_sort[['1990', '2000', '2010']] / 1E6
    ax = df_pop_sort.plot(kind='bar', stacked=False, figsize=(10, 4))  # grouped bar chart
    ax.set_ylabel('Population Exposure in millions', size=14)
    ax.set_title('Location of nuclear power plants \n with the most exposed population \n within ' + radius + ' km radius', size=16)
    print(df_pop_sort['2010'])  # print the 2010 values

getMostExposedNPP('30')

latitude, longitude = 40, 10.0
map_world_NPP = folium.Map(location=[latitude, longitude], zoom_start=2)

# Draw a grey circle of 30 km radius (30000 m) around every plant
for nReactor, lat, lng, borough, neighborhood in zip(df['NumReactor'].astype(int), df['Latitude'].astype(float), df['Longitude'].astype(float), df['Plant'], df['NumReactor']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
        [lat, lng],
        radius=30000,
        popup=label,
        color='grey',
        fill=True,
        fill_color='grey',
        fill_opacity=0.5).add_to(map_world_NPP)

# Find the ten plants with the largest population within a 30 km radius
Exposedradius = '30'
df_sort = df.sort_values(by=str('p10_' + str(Exposedradius)), ascending=False)[:10]

for nReactor, lat, lng, borough, neighborhood in zip(df_sort['NumReactor'].astype(int), df_sort['Latitude'].astype(float), df_sort['Longitude'].astype(float), df_sort['Plant'], df_sort['NumReactor']):
    label = '{}, {}'.format(neighborhood, borough)
    # Red marker: highlights the exact location of the plant
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=folium.Popup(label, parse_html=True),
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.25).add_to(map_world_NPP)
    # Red circle of 30 km radius: the surrounding area that may be affected
    folium.Circle(
        [lat, lng],
        radius=30000,
        popup=folium.Popup(label, parse_html=True),
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.25).add_to(map_world_NPP)

# Save the map as an HTML file; open world_map2.html in a browser to view it
map_world_NPP.save('world_map2.html')
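The per-country aggregation in the listing above loops over every country and filters the table each time. A possible alternative is a single pandas groupby, sketched here on a made-up table; the rows and region names are hypothetical, and only the column names Country, Region and NumReactor are taken from the listing.

```python
import pandas as pd

# Hypothetical stand-in for nuclear.csv with only the columns used here
plants = pd.DataFrame({
    "Country": ["USA", "USA", "FRANCE", "JAPAN", "FRANCE"],
    "Region": ["America", "America", "Europe", "Asia", "Europe"],
    "NumReactor": [2, 3, 4, 1, 2],
})

# Sum the reactors per country, keep the first region seen, take the top 20
reactors_per_country = (
    plants.groupby("Country")
    .agg(NumReactor=("NumReactor", "sum"), Region=("Region", "first"))
    .sort_values(by="NumReactor", ascending=False)
    .head(20)
)
print(reactors_per_country)
```

The result is indexed by country, like df_count_reactor after set_index('Country').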
