This post introduces heatmaps, with hands-on practice in Jupyter Notebooks.
For how to create a Jupyter Notebook, refer to this post (click here).
Data used:
import seaborn as sns
tips = sns.load_dataset('tips')
flights = sns.load_dataset('flights')
print(tips.head())
print(flights.head())
Result: the first five rows of each dataset are printed.
Using corr():
# dataframe.corr() is used to find the pairwise correlation of all columns in the dataframe
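# numeric_only=True restricts the calculation to the numeric columns; recent pandas versions raise an error on the categorical columns of tips without it
print(tips.corr(numeric_only=True))
# method='pearson': standard correlation coefficient
print(tips.corr(method='pearson', numeric_only=True))
# method='kendall': Kendall Tau correlation coefficient
# The Kendall correlation coefficient is a statistic that measures the association between two random variables. The Kendall test is a nonparametric hypothesis test that uses the computed coefficient to test for statistical dependence between the two variables.
# Kendall's tau ranges from -1 to 1. A tau of 1 means the two variables rank the observations identically; a tau of -1 means the two rankings are exactly reversed; a tau of 0 means there is no rank association (independent variables give a tau near 0, but a tau of 0 alone does not prove independence).
print(tips.corr(method='kendall', numeric_only=True))
# method='spearman': Spearman rank correlation
# Spearman rank correlation is mainly used for ordinal (ranked) data and for relationships that are monotonic but not necessarily linear.
print(tips.corr(method='spearman', numeric_only=True))
Result: each call prints a correlation matrix over the numeric columns of tips.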
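To make the rank-based coefficients described in the comments above more concrete, here is a minimal pure-Python sketch. The helper names `kendall_tau` and `spearman_rho` and the sample numbers are made up for illustration, and both helpers assume there are no tied values; library implementations handle ties and may therefore give different numbers on real data.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs. Assumes no ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables order this pair the same way
        elif direction < 0:
            discordant += 1   # the two variables disagree on this pair
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def spearman_rho(x, y):
    """Spearman rho from the rank-difference formula. Assumes no ties."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, index in enumerate(order, start=1):
            result[index] = rank
        return result

    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Made-up sample values, not taken from the tips dataset.
bill = [10, 20, 30, 40]
tip = [1, 3, 2, 4]

tau = kendall_tau(bill, tip)
rho = spearman_rho(bill, tip)
print(tau, rho)
```

Only one of the six pairs is ordered differently by the two lists, so both coefficients come out positive but below 1.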
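Using heatmap():
# numeric_only=True keeps only the numeric columns; recent pandas versions need it for the tips dataset
tc = tips.corr(numeric_only=True)
# heatmap(): Plot rectangular data as a color-encoded matrix.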
# This is an Axes-level function and will draw the heatmap into the currently-active Axes if none is provided to the ax argument.
# Part of this Axes space will be taken and used to plot a colormap, unless cbar is False or a separate Axes is provided to cbar_ax.
print(sns.heatmap(tc))
# heatmap(data,annot=None)
# data: 2D dataset that can be coerced into an ndarray.
# annot: bool or rectangular dataset. If True, write the data value in each cell.
# If an array-like with the same shape as data, then use this to annotate the heatmap instead of the data.
print(sns.heatmap(tc, annot=True))
Result: a heatmap of the correlation matrix, first without and then with the value written in each cell.
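The note on annot above also mentions passing an array-like with the same shape as the data. A minimal sketch of that case follows; the 2x2 matrix and the text labels are made-up values rather than the real tips correlations, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for running outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up correlation-like matrix for illustration.
names = ["total_bill", "tip"]
corr = pd.DataFrame([[1.0, 0.5], [0.5, 1.0]], index=names, columns=names)

# One label per cell, same shape as the data.
labels = np.array([["same", "moderate"], ["moderate", "same"]])

fig, ax = plt.subplots()
# fmt="" tells seaborn to write the labels as plain strings instead of formatting them as numbers.
sns.heatmap(corr, annot=labels, fmt="", ax=ax)

cell_texts = [text.get_text() for text in ax.texts]
print(cell_texts)
```

The colors still come from the numeric data; only the text written in each cell is taken from `labels`.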
# cmap: matplotlib colormap name or object, or list of colors.
# The mapping from data values to color space. If not provided, the default will depend on whether center is set.
print(sns.heatmap(tc, annot=True, cmap='coolwarm'))
Result: the same annotated heatmap drawn with the coolwarm colormap.
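Since the cmap note above refers to center, here is a small sketch of how center can be combined with vmin and vmax for a correlation matrix, so that the neutral color of a diverging colormap lines up with a correlation of zero. The matrix values are made up for illustration, and the Agg backend is an assumption for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up correlation-like matrix for illustration.
names = ["a", "b"]
corr = pd.DataFrame([[1.0, -0.4], [-0.4, 1.0]], index=names, columns=names)

fig, ax = plt.subplots()
# vmin/vmax pin the color scale to the full correlation range; center=0 puts the midpoint color at zero.
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, center=0, ax=ax)

# The first collection on the Axes is the colored mesh; its limits reflect vmin and vmax.
color_limits = ax.collections[0].get_clim()
print(color_limits)
```

Fixing the limits this way also makes heatmaps of different correlation matrices directly comparable, because the same color always means the same value.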
Using pivot_table():
# pivot_table(data, index=None, columns=None, values=None)
# pivot_table: Create a spreadsheet-style pivot table as a DataFrame.
# index: column, Grouper, array, or list of the previous
# If an array is passed, it must be the same length as the data.
# The list can contain any of the other types (except list).
# Keys to group by on the pivot table index. If an array is passed, it is used in the same manner as column values.
# columns: column, Grouper, array, or list of the previous
# If an array is passed, it must be the same length as the data.
# The list can contain any of the other types (except list).
# Keys to group by on the pivot table column. If an array is passed, it is used in the same manner as column values.
# values: column to aggregate, optional
print(flights.pivot_table(index='month',columns='year',values='passengers'))
Result: a table with one row per month and one column per year, holding the passenger counts.
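To see what index, columns and values do on a table small enough to check by hand, here is a sketch with a made-up long-format DataFrame (the numbers are illustrative, not the real flights data). It also shows that when several rows fall into the same cell, pivot_table aggregates them, using the mean by default.

```python
import pandas as pd

# Made-up long-format data: one row per observation.
long_df = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb", "Feb"],
    "passengers": [112, 118, 115, 126, 130],
})

# Rows come from 'month', columns from 'year', cell values from 'passengers'.
wide = long_df.pivot_table(index="month", columns="year", values="passengers")
print(wide)

# The two Feb 1950 rows (126 and 130) are averaged into a single cell.
feb_1950 = wide.loc["Feb", 1950]
print(feb_1950)
```

In this sketch the month labels are plain strings, so the rows are sorted alphabetically; a categorical month column keeps its own category order instead.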
Combining pivot_table() and heatmap():
fp = flights.pivot_table(index='month',columns='year',values='passengers')
print(sns.heatmap(fp))
print(sns.heatmap(fp, cmap='magma'))
# linewidths is the documented heatmap parameter for the width of the lines that divide the cells
print(sns.heatmap(fp, cmap='magma', linecolor='white', linewidths=1))
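Result: heatmaps of passengers by month and year, the last one with white lines separating the cells.
Using clustermap():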
# Plot a matrix dataset as a hierarchically-clustered heatmap
print(sns.clustermap(fp))
Result: a clustered heatmap with dendrograms along the rows and columns.
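Unlike heatmap(), clustermap() reorders the rows and columns so that similar ones sit next to each other. The sketch below, on a made-up 4x3 table, reads the new row order back from the returned grid object. It assumes scipy is installed (clustermap relies on it for the clustering) and uses the Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no display is needed
import pandas as pd
import seaborn as sns

# Made-up data: rows a and b are nearly identical, as are rows c and d.
data = pd.DataFrame(
    [[1.0, 2.0, 3.0],
     [9.0, 8.0, 7.0],
     [1.1, 2.1, 3.1],
     [9.1, 8.1, 7.2]],
    index=["a", "c", "b", "d"],
    columns=["x", "y", "z"],
)

grid = sns.clustermap(data)

# Positions of the original rows in the order they are drawn.
row_order = grid.dendrogram_row.reordered_ind
ordered_labels = [data.index[i] for i in row_order]
print(ordered_labels)
```

Rows a and b start out separated in the input, but end up adjacent in `ordered_labels` because the clustering groups them first.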
print(sns.clustermap(fp, cmap='coolwarm'))
Result: the same clustered heatmap drawn with the coolwarm colormap.
# standard_scale: int or None
# Either 0 (rows) or 1 (columns). Whether or not to standardize that dimension, meaning for each row or column, subtract the minimum and
# divide each by its maximum.
print(sns.clustermap(fp, cmap='coolwarm',standard_scale=1))
Result: the clustered heatmap with each column rescaled to the range 0 to 1.
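The rescaling described in the standard_scale note above can be sketched in a few lines of NumPy. This is an illustration of the idea (subtract the minimum, then divide by the maximum of what remains), not seaborn's own code; the function name and the sample numbers are made up.

```python
import numpy as np


def standard_scale(matrix, axis):
    """Rescale to [0, 1]. axis follows the convention quoted above: 0 = rows, 1 = columns."""
    data = np.asarray(matrix, dtype=float)
    # Scaling each column means reducing over the rows (NumPy axis 0), and vice versa.
    reduce_axis = 0 if axis == 1 else 1
    shifted = data - data.min(axis=reduce_axis, keepdims=True)
    # Note: a constant row or column would divide by zero here; this sketch does not handle that.
    return shifted / shifted.max(axis=reduce_axis, keepdims=True)


# Made-up values: the first column is on a much larger scale than the second.
passengers = [[100, 1],
              [200, 3],
              [300, 2]]

scaled = standard_scale(passengers, axis=1)
print(scaled)
```

After scaling, both columns span 0 to 1, so a column with large raw values no longer dominates the colors or the clustering distances.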