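I worked through these exercises in Jupyter Notebooks. If you need to create one, see a guide to setting up Jupyter Notebooks.
For details on the plotting methods, see the pandas.DataFrame.plot documentation.
For the available colormaps, see the matplotlib colormap documentation.
This post uses two datasets: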
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df1 = pd.read_csv('df1', index_col=0)
print(df1.head())
[Output: the first five rows of df1]
df2 = pd.read_csv('df2')
print(df2.head())
[Output: the first five rows of df2]
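The files df1 and df2 are not included with this post, so the snippets cannot be run as-is without them. As a rough stand-in, the sketch below builds two synthetic frames with the column names the later calls rely on (A to D for df1, and a for df2). The extra columns b to d, the shapes, the random distributions, and the seed are all my assumptions and not the real data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumed seed, only for reproducibility

# Hypothetical stand-in for df1: 1000 rows of normally distributed values.
df1 = pd.DataFrame(rng.normal(size=(1000, 4)), columns=list("ABCD"))

# Hypothetical stand-in for df2: 10 rows of values in [0, 1).
df2 = pd.DataFrame(rng.random((10, 4)), columns=list("abcd"))

print(df1.head())
print(df2.head())
```

With these in place, the plotting calls in the rest of the post should run, although the figures will differ from the ones produced with the original files.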
Using hist():
# Series.hist() draws a histogram of the column, using matplotlib under the hood
df1['A'].hist()
[Figure: histogram of df1['A']]
Using hist() with bins:
# DataFrame.hist() draws one histogram per selected column
# bins: optional; an integer number of bins or a sequence of bin edges
df1[['A','B','C','D']].hist(bins=30)
plt.tight_layout()
df1['A'].hist(bins=30)
[Figure: histograms of columns A to D with 30 bins]
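To make the bins parameter more concrete, here is a small sketch on made-up data (the series s and the edge values are my own, not from the datasets above). It passes an integer to the plot and, separately, an explicit list of bin edges to np.histogram, which is the same kind of binning done outside of any plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
s = pd.Series(rng.normal(size=500), name="A")  # hypothetical data

# Integer bins: the value range is split into that many equal-width bins.
ax = s.plot.hist(bins=30)
n_bars = len(ax.patches)  # one rectangle per bin

# A sequence of edges gives unequal bins; four intervals here.
counts, edges = np.histogram(s, bins=[-4, -1, 0, 1, 4])
print(n_bars, counts, edges)
```

More bins show finer detail but make each bar noisier, so it is worth trying a few values.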
Another way to draw a histogram:
df1['A'].plot(kind='hist')
df1['A'].plot(kind='hist', bins=30)
# Draw one histogram of the DataFrame’s columns.
# DataFrame.plot.hist()
# A histogram is a representation of the distribution of data.
# This function groups the values of all given Series in the DataFrame into bins and draws all bins in one matplotlib.axes.Axes.
# This is useful when the DataFrame’s Series are in a similar scale.
# bins: number of histogram bins to be used
df1['A'].plot.hist(bins=30)
df1['A'].plot.hist()
[Figure: histogram of column A]
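The three spellings used so far, hist(), plot(kind='hist') and plot.hist(), should produce the same bars. The sketch below checks that on made-up data by drawing each variant on its own axes and comparing the bar heights. The data and the side-by-side layout are my own choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
s = pd.Series(rng.normal(size=300), name="A")  # hypothetical data

fig, axes = plt.subplots(1, 3, figsize=(9, 3))
s.hist(bins=20, ax=axes[0])
s.plot(kind="hist", bins=20, ax=axes[1])
s.plot.hist(bins=20, ax=axes[2])

# Bar heights (bin counts) drawn by each call.
heights = [[p.get_height() for p in ax.patches] for ax in axes]
print(np.allclose(heights[0], heights[1]), np.allclose(heights[1], heights[2]))
```

The visible differences are cosmetic: hist() turns the grid on by default, while the plot variants label the y axis.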
Now using plot.area():
# Draw a stacked area plot.
# An area plot displays quantitative data visually. This function wraps the matplotlib area function.
df2.plot.area()
# alpha: the smaller the value, the more transparent the fill
df2.plot.area(alpha=0.2)
[Figure: stacked area plot of df2]
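Because the area plot is stacked by default, each band sits on top of the previous ones, so the upper edge of a band is the running total across columns. A minimal sketch on made-up data (the frame df here is hypothetical), which also shows the stacked=False variant where every column starts from zero:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame(rng.random((10, 3)), columns=["a", "b", "c"])  # hypothetical data

# Upper edge of each band in a stacked area plot: running total across columns.
tops = df.cumsum(axis=1)

ax_stacked = df.plot.area(alpha=0.4)                 # stacked=True is the default
ax_overlay = df.plot.area(stacked=False, alpha=0.4)  # every column starts from zero
print(tops.head())
```

Stacking only makes sense when the columns share a sign, and pandas expects the values to be all positive or all negative for a stacked area plot.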
Using plot.bar():
# plot.bar(): vertical bar plot.
# A bar plot is a plot that presents categorical data with rectangular bars with lengths proportional to the values that they represent.
# A bar plot shows comparisons among discrete categories.
# One axis of the plot shows the specific categories being compared, and the other axis represents a measured value.
df2.plot.bar()
df2.plot.bar(alpha=0.3)
[Figure: grouped bar plot of df2]
df2.plot.bar(stacked=True)
df2.plot.bar(stacked=True,alpha=0.3)
[Figure: stacked bar plot of df2]
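With stacked=True, the segments for one row are piled on top of each other, so the full height of each stack should equal the row sum. The sketch below checks this on made-up data by reading the bar heights back from the axes. The frame df is hypothetical, and reading ax.containers is just one way I chose to inspect the drawn bars.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
df = pd.DataFrame(rng.random((5, 3)), columns=["a", "b", "c"])  # hypothetical data

ax = df.plot.bar(stacked=True, alpha=0.3)

# ax.containers holds one group of bars per column.
heights = np.array([[bar.get_height() for bar in group] for group in ax.containers])
totals = heights.sum(axis=0)  # total stack height per row
print(totals)
```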
Now back to the df1 dataset:
df1
[Output: the contents of df1]
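Now using df1.plot.scatter():
df1.plot.scatter(x='A',y='B')
# c: column whose values set the marker colors
df1.plot.scatter(x='A',y='B',c='C')
# s: marker sizes; these should be non-negative, so this call assumes the values in column C are positive
df1.plot.scatter(x='A',y='B',s=df1['C']*10)
[Figure: scatter plots of A against B]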
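Marker sizes are areas, so negative values in the size column do not draw properly. If the column can go below zero, one option is to rescale it into a positive range first. This is a sketch on made-up data; the target range of 10 to 200 is an arbitrary choice of mine.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["A", "B", "C"])  # hypothetical data

# Rescale C linearly into [10, 200] so every marker area is positive.
c = df["C"]
sizes = 10 + 190 * (c - c.min()) / (c.max() - c.min())

# Color and size both encode column C here.
ax = df.plot.scatter(x="A", y="B", s=sizes.to_numpy(), c="C", cmap="coolwarm")
print(sizes.min(), sizes.max())
```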
Using plot.line():
# Plot Series or DataFrame as lines.
# This function is useful to plot lines using DataFrame’s values as coordinates.
df1.plot.line(x='A',y='B')
[Figure: line plot of B against A]
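A line plot connects the points in row order. When the x column is not sorted, the line jumps back and forth across the plot, so it can help to sort by x first. A minimal sketch on made-up data (the frame df is hypothetical):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
df = pd.DataFrame(rng.normal(size=(50, 2)), columns=["A", "B"])  # hypothetical data

# Sort by the x column so the line runs left to right.
ax = df.sort_values("A").plot.line(x="A", y="B")

# x coordinates of the drawn line, in drawing order.
x_drawn = np.asarray(ax.lines[0].get_xdata(), dtype=float)
print(x_drawn[:5])
```

For two unrelated random columns like these, a scatter plot is usually the more readable choice; lines suit data with a natural ordering such as time.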
Using plot.box():
# Make a box plot of the DataFrame columns.
# A box plot is a method for graphically depicting groups of numerical data through their quartiles.
# The box extends from the Q1 to Q3 quartile values of the data, with a line at the median (Q2).
# The whiskers extend from the edges of box to show the range of the data.
# The position of the whiskers is set by default to 1.5*IQR (IQR = Q3 - Q1) from the edges of the box.
# Outlier points are those past the end of the whiskers.
df2.plot.box()
[Figure: box plot of the df2 columns]
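The quantities described in the comments above can be worked out by hand. The sketch below does the arithmetic on a small made-up series with one obvious outlier. The numbers are my own, and quantile() is used with its default linear interpolation.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])  # made-up values with one outlier

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1

# Points beyond these fences are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = s[(s < lower_fence) | (s > upper_fence)]

# Each whisker stops at the most extreme data point still inside its fence.
whisker_low = s[s >= lower_fence].min()
whisker_high = s[s <= upper_fence].max()
print(q1, q3, iqr, outliers.tolist(), whisker_low, whisker_high)
```

Here Q1 is 3.25 and Q3 is 7.75, so the IQR is 4.5 and the fences sit at -3.5 and 14.5. The value 30 falls outside and would be drawn as a separate point, while the whiskers end at 1 and 9.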
Create a new DataFrame:
df = pd.DataFrame(np.random.rand(1000,2),columns=['a','b'])
df.head()
[Output: the first five rows of df]
Using plot.hexbin():
# Generate a hexagonal binning plot.
# Generate a hexagonal binning plot of x versus y. If C is None (the default),
# this is a histogram of the number of occurrences of the observations at (x[i], y[i]).
df.plot.hexbin(x='a',y='b')
df.plot.hexbin(x='a',y='b',gridsize=10)
# cmap: colormap
df.plot.hexbin(x='a',y='b',gridsize=10,cmap='coolwarm')
[Figure: hexbin plots of a against b]
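Since the default hexbin plot is a two-dimensional histogram, the per-hexagon counts should add up to the number of rows. The sketch below reads the counts back from the axes to check that. Treating the first collection on the axes as the hexagons is my assumption about how the plot is assembled, so take this as an illustration and not a guaranteed interface.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame(rng.random((1000, 2)), columns=["a", "b"])

ax = df.plot.hexbin(x="a", y="b", gridsize=10, cmap="coolwarm")

# Assumed: the first collection is the set of hexagons and its array holds the counts.
counts = np.asarray(ax.collections[0].get_array())
print(len(counts), counts.sum())
```

A smaller gridsize means fewer, larger hexagons, so each one collects more points and the picture gets smoother but coarser.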
Using plot.kde() and plot.density():
# Generate Kernel Density Estimate plot using Gaussian kernels.
# In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function (PDF) of a random variable.
# This function uses Gaussian kernels and includes automatic bandwidth determination.
df2['a'].plot.kde()
# plot.density() is an alias of plot.kde() and draws the same plot.
df2['a'].plot.density()
[Figure: KDE plot of df2['a']]
df2.plot.kde()
[Figure: KDE plot of all df2 columns]
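To see what a Gaussian KDE computes, here is a hand-rolled version in NumPy: every data point contributes a small Gaussian bump, and the estimate is their average. The bandwidth uses Scott's rule, which as far as I know is the default in scipy.stats.gaussian_kde, the routine pandas relies on for these plots. The sample data and the function name kde are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(8)
data = rng.normal(size=200)  # hypothetical sample

# Scott's rule for one-dimensional data: h = n**(-1/5) * sample standard deviation.
n = data.size
h = n ** (-1 / 5) * data.std(ddof=1)

def kde(x):
    """Average of Gaussian bumps of width h centred on each data point."""
    z = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

grid = np.linspace(data.min() - 5 * h, data.max() + 5 * h, 2000)
density = kde(grid)

# Trapezoidal estimate of the area under the curve; a density should integrate to about 1.
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid))
print(round(area, 3))
```

A smaller bandwidth follows the sample more closely and a larger one smooths it out. In pandas the bw_method argument of plot.kde() controls this.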