Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest option is to use Jupyter’s magic command %matplotlib inline. This tells Jupyter to set up Matplotlib so it uses Jupyter’s own backend.
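Outside Jupyter (for example in a plain script), the magic command is not available. The following is a minimal sketch, assuming a headless environment where figures are only rendered to image data: it selects the non-interactive Agg backend with matplotlib.use() before pyplot is imported. The plotted numbers are placeholders.

```python
import io

import matplotlib

# Select the non-interactive Agg backend: it renders to image data
# and never opens a window. Do this before importing pyplot.
matplotlib.use("Agg")

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])  # placeholder data

# Render the figure to an in-memory PNG instead of showing it.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()

backend_name = matplotlib.get_backend()
print(backend_name)
```

In a notebook this step is unnecessary, since %matplotlib inline makes the equivalent choice for you.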
Scatter Plot
housing.plot(kind="scatter", x="longitude", y="latitude")
Setting the alpha parameter (marker transparency) makes it easier to see where the points are dense:
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.1)
The plot can convey more information by varying the colors, sizes, shapes, etc. of the markers. Here we use a predefined color map (option cmap) called jet. As an example, we plot the house prices at different locations: the size of each circle represents the district’s population (option s), and the color represents the price (option c).
%matplotlib inline
import matplotlib.pyplot as plt
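# s scales the marker size with population; c colors the markers by price
housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
             s=housing["population"]/100, label="population", figsize=(10,7),
             c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True,
             sharex=False)
plt.legend()
# save_fig is assumed to be a helper defined elsewhere; it is not shown in this section
save_fig("housing_prices_scatterplot")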
Note that the argument sharex=False fixes a display bug (the x-axis values and legend were not displayed). This is a temporary fix (see: https://github.com/pandas-dev/pandas/issues/10611).
Correlation Plot
We can compute the correlation coefficient between every pair of attributes with the corr() method, then use sort_values() to rank how strongly each attribute correlates with the median house value:
corr_matrix = housing.corr()
corr_matrix["median_house_value"].sort_values(ascending=False)
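The snippet below runs the same two calls on a small made-up DataFrame so the result is easy to check by eye. The values and the text column region are invented for illustration and are not taken from the housing data. Non-numeric columns are dropped first with select_dtypes(), because recent pandas versions no longer skip them silently in corr().

```python
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "median_house_value": [100.0, 150.0, 200.0, 250.0, 300.0],
    "median_income": [1.0, 2.0, 3.0, 4.0, 5.0],             # rises with value
    "housing_median_age": [50.0, 40.0, 30.0, 20.0, 10.0],   # falls with value
    "region": ["a", "b", "a", "b", "a"],                    # text column
})

# Keep only numeric columns so corr() does not hit the text column.
numeric = toy.select_dtypes(include="number")

# Pairwise Pearson correlation coefficients (the default method).
corr_matrix = numeric.corr()

# Rank attributes by their correlation with the target column.
ranked = corr_matrix["median_house_value"].sort_values(ascending=False)
print(ranked)
```

A coefficient close to 1 indicates a strong positive linear relationship and one close to -1 a strong negative one. Values near 0 indicate no linear relationship, although a nonlinear one may still exist.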
We can also use the scatter_matrix function, which plots every numerical attribute against every other numerical attribute. The diagonal displays a histogram of each attribute.
# scatter_matrix lives in pandas.plotting (it moved from pandas.tools.plotting in pandas 0.20)
from pandas.plotting import scatter_matrix
attributes = ["median_house_value", "median_income", "total_rooms", "housing_median_age"]
scatter_matrix(housing[attributes], figsize=(12, 8))
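To show what the call produces without the housing data, here is a minimal sketch on synthetic data. The column names and random values are made up, and the Agg backend line is an assumption for running outside Jupyter. scatter_matrix returns a 2-D array of Axes, one per pair of attributes.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: no display available; omit this in Jupyter

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic data, invented for illustration only.
rng = np.random.default_rng(0)
income = rng.normal(size=200)
toy = pd.DataFrame({
    "income": income,
    "value": 2.0 * income + rng.normal(scale=0.5, size=200),  # built from income
    "age": rng.normal(size=200),                              # independent of the others
})

# One subplot per attribute pair; histograms on the diagonal by default.
axes = scatter_matrix(toy, figsize=(8, 8))
print(axes.shape)
```

With three columns the grid is 3 by 3. The income/value panels should show a clear upward trend, while the panels involving age should look like shapeless clouds.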