Tutorial on Five Data Visualization Libraries --- Adam Studio

Latest recommended article published on 2024-05-05 10:48:16

Adam婷

Categories: Data Science, Data Visualization, TIPs

Article link: https://blog.csdn.net/weixin_41697507/article/details/96031677

Notebook Content

Introduction
Loading Packages
version
Setup
Data Collection
Data Visualization Libraries
Matplotlib
Scatterplots
Line Plots
Bar Charts
Histograms
Box and Whisker Plots
Heatmaps
Animations
Interactivity
DataFrame.plot
Seaborn
Seaborn Vs Matplotlib
Useful Python Data Visualization Libraries
Plotly
New to Plotly?
Plotly Offline from Command Line
Bokeh
networkx
Read more
Courses
Ebooks
Cheat sheet
Conclusion
References

1- Introduction

If you have followed my other kernels, you will have noticed that I introduced a course for beginners, "10 Steps to Become a Data Scientist". In this kernel we take another step together. There are plenty of kernels that can help you learn Python's libraries from scratch, but here on Kaggle I want to analyze Meta Kaggle, a popular dataset. After reading, you can use this kernel to analyze other real datasets and as a template for ML problems. Most people in this community are familiar with the Meta Kaggle dataset, but if you need to review it, please visit meta-kaggle.

I am open to getting your feedback for improving this kernel together.

2- Loading Packages

In this kernel we are using the following packages:

from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
from bokeh.io import push_notebook, show, output_notebook
from bokeh.plotting import figure, output_file
import mpl_toolkits.axes_grid1.inset_locator as mpl_il
import matplotlib.animation as animation
from matplotlib.figure import Figure
from sklearn.cluster import KMeans
from sklearn import datasets
import plotly.figure_factory as ff
import plotly.graph_objs as go
# plotly.plotly moved to the separate chart_studio package in plotly 4;
# this kernel only uses the offline module under the name py
import plotly.offline as py
from plotly import tools
import matplotlib.pylab as pylab
import matplotlib.pyplot as plt
from ipywidgets import interact
from random import randint
import matplotlib as mpl
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib
import warnings
import string
import csv
import os

2-1 version

print('matplotlib: {}'.format(matplotlib.__version__))
print('seaborn: {}'.format(sns.__version__))
print('pandas: {}'.format(pd.__version__))
print('numpy: {}'.format(np.__version__))

#print('wordcloud: {}'.format(wordcloud.version))


2-2 Setup

A few tiny adjustments for better code readability

sns.set(style='white', context='notebook', palette='deep')
pylab.rcParams['figure.figsize'] = 12,8
warnings.filterwarnings('ignore')
mpl.style.use('ggplot')
sns.set_style('white')
%matplotlib inline

2-3 Data Collection

Data collection is the process of gathering and measuring data, information, or any variables of interest in a standardized and established manner that enables the collector to answer or test hypotheses and evaluate outcomes of the particular collection. [techopedia]

I start data collection by loading the Users, Kernels, and ForumMessages datasets into pandas DataFrames.

# import kernels and users to play with it (MJ Bahmani)
#command--> 1
users = pd.read_csv("../input/Users.csv")
kernels = pd.read_csv("../input/Kernels.csv")
messages = pd.read_csv("../input/ForumMessages.csv")

#command--> 2
users.sample(1)


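Replace the username below with your own to find your userid.
We assume that userid == AuthorUserId and use userid for both the Kernels and Users datasets.

username="mjbahmani"
# look the Id up through the username variable and take the single matching value
userid=int(users.loc[users['UserName']==username, 'Id'].iloc[0])
userid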

1840354

We can simply use dropna() (be careful: sometimes you should not do this!)

# remove rows that have NA's
print('Before dropping',messages.shape)
#command--> 3
messages = messages.dropna()
print('After dropping',messages.shape)
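
To make the caution above concrete, here is a minimal sketch on a made-up three-row frame. The column names are illustrative only and are not taken from the real ForumMessages schema. It compares a plain dropna() with one restricted, through the subset argument, to the columns you actually need.

```python
import numpy as np
import pandas as pd

# Toy data: hypothetical columns, not the actual Meta Kaggle schema
toy = pd.DataFrame({
    "Message": ["hello", None, "thanks"],
    "Medal": [1.0, 2.0, np.nan],
})

dropped_any = toy.dropna()                       # drops every row with at least one NA
dropped_subset = toy.dropna(subset=["Message"])  # drops only rows missing a Message

print(len(toy), len(dropped_any), len(dropped_subset))
```

A plain dropna() keeps fewer rows here than the subset version, which is why it is worth checking the shape before and after, as the kernel does.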


2-3-1 Features
Features can be of the following types:

numeric
categorical
ordinal
datetime
coordinates

What are the types of the features in Meta Kaggle?
To get some information about the dataset, you can use the info() command.

#command--> 4
users.info()  # info() prints its report itself and returns None, so no print() is needed
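
As a small illustration of the feature types listed above, the sketch below builds a toy frame (the column names and values are invented for the example) and converts text columns into dtypes that carry their meaning: a datetime column and an ordered categorical for an ordinal feature.

```python
import pandas as pd

# Toy frame with made-up columns covering several feature types
toy = pd.DataFrame({
    "votes": [3, 10, 7],                                    # numeric
    "medal": ["gold", "bronze", "silver"],                  # ordinal, stored as text
    "created": ["2018-01-05", "2018-03-17", "2018-06-30"],  # datetime, stored as text
})

# Convert text columns to dtypes that reflect what they represent
toy["created"] = pd.to_datetime(toy["created"])
toy["medal"] = pd.Categorical(toy["medal"],
                              categories=["bronze", "silver", "gold"],
                              ordered=True)

print(toy.dtypes)
print(toy["medal"].max())  # ordered categoricals support comparisons
```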


2-3-2 Explore the Dataset

Dimensions of the dataset.
Peek at the data itself.
Statistical summary of all attributes.
Breakdown of the data by the class variable.

Don’t worry, each look at the data is one command. These are useful commands that you can use again and again on future projects.

# shape
#command--> 5
print(users.shape)


#columns*rows
#command--> 6
users.size


We can get a quick idea of how many instances (rows) and how many attributes (columns) the data contains with the shape property.
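
A quick sketch of the difference between the two properties, on a toy frame invented for this example: shape is a (rows, columns) tuple, while size is the total number of cells.

```python
import pandas as pd

toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

rows, cols = toy.shape
print(toy.shape)  # (rows, columns)
print(toy.size)   # rows * columns
```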

You can see the unique values of Medal with the command below:

#command--> 7
kernels['Medal'].unique()


#command--> 8
kernels["Medal"].value_counts()

To check the first 5 rows of the dataset, we can use head(5).

kernels.head(5)

To check the last 5 rows of the dataset, we use the tail() function.

#command--> 9
users.tail()

To draw 5 random rows from the dataset, we can use the sample(5) function.

kernels.sample(5)

To get a statistical summary of the dataset, we can use describe().

kernels.describe()


2-3-5 Find yourself in the Users dataset

#command--> 12
users[users['Id']==userid]


2-3-6 Find your kernels in Kernels dataset

#command--> 13
yourkernels=kernels[kernels['AuthorUserId']==userid]
yourkernels.head(2)

3- Data Visualization Libraries
Before you start learning, here is an overview of 10 interdisciplinary Python data visualization libraries, from the well-known to the obscure, based on modeanalytics:

1- matplotlib
matplotlib is the O.G. of Python data visualization libraries. Despite being over a decade old, it’s still the most widely used library for plotting in the Python community. It was designed to closely resemble MATLAB, a proprietary programming language developed in the 1980s.

2- Seaborn
Seaborn harnesses the power of matplotlib to create beautiful charts in a few lines of code. The key difference is Seaborn’s default styles and color palettes, which are designed to be more aesthetically pleasing and modern. Since Seaborn is built on top of matplotlib, you’ll need to know matplotlib to tweak Seaborn’s defaults.

3- ggplot
ggplot is based on ggplot2, an R plotting system, and concepts from The Grammar of Graphics. ggplot operates differently than matplotlib: it lets you layer components to create a complete plot. For instance, you can start with axes, then add points, then a line, a trendline, etc. Although The Grammar of Graphics has been praised as an “intuitive” method for plotting, seasoned matplotlib users might need time to adjust to this new mindset.

4- Bokeh
Like ggplot, Bokeh is based on The Grammar of Graphics, but unlike ggplot, it’s native to Python, not ported over from R. Its strength lies in the ability to create interactive, web-ready plots, which can be easily outputted as JSON objects, HTML documents, or interactive web applications. Bokeh also supports streaming and real-time data.

5- pygal
Like Bokeh and Plotly, pygal offers interactive plots that can be embedded in the web browser. Its prime differentiator is the ability to output charts as SVGs. As long as you’re working with smaller datasets, SVGs will do you just fine. But if you’re making charts with hundreds of thousands of data points, they’ll have trouble rendering and become sluggish.

6- Plotly
You might know Plotly as an online platform for data visualization, but did you also know you can access its capabilities from a Python notebook? Like Bokeh, Plotly’s forte is making interactive plots, but it offers some charts you won’t find in most libraries, like contour plots, dendrograms, and 3D charts.

7- geoplotlib
geoplotlib is a toolbox for creating maps and plotting geographical data. You can use it to create a variety of map-types, like choropleths, heatmaps, and dot density maps. You must have Pyglet (an object-oriented programming interface) installed to use geoplotlib. Nonetheless, since most Python data visualization libraries don’t offer maps, it’s nice to have a library dedicated solely to them.

8- Gleam
Gleam is inspired by R’s Shiny package. It allows you to turn analyses into interactive web apps using only Python scripts, so you don’t have to know any other languages like HTML, CSS, or JavaScript. Gleam works with any Python data visualization library. Once you’ve created a plot, you can build fields on top of it so users can filter and sort data.

9- missingno
Dealing with missing data is a pain. missingno allows you to quickly gauge the completeness of a dataset with a visual summary, instead of trudging through a table. You can filter and sort data based on completion or spot correlations with a heatmap or a dendrogram.

10- Leather
Leather’s creator, Christopher Groskopf, puts it best: “Leather is the Python charting library for those who need charts now and don’t care if they’re perfect.” It’s designed to work with all data types and produces charts as SVGs, so you can scale them without losing image quality. Since this library is relatively new, some of the documentation is still in progress. The charts you can make are pretty basic—but that’s the intention.

Finally, there is a nice cheatsheet on how to best visualize your data. I think I will print it out as a reminder of “best practices”. Check out the link for the complete cheatsheet, also available as a PDF.

11- Chartify
Chartify is a Python library that makes it easy for data scientists to create charts.
Why use Chartify?

Consistent input data format: Spend less time transforming data to get your charts to work. All plotting functions use a consistent tidy input data format.
Smart default styles: Create pretty charts with very little customization required.
Simple API: We’ve attempted to make the API as intuitive and easy to learn as possible.
Flexibility: Chartify is built on top of Bokeh, so if you do need more control you can always fall back on Bokeh’s API.

Link: https://blog.modeanalytics.com/python-data-visualization-libraries/

4- Matplotlib

This Matplotlib tutorial takes you through the basics of Python data visualization:

the anatomy of a plot
pyplot
pylab
and much more

You can show matplotlib figures directly in the notebook by using the %matplotlib notebook and %matplotlib inline magic commands.

%matplotlib notebook provides an interactive environment.

We can use html cell magic to display the image.

#import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4], [10, 20, 25, 30], color='lightblue', linewidth=3)
plt.scatter([0.4, 3.8, 1.2, 2.5], [15, 25, 9, 26], color='darkgreen', marker='o')
plt.xlim(0.5, 4.5)
plt.show()

Simple and powerful visualizations can be generated using the Matplotlib Python Library. More than a decade old, it is the most widely-used library for plotting in the Python community. A wide range of graphs from histograms to heat plots to line plots can be plotted using Matplotlib.

Many other libraries are built on top of Matplotlib and are designed to work in conjunction with analysis, since it was the first Python data visualization library. Libraries like pandas and Seaborn are “wrappers” over Matplotlib, allowing access to a number of Matplotlib’s methods with less code.[7]
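
The "wrapper" idea can be seen in a short sketch: plotting through pandas hands back an ordinary Matplotlib Axes, which can then be adjusted with Matplotlib methods. The data and labels are invented for the example, and the Agg backend is chosen only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

series = pd.Series([1, 4, 9, 16], name="squares")

ax = series.plot()  # pandas draws through Matplotlib and returns the Axes
ax.set_title("Drawn by pandas, tweaked with Matplotlib")
ax.set_ylabel("value")

print(type(ax))
```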

4-1 Scatterplots

x = np.array([1,2,3,4,5,6,7,8])
y = x

plt.figure()
plt.scatter(x, y) # similar to plt.plot(x, y, '.'), but the underlying child objects in the axes are not Line2D


x = np.array([1,2,3,4,5,6,7,8])
y = x

# create a list of colors for each point to have
# ['green', 'green', 'green', 'green', 'green', 'green', 'green', 'red']
colors = ['green']*(len(x)-1)
colors.append('red')

plt.figure()

# plot the point with size 100 and chosen colors
plt.scatter(x, y, s=100, c=colors)


plt.figure()
# plot a data series 'Tall students' in red using the first two elements of x and y
plt.scatter(x[:2], y[:2], s=100, c='red', label='Tall students')
# plot a second data series 'Short students' in blue using the remaining elements of x and y
plt.scatter(x[2:], y[2:], s=100, c='blue', label='Short students')


x = np.random.randint(low=1, high=11, size=50)
y = x + np.random.randint(1, 5, size=x.size)
data = np.column_stack((x, y))

fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2,
                               figsize=(8, 4))

ax1.scatter(x=x, y=y, marker='o', c='r', edgecolor='b')
ax1.set_title('Scatter: $x$ versus $y$')
ax1.set_xlabel('$x$')
ax1.set_ylabel('$y$')

ax2.hist(data, bins=np.arange(data.min(), data.max()),
         label=('x', 'y'))
ax2.legend(loc=(0.65, 0.8))
ax2.set_title('Frequencies of $x$ and $y$')
ax2.yaxis.tick_right()


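# Scatter plot of votes against views for your own kernels
#command--> 19
x=yourkernels["TotalVotes"]
y=yourkernels["TotalViews"]
# give the series a label so that plt.legend() has something to show
plt.scatter(x, y, label='your kernels')
plt.xlabel('TotalVotes')
plt.ylabel('TotalViews')
plt.legend()
plt.show()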

4-2 Line Plots

linear_data = np.array([1,2,3,4,5,6,7,8])
exponential_data = linear_data**2

plt.figure()
# plot the linear data and the squared data (stored in the variable exponential_data)
plt.plot(linear_data, '-o', exponential_data, '-o')


# plot another series with a dashed red line
plt.plot([22,44,55], '--r')


4-3 Bar Charts

plt.figure()
xvals = range(len(linear_data))
plt.bar(xvals, linear_data, width = 0.3)


new_xvals = []

# plot another set of bars, adjusting the new xvals to make up for the first set of bars plotted
for item in xvals:
    new_xvals.append(item+0.3)

plt.bar(new_xvals, exponential_data, width = 0.3 ,color='red')


linear_err = [randint(0,15) for x in range(len(linear_data))] 

# This will plot a new set of bars with errorbars using the list of random error values
plt.bar(xvals, linear_data, width = 0.3, yerr=linear_err)


# stacked bar charts are also possible
plt.figure()
xvals = range(len(linear_data))
plt.bar(xvals, linear_data, width = 0.3, color='b')
plt.bar(xvals, exponential_data, width = 0.3, bottom=linear_data, color='r')


# or use barh for horizontal bar charts
plt.figure()
xvals = range(len(linear_data))
plt.barh(xvals, linear_data, height = 0.3, color='b')
plt.barh(xvals, exponential_data, height = 0.3, left=linear_data, color='r')


# Initialize the plot
fig = plt.figure(figsize=(20,10))
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)

# or replace the three lines of code above by the following line: 
#fig, (ax1, ax2) = plt.subplots(1,2, figsize=(20,10))

# Plot the data
ax1.bar([1,2,3],[3,4,5])
ax2.barh([0.5,1,2.5],[0,1,2])

# Show the plot
plt.show()


plt.figure()
# subplot with 1 row, 2 columns, and current axis is 1st subplot axes
plt.subplot(1, 2, 1)

linear_data = np.array([1,2,3,4,5,6,7,8])

plt.plot(linear_data, '-o')


exponential_data = linear_data**2 

# subplot with 1 row, 2 columns, and current axis is 2nd subplot axes
plt.subplot(1, 2, 2)
plt.plot(exponential_data, '-o')


# plot exponential data on 1st subplot axes
plt.subplot(1, 2, 1)
plt.plot(exponential_data, '-x')


plt.figure()
ax1 = plt.subplot(1, 2, 1)
plt.plot(linear_data, '-o')
# pass sharey=ax1 to ensure the two subplots share the same y axis
ax2 = plt.subplot(1, 2, 2, sharey=ax1)
plt.plot(exponential_data, '-x')


4-4 Histograms

# create 2x2 grid of axis subplots
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, sharex=True)
axs = [ax1,ax2,ax3,ax4]

# draw n = 10, 100, 1000, and 10000 samples from the normal distribution and plot corresponding histograms
for n in range(0,len(axs)):
    sample_size = 10**(n+1)
    sample = np.random.normal(loc=0.0, scale=1.0, size=sample_size)
    axs[n].hist(sample)
    axs[n].set_title('n={}'.format(sample_size))


# repeat with number of bins set to 100
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, sharex=True)
axs = [ax1,ax2,ax3,ax4]

for n in range(0,len(axs)):
    sample_size = 10**(n+1)
    sample = np.random.normal(loc=0.0, scale=1.0, size=sample_size)
    axs[n].hist(sample, bins=100)
    axs[n].set_title('n={}'.format(sample_size))


plt.figure()
Y = np.random.normal(loc=0.0, scale=1.0, size=10000)
X = np.random.random(size=10000)
plt.scatter(X,Y)


It looks like perhaps two of the input variables have a Gaussian distribution. This is useful to note as we can use algorithms that can exploit this assumption.

yourkernels["TotalViews"].hist();


yourkernels["TotalComments"].hist();


# factorplot was renamed catplot in seaborn 0.9; kind='point' keeps the old default chart
sns.catplot(x='TotalViews', y='TotalVotes', data=yourkernels, kind='point')
plt.show()

4-5 Box and Whisker Plots

In descriptive statistics, a box plot or boxplot is a method for graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-and-whisker diagram.[wikipedia]
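
To make the quartile and whisker vocabulary concrete, the following sketch computes the numbers a box plot would draw for a small invented sample. It assumes the common 1.5 × IQR whisker convention; other conventions exist.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# The box spans the first to third quartile, with a line at the median
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Fences sit 1.5 * IQR beyond the quartiles; whiskers end at the most
# extreme data points that still fall inside the fences
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything outside the fences is drawn as an individual outlier point
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, median, q3, whisker_low, whisker_high, outliers)
```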

normal_sample = np.random.normal(loc=0.0, scale=1.0, size=10000)
random_sample = np.random.random(size=10000)
gamma_sample = np.random.gamma(2, size=10000)

df = pd.DataFrame({'normal': normal_sample, 
                   'random': random_sample, 
                   'gamma': gamma_sample})

plt.figure()
# create a boxplot of the normal data, assign the output to a variable to suppress output
# whis=(0, 100) extends the whiskers to the min and max (older Matplotlib spelled this whis='range')
_ = plt.boxplot(df['normal'], whis=(0, 100))

# clear the current figure
plt.clf()
# plot boxplots for all three of df's columns
_ = plt.boxplot([ df['normal'], df['random'], df['gamma'] ], whis=(0, 100))

plt.figure()
_ = plt.hist(df['gamma'], bins=100)


plt.figure()
plt.boxplot([ df['normal'], df['random'], df['gamma'] ], whis=(0, 100))
# overlay axis on top of another 
ax2 = mpl_il.inset_axes(plt.gca(), width='60%', height='40%', loc=2)
ax2.hist(df['gamma'], bins=100)
ax2.margins(x=0.5)


# switch the y axis ticks for ax2 to the right side
ax2.yaxis.tick_right()

# if `whis` argument isn't passed, boxplot defaults to showing 1.5*interquartile (IQR) whiskers with outliers
plt.figure()
_ = plt.boxplot([ df['normal'], df['random'], df['gamma'] ] )


# factorplot was renamed catplot in seaborn 0.9; kind='point' keeps the old default chart
sns.catplot(x='TotalComments', y='TotalVotes', data=yourkernels, kind='point')
plt.show()

4-6 Heatmaps

plt.figure()

Y = np.random.normal(loc=0.0, scale=1.0, size=10000)
X = np.random.random(size=10000)
_ = plt.hist2d(X, Y, bins=25)


plt.figure()
_ = plt.hist2d(X, Y, bins=100)
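
The colors in these heatmaps are counts of points per two-dimensional bin. As a sketch of the same binning idea without any drawing, np.histogram2d returns the grid of counts together with the bin edges; the random generator and seed here are choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random(size=10000)
Y = rng.normal(loc=0.0, scale=1.0, size=10000)

# counts[i, j] is the number of points falling in x-bin i and y-bin j
counts, x_edges, y_edges = np.histogram2d(X, Y, bins=25)

print(counts.shape, int(counts.sum()))
```

More bins give a finer grid but fewer points per cell, which is why the bins=100 version looks noisier than the bins=25 version.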


4-7 Animations

n = 100
x = np.random.randn(n)

# create the function that will do the plotting, where curr is the current frame
def update(curr):
    # check if animation is at the last frame, and if so, stop the animation a
    if curr == n: 
        a.event_source.stop()
    plt.cla()
    bins = np.arange(-4, 4, 0.5)
    plt.hist(x[:curr], bins=bins)
    plt.axis([-4,4,0,30])
    plt.gca().set_title('Sampling the Normal Distribution')
    plt.gca().set_ylabel('Frequency')
    plt.gca().set_xlabel('Value')
    plt.annotate('n = {}'.format(curr), [3,27])

fig = plt.figure()
a = animation.FuncAnimation(fig, update, interval=100)


4-8 Interactivity

plt.figure()
data = np.random.rand(10)
plt.plot(data)

def onclick(event):
    plt.cla()
    plt.plot(data)
    plt.gca().set_title('Event at pixels {},{} \nand data {},{}'.format(event.x, event.y, event.xdata, event.ydata))

# tell mpl_connect we want to pass a 'button_press_event' into onclick when the event is detected
plt.gcf().canvas.mpl_connect('button_press_event', onclick)


from random import shuffle
origins = ['China', 'Brazil', 'India', 'USA', 'Canada', 'UK', 'Germany', 'Iraq', 'Chile', 'Mexico']

shuffle(origins)

df = pd.DataFrame({'height': np.random.rand(10),
                   'weight': np.random.rand(10),
                   'origin': origins})
df


plt.figure()
# picker=5 means the mouse doesn't have to click directly on an event, but can be up to 5 pixels away
plt.scatter(df['height'], df['weight'], picker=5)
plt.gca().set_ylabel('Weight')
plt.gca().set_xlabel('Height')


def onpick(event):
    origin = df.iloc[event.ind[0]]['origin']
    plt.gca().set_title('Selected item came from {}'.format(origin))

# tell mpl_connect we want to pass a 'pick_event' into onpick when the event is detected
plt.gcf().canvas.mpl_connect('pick_event', onpick)


# use the 'seaborn-colorblind' style (named 'seaborn-v0_8-colorblind' in Matplotlib 3.6 and later)
plt.style.use('seaborn-colorblind')

4-9 DataFrame.plot

np.random.seed(123)

df = pd.DataFrame({'A': np.random.randn(365).cumsum(0), 
                   'B': np.random.randn(365).cumsum(0) + 20,
                   'C': np.random.randn(365).cumsum(0) - 20}, 
                  index=pd.date_range('1/1/2017', periods=365))
df.head()


You can also choose the plot kind by using the DataFrame.plot.<kind> methods (for example df.plot.hist()) instead of providing the kind keyword argument.

kind :

‘line’ : line plot (default)
‘bar’ : vertical bar plot
‘barh’ : horizontal bar plot
‘hist’ : histogram
‘box’ : boxplot
‘kde’ : Kernel Density Estimation plot
‘density’ : same as ‘kde’
‘area’ : area plot
‘pie’ : pie plot
‘scatter’ : scatter plot
‘hexbin’ : hexbin plot
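
As a minimal sketch of the two equivalent spellings, the snippet below draws the same histogram once with the kind keyword and once with the accessor method. The toy data is random and the Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
toy = pd.DataFrame({"A": rng.normal(size=100), "B": rng.normal(size=100)})

ax_keyword = toy.plot(kind="hist", alpha=0.7)  # kind passed as a keyword
ax_method = toy.plot.hist(alpha=0.7)           # same chart through the accessor method

# Both calls draw the same number of bars
print(len(ax_keyword.patches), len(ax_method.patches))
```

The method form has the practical advantage that editors can autocomplete it and show the arguments specific to that chart type.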

# create a scatter plot of columns 'A' and 'C', with changing color (c) and size (s) based on column 'B'
df.plot.scatter('A', 'C', c='B', s=df['B'], colormap='viridis')


ax = df.plot.scatter('A', 'C', c='B', s=df['B'], colormap='viridis')
ax.set_aspect('equal')


df.plot.box();


df.plot.hist(alpha=0.7);

Kernel density estimation plots are useful for deriving a smooth continuous function from a given sample.

df.plot.kde();
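
To show what "deriving a smooth continuous function from a sample" means, here is a from-scratch sketch of a Gaussian kernel density estimate using only NumPy. The function name gaussian_kde_1d and the fixed bandwidth of 0.4 are choices made for this illustration; pandas picks its bandwidth differently, so the curves will not match exactly.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample, evaluated on grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
samples = rng.normal(size=200)
grid = np.linspace(-6, 6, 601)

density = gaussian_kde_1d(samples, grid, bandwidth=0.4)

# A density should integrate to roughly 1 (simple Riemann sum over the grid)
area = float(np.sum(density) * (grid[1] - grid[0]))
print(round(area, 3))
```

A smaller bandwidth follows the individual samples more closely, while a larger one produces a smoother but flatter curve.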


5- Seaborn

Seaborn is an open source, BSD-licensed Python library providing a high-level API for visualizing data in Python. [9] tutorialspoint

5-1 Seaborn Vs Matplotlib

In summary, if Matplotlib “tries to make easy things easy and hard things possible”, Seaborn tries to make a well-defined set of hard things easy too. [seaborn_introduction]

Seaborn helps resolve two major problems faced by Matplotlib users:

Default Matplotlib parameters
Working with data frames

As Seaborn complements and extends Matplotlib, the learning curve is quite gradual. If you know Matplotlib, you are already halfway through Seaborn.

Important Features of Seaborn

Seaborn is built on top of Python’s core visualization library, Matplotlib. It is meant to serve as a complement, not a replacement. However, Seaborn comes with some very important features:

Built-in themes for styling Matplotlib graphics
Visualizing univariate and bivariate data
Fitting and visualizing linear regression models
Plotting statistical time series data
Working well with NumPy and pandas data structures

In most cases, you will still use Matplotlib for simple plotting. The knowledge of Matplotlib is recommended to tweak Seaborn’s default plots.[9]

def sinplot(flip = 1):
   x = np.linspace(0, 14, 100)
   for i in range(1, 5): 
      plt.plot(x, np.sin(x + i * .5) * (7 - i) * flip)
sinplot()
plt.show()


def sinplot(flip = 1):
   x = np.linspace(0, 14, 100)
   for i in range(1, 5):
      plt.plot(x, np.sin(x + i * .5) * (7 - i) * flip)
 
sns.set()
sinplot()
plt.show()


np.random.seed(1234)

v1 = pd.Series(np.random.normal(0,10,1000), name='v1')
v2 = pd.Series(2*v1 + np.random.normal(60,15,1000), name='v2')

plt.figure()
plt.hist(v1, alpha=0.7, bins=np.arange(-50,150,5), label='v1');
plt.hist(v2, alpha=0.7, bins=np.arange(-50,150,5), label='v2');
plt.legend();


plt.figure()
# we can pass keyword arguments for each individual component of the plot
sns.distplot(v2, hist_kws={'color': 'Teal'}, kde_kws={'color': 'Navy'});


# recent seaborn versions require x and y to be passed as keywords
sns.jointplot(x=v1, y=v2, alpha=0.4);

grid = sns.jointplot(x=v1, y=v2, alpha=0.4);
grid.ax_joint.set_aspect('equal')

sns.jointplot(x=v1, y=v2, kind='hex');

# set the seaborn style for all the following plots
sns.set_style('white')

sns.jointplot(x=v1, y=v2, kind='kde', space=0);

# factorplot was renamed catplot in seaborn 0.9; kind='point' keeps the old default chart
sns.catplot(x='TotalComments', y='TotalVotes', data=yourkernels, kind='point')
plt.show()

# violin plots of TotalVotes for each value of TotalViews
#command--> 24
sns.violinplot(data=yourkernels,x="TotalViews", y="TotalVotes")

# violin plots of TotalVotes for each value of TotalComments
sns.violinplot(data=yourkernels,x="TotalComments", y="TotalVotes")

sns.violinplot(data=yourkernels,x="Medal", y="TotalVotes")


sns.violinplot(data=yourkernels,x="Medal", y="TotalComments")


How many NA elements in every column.
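
One common way to answer that question is isnull().sum(), sketched here on a toy frame with invented values. On the real data the same call would presumably be yourkernels.isnull().sum().

```python
import numpy as np
import pandas as pd

# Toy frame standing in for yourkernels; the values are made up
toy = pd.DataFrame({
    "TotalVotes": [5, np.nan, 12],
    "Medal": [np.nan, np.nan, 1.0],
})

# isnull() marks each missing cell as True; sum() counts them per column
na_counts = toy.isnull().sum()
print(na_counts)
```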

5-2 kdeplot

# seaborn's kdeplot plots univariate or bivariate density estimates.
# The figure size can be changed by tweaking height (called size before seaborn 0.9)
#command--> 25
sns.FacetGrid(yourkernels, hue="Medal", height=5).map(sns.kdeplot, "TotalComments").add_legend()
plt.show()

# height replaces the size argument used before seaborn 0.9
sns.FacetGrid(yourkernels, hue="Medal", height=5).map(sns.kdeplot, "TotalVotes").add_legend()
plt.show()

f,ax=plt.subplots(1,3,figsize=(20,8))
sns.distplot(yourkernels[yourkernels['Medal']==1].TotalVotes,ax=ax[0])
ax[0].set_title('TotalVotes in Medal 1')
sns.distplot(yourkernels[yourkernels['Medal']==2].TotalVotes,ax=ax[1])
ax[1].set_title('TotalVotes in Medal 2')
sns.distplot(yourkernels[yourkernels['Medal']==3].TotalVotes,ax=ax[2])
ax[2].set_title('TotalVotes in Medal 3')
plt.show()


5-3 jointplot

# Use seaborn's jointplot to make a hexagonal bin plot
# Set the desired height (called size before seaborn 0.9) and ratio, and choose a color.
#command--> 25
sns.jointplot(x="TotalVotes", y="TotalViews", data=yourkernels, height=10,ratio=10, kind='hex',color='green')
plt.show()

5-4 jointplot with kde

# seaborn's jointplot with kind='kde' shows a bivariate density estimate together with
# univariate kernel density estimates in the same figure (height was called size before seaborn 0.9)
sns.jointplot(x="TotalVotes", y="TotalViews", data=yourkernels, height=6, kind='kde', color='#800000', space=0)

5-5 Heatmap

#command--> 26
plt.figure(figsize=(10,7)) 
sns.heatmap(yourkernels.select_dtypes('number').corr(),annot=True,cmap='cubehelix_r') # draws a heatmap of the correlation matrix of the numeric columns
plt.show()
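
The input to the heatmap is a correlation matrix. As a sketch of what that matrix looks like, the snippet below builds a toy frame (the values are random and the relation between the columns is constructed for the example) and computes the correlations of its numeric columns only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
votes = rng.integers(0, 100, size=50)

toy = pd.DataFrame({
    "TotalVotes": votes,
    "TotalViews": votes * 20 + rng.integers(0, 200, size=50),  # built to correlate with votes
    "Title": ["kernel"] * 50,                                   # non-numeric column
})

# Keep only numeric columns, then compute pairwise Pearson correlations
corr = toy.select_dtypes("number").corr()
print(corr.round(2))
```

The matrix is symmetric with ones on the diagonal, so a heatmap of it always shows a bright diagonal; the informative cells are the off-diagonal ones.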


# factorplot was renamed catplot in seaborn 0.9; kind='point' keeps the old default chart
sns.catplot(x='TotalComments', y='TotalVotes', data=yourkernels, kind='point')
plt.show()

5-6 distplot

sns.distplot(yourkernels['TotalVotes']);


6- Plotly

How to use Plotly offline inside IPython notebooks.

6-1 New to Plotly?

Plotly, also known by its URL, Plot.ly, is a technical computing company headquartered in Montreal, Quebec, that develops online data analytics and visualization tools. Plotly provides online graphing, analytics, and statistics tools for individuals and collaboration, as well as scientific graphing libraries for Python, R, MATLAB, Perl, Julia, Arduino, and REST.

# example for plotly
py.init_notebook_mode(connected=True)
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features.
Y = iris.target
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
trace = go.Scatter(x=X[:, 0],
                   y=X[:, 1],
                   mode='markers',
                   marker=dict(color=np.random.randn(150),
                               size=10,
                               colorscale='Viridis',
                               showscale=False))

layout = go.Layout(title='Training Points',
                   xaxis=dict(title='Sepal length',
                            showgrid=False),
                   yaxis=dict(title='Sepal width',
                            showgrid=False),
                  )
 
fig = go.Figure(data=[trace], layout=layout)

py.iplot(fig)


from sklearn.decomposition import PCA
X_reduced = PCA(n_components=3).fit_transform(iris.data)

trace = go.Scatter3d(x=X_reduced[:, 0], 
                     y=X_reduced[:, 1], 
                     z=X_reduced[:, 2],
                     mode='markers',
                     marker=dict(
                         size=6,
                         color=np.random.randn(150),
                         colorscale='Viridis',   
                         opacity=0.8)
                    )
layout=go.Layout(title='First three PCA directions',
                 scene=dict(
                         xaxis=dict(title='1st eigenvector'),
                         yaxis=dict(title='2nd eigenvector'),
                         zaxis=dict(title='3rd eigenvector'))
                 )
fig = go.Figure(data=[trace], layout=layout)

py.iplot(fig)


6-2 Plotly Offline from Command Line

You can plot your graphs from a Python script run from the command line. On executing the script, it will open a web browser with your Plotly graph drawn. [plot.ly]

plot([go.Scatter(x=[1, 2, 3], y=[3, 1, 6])])

np.random.seed(5)

fig = tools.make_subplots(rows=2, cols=3,
                          print_grid=False,
                          specs=[[{'is_3d': True}, {'is_3d': True}, {'is_3d': True}],
                                 [ {'is_3d': True, 'rowspan':1}, None, None]])
scene = dict(
    camera = dict(
    up=dict(x=0, y=0, z=1),
    center=dict(x=0, y=0, z=0),
    eye=dict(x=2.5, y=0.1, z=0.1)
    ),
    xaxis=dict(
        range=[-1, 4],
        title='Petal width',
        gridcolor='rgb(255, 255, 255)',
        zerolinecolor='rgb(255, 255, 255)',
        showbackground=True,
        backgroundcolor='rgb(230, 230,230)',
        showticklabels=False, ticks=''
    ),
    yaxis=dict(
        range=[4, 8],
        title='Sepal length',
        gridcolor='rgb(255, 255, 255)',
        zerolinecolor='rgb(255, 255, 255)',
        showbackground=True,
        backgroundcolor='rgb(230, 230,230)',
        showticklabels=False, ticks=''
    ),
    zaxis=dict(
        range=[1,8],
        title='Petal length',
        gridcolor='rgb(255, 255, 255)',
        zerolinecolor='rgb(255, 255, 255)',
        showbackground=True,
        backgroundcolor='rgb(230, 230,230)',
        showticklabels=False, ticks=''
    )
)

centers = [[1, 1], [-1, -1], [1, -1]]
iris = datasets.load_iris()
X = iris.data
y = iris.target

estimators = {'k_means_iris_3': KMeans(n_clusters=3),
              'k_means_iris_8': KMeans(n_clusters=8),
              'k_means_iris_bad_init': KMeans(n_clusters=3, n_init=1,
                                              init='random')}
fignum = 1
for name, est in estimators.items():
    est.fit(X)
    labels = est.labels_

    trace = go.Scatter3d(x=X[:, 3], y=X[:, 0], z=X[:, 2],
                         showlegend=False,
                         mode='markers',
                         marker=dict(
                                color=labels.astype(float),  # np.float was removed in NumPy 1.24
                                line=dict(color='black', width=1)
        ))
    fig.append_trace(trace, 1, fignum)
    
    fignum = fignum + 1

y = np.choose(y, [1, 2, 0]).astype(float)  # np.float was removed in NumPy 1.24

trace1 = go.Scatter3d(x=X[:, 3], y=X[:, 0], z=X[:, 2],
                      showlegend=False,
                      mode='markers',
                      marker=dict(
                            color=y,
                            line=dict(color='black', width=1)))
fig.append_trace(trace1, 2, 1)

fig['layout'].update(height=900, width=900,
                     margin=dict(l=10,r=10))


py.iplot(fig)


7- Bokeh

Bokeh is a large library that exposes many capabilities, so this section is only a quick tour of some common Bokeh use cases and workflows. For more detailed information please consult the full User Guide.[11] pydata

Let’s begin with some examples. Plotting data in basic Python lists as a line plot including zoom, pan, save, and other tools is simple and straightforward:

output_notebook()

# prepare some data
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# create a new plot with a title and axis labels
p = figure(title="simple line example", x_axis_label='x', y_axis_label='y')

# add a line renderer with legend and line thickness
# (legend_label is the current keyword; releases before Bokeh 1.4 used legend=)
p.line(x, y, legend_label="Temp.", line_width=2)

# show the results
show(p)

If you run the same code as a standalone script that calls output_file("lines.html") instead of output_notebook(), a new output file “lines.html” is created and a browser tab opens automatically to display it. (For presentation purposes the plot output is included inline in this document.) (Source: bokeh)

The basic steps to creating plots with the bokeh.plotting interface are:

1. Prepare some data. The examples here use plain Python lists, but NumPy arrays or Pandas series work as well.
2. Tell Bokeh where to generate output. Use output_file() with a filename such as “lines.html” for a standalone page, or output_notebook(), as in the example above, for Jupyter notebooks.
3. Call figure(). This creates a plot with typical default options and easy customization of title, tools, and axis labels.
4. Add renderers. In this case we use line() for our data, specifying visual customizations like colors, legends and widths.
5. Ask Bokeh to show() or save() the results. These functions save the plot to an HTML file and optionally display it in a browser.

Steps three and four can be repeated to create more than one plot, as shown in some of the examples below.
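The five steps above can be sketched as a standalone script. This is a minimal sketch, not the original kernel's code: it assumes a Bokeh release that accepts the legend_label keyword (1.4 or later), and the temporary directory and the variable names html_path and saved_path are illustrative choices. It calls save() rather than show() so that the HTML file is written without opening a browser.

```python
import os
import tempfile

from bokeh.plotting import figure, output_file, save

# Step 1: prepare some data (plain Python lists here).
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# Step 2: tell Bokeh where to write the output.
# The temporary directory is an assumption made only to keep the sketch tidy.
html_path = os.path.join(tempfile.mkdtemp(), "lines.html")
output_file(html_path, title="simple line example")

# Step 3: create a figure with a title and axis labels.
p = figure(title="simple line example", x_axis_label="x", y_axis_label="y")

# Step 4: add a line renderer with a legend entry and a line thickness.
p.line(x, y, legend_label="Temp.", line_width=2)

# Step 5: write the HTML file; show(p) would write it and also open a browser tab.
saved_path = save(p)
print(saved_path)
```

Replacing output_file() with output_notebook() and save() with show() gives the inline notebook variant used elsewhere in this section.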

The bokeh.plotting interface is also quite handy if we need to customize the output a bit more by adding more data series, glyphs, logarithmic axis, and so on. It’s also possible to easily combine multiple glyphs together on one plot as shown below:

from bokeh.plotting import figure, output_file, show

# prepare some data
x = [0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y0 = [i**2 for i in x]
y1 = [10**i for i in x]
y2 = [10**(i**2) for i in x]



# create a new plot
p = figure(
   tools="pan,box_zoom,reset,save",
   y_axis_type="log", y_range=[0.001, 10**11], title="log axis example",
   x_axis_label='sections', y_axis_label='particles'
)

# add some renderers
# legend_label is the current keyword; releases before Bokeh 1.4 used legend=
p.line(x, x, legend_label="y=x")
p.circle(x, x, legend_label="y=x", fill_color="white", size=8)
p.line(x, y0, legend_label="y=x^2", line_width=3)
p.line(x, y1, legend_label="y=10^x", line_color="red")
p.circle(x, y1, legend_label="y=10^x", fill_color="red", line_color="red", size=6)
p.line(x, y2, legend_label="y=10^x^2", line_color="orange", line_dash="4 4")

# show the results
show(p)
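To see why this example sets y_axis_type="log", it helps to look at the range of values being plotted. The following plain-Python check is a small sketch added for illustration (the names series, lowest, highest, decades and all_inside are hypothetical, not part of the original kernel). It recomputes the four data series and reports how many orders of magnitude they span, and whether every value falls inside the y_range passed to figure().

```python
import math

# Same x values and series as in the log axis example.
x = [0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
series = {
    "y=x": x,
    "y=x^2": [i**2 for i in x],
    "y=10^x": [10**i for i in x],
    "y=10^x^2": [10**(i**2) for i in x],
}

# The y_range given to figure() in the example.
y_range = (0.001, 10**11)

# Smallest and largest values across all series.
lowest = min(min(values) for values in series.values())
highest = max(max(values) for values in series.values())

# Number of powers of ten between the smallest and largest value.
decades = math.log10(highest / lowest)

# True when every plotted value lies inside the chosen y_range.
all_inside = all(
    y_range[0] <= value <= y_range[1]
    for values in series.values()
    for value in values
)

print(f"smallest value: {lowest:g}, largest value: {highest:g}")
print(f"span: about {decades:.0f} orders of magnitude")
print(f"all values inside y_range: {all_inside}")
```

On a linear axis a span this wide would flatten the smaller series against the bottom of the plot, which is the reason for choosing a logarithmic axis here.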

# bokeh basics
# Create a blank figure with labels
# (width/height replace plot_width/plot_height, which Bokeh 3.0 removed)
p = figure(width = 600, height = 600, 
           title = 'Example Glyphs',
           x_axis_label = 'X', y_axis_label = 'Y')

# Example data
squares_x = [1, 3, 4, 5, 8]
squares_y = [8, 7, 3, 1, 10]
circles_x = [9, 12, 4, 3, 15]
circles_y = [8, 4, 11, 6, 10]

# Add squares glyph
p.square(squares_x, squares_y, size = 12, color = 'navy', alpha = 0.6)
# Add circle glyph
p.circle(circles_x, circles_y, size = 12, color = 'red')

# Set to output the plot in the notebook
output_notebook()
# Show the plot
show(p)

8- NetworkX

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. (Source: geeksforgeeks)

import sys
import matplotlib.pyplot as plt
import networkx as nx
G = nx.grid_2d_graph(5, 5)  # 5x5 grid

# print the adjacency list
for line in nx.generate_adjlist(G):
    print(line)
# write edgelist to grid.edgelist
nx.write_edgelist(G, path="grid.edgelist", delimiter=":")
# read edgelist from grid.edgelist
H = nx.read_edgelist(path="grid.edgelist", delimiter=":")
nx.draw(H)
plt.show()
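One detail of the round trip above is easy to miss: the grid graph's nodes are (row, column) tuples, but after reading the edge list back they are plain strings such as "(0, 0)". The sketch below illustrates this in memory with nx.generate_edgelist and nx.parse_edgelist instead of a file. Using ast.literal_eval as the nodetype is an assumption of this sketch, chosen as one way to turn the labels back into tuples, and the names H_str and H_tuple are hypothetical.

```python
import ast

import networkx as nx

# 5x5 grid; each node is a (row, column) tuple.
G = nx.grid_2d_graph(5, 5)

# Serialise to edge-list lines in memory rather than to a file.
lines = list(nx.generate_edgelist(G, delimiter=":"))

# Default parsing keeps node labels as strings such as "(0, 0)".
H_str = nx.parse_edgelist(lines, delimiter=":")

# Passing nodetype converts each label back into a tuple.
H_tuple = nx.parse_edgelist(lines, delimiter=":", nodetype=ast.literal_eval)

print(G.number_of_nodes(), G.number_of_edges())
print(type(next(iter(H_str.nodes))).__name__)
print(set(H_tuple.nodes) == set(G.nodes))
```

The drawing produced by nx.draw(H) is the same either way, but code that indexes nodes by tuple only works on the converted graph.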

from ipywidgets import interact
%matplotlib inline
import matplotlib.pyplot as plt
import networkx as nx
# wrap a few graph generation functions so they have the same signature
def random_lobster(n, m, k, p):
    return nx.random_lobster(n, p, p / m)

def powerlaw_cluster(n, m, k, p):
    return nx.powerlaw_cluster_graph(n, m, p)

def erdos_renyi(n, m, k, p):
    return nx.erdos_renyi_graph(n, p)

def newman_watts_strogatz(n, m, k, p):
    return nx.newman_watts_strogatz_graph(n, k, p)

def plot_random_graph(n, m, k, p, generator):
    g = generator(n, m, k, p)
    nx.draw(g)
    plt.show()
    
interact(plot_random_graph, n=(2,30), m=(1,10), k=(1,10), p=(0.0, 1.0, 0.001),
         generator={
             'lobster': random_lobster,
             'power law': powerlaw_cluster,
             'Newman-Watts-Strogatz': newman_watts_strogatz,
             u'Erdős-Rényi': erdos_renyi,
         });

9- Read more

You can keep learning and reviewing your ML knowledge by practising on a good dataset and memorizing the workflow for your journey in the data science world. To read more, here are some courses, e-books and cheat sheets:

9-1 Courses

There are a lot of online courses that can help you develop your knowledge. Here I have listed just some of them:

Machine Learning Certification by Stanford University (Coursera)

Machine Learning A-Z™: Hands-On Python & R In Data Science (Udemy)

Deep Learning Certification by Andrew Ng from deeplearning.ai (Coursera)

Python for Data Science and Machine Learning Bootcamp (Udemy)

Mathematics for Machine Learning by Imperial College London

Deep Learning A-Z™: Hands-On Artificial Neural Networks

Complete Guide to TensorFlow for Deep Learning Tutorial with Python

Data Science and Machine Learning Tutorial with Python – Hands On

Machine Learning Certification by University of Washington

Data Science and Machine Learning Bootcamp with R

Creative Applications of Deep Learning with TensorFlow
Neural Networks for Machine Learning
Practical Deep Learning For Coders, Part 1
Machine Learning

9-2 Ebooks

If you love reading, here are 10 free machine learning books:

Probability and Statistics for Programmers
Bayesian Reasoning and Machine Learning
An Introduction to Statistical Learning
Understanding Machine Learning
A Programmer’s Guide to Data Mining
Mining of Massive Datasets
A Brief Introduction to Neural Networks
Deep Learning
Natural Language Processing with Python
Machine Learning Yearning

10- Conclusion

Some of the other popular data visualisation libraries in Python are:

Bokeh
Geoplotlib
Gleam
Missingno
Dash
Leather

Python gives a lot of options to visualise data, so it is important to identify the method best suited to your needs, from basic plotting to sophisticated statistical charts. The choice may also depend on features such as the ability to generate vector and interactive files and on the flexibility these tools offer.

This kernel is not complete yet; more to follow!

11- References

1. Coursera
2. GitHub
3. analyticsindiamag
4. primeoncology
5. 10 Useful Python Data Visualization Libraries for Any Discipline
6. PythonDataScienceHandbook
7. Python Data Science Handbook by Jake VanderPlas
8. datacamp
9. tutorialspoint
10. towardsdatascience
11. pydata
12. plot.ly
