使用python和javascript进行数据可视化

Any data science or data analytics project can be generally described with the following steps:

通常可以通过以下步骤来描述任何数据科学或数据分析项目:

  1. Acquiring a business understanding & defining the goal of a project

    获得业务理解并定义项目目标
  2. Getting data

    获取数据
  3. Preprocessing and exploring data

    预处理和探索数据
  4. Improving data, e.g., by feature engineering

    改善数据,例如通过特征工程
  5. Visualizing data

    可视化数据
  6. Building a model

    建立模型
  7. Deploying the model

    部署模型
  8. Scoring its performance

    对其表现进行评分

This time, I would like to bring your attention to the data cleaning and exploration phase since it’s a step which value is hard to measure, but the impact it brings is difficult to overestimate. Insights gained during this stage can affect all further work.

这次,我想提请您注意数据清理和探索阶段,因为这是一个难以衡量的步骤,但很难估量其带来的影响。 在此阶段获得的见解会影响所有进一步的工作。

There are multiple ways you can start exploratory data analysis with:

您可以通过多种方式开始探索性数据分析:

  1. Load data and preprocess it: clean it from unnecessary artifacts, deal with missing values. Make your dataset comfortable to work with.

    加载数据并进行预处理:清除不必要的工件,处理缺失值。 使数据集易于使用。
  2. Visualize as much data as possible using different kinds of plots & a pivot table.

    使用不同种类的绘图和数据透视表,可视化尽可能多的数据。

目的 (Purpose)

In this tutorial, I would like to show how to prepare your data with Python and explore it using a JavaScript library for data visualization. To get the most value out of exploration, I recommend using interactive visualizations since they make exploring your data faster and more comfortable.

在本教程中,我想展示如何使用Python 准备数据并使用JavaScript库进行数据可视化探索。 为了从探索中获得最大价值,我建议使用交互式可视化,因为它们可以使您更快,更舒适地浏览数据。

Hence, we will present data in an interactive pivot table and pivot charts.

因此,我们将在交互式数据透视表数据透视图中显示数据。

Hopefully, this approach will help you facilitate the data analysis and visualization process in Jupyter Notebook.

希望这种方法将帮助您促进Jupyter Notebook中的数据分析和可视化过程。

设置环境 (Set up your environment)

Run your Jupyter Notebook and let’s start. If Jupyter is not installed on your machine, choose the way to get it.

运行Jupyter Notebook,开始吧。 如果您的计算机上未安装Jupyter,请选择获取方式

获取数据 (Get your data)

Choosing the data set to work with is the number one step.

选择要使用的数据集是第一步。

If your data is already cleaned and ready to be visualized, jump to the Visualization section.

如果您的数据已被清理并准备可视化,请跳至“ 可视化”部分。

For demonstration purposes, I’ve chosen the data for the prediction of Bike Sharing Demand. It’s provided as data for the Kaggle’s competition.

出于演示目的,我选择了用于预测“ 自行车共享需求”的数据 。 作为Kaggle比赛数据提供。

本教程的导入 (Imports for this tutorial)

Classically, we will use the “pandas” library to read data into a dataframe.

传统上,我们将使用“ pandas”库将数据读入数据框。

Additionally, we will need json and IPython.display modules. The former will help us serialize/deserialize data and the latter — render HTML in the cells.

此外,我们将需要jsonIPython.display模块。 前者将帮助我们对数据进行序列化/反序列化,而后者将在单元格中呈现HTML。

Here’s the full code sample with imports we need:

这是我们需要导入的完整代码示例:

from IPython.display import HTMLimport jsonimport pandas as pd

读取数据 (Read data)

df = pd.read_csv('train.csv')

df = pd.read_csv('train.csv')

清理和预处理数据 (Clean & preprocess data)

Before starting data visualization, it’s a good practice to see what’s going on in the data.

在开始数据可视化之前,最好先查看数据中发生了什么。

df.head()

df.head()

Image for post

df.info()

df.info()

Image for post

First, we should check the percentage of missing values.

首先,我们应该检查缺失值的百分比。

missing_percentage = df.isnull().sum() * 100 / len(df)

missing_percentage = df.isnull().sum() * 100 / len(df)

There are a lot of strategies to follow when dealing with missing data. Let me mention the main ones:

处理丢失的数据时,有许多策略可以遵循。 让我提到主要的:

  1. Dropping missing values. The only reason to follow this approach is when you need to quickly remove all NaNs from the data.

    删除缺失值。 遵循这种方法的唯一原因是当您需要快速从数据中删除所有NaN时。
  2. Replacing NaNs with values. This is called imputation. A common decision is to replace missing values with zeros or with a mean value.

    用值替换NaN。 这称为归因 。 常见的决定是用零或平均值替换缺失值。

Luckily, we don’t have any missing values in the dataset. But if your data has, I suggest you look into a quick guide with the pros and cons of different imputation techniques.

幸运的是,我们在数据集中没有任何缺失值。 但是,如果您有数据,建议您快速了解各种插补技术的优缺点

管理要素数据类型 (Manage features data types)

Let’s convert the type of “datetime”’ column from object to datetime:

让我们将“ datetime”列的类型从对象转换为datetime:

df['datetime'] = pd.to_datetime(df['datetime'])

df['datetime'] = pd.to_datetime(df['datetime'])

Now we are able to engineer new features based on this column, for example:

现在,我们可以根据此专栏设计新功能,例如:

  • a day of the week

    一周中的一天
  • a month

    一个月
  • an hour

    一小时
df['weekday'] = df['datetime'].dt.dayofweekdf['hour'] = df['datetime'].dt.hourdf['month'] = df['datetime'].dt.month

These features can be used further to figure out trends in rent.

这些功能可以进一步用于确定租金趋势。

Next, let’s convert string types to categorical:

接下来,让我们将字符串类型转换为分类类型:

categories = ['season', 'workingday', 'weekday', 'hour', 'month', 'weather', 'holiday']for category in categories:    df[category] = df[category].astype('category')

Read more about when to use the categorical data type here.

在此处阅读有关何时使用分类数据类型的更多信息。

Now, let’s make values of categorical more meaningful by replacing numbers with their categorical equivalents:

现在,通过将数字替换为对应的类别,使分类的值更有意义:

df['season'] = df['season'].replace([1, 2, 3, 4], ['spring', 'summer', 'fall', 'winter'])df['holiday'] = df['holiday'].replace([0, 1],['No', 'Yes'])

By doing so, it will be easier for us to interpret data visualization later on. We won’t need to look up the meaning of a category each time we need it.

这样,以后我们将更容易解释数据可视化。 我们不需要每次都需要查找类别的含义。

使用数据透视表和图表可视化数据 (Visualize data with a pivot table and charts)

Now that you cleaned the data, let’s visualize it.

现在您已经清理了数据,让我们对其可视化。

The data visualization type depends on the question you are asking.

数据可视化类型取决于您要询问的问题。

In this tutorial, we’ll be using:

在本教程中,我们将使用:

  • a pivot table for tabular data visualization

    用于表格数据可视化的数据透视表
  • a bar chart

    条形图

为数据透视表准备数据 (Prepare data for the pivot table)

Before loading data to the pivot table, convert the dataframe to an array of JSON objects. For this, use the to_json() function from the json module.

在将数据加载到数据透视表之前,将数据帧转换为JSON对象数组。 为此,请使用json模块中的to_json()函数。

The records orientation is needed to make sure the data is aligned according to the format the pivot table requires.

需要records方向,以确保数据根据数据透视表所需的格式对齐。

json_data = df.to_json(orient=”records”)

json_data = df.to_json(orient=”records”)

创建数据透视表 (Create a pivot table)

Next, define a pivot table object and feed it with the data. Note that the data has to be deserialized using the loads() function that decodes JSON:

接下来,定义数据透视表对象并向其提供数据。 请注意,必须使用可解码JSON的loads()函数对数据进行反序列化:

pivot_table = {
"container": "#pivot-container",
"componentFolder": "https://cdn.flexmonster.com/",
"toolbar": True,
"report": {
"dataSource": {
"type": "json",
"data": json.loads(json_data)
},
"slice": {
"rows": [{
"uniqueName": "weekday"
}],
"columns": [{
"uniqueName": "[Measures]"
}],
"measures": [{
"uniqueName": "count",
"aggregation": "median"
}],
"sorting": {
"column": {
"type": "desc",
"tuple": [],
"measure": {
"uniqueName": "count",
"aggregation": "median"
}
}
}
}
}
}

In the above pivot table initialization, we specified a simple report that consists of a slice (a set of fields visible on the grid), data source, options, formats, etc. We also specified a container where the pivot table should be rendered. The container will be defined a bit later.

在上述数据透视表初始化中,我们指定了一个简单的报告,该报告由一个切片(网格上可见的一组字段),数据源,选项,格式等组成。我们还指定了一个应在其中呈现数据透视表的容器。 稍后将定义容器。

Plus, here we can add a mapping object to prettify the field captions or set their data types. Using this object eliminates the need in modifying the data source.

另外,在这里我们可以添加一个映射对象来美化字段标题或设置其数据类型。 使用此对象消除了修改数据源的需要。

Next, convert the pivot table object to a JSON-formatted string to be able to pass it for rendering in the HTML layout:

接下来,将数据透视表对象转换为JSON格式的字符串,以便能够将其传递以在HTML布局中呈现:

pivot_json_object = json.dumps(pivot_table)

pivot_json_object = json.dumps(pivot_table)

定义仪表板布局 (Define a dashboard layout)

Define a function that renders the pivot table in the cell:

定义一个在单元格中呈现数据透视表的函数:

In this function, we call HTML() from the IPython.display module — it will render the layout enclosed into a multi-line string.

在此函数中,我们从IPython.display模块调用HTML() - 它会 将布局呈现为多行字符串。

Next, let’s call this function and pass to it the pivot table previously encoded into JSON:

接下来,让我们调用此函数并将之前编码为JSON的数据透视表传递给它:

render_pivot_table(pivot_json_object)

render_pivot_table(pivot_json_object)

Likewise, you can create and render as many data visualization components as you need. For example, interactive pivot charts that visualize aggregated data:

同样,您可以根据需要创建和呈现任意数量的数据可视化组件 。 例如,可视化聚合数据的交互式数据透视图

Image for post

下一步是什么 (What’s next)

Now that you embedded the pivot table into Jupyter, it’s time to start exploring your data:

现在,您已将数据透视表嵌入Jupyter中,是时候开始探索数据了:

  • drag and drop fields to rows, columns, and measures of the pivot table

    将字段拖放到数据透视表的行,列和度量

  • set Excel-like filtering

    设置类似Excel的过滤

  • highlight important values with conditional formatting

    使用条件格式突出显示重要的值

At any moment, you can save your results to a JSON or PDF/Excel/HTML report.

您随时可以将结果保存到JSONPDF / Excel / HTML报告中。

例子 (Examples)

Here is how you can try identifying trends on bikes usage depending on the day of the week:

您可以按照以下方式尝试确定自行车使用情况的趋势,具体取决于星期几:

Image for post

You can also figure out if any weather conditions affect the number of rents by registered and unregistered users:

您还可以确定是否有任何天气情况影响注册和未注册用户的租金数量:

Image for post

To dig deeper into the data, drill through aggregated values by double-clicking and see the raw records they are composed of:

要通过双击深入挖掘数据, 追溯汇总值,看看它们是由原始的记录:

Image for post

Or simply switch to the pivot charts mode and give your data an even more comprehensible look:

或者,只需切换到数据透视图模式,即可使您的数据看起来更清晰:

Image for post

汇集全部 (Bringing it all together)

By completing this tutorial, you learned a new way to interactively explore your multi-dimensional data in Jupyter Notebook using Python and the JavaScript data visualization library. I hope this will make your exploration process more insightful than before.

通过完成本教程,您学习了一种使用Python和JavaScript数据可视化库在Jupyter Notebook中交互式浏览多维数据的新方法。 我希望这将使您的探索过程比以往更有见识。

有用的链接 (Useful links)

翻译自: https://medium.com/python-in-plain-english/data-visualization-with-python-and-javascript-c1c28a7212b2

  • 0
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
Learn how to turn raw data into rich, interactive web visualizations with the powerful combination of Python and JavaScript. With this hands-on guide, author Kyran Dale teaches you how build a basic dataviz toolchain with best-of-breed Python and JavaScript libraries—including Scrapy, Matplotlib, Pandas, Flask, and D3—for crafting engaging, browser-based visualizations. As a working example, throughout the book Dale walks you through transforming Wikipedia’s table-based list of Nobel Prize winners into an interactive visualization. You’ll examine steps along the entire toolchain, from scraping, cleaning, exploring, and delivering data to building the visualization with JavaScript’s D3 library. If you’re ready to create your own web-based data visualizations—and know either Python or JavaScript— this is the book for you. Learn how to manipulate data with Python Understand the commonalities between Python and JavaScript Extract information from websites by using Python’s web-scraping tools, BeautifulSoup and Scrapy Clean and explore data with Python’s Pandas, Matplotlib, and Numpy libraries Serve data and create RESTful web APIs with Python’s Flask framework Create engaging, interactive web visualizations with JavaScript’s D3 library Table of Contents Chapter 1 Development Setup Chapter 2 A Language-Learning Bridge Between Python and JavaScript Chapter 3 Reading and Writing Data with Python Chapter 4 Webdev 101 Chapter 5 Getting Data off the Web with Python Chapter 6 Development Setup Chapter 7 A Language-Learning Bridge Between Python and JavaScript Chapter 8 Reading and Writing Data with Python Chapter 9 Webdev 101 Chapter 10 Getting Data off the Web with Python Chapter 11 Heavyweight Scraping with Scrapy Chapter 12 Introduction to NumPy Chapter 13 Introduction to Pandas Chapter 14 Cleaning Data with Pandas Chapter 15 Visualizing Data with Matplotlib Chapter 16 Exploring Data with Pandas Chapter 17 Delivering the Data Chapter 18 RESTful Data with Flask Chapter 19 Imagining a Nobel Visualization Chapter 20 Building a Visualization Chapter 21 Introducing D3—The Story of a Bar Chart Chapter 22 Visualizing Individual Prizes Chapter 23 Mapping with D3 Chapter 24 Visualizing Individual Winners Chapter 25 The Menu Bar Chapter 26 Conclusion

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值