How and why I used Plotly (instead of D3) to visualize my Lollapalooza data

cumian9828

Original article: https://www.freecodecamp.org/news/how-and-why-i-used-plotly-instead-of-d3-to-visualize-my-lollapalooza-data-d48345e2ca68/

by Déborah Mesquita

D3.js is an awesome JavaScript library, but it has a very steep learning curve, so building a valuable visualization with it can take a lot of effort. That extra effort is fine if your goal is to create new and creative data visualizations, but often that is not the case.

Often, your goal might just be to build an interactive visualization with some well-known charts. And if you're not a front-end engineer, even that can become a little tricky.

As data scientists, one of our main tasks is data manipulation, and today the main tool I use for that is Pandas (Python). What if I told you that you can build beautiful, interactive charts for the web right from your Pandas dataframes? Well, you can! We can use Plotly for that.

For the record, Plotly also has API libraries for MATLAB, R, and JavaScript, but we'll stick with the Python library here.

To be fair, Plotly is built on top of d3.js (and stack.gl). The main difference between D3 and Plotly is that Plotly is specifically a charting library.

Let's build a bar chart to get to know how Plotly works.

Building a bar chart with Plotly

There are 3 main concepts in Plotly’s philosophy:

Data
Layout
Figure

Data

The Data object defines what we want to display in the chart (that is, the data). A trace is a collection of data together with the specification of how to display it, and a Data object can hold many traces. Think of a line chart with two lines representing two different categories: each line is a trace.


Layout

The Layout object defines features that are not related to data (like title, axis titles, and so on). We can also use the Layout to add annotations and shapes to the chart.


Figure

The Figure object creates the final object to be plotted. It's an object that contains both data and layout.

Plotly visualizations are built with plotly.js, which means the Python API is just a package for interacting with the plotly.js library. The plotly.graph_objs module contains the classes (go.Bar, go.Layout, go.Figure and so on) that generate the graph objects for us.


OK, now we are ready to build a bar chart:

import plotly.graph_objs as go
import pandas as pd
import plotly.offline as offline

df = pd.read_csv("data.csv")

# Total spent per place (rows) and per date (columns); 0 where nothing was bought
df_purchases_by_type = df.pivot_table(
    index="place",
    columns="date",
    values="price",
    aggfunc="sum"
).fillna(0)

# One bar trace: the spending at MICROBAR on each date
trace_microbar = go.Bar(
    x=df_purchases_by_type.columns,
    y=df_purchases_by_type.loc["MICROBAR"]
)

data = [trace_microbar]

layout = go.Layout(title="Purchases by place", showlegend=True)

figure = go.Figure(data=data, layout=layout)

# Writes an HTML file and opens it in the browser
offline.plot(figure)

Note: in this article we won't talk about what I'm doing with the dataframes. But if you would like a post about that, let me know in the comments.

Okay, so first we want to show the bars for one category (a place called "MICROBAR"). We create a trace with go.Bar(), specifying the data for the x and y axes, and put it in a data list. A trace behaves like a dictionary, and data is a list of these traces. Here are the contents of trace_microbar (notice the type key):

{'type': 'bar',
 'x': Index(['23/03/2018', '24/03/2018', '25/03/2018'], dtype='object', name='date'),
 'y': date
 23/03/2018     0.0
 24/03/2018     0.0
 25/03/2018    56.0
 Name: MICROBAR, dtype: float64}

In the Layout object, we set the title of the chart and the showlegend parameter. Then we wrap Data and Layout in a figure and call plotly.offline.plot() to display the chart. Plotly has different options for displaying the charts, but let’s stick with the offline option here. This will open a browser window with our chart.


I want to display everything in a stacked bar chart, so we’ll create a data list with all the traces (places) we want to display and set the barmode parameter to stack.

import plotly.graph_objs as go
import pandas as pd
import plotly.offline as offline

df = pd.read_csv("data.csv")

df_purchases_by_place = df.pivot_table(index="place", columns="date", values="price", aggfunc="sum").fillna(0)

# One bar trace per place (one row of the pivot table each)
data = []
for index, place in df_purchases_by_place.iterrows():
    trace = go.Bar(
        x=df_purchases_by_place.columns,
        y=place,
        name=index
    )
    data.append(trace)

# barmode="stack" draws the traces on top of each other
layout = go.Layout(
    title="Purchases by place",
    showlegend=True,
    barmode="stack"
)

figure = go.Figure(data=data, layout=layout)

offline.plot(figure)

And that’s the basics of Plotly. To customize our charts, we set different parameters for traces and the layout. Now let’s go ahead and talk about the Lollapalooza visualization.


My Lollapalooza experience

For the 2018 edition of Lollapalooza Brazil, all purchases were made through an RFID-enabled wristband. They send the data to your email address, so I decided to take a look at it. What can we learn about me and my experience by analyzing the purchases I made at the festival?

This is what the data looks like:

purchase date
purchase hour
product
quantity
stage
place where I made the purchase

Based on this data, let’s answer some questions.

Where did I go during the festival?

The data only tells us the name of the place where I made each purchase. The festival took place at Autódromo de Interlagos, so I took the map with the stages from here and used the georeferencer tool from georeference.com to get the latitude and longitude coordinates of the stages.

We need to display a map and the markers for each purchase, so we will use Mapbox and the scattermapbox trace. First let’s plot only the stages to see how this works:

import plotly.graph_objs as go
import plotly.offline as offline
import pandas as pd

mapbox_token = ""  # https://www.mapbox.com/help/define-access-token/

df = pd.read_csv("stages.csv")

# One marker per stage, labelled with the stage name.
# marker is a plain dict (go.Marker no longer exists in current plotly),
# and textposition needs a full value such as "top center".
trace = go.Scattermapbox(
    lat=df["latitude"],
    lon=df["longitude"],
    text=df["stage"],
    marker=dict(size=10),
    mode="markers+text",
    textposition="top center"
)

data = [trace]

# Center the map on Autódromo de Interlagos
layout = go.Layout(
    mapbox=dict(
        accesstoken=mapbox_token,
        center=dict(
            lat=-23.701057,
            lon=-46.6970635
        ),
        zoom=14.5
    )
)

figure = go.Figure(data=data, layout=layout)

offline.plot(figure)

Let’s learn a new Layout parameter: updatemenus. We will use this to display the markers by date. There are four possible update methods:

"restyle": modify data or data attributes
"relayout": modify layout attributes
"update": modify data and layout attributes
"animate": start or pause an animation

To update the markers we only need to modify the data, so we will use the "restyle" method. When restyling, you can set the changes for each trace individually or for all traces at once. Here every trace starts hidden, and choosing a date in the dropdown menu makes only that date's trace visible:

import plotly.graph_objs as go
import plotly.offline as offline
import pandas as pd
import numpy as np

mapbox_token = ""

df = pd.read_csv("data.csv")

# One row per (location, date), with the products and hours joined into strings
df_markers = df.groupby(["latitude", "longitude", "date"]).agg(dict(
    product=lambda x: ", ".join(x),
    hour=lambda x: ", ".join(x)))
df_markers.reset_index(inplace=True)

data = []
update_buttons = []

dates = np.unique(df_markers["date"])

for i, date in enumerate(dates):
    # One trace per date, hidden until its dropdown option is chosen
    df_markers_date = df_markers[df_markers["date"] == date]
    trace = go.Scattermapbox(
        lat=df_markers_date["latitude"],
        lon=df_markers_date["longitude"],
        name=date,
        text=df_markers_date["product"] + "<br>" + df_markers_date["hour"],
        visible=False)
    data.append(trace)

    # Inside the loop: this button shows only the i-th trace
    visible_traces = np.full(len(dates), False)
    visible_traces[i] = True

    button = dict(
        label=date,
        method="restyle",
        args=[dict(visible=visible_traces.tolist())])
    update_buttons.append(button)

updatemenus = [dict(active=-1, buttons=update_buttons)]

layout = go.Layout(
    mapbox=dict(
        accesstoken=mapbox_token,
        center=dict(lat=-23.701057, lon=-46.6970635),
        zoom=14.5),
    updatemenus=updatemenus)

figure = go.Figure(data=data, layout=layout)

offline.plot(figure)

我如何花钱？ (How did I spend my money?)

To answer that, I created a bar chart with my spending on food and beverages for each day and built a heatmap to show when I bought things. We already saw how to build a bar chart, so now let's build a heatmap:

import plotly.graph_objs as go
import pandas as pd
import plotly.offline as offline

df = pd.read_csv("data.csv")

# Parse "HH:MM" strings into integer hours; rows whose hour can't be parsed
# become NaN (instead of crashing int()) and are dropped
hours = pd.to_datetime(df["hour"], format="%H:%M", errors="coerce")
df = df.assign(hour_int=hours.dt.hour).dropna(subset=["hour_int"]).astype({"hour_int": int})

# Rows are days, columns are hours, values are the amount spent in that hour
df_heatmap = df.pivot_table(index="date", values="price", columns="hour_int", aggfunc="sum").fillna(0)

trace_heatmap = go.Heatmap(
    x=df_heatmap.columns,
    y=df_heatmap.index,
    z=df_heatmap.values  # one row per day, however many days there are
)

data = [trace_heatmap]

layout = go.Layout(title="Purchases by date and hour")

figure = go.Figure(data=data, layout=layout)

offline.plot(figure)

Which concerts did I watch?

Now let’s go to the coolest part: could I guess the concerts I attended based only on my purchases?

Ideally, when we are watching a show, we are watching the show (and not buying stuff), so purchases should happen before or after each concert. So, for each purchase, I listed the concerts taking place between one hour before and one hour after the time of the purchase.

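Here is a sketch of one possible reading of this time-window step. A concert counts as a candidate for a purchase if it overlaps the interval from one hour before to one hour after the purchase. The schedule, the concert names and the overlap rule are assumptions for illustration, not the author's exact code.

```python
from datetime import datetime, timedelta

# Hypothetical schedule: (concert, stage, start, end).
concerts = [
    ("Concert A", "Stage 1", datetime(2018, 3, 23, 15, 0), datetime(2018, 3, 23, 16, 0)),
    ("Concert B", "Stage 2", datetime(2018, 3, 23, 17, 30), datetime(2018, 3, 23, 18, 30)),
    ("Concert C", "Stage 1", datetime(2018, 3, 23, 21, 0), datetime(2018, 3, 23, 22, 30)),
]


def candidate_concerts(purchase_time, concerts, window=timedelta(hours=1)):
    """Names of concerts overlapping [purchase_time - window, purchase_time + window]."""
    lo, hi = purchase_time - window, purchase_time + window
    return [name for name, _stage, start, end in concerts if start <= hi and end >= lo]


print(candidate_concerts(datetime(2018, 3, 23, 16, 45), concerts))
# ['Concert A', 'Concert B']
```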

To find out which one of these shows I attended, I calculated the distance from the location of the purchase to each stage. The shows I attended should be the ones with the shortest distance to the concessions.

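The article does not say how the distance was computed. For latitude/longitude pairs, a reasonable assumption is the haversine (great-circle) distance. The sketch below uses hypothetical stage names and coordinates and returns the stage closest to a purchase location.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    r = 6_371_000  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def closest_stage(purchase, stages):
    """Name of the stage nearest to a (lat, lon) purchase location."""
    return min(stages, key=lambda name: haversine_m(*purchase, *stages[name]))


# Hypothetical stage coordinates near the map centre used earlier.
stages = {"Stage 1": (-23.6990, -46.6990), "Stage 2": (-23.7040, -46.6950)}
print(closest_stage((-23.7035, -46.6955), stages))  # Stage 2
```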

Since we want to show each data point, the best choice for the visualization is a table. Let's build one:

import plotly.graph_objs as go
import plotly.offline as offline
import pandas as pd

df_table = pd.read_csv("concerts_I_attended.csv")

def colorFont(x):
    # Correct guesses in (near-)black, the rest in grey
    if x == "Yes":
        return "rgb(0,0,9)"
    else:
        return "rgb(178,178,178)"

df_table["color"] = df_table["correct"].apply(colorFont)

trace_table = go.Table(
    header=dict(
        values=["Concert", "Date", "Correct?"],
        fill=dict(color="rgb(82,187,47)")
    ),
    cells=dict(
        values=[df_table.concert, df_table.date, df_table.correct],
        font=dict(color=[df_table.color])
    )
)

data = [trace_table]

figure = go.Figure(data=data)

offline.plot(figure)

Eight of my guesses were correct, four were incorrect, and three concerts I attended were missing, giving us a precision of 67% and a recall of 72%.

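The arithmetic behind those two numbers is shown below. The counts come from the article itself: 8 correct guesses (stated in the dashboard text), 4 incorrect and 3 missed. Note that 8/11 is 72.7%, so the reported 72% is truncated rather than rounded.

```python
# Counts from the article: 8 correct guesses, 4 incorrect guesses
# (12 guesses in total) and 3 attended concerts that were missed.
true_positives = 8
false_positives = 4
false_negatives = 3

precision = true_positives / (true_positives + false_positives)  # 8 / 12
recall = true_positives / (true_positives + false_negatives)     # 8 / 11

print(f"precision = {precision:.1%}, recall = {recall:.1%}")
# precision = 66.7%, recall = 72.7%
```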

Putting it all together: Dash

We have all the charts, but the goal is to put them all together on a page. To do that we will use Dash (by Plotly).

“Dash is a Python framework for building analytical web applications. No JavaScript required. Dash is ideal for building data visualization apps with highly custom user interfaces in pure Python. It’s particularly suited for anyone who works with data in Python.” — Plotly’s site

Dash is written on top of Flask, Plotly.js, and React.js. It works in a very similar way to the way we create Plotly charts:

# Dash 2+ imports; older versions used the separate
# dash_core_components / dash_html_components packages
from dash import Dash, dcc, html
import plotly.graph_objs as go
import pandas as pd

# external_stylesheets replaces the older app.css.append_css call
app = Dash(__name__, external_stylesheets=["https://codepen.io/chriddyp/pen/bWLwgP.css"])

df_table = pd.read_csv("concerts_I_attended.csv").dropna(subset=["concert"])

def colorFont(x):
    if x == "Yes":
        return "rgb(0,0,9)"
    else:
        return "rgb(178,178,178)"

df_table["color"] = df_table["correct"].apply(colorFont)

trace_table = go.Table(
    header=dict(values=["Concert", "Date", "Correct?"], fill=dict(color="rgb(82,187,47)")),
    cells=dict(values=[df_table.concert, df_table.date, df_table.correct], font=dict(color=[df_table.color]))
)

data_table = [trace_table]

app.layout = html.Div(children=[
    html.Div(
        [
            dcc.Markdown(
                """
                ## My experience at Lollapalooza Brazil 2018
                ***
                """.replace('  ', ''),
                className='eight columns offset-by-two'
            )
        ],
        className='row',
        style=dict(textAlign="center", marginBottom="15px")
    ),
    html.Div([
        html.Div([
            html.H5('Which concerts did I attend?', style=dict(textAlign="center")),
            html.Div('People usually buy things before or after a concert, so I took the list of concerts, got the distances from the location of the purchases to the stages and tried to guess which concerts I attended. 8 concerts were correct and 3 were missing from a total of 12 concerts.', style=dict(textAlign="center")),
            dcc.Graph(id='table', figure=go.Figure(data=data_table, layout=go.Layout(margin=dict(t=30)))),
        ], className="twelve columns"),
    ], className="row")
])

if __name__ == '__main__':
    app.run(debug=True)  # app.run_server(debug=True) in older Dash versions

Cool, right?

I hosted the final visualization here, and all the code is here.

There are some alternatives for hosting the visualizations: Dash offers public app hosting, and Plotly also provides a web service for hosting graphs.

Did you find this article helpful? I try my best to write a deep-dive article each month, and you can receive an email when I publish a new one.

I had a pretty good experience with Plotly, and I'll definitely use it for my next project. What are your thoughts about it after this overview? And what other tools do you use to build visualizations for the web? Share them in the comments! And thank you for reading!
