8 Tips for Better Data Visualization


ggplot (the ggplot2 package) is R's premier data visualization package. Its popularity can likely be attributed to its ease of use: with just a few lines of code you can produce great visualizations. This is especially encouraging for beginners who are just starting out with R, since two lines of code are enough to create something visual:

ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()
Output of the above code

In this article, I want to highlight ggplot's flexibility and customizability. Alternatives such as Matplotlib and Seaborn (both Python) and Excel are also easy to use, but in my experience they are less customizable. I'll walk through 8 concrete steps you can take to improve your ggplots.

In order to make sure that the advice in this article is practical, I’m going to abide by two themes:


  1. Assume that the reader has some familiarity with ggplot: If you understood the chunk of code above you should be good. If you’re not familiar with ggplot, I’ll try to make the tips as language-agnostic as possible, so if you use base R, Python, or other visualization tools these tips may still be helpful.


  2. Easy to follow along: If you want to run the example code yourself, all you need is R and tidyverse. No external datasets necessary as we will be using the diamonds dataset, which is included with ggplot.


Source: https://ggplot2.tidyverse.org/reference/diamonds.html

1. Themes are your best friend

Themes control all non-data display and are an easy way to change the appearance of your graph. It only takes one extra line of code in order to do this, and ggplot already comes with 8 separate themes.


ggplot(data = iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point() +
  theme_bw()

The ggplot themes are simple. They won’t really stand out, but they look great, are easy to read, and get the point across. Also, if you want to use the same theme over and over, you can set a global theme with one line of code and it will apply to all ggplots.


# Set global theme
theme_set(theme_bw())

For full details on the 8 themes, see the complete themes page in the ggplot2 reference documentation.

ggplot themes, compared

Themes are also highly customizable. Beyond the 8 themes that come with ggplot, you can make your own theme or, more importantly, use themes that others have already created. At the few companies I've worked at, we've all had internal ggplot themes. For example, I helped create theme_fb() at Facebook with input from designers at the company.
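As a rough sketch of what "making your own theme" can look like: one common approach is to wrap an existing complete theme in a function and override a few elements with theme(). The name theme_custom() and every setting in it are illustrative assumptions, not an existing theme and not the internal themes mentioned above.

```r
library(ggplot2)

# Hypothetical house theme: the name and all settings are illustrative only
theme_custom <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(face = "bold"),  # bold plot titles
      panel.grid.minor = element_blank(),        # drop minor grid lines
      legend.position = "bottom"                 # legend below the panel
    )
}

# Use it like any built-in theme
p_theme <- ggplot(data = iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point() +
  theme_custom()

p_theme
```

Because theme_custom() returns a regular theme object, it can also be passed to theme_set() to apply it globally.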

If you want to use themes from outside ggplot, the most popular package is ggthemes, which has some interesting options such as theme_fivethirtyeight(), theme_wsj(), and theme_economist(). A sample of these themes is shown below, but I definitely recommend checking out the ggthemes documentation to learn more.

ggthemes, compared

2. Facets are a superpower

When visualizing data, one thing you always want to think about is how many dimensions of data you want to display. A majority of graphs will typically only need 1–2 dimensions of data to get a point across, for example:


  • Height x weight of basketball players on a scatter plot
  • Heights of players on the Los Angeles Lakers on a bar graph

As you increase the number of dimensions, a single graph is going to get more cluttered, which makes it harder to get a point across. For example:


  • Height x weight of basketball players on a scatter plot, but different color dots for each of the 30 teams. This will be hard to read because you’ll need 30 separate colors to represent the different teams, and a legend to list out all of the team names.


This is where the magic of faceting shines. What if we don't have to limit ourselves to one graph? My hypothesis is that many of us default to a single graph because we are used to visualizing data in Excel, where we are constrained to a single graph. In ggplot, we can break this mode of thinking, and all it takes is a single line of code. Facets allow us to easily add up to two additional dimensions to our visualizations.

Let’s explore how we can use facets to visualize diamonds data.


At a basic level, we can view the relationship between the carat of a diamond and its price, which is the main purpose of this dataset:


library(tidyverse)

# Sample 5,000 diamonds to keep the scatter plot readable
ex2 <-
  diamonds %>%
  sample_n(5000) %>%
  ggplot(aes(x = carat, y = price)) +
  geom_point()

ex2

This graph only shows two dimensions of data, but a few others are also important. The cut, color, and clarity of a diamond could all be related to its price. One way to bring these dimensions in is to give the dots different colors or shapes, but let's give faceting a try instead.

Use facet_wrap() if you only want to break out the graph by a single dimension:


ex2 +
  facet_wrap(~cut)

Use facet_grid() if you want to break out the graph by two dimensions:


ex2 +
  facet_grid(color~cut)

These are just two examples of how you can use facet_wrap() and facet_grid(), but the key takeaway from this section is that with ggplot, you are not constrained to thinking about visualizations in a single graph.
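Both facet functions take a few layout arguments that are worth knowing. The sketch below is not from the original examples: the seed, the alpha value, and the object names are arbitrary choices made for illustration. It shows nrow in facet_wrap() and scales in facet_grid().

```r
library(ggplot2)

# A fixed sample keeps the sketch quick to draw; the seed is arbitrary
set.seed(42)
diamonds_sample <- diamonds[sample(nrow(diamonds), 5000), ]

base_plot <- ggplot(diamonds_sample, aes(x = carat, y = price)) +
  geom_point(alpha = 0.3)

# One dimension, with all panels forced onto a single row
by_cut <- base_plot + facet_wrap(~cut, nrow = 1)

# Two dimensions, letting each row of panels use its own y-axis range
by_color_and_cut <- base_plot + facet_grid(color ~ cut, scales = "free_y")

by_cut
by_color_and_cut
```

Free scales make small differences within a panel easier to see, at the cost of making panels harder to compare with each other, so the default fixed scales are often the safer choice.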

3. Colors!

Colors serve two key purposes in data visualization:


  1. Making a visualization more appealing
  2. Representing an additional dimension of data

There are many ways to color your ggplots, but for simplicity this section focuses on Viridis palettes, which are my personal favorite as they are:


  • Colorful: spanning as wide a palette as possible so as to make differences easy to see

  • Perceptually uniform: meaning that values close to each other have similar-appearing colors and values far away from each other have more different-appearing colors, consistently across the range of values

  • Robust to colorblindness: so that the above properties hold true for people with common forms of colorblindness, as well as in grey scale printing

Source: https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html

You can read more about the theory behind these colors in the viridis vignette (the source linked above), but this section focuses on 4 key functions that allow you to use them:

  • scale_color_viridis_d() & scale_fill_viridis_d(): Add one of these to your ggplot to color or fill your graph by a discrete/categorical variable. (Notice the “d” at the end of the function name.)

  • scale_color_viridis_c() & scale_fill_viridis_c(): Add one of these to your ggplot to color or fill your graph by a continuous variable. (Notice the “c” at the end of the function name.)

# Discrete
ggplot(data = diamonds %>% sample_n(1e3),
       aes(x = carat, y = price, color = cut)) +
  geom_point() +
  scale_color_viridis_d()

# Continuous
ggplot(data = diamonds %>% sample_n(1e3),
       aes(x = carat, y = price, color = depth)) +
  geom_point() +
  scale_color_viridis_c(option = 'A')

Protip: Here, I'm using the option parameter to change the color palette within viridis. You can switch between options 'A' through 'E' (magma, inferno, plasma, viridis, and cividis), which correspond to the different color schemes shown above.

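The examples above only use the color variants. As a minimal sketch of the two fill variants, the following assumes a bar chart for the discrete case and a binned heatmap for the continuous case; the choice of geoms and the object names are illustrative assumptions.

```r
library(ggplot2)

# Discrete fill: one bar per cut, filled by the same categorical variable
bars <- ggplot(diamonds, aes(x = cut, fill = cut)) +
  geom_bar() +
  scale_fill_viridis_d()

# Continuous fill: geom_bin2d() fills each tile by the number of diamonds in it
tiles <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_bin2d() +
  scale_fill_viridis_c(option = "D")

bars
tiles
```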

4. Color vs. fill: Know the difference

I introduced this in the last section, but I want to address it more explicitly because it can be confusing when you first use ggplot. To color a ggplot, you'll use either color or fill, depending on the graph type.

So what's the difference? Generally, fill defines the color with which a geom is filled (e.g. the bars of geom_bar()), whereas color defines the color with which a geom is outlined. Geoms with no interior to fill, such as the default points of geom_point() or the lines of geom_line(), are colored with color.

ggplot(data = diamonds, aes(x = price)) +
  geom_histogram(color = 'blue',
                 fill = 'red')

So the takeaway here is that if you try to color a graph and it appears that nothing has changed, simply switch color to fill or vice versa.

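Points are a useful edge case for seeing both aesthetics at once. This sketch is an addition for illustration and uses shape 21, a circle that has both a border and an interior, so color and fill each have a visible effect.

```r
library(ggplot2)

# shape = 21 is a fillable circle: color sets the border, fill sets the interior
both <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point(shape = 21, color = "black", fill = "red", size = 3)

both
```

With the default point shape, fill is ignored and only color applies, which is exactly the "nothing has changed" situation described above.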

Read more on StackOverflow


5. Put a label on it

Good visualizations have concise and descriptive labels. They help readers understand what they are seeing, and this is especially important if you expect your visualization to be shared. Luckily, it's super easy to label in ggplot.

Below are ggplot's most useful labelling options, listed roughly in order of how often they should be used. You can pick and choose which labels you want: for example, if you only want to add a title, you only need to set the title parameter in labs().

ggplot(data = diamonds %>% sample_n(1e3),
       aes(x = carat, y = price, color = cut)) +
  geom_point() +
  labs(title = 'title example',
       x = 'x-axis example',
       y = 'y-axis example',
       color = 'color example',
       subtitle = 'subtitle example',
       caption = 'caption example',
       tag = 'tag example')

Note: The color argument in labs() only has an effect if color is mapped as an aesthetic in your ggplot. The same labelling method works for fill, size, alpha, and other aesthetics.


6. Line annotations

On the theme of telling a story with your visualization, line annotations are a very useful tool. Some examples that I’ve personally used include:


  • Marking a before/after period on a line graph
  • Plotting the mean of an x or y value on a scatter plot
  • Annotating a goal metric that we want to hit

Whatever the use case, having a line annotation helps communicate an important point to those who will be viewing your visualization. To add a line to your ggplot, you’ll use either:


  • geom_hline(): Adds a horizontal line (has a y intercept)

  • geom_vline(): Adds a vertical line (has an x intercept)

The example below will show both of these in action:


ggplot(data = diamonds %>% sample_n(1e3),
       aes(x = carat, y = price, color = cut)) +
  geom_point() +
  geom_hline(data = . %>% summarise(y = mean(price)),
             aes(yintercept = y)) +
  geom_vline(data = . %>% summarise(x = mean(carat)),
             aes(xintercept = x))

Note that the above code may look a little more complicated than the other ggplot code in this article, so I'll try to explain what's going on. The more straightforward way to get the average carat and price is to calculate them before your ggplot code. However, because I am lazy and like reducing the number of variables that I have, I instead pass a function (. %>% summarise(...)) as the data argument of geom_hline() and geom_vline(). ggplot applies that function to the plot's data (diamonds %>% sample_n(1e3)), which works just as well.
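For comparison, the more straightforward version computes the averages first and passes them in as constants. This is a sketch rather than the original code: the seed, the dashed line type, and the variable names are assumptions added for illustration.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # arbitrary seed so the sample is reproducible
diamonds_sample <- diamonds %>% sample_n(1e3)

# Calculate the averages up front
mean_price <- mean(diamonds_sample$price)
mean_carat <- mean(diamonds_sample$carat)

line_plot <- ggplot(diamonds_sample, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  geom_hline(yintercept = mean_price, linetype = "dashed") +
  geom_vline(xintercept = mean_carat, linetype = "dashed")

line_plot
```

This costs two extra variables, but the intercepts are then available for reuse, for example in a caption or a text label.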


7. Text annotations

In addition to lines, it is always useful to have some sort of data labelling in your graphs. However, it’s only going to be useful if your data labels are easy to read. For example, if you blindly apply the text geom, you’ll end up with a really ugly graph:


p <-
  ggplot(data = diamonds %>% sample_n(1e3),
         aes(x = carat, y = price, color = cut)) +
  geom_point()

p + geom_text(aes(label = price))
This is bad

In this section, I’ll talk about three key tips for using geom_text() effectively.


  1. Filtering which labels are shown: You can get creative with this, but the goal of doing this is to only show relevant data labels. In the case below, I only want to show the prices of high-carat diamonds:


p +
  geom_text(data = . %>% filter(carat >= 2.25),
            aes(label = price))
Notice that this is easier to read now that only a few prices are shown

2. hjust + vjust


In the above graph, you’ll see that the text completely overlaps the point, which looks ugly. You can easily fix this by aligning your text within geom_text(). The way that I think of this is similar to left and right align in Microsoft Word.


Generally, you'll keep hjust and vjust within the range [0, 1], but they also accept negative values and values greater than one (which just move your label further in the specified direction). The graph below shows how text will be aligned based on your hjust and vjust values:

Figure: text alignment for different hjust and vjust values

p +
  geom_text(data = . %>% filter(carat >= 2.25),
            aes(label = price),
            hjust = 0,
            vjust = 0)

3. color


This is more of a preference, but know that you can change the color of your text. You generally want your text to contrast with the background as much as possible, as this makes it the most legible. This is important if you have some lighter colors (e.g. yellow) that may be hard to read:

p +
  geom_text(data = . %>% filter(carat >= 2.25),
            aes(label = price),
            hjust = 0,
            vjust = 0,
            color = 'black')

# Another example where we add contrast
diamonds %>%
  group_by(clarity) %>%
  summarise(price = mean(price)) %>%
  ggplot(aes(x = clarity, y = price)) +
  geom_bar(stat = 'identity') +
  geom_text(aes(label = round(price, 2)),
            vjust = 1.25,
            color = 'white')

8. Order, order, order!

Lastly, ordering your graph can make it easier to read, and this is especially useful for bar graphs. All you have to do is use fct_reorder() on the x value such that it’s sorted by the y-value:


# By default, ggplot orders the bars by the x value
# (here, the factor levels of clarity)
diamonds %>%
  group_by(clarity) %>%
  summarise(price = mean(price)) %>%
  ggplot(aes(x = clarity, y = price)) +
  geom_bar(stat = 'identity')

# Reordered:
diamonds %>%
  group_by(clarity) %>%
  summarise(price = mean(price)) %>%
  ggplot(aes(x = fct_reorder(clarity, price), y = price)) +
  geom_bar(stat = 'identity')
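fct_reorder() sorts in ascending order by default. If you would rather have the tallest bar first, it also takes a .desc argument. The sketch below is an addition for illustration: it uses geom_col(), which is equivalent to geom_bar(stat = 'identity'), and the axis labels and object names are assumptions.

```r
library(dplyr)
library(forcats)
library(ggplot2)

mean_price_by_clarity <- diamonds %>%
  group_by(clarity) %>%
  summarise(price = mean(price))

# .desc = TRUE puts the clarity with the highest average price first
ordered_bars <- ggplot(mean_price_by_clarity,
                       aes(x = fct_reorder(clarity, price, .desc = TRUE),
                           y = price)) +
  geom_col() +
  labs(x = "clarity", y = "average price")

ordered_bars
```

Reordering inside aes() changes the axis title to the full expression, which is why the sketch sets the x label explicitly with labs().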

Concluding Thoughts

I had a tough time deciding what different topics I wanted to cover in this article. I ended up focusing on topics that were initially confusing to me, and that I wish I understood more when I first started learning ggplot. Hopefully, this article gives you some concrete ideas on how to improve your visualizations or demystifies some of the more confusing/hidden aspects of ggplot.


Translated from: https://towardsdatascience.com/8-tips-for-better-data-visualization-2f7118e8a9f4
