How to make Beautiful Ruby Plots with Galaaz

By Rodrigo Botafogo & Daniel Mossé

According to Wikipedia “Ruby is a dynamic, interpreted, reflective, object-oriented, general-purpose programming language. It was designed and developed in the mid-1990s by Yukihiro “Matz” Matsumoto in Japan.” It reached high popularity with the development of Ruby on Rails (RoR) by David Heinemeier Hansson.

RoR is a web application framework first released around 2005. It makes extensive use of Ruby’s meta-programming features, and with RoR, Ruby became very popular. According to the Tiobe index, Ruby’s popularity peaked around 2008, then declined until 2015, when it started picking up again.

At the time of this writing (November 2018), the Tiobe index ranks Ruby as the 16th most popular language.

Python, a language similar to Ruby, ranks 4th in the index; Java, C and C++ take the first three positions. Ruby is often criticized for its focus on web applications, but Ruby can do much more than web applications. Yet for scientific computing, Ruby lags far behind Python and R. Python has the Django framework for the web, NumPy for numerical arrays, and Pandas for data analysis. R is a free software environment for statistical computing and graphics, with thousands of libraries for data analysis.

Until recently, there was no real way for Ruby to bridge this gap: implementing a complete scientific computing infrastructure would take too long. Enter Oracle’s GraalVM:

GraalVM is a universal virtual machine for running applications written in JavaScript, Python 3, Ruby, R, JVM-based languages like Java, Scala, Kotlin, and LLVM-based languages such as C and C++.

GraalVM removes the isolation between programming languages and enables interoperability in a shared run-time. It can run either standalone or in the context of OpenJDK, Node.js, Oracle Database, or MySQL.

GraalVM allows you to write polyglot applications with a seamless way to pass values from one language to another. With GraalVM there is no copying or marshaling necessary as it is with other polyglot systems. This lets you achieve high performance when language boundaries are crossed. Most of the time there is no additional cost for crossing a language boundary at all.

Often developers have to make uncomfortable compromises that require them to rewrite their software in other languages. For example:

- “That library is not available in my language. I need to rewrite it.”
- “That language would be the perfect fit for my problem, but we cannot run it in our environment.”
- “That problem is already solved in my language, but the language is too slow.”

With GraalVM we aim to allow developers to freely choose the right language for the task at hand without making compromises.

As stated above, GraalVM is a universal virtual machine that allows Ruby and R (and other languages) to run on the same environment. GraalVM allows polyglot applications to seamlessly interact with one another and pass values from one language to the other.

GraalVM is a very powerful environment. Yet it still requires application writers to know several languages. To eliminate that requirement, we built Galaaz, a Ruby gem that tightly couples Ruby and R so that the two languages interact without the user being aware of it. In other words, a Ruby programmer can use all the capabilities of R without knowing R syntax.

Library wrapping is a common way of bringing features from one language into another. To improve performance, Python often wraps more efficient C libraries, and the existence of those C libraries is hidden from the Python developer. The problem with library wrapping is that every new library needs a handcrafted wrapper, which takes considerable expertise and time.

Instead of wrapping a single C or R library, Galaaz wraps the whole R language in Ruby. As a result, all of the thousands of R libraries are immediately available to Ruby developers without any new wrapping effort.

To demonstrate the power of Galaaz, we show in this article how Ruby can transparently use R’s ggplot2 library, bringing high-quality scientific plotting to Ruby. We also show that migrating from R to Ruby with Galaaz is a matter of small syntactic changes. By using Ruby, the R developer gains all of Ruby’s powerful object-oriented features. Ruby also makes it much easier to move code from the analysis phase to the production phase.

In this article we will explore the R ToothGrowth dataset and, to illustrate it, create some boxplots. A primer on boxplots is available in this article.

We will also create a Corporate Template to ensure that all plots have a consistent visualization. This template is built as a Ruby module. A ggplot theme can be built to work the same way as the Ruby module, but writing a new theme requires specific knowledge of theme writing. Ruby modules, by contrast, are a standard part of the language and need no special knowledge.

Here we show a scatter plot, also made in Ruby with Galaaz.

gKnit

Knitr is an application that converts text written in rmarkdown to many different output formats. For instance, a writer can convert an rmarkdown document to HTML, LaTeX, docx and many other formats.

Rmarkdown documents can contain text and code chunks. Knitr formats code chunks in a grayed box in the output document. It also executes the code chunks and formats their output in a white box. Every line of output from the executed code is preceded by ‘##’.

Knitr allows code chunks to be written in R, Python, Ruby and dozens of other languages. Yet, while R and Python chunks can share data, chunks in other languages are independent. This means that a variable defined in one chunk cannot be used in another chunk.

With gKnit, Ruby code chunks can share data. In gKnit, each Ruby chunk executes in its own scope, so local variables defined in one chunk are not accessible to other chunks. However, all chunks execute in the scope of a ‘chunk’ class, so instance variables (prefixed with ‘@’) are available in all chunks.
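
The scoping rule can be illustrated in plain Ruby. The sketch below is an assumption about the mechanism, not gKnit's actual code, and the class name `ChunkContext` is hypothetical. It evaluates each chunk's source against one shared object with `instance_eval`. Instance variables survive from one chunk to the next, but each evaluation gets a fresh local-variable scope, so locals do not.

```ruby
# Hypothetical model of gKnit's chunk scoping (not gKnit's real code).
# Every chunk runs with `self` bound to the same object, so instance
# variables persist across chunks; each evaluation has its own local
# variable scope, so locals do not.
class ChunkContext
  # Evaluate one chunk's source text and return its last value.
  def run(source)
    instance_eval(source)
  end
end

ctx = ChunkContext.new

# Chunk 1 defines an instance variable and a local variable.
ctx.run("@lengths = [4.2, 11.5, 7.3]; rows = 3")

# Chunk 2 still sees the instance variable...
p ctx.run("@lengths.size")    # prints 3
# ...but the local variable from chunk 1 is gone.
p ctx.run("defined?(rows)")   # prints nil
```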

Exploring the Dataset

Let’s start by exploring our selected dataset. A dataset is like a simple Excel spreadsheet in which each column holds only one type of data. For instance, one column can hold floats, another integers, and a third strings.

The ToothGrowth R dataset records the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (OJ) or ascorbic acid (a form of vitamin C, coded as VC).

The ToothGrowth dataset contains three columns: ‘len’, ‘supp’ and ‘dose’. Let’s take a look at a few rows of this dataset.

In Galaaz, R variables are accessed by preceding the corresponding Ruby symbol with the tilde (‘~’) operator. In the following chunk, ‘ToothGrowth’ is the R variable, and the Ruby instance variable ‘@tooth_growth’ is assigned the value of ‘~:ToothGrowth’.

# Read the R ToothGrowth variable and assign it to the
# Ruby instance variable @tooth_growth that will be
# available to all Ruby chunks in this document.
@tooth_growth = ~:ToothGrowth

# print the first few elements of the dataset
puts @tooth_growth.head

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

Great! We’ve managed to read the ToothGrowth dataset and look at its elements: here we see the first 6 rows of the dataset. To access a column, follow the dataset name with a dot (‘.’) and the name of the column. Dot notation also chains methods in the usual Ruby style.
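
Dot-notation column access can be imitated in plain Ruby with `method_missing`. The sketch below is a toy for illustration only, not Galaaz's data frame class, and the name `ToyFrame` is hypothetical. It stores each column as one array and enforces the one-type-per-column rule described earlier.

```ruby
# Toy column-oriented data frame (illustration only; not Galaaz's class).
# Each column is a homogeneous array, and a column is read with dot
# notation through method_missing, as in @tooth_growth.len.
class ToyFrame
  def initialize(columns)
    columns.each do |name, values|
      # Like a spreadsheet column, every value must share one type.
      if values.map(&:class).uniq.size > 1
        raise ArgumentError, "column #{name} mixes types"
      end
    end
    @columns = columns
  end

  # First n rows, returned as a new frame.
  def head(n = 6)
    ToyFrame.new(@columns.transform_values { |values| values.first(n) })
  end

  # frame.len returns the array stored under :len.
  def method_missing(name, *args)
    return @columns.fetch(name) if @columns.key?(name) && args.empty?

    super
  end

  def respond_to_missing?(name, include_private = false)
    @columns.key?(name) || super
  end
end

frame = ToyFrame.new({ len: [4.2, 11.5, 7.3, 5.8, 6.4, 10.0, 11.2],
                       supp: %w[VC VC VC VC VC VC VC] })
p frame.head(3).len   # prints [4.2, 11.5, 7.3]
```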

# Access the tooth_growth 'len' column and print the first few
# elements of this column with the 'head' method.
puts @tooth_growth.len.head
## [1]  4.2 11.5  7.3  5.8  6.4 10.0

The ‘dose’ column contains a numeric value of 0.5, 1 or 2, although the first 6 rows shown above only contain 0.5. Even though these values are numbers, they are better interpreted as a factor, or category. So, let’s convert our ‘dose’ column from numeric to ‘factor’.

In R, the function ‘as.factor’ converts the data in a vector to factors. To use this function from Galaaz, replace the dot (‘.’) in the function name with ‘__’ (double underscore). The function ‘as.factor’ becomes ‘R.as__factor’, or just ‘as__factor’ when chaining.
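
One way to picture the ‘__’ rule: Ruby method names cannot contain a dot, so a dispatcher can rewrite ‘__’ to ‘.’ before it builds the R call. The `RCallBuilder` module below is a hypothetical sketch that only produces R source text. It does not call R and is not how Galaaz is implemented internally. It also shows why wrapping a whole language, rather than one library at a time, is possible: any method name can be forwarded without a hand-written wrapper.

```ruby
# Hypothetical sketch: turn any Ruby method call into R source text.
# Not Galaaz's implementation; it only builds strings.
module RCallBuilder
  # Ruby identifiers cannot contain '.', so '__' stands in for it.
  def self.r_name(ruby_name)
    ruby_name.to_s.gsub("__", ".")
  end

  # Render a Ruby value as an R literal.
  def self.r_literal(value)
    case value
    when Symbol then value.to_s    # :dose -> dose (an R variable)
    when String then value.inspect # "a.png" -> "a.png" (quoted)
    else value.to_s                # numbers and the like
    end
  end

  # Any unknown method becomes an R call of the mapped name.
  def self.method_missing(name, *args, **kwargs)
    return super if name.to_s.start_with?("to_")

    params = args.map { |arg| r_literal(arg) } +
             kwargs.map { |key, val| "#{r_name(key)} = #{r_literal(val)}" }
    "#{r_name(name)}(#{params.join(', ')})"
  end

  def self.respond_to_missing?(name, _include_private = false)
    !name.to_s.start_with?("to_")
  end
end

puts RCallBuilder.as__factor(:dose)  # as.factor(dose)
puts RCallBuilder.dev__off           # dev.off()
puts RCallBuilder.png("figures/final_box_plot.png", width: 540, height: 560)
# png("figures/final_box_plot.png", width = 540, height = 560)
```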

# convert the dose to a factor
@tooth_growth.dose = @tooth_growth.dose.as__factor

Let’s explore some more details of this dataset. In particular, let’s look at its dimensions, structure and summary statistics.

puts @tooth_growth.dim
## [1] 60  3

This dataset has 60 rows, one for each subject, and 3 columns, as we have already seen.

Note that we do not need to call ‘puts’ when using the ‘str’ function. This function prints the structure of the dataset as a side effect and does not return a useful value.

@tooth_growth.str
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: Factor w/ 3 levels "0.5","1","2": 1 1 1 1 1 1 1 1 1 1 ...

Observe that both ‘supp’ and ‘dose’ are now factors. The system made ‘supp’ a factor automatically, since it contains the two strings OJ and VC.
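
The `str` output shows how R stores a factor: a small set of levels plus one integer code per element. The plain-Ruby sketch below imitates that encoding for illustration. `to_factor` and `r_label` are hypothetical helpers, and R's own implementation differs in details; for example, R sorts character levels according to the current locale.

```ruby
# Illustration of a factor's encoding (not R's implementation):
# sorted distinct values become levels, and every element is replaced
# by the 1-based position of its level.
def r_label(value)
  # R prints whole numbers without ".0", e.g. 1 rather than 1.0.
  value.is_a?(Float) && value == value.floor ? value.to_i.to_s : value.to_s
end

def to_factor(values)
  levels = values.uniq.sort
  codes  = values.map { |v| levels.index(v) + 1 }
  [levels.map { |v| r_label(v) }, codes]
end

levels, codes = to_factor([0.5, 0.5, 1.0, 2.0, 1.0])
p levels  # prints ["0.5", "1", "2"]
p codes   # prints [1, 1, 2, 3, 2]

levels, codes = to_factor(%w[VC VC OJ])
p levels  # prints ["OJ", "VC"]
p codes   # prints [2, 2, 1]
```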

Finally, the summary method gives us the statistical summary of the dataset:

puts @tooth_growth.summary
##       len        supp     dose
##  Min.   : 4.20   OJ:30   0.5:20
##  1st Qu.:13.07   VC:30   1  :20
##  Median :19.25           2  :20
##  Mean   :18.81
##  3rd Qu.:25.27
##  Max.   :33.90
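
The quartiles reported by `summary` use R's default quantile definition (type 7), which interpolates linearly between order statistics. The plain-Ruby sketch below applies that rule to the six `len` values printed earlier. The helper names are hypothetical, and the results describe only those six rows, not the full 60-row dataset. For these six values, the sketch gives quartiles of about 5.95, 6.85 and 9.325.

```ruby
# R's default (type 7) quantile: linear interpolation between the
# order statistics at positions floor(h) and ceil(h), h = (n - 1) * p.
def quantile7(values, p)
  sorted = values.sort
  h = (sorted.size - 1) * p
  lower = sorted[h.floor]
  upper = sorted[h.ceil]
  lower + (h - h.floor) * (upper - lower)
end

# Six-number summary in the same order as R's summary().
def six_number_summary(values)
  {
    min:    values.min,
    q1:     quantile7(values, 0.25),
    median: quantile7(values, 0.5),
    mean:   values.sum / values.size.to_f,
    q3:     quantile7(values, 0.75),
    max:    values.max
  }
end

first_six = [4.2, 11.5, 7.3, 5.8, 6.4, 10.0]
six_number_summary(first_six).each do |stat, value|
  puts format("%-6s %.3f", stat, value)
end
```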

Doing the Data Analysis

Quick plot for seeing the data

Let’s now create our first plot with the given data by accessing ggplot2 from Ruby. For Rubyists who have never seen or used ggplot2, here is the description of ggplot2 found on its home page:

“ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.”

This description might be a bit cryptic; it is best understood by seeing it at work. Basically, in the grammar of graphics, developers add layers of components such as grid, axes, data, title and subtitle, as well as graphical primitives such as bar plots and box plots, to form the final graphic.
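
Layering maps naturally onto Ruby's `+` operator, which is how the Galaaz code in this article composes plots. The toy model below uses a hypothetical `PlotSpec` class and does not reflect ggplot2 or Galaaz internals. Each `+` returns a new specification with one more layer and leaves the original untouched, so a base plot can serve as the starting point for several different plots.

```ruby
# Toy model of layered plot specifications (hypothetical; not ggplot2).
# `+` returns a new spec with one extra layer; the receiver is unchanged.
class PlotSpec
  attr_reader :layers

  def initialize(layers = [])
    @layers = layers.dup.freeze
  end

  def +(layer)
    PlotSpec.new(layers + [layer])
  end
end

base   = PlotSpec.new([:data_and_aes])
boxes  = base + :geom_boxplot + :facet_grid
violin = base + :geom_violin + :geom_jitter

p boxes.layers   # prints [:data_and_aes, :geom_boxplot, :facet_grid]
p violin.layers  # prints [:data_and_aes, :geom_violin, :geom_jitter]
p base.layers    # prints [:data_and_aes]
```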

Interested readers can look up the following Medium articles on the grammar of graphics: “A Comprehensive Guide to the Grammar of Graphics for Effective Visualization of Multi-dimensional Data” and “What are the Ingredients of a Terrible Data Story?”

To make a plot, we apply the ‘ggplot’ function to the dataset. In R, this would be written as ggplot(<dataset>, ...). Galaaz gives you the flexibility to use either R.ggplot(<dataset>, ...) or <dataset>.ggplot(...). In the graph specification for this plot, we use the second notation, which looks more like Ruby. ggplot uses the ‘aes’ method to specify the x and y axes; in this case, ‘dose’ on the x axis and ‘len’ (tooth length) on the y axis: ‘E.aes(x: :dose, y: :len)’. To specify the type of plot, add a geom to the plot. For a boxplot, the geom is R.geom_boxplot.

Note also that we call ‘R.png’ before plotting and ‘R.dev__off’ after the print statement. ‘R.png’ opens a ‘png device’ for outputting the plot. If we do not pass a name to the ‘png’ function, the image gets a default name of ‘Rplot<nnn>’, where <nnn> is the number of the plot. ‘R.dev__off’ closes the device and creates the ‘png’ file. We can then include the generated ‘png’ file in the document by adding an rmarkdown directive.
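
Every plot in this article is bracketed by the same open/close pair. A Ruby programmer might therefore wrap that pattern in a block-taking helper with `ensure`, so that the device is closed even if building the plot raises an error. The helper below is hypothetical and runs without R: the `open` and `close` lambdas stand in for calls such as `R.png(...)` and `R.dev__off`.

```ruby
# Hypothetical helper: run a plotting block between an "open device"
# step and a "close device" step. The close step runs even if the block
# raises, but only if the open step succeeded.
def with_device(open:, close:)
  open.call
  begin
    yield
  ensure
    close.call
  end
end

# Stand-ins for R.png("figures/plot.png") and R.dev__off.
log = []
with_device(open: -> { log << :open }, close: -> { log << :close }) do
  log << :plot
end
p log  # prints [:open, :plot, :close]
```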

Great! We’ve just managed to create and save our first plot in Ruby with only four lines of code. This plot clearly shows a trend: as the dose of the supplement increases, so does the length of the teeth.

Faceting the plot

This first plot shows a trend, but our data also records two different delivery methods: orange juice (OJ) and ascorbic acid (VC). Let’s try to create a plot that helps us discern the effect of each delivery method.

This next plot is a faceted plot in which each delivery method gets its own panel. On the left, the plot shows the OJ delivery method; on the right, the VC delivery method. To obtain this plot, we use the ‘R.facet_grid’ function, which automatically creates the facets based on the levels of the delivery method factor. The parameter to the ‘facet_grid’ method is a formula.

In Galaaz, we give programmers the flexibility to write formulas in two different ways. In the first way, formulas written in R (for example ‘x ~ y’) need the following changes:

  • R symbols are represented by the same Ruby symbol prefixed with the ‘+’ method. The symbol x in R becomes +:x in Ruby;

  • The ‘~’ operator in R becomes ‘=~’ in Ruby. The formula x ~ y in R is written as +:x =~ +:y in Ruby;

  • The ‘.’ symbol in R becomes ‘+:all’

Another way of writing a formula is to pass it as a string to the ‘formula’ function. The formula x ~ y in R can be written as R.formula("x ~ y"). For more complex formulas, the ‘formula’ function is preferred.

The formula +:all =~ +:supp (R’s . ~ supp) tells the ‘facet_grid’ function to facet the plot on the supp variable and split it vertically, placing each delivery method in its own column so that the panels sit side by side. Changing the formula to +:supp =~ +:all (R’s supp ~ .) would split the plot horizontally, stacking the panels in rows.
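
The `+:x =~ +:y` notation works because Ruby lets a class define both unary plus (`+@`) and `=~`. The toy below shows how such overloads can turn the Ruby expression into R formula text. It is not Galaaz's implementation: the `FormulaTerm` class is hypothetical, and reopening `Symbol` is just the simplest way to make the sketch self-contained.

```ruby
# Toy model of Galaaz-style formula syntax (hypothetical; not Galaaz's code).
class FormulaTerm
  attr_reader :name

  def initialize(symbol)
    # R's "." (all other variables) is written :all on the Ruby side.
    @name = symbol == :all ? "." : symbol.to_s
  end

  # Ruby's =~ plays the role of R's ~ and yields the formula as R text.
  def =~(other)
    "#{name} ~ #{other.name}"
  end
end

class Symbol
  # Unary plus turns a symbol into a formula term: +:x
  def +@
    FormulaTerm.new(self)
  end
end

puts(+:x =~ +:y)        # x ~ y
puts(+:all =~ +:supp)   # . ~ supp
puts(+:supp =~ +:all)   # supp ~ .
```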

R.png("figures/facet_by_delivery.png")

@base_tooth = @tooth_growth.ggplot(E.aes(x: :dose, y: :len, group: :dose))

@bp = @base_tooth + R.geom_boxplot +
      # Split in vertical direction
      R.facet_grid(+:all =~ +:supp)

puts @bp

R.dev__off

It now becomes clear that, although both methods of delivery have a direct impact on tooth growth, method OJ is non-linear: it has a higher impact at smaller doses of vitamin C, and its impact diminishes as the dose increases. With the VC approach, the impact seems to be more linear.

Adding Color

If this were an article about data analysis, we would analyze the trends more carefully and improve the statistical analysis. But here we are interested in working with ggplot in Ruby. So, let’s add some colors to this plot to make the trend and comparison more visible.

In the following plot, the boxes are color coded by dose. To add color, it is enough to add fill: :dose to the aesthetics of the boxplot. With this change, each ‘dose’ factor level gets its own color.

R.png("figures/facets_by_delivery_color.png")
@bp = @bp + R.geom_boxplot(E.aes(fill: :dose))
puts @bp
R.dev__off

Faceting helps us compare the general trends for each delivery method. Adding color allows us to compare specifically how each dosage impacts tooth growth. At the smallest dose, 0.5 mg (red), OJ performs better than VC. At 1 mg (green), OJ is clearly better than VC. At 2 mg (blue), OJ and VC have the same median, but OJ is less dispersed. From this very quick visual analysis, OJ seems to be a better delivery method than VC.

Clarifying the data

Boxplots give us a nice idea of the distribution of data, but looking at those plots with large colored boxes leaves us wondering what else is going on. According to Edward Tufte in Envisioning Information:

Thin data rightly prompts suspicions: “What are they leaving out? Is that really everything they know? What are they hiding? Is that all they did?” Now and then it is claimed that vacant space is “friendly” (anthropomorphizing an inherently murky idea) but it is not how much empty space there is, but rather how it is used. It is not how much information there is, but rather how effectively it is arranged.

And he states:

A most unconventional design strategy is revealed: to clarify, add detail.

Let’s use this wisdom and add yet another layer of data to our plot, so that we clarify it with detail and do not leave large empty boxes. In this next plot, we add a data point for each of the 60 guinea pigs in the experiment. To do so, add the function ‘R.geom_point’ to the plot.

R.png("figures/facets_with_points.png")
# Add a point for each subject
@bp = @bp + R.geom_point
puts @bp
R.dev__off

Now we can see the actual distribution of all 60 subjects. Actually, this is not entirely true: we have a hard time seeing all 60 subjects. Some points seem to be drawn on top of one another, hiding useful information.

But no sweat! Another layer might solve the problem. In the following plot a new layer called ‘geom_jitter’ is added to the plot. Jitter adds a small amount of random variation to the location of each point, and is a useful way of handling overplotting caused by discreteness in smaller datasets. This makes it easier to see all of the points and prevents data hiding. We also add color and change the shape of the points, making them even easier to see.
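
Jitter is simple to express in code: shift each coordinate by a small uniform random offset. The sketch below is a plain-Ruby illustration, not ggplot2's code, and `jitter` and its `amount` parameter are hypothetical names. According to the `position_jitter` documentation, ggplot2 derives its default amount from the resolution of the data (40% of it). Here the amount is passed explicitly, and a seeded `Random` makes the result reproducible.

```ruby
# Shift every value by a uniform random offset in [-amount, amount].
# Illustration only; not ggplot2's implementation.
def jitter(values, amount, rng: Random.new)
  values.map { |v| v + rng.rand(-amount..amount) }
end

doses = [0.5, 0.5, 0.5, 1.0, 1.0, 2.0]
jittered = jitter(doses, 0.2, rng: Random.new(42))  # seeded for repeatability
jittered.each { |x| puts x.round(3) }
```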

R.png("figures/facets_with_jitter.png")
# Use small diamonds in a light blue color (cyan3)
# to plot the subjects of the experiment
puts @bp + R.geom_jitter(shape: 23, color: "cyan3", size: 1)
R.dev__off

Preparing the Plot for Presentation

We have come a long way since our first plot. As we’ve already said, this is not an article about data analysis and the focus is on the integration of Ruby and ggplot. So, let’s assume that the analysis is now done. Yet, ending the analysis does not mean that the work is done. On the contrary, the hardest part is yet to come!

After the analysis it is necessary to communicate it by making a final plot for presentation. The last plot has all the information we want to share, but it is not very pleasing to the eye.

Improving Colors

Let’s start by trying to improve the colors. For now, we will not use the jitter layer. The previous plot uses three bright colors. Is there any interpretation, obvious or not, for these colors? Clearly not: they are just default colors selected automatically by our software. Although those colors helped us understand the data, in a final presentation arbitrary colors can distract the viewer.

In the following plot, we use the ‘scale_fill_manual’ function to change the colors of the boxes and the order of the legend labels. For colors, we use a shade of blue for each dosage, with light blue (‘cyan’) representing the lowest dose and deep blue (‘deepskyblue4’) the highest dose.

The legend can also be improved: we use the ‘breaks’ parameter to put the smallest value (0.5) at the bottom of the legend and the largest (2) at the top. This ordering seems more natural and matches the actual order of the colors in the plot.

R.png("figures/facets_by_delivery_color2.png")
@bp = @bp +
      R.scale_fill_manual(values: R.c("cyan", "deepskyblue",
                                      "deepskyblue4"),
                          breaks: R.c("2","1","0.5"))
puts @bp
R.dev__off

Violin Plot and Jitter

The boxplot with jitter did look a bit overwhelming. The next plot uses a variation of a boxplot known as a violin plot with jittered data.

From Wikipedia

A violin plot is a method of plotting numeric data. It is similar to a box plot with a rotated kernel density plot on each side.

A violin plot has four layers. The outer shape represents all possible results, with thickness indicating how common. (Thus the thickest section represents the mode average.) The next layer inside represents the values that occur 95% of the time. The next layer (if it exists) inside represents the values that occur 50% of the time. The central dot represents the median average value.
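
The “rotated kernel density plot” in the quotation is a kernel density estimate, drawn sideways and mirrored. Below is a plain-Ruby sketch of a Gaussian kernel density estimate, for illustration only. `gaussian_kde` is a hypothetical helper whose bandwidth is passed in by hand, whereas ggplot2's `geom_violin` chooses a bandwidth automatically.

```ruby
# Gaussian kernel density estimate: the average of one normal bell curve
# (standard deviation = bandwidth) centred on each sample.
def gaussian_kde(samples, bandwidth)
  norm = 1.0 / (samples.size * bandwidth * Math.sqrt(2 * Math::PI))
  lambda do |x|
    norm * samples.sum { |s| Math.exp(-0.5 * ((x - s) / bandwidth)**2) }
  end
end

density = gaussian_kde([4.2, 11.5, 7.3, 5.8, 6.4, 10.0], 1.5)

# A violin's half-width at each height y is proportional to density.(y).
(4..12).step(2) { |y| puts format("len %2d  density %.4f", y, density.(y)) }
```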

R.png("figures/violin_with_jitter.png")

@violin = @base_tooth + R.geom_violin(E.aes(fill: :dose)) +
  R.facet_grid(+:all =~ +:supp) +
  R.geom_jitter(shape: 23, color: "cyan3", size: 1) +
  R.scale_fill_manual(values: R.c("cyan", "deepskyblue",
                                  "deepskyblue4"),
                      breaks: R.c("2","1","0.5"))

puts @violin

R.dev__off

This plot is an alternative to the original boxplot. For the final presentation, it is important to consider which graphics our audience will understand best. A violin plot is less well known and could add mental overhead, yet, in my opinion, it looks a bit better than the boxplot and provides even more information than the boxplot with jitter.

Adding Decoration

Our final plot is starting to take shape, but a presentation plot should have at least a title, labels on the axes and maybe some other decorations. Let’s start adding those. Since decoration requires more graph area, this new plot has a ‘width’ and ‘height’ specification. When there is no specification, the default values from R for width and height are 480 pixels.

The ‘labs’ function adds the required decoration. In this example, we use ‘title’, ‘subtitle’, ‘x’ for the x axis label, ‘y’ for the y axis label, and ‘caption’ for information about the plot. For clarity, we define the caption variable using Ruby’s heredoc syntax.

R.png("figures/facets_with_decorations.png", width: 540, height: 560)

caption = <<-EOT
Length of odontoblasts in 60 guinea pigs.
Each animal received one of three dose levels of vitamin C.
EOT

@decorations = R.labs(title: "Tooth Growth:  Length vs Vitamin C Dose",
                      subtitle: "Faceted by delivery method, OJ or VC",
                      x: "Dose (mg)", y: "Teeth length",
                      caption: caption)

puts @bp + @decorations
R.dev__off

The Corp Theme

We are almost done, but the default plot configuration does not yet look nice. Many aspects of the graph are still distracting. First, the black font color does not look good. Then the plot background, borders, and grids all add clutter to the plot.

We will now define our corporate theme in a module that can be loaded for all plots, similar to CSS or any other style definition.

In this theme, we remove borders and grids. The background is kept for faceted plots but removed for non-faceted plots. Font colors are a shade of blue (navy, ‘#000080’). Axis titles are moved near the end of the axis and written in bold.

module CorpTheme
  R.install_and_loads 'RColorBrewer'

  #----------------------------------------------------------------
  # face can be  (1=plain, 2=bold, 3=italic, 4=bold-italic)
  #----------------------------------------------------------------
  def self.text_element(size, face: "plain", hjust: nil)
    E.element_text(color: "#000080",
                   face: face,
                   size: size,
                   hjust: hjust)
  end

  #----------------------------------------------------------------
  # Defines the plot theme (visualization). In this theme we
  # remove major and minor grids, borders and background. We
  # also turn off scientific notation.
  # 'faceted' is a keyword argument, matching calls such as
  # global_theme(faceted: true).
  #----------------------------------------------------------------
  def self.global_theme(faceted: false)
    # turn off scientific notation like 1e+48
    R.options(scipen: 999)

    # remove major grids
    gb = R.theme(panel__grid__major: E.element_blank())
    # remove minor grids
    gb = gb + R.theme(panel__grid__minor: E.element_blank)
    # remove border
    gb = gb + R.theme(panel__border: E.element_blank)
    # remove background. When working with faceted graphs,
    # the background makes it easier to see each facet, so
    # leave it
    gb = gb + R.theme(panel__background: E.element_blank) unless faceted

    # change axis font
    gb = gb + R.theme(axis__text: text_element(8))
    # change axis title font
    gb = gb + R.theme(axis__title: text_element(10, face: "bold", hjust: 1))
    # change font of title
    gb = gb + R.theme(title: text_element(12, face: "bold"))
    # change font of subtitle
    gb = gb + R.theme(plot__subtitle: text_element(9))
    # change font of captions
    gb = gb + R.theme(plot__caption: text_element(8))

    # return the accumulated theme
    gb
  end
end

Final Box Plot

We can now easily make our final boxplot and violin plot. Each layer of the plot was added either to expose our understanding of the data or to present the result to our audience.

The final specification is just the sum of all layers built up to this point (@bp), plus the decorations (@decorations), plus the corporate theme.

Here is our final boxplot, without jitter.

R.png("figures/final_box_plot.png", width: 540, height: 560)
puts @bp + @decorations + CorpTheme.global_theme(faceted: true)
R.dev__off

And here is the final violin plot, with jitter and the same look and feel as the corporate boxplot.

R.png("figures/final_violin_plot.png", width: 540, height: 560)
puts @violin + @decorations + CorpTheme.global_theme(faceted: true)
R.dev__off

Another View

We now make another plot with the same look and feel as before, but faceted by dose instead of by supplement. This shows how easy it is to create new plots by changing just a small statement in the grammar of graphics.

R.png("figures/facet_by_dose.png", width: 540, height: 560)
caption = <<-EOT
Length of odontoblasts in 60 guinea pigs. Each animal received one of three dose levels of vitamin C.
EOT

@bp = @tooth_growth.ggplot(E.aes(x: :supp, y: :len, group: :supp)) +
      R.geom_boxplot(E.aes(fill: :supp)) +
      R.facet_grid(+:all =~ +:dose) +
      R.scale_fill_manual(values: R.c("cyan", "deepskyblue4")) +
      R.labs(title: "Tooth Growth: Length by Dose",
             subtitle: "Faceted by dose",
             x: "Delivery method", y: "Teeth length",
             caption: caption) +
      CorpTheme.global_theme(faceted: true)
puts @bp
R.dev__off
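The caption is built with a `<<-EOT` heredoc. The `<<-` form only lets the closing `EOT` be indented. It keeps any leading whitespace in the body, and like every heredoc, the string ends with a newline. If you want to indent the caption text to match the surrounding code, Ruby's squiggly heredoc `<<~` (Ruby 2.3 and later) strips the common indentation, and `chomp` removes the trailing newline. Here is a quick plain-Ruby check of the difference:

```ruby
# <<- : the closing tag may be indented; body whitespace is kept as-is.
dash = <<-EOT
  Length of odontoblasts in 60 guinea pigs.
  EOT

# <<~ : the common leading indentation is removed from every body line.
squiggly = <<~EOT
  Length of odontoblasts in 60 guinea pigs.
EOT

p dash            # prints "  Length of odontoblasts in 60 guinea pigs.\n"
p squiggly        # prints "Length of odontoblasts in 60 guinea pigs.\n"
p squiggly.chomp  # prints "Length of odontoblasts in 60 guinea pigs."
```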

Conclusion

In this article, we introduced Galaaz and showed how to tightly couple Ruby and R in such a way that Ruby developers do not need to be aware of the underlying R engine. For the Ruby developer, the existence of R is of no consequence: she is just coding in Ruby. For the R developer, on the other hand, migrating to Ruby is a matter of small syntactic changes with a very gentle learning curve. As the R developer becomes more proficient in Ruby, he can start using classes, modules, procs and lambdas.

Bringing the power of R to Ruby from scratch would be an enormous endeavor that would probably never be completed, and in the meantime data scientists would certainly stick with either Python or R. Now both the Ruby and R communities can benefit from this marriage, made possible by Galaaz on top of GraalVM and Truffle's polyglot environment.

We developed the coupling of Ruby and R, but the same process could be used to couple Ruby and JavaScript, or Ruby and Python. In a polyglot world, we believe that a uniglot library might be extremely relevant.

From a performance perspective, GraalVM and Truffle promise improvements that could exceed 10 times, both for FastR and for TruffleRuby.

This article has shown how to improve a plot step by step. Starting from a very simple boxplot with all default configurations, we moved gradually to our final plot. The important point here is not whether the final plot is actually beautiful (beauty is in the eye of the beholder), but that there is a process of small, incremental improvements that can be followed to get a final plot ready for presentation.

Finally, this whole article was written in rmarkdown and compiled to HTML by gknit, an application that wraps knitr and allows documenting Ruby code. This application can be of great help for any Rubyist trying to write articles, blogs or documentation for Ruby.

Installing Galaaz

Prerequisites

The following R packages will be installed automatically when needed, but they can also be installed beforehand if desired, before using gknit:

  • ggplot2
  • gridExtra
  • knitr

Installing R packages requires a development environment and can be time-consuming. On Linux, the GNU compiler and tools should be enough. I am not sure what is needed on the Mac.

To run the specs, the following Ruby gem is required:

  • gem install rspec

Preparation

  • gem install galaaz

Usage

  • gknit <rmarkdown_file.Rmd>
  • In a script, add: require 'galaaz'

Running the demos

After installation, many Galaaz demos are available. Running

> galaaz -T

will show a list of all available demos. To run any of the demos in the list, replace the call to ‘rake’ with ‘galaaz’. For instance, one of the examples in the list is ‘rake sthda:bar’; to run it, just do ‘galaaz sthda:bar’. Running ‘galaaz sthda:all’ will run all demos in the sthda category, in this case a slide show with over 80 ggplot graphics written in Ruby.

Some of the examples require ‘rspec’ to be available. To install ‘rspec’ just do ‘gem install rspec’.

Translated from: https://www.freecodecamp.org/news/how-to-make-beautiful-ruby-plots-with-galaaz-320848058857/
