Plot with ggplot2, interact, collaborate, and share online

Editor’s note: This is a guest post by Marianne Corvellec from Plotly. This post is based on an interactive Notebook (click to view) she presented at the R User Conference on July 1st, 2014.

Plotly is a platform for making, editing, and sharing graphs. If you are used to making plots with ggplot2, you can call ggplotly() to make your plots interactive, web-based, and collaborative. For example, see plot.ly/~ggplot2examples/211, shown below and in this Notebook. Notice the hover text!

img1

0. Get started

Visit http://plot.ly. Here, you’ll find a GUI that lets you create graphs from data you enter manually, or upload as a spreadsheet (or CSV file). From there you can edit graphs! Change between types (from bar charts to scatter charts), change colors and formatting, add fits and annotations, try other themes…

img2

Our R API lets you use Plotly with R. Once you have your R visualization in Plotly, you can use the web interface to edit it, or to extract its data. Install and load the “plotly” package in your favourite R environment. For a quick start, follow: https://plot.ly/ggplot2/getting-started/

Go social! Like, share, comment, fork and edit plots… Export them, embed them in your website. Collaboration has never been so sweet!

img3

Not ready to publish? Set detailed permissions for who can view and who can edit your project.

img4

1. Make a (static) plot with ggplot2

Baseball data is the best! Let’s plot a histogram of batting averages. I downloaded data here.

Load the CSV file of interest, take a look at the data, subset at will:

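library(RCurl)

# Fetch the CSV file as text (the data URL is elided in the source)
online_data <-
  getURL("...")

# Parse the downloaded text as a CSV table
batting_table <-
  read.csv(textConnection(online_data))

head(batting_table)

summary(batting_table)

# Keep seasons from 2004 onwards
batting_table <-
  subset(batting_table, yearID >= 2004)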
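
To see why `textConnection()` is needed, note that the downloaded CSV arrives as one character string rather than as a file. The sketch below uses a tiny inline string in place of the downloaded text; the three rows and the names `csv_text`, `demo_table` and `demo_recent` are made up for illustration and are not taken from the real batting data.

```r
# Stand-in for the text that would come back from the download
# (hypothetical rows, not real batting records)
csv_text <- "playerID,yearID,AB,H
player01,2003,400,120
player02,2004,350,98
player03,2005,12,3"

# textConnection() lets read.csv() treat a string as if it were a file
demo_table <- read.csv(textConnection(csv_text))

# Keep only seasons from 2004 onwards, as in the post
demo_recent <- subset(demo_table, yearID >= 2004)

print(demo_recent)
```

The same pattern should work for any CSV text held in memory, whichever way it was fetched.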

The batting average is defined as the number of hits (H) divided by the number of at bats (AB):

batting_table$Avg <-
  with(batting_table, H / AB)
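
One caveat with this division: a row with zero hits and zero at bats gives `NaN` (0/0) in R rather than an error. Here is a minimal sketch with made-up numbers; the data frame `demo` is hypothetical and only illustrates the arithmetic.

```r
# Made-up rows for illustration only
demo <- data.frame(
  playerID = c("a", "b", "c"),
  H  = c(30, 0, 1),
  AB = c(100, 0, 4)
)

# Same computation as in the post: hits divided by at bats
demo$Avg <- with(demo, H / AB)

print(demo)

# Flag the rows where the average is undefined (0 / 0)
print(is.nan(demo$Avg))
```

Filtering on a minimum number of at bats is one way to keep such rows out of a plot.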

You may want to explore the distribution of your new variable as follows:

library(ggplot2)

# The "+" must end a line; a line starting with "+" is parsed separately
ggplot(data=batting_table) +
  geom_histogram(aes(Avg), binwidth=0.05)

# Let's filter out entries where players were at bat fewer than 10 times.

batting_table <-
  subset(batting_table, AB >= 10)
hist <-
  ggplot(data=batting_table) +
  geom_histogram(aes(Avg), binwidth=0.05)
hist
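
To get a feel for what `binwidth=0.05` means, the counts behind such a histogram can be approximated in base R. This is only a sketch with simulated averages (the vector `avg` is made up), and ggplot2 may place its bin edges differently, so the exact counts need not match the plot.

```r
set.seed(1)

# Simulated batting averages, clipped to [0, 1]; not real data
avg <- pmin(pmax(rnorm(200, mean = 0.25, sd = 0.05), 0), 1)

# Bin edges every 0.05 from 0 to 1
edges <- seq(0, 1, by = 0.05)

# Count how many values fall in each bin
counts <- table(cut(avg, breaks = edges, include.lowest = TRUE))

# Show only the non-empty bins
print(counts[counts > 0])
```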

We have created a basic histogram; let us share it, so we can get input from others!

2. Save your R plot to plot.ly

# Install the latest version
# of the "plotly" package and load it

library(devtools)
install_github("ropensci/plotly")
library(plotly)

# Open a Plotly connection

py <-
  plotly("ggplot2examples", "3gazttckd7")

Use your own credentials if you prefer. You can sign up for a Plotly account online.

Now call the `ggplotly()` method:

collab_hist <-
  py$ggplotly(hist)

And boom!

img5

You get a nice interactive version of your plot! Go ahead and hover…

Your plot lives at this URL (`collab_hist$response$url`) alongside the data. How great is that?!

If you wanted to keep your project private, you would use your own credentials and specify:

py <- plotly()

py$ggplotly(hist,
  kwargs=list(filename="private_project",
              world_readable=FALSE))

3. Edit your plot online

 

Now let us click “Fork and edit”. You (and whoever you’ve added as a collaborator) can make edits in the GUI. For instance, you can run a Gaussian fit on this distribution:

img6

You can give a title, edit the legend, add notes, etc.

img7

You can add annotations in a very flexible way, controlling what the arrow and text look like:

img8

When you’re happy with the changes, click “Share” to get your plot’s URL.

If you append a supported extension to the URL, Plotly will translate your plot into that format. Use this to export static images, embed your graph as an iframe, or translate the code between languages. Supported file types include:

Isn’t life wonderful?

4. Retrieve your plot.ly plot in R

The JSON file specifies your plot completely (it contains all the data and layout info). You can view it as your plot’s DNA. The R file (https://plot.ly/~mkcor/305.r) is a conversion of this JSON into a nested list in R. So we can interact with it by programming in R!
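
The URL convention described above is easy to script. Below is a small sketch; the helper `plot_export_url()` is hypothetical (it is not part of the plotly package), and only the `r` extension is taken from the link above, so any other extension should be checked against what plot.ly actually supports.

```r
# Hypothetical helper: build the URL of a plot in a given export format
plot_export_url <- function(user, plot_id, ext) {
  paste0("https://plot.ly/~", user, "/", plot_id, ".", ext)
}

# The R version of the plot discussed above
r_url <- plot_export_url("mkcor", 305, "r")
print(r_url)
```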

Access a plot which lives on plot.ly with the well-named method `get_figure()`:

enhanc_hist <-
  py$get_figure("mkcor", 305)

Take a look:

str(enhanc_hist)

# Data for second trace
enhanc_hist$data[[2]]

The second trace is a vertical line at 0.300 named “Good”. Say we get more ambitious and we want to show a vertical line at 0.350 named “Very Good”. We overwrite old values with our new values:

# Rename the trace and move both ends of the vertical line to x = 0.35
enhanc_hist$data[[2]]$name <- "Very Good"
enhanc_hist$data[[2]]$x[[1]] <- 0.35
enhanc_hist$data[[2]]$x[[2]] <- 0.35
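
Since the figure is just a nested list, the same kind of edit can be tried offline on a mock object. The structure below is a simplified guess at what a figure might look like; the fields and values in `mock_fig` are assumptions for illustration, not the actual response of `get_figure()`.

```r
# Mock figure: a histogram trace plus a vertical-line trace (made-up values)
mock_fig <- list(
  data = list(
    list(type = "histogram", x = c(0.21, 0.25, 0.31)),
    list(type = "scatter", name = "Good",
         x = list(0.3, 0.3), y = list(0, 100))
  ),
  layout = list(title = "Batting averages")
)

# Overwrite the second trace in place, as in the post
mock_fig$data[[2]]$name <- "Very Good"
mock_fig$data[[2]]$x[[1]] <- 0.35
mock_fig$data[[2]]$x[[2]] <- 0.35

# Inspect the modified trace
str(mock_fig$data[[2]])
```

Assigning into a nested element with `[[ ]]` and `$` changes only that element; the first trace and the layout are left untouched.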

Send this new plot back to plot.ly!

enhanc_hist2 <-
  py$plotly(enhanc_hist$data,
    kwargs=list(layout=enhanc_hist$layout))

enhanc_hist2$url

Visit the above URL (`enhanc_hist2$url`).

How do you like this workflow? Let us know!

Tutorials are at plot.ly/learn. You can see more examples and documentation at plot.ly/ggplot2 and plot.ly/r. Our gallery has the following examples:

img9

Acknowledgments

This presentation benefited tremendously from comments by Matt Sundquist and Xavier Saint-Mleux.

Plotly’s R API is part of rOpenSci. It is under active development; you can find it on GitHub. Your thoughts, issues, and pull requests are always welcome!
