Why R Programming Will Become Your Go-To Language

I shied away from R for quite some time. My background is in C++, Java, and later C# with the major flavors of database engines thrown into the mix. I dabbled with R, but could never understand why it was so appealing. It simply wasn't intuitive in a classic programming sense. At least, this is the way I thought back then.

As I shifted away from pure programming to the field of data science, however, I realized it was time to commit to learning R. I took some free classes and started hunkering down. I have finally broken through the barrier and can call myself a qualified R programmer.

I find myself doing everything in R, from command-line calculations to scraping the web for data. Recently, I took on a project that analyzes baseball data (the data comes in a package that can be installed directly from R). It's been fun going back in time, exploring the history of players and games. Giving yourself actual projects to work on is one of the best ways to learn any language, and R is no exception.

It's Your Turn!

Why do I believe you will become hooked on the R language? It has two features that I find quite appealing and cannot imagine living without. The first is element-wise processing and the second is enhanced subsetting. I'll get to why these features are great below.

Admittedly, R is going to require a mindset shift. If you are not a fan of interpreted languages, R could rub you the wrong way as this is how it operates. R doesn't have a particularly great user interface. There's some help from GUI tools such as RStudio. However, don't expect to develop frontend applications using R, at least, not in the traditional way of development.

Two Great Features of R

I will describe the two features of R that I feel have the biggest potential to convert you, starting with element-wise processing. You are given an assignment to read a CSV file and summarize the data contained in the file. Your boss wants to know the average salaries across all countries, and the CSV file contains average salaries by country. How would you process this? 

In a language such as VBA, for example, you would read in the file using some built-in file construct. Then, you would need to parse the file (usually line-by-line) and further parse each line into a structure of some form. Then, you either need to perform the required summarization (average) tasks on the data or store the structure for use later in the program. Let's suppose you are going to process it later. This helps you modularize the program, making it easier to read and maintain.

You have created that structure (an array, perhaps?) and it's time to process the array for summarization. You would need to loop through the array and obtain the data you need to summarize. For this example, you need to find the salary column and add each value to a total variable. Once you have summed all of the values, you would divide by the number of items in the array.
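For comparison, the manual approach just described can be sketched in R itself. This is only an illustration of the loop-and-accumulate pattern; the rows and their values are made up and stand in for whatever the parsing step would have produced.

```r
# Hypothetical rows standing in for the parsed CSV contents.
rows <- list(
  list(country = "A", salary = 40000),
  list(country = "B", salary = 55000),
  list(country = "C", salary = 61000)
)

total <- 0
for (row in rows) {
  total <- total + row$salary    # accumulate the salary field
}
average <- total / length(rows)  # divide by the number of items
print(average)
```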

I understand the above task wouldn't be difficult to do in most traditional languages. It's clear-cut what needs to be done. However, it still takes several lines of coding. I am going to show how this is done in R. 

For this discussion, let's say I have a file named "population.csv" that contains the salary data of citizens by country. We want to take the average of those salaries. This example is good enough to illustrate the point. Here are the lines of code to process this in R:

salaries <- read.csv("population.csv")

averageSalaries <- mean(salaries$salary)

That is all it took to satisfy the requirement. Of course, this is an oversimplification, and I would need to specify where the file is located, just as in any other programming language. In this case, it exists in my working directory, but I could just as easily have added the path. That path could even be a URL to a website that contains the CSV file.
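As a rough sketch of reading from an explicit path, the snippet below first writes a tiny CSV to a temporary directory so that it can run on its own. The file contents and the example.com address are placeholders, not real data.

```r
# Write a tiny, made-up CSV to a temporary file so the example is self-contained.
csv_path <- file.path(tempdir(), "population.csv")
writeLines(c("country,salary", "A,40000", "B,55000", "C,61000"), csv_path)

salaries <- read.csv(csv_path)            # full path instead of the working directory
averageSalaries <- mean(salaries$salary)
print(averageSalaries)

# read.csv() also accepts a URL string in place of a local path, for example:
# salaries <- read.csv("https://example.com/population.csv")  # placeholder URL
```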

You'll notice there is no looping in this code. This is the power of element-wise processing: mean() operated on the entire salary column of the salaries structure (in this case, a data frame) in a single call. This is yet another benefit of R, by the way. The read.csv() function took care of parsing the file and creating a data frame (similar to a database object in memory, for lack of a better term).
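To make the idea a little more concrete, here is a small sketch using a hand-built data frame with invented values. Arithmetic and comparisons apply to every element of a column at once, while a function such as mean() collapses the whole column into a single number.

```r
# Made-up data standing in for the contents of the CSV file.
salaries <- data.frame(
  country = c("A", "B", "C"),
  salary  = c(40000, 55000, 61000)
)

raised       <- salaries$salary * 1.05   # every element multiplied at once
above_50k    <- salaries$salary > 50000  # one TRUE/FALSE per element
in_thousands <- salaries$salary / 1000   # no loop needed here either

print(raised)
print(above_50k)
print(in_thousands)
```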

Next Up - Subsetting...

If element-wise processing is not enough to convince you, then take a look at another wonderful feature of the language: subsetting. For this example, I'll use the baseball package that you can install as part of your R instance. It's called Lahman, so named for the creator and maintainer of the package, Sean Lahman.

The package contains several tables. For our purposes, we'll only deal with a few. I could download the data from seanlahman.com, but when you install the package and load it (using the library or require commands), the tables are available in your workspace without reading any files.

Suppose you wanted to know Babe Ruth's batting average the last year he played for Boston before switching to the Yankees (the year was 1919). This information is contained in the Batting data frame (I'll call it table going forward). You will need to know the key of Babe Ruth for this table. This can be found by looking up Babe Ruth in the Master table. You can easily accomplish this using the subsetting feature as follows:

babe_ruth <- Master[Master$nameLast == "Ruth" & Master$nameFirst == "Babe", ]

Again, that's all there is to it. In other languages, you would need to loop through the Master table, searching for the two elements that contained the last name and the first name. 
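What happens inside the square brackets can be sketched on a toy table. The players data frame and its IDs below are invented for illustration. The comparison produces a logical vector with one entry per row, and the brackets keep only the rows where it is TRUE.

```r
# Invented player table; only the column names mirror the ones used above.
players <- data.frame(
  playerID  = c("p001", "p002", "p003"),
  nameFirst = c("Babe", "Ann", "Babe"),
  nameLast  = c("Ruth", "Ruth", "Jones"),
  stringsAsFactors = FALSE
)

# Element-wise comparison: one TRUE/FALSE per row.
is_match <- players$nameLast == "Ruth" & players$nameFirst == "Babe"
print(is_match)

match_rows <- players[is_match, ]            # matching rows, all columns
match_ids  <- players[is_match, "playerID"]  # matching rows, one column
print(match_ids)
```

Leaving the part after the comma empty keeps every column, while naming a column there returns just that column as a vector.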

The object babe_ruth contains the playerID which is what we need to find his batting average. However, we have a minor problem. The Batting table does not contain the batting average. It does contain the components that make up the calculation, namely hits (H) and at-bats (AB). You can define a new column in Batting that has this calculation (and apply it using element-wise processing - WooHoo!) Let's call the calculated field BA for batting average.

Batting$BA <- Batting$H / Batting$AB
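One detail worth knowing about this kind of column-wide division: rows with zero at-bats evaluate to 0 / 0, which R reports as NaN rather than raising an error. The sketch below uses a made-up three-row table to show this, along with one possible way (an assumption on my part, not part of the original walkthrough) to mark such rows as NA instead.

```r
# Made-up batting rows; the second player never came to bat.
batting <- data.frame(
  playerID = c("x01", "x02", "x03"),
  H  = c(139, 0, 50),
  AB = c(432, 0, 200)
)

batting$BA <- batting$H / batting$AB   # element-wise division; 0 / 0 gives NaN

# Optional variant: use NA where there were no at-bats.
batting$BA_safe <- ifelse(batting$AB > 0, batting$H / batting$AB, NA)

print(batting)
```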

We now have the batting average for all players in one fell swoop. Let's find what that is for the Babe:

Batting[Batting$playerID == babe_ruth$playerID & Batting$yearID == 1919, "BA"]

This will give you an answer of about .3218, which rounds to a .322 batting average.

As you can see, R can accomplish much in just three lines of code along with the inclusion of a library or package. If you want to count the library command that makes the Lahman database available, then it will be four lines of code. I'm okay with that!
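The three steps can also be gathered into a small helper. This is a sketch rather than anything from the Lahman package itself: lookup_ba is a hypothetical name, and the two toy tables with their numbers are invented so the snippet runs without the package installed.

```r
# Hypothetical helper combining the lookup, the BA calculation, and the subset.
lookup_ba <- function(people, batting, first, last, year) {
  player <- people[people$nameLast == last & people$nameFirst == first, ]
  batting$BA <- batting$H / batting$AB
  # %in% keeps the subset well-defined even if several players share a name
  batting[batting$playerID %in% player$playerID & batting$yearID == year, "BA"]
}

# Invented tables that mimic the column names used above.
people <- data.frame(
  playerID  = c("a01", "b01"),
  nameFirst = c("Babe", "Ty"),
  nameLast  = c("Ruth", "Cobb"),
  stringsAsFactors = FALSE
)
batting <- data.frame(
  playerID = c("a01", "a01", "b01"),
  yearID   = c(1918, 1919, 1919),
  H        = c(10, 30, 20),
  AB       = c(40, 100, 50)
)

result <- lookup_ba(people, batting, "Babe", "Ruth", 1919)
print(result)

# With the Lahman package installed, the same call would look like this:
# library(Lahman)
# lookup_ba(Master, Batting, "Babe", "Ruth", 1919)
```

Some releases of the Lahman package name the player table People rather than Master, so it is worth checking which one exists in your installed version before running the commented lines.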

Conclusion

I don't believe R is going to replace other languages. That is not what it was designed to do. It was designed by statisticians for statistics. It gives answers to problems quickly. Think of it as the SWAT team of computer programming languages. It does what it needs to do and then gets out!

This article doesn't even touch upon all that R can do. It gives you a basic idea of two powerful concepts. R is well-supported and growing in popularity. It does take a bit of a paradigm shift when learning the language. A few months back (at the time of this writing), I created a website called DataScienceReview.com. It's in its infancy, so there isn't much to it right now. But, you can bet it will contain plenty of material on how to work with R, including tutorials, samples, fun projects, etc. It will also have updates on the data science industry.

Translated from: https://www.experts-exchange.com/articles/31125/Why-R-Programming-Will-Become-Your-Go-To-Language.html
