Functions in R

Have you ever found yourself in the situation below?

Image by author

I always seem to run into it. Maybe it’s just my luck…

I love functions. I love building them and working with them. There is nothing like someone saying a task will take a week when you know it can probably be done in 10 minutes.

Functions are great. Like little robots, they can save you from errors and repetitive work. If you haven’t checked out my article about using functions in Power BI, here it is. It will give you basic knowledge of working with functions and it will very likely save you tons of time if you work with flat files.

A lot of what we use are already functions. We just don’t really think about it.

Take the mean function in R: you write the function name and then the inputs, so mean() is a function. It's much like Power BI, where AVERAGE([Column Name]) is a function.
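
To make this concrete, here is a small base-R sketch: mean() called on a vector, and a tiny function of our own that is called the same way. The salaries and the name to_thousands are made up for illustration.

```r
# Calling a built-in function: the function name, then its inputs in parentheses
salaries <- c(52000, 61000, 58000)
mean(salaries)                 # 57000

# Defining our own function: to_thousands is a hypothetical example
# that takes one input, x, and returns it divided by 1000
to_thousands <- function(x) {
  x / 1000
}

to_thousands(mean(salaries))   # 57
```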

In this article, I hope to show you how to put functions to work in R.

Let’s dive in!

Image by author

I mostly work with a lot of flat files (it would be great if one day I could connect to the data warehouse, but that's another story).

I want to apply the same set of transformations to a set of flat files in R. How do we do that?

It's not hard, and it's very much like how we created Power BI functions. The key is to perform the transformations on one file first and use that as the prototype for our function.

First, let’s set our working directory using setwd and load our packages. We will be using tidyverse and rlist.

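The working directory needs to be set because R needs to know which folder it is working in: where to read the files from and where to export them. We will store the folder path in a variable called File_Directory and pass it to setwd. (Store the path first: setwd returns the previous working directory, not the new one, so File_Directory = setwd(...) would hold the wrong folder.)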

library(tidyverse) # loading packages
library(rlist)

File_Directory = "Your folder containing your files" # path to the folder with the csv files
setwd(File_Directory) # setting up the working directory
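
A quick way to see why the order matters: setwd() returns the directory you were in before the change (invisibly), so assigning its result captures the old path. This base-R sketch uses tempdir() as a stand-in for your data folder.

```r
# setwd() changes the working directory but returns the *previous* one
old_dir <- getwd()
new_dir <- tempdir()          # a scratch folder, standing in for your data folder

returned <- setwd(new_dir)    # returned holds the old path, not new_dir

identical(returned, old_dir)                        # TRUE
normalizePath(getwd()) == normalizePath(new_dir)    # TRUE

setwd(old_dir)                # restore the original working directory
```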

Now, let's use the rlist package's list.filter function with str_detect to keep only the “.csv” flat files. We will assign the result to the variable All_Files.

library(tidyverse) # loading packages
library(rlist)

File_Directory = "Your folder containing your files"
setwd(File_Directory) # setting up the working directory

All_Files = File_Directory %>%
  list.files() %>%
  list.filter(str_detect(., "\\.csv$")) # keep only csv files; "." stands for each file name returned by list.files()

list.files returns the names of all the files in the File_Directory folder. Then list.filter with str_detect keeps only the csv files. Also, if you are not familiar with %>%, it is called the “pipe”: it passes the result of one step as the first input of the next.
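
To see what the pipe does, here is a small sketch that writes the same calculation nested and piped. It assumes R 4.1 or later, which ships a built-in pipe, |>, that behaves like magrittr's %>% for simple calls such as these; the salary values are made up.

```r
# The nested form reads inside-out; the piped form reads left to right.
# Each |> passes the left-hand result as the first argument of the next call.
salaries <- c(52000, 61000, 58000, NA)

nested <- round(mean(salaries, na.rm = TRUE))
piped  <- salaries |> mean(na.rm = TRUE) |> round()

identical(nested, piped)   # TRUE
```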

The result, All_Files, is a vector of the file names we want R to clean.
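
It is worth checking the filter pattern: str_detect treats its pattern as a regular expression, where "." matches any character. This base-R sketch uses grepl(), which also matches regular expressions, on made-up file names.

```r
# Hypothetical file names, standing in for the output of list.files()
files <- c("Dataset_2007.csv", "Dataset_2008.csv", "readme_csv.txt", "notes.txt")

# "." matches any character, so ".csv" also matches the "_csv" in readme_csv.txt
grepl(".csv", files)      # TRUE TRUE TRUE FALSE

# Escaping the dot and anchoring to the end keeps only real .csv files
csv_files <- files[grepl("\\.csv$", files)]
csv_files                 # "Dataset_2007.csv" "Dataset_2008.csv"
```

In base R you can also skip the filter step entirely with list.files(pattern = "\\.csv$").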

Here is the key: we need to load one of the files manually and apply our transformation steps to it.

I want to group by Year and Sex and summarize the average salary. These are the transformation steps I want to apply to every file.

Dataset_2007 %>% # I manually loaded this file with Import Dataset in RStudio
  group_by(Year, Sex) %>%
  summarize(Avg_Sal = mean(Salary, na.rm = TRUE))

Let's check if it works. Select the script and run it (Ctrl+Enter in RStudio).

Image by author

It works.
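
One detail in the prototype is worth a closer look: na.rm = T. mean() returns NA if any value is missing, while na.rm = TRUE drops the missing values first, so any function built from the prototype should keep the same argument. A base-R sketch with made-up salaries:

```r
# A missing salary makes the plain mean NA
salary <- c(50000, NA, 70000)

mean(salary)                  # NA
mean(salary, na.rm = TRUE)    # 60000: the NA is dropped before averaging
```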

Since the prototype worked, let’s turn it into a function.

My_Function = function(X) {
  read_csv(X) %>% # X is the name of one csv file
    group_by(Year, Sex) %>%
    summarize(Avg_Sal = mean(Salary, na.rm = TRUE)) # same na.rm as the prototype
}

Here we added in read_csv because we want our function to read the files we have in the directory and apply the transformations.

It's more efficient to do it this way than to load each file into RStudio by hand. We can call this function My_Function. I'm not very imaginative with function names.

Here the function takes X as its input, uses read_csv to read the file, passes the table to group_by on the next line, and then summarizes the average salary.

When we apply it to All_Files, X is first the name of the first file: the function reads that file and performs the set of transformations, then does the same for the second file, and so on.
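
If you want to try the read-then-summarize idea without tidyverse, here is a base-R sketch. It swaps in different tools from the article's code: read.csv() in place of read_csv(), and aggregate() in place of group_by() plus summarize(). The function name summarize_file and the data written to the temporary file are hypothetical.

```r
# Hypothetical base-R counterpart of My_Function: read one csv file,
# then average Salary for each Year/Sex combination.
# aggregate()'s formula interface drops rows with NA by default (na.action = na.omit).
summarize_file <- function(path) {
  df <- read.csv(path)
  aggregate(Salary ~ Year + Sex, data = df, FUN = mean)
}

# Write a tiny made-up file to a temporary location to try it out
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(Year = 2007, Sex = c("F", "F", "M"), Salary = c(50000, 60000, 58000)),
  tmp, row.names = FALSE
)

summarize_file(tmp)   # one row per Year/Sex group
```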

Now let's apply this function to all the files in our All_Files variable. Remember that All_Files holds the names of our files.

R has a neat little function called map (from the purrr package, which loads with tidyverse). map takes your function and applies it to every input you give it, and its variants let you choose whether the results come back as a list, a data frame, or something else. Working with map is beyond the scope of this article, but let's use map_df here: it applies the function to each input and row-binds the results into a single data frame.

map_df(All_Files, My_Function)

Here are the results in a data frame.

Image by author
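
If you are curious what map_df does for you, here is a base-R analogue (not purrr itself): apply the function to each input with lapply(), then stack the resulting data frames with rbind(). The function make_row is a hypothetical stand-in for My_Function.

```r
# make_row is hypothetical: it returns a one-row data frame per input
make_row <- function(year) {
  data.frame(Year = year, Label = paste("Dataset", year))
}

# lapply() gives a list of data frames; do.call(rbind, ...) stacks them by row
results <- do.call(rbind, lapply(c(2007, 2008, 2009), make_row))
results
```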

Now you can export the results to a csv file for your colleagues using R's write.csv function.

map_df(All_Files, My_Function) %>%
  write.csv("My_Results.csv", row.names = FALSE)
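
The row.names = F argument keeps write.csv from adding a column of row numbers. This base-R sketch writes the same made-up table both ways to temporary files and reads them back.

```r
# Hypothetical results table standing in for the output of map_df()
results <- data.frame(Year = c(2007, 2007), Sex = c("F", "M"), Avg_Sal = c(55000, 58000))

with_names    <- tempfile(fileext = ".csv")
without_names <- tempfile(fileext = ".csv")

write.csv(results, with_names)                        # default: row names written as an extra column
write.csv(results, without_names, row.names = FALSE)  # only the real columns

ncol(read.csv(with_names))      # 4: the row-number column comes back as "X"
ncol(read.csv(without_names))   # 3
```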

Here is all of it together.

library(tidyverse) # loading packages
library(rlist)

File_Directory = "Your folder containing your files"
setwd(File_Directory) # setting up the working directory

All_Files = File_Directory %>%
  list.files() %>%
  list.filter(str_detect(., "\\.csv$")) # filtering for csv files

Dataset_2007 %>% # function prototype
  group_by(Year, Sex) %>%
  summarize(Avg_Sal = mean(Salary, na.rm = TRUE))

My_Function = function(X) { # creating the function
  read_csv(X) %>%
    group_by(Year, Sex) %>%
    summarize(Avg_Sal = mean(Salary, na.rm = TRUE))
}

map_df(All_Files, My_Function) %>% # mapping the function over every file
  write.csv("My_Results.csv", row.names = FALSE) # exporting to a csv file called My_Results

This is just a very simple example of using functions in R.

To be honest, things in life rarely work out this neatly. There are always hiccups somewhere. You may try this out today and realize your datasets all have different columns and need cleaning first. Perhaps a package won't load for you. Maybe it's a problem with your computer.

There will always be bugs somewhere to sort out.

I hope you stay encouraged, stay safe, and keep moving forward in your journey!

Translated from: https://towardsdatascience.com/functions-in-r-3648eec4bcf9