How to Correctly Use Lists in R?

asdfgh0077

Tags: r list data-structures language-features abstract-data-type

Original link: https://oldbug.net/q/8bVG/How-to-Correctly-Use-Lists-in-R

Translated from: How to Correctly Use Lists in R?

Brief background: Many (most?) contemporary programming languages in widespread use have at least a handful of ADTs [abstract data types] in common, in particular,

string (a sequence of characters)
list (an ordered collection of values), and
map-based type (an unordered collection that maps keys to values)

In the R programming language, the first two are implemented as character and vector, respectively.

When I began learning R, two things were obvious almost from the start: first, list is the most important data type in R (because it is the parent class for the R data.frame), and second, I just couldn't understand how lists worked, at least not well enough to use them correctly in my code.

For one thing, it seemed to me that R's list data type was a straightforward implementation of the map ADT (dictionary in Python, NSMutableDictionary in Objective C, hash in Perl and Ruby, object literal in Javascript, and so forth).

For instance, you create them just like you would a Python dictionary, by passing key-value pairs to a constructor (which in Python is dict, not list):

x = list("ev1"=10, "ev2"=15, "rv"="Group 1")

And you access the items of an R list just like you would those of a Python dictionary, eg, x['ev1']. Likewise, you can retrieve just the 'keys' or just the 'values' by:

names(x)    # fetch just the 'keys' of an R list
# [1] "ev1" "ev2" "rv"

unlist(x)   # fetch just the 'values' of an R list
#   ev1       ev2        rv 
#  "10"      "15" "Group 1" 

x = list("a"=6, "b"=9, "c"=3)  

sum(unlist(x))
# [1] 18

But R lists are also unlike other map-type ADTs (from among the languages I've learned, anyway). My guess is that this is a consequence of the initial spec for S, ie, an intention to design a data/statistics DSL [domain-specific language] from the ground up.

There are three significant differences between R lists and mapping types in other languages in widespread use (eg, Python, Perl, JavaScript):

First, lists in R are an ordered collection, just like vectors, even though the values are keyed (ie, the keys can be any hashable value, not just sequential integers). Nearly always, the mapping data type in other languages is unordered.

Second, lists can be returned from functions even though you never passed in a list when you called the function, and even though the function that returned the list doesn't contain an (explicit) list constructor. (Of course, you can deal with this in practice by wrapping the returned result in a call to unlist.)

x = strsplit(LETTERS[1:10], "")     # passing in an object of type 'character'

class(x)                            # returns 'list', not a character vector
# [1] "list"
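
To make the strsplit behaviour above concrete, here is a minimal sketch (the variable names letters_split and flat are mine, not from the question). It only illustrates that the result has one list element per input string and that unlist flattens it back:

```r
# strsplit() returns one list element per input string.
letters_split <- strsplit(LETTERS[1:10], "")

class(letters_split)      # "list"
length(letters_split)     # 10, one element per input string
lengths(letters_split)    # all 1 here, because each input is a single character

# unlist() flattens the result back into a plain character vector.
flat <- unlist(letters_split)
identical(flat, LETTERS[1:10])   # TRUE
```

In this particular case nothing is lost by flattening, because every input string yields exactly one piece.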

A third peculiar feature of R's lists: it doesn't seem that they can be members of another ADT, and if you try to do that then the primary container is coerced to a list. Eg,

x = c(0.5, 0.8, 0.23, list(0.5, 0.2, 0.9))   # with recursive=TRUE, c() would return a numeric vector instead

class(x)
# [1] "list"
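
As a hedged sketch of what c() does when one of its arguments is a list (the names mixed, flat and nested are illustrative only): without recursive = TRUE the result is a list, with it the result is flattened to an atomic vector, and a list can be nested inside another list without any coercion.

```r
# Combining atomic values with a list yields a list, since a list is the
# only type here that can hold every argument as-is.
mixed <- c(0.5, 0.8, 0.23, list(0.5, 0.2, 0.9))
class(mixed)      # "list"
length(mixed)     # 6

# With recursive = TRUE, c() descends into the list and returns an atomic vector.
flat <- c(0.5, 0.8, 0.23, list(0.5, 0.2, 0.9), recursive = TRUE)
class(flat)       # "numeric"

# A list can itself be an element of another list.
nested <- list(1, list(2, 3))
class(nested[[2]])   # "list"
```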

My intention here is not to criticize the language or how it is documented; likewise, I'm not suggesting there is anything wrong with the list data structure or how it behaves. All I'm after is to correct my understanding of how lists work so I can use them correctly in my code.

Here are the sorts of things I'd like to better understand:

What are the rules which determine when a function call will return a list (eg, the strsplit expression recited above)?
If I don't explicitly assign names to a list (eg, list(10,20,30,40)), are the default names just sequential integers beginning with 1? (I assume so, but I am far from certain that the answer is yes; otherwise we wouldn't be able to coerce this type of list to a vector with a call to unlist.)
Why do these two different operators, [] and [[]], return the same result?
x = list(1, 2, 3, 4)

Both expressions return "1":
x[1]

x[[1]]
Why do these two expressions not return the same result?
x = list(1, 2, 3, 4)

x2 = list(1:4)

Please don't point me to the R Documentation (?list, R-intro)--I have read it carefully and it does not help me answer the type of questions I recited just above.

(Lastly, I recently learned of and began using an R package (available on CRAN) called hash, which implements conventional map-type behavior via an S4 class; I can certainly recommend this package.)

Reply #1

Reference: https://stackoom.com/question/8bVG/如何在R中正确使用列表

Reply #2

Regarding your questions, let me address them in order and give some examples:

1 ) A list is returned whenever the value the function returns is a list; there is no rule beyond that. Consider

 R> retList <- function() return(list(1,2,3,4)); class(retList())
 [1] "list"
 R> notList <- function() return(c(1,2,3,4)); class(notList())
 [1] "numeric"
 R>

2 ) Names are simply not set:

R> retList <- function() return(list(1,2,3,4)); names(retList())
NULL
R>
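
A small sketch of what "names are not set" means in practice (the variables unnamed and partial are made up for illustration): positional indexing and unlist do not depend on names at all, and a partially named list gets empty strings for the missing names.

```r
unnamed <- list(10, 20, 30, 40)

is.null(names(unnamed))   # TRUE: no default names are created
unnamed[[3]]              # 30, positional indexing needs no names
unlist(unnamed)           # an unnamed numeric vector: 10 20 30 40
seq_along(unnamed)        # 1 2 3 4, these are positions, not names

# If only some elements are named, the others get the empty string as name.
partial <- list(a = 1, 2)
names(partial)            # "a" ""
```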

3 ) They do not return the same thing. Your example gives

R> x <- list(1,2,3,4)
R> x[1]
[[1]]
[1] 1
R> x[[1]]
[1] 1

where x[1] returns a list of length one that contains the first element of x, whereas x[[1]] returns the first element itself. That element is a numeric vector of length one, since every scalar in R is a vector of length one.

4 ) Lastly, the two are different because they create, respectively, a list containing four scalars and a list with a single element (that happens to be a vector of four elements).

Reply #3

Just to take a subset of your questions:

This article on indexing addresses the question of the difference between [] and [[]].

In short, [[]] selects a single item from a list and [] returns a list of the selected items. In your example, x = list(1, 2, 3, 4), item 1 is a single number, so x[[1]] returns a single 1 while x[1] returns a list with only one value.

> x = list(1, 2, 3, 4)
> x[1]
[[1]]
[1] 1

> x[[1]]
[1] 1
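
The printed output above looks similar for both forms, so here is a short sketch that makes the difference visible through class and arithmetic (the variable res and the tryCatch wrapper are only there so the example runs without stopping):

```r
x <- list(1, 2, 3, 4)

class(x[1])       # "list": [ always returns a list
class(x[[1]])     # "numeric": [[ returns the element itself
length(x[2:3])    # 2: [ can select several elements at once

x[[1]] + 1        # 2: arithmetic works on the extracted element

# Arithmetic on the one-element list fails ("non-numeric argument to binary
# operator"); the error is caught here so the script keeps running.
res <- tryCatch(x[1] + 1, error = function(e) "error")
res               # "error"
```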

Reply #4

Just to address the last part of your question, since that really points out the difference between a list and a vector in R:

Why do these two expressions not return the same result?

x = list(1, 2, 3, 4); x2 = list(1:4)

A list can contain any other class as each element. So you can have a list where the first element is a character vector, the second is a data frame, etc. In this case, you have created two different lists. x has four vectors, each of length 1. x2 has 1 vector of length 4:

> length(x[[1]])
[1] 1
> length(x2[[1]])
[1] 4

So these are completely different lists.

R lists are very much like a hash map data structure in that each index value can be associated with any object. Here's a simple example of a list that contains 3 different classes (including a function):

> complicated.list <- list("a"=1:4, "b"=1:3, "c"=matrix(1:4, nrow=2), "d"=search)
> lapply(complicated.list, class)
$a
[1] "integer"
$b
[1] "integer"
$c
[1] "matrix"
$d
[1] "function"

Given that the last element is the search function, I can call it like so:

> complicated.list[["d"]]()
[1] ".GlobalEnv" ...

As a final comment on this: it should be noted that a data.frame is really a list (from the data.frame documentation):

A data frame is a list of variables of the same number of rows with unique row names, given class '"data.frame"'

That's why columns in a data.frame can have different data types, while columns in a matrix cannot. As an example, here I try to create a matrix with numbers and characters:

> a <- 1:4
> class(a)
[1] "integer"
> b <- c("a","b","c","d")
> d <- cbind(a, b)
> d
     a   b  
[1,] "1" "a"
[2,] "2" "b"
[3,] "3" "c"
[4,] "4" "d"
> class(d[,1])
[1] "character"

Note how I cannot change the data type in the first column to numeric because the second column has characters:

> d[,1] <- as.numeric(d[,1])
> class(d[,1])
[1] "character"
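
For contrast, a rough sketch of the same data held in a data.frame rather than a matrix (the name df is mine, and stringsAsFactors = FALSE is passed explicitly so the result does not depend on the R version). Because a data.frame is a list of columns, each column keeps its own type:

```r
df <- data.frame(a = 1:4, b = c("a", "b", "c", "d"), stringsAsFactors = FALSE)

is.list(df)          # TRUE: a data.frame is a list underneath
typeof(df)           # "list"
length(df)           # 2: one list element per column
sapply(df, class)    # a is "integer", b is "character"

# Converting one column does not affect the other.
df$a <- as.numeric(df$a)
class(df$a)          # "numeric"
class(df[["b"]])     # "character"
```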

Reply #5

One reason lists work as they do (ordered) is to address the need for an ordered container that can contain any type at any node, which vectors cannot do. Lists are re-used for a variety of purposes in R, including forming the base of a data.frame, which is a list of vectors of arbitrary type (but the same length).

Why do these two expressions not return the same result?

x = list(1, 2, 3, 4); x2 = list(1:4)

To add to @Shane's answer, if you wanted to get the same result, try:

x3 = as.list(1:4)

This coerces the vector 1:4 into a list with one element per value.
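
One detail worth checking for yourself, sketched below: as.list(1:4) has the same shape as list(1, 2, 3, 4), but 1:4 is an integer vector while the literals 1, 2, 3, 4 are doubles, so identical() tells the two apart.

```r
x  <- list(1, 2, 3, 4)
x3 <- as.list(1:4)

length(x3)                               # 4, same shape as x
identical(x3, x)                         # FALSE: x3 holds integers, x holds doubles
identical(x3, list(1L, 2L, 3L, 4L))      # TRUE
identical(as.list(as.numeric(1:4)), x)   # TRUE once the types match
```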

Reply #6

You say:

For another, lists can be returned from functions even though you never passed in a List when you called the function, and even though the function doesn't contain a List constructor, eg,

x = strsplit(LETTERS[1:10], "") # passing in an object of type 'character'
class(x)
# => 'list'

And I guess you suggest that this is a problem(?). I'm here to tell you why it's not a problem :-). Your example is a bit simple, in that when you do the string-split, you have a list with elements that are 1 element long, so you know that x[[1]] is the same as unlist(x)[1]. But what if strsplit returned results of a different length in each bin? Simply returning a vector (vs. a list) won't do at all.

For instance:

stuff <- c("You, me, and dupree",  "You me, and dupree",
           "He ran away, but not very far, and not very fast")
x <- strsplit(stuff, ",")
xx <- unlist(strsplit(stuff, ","))

In the first case (x, which is a list), you can tell what the 2nd "part" of the 3rd string was, eg: x[[3]][2]. How could you do the same using xx now that the results have been "unraveled" (unlist-ed)?
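
To spell that out, here is a sketch using the same stuff vector (the variable offset is mine). With the list you index by string and then by part; with the flattened vector you have to reconstruct the position yourself from the piece counts, which are only available from the list:

```r
stuff <- c("You, me, and dupree",  "You me, and dupree",
           "He ran away, but not very far, and not very fast")
x  <- strsplit(stuff, ",")
xx <- unlist(x)

lengths(x)    # 3 2 3: the strings split into different numbers of parts
x[[3]][2]     # " but not very far"

length(xx)    # 8: the grouping by input string is gone

# Recovering the same piece from xx needs the piece counts of the earlier strings.
offset <- sum(lengths(x)[1:2])
xx[offset + 2]   # " but not very far"
```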
