
Strings (sequences of characters), lists (ordered collections of values), and map-based types (unordered arrays that map keys to values)


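When I started learning R, two things were apparent almost from the start: lists are the most important data type in R (because the list is the parent class of the R data.frame), and I just didn't understand how they worked, at least not well enough to use them correctly in my code.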

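First, it seemed to me that R's list data type was a straightforward implementation of the map ADT (dictionary in Python, NSMutableDictionary in Objective-C, hash in Perl and Ruby, object literal in JavaScript, and so forth).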


x = list("ev1"=10, "ev2"=15, "rv"="Group 1")

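Accessing items in an R list is just like accessing items in a Python dictionary, e.g., x['ev1']. Likewise, you can retrieve just the 'keys' or just the 'values' by: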

names(x)    # fetch just the 'keys' of an R list
# [1] "ev1" "ev2" "rv"

unlist(x)   # fetch just the 'values' of an R list
#   ev1       ev2        rv 
#  "10"      "15" "Group 1" 
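
A minimal sketch of how the three access forms differ on the list above; the variable names `sub`, `val`, and `dollar` are mine, not from the original post. Unlike a Python dictionary lookup, single brackets return a one-element list rather than the stored value.

```r
x <- list("ev1" = 10, "ev2" = 15, "rv" = "Group 1")

sub    <- x["ev1"]      # single brackets: a list of length 1 that keeps the name
val    <- x[["ev1"]]    # double brackets: the stored value itself
dollar <- x$ev1         # $ is shorthand for [[ ]] with a literal name

class(sub)              # "list"
class(val)              # "numeric"
identical(val, dollar)  # TRUE
```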

x = list("a"=6, "b"=9, "c"=3)
sum(unlist(x))
# [1] 18


Three significant differences between R lists and mapping types in other widely used languages (e.g., Python, Perl, JavaScript):



x = strsplit(LETTERS[1:10], "")     # passing in an object of type 'character'

class(x)                            # returns 'list', not a vector of length 2
# [1] "list"
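
One way to see why strsplit has to return a list: each input string can split into a different number of pieces, which a plain character vector cannot represent. The input strings below are made up for illustration.

```r
parts <- strsplit(c("a b", "c d e"), " ")

class(parts)      # "list"
length(parts)     # 2, one element per input string
lengths(parts)    # 2 3, the number of pieces differs per string
parts[[2]]        # "c" "d" "e"
```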


x = c(0.5, 0.8, 0.23, list(0.5, 0.2, 0.9))   # recursive=TRUE would flatten this to a numeric vector instead
class(x)
# [1] "list"
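
A short check, under my own variable names, of how c() treats a list argument. Without recursive=TRUE the result is a list; with it, c() flattens everything into an atomic vector.

```r
as_list <- c(0.5, 0.8, 0.23, list(0.5, 0.2, 0.9))
flat    <- c(0.5, 0.8, 0.23, list(0.5, 0.2, 0.9), recursive = TRUE)

class(as_list)   # "list"
length(as_list)  # 6
class(flat)      # "numeric"
length(flat)     # 6
```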



What are the rules which determine when a function call will return a list (e.g., the strsplit expression recited above)?

If I don't explicitly assign names to a list (e.g., list(10,20,30,40)), are the default names just sequential integers beginning with 1? (I assume, but I am far from certain, that the answer is yes; otherwise we wouldn't be able to coerce this type of list to a vector with a call to unlist.)

Why do these two different operators, [] and [[]], return the same result?

x = list(1, 2, 3, 4)

Both expressions return "1":

x[1]
x[[1]]

Why do these two expressions not return the same result?

x = list(1, 2, 3, 4)
x2 = list(1:4)







Atomic vector: I called that a "sequence" for myself; no direction, just a sequence of values of the same type. [ subsets.

Vector: the sequence with one direction from 2D. [ subsets.

Matrix: a bunch of vectors of the same length forming rows or columns. [ subsets by rows and columns, or by sequence.

Array: layered matrices forming 3D.

Data frame: a 2D table like in Excel, where I can sort, add or remove rows or columns, or do arithmetic operations with them. Only after some time did I truly recognize that a data frame is a clever implementation of a list, where I can subset using [ by rows and columns, but also using [[.

List: to help myself, I thought about the list as a tree structure where [i] selects and returns whole branches and [[i]] returns the item from the branch. And because it is a tree-like structure, you can even use an index sequence to address every single leaf of a very complex list using [[index_vector]]. Lists can be simple or very complex and can mix various types of objects into one.


l <- list("aaa",5,list(1:3),LETTERS[1:4],matrix(1:9,3,3))
l[[c(5,4)]] # selects 4 from matrix using [[index_vector]] in list
l[[5]][4] # selects 4 from matrix using sequential index in matrix
l[[5]][1,2] # selects 4 from matrix using row and column in matrix
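
The remark that a data frame is a list underneath can be checked with a small made-up data frame (`df` and its columns are illustrative only):

```r
df <- data.frame(id = 1:3, label = c("a", "b", "c"), stringsAsFactors = FALSE)

is.list(df)        # TRUE: each column is one list element
length(df)         # 2, the number of columns
df[["label"]]      # "a" "b" "c", extracted like a list element
df[2, "label"]     # "b", matrix-style row and column subsetting
class(df["id"])    # "data.frame", single brackets keep the container
```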




x = list(1, 2, 3, 4)

[ ] performs subsetting. In general, a subset of any object has the same type as the original object. Therefore x[1] is a list. Similarly, x[1:2] is a subset of the original list, so it is also a list. For example:

> x[1:2]
[[1]]
[1] 1

[[2]]
[1] 2

[[ ]] extracts a single element from the list. x[[1]] is valid and extracts the first element. x[[1:2]] is not valid here, because [[ ]] does not subset the way [ ] does; a vector index is instead read as a path into nested lists.

> x[[2]]
[1] 2
> x[[2:3]]
Error in x[[2:3]] : subscript out of bounds
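
The same distinction can be verified programmatically. This is only a sketch; `err` is a name I introduce to capture the error message.

```r
x <- list(1, 2, 3, 4)

class(x[1])      # "list": a sub-list of length 1
class(x[1:2])    # "list": a sub-list of length 2
class(x[[1]])    # "numeric": the element itself

# A vector index inside [[ ]] is read as a path into nested lists,
# so x[[2:3]] means x[[2]][[3]], which does not exist here.
err <- tryCatch(x[[2:3]], error = function(e) conditionMessage(e))
err              # holds the error message as a character string
```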



 R> retList <- function() return(list(1,2,3,4)); class(retList())
 [1] "list"
 R> notList <- function() return(c(1,2,3,4)); class(notList())
 [1] "numeric"


R> retList <- function() return(list(1,2,3,4)); names(retList())
NULL
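
To address the question about default names directly: an unnamed list has no names at all (names() returns NULL), and positions are used instead, so unlist does not need names to work. A sketch with illustrative variable names:

```r
unnamed <- list(10, 20, 30, 40)

had_no_names <- is.null(names(unnamed))   # TRUE: there are no default names
flat <- unlist(unnamed)                   # 10 20 30 40, flattened by position

# Names can be attached afterwards, and may cover only some elements.
names(unnamed) <- c("a", "b", "", "d")
unnamed[["b"]]                            # 20
```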


R> x <- list(1,2,3,4)
R> x[1]
[[1]]
[1] 1

R> x[[1]]
[1] 1








x = list(1, 2, 3, 4)
x2 = list(1:4)

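They are not the same, because 1:4 is equivalent to c(1,2,3,4) (apart from being stored as integers), so list(1:4) holds one vector of length 4 rather than four separate elements. If you want them to be the same, then: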

x = list(c(1,2,3,4))
x2 = list(1:4)
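
A quick check of the difference, with the caveat that 1:4 produces integers while c(1,2,3,4) produces doubles, so the two lists compare equal in value but are not identical(). The names `four`, `one_dbl`, and `one_int` are mine.

```r
four    <- list(1, 2, 3, 4)      # four elements, each a length-1 vector
one_dbl <- list(c(1, 2, 3, 4))   # one element, a length-4 double vector
one_int <- list(1:4)             # one element, a length-4 integer vector

length(four)                         # 4
length(one_int)                      # 1
isTRUE(all.equal(one_dbl, one_int))  # TRUE: equal in value
identical(one_dbl, one_int)          # FALSE: double vs integer storage
```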


Vectors are the atoms of R. E.g., rpois(1e4,5) (10,000 random Poisson draws with mean 5), numeric(55) (a length-55 zero vector of doubles), and character(12) (12 empty strings) are all "basic".

Either lists or vectors can have names.

> n = numeric(10)
> n
 [1] 0 0 0 0 0 0 0 0 0 0
> names(n)
NULL
> names(n) = LETTERS[1:10]
> n
A B C D E F G H I J
0 0 0 0 0 0 0 0 0 0

Vectors require everything to be the same data type. Watch this:

> i = integer(5)
> v = c(n,i)
> v
A B C D E F G H I J
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> class(v)
[1] "numeric"
> i = complex(5)
> v = c(n,i)
> class(v)
[1] "complex"
> v
   A    B    C    D    E    F    G    H    I    J
0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i

Lists can contain varying data types, as seen in other answers and the OP's question itself.
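
The coercion seen above follows a fixed ordering when c() combines atomic types (logical, integer, double, complex, character, from least to most general). A sketch with made-up values:

```r
class(c(TRUE, 1L))        # "integer"
class(c(1L, 2.5))         # "numeric"
class(c(2.5, 1i))         # "complex"
class(c(1i, "a"))         # "character"

# A list keeps each element's own type instead of coercing.
mixed <- list(TRUE, 1L, 2.5, 1i, "a")
sapply(mixed, class)      # "logical" "integer" "numeric" "complex" "character"
```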

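I've seen some languages (Ruby, JavaScript) in which "arrays" may contain variable data types, but in C++, for example, "arrays" must all be the same data type. I believe this is a speed/efficiency thing: if you have a numeric(1e6), you know its size and the location of every element a priori; if the thing might contain "Flying Purple People Eaters" in some unknown slice, then you have to actually parse the contents to know basic facts about it.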
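
One rough way to see the cost, assuming memory footprint is a fair proxy for the efficiency argument: compare the size of a numeric vector with the same values stored as a list. Exact byte counts depend on the R version and platform, so none are shown.

```r
vec <- numeric(1e5)     # one contiguous block of doubles
lst <- as.list(vec)     # 1e5 separate length-1 vectors plus pointers to them

vec_bytes <- as.numeric(utils::object.size(vec))
lst_bytes <- as.numeric(utils::object.size(lst))

lst_bytes > vec_bytes   # TRUE: the list carries per-element overhead
```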




