R swirl Tutorial (R Programming) 6: Subsetting Vectors

| In this lesson, we’ll see how to extract elements from a vector based on some conditions that we specify.

| For example, we may only be interested in the first 20 elements of a vector, or only the elements that are not NA, or only those that are positive or correspond to a specific variable of interest. By the end of this lesson, you’ll know how to handle each of these scenarios.

| I’ve created for you a vector called x that contains a random ordering of 20 numbers (from a standard normal distribution) and 20 NAs. Type x now to see what it looks like.

x
[1] 0.618804705 NA -0.561717045 NA NA 2.121961845 NA NA
[9] NA NA NA NA -0.116223284 -1.115846510 NA -1.404021991
[17] -0.902626087 NA -1.200279418 -0.171053254 0.729439833 NA 0.353889277 NA
[25] NA NA 1.005925106 NA -1.679218407 -0.670461758 NA -0.443677827
[33] NA -0.276915842 0.007862519 NA -0.047982745 -1.334484562 -1.102239409 NA

| The way you tell R that you want to select some particular elements (i.e. a ‘subset’) from a vector is by placing an ‘index vector’ in square brackets immediately following the name of the vector.

| For a simple example, try x[1:10] to view the first ten elements of x.

x[1:10]
[1] 0.6188047 NA -0.5617170 NA NA 2.1219618 NA NA NA
[10] NA

| Index vectors come in four different flavors – logical vectors, vectors of positive integers, vectors of negative integers, and vectors of character strings – each of which we’ll cover in this lesson.

| Let’s start by indexing with logical vectors. One common scenario when working with real-world data is that we want to extract all elements of a vector that are not NA (i.e. missing data). Recall that is.na(x) yields a vector of logical values the same length as x, with TRUEs corresponding to NA values in x and FALSEs corresponding to non-NA values in x.

| What do you think x[is.na(x)] will give you?

1: A vector with no NAs
2: A vector of length 0
3: A vector of all NAs
4: A vector of TRUEs and FALSEs

Selection: 3

| Prove it to yourself by typing x[is.na(x)].

x[is.na(x)]
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
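To see the mechanics on something smaller than x, here is a minimal sketch using a hypothetical five-element vector v (not part of the lesson). It shows that a logical index keeps exactly the positions where the index is TRUE.

```r
# Hypothetical small vector standing in for the lesson's x
v <- c(1.5, NA, -0.3, NA, 2.0)

mask <- is.na(v)   # logical vector, same length as v
mask               # FALSE TRUE FALSE TRUE FALSE

v[mask]            # keeps positions where mask is TRUE: NA NA
v[!mask]           # keeps positions where mask is FALSE: 1.5 -0.3 2.0
```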

| Recall that ! gives us the negation of a logical expression, so !is.na(x) can be read as ‘is not NA’. Therefore, if we want to create a vector called y that contains all of the non-NA values from x, we can use y <- x[!is.na(x)]. Give it a try.

y <- x[!is.na(x)]

| Print y to the console.

y
[1] 0.618804705 -0.561717045 2.121961845 -0.116223284 -1.115846510 -1.404021991 -0.902626087 -1.200279418
[9] -0.171053254 0.729439833 0.353889277 1.005925106 -1.679218407 -0.670461758 -0.443677827 -0.276915842
[17] 0.007862519 -0.047982745 -1.334484562 -1.102239409

| Now that we’ve isolated the non-missing values of x and put them in y, we can subset y as we please.

| Recall that the expression y > 0 will give us a vector of logical values the same length as y, with TRUEs corresponding to values of y that are greater than zero and FALSEs corresponding to values of y that are less than or equal to zero. What do you think y[y > 0] will give you?

1: A vector of length 0
2: A vector of all NAs
3: A vector of all the negative elements of y
4: A vector of all the positive elements of y
5: A vector of TRUEs and FALSEs

Selection: 4

| Type y[y > 0] to see that we get all of the positive elements of y, which are also the positive elements of our original vector x.

y[y > 0]
[1] 0.618804705 2.121961845 0.729439833 0.353889277 1.005925106 0.007862519

| Keep working like that and you’ll get there!

| You might wonder why we didn’t just start with x[x > 0] to isolate the positive elements of x. Try that now to see why.

x[x > 0]
[1] 0.618804705 NA NA NA 2.121961845 NA NA NA
[9] NA NA NA NA NA 0.729439833 NA 0.353889277
[17] NA NA NA 1.005925106 NA NA NA 0.007862519
[25] NA NA

| Since NA is not a value, but rather a placeholder for an unknown quantity, the expression NA > 0 evaluates to NA. Hence we get a bunch of NAs mixed in with our positive numbers when we do this.
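The same behavior can be reproduced on a tiny vector. This is only an illustrative sketch; v is a hypothetical vector, not the lesson's x.

```r
NA > 0             # NA: the result of the comparison is unknown

v <- c(2, NA, -1)  # hypothetical vector with one missing value
v > 0              # TRUE NA FALSE
v[v > 0]           # 2 NA: an NA in the index yields an NA in the result
```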

| Combining our knowledge of logical operators with our new knowledge of subsetting, we could do this – x[!is.na(x) & x > 0]. Try it out.

x[!is.na(x) & x > 0]
[1] 0.618804705 2.121961845 0.729439833 0.353889277 1.005925106 0.007862519

| In this case, we request only values of x that are both non-missing AND greater than zero.
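It may help to see why the combined condition removes the NAs. In R, FALSE & NA evaluates to FALSE, so wherever !is.na() is FALSE the whole condition is FALSE, no matter what the comparison returned. A short sketch with a hypothetical vector v:

```r
v <- c(2, NA, -1)            # hypothetical vector with one missing value

!is.na(v)                    # TRUE FALSE TRUE
v > 0                        # TRUE NA FALSE

keep <- !is.na(v) & v > 0    # TRUE FALSE FALSE, because FALSE & NA is FALSE
v[keep]                      # 2
```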

| I’ve already shown you how to subset just the first ten values of x using x[1:10]. In this case, we’re providing a vector of positive integers inside of the square brackets, which tells R to return only the elements of x numbered 1 through 10.

| Many programming languages use what’s called ‘zero-based indexing’, which means that the first element of a vector is considered element 0. R uses ‘one-based indexing’, which (you guessed it!) means the first element of a vector is considered element 1.

| Can you figure out how we’d subset the 3rd, 5th, and 7th elements of x? Hint – Use the c() function to specify the element numbers as a numeric vector.

x[c(1,3,5)]
[1] 0.6188047 -0.5617170 NA

| You almost had it, but not quite. Try again. Or, type info() for more options.

| Create a vector of indexes with c(3, 5, 7), then put that inside of the square brackets.

x[c(3,5,7)]
[1] -0.561717 NA NA
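As a small illustration of positive integer indexes and one-based indexing, the sketch below uses a hypothetical character vector so that each position is easy to recognize. It also shows that the result follows the order of the index vector and that an index may be repeated.

```r
# Hypothetical vector whose values make the positions obvious
letters_vec <- c("a", "b", "c", "d", "e", "f", "g")

letters_vec[1]           # "a": the first element has index 1
letters_vec[c(3, 5, 7)]  # "c" "e" "g"
letters_vec[c(7, 3, 3)]  # "g" "c" "c": order and repeats are honored
```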

| It’s important that when using integer vectors to subset our vector x, we stick with the set of indexes {1, 2, …, 40} since x only has 40 elements. What happens if we ask for the zeroth element of x (i.e. x[0])? Give it a try.

x[0]
numeric(0)

| As you might expect, we get nothing useful. Unfortunately, R doesn’t prevent us from doing this. What if we ask for the 3000th element of x? Try it out.

x[3000]
[1] NA

| Again, nothing useful, but R doesn’t prevent us from asking for it. This should be a cautionary tale. You should always make sure that what you are asking for is within the bounds of the vector you’re working with.
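One possible way to stay within bounds is to filter the requested positions against length() before subsetting. This is just a sketch; v, idx, and valid are hypothetical names introduced for illustration.

```r
v <- c(10, 20, 30)
idx <- c(0, 2, 3000)   # hypothetical requested positions

v[idx]                 # 20 NA: index 0 is silently dropped, 3000 gives NA

# Keep only the positions that actually exist in v
valid <- idx >= 1 & idx <= length(v)
v[idx[valid]]          # 20
```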

| What if we’re interested in all elements of x EXCEPT the 2nd and 10th? It would be pretty tedious to construct a vector containing all numbers 1 through 40 EXCEPT 2 and 10.

| Luckily, R accepts negative integer indexes. Whereas x[c(2, 10)] gives us ONLY the 2nd and 10th elements of x, x[c(-2, -10)] gives us all elements of x EXCEPT for the 2nd and 10th elements. Try x[c(-2, -10)] now to see this.

x[c(-2, -10)]
[1] 0.618804705 -0.561717045 NA NA 2.121961845 NA NA NA
[9] NA NA -0.116223284 -1.115846510 NA -1.404021991 -0.902626087 NA
[17] -1.200279418 -0.171053254 0.729439833 NA 0.353889277 NA NA NA
[25] 1.005925106 NA -1.679218407 -0.670461758 NA -0.443677827 NA -0.276915842
[33] 0.007862519 NA -0.047982745 -1.334484562 -1.102239409 NA

| A shorthand way of specifying multiple negative numbers is to put the negative sign out in front of the vector of positive numbers. Type x[-c(2, 10)] to get the exact same result.

x[-c(2, 10)]
[1] 0.618804705 -0.561717045 NA NA 2.121961845 NA NA NA
[9] NA NA -0.116223284 -1.115846510 NA -1.404021991 -0.902626087 NA
[17] -1.200279418 -0.171053254 0.729439833 NA 0.353889277 NA NA NA
[25] 1.005925106 NA -1.679218407 -0.670461758 NA -0.443677827 NA -0.276915842
[33] 0.007862519 NA -0.047982745 -1.334484562 -1.102239409 NA
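The two spellings can be compared directly on a short hypothetical vector v. The sketch also illustrates a related rule: positive and negative indexes cannot be mixed in one index vector, which R reports as an error (caught here with tryCatch() so the script keeps running).

```r
v <- c(10, 20, 30, 40, 50)             # hypothetical vector

v[c(-2, -4)]                           # 10 30 50
identical(v[-c(2, 4)], v[c(-2, -4)])   # TRUE: both forms drop elements 2 and 4

# Mixing signs is not allowed; the error is caught for illustration only
mixed <- tryCatch(v[c(-1, 2)], error = function(e) "error")
mixed                                  # "error"
```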

| So far, we’ve covered three types of index vectors – logical, positive integer, and negative integer. The only remaining type requires us to introduce the concept of ‘named’ elements.

| Create a numeric vector with three named elements using vect <- c(foo = 11, bar = 2, norf = NA).

vect <- c(foo = 11, bar = 2, norf = NA)

| Your dedication is inspiring!

| When we print vect to the console, you’ll see that each element has a name. Try it out.

vect
foo bar norf
11 2 NA

| We can also get the names of vect by passing vect as an argument to the names() function. Give that a try.

names(vect)
[1] "foo" "bar" "norf"

| Alternatively, we can create an unnamed vector vect2 with c(11, 2, NA). Do that now.

c(11, 2, NA)
[1] 11 2 NA

| You’re close…I can feel it! Try it again. Or, type info() for more options.

| Create an ordinary (unnamed) vector called vect2 that contains c(11, 2, NA).

vect2 <- c(11, 2, NA)

| Then, we can add the names attribute to vect2 after the fact with names(vect2) <- c("foo", "bar", "norf"). Go ahead.

names(vect2) <- c("foo", "bar", "norf")

| Now, let’s check that vect and vect2 are the same by passing them as arguments to the identical() function.

identical(vect, vect2)
[1] TRUE

| Indeed, vect and vect2 are identical named vectors.
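The same two routes to a named vector can be sketched with different data. The vector scores and its names are hypothetical; the point is that names() returns NULL for an unnamed vector, and that assigning names afterwards produces the same object as naming the elements inside c().

```r
scores <- c(90, 75, NA)                   # hypothetical unnamed vector
names(scores)                             # NULL: no names attribute yet

names(scores) <- c("ann", "bob", "cy")    # attach names after the fact
named <- c(ann = 90, bob = 75, cy = NA)   # or name the elements up front

identical(scores, named)                  # TRUE
```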

| Now, back to the matter of subsetting a vector by named elements. Which of the following commands do you think would give us the second element of vect?

1: vect[bar]
2: vect["bar"]
3: vect["2"]

Selection: 2

| Now, try it out.

vect["bar"]
bar
2

| Likewise, we can specify a vector of names with vect[c("foo", "bar")]. Try it out.

vect[c("foo", "bar")]
foo bar
11 2
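Two details of character indexes are worth a quick sketch: the result follows the order of the names requested, and asking for a name that does not exist returns NA rather than an error. The name "baz" below is a hypothetical one chosen because it is absent from the vector.

```r
v <- c(foo = 11, bar = 2, norf = NA)

reordered <- v[c("bar", "foo")]   # elements come back in the order requested
names(reordered)                  # "bar" "foo"

absent <- v["baz"]                # "baz" is not a name in v
is.na(absent)                     # TRUE: a missing name yields NA, not an error
```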

| Now you know all four methods of subsetting data from vectors. Different approaches are best in different scenarios and when in doubt, try it out!
