R Programming -- data frames

  1. Data Frames

    The weights, prices, and types data structures are all deeply tied together, if you think about it. If you add a new weight sample, you need to remember to add a new price and type, or risk everything falling out of sync. To avoid trouble, it would be nice if we could tie all these variables together in a single data structure.

    Fortunately, R has a structure for just this purpose: the data frame. You can think of a data frame as something akin to a database table or an Excel spreadsheet. It has a specific number of columns, each of which is expected to contain values of a particular type. It also has an indeterminate number of rows - sets of related values for each column.

  2. Data Frames 6.1

    Our vectors with treasure chest data are perfect candidates for conversion to a data frame. And it's easy to do. Call the data.frame function, and pass weights, prices, and types as the arguments. Assign the result to the treasure variable:

    > treasure <- data.frame(weights, prices, types)
    
  3. Now, try printing treasure to see its contents:

    > print(treasure)
      weights prices  types
    1     300   9000   gold
    2     200   5000 silver
    3     100  12000   gems
    4     250   7500   gold
    5     150  18000   gems
    

    There's your new data frame, neatly organized into rows, with column names (derived from the variable names) across the top.
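
    For readers following along outside the tutorial, here is a minimal self-contained sketch of the same step. The three vectors come from an earlier chapter and are not defined in this section, so the values below are copied from the printed table. Storing types as a factor is an assumption, chosen because the tutorial later prints a "Levels:" line for that column.

```r
# Values copied from the printed treasure table above.
weights <- c(300, 200, 100, 250, 150)
prices  <- c(9000, 5000, 12000, 7500, 18000)
# Assumption: types is a factor, which matches the "Levels:" line the
# tutorial prints for treasure$types.
types   <- factor(c("gold", "silver", "gems", "gold", "gems"))

# Column names are taken from the names of the variables passed in.
treasure <- data.frame(weights, prices, types)

print(treasure)
str(treasure)   # compact summary: one line per column with its type
```

    Because all three columns now live in one object, adding or dropping a row affects every column at once, which is the synchronization problem the data frame is meant to solve.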

  4. Data Frame Access 6.2

    Just as with matrices, it's easy to access individual portions of a data frame.

    You can get individual columns by providing their index number in double-brackets. Try getting the second column (prices) of treasure:

    > treasure[[2]]
    [1]  9000  5000 12000  7500 18000
    
  5. You could instead provide a column name as a string in double-brackets. (This is often more readable.) Retrieve the "weights" column:

    > treasure[["weights"]]
    [1] 300 200 100 250 150
    
  6. Typing all those brackets can get tedious, so there's also a shorthand notation: the data frame name, a dollar sign, and the column name (without quotes). Try using it to get the "prices" column:

    > treasure$prices
    [1]  9000  5000 12000  7500 18000
    
  7. Now try getting the "types" column:

    > treasure$types
    [1] gold   silver gems   gold   gems  
    Levels: gems gold silver
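
    The three access forms covered in this section can be compared side by side. The sketch below rebuilds the treasure frame so it runs on its own (types as a factor is again an assumption), and the variable names by_index, by_name, by_dollar, and wanted are made up for illustration.

```r
# Rebuild the tutorial's treasure frame (types as a factor is an assumption).
treasure <- data.frame(
  weights = c(300, 200, 100, 250, 150),
  prices  = c(9000, 5000, 12000, 7500, 18000),
  types   = factor(c("gold", "silver", "gems", "gold", "gems"))
)

by_index  <- treasure[[2]]          # second column, by position
by_name   <- treasure[["prices"]]   # same column, by name
by_dollar <- treasure$prices        # shorthand for the line above

# All three forms return the same plain vector.
same <- identical(by_index, by_name) && identical(by_name, by_dollar)
print(same)

# Double brackets also accept a name held in a variable. The dollar form
# does not, because it treats what follows as a literal column name.
wanted <- "weights"
print(treasure[[wanted]])
```

    Access by name tends to survive column reordering better than access by position, which is one reason the tutorial calls it more readable.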
    
  8. Loading Data Frames 6.3

    Typing in all your data by hand only works up to a point, which is why R can load data from external files.

    We've created a couple of data files for you to experiment with:

    > list.files()
    [1] "infantry.txt" "targets.csv"
    

    Our "targets.csv" file is in the CSV (Comma Separated Values) format exported by many popular spreadsheet programs. Here's what its content looks like:

    "Port","Population","Worth"
    "Cartagena",35000,10000
    "Porto Bello",49000,15000
    "Havana",140000,50000
    "Panama City",105000,35000
    

    You can load a CSV file's content into a data frame by passing the file name to the read.csv function. Try it with the "targets.csv" file:

    > read.csv("targets.csv")
             Port Population Worth
    1   Cartagena      35000 10000
    2 Porto Bello      49000 15000
    3      Havana     140000 50000
    4 Panama City     105000 35000
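
    To try this outside the tutorial, where targets.csv is not supplied, the sketch below first writes the sample content to a temporary file and then reads it back. The csv_path variable and the use of tempdir() are choices made for this example only.

```r
# Write the sample content to a temporary file so the example is
# self-contained; the tutorial supplies targets.csv for you.
csv_path <- file.path(tempdir(), "targets.csv")
writeLines(c(
  '"Port","Population","Worth"',
  '"Cartagena",35000,10000',
  '"Porto Bello",49000,15000',
  '"Havana",140000,50000',
  '"Panama City",105000,35000'
), csv_path)

# read.csv treats the first line as column headers by default.
targets <- read.csv(csv_path)
print(targets)
str(targets)
```

    Note that calling read.csv on its own, as in the step above, only prints the frame. Assigning the result to a variable keeps it available for later use.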
    
  9. The "infantry.txt" file has a similar format, but its fields are separated by tab characters rather than commas. Its content looks like this:

    Port         Infantry
    Porto Bello  700
    Cartagena    500
    Panama City  1500
    Havana       2000
    

    For files that use separator strings other than commas, you can use the read.table function. The sep argument defines the separator character, and you can specify a tab character with "\t".

    Call read.table on "infantry.txt", using tab separators:

    > read.table("infantry.txt", sep="\t")
               V1       V2
    1        Port Infantry
    2 Porto Bello      700
    3   Cartagena      500
    4 Panama City     1500
    5      Havana     2000
    
  10. Notice the "V1" and "V2" column headers? The first line is not automatically treated as column headers with read.table. This behavior is controlled by the header argument. Call read.table again, setting header to TRUE:

    > read.table("infantry.txt", sep="\t", header=TRUE)
             Port Infantry
    1 Porto Bello      700
    2   Cartagena      500
    3 Panama City     1500
    4      Havana     2000
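
    The effect of the header argument goes beyond the column names. The sketch below recreates infantry.txt in a temporary location (txt_path is a name chosen for this example) and compares the two calls. It also shows read.delim, a base R wrapper around read.table whose defaults are tab separators and header=TRUE.

```r
# Recreate the tab-separated sample file so the example runs on its own.
txt_path <- file.path(tempdir(), "infantry.txt")
writeLines(c(
  "Port\tInfantry",
  "Porto Bello\t700",
  "Cartagena\t500",
  "Panama City\t1500",
  "Havana\t2000"
), txt_path)

no_header   <- read.table(txt_path, sep = "\t")
with_header <- read.table(txt_path, sep = "\t", header = TRUE)

print(names(no_header))     # placeholder names V1 and V2
print(names(with_header))   # names taken from the first line

# Without header=TRUE the header row is read as data, so the second
# column can no longer be stored as numbers.
print(is.numeric(no_header$V2))
print(is.numeric(with_header$Infantry))

# read.delim defaults to sep = "\t" and header = TRUE.
via_delim <- read.delim(txt_path)
print(via_delim)
```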
    
  11. Merging Data Frames 6.4

    We want to loot the city with the most treasure and the fewest guards. Right now, though, we have to look at both files and match up the rows. It would be nice if all the data for a port were in one place...

    R's merge function can accomplish precisely that. It joins two data frames together, using the contents of one or more columns. First, we're going to store those file contents in two data frames for you, targets and infantry.

    The merge function takes an x frame (targets) and a y frame (infantry) as arguments. By default, it joins the frames on columns with the same name (the two Port columns). See if you can merge the two frames:

    > targets <- read.csv("targets.csv")
    > infantry <- read.table("infantry.txt", sep="\t", header=TRUE)
    > merge(x = targets, y = infantry)
             Port Population Worth Infantry
    1   Cartagena      35000 10000      500
    2      Havana     140000 50000     2000
    3 Panama City     105000 35000     1500
    4 Porto Bello      49000 15000      700
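
    As a sketch of how the merged frame might be used, the example below builds both frames inline so it runs without the data files, names the join column explicitly with the by argument, and then ranks the ports. The WorthPerGuard column and the ranking rule are hypothetical additions for illustration and are not part of the tutorial.

```r
# Same data as targets.csv and infantry.txt, built inline.
targets <- data.frame(
  Port       = c("Cartagena", "Porto Bello", "Havana", "Panama City"),
  Population = c(35000, 49000, 140000, 105000),
  Worth      = c(10000, 15000, 50000, 35000)
)
infantry <- data.frame(
  Port     = c("Porto Bello", "Cartagena", "Panama City", "Havana"),
  Infantry = c(700, 500, 1500, 2000)
)

# by = "Port" spells out the join column that merge would pick by default.
ports <- merge(x = targets, y = infantry, by = "Port")

# Hypothetical ranking, not part of the tutorial: treasure per guard.
ports$WorthPerGuard <- ports$Worth / ports$Infantry
ranked <- ports[order(ports$WorthPerGuard, decreasing = TRUE), ]
print(ranked)
```

    The rows of the two inputs are in different orders, yet each port ends up with the right Infantry value because merge matches rows on the contents of Port rather than on their position.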
    
  12. Chapter 6 Completed

    When your data grows beyond a certain size, you need powerful tools to organize it. With data frames, R gives you exactly that. We've shown you how to create and access data frames. We've also shown you how to load frames in from files, and how to cobble multiple frames together into a new data set.

    Time to take what you've learned so far, and apply it. In the next chapter, we'll be working with some real-world data!
