R Programming -- Summary Statistics

  1. Summary Statistics


    Simply throwing a bunch of numbers at your audience will only confuse them. Part of a statistician's job is to explain their data. In this chapter, we'll show you some of the tools R offers to let you do so, with minimum fuss.

  2. Mean 4.1

    Determining the health of the crew is an important part of any inventory of the ship. Here's a vector containing the number of limbs each member has left, along with their names.

    limbs <- c(4, 3, 4, 3, 2, 4, 4, 4)
    names(limbs) <- c('One-Eye', 'Peg-Leg', 'Smitty', 'Hook', 'Scooter', 'Dan', 'Mikey', 'Blackbeard')
    

    A quick way to assess our battle-readiness would be to get the average of the crew's appendage counts. Statisticians call this the "mean". Call the mean function with the limbs vector.

    > mean(limbs)
    [1] 3.5
    

    An average closer to 4 would be nice, but this will have to do.
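    As a quick check of what mean does, the sketch below recomputes the average by hand: add up the values and divide by how many there are. It rebuilds the limbs vector from above; the manualMean name is only illustrative.

```r
limbs <- c(4, 3, 4, 3, 2, 4, 4, 4)

# The mean is the sum of the values divided by the number of values.
manualMean <- sum(limbs) / length(limbs)

print(manualMean)  # 3.5

# Compare against R's built-in mean function.
print(isTRUE(all.equal(manualMean, mean(limbs))))
```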

  3. Here's a barplot of that vector:

    > barplot(limbs)
    
    [Figure: bar plot of limbs, one bar per crew member (One-Eye, Peg-Leg, Smitty, Hook, Scooter, Dan, Mikey, Blackbeard), y-axis from 0 to 4]
  4. If we draw a line on the plot representing the mean, we can easily compare the various values to the average. The abline function can take an h parameter with a value at which to draw a horizontal line, or a v parameter for a vertical line. When it's called, it adds the line to the existing plot.

    Draw a horizontal line across the plot at the mean:

    > abline(h = mean(limbs))
    
    [Figure: the same bar plot of limbs, y-axis from 0 to 4, with a horizontal line at the mean]
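    Here's a small sketch of both abline parameters, assuming the limbs vector from above. barplot returns the x positions of the bar centers, so we can keep them (in a variable called barCenters, a name chosen just for this example) and use one of them as the v value.

```r
limbs <- c(4, 3, 4, 3, 2, 4, 4, 4)
names(limbs) <- c('One-Eye', 'Peg-Leg', 'Smitty', 'Hook',
                  'Scooter', 'Dan', 'Mikey', 'Blackbeard')

# barplot draws the plot and returns the x position of each bar's center.
barCenters <- barplot(limbs)

# h draws a horizontal line at the given y value (here, the mean).
abline(h = mean(limbs))

# v draws a vertical line at the given x value (here, through Scooter's bar).
abline(v = barCenters[5])
```

    Both calls add to the plot that barplot just drew, so the order matters: draw the plot first, then the lines.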
  5. Median 4.2

    Let's say we gain a crew member that completely skews the mean.

    > limbs <- c(4, 3, 4, 3, 2, 4, 4, 14)
    > names(limbs) <- c('One-Eye', 'Peg-Leg', 'Smitty', 'Hook', 
                        'Scooter', 'Dan', 'Mikey', 'Davy Jones')
    > mean(limbs)
    [1] 4.75
    

    Let's see how this new mean shows up on our same graph.

    > barplot(limbs)
    > abline(h = mean(limbs))
    

    It may be factually accurate to say that our crew has an average of 4.75 limbs, but it's probably also misleading.

    [Figure: bar plot of limbs with Davy Jones in place of Blackbeard, y-axis from 0 to 14, with a horizontal line at the mean]
  6. For situations like this, it's probably more useful to talk about the "median" value. The median is calculated by sorting the values and choosing the middle one. For sets with an even number of values, the middle two values are averaged. Our vector has eight values, so the median is the average of the fourth and fifth sorted values, which are both 4.

    Call the median function on the vector:

    > median(limbs)
    [1] 4
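    The sort-and-pick-the-middle recipe can be sketched by hand. This assumes the eight-value limbs vector from this section; sortedLimbs, n, and manualMedian are names made up for the example.

```r
limbs <- c(4, 3, 4, 3, 2, 4, 4, 14)

# Sort the values from smallest to largest.
sortedLimbs <- sort(limbs)
n <- length(sortedLimbs)

if (n %% 2 == 1) {
  # Odd count: take the single middle value.
  manualMedian <- sortedLimbs[(n + 1) / 2]
} else {
  # Even count: average the two middle values.
  manualMedian <- mean(sortedLimbs[c(n / 2, n / 2 + 1)])
}

print(manualMedian)  # 4, the same answer median(limbs) gives
```

    Davy Jones's 14 limbs pull the mean up to 4.75, but they are just one more value at the end of the sorted list, so the median stays at 4.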
    
  7. That's more like it. Let's show the median on the plot. Draw a horizontal line across the plot at the median.

    > abline(h = median(limbs))
    
    [Figure: the same bar plot of limbs, y-axis from 0 to 14, with horizontal lines at the mean and the median]
  8. Standard Deviation 4.3

    Some of the plunder from our recent raids has been worth less than what we're used to. Here's a vector with the values of our latest hauls:

    > pounds <- c(45000, 50000, 35000, 40000, 35000, 45000, 10000, 15000)
    > barplot(pounds)
    > meanValue <- mean(pounds)
    

    Let's see a plot showing the mean value:

    > abline(h = meanValue)
    

    These results seem way below normal. The crew wants to make Smitty, who picked the last couple ships to waylay, walk the plank. But as he dangles over the water, wily Smitty raises a question: what, exactly, is a "normal" haul?

    [Figure: bar plot of pounds, y-axis from 0 to 50000, with a horizontal line at the mean]
  9. Statisticians use the concept of "standard deviation" from the mean to describe the range of typical values for a data set. For a group of numbers, it shows how much they typically vary from the average value. To calculate the standard deviation, you calculate the mean of the values, then subtract the mean from each number and square the result, then add up those squares and divide by one less than the number of values, and take the square root of that result. (Dividing by one less than the count gives the sample standard deviation, which is what R calculates.)

    If that sounds like a lot of work, don't worry. You're using R, and all you have to do is pass a vector to the sd function. Try calling sd on the pounds vector now, and assign the result to the deviation variable:

    > deviation <- sd(pounds)
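    If you're curious, here's a sketch of those steps done by hand, assuming the pounds vector from above. One detail worth knowing: sd divides the sum of squared differences by one less than the number of values (the sample standard deviation), so the sketch does too. The names squaredDiffs, variance, and manualDeviation are ours.

```r
pounds <- c(45000, 50000, 35000, 40000, 35000, 45000, 10000, 15000)

# Step 1: calculate the mean.
meanValue <- mean(pounds)

# Step 2: subtract the mean from each value and square the result.
squaredDiffs <- (pounds - meanValue)^2

# Step 3: add up the squares and divide by (number of values - 1).
variance <- sum(squaredDiffs) / (length(pounds) - 1)

# Step 4: take the square root.
manualDeviation <- sqrt(variance)

print(manualDeviation)

# Compare against R's built-in sd function.
print(isTRUE(all.equal(manualDeviation, sd(pounds))))
```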
    
  10. We'll add a line on the plot to show one standard deviation above the mean (the top of the normal range)...

    > abline(h = meanValue + deviation)
    

    Hail to the sailor that brought us that 50,000-pound payday!

    [Figure: bar plot of pounds, y-axis from 0 to 50000, with horizontal lines at the mean and one standard deviation above it]
  11. Now try adding a line on the plot to show one standard deviation below the mean (the bottom of the normal range):

    > abline(h = meanValue - deviation)
    

    We're risking being hanged by the Spanish for this? Sorry, Smitty, you're shark bait.

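    [Figure: bar plot of pounds, y-axis from 0 to 50000, with horizontal lines at the mean and one standard deviation above and below it]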
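    To see which hauls fall outside the normal range without squinting at the plot, we can compare each value against the two lines. A minimal sketch, assuming the pounds vector from above; lower and upper are names invented for this example.

```r
pounds <- c(45000, 50000, 35000, 40000, 35000, 45000, 10000, 15000)
meanValue <- mean(pounds)
deviation <- sd(pounds)

# The "normal" range runs from one standard deviation below the mean
# to one standard deviation above it.
lower <- meanValue - deviation
upper <- meanValue + deviation

# Hauls above the top of the range.
print(pounds[pounds > upper])  # 50000

# Hauls below the bottom of the range.
print(pounds[pounds < lower])  # 10000 15000
```

    The last two hauls, the ones Smitty picked, are the only ones that fall below the range.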
  12. Chapter 4 Completed

    Land ho! You've navigated Chapter 4. And what awaits us on the shore? It's another badge!

    Summary statistics let you show how your data points are distributed, without the need to look closely at each one. We've shown you the functions for mean, median, and standard deviation, as well as ways to display them on your graphs.
