Summary Statistics
Simply throwing a bunch of numbers at your audience will only confuse them. Part of a statistician's job is to explain their data. In this chapter, we'll show you some of the tools R offers to let you do so, with minimum fuss.
4.1 Mean
Determining the health of the crew is an important part of any inventory of the ship. Here's a vector containing the number of limbs each member has left, along with their names.
limbs <- c(4, 3, 4, 3, 2, 4, 4, 4)
names(limbs) <- c('One-Eye', 'Peg-Leg', 'Smitty', 'Hook', 'Scooter', 'Dan', 'Mikey', 'Blackbeard')
A quick way to assess our battle-readiness would be to get the average of the crew's appendage counts. Statisticians call this the "mean". Call the mean function with the limbs vector.
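A minimal sketch of that call at the R prompt, assuming the limbs vector defined above. The extra line is an illustrative cross-check, not part of the original exercise, showing that mean is simply the sum of the values divided by how many there are:

```r
# The crew's limb counts, as defined above
limbs <- c(4, 3, 4, 3, 2, 4, 4, 4)
names(limbs) <- c('One-Eye', 'Peg-Leg', 'Smitty', 'Hook', 'Scooter', 'Dan', 'Mikey', 'Blackbeard')

mean(limbs)                  # 28 limbs across 8 crew members: 3.5
sum(limbs) / length(limbs)   # the same value, computed by hand
```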
[Figure: bar plot of each crew member's limb count, One-Eye through Blackbeard, y-axis 0 to 4]
If we draw a line on the plot representing the mean, we can easily compare the various values to the average. The abline function can take an h parameter with a value at which to draw a horizontal line, or a v parameter for a vertical line. When it's called, it updates the previous plot.
Draw a horizontal line across the plot at the mean:
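The plot and the line together might look like the following sketch. barplot draws one bar per named element and returns the horizontal midpoint of each bar, which is the kind of x-coordinate a v line needs. The dashed vertical line through the crew member with the fewest limbs is an illustrative extra, not part of the original exercise:

```r
limbs <- c(4, 3, 4, 3, 2, 4, 4, 4)
names(limbs) <- c('One-Eye', 'Peg-Leg', 'Smitty', 'Hook', 'Scooter', 'Dan', 'Mikey', 'Blackbeard')

# Draw the bar plot; barplot() returns the x-coordinate of each bar's midpoint
mids <- barplot(limbs)

# Horizontal line at the mean (3.5); abline() adds to the existing plot
abline(h = mean(limbs))

# Illustrative extra: a dashed vertical line through the lowest bar (Scooter)
abline(v = mids[which.min(limbs)], lty = 2)
```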
[Figure: the same bar plot with a horizontal line at the mean]
4.2 Median
Let's say we gain a crew member that completely skews the mean.
> limbs <- c(4, 3, 4, 3, 2, 4, 4, 14)
> names(limbs) <- c('One-Eye', 'Peg-Leg', 'Smitty', 'Hook', 'Scooter', 'Dan', 'Mikey', 'Davy Jones')
> mean(limbs)
[1] 4.75
Let's see how this new mean shows up on our same graph.
> barplot(limbs)
> abline(h = mean(limbs))
It may be factually accurate to say that our crew has an average of 4.75 limbs, but it's probably also misleading.
[Figure: bar plot of limbs for the new crew, One-Eye through Davy Jones, y-axis 0 to 14, with a horizontal line at the mean]
For situations like this, it's probably more useful to talk about the "median" value. The median is calculated by sorting the values and choosing the middle one. For sets with an even number of values, like this one, the two middle values are averaged: sorted, our eight counts are 2, 3, 3, 4, 4, 4, 4, 14, so the median is the average of the fourth and fifth values, 4 and 4.
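The sort-and-pick-the-middle rule can be sketched in a few lines of R. middle_value is a hypothetical helper written only to illustrate the rule; in practice you would just call median:

```r
# Hypothetical helper illustrating how a median is found (use median() in practice)
middle_value <- function(x) {
  x <- sort(x)                      # put the values in order
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                  # odd count: take the single middle value
  } else {
    mean(x[c(n / 2, n / 2 + 1)])    # even count: average the two middle values
  }
}

limbs <- c(4, 3, 4, 3, 2, 4, 4, 14)
sort(limbs)           # 2 3 3 4 4 4 4 14
middle_value(limbs)   # average of the 4th and 5th sorted values: 4
```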
Call the median function on the vector:
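At the prompt, the call and its result should look like this sketch, using the crew vector that includes Davy Jones. Printing the mean alongside it, an illustrative addition, shows how far the outlier drags the mean but not the median:

```r
limbs <- c(4, 3, 4, 3, 2, 4, 4, 14)
names(limbs) <- c('One-Eye', 'Peg-Leg', 'Smitty', 'Hook', 'Scooter', 'Dan', 'Mikey', 'Davy Jones')

median(limbs)   # 4: the 4th and 5th sorted values are both 4
mean(limbs)     # 4.75: pulled upward by Davy Jones's 14 limbs
```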
That's more like it. Let's show the median on the plot. Draw a horizontal line across the plot at the median.
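A sketch of the plotting step, redrawing the bar plot first so the block runs on its own. Drawing the mean in a different colour for comparison is an illustrative extra; col is a standard graphics parameter that abline accepts:

```r
limbs <- c(4, 3, 4, 3, 2, 4, 4, 14)
names(limbs) <- c('One-Eye', 'Peg-Leg', 'Smitty', 'Hook', 'Scooter', 'Dan', 'Mikey', 'Davy Jones')

barplot(limbs)
abline(h = median(limbs))               # median line at 4
abline(h = mean(limbs), col = "red")    # mean line at 4.75, for comparison
```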
[Figure: the same bar plot with a horizontal line at the median]
4.3 Standard Deviation
Some of the plunder from our recent raids has been worth less than what we're used to. Here's a vector with the values of our latest hauls:
> pounds <- c(45000, 50000, 35000, 40000, 35000, 45000, 10000, 15000)
> barplot(pounds)
> meanValue <- mean(pounds)
Let's see a plot showing the mean value:
> abline(h = meanValue)
These results seem way below normal. The crew wants to make Smitty, who picked the last couple of ships to waylay, walk the plank. But as he dangles over the water, wily Smitty raises a question: what, exactly, is a "normal" haul?
[Figure: bar plot of pounds per haul, y-axis 0 to 50,000, with a horizontal line at the mean]
Statisticians use the concept of "standard deviation" from the mean to describe the range of typical values for a data set. For a group of numbers, it shows how much they typically vary from the average value. To calculate the standard deviation, you calculate the mean of the values, subtract the mean from each number and square the result, add up those squares and divide by one less than the number of values, and take the square root of the result. (Dividing by n - 1 rather than n gives the sample standard deviation, which is what R's sd function computes.)
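The recipe can be followed step by step in R and checked against sd. This is a sketch using the pounds vector from above. Note that R's sd divides the sum of squared differences by n - 1 (the sample standard deviation) rather than taking a plain average; the divide-by-n figure is printed only for comparison:

```r
pounds <- c(45000, 50000, 35000, 40000, 35000, 45000, 10000, 15000)

m <- mean(pounds)               # 34375
squares <- (pounds - m)^2       # squared distance of each haul from the mean
n <- length(pounds)

by_hand <- sqrt(sum(squares) / (n - 1))   # sample SD, the definition sd() uses
by_hand
sd(pounds)                                # same value, about 14500.62

sqrt(mean(squares))   # dividing by n instead gives a smaller value, about 13564.08
```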
If that sounds like a lot of work, don't worry. You're using R, and all you have to do is pass a vector to the sd function. Try calling sd on the pounds vector now, and assign the result to the deviation variable:
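The assignment itself is one line. Printing the two edges of the normal range as well, an illustrative extra, shows where the lines in the next steps will fall:

```r
pounds <- c(45000, 50000, 35000, 40000, 35000, 45000, 10000, 15000)
meanValue <- mean(pounds)     # 34375

deviation <- sd(pounds)       # about 14500.62
meanValue + deviation         # top of the normal range, about 48875.62
meanValue - deviation         # bottom of the normal range, about 19874.38
```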
We'll add a line on the plot to show one standard deviation above the mean (the top of the normal range)...
> abline(h = meanValue + deviation)
Hail to the sailor that brought us that 50,000-pound payday!
[Figure: the pounds bar plot with lines at the mean and one standard deviation above it]
Now try adding a line on the plot to show one standard deviation below the mean (the bottom of the normal range):
> abline(h = meanValue - deviation)
We're risking being hanged by the Spanish for this? Sorry, Smitty, you're shark bait.
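To list exactly which hauls fall outside the normal range rather than reading them off the plot, logical indexing does the job. This sketch rebuilds the plot with all three lines and then picks out the hauls above and below the band:

```r
pounds <- c(45000, 50000, 35000, 40000, 35000, 45000, 10000, 15000)
meanValue <- mean(pounds)
deviation <- sd(pounds)

barplot(pounds)
abline(h = meanValue)                 # mean
abline(h = meanValue + deviation)     # top of the normal range
abline(h = meanValue - deviation)     # bottom of the normal range

pounds[pounds > meanValue + deviation]   # 50000: the big payday
pounds[pounds < meanValue - deviation]   # 10000 15000: Smitty's last two hauls
```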
[Figure: the pounds bar plot with lines at the mean and one standard deviation above and below it]
Chapter 4 Completed
Land ho! You've navigated Chapter 4. And what awaits us on the shore? It's another badge!
Summary statistics let you show how your data points are distributed, without the need to look closely at each one. We've shown you the functions for mean, median, and standard deviation, as well as ways to display them on your graphs.