Standard deviation of a matrix column in R: how to find the standard deviation in R

R is a statistical language, so it ships with a built-in function, sd(), for computing the standard deviation of a set of values.

So what is the standard deviation?

The standard deviation is a measure of how dispersed a set of values is.
The higher the standard deviation, the wider the spread of the values.
The lower the standard deviation, the narrower the spread of the values.
Put simply, the standard deviation is the square root of the variance.
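The relationship between the two quantities can be checked directly in R. This is a minimal sketch that reuses the values from the vector example in this article; the variable names variance and std_dev are chosen here for illustration only.

```r
# Values taken from the vector example in this article.
x <- c(34, 56, 87, 65, 34, 56, 89)

variance <- var(x)          # sample variance (divides by n - 1)
std_dev  <- sqrt(variance)  # standard deviation as the root of the variance

print(std_dev)
print(all.equal(std_dev, sd(x)))  # TRUE: sd() gives the same number
```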

Importance of Standard Deviation

Standard deviation is one of the most widely used statistics, but why? The main reasons for its popularity and importance are listed below.

Each deviation from the mean is squared, so negative and positive deviations do not cancel each other out.
Squaring gives larger deviations more weight, so unusually distant values stand out.
It shows how tightly the values cluster around the central tendency (the mean), which is very useful in analysis.
It plays a major role in finance, business, analysis, and measurement.
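The point about squaring is easy to see numerically. The sketch below, which reuses the values from the vector example in this article, shows that the raw deviations from the mean sum to (effectively) zero, while the squared deviations do not; the variable name deviations is an illustrative choice.

```r
x <- c(34, 56, 87, 65, 34, 56, 89)

deviations <- x - mean(x)   # signed distance of each value from the mean

print(sum(deviations))      # effectively 0: positive and negative parts cancel
print(sum(deviations^2))    # strictly positive: squaring removes the sign

# The largest squared deviation belongs to the value farthest from the mean.
print(x[which.max(deviations^2)])
```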

Before we get into the topic, keep this definition in mind!

Variance: the average of the squared differences between each observed value and the expected value (the mean). For a sample, which is what R's var() and sd() assume, the sum of squared differences is divided by n - 1 rather than n.
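To make the definition concrete, here is a small sketch that computes the sample variance step by step and compares it with R's built-in functions. It assumes the n - 1 divisor that var() and sd() use; manual_var and manual_sd are names introduced only for this illustration.

```r
x <- c(34, 56, 87, 65, 34, 56, 89)

n <- length(x)
squared_diffs <- (x - mean(x))^2            # squared distance from the mean

manual_var <- sum(squared_diffs) / (n - 1)  # sample variance
manual_sd  <- sqrt(manual_var)              # sample standard deviation

print(all.equal(manual_var, var(x)))        # TRUE
print(all.equal(manual_sd, sd(x)))          # TRUE
```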

Find the Standard Deviation in R for Values in a Vector

In this method, we create a vector 'x' with c() and put some values in it. Then we find the standard deviation of those values. (Note that c() creates a vector; in R, a list is a different data structure.)


x <- c(34, 56, 87, 65, 34, 56, 89)   # creates the numeric vector 'x'

sd(x)   # calculates the standard deviation of the values in 'x'

Output: 22.28175

Now let's extract specific values from a vector 'y' by index and find their standard deviation.


y <- c(34, 65, 78, 96, 56, 78, 54, 57, 89)   # creates a vector 'y' with some values

data1 <- y[1:5]   # extracts the first five values by index

sd(data1)   # calculates the standard deviation of the extracted values

Output: 23.28519
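The same indexing idea extends to two-dimensional data, which is how you would get the standard deviation of one column of a matrix. The sketch below is only an illustration: it arranges the values of 'y' into a hypothetical 3 x 3 matrix called scores (filled column by column), then uses standard base R indexing and apply().

```r
# Hypothetical matrix built from the values of 'y', filled column by column.
y <- c(34, 65, 78, 96, 56, 78, 54, 57, 89)
scores <- matrix(y, nrow = 3)

second_col_sd <- sd(scores[, 2])     # standard deviation of column 2 only
print(second_col_sd)

all_col_sds <- apply(scores, 2, sd)  # one standard deviation per column
print(all_col_sds)
```

Here scores[, 2] selects every row of the second column, and the 2 passed to apply() tells it to work column-wise.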

Finding the Standard Deviation of the Values Stored in a CSV File

In this method, we import a CSV file and find the standard deviation of the values stored in it.


readfile <- read.csv('testdata1.csv')   # reads the CSV file into a data frame

data2 <- readfile$Values                # extracts the column named 'Values'

sd(data2)                               # calculates the standard deviation

Output: 17.88624
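Real CSV files often contain missing values, and sd() returns NA as soon as one is present. The sketch below is self-contained: it writes a small hypothetical CSV (made-up values, not the testdata1.csv file above) to a temporary file, reads it back, and uses the na.rm argument of sd() to skip the missing entry.

```r
# Write a small hypothetical CSV to a temporary file so the example is
# self-contained; in practice you would point read.csv() at your own file.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(Values = c(34, 56, NA, 65, 89)), csv_path, row.names = FALSE)

readfile <- read.csv(csv_path)
values <- readfile$Values

print(sd(values))                # NA, because one value is missing
print(sd(values, na.rm = TRUE))  # standard deviation of the non-missing values

unlink(csv_path)                 # remove the temporary file
```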

High and Low Standard Deviation

In general, a low standard deviation means the values lie close to the average, while a high standard deviation means they are spread far from it.

We can illustrate this with an example.


x <- c(79,82,84,96,98)
mean(x)
--->  87.8
sd(x)
--->  8.613942

To plot these values as a bar graph in R, we use the ggplot2 package. If it is not installed yet, run this in RStudio first:

install.packages("ggplot2")

Then run the code below.


library(ggplot2)

values <- data.frame(marks = c(79, 82, 84, 96, 98), students = c(0, 1, 2, 3, 4))
head(values)                  # displays the values
#   marks students
# 1    79        0
# 2    82        1
# 3    84        2
# 4    96        3
# 5    98        4
p <- ggplot(values, aes(x = marks, y = students)) + geom_bar(stat = 'identity')
p                             # displays the plot

In the above results, you can see that all five values lie within about 10 points of the mean (87.8), which is what a low standard deviation indicates.

An illustration of a high standard deviation:


y <- c(23,27,30,35,55,76,79,82,84,94,96)
mean(y)
---> 61.90909
sd(y)
---> 28.45507

To plot these values as a bar graph with ggplot2 in R, run the code below.


library(ggplot2)

values <- data.frame(marks = c(23, 27, 30, 35, 55, 76, 79, 82, 84, 94, 96), students = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
head(values)                  # displays the first six rows
#   marks students
# 1    23        0
# 2    27        1
# 3    30        2
# 4    35        3
# 5    55        4
# 6    76        5
p <- ggplot(values, aes(x = marks, y = students)) + geom_bar(stat = 'identity')
p                             # displays the plot

In the above results, you can see that the data is widely spread. The lowest score, 23, is very far from the average score of about 61.9. This is what a high standard deviation looks like.
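Putting the two data sets side by side makes the contrast easier to read. The following sketch uses a hypothetical helper, spread_summary, written only for this comparison; it reports the mean, the standard deviation, and the range of each vector.

```r
low_spread  <- c(79, 82, 84, 96, 98)
high_spread <- c(23, 27, 30, 35, 55, 76, 79, 82, 84, 94, 96)

# Hypothetical helper: summarise centre and spread in one named vector.
spread_summary <- function(v) {
  c(mean = mean(v), sd = sd(v), range = max(v) - min(v))
}

print(round(spread_summary(low_spread), 2))
print(round(spread_summary(high_spread), 2))

print(sd(high_spread) > sd(low_spread))   # TRUE
```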

By now, you should have a fair understanding of how to use the sd() function to calculate the standard deviation in R. Let's wrap up this tutorial by solving two simple problems.

Example #1: Standard Deviation for a List of Even Numbers

Find the standard deviation of the even numbers between 1 and 20 (excluding 1 and 20).

Solution: The even numbers between 1 and 20, excluding 20, are:

2, 4, 6, 8, 10, 12, 14, 16, 18

Let's find the standard deviation of these values.


x <- c(2, 4, 6, 8, 10, 12, 14, 16, 18)   # even numbers between 1 and 20, excluding 20

sd(x)   # calculates the standard deviation of these values

Output: 5.477226
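Instead of typing the even numbers by hand, they can also be generated. This is one possible sketch: it filters the integers strictly between 1 and 20 with the modulo operator; the names candidates and evens are illustrative.

```r
candidates <- 2:19                           # integers strictly between 1 and 20
evens <- candidates[candidates %% 2 == 0]    # keep those divisible by 2
print(evens)

print(sd(evens))                             # 5.477226
```

seq(2, 18, by = 2) would produce the same values; the filter form is shown because it mirrors the wording of the problem.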

Example #2: Standard Deviation for US Population Data

Find the standard deviation of the state-wise population of the USA.

For this, import the CSV file, read the values to find the mean and standard deviation, and plot the values in a histogram in R.


df <- read.csv("population.csv")    # reads the CSV file
data <- df$X2018.Population         # extracts the data from the population column
mean(data)                          # calculates the mean

View(df)                            # displays the data in a viewer
sd(data)                            # calculates the standard deviation
hist(data)                          # plots the values as a histogram

Output: mean = 6432008, sd = 7376752
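One caveat worth knowing for data such as state populations: sd() computes the sample standard deviation (divisor n - 1). When the values are the entire population rather than a sample, the population form (divisor n) may be the one you want. The sketch below uses a hypothetical helper, pop_sd, which is not part of base R, and reuses the even numbers from Example #1 rather than the population file.

```r
# Hypothetical helper (not part of base R): population standard deviation.
pop_sd <- function(v) {
  sqrt(mean((v - mean(v))^2))   # divides by n instead of n - 1
}

x <- c(2, 4, 6, 8, 10, 12, 14, 16, 18)

print(sd(x))       # sample form, divisor n - 1
print(pop_sd(x))   # population form, divisor n; slightly smaller

# The two differ only by the factor sqrt((n - 1) / n).
n <- length(x)
print(all.equal(pop_sd(x), sd(x) * sqrt((n - 1) / n)))  # TRUE
```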

Conclusion

Finding the standard deviation of a set of values in R is easy: the built-in sd() function does the work. You can build a vector of values or import a CSV file and pass the values to sd().

Important: Remember that you can also compute the standard deviation of a subset of the values by extracting them from a file or a vector through indexing, as shown above.
