Introduction to Multivariate Statistics

In this tutorial, you will discover how fundamental statistical operations work and how to implement them in NumPy, using notation and terminology from linear algebra.

After completing this tutorial, you will know:

  • What the expected value, average, and mean are and how to calculate them.
  • What the variance and standard deviation are and how to calculate them.
  • What the covariance, correlation, and covariance matrix are and how to calculate them.

1.1 Tutorial Overview

This tutorial is divided into 4 parts; they are:

  • 1. Expected Value and Mean
  • 2. Variance and Standard Deviation
  • 3. Covariance and Correlation
  • 4. Covariance Matrix

1.2 Expected Value and Mean

In probability, the average value of some random variable X is called the expected value or the expectation. The expected value uses the notation E with square brackets around the name of the variable; for example: E[X]

It is calculated as the probability-weighted sum of the values that can be drawn.

                         E[X] = \sum_{i=1}^{n} x_i \times p_i = x_1 \times p_1 + x_2 \times p_2 + \cdots + x_n \times p_n
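
The probability-weighted sum can be sketched in NumPy as follows. The loaded die and its probabilities are hypothetical values chosen for illustration only; they do not come from the tutorial.

```python
# expected value as a probability-weighted sum
from numpy import array
from numpy import isclose

# possible outcomes of a six-sided die
values = array([1, 2, 3, 4, 5, 6])
# hypothetical probabilities for a loaded die (assumed; they must sum to 1)
probs = array([0.1, 0.1, 0.1, 0.1, 0.1, 0.5])
assert isclose(probs.sum(), 1.0)

# E[X] = sum of x_i * p_i
expected = (values * probs).sum()
print(expected)
```

Because the outcome 6 carries half of the probability mass, the result (about 4.5) lies above the 3.5 that a fair die would give.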

In simple cases, such as flipping a coin or rolling a die, every outcome is equally likely. The expected value can therefore be calculated as the sum of all values multiplied by the reciprocal of the number of values.

                        E[X] = \frac{1}{n} \times \sum_{i=1}^{n} x_i
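
As a small check of the equal-probability case, the sketch below assumes a fair six-sided die and compares the sum divided by n with NumPy's mean function. Both lines should print 3.5.

```python
# expected value when every outcome is equally likely
from numpy import array
from numpy import mean

# outcomes of a fair six-sided die (assumed example)
values = array([1, 2, 3, 4, 5, 6])
n = len(values)

# E[X] = (1/n) * sum of x_i
expected = values.sum() / n
print(expected)

# the same quantity computed with the mean function
average = mean(values)
print(average)
```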

In statistics, the mean, or more technically the arithmetic mean or sample mean, can be estimated from a sample of examples drawn from the domain. The terminology can be confusing because mean, average, and expected value are often used interchangeably. In the abstract, the mean is denoted by the lowercase Greek letter mu (µ) and is calculated from the sample of observations, rather than from all possible values.

# Example of calculating a vector mean
# vector mean
from numpy import array
from numpy import mean
# define vector
v = array([1, 2, 3, 4, 5, 6])
print(v)
# calculate mean
result = mean(v)
print(result)

Running the example first prints the defined vector and then the mean of the values in the vector.

The mean function can calculate the column or row means of a matrix by specifying the axis argument with the value 0 or 1 respectively. The example below defines a 2 × 6 matrix and calculates both column and row means.

# matrix means
from numpy import array
from numpy import mean
# define matrix
M = array([
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5, 6]
])
print(M)

# column means
col_mean = mean(M, axis=0)
print(col_mean)

# row means
row_mean = mean(M, axis=1)
print(row_mean)

Running the example first prints the defined matrix, then the calculated column and row mean values.
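
The axis argument is easier to see on a matrix whose rows differ. The sketch below uses a hypothetical 2 × 3 matrix (not from the tutorial) and prints the shape of each result next to its values: axis=0 gives one mean per column, and axis=1 gives one mean per row.

```python
# how the axis argument selects column or row means
from numpy import array
from numpy import mean

# hypothetical 2 x 3 matrix with different rows
M = array([
    [1, 2, 3],
    [4, 5, 6]
])

# axis=0 collapses the rows: one mean per column
col_mean = mean(M, axis=0)
print(col_mean.shape, col_mean)

# axis=1 collapses the columns: one mean per row
row_mean = mean(M, axis=1)
print(row_mean.shape, row_mean)
```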

1.3 Variance and Standard Deviation

# Example of calculating a vector variance
# vector variance
from numpy import array
from numpy import var

# define vector
v = array([1, 2, 3, 4, 5, 6])
print(v)
# calculate variance
result = var(v, ddof=1)
print(result)

Running the example first prints the defined vector and then the calculated sample variance of the values in the vector.
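
To make the role of ddof=1 concrete, the sketch below computes the variance by hand, assuming the usual definition of the sample variance as the sum of squared deviations from the mean divided by n - 1. It also shows the population variance (division by n), which is what var returns when ddof is left at its default of 0.

```python
# sample variance computed by hand and with var
from numpy import array
from numpy import mean
from numpy import var

v = array([1, 2, 3, 4, 5, 6])
n = len(v)

# squared deviations from the mean
squared_dev = (v - mean(v)) ** 2

# sample variance divides by n - 1 (matches ddof=1)
sample_var = squared_dev.sum() / (n - 1)
print(sample_var, var(v, ddof=1))

# population variance divides by n (matches the default ddof=0)
population_var = squared_dev.sum() / n
print(population_var, var(v))
```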

The var function can calculate the column or row variances of a matrix by specifying the axis argument with the value 0 or 1 respectively, the same as the mean function above. The example below defines a 2 × 6 matrix and calculates both column and row sample variances.

# Example of calculating matrix variances
# matrix variances
from numpy import array
from numpy import var
# define matrix
M = array([
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5, 6]
])
print(M)

# column variances
col_var = var(M, ddof=1, axis=0)
print(col_var)

# row variances
row_var = var(M, ddof=1, axis=1)
print(row_var)

Running the example first prints the defined matrix and then the column and row sample variance values.

The standard deviation is calculated as the square root of the variance and is denoted by a lowercase s.

# Example of calculating matrix standard deviations
# matrix standard deviation
from numpy import array
from numpy import std

# define matrix
M = array([
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5, 6]
])

print(M)

# column standard deviations
col_std = std(M, ddof=1, axis=0)
print(col_std)

# row standard deviations
row_std = std(M, ddof=1, axis=1)
print(row_std)

Running the example first prints the defined matrix and then the column and row sample standard deviation values.
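
Since the standard deviation is the square root of the variance, the two functions can be checked against each other. The sketch below is an illustration that reuses the six-value vector from the variance example; both printed values should agree.

```python
# standard deviation as the square root of the variance
from math import sqrt
from numpy import array
from numpy import var
from numpy import std

v = array([1, 2, 3, 4, 5, 6])

# square root of the sample variance
root_of_var = sqrt(var(v, ddof=1))
print(root_of_var)

# sample standard deviation computed directly
s = std(v, ddof=1)
print(s)
```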

1.4 Covariance and Correlation

The sign of the covariance indicates whether the two variables tend to move in the same direction (positive) or in opposite directions (negative). The magnitude of the covariance is not easily interpreted. A covariance of zero indicates that there is no linear relationship between the variables; it does not by itself imply that they are independent.

NumPy does not have a function that returns only the covariance between two variables. Instead, it has a function called cov() that calculates a covariance matrix, from which the covariance can be retrieved. By default, the cov() function calculates the unbiased, or sample, covariance between the provided random variables.

The example below defines two vectors of equal length, one increasing and one decreasing. We would expect the covariance between these variables to be negative. We access the covariance of the two variables as the [0, 1] element of the square covariance matrix returned.

# Example of calculating a vector covariance
# vector covariance 
from numpy import array
from numpy import cov
# define first vector
x = array([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(x)

# define second vector
y = array([9, 8, 7, 6, 5, 4, 3, 2, 1])
print(y)

# calculate covariance
Sigma = cov(x, y)[0, 1]
print(Sigma)

Running the example first prints the two vectors followed by the covariance for the values in the two vectors. The value is negative, as we expected.
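
As a cross-check, the sample covariance can be computed by hand, assuming the usual definition: the sum of the products of the paired deviations from each mean, divided by n - 1. The second half of the sketch uses a hypothetical pair of vectors (a and b, not from the tutorial) where b is fully determined by a and yet the covariance is zero, which illustrates that covariance only captures linear relationships.

```python
# sample covariance computed by hand and with cov
from numpy import array
from numpy import mean
from numpy import cov

x = array([1, 2, 3, 4, 5, 6, 7, 8, 9])
y = array([9, 8, 7, 6, 5, 4, 3, 2, 1])
n = len(x)

# sum of products of deviations, divided by n - 1
manual_cov = ((x - mean(x)) * (y - mean(y))).sum() / (n - 1)
print(manual_cov)
print(cov(x, y)[0, 1])

# hypothetical example: b depends entirely on a, but not linearly
a = array([-2, -1, 0, 1, 2])
b = a ** 2
zero_cov = cov(a, b)[0, 1]
print(zero_cov)
```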

 

 

# Example of calculating a vector correlation
# vector correlation
from numpy import array
from numpy import corrcoef
# define first vector
x = array([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(x)

# define second vector
y = array([9, 8, 7, 6, 5, 4, 3, 2, 1])
print(y)

# calculate correlation
corr = corrcoef(x, y)[0, 1]
print(corr)

Running the example first prints the two defined vectors followed by the correlation coefficient. We can see that the vectors are maximally negatively correlated as we designed.
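
The coefficient returned by corrcoef is the Pearson correlation, which can be sketched as the sample covariance divided by the product of the two sample standard deviations. The check below is an illustration under that definition; both printed values should be very close to -1.

```python
# Pearson correlation from covariance and standard deviations
from numpy import array
from numpy import cov
from numpy import std
from numpy import corrcoef

x = array([1, 2, 3, 4, 5, 6, 7, 8, 9])
y = array([9, 8, 7, 6, 5, 4, 3, 2, 1])

# sample covariance of the two vectors
covariance = cov(x, y)[0, 1]

# normalize by the product of the sample standard deviations
manual_corr = covariance / (std(x, ddof=1) * std(y, ddof=1))
print(manual_corr)

# same quantity from corrcoef
print(corrcoef(x, y)[0, 1])
```

Dividing by the standard deviations removes the units of the variables, which is why the correlation is bounded between -1 and 1 while the covariance is not.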

1.5 Covariance Matrix

 

# Example of calculating a covariance matrix
# covariance matrix
from numpy import array
from numpy import cov

# define matrix of observations
X = array([
    [1, 5, 8],
    [3, 5, 11],
    [2, 4, 9],
    [3, 6, 10],
    [1, 5, 10]
])
print(X)

# calculate covariance matrix
Sigma = cov(X.T)
print(Sigma)

Running the example first prints the defined dataset and then the calculated covariance matrix.
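
The structure of the result can be checked with a short sketch. It assumes the usual definition of the sample covariance matrix for data with observations in rows: center each column, multiply the transposed centered matrix by the centered matrix, and divide by n - 1. The transpose in cov(X.T) is needed because cov treats each row as a variable by default; passing rowvar=False is an equivalent alternative.

```python
# covariance matrix computed by hand and with cov
from numpy import array
from numpy import cov
from numpy import var
from numpy import allclose

# rows are observations, columns are features
X = array([
    [1, 5, 8],
    [3, 5, 11],
    [2, 4, 9],
    [3, 6, 10],
    [1, 5, 10]
])
n = X.shape[0]

# center each column, then form the scaled cross-product
centered = X - X.mean(axis=0)
manual_sigma = centered.T.dot(centered) / (n - 1)
print(manual_sigma)

Sigma = cov(X.T)
# the manual result matches cov
print(allclose(manual_sigma, Sigma))
# rowvar=False treats columns as variables, so no transpose is needed
print(allclose(cov(X, rowvar=False), Sigma))
# the diagonal holds the sample variance of each column
print(allclose(Sigma.diagonal(), var(X, ddof=1, axis=0)))
# the matrix is symmetric
print(allclose(Sigma, Sigma.T))
```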
