Descriptive Statistics in Python
Descriptive Statistics
After data collection, most Psychology researchers use different ways to summarise the data. In this tutorial we will learn how to do descriptive statistics in Python. Python, being a programming language, gives us many ways to carry out descriptive statistics. Pandas makes data manipulation and summary statistics quite similar to how you would do them in R. I find R's data frame very intuitive to use, and pandas offers a DataFrame class similar to R's. Also, many Psychology researchers may already have experience with R.
Thus, in this tutorial you will learn how to do descriptive statistics using Pandas, but also using NumPy and SciPy. We start by using Pandas to obtain summary statistics and some variance measures. After that we continue with measures of central tendency (e.g., mean and median) using Pandas and NumPy. Pandas and NumPy have no built-in functions for the harmonic, geometric, and trimmed mean, so we use SciPy for those. Towards the end we learn how to get some measures of variability (e.g., variance using Pandas).
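As a quick preview of the measures listed above, here is a minimal sketch on a small, made-up sample of response times (the values are illustrative only, not data from this tutorial). It uses `gmean`, `hmean`, and `trim_mean` from `scipy.stats`; the masked-array versions in `scipy.stats.mstats` imported later should give the same results on data without missing values. The 20% trimming proportion is an arbitrary choice for the example.

```python
import numpy as np
import pandas as pd
from scipy.stats import trim_mean, gmean, hmean

# Hypothetical response times in milliseconds (made up for illustration)
rt = pd.Series([512.0, 430.0, 645.0, 390.0, 1210.0, 480.0, 555.0, 470.0])

# Arithmetic mean and median: available directly in pandas and NumPy
print("mean (pandas):", rt.mean())
print("mean (NumPy): ", np.mean(rt))
print("median:       ", rt.median())

# Geometric and harmonic means: no built-in pandas/NumPy function, so use SciPy
print("geometric mean:", gmean(rt))
print("harmonic mean: ", hmean(rt))

# Trimmed mean: cut 20% of observations from each tail before averaging
# (with 8 values, int(0.2 * 8) = 1 value is dropped from each end)
print("trimmed mean:  ", trim_mean(rt, proportiontocut=0.2))

# Variability: pandas uses the sample variance (ddof=1) by default,
# while NumPy defaults to ddof=0, so ddof=1 is passed explicitly here
print("variance (pandas):", rt.var())
print("variance (NumPy): ", np.var(rt, ddof=1))
```

Note how the single slow response (1210 ms) pulls the arithmetic mean above the median and the trimmed mean, which is one reason trimmed means are popular for response time data.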
```python
import numpy as np
from pandas import DataFrame as df
from scipy.stats import trim_mean, kurtosis
from scipy.stats.mstats import mode, gmean, hmean
```
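The imports above also bring in `mode` and `kurtosis`. A minimal sketch of how they behave, on a made-up array of response times (the values are hypothetical):

```python
import numpy as np
from scipy.stats import kurtosis
from scipy.stats.mstats import mode

# Hypothetical response times in ms; 480 occurs three times
rt = np.array([450, 480, 480, 510, 530, 480, 600, 720])

# mstats.mode returns a ModeResult with (masked) arrays for the mode and its count
m = mode(rt)
print("mode:", np.ravel(m.mode)[0], "count:", np.ravel(m.count)[0])

# kurtosis uses Fisher's definition by default (normal distribution -> 0);
# fisher=False gives Pearson's definition (normal distribution -> 3)
print("excess kurtosis (Fisher):", kurtosis(rt))
print("kurtosis (Pearson):      ", kurtosis(rt, fisher=False))
```

The two kurtosis values always differ by exactly 3, so it is worth checking which definition a paper or software package reports.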
Simulate response time data
Response time is often the dependent variable in experimental psychology. Here I simulate an experiment in which the dependent variable is the response time to some arbitrary targets. The simulated data also have two independent variables (IVs): “iv1” has 2 levels and “iv2” has 3 levels. The data are simulated at the same time as the dataframe is created, and the first descriptive statistics are obtained using the describe method.
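The original simulation code is not shown here, so the following is only a sketch of one way to simulate such a design. Everything in it is an assumption: the level labels (`"A"`, `"B"`, `"L1"` to `"L3"`), the 20 trials per cell, and normally distributed response times with arbitrary cell means. Real response times are usually right-skewed, so a skewed distribution may be more realistic.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # fixed seed so the simulation is reproducible

n_per_cell = 20                  # hypothetical number of trials per condition
iv1_levels = ["A", "B"]          # hypothetical labels for the 2 levels of iv1
iv2_levels = ["L1", "L2", "L3"]  # hypothetical labels for the 3 levels of iv2

cells = []
for i, a in enumerate(iv1_levels):
    for j, b in enumerate(iv2_levels):
        # Assumption: response times are normal around a cell mean that
        # shifts with each factor level; the numbers are arbitrary
        cell_mean = 500 + 40 * i + 25 * j
        rts = rng.normal(loc=cell_mean, scale=60, size=n_per_cell)
        # Scalars "a" and "b" are broadcast to every row of the cell
        cells.append(pd.DataFrame({"iv1": a, "iv2": b, "rt": rts}))

# Long format: one row per simulated trial
data = pd.concat(cells, ignore_index=True)
print(data.describe())
```

Because `iv1` and `iv2` are string columns, `describe()` summarises only the numeric `rt` column by default.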
Descriptive statistics using Pandas
```python
data.describe()
```
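With this method, Pandas outputs summary statistics (count, mean, standard deviation, minimum, 25th, 50th, and 75th percentiles, and maximum) for every numeric column. The output is a table, as you can see below.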
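The table returned by `describe()` is itself a DataFrame, so single values can be picked out with `.loc`, and combining it with `groupby` gives the same summary per level of an independent variable. A minimal sketch, using a tiny hypothetical dataset in the same long format (one row per trial):

```python
import pandas as pd

# Tiny hypothetical dataset (values made up for illustration)
data = pd.DataFrame({
    "iv1": ["A", "A", "A", "B", "B", "B"],
    "iv2": ["L1", "L2", "L3", "L1", "L2", "L3"],
    "rt": [510.0, 535.0, 560.0, 540.0, 570.0, 600.0],
})

summary = data.describe()
print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# Pick out a single statistic for one column
print(summary.loc["mean", "rt"])

# Summary statistics of rt for each level of iv1
print(data.groupby("iv1")["rt"].describe())
```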