Basic Statistics in Python: Descriptive Statistics

cumei1658

Article tags: python, machine learning, artificial intelligence, data analysis, programming languages

Original link: https://www.pybloggers.com/2018/07/basic-statistics-in-python-descriptive-statistics/

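This article introduces the basics of statistics, focusing on descriptive statistics: measures of central tendency (such as the mean, median, and mode) and measures of spread (such as the range, interquartile range, standard deviation, and variance). Using a wine review data set as an example, it shows how to perform statistical analysis in Python and stresses the importance of understanding and applying statistics correctly.

Summary generated by CSDN using AI.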

The field of statistics is often misunderstood, but it plays an essential role in our everyday lives. Statistics, done correctly, allows us to extract knowledge from the vague, complex, and difficult real world. Wielded incorrectly, statistics can be used to harm and mislead. A clear understanding of statistics and the meanings of various statistical measures is important to distinguishing between truth and misdirection.

We will cover the following in this article:

defining statistics
descriptive statistics
- measures of central tendency
- measures of spread

Prerequisites:

This article assumes no prior knowledge of statistics, but does require at least a general knowledge of Python. If you are uncomfortable with for loops and lists, I recommend covering them briefly before progressing.

Loading in our data

We will root our discussion of statistics in real-world data, taken from Kaggle’s Wine Reviews data set. The data itself comes from a scraper that scoured the Wine Enthusiast site.

For the sake of this article, let’s say that you are a sommelier-in-training, a new wine taster. You found this interesting data set on wines, and you would like to compare and contrast different wines. You’ll use statistics to describe the wines in the data set and derive some insights for yourself. Perhaps we can start our training with a cheap set of wines, or the most highly rated ones?

The code below loads the data set wine-data.csv into a variable wines as a list of lists. We’ll perform statistics on wines throughout the article. You can use this code to follow along on your own computer.

import csv

# Read every row of the CSV file into a list of lists of strings.
with open("wine-data.csv", "r", encoding="latin-1") as f:
    wines = list(csv.reader(f))
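
If you want to check what a list of lists from csv.reader looks like before working with the full file, the sketch below may help. It is only an illustration under stated assumptions: it reads a small in-memory sample instead of wine-data.csv, the sample keeps just four of the columns, and it assumes the first row of the file holds the column names. The names sample, wines_sample, points_col, and price_col are made up for this example.

```python
import csv
import io

# Hypothetical stand-in for wine-data.csv: a reduced set of columns and
# three rows. Assumes the real file also starts with a header row.
sample = io.StringIO(
    "index,country,points,price\n"
    "0,US,96,235\n"
    "1,Spain,96,110\n"
    "2,US,96,90\n"
)
wines_sample = list(csv.reader(sample))

# Split the header row from the data rows.
header, rows = wines_sample[0], wines_sample[1:]

# csv.reader returns every field as a string, so numeric columns
# must be converted before doing arithmetic on them.
points_col = header.index("points")
price_col = header.index("price")
points = [int(row[points_col]) for row in rows]
# Skip empty price fields rather than failing on float("").
prices = [float(row[price_col]) for row in rows if row[price_col] != ""]

print(header)
print(rows[0])
print(points, prices)
```

Looking up column positions with header.index avoids hard-coding numeric indices, which would break if the real file orders its columns differently from this sample.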

Let’s have a brief look at the first five rows of the data in a table, so we can see what kinds of values we’re working with.

index	country	description	designation	points	price	province	region_1	region_2	variety	winery
0	US	“This tremendous 100%…”	Martha’s Vineyard	96	235	California	Napa Valley	Napa	Cabernet Sauvignon	Heitz
1	Spain	“Ripe aromas of fig…	Carodorum Selecci Especial Reserva	96	110	Northern Spain	Toro		Tinta de Toro	Bodega Carmen Rodriguez
2	US	“Mac Watson honors…	Special Selected Late Harvest	96	90	California	Knights Valley	Sonoma	Sauvignon Blanc	Macauley
3	US	“This spent 20 months…	Reserve	96	65	Oregon	Willamette Valley	Willamette Valley	Pinot Noir