Series and DataFrame in Python

by Shubhi Asthana

A couple of months ago, I took the online course "Using Python for Research" offered by Harvard University on edX. During the course I learned many concepts of Python, NumPy, Matplotlib, and PyPlot. I also had the opportunity to work on case studies and apply what I had learned to actual datasets. More information about the program is available on the course's edX page.

Two important concepts I learned in this course are Series and DataFrame. I want to introduce them to you through a short tutorial.

To start the tutorial, let's get the latest version of Python from the official website (python.org).

Once Python is installed, you'll use a graphical user interface called IDLE to work with Python.

Let's import pandas into our workspace. pandas is a Python library that provides data structures and data analysis tools. It is not part of the standard Python installation, so install it first (for example with `pip install pandas`) if it is not already available.

Series

A Series is a one-dimensional object that can hold any data type, such as integers, floats, and strings. Let's take a list of items as an input argument and create a Series object for that list.

>>> import pandas as pd
>>> x = pd.Series([6,3,4,6])
>>> x
0    6
1    3
2    4
3    6
dtype: int64

The axis labels for the data are referred to as the index. The length of the index must be the same as the length of the data. Since we did not pass an index in the code above, a default index is created with the values [0, 1, … len(data) - 1].
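The length rule described above can be checked directly. The following is a minimal sketch, assuming a recent pandas version; the variable names `default_labels` and `mismatch_error` are illustrative only.

```python
import pandas as pd

# No index passed: pandas builds a default integer index 0..len(data)-1.
x = pd.Series([6, 3, 4, 6])
default_labels = list(x.index)

# An index whose length differs from the data is rejected with a ValueError.
try:
    pd.Series([6, 3, 4, 6], index=["a", "b", "c"])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(default_labels)
print(type(mismatch_error).__name__)
```

Catching the exception here is only for demonstration; in real code a length mismatch usually signals a bug in how the data or labels were prepared.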

Let's go ahead and define an index for the data.

>>> x = pd.Series([6,3,4,6], index=['a', 'b', 'c', 'd'])
>>> x
a    6
b    3
c    4
d    6
dtype: int64

The index in the leftmost column now labels the data in the right column.

We can look up the data by referring to its index label:

>>> x['c']
4

pandas returns the value stored under that index label.
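Beyond the square-bracket lookup, a Series offers a few related ways to reach its data. The sketch below uses the same Series as the tutorial; the variable names are illustrative, and it is meant as a quick comparison rather than a complete guide to indexing.

```python
import pandas as pd

x = pd.Series([6, 3, 4, 6], index=["a", "b", "c", "d"])

by_label = x.loc["c"]      # explicit label-based lookup
by_position = x.iloc[2]    # explicit position-based lookup (third element)
subset = x[["a", "d"]]     # a list of labels returns a smaller Series
has_label = "c" in x       # `in` tests the index labels, not the values

print(by_label, by_position, has_label)
print(subset)
```

Using `.loc` and `.iloc` makes it explicit whether a label or a position is meant, which helps when the index itself contains integers.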

A Series can also be created from a dictionary, such as the one defined below. The dictionary's keys become the index and its values become the data. We can then use the index to get the values corresponding to each label.

>>> data = {'abc': 1, 'def': 2, 'xyz': 3}
>>> pd.Series(data)
abc    1
def    2
xyz    3
dtype: int64
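When a dictionary is combined with an explicit index, pandas picks values by key rather than by position. This is a small sketch of that behaviour; the label `"missing"` is a made-up key that does not exist in the dictionary.

```python
import pandas as pd

data = {"abc": 1, "def": 2, "xyz": 3}

# The index selects and orders the dictionary keys.
# "missing" is a hypothetical label with no matching key, so it becomes NaN.
s = pd.Series(data, index=["xyz", "abc", "missing"])

print(s)
```

Because NaN is a floating-point value, the resulting Series is stored as float64 instead of int64.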

Another interesting feature of Series is that the data can be a scalar value. In that case, the value is repeated for each of the index labels defined.

>>> x = pd.Series(3, index=['a', 'b', 'c', 'd'])
>>> x
a    3
b    3
c    3
d    3
dtype: int64

DataFrame

A DataFrame is a two-dimensional object whose columns can have different types. It accepts many kinds of input, including dictionaries, lists, Series, and even another DataFrame. It is the most commonly used pandas object.
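To make the different input types concrete, here is a minimal sketch that builds the same small table four ways. The column names, row labels, and values are invented for illustration.

```python
import pandas as pd

# From a dictionary of equal-length lists: keys become column names.
from_dict = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})

# From a list of rows, with column names supplied separately.
from_rows = pd.DataFrame([[1, "x"], [2, "y"], [3, "z"]], columns=["A", "B"])

# From a dictionary of Series: rows are aligned on the Series index.
from_series = pd.DataFrame({
    "A": pd.Series([1, 2, 3], index=["r1", "r2", "r3"]),
    "B": pd.Series(["x", "y", "z"], index=["r1", "r2", "r3"]),
})

# From another DataFrame (copy=True requests an independent copy of the data).
from_frame = pd.DataFrame(from_dict, copy=True)

print(from_dict.dtypes)
```

Note that column A holds integers while column B holds strings, which is what "columns with different types" means in practice.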

Let's go ahead and create a DataFrame by passing a NumPy array, with datetimes as the index and labeled columns:

>>> import numpy as np
>>> dates = pd.date_range('20170505', periods=8)
>>> dates
DatetimeIndex(['2017-05-05', '2017-05-06', '2017-05-07', '2017-05-08',
               '2017-05-09', '2017-05-10', '2017-05-11', '2017-05-12'],
              dtype='datetime64[ns]', freq='D')
>>> df = pd.DataFrame(np.random.randn(8,3), index=dates, columns=list('ABC'))
>>> df
                   A         B         C
2017-05-05 -0.301877  1.508536 -2.065571
2017-05-06  0.613538 -0.052423 -1.206090
2017-05-07  0.772951  0.835798  0.345913
2017-05-08  1.339559  0.900384 -1.037658
2017-05-09 -0.695919  1.372793  0.539752
2017-05-10  0.275916 -0.420183  1.744796
2017-05-11 -0.206065  0.910706 -0.028646
2017-05-12  1.178219  0.783122  0.829979
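Because the values above come from a random generator, your numbers will differ on every run. The sketch below shows one possible way to make the example repeatable and to pull single columns and rows out of the frame. The seed value 0 and the variable names are arbitrary choices, not part of the original tutorial.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the "random" values repeatable between runs.
# The seed 0 is an arbitrary choice for this sketch.
rng = np.random.default_rng(0)

dates = pd.date_range("2017-05-05", periods=8)
df = pd.DataFrame(rng.standard_normal((8, 3)), index=dates, columns=list("ABC"))

column_a = df["A"]                               # one column, as a Series
one_day = df.loc[pd.Timestamp("2017-05-07")]     # one row, as a Series
three_days = df.loc["2017-05-06":"2017-05-08"]   # label slices include both ends

print(three_days)
```

Unlike ordinary Python slices, a label-based slice with `.loc` includes the end label, so the last line selects three rows.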

A DataFrame covering a date range of 8 days is created, as shown above. We can view the top and bottom rows of the frame using df.head() and df.tail():

>>> df.head()
                   A         B         C
2017-05-05 -0.301877  1.508536 -2.065571
2017-05-06  0.613538 -0.052423 -1.206090
2017-05-07  0.772951  0.835798  0.345913
2017-05-08  1.339559  0.900384 -1.037658
2017-05-09 -0.695919  1.372793  0.539752
>>> df.tail()
                   A         B         C
2017-05-08  1.339559  0.900384 -1.037658
2017-05-09 -0.695919  1.372793  0.539752
2017-05-10  0.275916 -0.420183  1.744796
2017-05-11 -0.206065  0.910706 -0.028646
2017-05-12  1.178219  0.783122  0.829979
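Both methods return five rows by default, and both accept an optional row count. A minimal sketch, using placeholder integer data instead of random values so the result is easy to follow:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2017-05-05", periods=8)
# Placeholder data 0..23, chosen only to keep the example deterministic.
df = pd.DataFrame(np.arange(24).reshape(8, 3), index=dates, columns=list("ABC"))

first_three = df.head(3)   # rows for 2017-05-05 through 2017-05-07
last_two = df.tail(2)      # rows for 2017-05-11 and 2017-05-12

print(first_three)
print(last_two)
```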

We can also view a quick statistical summary of our data:

>>> df.describe()
              A         B         C
count  8.000000  8.000000  8.000000
mean   0.372040  0.729842 -0.109691
std    0.731262  0.657931  1.244801
min   -0.695919 -0.420183 -2.065571
25%   -0.230018  0.574236 -1.079766
50%    0.444727  0.868091  0.158633
75%    0.874268  1.026228  0.612309
max    1.339559  1.508536  1.744796
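Each row of the summary corresponds to a method you can also call on its own. As a rough illustration, the sketch below re-enters the column A values printed earlier and recomputes a few of the statistics; the variable names are illustrative.

```python
import pandas as pd

# Column A, copied from the DataFrame printed earlier in the tutorial.
a = pd.Series([-0.301877, 0.613538, 0.772951, 1.339559,
               -0.695919, 0.275916, -0.206065, 1.178219], name="A")

summary = a.describe()

mean_a = a.mean()             # the "mean" row
std_a = a.std()               # the "std" row (sample standard deviation)
median_a = a.quantile(0.5)    # the "50%" row

print(summary)
```

The "50%" row is the median, and "std" is the sample standard deviation, which divides by n - 1 rather than n.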

We can also apply functions such as a cumulative sum to the data, view histograms, and merge, concatenate, or reshape DataFrames. For example, here is the cumulative sum of each column:

>>> df.apply(np.cumsum)
                   A         B         C
2017-05-05 -0.301877  1.508536 -2.065571
2017-05-06  0.311661  1.456113 -3.271661
2017-05-07  1.084612  2.291911 -2.925748
2017-05-08  2.424171  3.192296 -3.963406
2017-05-09  1.728252  4.565088 -3.423654
2017-05-10  2.004169  4.144905 -1.678858
2017-05-11  1.798104  5.055611 -1.707504
2017-05-12  2.976322  5.838734 -0.877526
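The other operations mentioned above follow a similar pattern. Here is a brief sketch of merging, concatenating, and reshaping, using two tiny made-up tables; `left`, `right`, and the `key` column are hypothetical names chosen for the example. Histograms are left out because they require a plotting library.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "d"], "y": [10, 20, 40]})

running = left["x"].cumsum()                          # cumulative sum of one column
merged = pd.merge(left, right, on="key")              # inner join keeps keys in both tables
stacked = pd.concat([left, left], ignore_index=True)  # rows of both frames, renumbered
wide = left.set_index("key").T                        # reshape: rows become columns

print(merged)
print(wide)
```

By default `pd.merge` performs an inner join, so keys that appear in only one table ("c" and "d" here) are dropped from the result.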

You can read more about these data structures in the pandas documentation.

Source: https://www.freecodecamp.org/news/series-and-dataframe-in-python-a800b098f68/
