Learning pandas in Depth (1): 10 minutes to pandas

Latest recommended article published 2021-07-23 15:30:47

PerpetualLearner

Category column: 小白学Python (Python for Beginners). Tags: pandas

Article link: https://blog.csdn.net/The_Time_Runner/article/details/108209347

pandas

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
10 minutes to pandas

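import numpy as np
import pandas as pd

# Build a DataFrame from a dict of column-like objects; scalars are broadcast to the 4-row index
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train"]),
                    'F': 'foo'})

# filter: keep the rows of df2 whose 'E' value is 'test'
df2[df2['E'].isin(['test'])]

# ts is not defined in the original snippet; a date-indexed Series is assumed here
ts = pd.Series(np.random.randn(1000),
               index=pd.date_range('1/1/2000', periods=1000))
df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,
                  columns=['A', 'B', 'C', 'D'])
# plot: the original leaves this step empty; one option (requires matplotlib) is
# df.cumsum().plot()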

Intro to data structures

Here is a basic tenet to keep in mind: data alignment is intrinsic.

The link between labels and data will not be broken unless done so explicitly by you.
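As a minimal sketch of what label alignment means in practice, the snippet below adds two Series that share only some labels. The names `left`, `right`, and `total` and the values are illustrative and do not come from the original notes.

```python
import numpy as np
import pandas as pd

# Two Series that share only the labels 'b' and 'c' (illustrative values)
left = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
right = pd.Series([10.0, 20.0, 30.0], index=['b', 'c', 'd'])

# Addition pairs values by label, not by position
total = left + right
print(total)
```

Labels present in only one operand ('a' and 'd') come out as NaN, because there is no partner value to add.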

Series

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.).

The axis labels are collectively referred to as the index.

The basic way to create a Series is to call:

>>> s = pd.Series(data, index=index)

data can be many different things:

a Python dict
an ndarray
a scalar value

The passed index is a list of axis labels. This separates into a few cases, depending on what data is:

# 1. From ndarray
# If data is an ndarray, index must be the same length as data.
# If no index is passed, one is created with values [0, ..., len(data) - 1].
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

# 2. From dict
# With Python >= 3.6 and pandas >= 0.23, the Series index follows the dict's insertion order;
# with older versions, it is the lexically ordered list of dict keys.
d = {'b': 1, 'a': 0, 'c': 2}
s = pd.Series(d)
# If an index is passed, the values in data corresponding to the labels in the index are pulled out;
# a label with no matching key (here 'd') gets NaN.
s = pd.Series(d, index=['b', 'c', 'd', 'a'])

# 3. From scalar value
# If data is a scalar value, an index must be provided.
# The value is repeated to match the length of index.
s = pd.Series(5.0, index=['b', 'c', 'd', 'a'])
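To make the three cases easier to check, here is a small sketch that builds one Series per case and prints the result. The variable names `from_array`, `from_dict`, and `from_scalar` and the array values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# No index passed: pandas creates the default 0..len(data)-1 index
from_array = pd.Series(np.array([10, 20, 30]))

# Dict plus explicit index: label 'd' has no matching key, so its value is NaN
from_dict = pd.Series({'b': 1, 'a': 0, 'c': 2}, index=['b', 'c', 'd', 'a'])

# Scalar plus index: the value is repeated once per label
from_scalar = pd.Series(5.0, index=['b', 'c', 'd', 'a'])

print(list(from_array.index))
print(from_dict)
print(from_scalar)
```

Note that `from_dict` ends up with a floating-point dtype even though the dict values are integers, since NaN is a float.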

pandas supports non-unique index values. If an operation that does not support duplicate index values is attempted, an exception is raised at that point, not when the Series is created.
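The behavior with duplicate labels can be sketched as follows. The Series `dup` is a made-up example, and `reindex` is used here as one operation that is known to reject duplicate labels; other operations may behave differently.

```python
import pandas as pd

# Creating a Series with a repeated label is allowed
dup = pd.Series([1, 2, 3], index=['a', 'a', 'b'])

# Looking up a repeated label returns every matching row as a Series
matches = dup['a']

# reindex requires unique labels, so the error appears only when it is called
try:
    dup.reindex(['a', 'b'])
    reindex_failed = False
except ValueError:
    reindex_failed = True

print(dup.index.is_unique, matches.tolist(), reindex_failed)
```

Checking `dup.index.is_unique` before such an operation is one way to avoid the exception.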