Introduction to pandas Data Structures

Latest recommended article published 2022-06-25 19:41:04

wyc-

Article link: https://blog.csdn.net/qq_28120673/article/details/103473276

We start with the fundamental data structures. The fundamental behaviors (data types, indexing, and axis labeling / alignment) apply to all objects.

import numpy as np
import pandas as pd

One fundamental principle: **data alignment is intrinsic.** The link between labels and data is never broken unless you break it explicitly.

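We first give a brief introduction to the data structures, then cover the broad categories of functionality and methods separately.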

Series

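Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.).
The axis labels are collectively referred to as the index.
The basic way to create a Series is: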

s = pd.Series(data,index=index)

Here, data can be:

a Python dict
an ndarray
a scalar value (like 5)

The index argument is a list of axis labels.

Depending on what data is, there are several cases:

ndarray:

If data is an ndarray, index must be the same length as data. If no index is passed, one is created automatically with the values 0, 1, 2, ..., len(data) - 1.

s = pd.Series(np.random.randn(5),index=['a','b','c','d','e'])

a   -0.050107
b   -0.275637
c    0.022653
d    0.677512
e    0.497479
dtype: float64

s.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

pd.Series(np.random.randn(5))

0   -1.190514
1   -2.139007
2   -0.195262
3    0.114374
4    0.239553
dtype: float64
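As a small sketch of the length rule above (the variable names are illustrative and not from the original article), an index whose length differs from the data is rejected with a ValueError:

```python
import numpy as np
import pandas as pd

values = np.arange(3)

# Matching lengths work as expected.
ok = pd.Series(values, index=['x', 'y', 'z'])

# A mismatched index is rejected rather than silently padded or truncated.
try:
    pd.Series(values, index=['x', 'y'])
    raised = False
except ValueError:
    raised = True

print(raised)
```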

dict:

A Series can be instantiated from dicts.

d = {'b':1,'a':0,'c':2}

pd.Series(d)

b    1
a    0
c    2
dtype: int64

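If Python < 3.6 or pandas < 0.23, the index of the Series above is the lexically sorted list of dict keys, ['a', 'b', 'c'], rather than the insertion order ['b', 'a', 'c'].

If an index is passed, the values in data corresponding to the labels in the index are pulled out, and labels that are not keys of the dict get NaN.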
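A minimal sketch of passing an index together with a dict, reusing the dict d from above (the name s_idx is only illustrative). The label 'd' is not a key of the dict, so it is filled with NaN and the dtype becomes float64:

```python
import numpy as np
import pandas as pd

d = {'b': 1, 'a': 0, 'c': 2}

# Labels are looked up in the dict; 'd' is not a key, so its value is NaN.
s_idx = pd.Series(d, index=['b', 'c', 'd', 'a'])
print(s_idx)
```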

Scalar value:

If data is a scalar value, an index must be provided. The value is repeated to match the length of index.

pd.Series(5.,index=['a','b','c'])

a    5.0
b    5.0
c    5.0
dtype: float64

Series is ndarray-like

Series acts very similarly to an ndarray and is a valid argument to most NumPy functions. However, operations such as slicing also slice the index.

s.iloc[0]  # positional access; plain s[0] on a label-based index is deprecated in recent pandas

-0.05010660876964625

s[:3]

a   -0.050107
b   -0.275637
c    0.022653
dtype: float64

s[s > s.median()]

d    0.677512
e    0.497479
dtype: float64

s.iloc[[4, 2, 1]]  # positional selection; s[[4, 2, 1]] on a label-based index is deprecated in recent pandas

e    0.497479
c    0.022653
b   -0.275637
dtype: float64

np.exp(s)

a    0.951128
b    0.759089
c    1.022911
d    1.968973
e    1.644570
dtype: float64

Like a NumPy array, a Series has a dtype.

s.dtype

dtype('float64')

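The dtype of a Series is usually a NumPy dtype. However, pandas and third-party libraries extend NumPy's dtypes in a few places; see dtypes for details.

Use Series.array to get the actual array backing a Series.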

s.array

<PandasArray>
[-0.05010660876964625, -0.27563658449690015,  0.02265290627681807,
   0.6775122610456474,   0.4974791479965988]
Length: 5, dtype: float64

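Series.array is an ExtensionArray. Briefly, an ExtensionArray is a thin wrapper around one or more concrete arrays such as a numpy.ndarray. pandas knows how to take an ExtensionArray and store it in a Series or in a column of a DataFrame.

While Series is ndarray-like, if you need an actual ndarray, use Series.to_numpy().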

s.to_numpy()

array([-0.05010661, -0.27563658,  0.02265291,  0.67751226,  0.49747915])
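To make the difference between the two accessors concrete, here is a short sketch on a small hand-written Series (ser, arr and ext are illustrative names). It only checks the types returned, under the assumption of a plain float64 Series:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.5, 2.5, 3.5], index=['a', 'b', 'c'])

arr = ser.to_numpy()  # a plain numpy.ndarray; the index labels are dropped
ext = ser.array       # a pandas ExtensionArray wrapping the values

print(type(arr).__name__)
print(isinstance(ext, pd.api.extensions.ExtensionArray))
```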

dict-like

A Series is like a fixed-size dict in that you can get and set values by index label.

s['a']

-0.05010660876964625

s['e']=12

a    -0.050107
b    -0.275637
c     0.022653
d     0.677512
e    12.000000
dtype: float64

'e' in s

True

'w' in s

False

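If a label is not contained in the index, an exception is raised.

# s['w']
# KeyError: 'w'

With the get() method, a missing label returns the supplied default value, or None if no default is given, instead of raising an error.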

s.get('w',np.nan)

nan

print(s.get('w'))

None
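The two lookup styles can be compared side by side. The helper lookup below is hypothetical and only mimics what get() does, by catching the KeyError that bracket access raises:

```python
import numpy as np
import pandas as pd

prices = pd.Series([1.0, 2.0], index=['a', 'b'])

def lookup(series, label, default=np.nan):
    """Return series[label], or default when the label is absent."""
    try:
        return series[label]
    except KeyError:
        return default

print(lookup(prices, 'a'))
print(lookup(prices, 'w'))
print(prices.get('w', 0))
```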

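Vectorized operations and label alignment with Series

Arithmetic on a Series is vectorized, so there is usually no need to loop value by value. A Series can also be passed into most NumPy methods that expect an ndarray.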

s + s

a    -0.100213
b    -0.551273
c     0.045306
d     1.355025
e    24.000000
dtype: float64

s * 2

a    -0.100213
b    -0.551273
c     0.045306
d     1.355025
e    24.000000
dtype: float64

np.exp(s)

a         0.951128
b         0.759089
c         1.022911
d         1.968973
e    162754.791419
dtype: float64

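A key difference between Series and ndarray is that operations between Series automatically align the data based on label.
You can therefore write computations without considering whether the Series involved have the same labels.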

s[1:]

b    -0.275637
c     0.022653
d     0.677512
e    12.000000
dtype: float64

s[:-1]

a   -0.050107
b   -0.275637
c    0.022653
d    0.677512
dtype: float64

s[1:]+s[:-1]

a         NaN
b   -0.551273
c    0.045306
d    1.355025
e         NaN
dtype: float64
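The result of an operation between unaligned Series carries the union of the indexes involved, and a label found in only one operand is marked as NaN. A minimal sketch with two hand-written Series (left and right are illustrative names) shows this, plus two common ways of dealing with the missing values:

```python
import pandas as pd

left = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
right = pd.Series([10.0, 20.0, 30.0], index=['b', 'c', 'd'])

total = left + right                     # index is the union: a, b, c, d
only_shared = total.dropna()             # keep only labels present in both
filled = left.add(right, fill_value=0)   # treat a missing side as 0

print(total)
print(filled)
```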

Name attribute

A Series also has a name attribute.

s = pd.Series(np.random.randn(5),name='something')

0   -0.990679
1   -0.703465
2    0.689987
3    0.709681
4    0.186647
Name: something, dtype: float64

You can rename a Series with the pandas.Series.rename() method.
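A short sketch of renaming (s_named and s_renamed are illustrative names). Passing a scalar to rename() returns a new Series with the new name, and the original keeps its own:

```python
import pandas as pd

s_named = pd.Series([1, 2, 3], name='something')
s_renamed = s_named.rename('different')

print(s_named.name)
print(s_renamed.name)
```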

DataFrame

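DataFrame is a 2-dimensional labeled data structure whose columns can hold different types. You can think of it as a SQL table.
DataFrame accepts many different kinds of input: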

Dict of 1D ndarrays, lists, dicts, or Series
2-D numpy.ndarray
Structured or record ndarray
A Series
Another DataFrame

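Along with data, you can optionally pass index (row labels) and columns (column labels) arguments.

From dict of Series or dicts

The resulting index is the union of the indexes of the various Series. If no columns are passed, the columns are the ordered list of dict keys.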

d = {
    'one':pd.Series([1.,2.,3.],index=['a','b','c']),
    'two':pd.Series([1.,2.,3.,4.],index=['a','b','c','d'])
}

df=pd.DataFrame(d)

df

	one	two
a	1.0	1.0
b	2.0	2.0
c	3.0	3.0
d	NaN	4.0

pd.DataFrame(d,index=['d','b','a'])

	one	two
d	NaN	4.0
b	2.0	2.0
a	1.0	1.0

pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])

	two	three
d	4.0	NaN
b	2.0	NaN
a	1.0	NaN

From dict of ndarrays / lists

The ndarrays must all be the same length. If an index is passed, it must also be the same length as the arrays.

d = {'one':[1.,2.,3.,4.],
    'two':[4.,3.,2.,1.]}

pd.DataFrame(d)

	one	two
0	1.0	4.0
1	2.0	3.0
2	3.0	2.0
3	4.0	1.0

pd.DataFrame(d, index=['a', 'b', 'c', 'd'])

	one	two
a	1.0	4.0
b	2.0	3.0
c	3.0	2.0
d	4.0	1.0

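From structured or record array

This case is handled the same way as a dict of arrays.

data = np.zeros(
    (2,),
    dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')]  # 'S10' is a 10-byte string; 'a10' is an older alias for it
)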

data

array([(0, 0., b''), (0, 0., b'')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])

data[:]=[
    (1,2.,'Hello'),
    (2,3.,'World')
]

data

array([(1, 2., b'Hello'), (2, 3., b'World')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])

pd.DataFrame(data)

	A	B	C
0	1	2.0	b'Hello'
1	2	3.0	b'World'

pd.DataFrame(data,index=['first','second'])

	A	B	C
first	1	2.0	b'Hello'
second	2	3.0	b'World'

pd.DataFrame(data, columns=['C', 'A', 'B'])

	C	A	B
0	b'Hello'	1	2.0
1	b'World'	2	3.0

From a list of dicts

data2 = [
    {'a':1,'b':2},
    {'a':5,'b':10,'c':20}
]

data2

[{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

pd.DataFrame(data2)

	a	b	c
0	1	2	NaN
1	5	10	20.0

pd.DataFrame(data2, index=['first', 'second'])

	a	b	c
first	1	2	NaN
second	5	10	20.0

pd.DataFrame(data2, columns=['a', 'b'])

	a	b
0	1	2
1	5	10

From a dict of tuples

Passing a dict of tuples creates a DataFrame with a MultiIndex on both the rows and the columns.

pd.DataFrame({
    ('a','b'):{('A','B'):1,('A','C'):2},
    ('a','a'):{('A','C'):3,('A','B'):4},
    ('a','c'):{('A','B'):5,('A','C'):6},
    ('b','a'):{('A','C'):7,('A','B'):8},
    ('b','b'):{('A','D'):9,('A','B'):10}
})

		a			b
		b	a	c	a	b
A	B	1.0	4.0	5.0	8.0	10.0
	C	2.0	3.0	6.0	7.0	NaN
	D	NaN	NaN	NaN	NaN	9.0

From a Series

The resulting DataFrame has the same index as the input Series, and its single column is named after the Series.

s = pd.Series([1,2,3,4],index=['A','B','C','D'],name='test')
pd.DataFrame(s)

	test
A	1
B	2
C	3
D	4

s2 = pd.Series([1,2,3,4],index=['A','B','C','D'],name='test2')

pd.DataFrame((s,s2))

	A	B	C	D
test	1	2	3	4
test2	1	2	3	4
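Passing a tuple of Series, as above, lays each Series out as a row. If the goal is one column per Series instead, two possible approaches are sketched below; by_concat and by_dict are illustrative names, and neither approach comes from the original article:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=['A', 'B', 'C', 'D'], name='test')
s2 = pd.Series([1, 2, 3, 4], index=['A', 'B', 'C', 'D'], name='test2')

# Option 1: concatenate along the column axis; the Series names become column labels.
by_concat = pd.concat([s, s2], axis=1)

# Option 2: build a dict keyed by the Series names.
by_dict = pd.DataFrame({s.name: s, s2.name: s2})

print(by_concat)
```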

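Alternate constructors

DataFrame.from_dict():

It operates like the DataFrame constructor except for the orient parameter, which is 'columns' by default but can be set to 'index' to use the dict keys as row labels.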

pd.DataFrame.from_dict(dict([('A', [1, 2, 3]), ('B', [4, 5, 6])]))

	A	B
0	1	4
1	2	5
2	3	6

pd.DataFrame.from_dict(dict([('A', [1, 2, 3]), ('B', [4, 5, 6])]), orient='index', columns=['one', 'two', 'three'])

	one	two	three
A	1	2	3
B	4	5	6

DataFrame.from_records():

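It takes a list of tuples or an ndarray with a structured dtype. It works like the normal DataFrame constructor, except that the resulting DataFrame index may be a specific field of the structured dtype.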

data

array([(1, 2., b'Hello'), (2, 3., b'World')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])

pd.DataFrame.from_records(data, index='C')

	A	B
C
b'Hello'	1	2.0
b'World'	2	3.0

pd.DataFrame.from_records(data)

	A	B	C
0	1	2.0	b'Hello'
1	2	3.0	b'World'

Column selection, addition, and deletion

df['one']

a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64

df['three'] = df['one'] * df['two']

df['flag'] = df['one'] > 2

df

	one	two	three	flag
a	1.0	1.0	1.0	False
b	2.0	2.0	4.0	False
c	3.0	3.0	9.0	True
d	NaN	4.0	NaN	False

del df['two']

df['foo'] = 'bar'

df

	one	three	flag	foo
a	1.0	1.0	False	bar
b	2.0	4.0	False	bar
c	3.0	9.0	True	bar
d	NaN	NaN	False	bar

When you insert a Series whose index differs from the DataFrame's index, it is conformed to the DataFrame's index:

df['one_trunc']=df['one'][:2]

df

	one	three	flag	foo	one_trunc
a	1.0	1.0	False	bar	1.0
b	2.0	4.0	False	bar	2.0
c	3.0	9.0	True	bar	NaN
d	NaN	NaN	False	bar	NaN

You can also insert raw ndarrays, but their length must match the length of the DataFrame's index.
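A small sketch of that length requirement, using a throwaway DataFrame (frame and the column names are illustrative). An array of the right length is assigned by position, while one of the wrong length raises a ValueError:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'one': [1.0, 2.0, 3.0, 4.0]}, index=['a', 'b', 'c', 'd'])

# Same length as the index: values are assigned by position.
frame['arr'] = np.array([10, 20, 30, 40])

# Wrong length: pandas refuses instead of padding with NaN.
try:
    frame['bad'] = np.array([1, 2, 3])
    failed = False
except ValueError:
    failed = True

print(failed)
```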

By default, columns are inserted at the end. DataFrame.insert() inserts at a particular location in the columns:

df.insert(1,'bar',df['one'])

df

	one	bar	three	flag	foo	one_trunc
a	1.0	1.0	1.0	False	bar	1.0
b	2.0	2.0	4.0	False	bar	2.0
c	3.0	3.0	9.0	True	bar	NaN
d	NaN	NaN	NaN	False	bar	NaN

Assigning new columns in method chains

The assign() method makes it easy to create new columns derived from existing columns.

iris = pd.DataFrame([
    [5.1, 3.5, 1.4, 0.2,'Iris-setosa'],
    [4.9, 3.0,1.4,0.2,'Iris-setosa'],
    [4.7, 3.2,1.3, 0.2,'Iris-setosa'],
    [4.6, 3.1,1.5,0.2,'Iris-setosa'],
    [5.0, 3.6,1.4,0.2,'Iris-setosa']
],columns=[ 'SepalLength','SepalWidth','PetalLength','PetalWidth','Name'])

iris

	SepalLength	SepalWidth	PetalLength	PetalWidth	Name
0	5.1	3.5	1.4	0.2	Iris-setosa
1	4.9	3.0	1.4	0.2	Iris-setosa
2	4.7	3.2	1.3	0.2	Iris-setosa
3	4.6	3.1	1.5	0.2	Iris-setosa
4	5.0	3.6	1.4	0.2	Iris-setosa

iris.assign(sepal_ratio=iris.SepalWidth / iris.SepalLength )

	SepalLength	SepalWidth	PetalLength	PetalWidth	Name	sepal_ratio
0	5.1	3.5	1.4	0.2	Iris-setosa	0.686275
1	4.9	3.0	1.4	0.2	Iris-setosa	0.612245
2	4.7	3.2	1.3	0.2	Iris-setosa	0.680851
3	4.6	3.1	1.5	0.2	Iris-setosa	0.673913
4	5.0	3.6	1.4	0.2	Iris-setosa	0.720000

In the example above, we inserted a precomputed value. We can also pass a function of one argument, which is evaluated on the DataFrame being assigned to.

iris.assign(sepal_ratio=lambda x: (x['SepalWidth'] / x['SepalLength']))

	SepalLength	SepalWidth	PetalLength	PetalWidth	Name	sepal_ratio
0	5.1	3.5	1.4	0.2	Iris-setosa	0.686275
1	4.9	3.0	1.4	0.2	Iris-setosa	0.612245
2	4.7	3.2	1.3	0.2	Iris-setosa	0.680851
3	4.6	3.1	1.5	0.2	Iris-setosa	0.673913
4	5.0	3.6	1.4	0.2	Iris-setosa	0.720000

assign() always returns a copy of the data, leaving the original DataFrame untouched.
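This is easy to verify on a tiny DataFrame (measurements and with_ratio are illustrative names): the new column shows up only in the returned object.

```python
import pandas as pd

measurements = pd.DataFrame({'width': [3.5, 3.0], 'length': [5.0, 6.0]})
with_ratio = measurements.assign(ratio=lambda x: x['width'] / x['length'])

print(list(measurements.columns))  # ['width', 'length']
print(list(with_ratio.columns))    # ['width', 'length', 'ratio']
```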

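assign() is very useful when you only want to look at some derived columns without adding them to the DataFrame. Because it accepts callables, it also works in a chain of operations where the intermediate DataFrame has no name.
Here is an example: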

iris.query('SepalLength > 5').assign(SepalRatio=lambda x: x.SepalWidth / x.SepalLength,
                                     PetalRatio=lambda x: x.PetalWidth / x.PetalLength).plot(kind='scatter', x='SepalRatio', y='PetalRatio')

<matplotlib.axes._subplots.AxesSubplot at 0x216b3606388>

[Figure: scatter plot of PetalRatio against SepalRatio]

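Note that the following code gives different results on Python 3.5 and earlier versus Python 3.6 and later. From Python 3.6 on, the order of keyword arguments is preserved, so a later expression in assign() can refer to a column created earlier in the same call: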

dependent = pd.DataFrame({"A": [1, 1, 1]})
dependent.assign(A=lambda x: x["A"] + 1, B=lambda x: x["A"] + 2)

	A	B
0	2	4
1	2	4
2	2	4

On Python 3.5 the result is:

	A	B
0	2	3
1	2	3
2	2	3

On Python 3.6 the result is:

	A	B
0	2	4
1	2	4
2	2	4

Indexing and selection

The basics of indexing are as follows:

Operation	Syntax	Result
Select column	df[col]	Series
Select row by label	df.loc[label]	Series
Select row by integer location	df.iloc[loc]	Series
Slice rows	df[5:10]	DataFrame
Select rows by boolean vector	df[bool_vec]	DataFrame
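Each row of the table can be tried on a small DataFrame. The sketch below uses a made-up three-row frame called table; all variable names are illustrative:

```python
import pandas as pd

table = pd.DataFrame(
    {'one': [1.0, 2.0, 3.0], 'two': [4.0, 5.0, 6.0]},
    index=['a', 'b', 'c'],
)

col = table['one']              # select column                  -> Series
row_by_label = table.loc['b']   # select row by label            -> Series
row_by_pos = table.iloc[1]      # select row by integer location -> Series
rows = table[0:2]               # slice rows                     -> DataFrame
mask = table['two'] > 4.5       # boolean vector
filtered = table[mask]          # select rows by boolean vector  -> DataFrame
```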

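Data alignment and arithmetic

Operations between DataFrame objects automatically align on both the columns and the index (row labels). The result has the union of the column and row labels.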

df = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])

df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])

df+df2

	A	B	C	D
0	-3.709482	-1.315695	1.034033	NaN
1	-1.476441	1.368839	-1.693066	NaN
2	0.253196	-1.538661	2.305911	NaN
3	0.825168	0.032810	-4.019238	NaN
4	2.269895	-0.356334	-2.033594	NaN
5	0.822753	-0.644412	0.445278	NaN
6	-1.011380	0.984249	0.114061	NaN
7	NaN	NaN	NaN	NaN
8	NaN	NaN	NaN	NaN
9	NaN	NaN	NaN	NaN
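If the NaN values produced by alignment are not wanted, the arithmetic methods accept a fill_value. The sketch below uses two tiny made-up frames (x and y are illustrative names) to contrast the + operator with DataFrame.add():

```python
import pandas as pd

x = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0]})
y = pd.DataFrame({'A': [10.0], 'C': [30.0]})

plain = x + y                    # NaN wherever either side is missing
filled = x.add(y, fill_value=0)  # a value missing on one side counts as 0

print(plain)
print(filled)
```

A cell that is missing in both operands, such as row 1 of column C here, stays NaN even with fill_value.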

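When operating between a DataFrame and a Series, the default behavior is to align the Series index on the DataFrame columns, so the operation is broadcast row by row.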

df - df.iloc[0]

	A	B	C	D
0	0.000000	0.000000	0.000000	0.000000
1	1.621431	1.338117	-1.811021	-1.321338
2	1.543036	-1.885117	1.471726	0.039261
3	2.348554	0.224665	-2.174917	0.551201
4	2.216406	-0.035677	-0.545085	1.691566
5	3.011986	-1.166244	-0.229181	-0.818974
6	1.326064	0.913373	0.634288	0.391074
7	1.713530	1.888000	-0.207436	-1.166970
8	0.764998	-0.517934	0.714397	0.869612
9	2.622506	-1.000590	0.580470	0.564451

df - df['A']  # the Series index (0..9) is matched against the column labels (A..D); nothing matches, so every value is NaN

	A	B	C	D	0	1	2	3	4	5	6	7	8	9
0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
5	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
6	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
7	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
8	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
9	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

To subtract column A from every column row by row, match on the index instead:

df.sub(df['A'],axis=0)

	B	C	D
0	1.952358	1.721412	1.930427
1	1.669043	-1.711040	-1.012343
2	-1.475796	1.650101	0.426651
3	-0.171531	-2.802059	0.133073
4	-0.299725	-1.040079	1.405587
5	-2.225871	-1.519755	-1.900533
6	1.539667	1.029636	0.995437
7	2.126827	-0.199555	-0.950073
8	0.669426	1.670811	2.035041
9	-1.670737	-0.320624	-0.127628

Arithmetic with scalars works as you would expect:

df * 5 + 2

	A	B	C	D
0	-7.148242	2.613548	1.458817	2.503892
1	0.958915	9.304131	-7.596286	-4.102799
2	0.566941	-6.812037	8.817447	2.700196
3	4.594530	3.736874	-9.415766	5.259895
4	3.933788	2.435162	-1.266607	10.961723
5	7.911687	-3.217670	0.312914	-1.590980
6	-0.517924	7.180412	4.630255	4.459262
7	1.419410	12.053547	0.421637	-3.330956
8	-3.323251	0.023879	5.030804	6.851952
9	5.964286	-2.389400	4.361168	5.326146

1 / df

	A	B	C	D
0	-0.546553	8.149320	-9.239015	9.922759
1	-4.802683	0.684544	-0.521035	-0.819296
2	-3.489039	-0.567406	0.733412	7.140856
3	1.927131	2.878735	-0.437991	1.533792
4	2.585598	11.489986	-1.530640	0.557928
5	0.845782	-0.958282	-2.963690	-1.392378
6	-1.985763	0.965174	1.900956	2.033131
7	-8.611933	0.497337	-3.167839	-0.937918
8	-0.939276	-2.530209	1.649727	1.030513
9	1.261261	-1.139108	2.117596	1.503241

Boolean operators work as well:

df1 = pd.DataFrame({'a': [1, 0, 1], 'b': [0, 1, 1]}, dtype=bool)

df2 = pd.DataFrame({'a': [0, 1, 1], 'b': [1, 1, 0]}, dtype=bool)

df1 & df2

	a	b
0	False	False
1	False	True
2	True	False
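The other elementwise boolean operators follow the same pattern. A brief sketch on the same two frames (either, differ and negated are illustrative names):

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 0, 1], 'b': [0, 1, 1]}, dtype=bool)
df2 = pd.DataFrame({'a': [0, 1, 1], 'b': [1, 1, 0]}, dtype=bool)

either = df1 | df2    # elementwise OR
differ = df1 ^ df2    # elementwise XOR
negated = ~df1        # elementwise NOT
```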

Transposing

df.T

	0	1	2	3	4	5	6	7	8	9
A	-1.829648	-0.208217	-0.286612	0.518906	0.386758	1.182337	-0.503585	-0.116118	-1.064650	0.792857
B	0.122710	1.460826	-1.762407	0.347375	0.087032	-1.043534	1.036082	2.010709	-0.395224	-0.877880
C	-0.108237	-1.919257	1.363489	-2.283153	-0.653321	-0.337417	0.526051	-0.315673	0.606161	0.472234
D	0.100778	-1.220560	0.140039	0.651979	1.792345	-0.718196	0.491852	-1.066191	0.970390	0.665229

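DataFrame interoperability with NumPy functions

Elementwise NumPy ufuncs (log, exp, sqrt, ...) and various other NumPy functions can be used without problems on Series and DataFrame, provided the data within are numeric: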

np.exp(df)

	A	B	C	D
0	0.160470	1.130556	0.897415	1.106032
1	0.812031	4.309519	0.146716	0.295065
2	0.750803	0.171631	3.909813	1.150319
3	1.680189	1.415347	0.101962	1.919335
4	1.472200	1.090932	0.520315	6.003512
5	3.261990	0.352208	0.713611	0.487631
6	0.604360	2.818155	1.692236	1.635343
7	0.890370	7.468613	0.729298	0.344317
8	0.344848	0.673529	1.833379	2.638975
9	2.209701	0.415663	1.603572	1.944936

np.asanyarray(df)

array([[-1.82964832,  0.12270962, -0.10823665,  0.10077842],
       [-0.20821696,  1.46082625, -1.91925727, -1.22055986],
       [-0.28661189, -1.76240741,  1.36348942,  0.14003923],
       [ 0.51890601,  0.34737476, -2.28315323,  0.65197895],
       [ 0.38675768,  0.0870323 , -0.65332134,  1.79234468],
       [ 1.18233731, -1.04353408, -0.33741721, -0.7181959 ],
       [-0.50358472,  1.03608235,  0.52605097,  0.49185231],
       [-0.11611795,  2.01070936, -0.31567263, -1.06619111],
       [-1.06465022, -0.3952242 ,  0.60616076,  0.9703904 ],
       [ 0.79285728, -0.87788004,  0.47223364,  0.66522921]])

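DataFrame is not intended to be a drop-in replacement for ndarray, as its indexing semantics and data model differ considerably in places from an n-dimensional array.

Changed in version 0.25.0: when multiple Series are passed to a ufunc, they are aligned before the operation is performed.
For example, using numpy.remainder() on two Series with differently ordered labels aligns them before the operation.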

ser1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
ser2 = pd.Series([1, 3, 5], index=['b', 'a', 'c'])

ser1

a    1
b    2
c    3
dtype: int64

ser2

b    1
a    3
c    5
dtype: int64

np.remainder(ser1, ser2)  # elementwise remainder, computed after aligning on labels

a    1
b    0
c    3
dtype: int64

When a binary ufunc is applied to a Series and an Index, the Series implementation takes precedence and a Series is returned.

ser = pd.Series([1, 2, 3])  # a Series
idx = pd.Index([4, 5, 6])  # an Index
np.maximum(ser, idx)

0    4
1    5
2    6
dtype: int64

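NumPy ufuncs are safe to apply to Series backed by non-ndarray arrays, for example SparseArray (see Sparse calculation). If possible, the ufunc is applied without converting the underlying data to an ndarray.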
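A minimal sketch of a ufunc on sparse data, assuming a small integer SparseArray built by hand (sparse, result and ser_result are illustrative names). The ufunc is applied to the SparseArray directly, and the same call also works on a Series backed by it:

```python
import numpy as np
import pandas as pd

sparse = pd.arrays.SparseArray([0, 0, -1, 2])

# The ufunc is dispatched to the SparseArray itself, so the result stays sparse.
result = np.abs(sparse)

# The same call works on a Series backed by the sparse array.
ser_result = np.abs(pd.Series(sparse))

print(type(result).__name__)
print(np.asarray(ser_result))
```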

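Console display

Very large DataFrames are truncated when displayed in the console. You can also get a summary using info().

If a DataFrame is too wide, you can change how much is printed on a single line by setting the display.width option:

pd.set_option('display.width', 10)  # default is 80; has no effect in Jupyter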
pd.DataFrame(np.random.randn(3, 12))

	0	1	2	3	4	5	6	7	8	9	10	11
0	1.019574	-1.159182	1.353618	0.637682	0.813734	-0.063821	-0.678584	-2.029914	-0.334250	-1.855452	0.267427	0.159262
1	-0.644825	-0.299352	-1.103211	1.296674	2.638383	0.389328	-0.078553	0.700434	-0.768123	0.101834	-0.472484	0.346692
2	0.844621	-0.082751	1.801776	0.106621	-1.405854	1.105250	1.174156	-2.414765	0.335145	0.148878	-0.723033	-0.186628
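pd.set_option changes the setting for the rest of the session. When a temporary change is enough, pd.option_context can be used instead. The sketch below (the width of 40 and the variable names are arbitrary choices) shows that the option is restored after the with-block:

```python
import numpy as np
import pandas as pd

wide = pd.DataFrame(np.zeros((3, 12)))
default_width = pd.get_option('display.width')

# The option is changed only inside the with-block and restored afterwards.
with pd.option_context('display.width', 40):
    width_inside = pd.get_option('display.width')
    print(wide)

width_after = pd.get_option('display.width')
```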

Accessing DataFrame columns

There are two main ways to access a DataFrame column.

df = pd.DataFrame({'foo1': np.random.randn(5), 'foo2': np.random.randn(5)})

df

	foo1	foo2
0	-2.011344	-1.554834
1	0.090704	1.385963
2	0.884089	1.258341
3	1.756175	-1.526961
4	0.356461	-0.958286

df['foo1']  # method 1: dict-style indexing

0   -2.011344
1    0.090704
2    0.884089
3    1.756175
4    0.356461
Name: foo1, dtype: float64

df.foo1  # method 2: attribute access

0   -2.011344
1    0.090704
2    0.884089
3    1.756175
4    0.356461
Name: foo1, dtype: float64
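The two styles are not fully interchangeable. Attribute access only works when the column name is a valid Python identifier and does not collide with an existing DataFrame attribute or method. A short sketch with deliberately awkward, made-up column names:

```python
import pandas as pd

frame = pd.DataFrame({'foo1': [1, 2], 'my col': [3, 4], 'min': [5, 6]})

same = frame.foo1.equals(frame['foo1'])   # 'foo1' is a valid identifier, both styles agree

spaced = frame['my col']                  # brackets are required: the name contains a space
min_column = frame['min']                 # the column named 'min'
min_method = frame.min                    # NOT the column: this is the DataFrame.min method

print(same)
print(callable(min_method))
```

Bracket indexing works for every column name, so it is the safer default in code that does not control its column names.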
