pandas-DataFrame

最新推荐文章于 2024-08-23 09:48:05 发布

z0n1l2

最新推荐文章于 2024-08-23 09:48:05 发布

阅读量671

点赞数

分类专栏：数据分析 pandas 文章标签： pandas

数据分析同时被 2 个专栏收录

6 篇文章 0 订阅

订阅专栏

pandas

3 篇文章 0 订阅

订阅专栏

pandas DataFrame是一种带标签的二维数据结构，支持多种类型的数据来源，如dict、Series、ndarray等。它提供了丰富的数据操作功能，如缺失数据处理、列的选择、添加和删除、数据对齐和算术运算。DataFrame与其他数据结构如numpy的交互性良好，还支持在终端上的显示控制。此外，可以通过assign()函数在方法链中添加新列，方便进行数据处理和分析。

摘要由CSDN通过智能技术生成

重点

index是行,column是列

index/selection

Operation	Syntax	Result
Select column	df[col]	Series
Select row by label	df.loc[label]	Series
Select row by integer location	df.iloc[loc]	Series
Slice rows	df[5:10]	DataFrame
Select rows by boolean vector	df[bool_vec]	DataFrame

info(),describe(),pd.set_option()

源

DataFrame是带标签的二维数据结构,不同列可以支持不同的类型.可以把它想象成电子表格或SQL表或Series的dict.它是pandas中最常用的数据结果. 和Serise一样,DataFrame支持多种不同类型:
* Dict of 1D ndarray, list, dict, or Series
* 2-D numpy.ndarray
* 以结构或记录为元素的ndarray
* Series
* DataFrame
可以选择设置index(行索引)和column(列索引)参数.

From dict of Series or dict

取所有Series的index的并作为输出的index.嵌套的dict会先转换成Series.不设置列时,默认使用字典关键字次序.
译注:创建DataFrame时,如果没有传入index/column,则使用输入数据对应维度的并集,否则只生成输入参数指定的index/column

In [34]: d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
   ....:      'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
   ....: 

In [35]: df = pd.DataFrame(d)

In [36]: df
Out[36]: 
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0

In [37]: pd.DataFrame(d, index=['d', 'b', 'a'])
Out[37]: 
   one  two
d  NaN  4.0
b  2.0  2.0
a  1.0  1.0

In [38]: pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three'])
Out[38]: 
   two three
d  4.0   NaN
b  2.0   NaN
a  1.0   NaN

通过index和columns方位行和列

In [39]: df.index
Out[39]: Index(['a', 'b', 'c', 'd'], dtype='object')

In [40]: df.columns
Out[40]: Index(['one', 'two'], dtype='object'

From dict of ndarray/lists

ndarray的长度需要一致,传入的index长度也要一致.不指定index,则长度就是ndarray长度

In [41]: d = {'one' : [1., 2., 3., 4.],
   ....:      'two' : [4., 3., 2., 1.]}
   ....: 

In [42]: pd.DataFrame(d)
Out[42]: 
   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0

In [43]: pd.DataFrame(d, index=['a', 'b', 'c', 'd'])
Out[43]: 
   one  two
a  1.0  4.0
b  2.0  3.0
c  3.0  2.0
d  4.0  1.0

From structured or record array

只针对字典数组

In [44]: data = np.zeros((2,), dtype=[('A', 'i4'),('B', 'f4'),('C', 'a10')])

In [45]: data[:] = [(1,2.,'Hello'), (2,3.,"World")]

In [46]: pd.DataFrame(data)
Out[46]: 
   A    B         C
0  1  2.0  b'Hello'
1  2  3.0  b'World'

In [47]: pd.DataFrame(data, index=['first', 'second'])
Out[47]: 
        A    B         C
first   1  2.0  b'Hello'
second  2  3.0  b'World'

In [48]: pd.DataFrame(data, columns=['C', 'A', 'B'])
Out[48]: 
          C  A    B
0  b'Hello'  1  2.0
1  b'World'  2  3.0

注意: DataFrame和2D numpy ndarray有差别

From a list of dicts

译注:一个字典对应一行, 字典的key对应列

In [49]: data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

In [50]: pd.DataFrame(data2)
Out[50]: 
   a   b     c
0  1   2   NaN
1  5  10  20.0

In [51]: pd.DataFrame(data2, index=['first', 'second'])
Out[51]: 
        a   b     c
first   1   2   NaN
second  5  10  20.0

In [52]: pd.DataFrame(data2, columns=['a', 'b'])
Out[52]: 
   a   b
0  1   2
1  5  10

From a dict of tuples

利用元组+字典可以创建multi-indexed帧

In [53]: pd.DataFrame({('a', 'b'): {('A', 'B'): 1, ('A', 'C'): 2},
   ....:               ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
   ....:               ('a', 'c'): {('A', 'B'): 5, ('A', 'C'): 6},
   ....:               ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 8},
   ....:               ('b', 'b'): {('A', 'D'): 9, ('A', 'B'): 10}})
   ....: 
Out[53]: 
       a              b      
       b    a    c    a     b
A B  1.0  4.0  5.0  8.0  10.0
  C  2.0  3.0  6.0  7.0   NaN
  D  NaN  NaN  NaN  NaN   9.0

From a Series

以Series的行作为DataFrame的行,Series的名字作为DataFrame的列(除非另行指定)

Missing Data

用np.nan表示miss data可以创建带有miss data的DataFrame.也可以用numpy.MaskArray指定哪些元素属于miss data

Alternate Constructors

DataFrame.from_dict

输入以dict或类似数组的序列为元素的dict,输出DataFrame.其和DataFrame构造函数类似,但有个orient参数,可以指定dict中每个dict的key是用来作为列还是行.

In [54]: pd.DataFrame.from_dict(dict([('A', [1, 2, 3]), ('B', [4, 5, 6])]))
Out[54]: 
   A  B
0  1  4
1  2  5
2  3  6

默认orient=’columns’,如果设置成orient=’index’, dict的key将作为行索引使用.

In [55]: pd.DataFrame.from_dict(dict([('A', [1, 2, 3]), ('B', [4, 5, 6])]),
   ....:                        orient='index', columns=['one', 'two', 'three'])
   ....: 
Out[55]: 
   one  two  three
A    1    2      3
B    4    5      6

DataFrame.from_records

输入一个tuple或结构体数组列表,输出DataFrame. 它和普通的DataFrame构造函数的区别是可以指定index

In [56]: data
Out[56]: 
array([(1,  2., b'Hello'), (2,  3., b'World')],
      dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])

In [57]: pd.DataFrame.from_records(data, index='C')
Out[57]: 
          A    B
C               
b'Hello'  1  2.0
b'World'  2  3.0

Column selection,addition,deletion

可以把DataFrame看作一个Series字典,列的读取,设置和删除的操作了dict操作类似.

In [58]: df['one']
Out[58]: 
a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64

In [59]: df['three'] = df['one'] * df['two']

In [60]: df['flag'] = df['one'] > 2

In [61]: df
Out[61]: 
   one  two  three   flag
a  1.0  1.0    1.0  False
b  2.0  2.0    4.0  False
c  3.0  3.0    9.0   True
d  NaN  4.0    NaN  False

用了dict类似的方法删除列

In [62]: del df['two']

In [63]: three = df.pop('three')

In [64]: df
Out[64]: 
   one   flag
a  1.0  False
b  2.0  False
c  3.0   True
d  NaN  False

插入一个标量,会自动延拓到整个列

In [65]: df['foo'] = 'bar'

In [66]: df
Out[66]: 
   one   flag  foo
a  1.0  False  bar
b  2.0  False  bar
c  3.0   True  bar
d  NaN  False  bar

如果插入的Series的行和DataFrame不一致,结果和DataFrame为准,有可能补NaN

In [67]: df['one_trunc'] = df['one'][:2]

In [68]: df
Out[68]: 
   one   flag  foo  one_trunc
a  1.0  False  bar        1.0
b  2.0  False  bar        2.0
c  3.0   True  bar        NaN
d  NaN  False  bar        NaN

可以插入ndarray,但其行数要和DataFrame保持一致.
默认插入的列在最后的位置,insert()函数可以指定插入位置

In [69]: df.insert(1, 'bar', df['one'])

In [70]: df
Out[70]: 
   one  bar   flag  foo  one_trunc
a  1.0  1.0  False  bar        1.0
b  2.0  2.0  False  bar        2.0
c  3.0  3.0   True  bar        NaN
d  NaN  NaN  False  bar        NaN

Assigning New Columns in Method Chains

assign()函数允许基于已有的列创建一个新的列

In [71]: iris = pd.read_csv('data/iris.data')

In [72]: iris.head()
Out[72]: 
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa

In [73]: (iris.assign(sepal_ratio = iris['SepalWidth'] / iris['SepalLength'])
   ....:      .head())
   ....: 
Out[73]: 
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name  sepal_ratio
0          5.1         3.5          1.4         0.2  Iris-setosa       0.6863
1          4.9         3.0          1.4         0.2  Iris-setosa       0.6122
2          4.7         3.2          1.3         0.2  Iris-setosa       0.6809
3          4.6         3.1          1.5         0.2  Iris-setosa       0.6739
4          5.0         3.6          1.4         0.2  Iris-setosa       0.7200

上面的例子中插入的是一个预先计算好的值.也可以传入只有一个参数的函数.

In [74]: iris.assign(sepal_ratio = lambda x: (x['SepalWidth'] /
   ....:                                      x['SepalLength'])).head()
   ....: 
Out[74]: 
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name  sepal_ratio
0          5.1         3.5          1.4         0.2  Iris-setosa       0.6863
1          4.9         3.0          1.4         0.2  Iris-setosa       0.6122
2          4.7         3.2          1.3         0.2  Iris-setosa       0.6809
3          4.6         3.1          1.5         0.2  Iris-setosa       0.6739
4          5.0         3.6          1.4         0.2  Iris-setosa       0.7200

assign()函数返回的是新增了列的副本DataFrame,不修改原始的DataFrame. assign()返回副本DataFrame的特性可以
支持直接显示处理后的DataFrame

In [75]: (iris.query('SepalLength > 5')
   ....:      .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength,
   ....:              PetalRatio = lambda x: x.PetalWidth / x.PetalLength)
   ....:      .plot(kind='scatter', x='SepalRatio', y='PetalRatio'))
   ....: 
Out[75]: <matplotlib.axes._subplots.AxesSubplot at 0x1c29ff5b70>

plot函数直接显示两个新列,而不必显示获得新DataFrame的句柄

Indexing/Selection

基础的行操作包括

Operation	Syntax	Result
Select column	df[col]	Series
Select row by label	df.loc[label]	Series
Select row by integer location	df.iloc[loc]	Series
Slice rows	df[5:10]	DataFrame
Select rows by boolean vector	df[bool_vec]	DataFrame

选择单列,返回的是Series,其index是DataFrame的Column

In [80]: df.loc['b']
Out[80]: 
one              2
bar              2
flag         False
foo            bar
one_trunc        2
Name: b, dtype: object

In [81]: df.iloc[2]
Out[81]: 
one             3
bar             3
flag         True
foo           bar
one_trunc     NaN
Name: c, dtype: object

译注:选择多行时,返回的是DataFrame,index和原始index保持一致

Data alignment and arithmetric

DataFrame之间的计算会在行/列上自动匹配,默认的结果是DataFrame行/列的并集

In [82]: df = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])

In [83]: df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])

In [84]: df + df2
Out[84]: 
        A       B       C   D
0  0.0457 -0.0141  1.3809 NaN
1 -0.9554 -1.5010  0.0372 NaN
2 -0.6627  1.5348 -0.8597 NaN
3 -2.4529  1.2373 -0.1337 NaN
4  1.4145  1.9517 -2.3204 NaN
5 -0.4949 -1.6497 -1.0846 NaN
6 -1.0476 -0.7486 -0.8055 NaN
7     NaN     NaN     NaN NaN
8     NaN     NaN     NaN NaN
9     NaN     NaN     NaN NaN

DataFrame和Series之间的计算,默认的操作是Series的行和DataFrame的列匹配 (译注:即把Series作为一行,然后tile(N,1)到DataFrame相同行数)

In [85]: df - df.iloc[0]
Out[85]: 
        A       B       C       D
0  0.0000  0.0000  0.0000  0.0000
1 -1.3593 -0.2487 -0.4534 -1.7547
2  0.2531  0.8297  0.0100 -1.9912
3 -1.3111  0.0543 -1.7249 -1.6205
4  0.5730  1.5007 -0.6761  1.3673
5 -1.7412  0.7820 -1.2416 -2.0531
6 -1.2408 -0.8696 -0.1533  0.0004
7 -0.7439  0.4110 -0.9296 -0.2824
8 -1.1949  1.3207  0.2382 -1.4826
9  2.2938  1.8562  0.7733 -1.4465

包含时间的Series和DataFrame之间的计算是个特列

In [86]: index = pd.date_range('1/1/2000', periods=8)

In [87]: df = pd.DataFrame(np.random.randn(8, 3), index=index, columns=list('ABC'))

In [88]: df
Out[88]: 
                 A       B       C
2000-01-01 -1.2268  0.7698 -1.2812
2000-01-02 -0.7277 -0.1213 -0.0979
2000-01-03  0.6958  0.3417  0.9597
2000-01-04 -1.1103 -0.6200  0.1497
2000-01-05 -0.7323  0.6877  0.1764
2000-01-06  0.4033 -0.1550  0.3016
2000-01-07 -2.1799 -1.3698 -0.9542
2000-01-08  1.4627 -1.7432 -0.8266

In [89]: type(df['A'])
Out[89]: pandas.core.series.Series

In [90]: df - df['A']
Out[90]: 
            2000-01-01 00:00:00  2000-01-02 00:00:00  2000-01-03 00:00:00  \
2000-01-01                  NaN                  NaN                  NaN   
2000-01-02                  NaN                  NaN                  NaN   
2000-01-03                  NaN                  NaN                  NaN   
2000-01-04                  NaN                  NaN                  NaN   
2000-01-05                  NaN                  NaN                  NaN   
2000-01-06                  NaN                  NaN                  NaN   
2000-01-07                  NaN                  NaN                  NaN   
2000-01-08                  NaN                  NaN                  NaN   

            2000-01-04 00:00:00 ...  2000-01-08 00:00:00   A   B   C  
2000-01-01                  NaN ...                  NaN NaN NaN NaN  
2000-01-02                  NaN ...                  NaN NaN NaN NaN  
2000-01-03                  NaN ...                  NaN NaN NaN NaN  
2000-01-04                  NaN ...                  NaN NaN NaN NaN  
2000-01-05                  NaN ...                  NaN NaN NaN NaN  
2000-01-06                  NaN ...                  NaN NaN NaN NaN  
2000-01-07                  NaN ...                  NaN NaN NaN NaN  
2000-01-08                  NaN ...                  NaN NaN NaN NaN  

[8 rows x 11 columns]

DataFrame和标量的计算就是对每个元素做处理

In [91]: df * 5 + 2
Out[91]: 
                 A       B       C
2000-01-01 -4.1341  5.8490 -4.4062
2000-01-02 -1.6385  1.3935  1.5106
2000-01-03  5.4789  3.7087  6.7986
2000-01-04 -3.5517 -1.0999  2.7487
2000-01-05 -1.6617  5.4387  2.8822
2000-01-06  4.0165  1.2252  3.5081
2000-01-07 -8.8993 -4.8492 -2.7710
2000-01-08  9.3135 -6.7158 -2.1330

In [92]: 1 / df
Out[92]: 
                 A       B        C
2000-01-01 -0.8151  1.2990  -0.7805
2000-01-02 -1.3742 -8.2436 -10.2163
2000-01-03  1.4372  2.9262   1.0420
2000-01-04 -0.9006 -1.6130   6.6779
2000-01-05 -1.3655  1.4540   5.6675
2000-01-06  2.4795 -6.4537   3.3154
2000-01-07 -0.4587 -0.7300  -1.0480
2000-01-08  0.6837 -0.5737  -1.2098

In [93]: df ** 4
Out[93]: 
                  A       B           C
2000-01-01   2.2653  0.3512  2.6948e+00
2000-01-02   0.2804  0.0002  9.1796e-05
2000-01-03   0.2344  0.0136  8.4838e-01
2000-01-04   1.5199  0.1477  5.0286e-04
2000-01-05   0.2876  0.2237  9.6924e-04
2000-01-06   0.0265  0.0006  8.2769e-03
2000-01-07  22.5795  3.5212  8.2903e-01
2000-01-08   4.5774  9.2332  4.6683e-01

Boolen操作也类似

In [94]: df1 = pd.DataFrame({'a' : [1, 0, 1], 'b' : [0, 1, 1] }, dtype=bool)

In [95]: df2 = pd.DataFrame({'a' : [0, 1, 1], 'b' : [1, 1, 0] }, dtype=bool)

In [96]: df1 & df2
Out[96]: 
       a      b
0  False  False
1  False   True
2   True  False

In [97]: df1 | df2
Out[97]: 
      a     b
0  True  True
1  True  True
2  True  True

In [98]: df1 ^ df2
Out[98]: 
       a      b
0   True   True
1   True  False
2  False   True

In [99]: -df1
Out[99]: 
       a      b
0  False   True
1   True  False
2  False  False

Transpose

和numpy类似,用T属性或transpose函数都可以

# only show the first 5 rows
In [100]: df[:5].T
Out[100]: 
   2000-01-01  2000-01-02  2000-01-03  2000-01-04  2000-01-05
A     -1.2268     -0.7277      0.6958     -1.1103     -0.7323
B      0.7698     -0.1213      0.3417     -0.6200      0.6877
C     -1.2812     -0.0979      0.9597      0.1497      0.1764

DataFrame interoperability with numpy functions

只要元素是数值,numpy中对元素的操作,比如log,exp等函数都可以应用到DataFrame上

In [101]: np.exp(df)
Out[101]: 
                 A       B       C
2000-01-01  0.2932  2.1593  0.2777
2000-01-02  0.4830  0.8858  0.9068
2000-01-03  2.0053  1.4074  2.6110
2000-01-04  0.3294  0.5380  1.1615
2000-01-05  0.4808  1.9892  1.1930
2000-01-06  1.4968  0.8565  1.3521
2000-01-07  0.1131  0.2541  0.3851
2000-01-08  4.3176  0.1750  0.4375

In [102]: np.asarray(df)
Out[102]: 
array([[-1.2268,  0.7698, -1.2812],
       [-0.7277, -0.1213, -0.0979],
       [ 0.6958,  0.3417,  0.9597],
       [-1.1103, -0.62  ,  0.1497],
       [-0.7323,  0.6877,  0.1764],
       [ 0.4033, -0.155 ,  0.3016],
       [-2.1799, -1.3698, -0.9542],
       [ 1.4627, -1.7432, -0.8266]])

dot()函数是DataFrame的矩阵乘法

In [103]: df.T.dot(df)
Out[103]: 
         A       B       C
A  11.3419 -0.0598  3.0080
B  -0.0598  6.5206  2.0833
C   3.0080  2.0833  4.3105

dot()函数在Series上是点乘

In [104]: s1 = pd.Series(np.arange(5,10))

In [105]: s1.dot(s1)
Out[105]: 255

Console display

很大的DataFrame被分割成几个部分在终端上显示,可以利用info()获得摘要信息

In [106]: baseball = pd.read_csv('data/baseball.csv')

In [107]: print(baseball)
       id     player  year  stint  ...   hbp   sh   sf  gidp
0   88641  womacto01  2006      2  ...   0.0  3.0  0.0   0.0
1   88643  schilcu01  2006      1  ...   0.0  0.0  0.0   0.0
..    ...        ...   ...    ...  ...   ...  ...  ...   ...
98  89533   aloumo01  2007      1  ...   2.0  0.0  3.0  13.0
99  89534  alomasa02  2007      1  ...   0.0  0.0  0.0   0.0

[100 rows x 23 columns]

In [108]: baseball.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 23 columns):
id        100 non-null int64
player    100 non-null object
year      100 non-null int64
stint     100 non-null int64
team      100 non-null object
lg        100 non-null object
g         100 non-null int64
ab        100 non-null int64
r         100 non-null int64
h         100 non-null int64
X2b       100 non-null int64
X3b       100 non-null int64
hr        100 non-null int64
rbi       100 non-null float64
sb        100 non-null float64
cs        100 non-null float64
bb        100 non-null int64
so        100 non-null float64
ibb       100 non-null float64
hbp       100 non-null float64
sh        100 non-null float64
sf        100 non-null float64
gidp      100 non-null float64
dtypes: float64(9), int64(11), object(3)
memory usage: 18.0+ KB

to_string()函数在表格形式显示DataFrame,但是宽度可能和终端不匹配

In [109]: print(baseball.iloc[-20:, :12].to_string())
       id     player  year  stint team  lg    g   ab   r    h  X2b  X3b
80  89474  finlest01  2007      1  COL  NL   43   94   9   17    3    0
81  89480  embreal01  2007      1  OAK  AL    4    0   0    0    0    0
82  89481  edmonji01  2007      1  SLN  NL  117  365  39   92   15    2
83  89482  easleda01  2007      1  NYN  NL   76  193  24   54    6    0
84  89489  delgaca01  2007      1  NYN  NL  139  538  71  139   30    0
85  89493  cormirh01  2007      1  CIN  NL    6    0   0    0    0    0
86  89494  coninje01  2007      2  NYN  NL   21   41   2    8    2    0
87  89495  coninje01  2007      1  CIN  NL   80  215  23   57   11    1
88  89497  clemero02  2007      1  NYA  AL    2    2   0    1    0    0
89  89498  claytro01  2007      2  BOS  AL    8    6   1    0    0    0
90  89499  claytro01  2007      1  TOR  AL   69  189  23   48   14    0
91  89501  cirilje01  2007      2  ARI  NL   28   40   6    8    4    0
92  89502  cirilje01  2007      1  MIN  AL   50  153  18   40    9    2
93  89521  bondsba01  2007      1  SFN  NL  126  340  75   94   14    0
94  89523  biggicr01  2007      1  HOU  NL  141  517  68  130   31    3
95  89525  benitar01  2007      2  FLO  NL   34    0   0    0    0    0
96  89526  benitar01  2007      1  SFN  NL   19    0   0    0    0    0
97  89530  ausmubr01  2007      1  HOU  NL  117  349  38   82   16    3
98  89533   aloumo01  2007      1  NYN  NL   87  328  51  112   19    1
99  89534  alomasa02  2007      1  NYN  NL    8   22   1    3    1    0

列比较多时,设置display.width可以修改显示的宽度

In [111]: pd.set_option('display.width', 40) # default is 80

In [112]: pd.DataFrame(np.random.randn(3, 12))
Out[112]: 
         0         1         2         3         4         5         6         7         8         9         10        11
0  1.262731  1.289997  0.082423 -0.055758  0.536580 -0.489682  0.369374 -0.034571 -2.484478 -0.281461  0.030711  0.109121
1  1.126203 -0.977349  1.474071 -0.064034 -1.282782  0.781836 -1.071357  0.441153  2.353925  0.583787  0.221471 -0.744471
2  0.758527  1.729689 -0.964980 -0.845696 -1.340896  1.846883 -1.328865  1.682706 -1.717693  0.888782  0.228440  0.901805

display.max_colwidth可以修改每一列的宽度

   .....:           'path': ["media/user_name/storage/folder_01/filename_01",
   .....:                    "media/user_name/storage/folder_02/filename_02"]}
   .....: 

In [114]: pd.set_option('display.max_colwidth',30)

In [115]: pd.DataFrame(datafile)
Out[115]: 
      filename                           path
0  filename_01  media/user_name/storage/fo...
1  filename_02  media/user_name/storage/fo...

In [116]: pd.set_option('display.max_colwidth',100)

In [117]: pd.DataFrame(datafile)
Out[117]: 
      filename                                           path
0  filename_01  media/user_name/storage/folder_01/filename_01
1  filename_02  media/user_name/storage/folder_02/filename_02