pandas indexing and slicing data

wangquannuaa

Article link: https://blog.csdn.net/wangquannuaa/article/details/47058983

In [9]: df
Out[9]:
           A        B         C         D
2000-01-01 0.469112 -0.282863 -1.509059 -1.135632
2000-01-02 1.212112 -0.173215 0.119209 -1.044236
2000-01-03 -0.861849 -2.104569 -0.494929 1.071804
2000-01-04 0.721555 -0.706771 -1.039575 0.271860
2000-01-05 -0.424972 0.567020 0.276232 -1.087401
2000-01-06 -0.673690 0.113648 -1.478427 0.524988
2000-01-07 0.404705 0.577046 -1.715002 -1.039268
2000-01-08 -0.370647 -1.157892 -1.344312 0.844885
In [10]: df[['B', 'A']] = df[['A', 'B']]
In [11]: df
Out[11]:
           A         B        C         D
2000-01-01 -0.282863 0.469112 -1.509059 -1.135632
2000-01-02 -0.173215 1.212112 0.119209 -1.044236
2000-01-03 -2.104569 -0.861849 -0.494929 1.071804
2000-01-04 -0.706771 0.721555 -1.039575 0.271860
2000-01-05 0.567020 -0.424972 0.276232 -1.087401
2000-01-06 0.113648 -0.673690 -1.478427 0.524988
2000-01-07 0.577046 0.404705 -1.715002 -1.039268
2000-01-08 -1.157892 -0.370647 -1.344312 0.844885
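The swap above works because assignment through `[]` with a list of labels pairs the right-hand columns with the listed labels by position. Below is a minimal self-contained sketch with small made-up values instead of the blog's random data. The `.loc` variant is not in the original; it assumes current pandas behaviour, where `.loc` aligns a right-hand DataFrame on its column labels, so the labels are stripped with `.to_numpy()` before assigning.

```python
import pandas as pd

# Illustrative values, not the blog's random data.
dates = pd.date_range("2000-01-01", periods=4)
df = pd.DataFrame(
    {
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [10.0, 20.0, 30.0, 40.0],
        "C": [0.1, 0.2, 0.3, 0.4],
    },
    index=dates,
)
original = df.copy()

# df[list] = DataFrame pairs the right-hand columns with the listed labels
# by position, so B receives the old A and A receives the old B.
df[["B", "A"]] = df[["A", "B"]]

# .loc aligns a right-hand DataFrame on column labels, so a swap through
# .loc needs the labels removed first with .to_numpy().
swapped = original.copy()
swapped.loc[:, ["B", "A"]] = swapped[["A", "B"]].to_numpy()
```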

In [21]: dfa['A'] = list(range(len(dfa.index)))  # use this form to create a new column
In [22]: dfa
Out[22]:
           A B        C         D
2000-01-01 0 0.469112 -1.509059 -1.135632
2000-01-02 1 1.212112 0.119209 -1.044236
2000-01-03 2 -0.861849 -0.494929 1.071804
2000-01-04 3 0.721555 -1.039575 0.271860
2000-01-05 4 -0.424972 0.276232 -1.087401
2000-01-06 5 -0.673690 -1.478427 0.524988
2000-01-07 6 0.404705 -1.715002 -1.039268
2000-01-08 7 -0.370647 -1.344312 0.844885
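The blog never shows how `dfa` was created. The sketch below assumes a small hypothetical `dfa` with columns B and C, reusing the first three values of each from the output above, and shows that `[]` assignment both creates and overwrites columns, while attribute assignment only reaches columns that already exist.

```python
import pandas as pd

# Hypothetical stand-in for dfa; the post does not show how it was built.
dfa = pd.DataFrame(
    {"B": [0.469112, 1.212112, -0.861849], "C": [-1.509059, 0.119209, -0.494929]},
    index=pd.date_range("2000-01-01", periods=3),
)

# [] assignment creates the column when it does not exist (appended last) ...
dfa["A"] = list(range(len(dfa.index)))
# ... and overwrites it when it does.
dfa["A"] = dfa["A"] * 10

# Attribute assignment works too, but only for an existing column; for a new
# name it would set a plain Python attribute instead of creating a column.
dfa.C = [1.0, 2.0, 3.0]
```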

In [94]: df[df['A'] > 0]
Out[94]:
           A        B        C         D E 0
2000-01-04 7.000000 0.721555 -1.039575 0.271860 NaN NaN
2000-01-05 0.567020 -0.424972 0.276232 -1.087401 NaN NaN
2000-01-06 0.113648 -0.673690 -1.478427 0.524988 7 NaN
2000-01-07 0.577046 0.404705 -1.715002 -1.039268 NaN NaN
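The frame printed above carries extra columns (E, 0) and values from steps that are not shown, so here is a self-contained sketch of boolean row selection on a made-up frame, including how conditions combine with `&`, `|` and `~`.

```python
import pandas as pd

# Made-up values for illustration.
df = pd.DataFrame(
    {"A": [0.5, -1.0, 2.0, -0.1], "B": [1, 2, 3, 4]},
    index=pd.date_range("2000-01-04", periods=4),
)

# A boolean Series aligned on the index keeps the rows where it is True.
positive = df[df["A"] > 0]

# & (and), | (or) and ~ (not) combine masks element-wise. Each comparison
# needs parentheses because & and | bind more tightly than > and ==.
both = df[(df["A"] > 0) & (df["B"] > 1)]
either = df[(df["A"] > 0) | (df["B"] == 2)]
neither = df[~((df["A"] > 0) | (df["B"] == 2))]
```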

In [95]: df2 = DataFrame({'a' : ['one', 'one', 'two', 'three', 'two', 'one', 'six'],
....: 'b' : ['x', 'y', 'y', 'x', 'y', 'x', 'x'],
....: 'c' : randn(7)})
....:
# only want 'two' or 'three'
In [96]: criterion = df2['a'].map(lambda x: x.startswith('t'))
In [97]: df2[criterion]
Out[97]:
  a   b c
2 two y 0.995761
3 three x 2.396780
4 two y 0.014871
In [99]: df2[criterion & (df2['b'] == 'x')]
Out[99]:
  a     b c
3 three x 2.39678
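The `map` plus lambda approach above calls a Python function once per element. The sketch below shows two alternatives from the pandas API: the `.str.startswith` accessor builds the same mask, and `isin` matches exactly "two" or "three" rather than any word starting with "t". Column c uses `range(7)` instead of `randn(7)` so the result is deterministic.

```python
import pandas as pd

df2 = pd.DataFrame({
    "a": ["one", "one", "two", "three", "two", "one", "six"],
    "b": ["x", "y", "y", "x", "y", "x", "x"],
    "c": range(7),  # deterministic stand-in for randn(7)
})

# Per-element Python call, as in the transcript.
crit_map = df2["a"].map(lambda x: x.startswith("t"))
# The same test through the string accessor.
crit_str = df2["a"].str.startswith("t")
# Exact match on the two wanted values.
crit_isin = df2["a"].isin(["two", "three"])

rows = df2[crit_str & (df2["b"] == "x")]
```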

In [104]: s[s.isin([2, 4, 6])]
Out[104]:
2 2
0 4
dtype: int64
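The blog does not show how `s` was built. Assuming `s = Series(np.arange(5), index=np.arange(5)[::-1])`, which reproduces the output above, this sketch highlights that `Series.isin` tests the values, while `Index.isin` tests the labels.

```python
import numpy as np
import pandas as pd

# Assumed construction: values 0..4 on the reversed index 4..0.
s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype="int64")

mask = s.isin([2, 4, 6])      # compares values; 6 is absent and matches nothing
picked = s[mask]
rest = s[~mask]               # negate the mask to keep everything else
by_label = s[s.index.isin([2, 4, 6])]  # compares index labels instead
```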
In [107]: s_mi = Series(np.arange(6),
.....: index=pd.MultiIndex.from_product([[0, 1], ['a', 'b', 'c']]))
.....:
In [108]: s_mi
Out[108]:
0 a 0
  b 1
  c 2
1 a 3
  b 4
  c 5
dtype: int32
In [109]: s_mi.iloc[s_mi.index.isin([(1, 'a'), (2, 'b'), (0, 'c')])]
Out[109]:
0 c 2
1 a 3
dtype: int32
In [110]: s_mi.iloc[s_mi.index.isin(['a', 'c', 'e'], level=1)]
Out[110]:
0 a 0
  c 2
1 a 3
  c 5
dtype: int32
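A self-contained version of the MultiIndex example, with assertions on what each call selects. Without `level`, `isin` compares whole tuples, so the absent `(2, 'b')` simply matches nothing; with `level=1` only the second level is compared.

```python
import numpy as np
import pandas as pd

s_mi = pd.Series(
    np.arange(6, dtype="int64"),
    index=pd.MultiIndex.from_product([[0, 1], ["a", "b", "c"]]),
)

# Whole-tuple matching; the boolean ndarray from isin works with .iloc.
by_tuple = s_mi.iloc[s_mi.index.isin([(1, "a"), (2, "b"), (0, "c")])]

# Matching on the second level only; "e" does not occur and is ignored.
by_level = s_mi.iloc[s_mi.index.isin(["a", "c", "e"], level=1)]
```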

In [146]: df = DataFrame(randint(n // 2, size=(n, 2)), columns=list('bc'))
In [147]: df.index.name = 'a'
In [148]: df
Out[148]:
  b c
a
0 2 3
1 4 1
2 4 0
3 4 1
4 1 4
5 1 4
6 0 1
7 0 0
8 4 0
9 4 2
In [149]: df.query('a < b and b < c')
Out[149]:
  b c
a
0 2 3
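The query above refers to `a`, which is the index name rather than a column. A sketch on a small hypothetical frame (values chosen by hand instead of `randint`) showing the query next to the equivalent boolean mask:

```python
import pandas as pd

# Hypothetical values; the index is named "a".
df = pd.DataFrame(
    {"b": [2, 4, 4, 5], "c": [3, 1, 0, 6]},
    index=pd.Index([0, 1, 2, 3], name="a"),
)

# Inside the query string, "a" resolves to the index because of its name.
via_query = df.query("a < b and b < c")

# The same filter as a mask; comparing a Series with the index is positional.
via_mask = df[(df["b"] > df.index) & (df["b"] < df["c"])]
```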

In [157]: import numpy as np
In [158]: n = 10
In [159]: colors = np.random.choice(['red', 'green'], size=n)
In [160]: foods = np.random.choice(['eggs', 'ham'], size=n)
In [163]: index = MultiIndex.from_arrays([colors, foods], names=['color', 'food'])
In [164]: df = DataFrame(randn(n, 2), index=index)
In [165]: df
Out[165]:
            0        1
color food
red   ham   0.157622 -0.293555
green eggs  0.111560 0.597679
red   ham   -1.270093 0.120949
green ham   -0.193898 1.804172
red   ham   -0.234694 0.939908
green eggs  -0.171520 -0.153055
red   eggs  -0.363095 -0.067318
green eggs  1.444721 0.325771
      ham   -0.855732 -0.697595
      eggs  -0.276134 -1.258759
In [166]: df.query('color == "red"')
Out[166]:
           0        1
color food
red ham    0.157622 -0.293555
    ham    -1.270093 0.120949
    ham    -0.234694 0.939908
    eggs   -0.363095 -0.067318
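Named MultiIndex levels can be used in `query` the same way. This sketch replaces the random colors and foods with four fixed rows (hypothetical data) so the selection can be checked, and compares the query with a mask built from `get_level_values`.

```python
import numpy as np
import pandas as pd

# Fixed, hypothetical level values instead of random choices.
colors = ["red", "green", "red", "green"]
foods = ["ham", "eggs", "eggs", "ham"]
index = pd.MultiIndex.from_arrays([colors, foods], names=["color", "food"])
df = pd.DataFrame(np.arange(8.0).reshape(4, 2), index=index)

# Level names act like column names inside the query string.
red = df.query('color == "red"')
red_ham = df.query('color == "red" and food == "ham"')

# Equivalent mask built directly from one level's values.
red_mask = df[df.index.get_level_values("color") == "red"]
```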

In [208]: df2 = DataFrame({'a' : ['one', 'one', 'two', 'three', 'two', 'one', 'six'],
.....: 'b' : ['x', 'y', 'y', 'x', 'y', 'x', 'x'],
.....: 'c' : np.random.randn(7)})
.....:
In [209]: df2.duplicated(['a', 'b'])
Out[209]:
0 False
1 False
2 False
3 False
4 True
5 True
6 False
dtype: bool
In [210]: df2.drop_duplicates(['a', 'b'])
Out[210]:
  a     b c
0 one   x 0.932713
1 one   y -0.393510
2 two   y -0.548454
3 three x 1.130736
6 six   x -1.233298
In [211]: df2.drop_duplicates(['a', 'b'], keep='last')
Out[211]:
  a     b c
1 one   y -0.393510
3 three x 1.130736
4 two   y -0.447217
5 one   x 1.043921
6 six   x -1.233298
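The `take_last=True` flag was replaced by the `keep` argument in later pandas releases: `keep='first'` is the default, `keep='last'` matches the old `take_last=True`, and `keep=False` drops every row that has a duplicate. A sketch on the same a/b data, with `range(7)` standing in for the random c column:

```python
import pandas as pd

df2 = pd.DataFrame({
    "a": ["one", "one", "two", "three", "two", "one", "six"],
    "b": ["x", "y", "y", "x", "y", "x", "x"],
    "c": range(7),  # deterministic stand-in for randn(7)
})

first = df2.drop_duplicates(["a", "b"])                  # keep="first" (default)
last = df2.drop_duplicates(["a", "b"], keep="last")      # old take_last=True
unique_only = df2.drop_duplicates(["a", "b"], keep=False)  # drop all repeated pairs

# duplicated() accepts the same keep values; with "last", a row is flagged
# when the same (a, b) pair appears again further down.
flags = df2.duplicated(["a", "b"], keep="last")
```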

In [223]: index = Index(list(range(5)), name='rows')
In [224]: columns = Index(['A', 'B', 'C'], name='cols')
In [225]: df = DataFrame(np.random.randn(5, 3), index=index, columns=columns)
In [226]: df
Out[226]:
cols A        B        C
rows
0    0.603791 0.388713 0.544331
1    -0.152978 1.929541 0.202138
2    0.024972 0.117533 -0.184740
3    1.054144 -0.736061 -0.785352
4    -1.362549 -0.063514 0.487562
In [227]: df['A']
Out[227]:
rows
0 0.603791
1 -0.152978
2 0.024972
3 1.054144
4 -1.362549
Name: A, dtype: float64
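When the index and columns are named, a column selected with `[]` keeps the index name and takes the column label as its own name. A short sketch (zeros instead of random values) that also renames the axes with `rename_axis`, which returns a new frame and leaves the original unchanged:

```python
import numpy as np
import pandas as pd

index = pd.Index(list(range(5)), name="rows")
columns = pd.Index(["A", "B", "C"], name="cols")
df = pd.DataFrame(np.zeros((5, 3)), index=index, columns=columns)

col = df["A"]  # Series: index name "rows", Series name "A"

# Returns a copy with new axis names.
renamed = df.rename_axis(index="r", columns="c")
```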
In [250]: indexed2 = data.set_index(['a', 'b'])
In [251]: indexed2
Out[251]:
        c d
a   b
bar one z 1
    two y 2
foo one x 3
    two w 4
In [255]: data.set_index('c', drop=False)
Out[255]:
  a   b   c d
c
z bar one z 1
y bar two y 2
x foo one x 3
w foo two w 4
In [256]: data.set_index(['a', 'b'], inplace=True)
In [257]: data
Out[257]:
        c d
a   b
bar one z 1
    two y 2
foo one x 3
    two w 4
In [259]: data.reset_index()
Out[259]:
  a   b   c d
0 bar one z 1
1 bar two y 2
2 foo one x 3
3 foo two w 4
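The blog does not show how `data` was created. The frame below is reconstructed from the printed outputs (an assumption about its construction, not the author's code), and the sketch checks what `set_index` and `reset_index` do with it.

```python
import pandas as pd

# Reconstructed from the outputs above.
data = pd.DataFrame({
    "a": ["bar", "bar", "foo", "foo"],
    "b": ["one", "two", "one", "two"],
    "c": ["z", "y", "x", "w"],
    "d": [1, 2, 3, 4],
})

indexed2 = data.set_index(["a", "b"])        # a and b become a two-level index
keep_c = data.set_index("c", drop=False)      # c becomes the index and stays a column
round_trip = indexed2.reset_index()           # index levels return as leading columns
```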
You can use the level keyword to remove only a portion of the index:
In [260]: frame
Out[260]:
          c d
c a   b
z bar one z 1
y bar two y 2
x foo one x 3
w foo two w 4
In [261]: frame.reset_index(level=1)
Out[261]:
      a   c d
c b
z one bar z 1
y two bar y 2
x one foo x 3
w two foo w 4
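Likewise, `frame` is not defined in the post. One way to build a frame with the same shape (index levels c, a, b and columns c, d) is sketched below from the reconstructed `data`; this construction is an assumption. It then resets a single level by position, by name, and with `drop=True`.

```python
import pandas as pd

data = pd.DataFrame({
    "a": ["bar", "bar", "foo", "foo"],
    "b": ["one", "two", "one", "two"],
    "c": ["z", "y", "x", "w"],
    "d": [1, 2, 3, 4],
})

# Assumed construction: index (c, a, b), columns c and d.
frame = data.set_index("c", drop=False).set_index(["a", "b"], append=True)

partial = frame.reset_index(level=1)                # move level 1 ("a") back to a column
by_name = frame.reset_index(level="a")              # same result, level chosen by name
dropped = frame.reset_index(level="b", drop=True)   # discard level "b" entirely
```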