from pandas import DataFrame

# Build a small frame that contains repeated rows
data = DataFrame({'k': [1, 1, 2, 2]})
print(data)
# duplicated() flags every row that repeats an earlier row
is_duplicated = data.duplicated()
print(is_duplicated)
print(type(is_duplicated))
# drop_duplicates() returns a new frame without the repeated rows
data = data.drop_duplicates()
print(data)
The output is:
   k
0  1
1  1
2  2
3  2
0    False
1     True
2    False
3     True
dtype: bool
<class 'pandas.core.series.Series'>
   k
0  1
2  2
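DataFrame's duplicated method returns a boolean Series indicating whether each row duplicates an earlier row.
The drop_duplicates method returns a DataFrame with the duplicate rows removed.
By default both methods compare all columns; you can also restrict the duplicate check to a subset of columns.
For example, to deduplicate on a column named k2 (assuming the DataFrame has such a column):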
data.drop_duplicates(['k2'])
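To make the subset behaviour concrete, here is a minimal sketch. The sample frame and its column names k1 and k2 are made up for illustration; `subset` and `keep` are standard parameters of `duplicated` and `drop_duplicates`.

```python
import pandas as pd

# Hypothetical sample data: the column names k1 and k2 are illustrative only.
data = pd.DataFrame({"k1": ["a", "b", "c", "d"], "k2": [1, 1, 2, 2]})

# Judge duplicates on k2 only; by default the first occurrence is kept.
dup_mask = data.duplicated(subset=["k2"])
deduped_first = data.drop_duplicates(subset=["k2"])

# keep="last" retains the last occurrence of each k2 value instead.
deduped_last = data.drop_duplicates(subset=["k2"], keep="last")

print(dup_mask.tolist())
print(deduped_first)
print(deduped_last)
```

Note that the surviving rows keep their original index labels; call `reset_index(drop=True)` if a fresh 0..n-1 index is wanted.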
c. if-then operations
c.1 Using .loc[] (.ix[] in older pandas)
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# .ix was removed in pandas 1.0; .loc performs the same label-based assignment
df.loc[df.A > 1, 'B'] = -1
print(df)
The pattern is df.loc[condition, columns to assign].
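The same condition-plus-target pattern extends to several columns at once, and negating the mask gives an "else" branch. A minimal sketch using `.loc` on the same sample frame (the variable name `mask` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})

# Store the condition once so it can be reused and negated.
mask = df["A"] > 1

# "then": where A > 1, set both B and C to -1.
df.loc[mask, ["B", "C"]] = -1

# "else": for the remaining rows, set B to 0.
df.loc[~mask, "B"] = 0

print(df)
```

Assigning through a single `.loc[...]` call, rather than chaining `df[mask]["B"] = ...`, writes to the original frame instead of a temporary copy.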
c.2 Using numpy.where
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# New column "then": 1 where A < 3, otherwise 0
df["then"] = np.where(df.A < 3, 1, 0)
print(df)
The pattern is np.where(condition, then, else).
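The then and else arguments do not have to be scalars, and calls can be nested to express more than two branches. A small sketch under those assumptions (the column names `picked` and `label` are made up for this example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]})

# then/else can be whole columns: take B where A < 3, otherwise A.
df["picked"] = np.where(df["A"] < 3, df["B"], df["A"])

# Nesting np.where gives an if / elif / else chain.
df["label"] = np.where(df["A"] < 2, "low",
                       np.where(df["A"] < 4, "mid", "high"))

print(df)
```

For long chains of conditions, `np.select` is usually easier to read than deeply nested `np.where` calls.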
d. Selecting rows of a DataFrame by condition
d.1 Direct indexing with df[]
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# Keep only the rows where A >= 2
df = df[df.A >= 2]
print(df)
d.2 Using .loc[]
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# Keep only the rows where A > 2
df = df.loc[df.A > 2]
print(df)
(There are many other ways to do this; they are not all listed here.)
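A few of those other selection styles, sketched on the same sample frame. The variable names are illustrative; `isin` and `query` are standard pandas methods.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})

# AND: combine conditions with &, each wrapped in its own parentheses.
both = df[(df["A"] >= 2) & (df["B"] < 8)]

# OR, keeping only columns A and B of the matching rows.
either = df.loc[(df["A"] == 1) | (df["B"] == 8), ["A", "B"]]

# Membership test against a list of values.
members = df[df["A"].isin([1, 3])]

# The same filter as `both`, written as a query string.
queried = df.query("A >= 2 and B < 8")

print(both, either, members, queried, sep="\n\n")
```

The parentheses matter: `&` and `|` bind more tightly than comparison operators in Python, so `df["A"] >= 2 & df["B"] < 8` would not mean what it appears to.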
e. Grouping
e.1 Forming groups with groupby
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})
# For each animal, list the size of its heaviest row
group = df.groupby("animal").apply(lambda subf: subf['size'][subf['weight'].idxmax()])
print(group)
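The same result can be reached without `apply`, by first finding the index label of the heaviest row per group and then looking those rows up. This is only an alternative sketch, not the author's method; the names `heaviest_idx` and `heaviest` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})

# Index label of the heaviest row within each animal group
# (on ties, idxmax returns the first occurrence).
heaviest_idx = df.groupby("animal")["weight"].idxmax()

# Look up those rows, then index the result by animal.
heaviest = df.loc[heaviest_idx].set_index("animal")

print(heaviest["size"])
```

This keeps the whole matching row available, so other columns such as `weight` or `adult` can be read from `heaviest` as well.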
e.2 Using get_group to extract a single group
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})
group = df.groupby("animal")
# Extract the rows that belong to the "cat" group
cat = group.get_group("cat")
print(cat)
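Besides `get_group`, a grouped object can be iterated over or aggregated directly. A minimal sketch on the same data (the names `grouped`, `sizes` and `mean_weight` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})

grouped = df.groupby("animal")

# Iterating yields (group key, sub-DataFrame) pairs.
sizes = {}
for name, sub in grouped:
    sizes[name] = len(sub)

# get_group returns the sub-DataFrame for one key, with original index labels.
cat = grouped.get_group("cat")

# Aggregating one column per group.
mean_weight = grouped["weight"].mean()

print(sizes)
print(cat)
print(mean_weight)
```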
For other operations, see the pandas Cookbook:
http://pandas.pydata.org/pandas-docs/stable/cookbook.html
<Original article; please credit the source when reposting>