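Deduplication, NumPy-style if-then, and related operations (these could also be used for grouping, though I have not tried that yet)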

from pandas import Series, DataFrame

data = DataFrame({'k': [1, 1, 2, 2]})
print(data)

# duplicated() marks every row that repeats an earlier row
IsDuplicated = data.duplicated()
print(IsDuplicated)
print(type(IsDuplicated))

# drop_duplicates() returns a new DataFrame without the repeated rows
data = data.drop_duplicates()
print(data)


The output is:


   k
0  1
1  1
2  2
3  2
0    False
1     True
2    False
3     True
dtype: bool
<class 'pandas.core.series.Series'>
   k
0  1
2  2


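The DataFrame duplicated method returns a boolean Series indicating, for each row, whether it repeats an earlier row.

The drop_duplicates method returns a DataFrame with those repeated rows removed.

By default both methods compare all columns, but you can restrict the duplicate check to a subset of columns.

For example, to deduplicate on the column named k2: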

data.drop_duplicates(['k2'])
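The one-liner above can be tried on a small frame. This is only a sketch: the k1/k2 sample data is made up for illustration, while `subset` and `keep` are parameters of pandas' `drop_duplicates`.

```python
import pandas as pd

# Hypothetical sample data: k2 repeats, k1 does not.
data = pd.DataFrame({'k1': ['a', 'b', 'c', 'd'],
                     'k2': [1, 1, 2, 2]})

# Only k2 is compared; by default the first row of each k2 value is kept.
first_kept = data.drop_duplicates(subset=['k2'])

# keep='last' keeps the last row of each k2 value instead.
last_kept = data.drop_duplicates(subset=['k2'], keep='last')

print(first_kept)
print(last_kept)
```

Note that the surviving rows keep their original index labels; call `reset_index(drop=True)` if a fresh 0..n-1 index is wanted.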



c. if-then operations

c.1 Using .loc[] (the .ix[] indexer used originally was removed in pandas 1.0)
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# if A > 1 then set B to -1
df.loc[df.A > 1, 'B'] = -1
print(df)

df.loc[condition, region to assign in the "then" branch]
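The pattern above only covers the "then" branch. A possible extension, sketched with `.loc` (which is what current pandas offers for this, since `.ix` is no longer available), adds an "else" branch by negating the mask. The variable name `mask` is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})

# The "if" condition: one boolean per row.
mask = df["A"] > 1

df.loc[mask, "B"] = -1    # then: rows where A > 1
df.loc[~mask, "B"] = 0    # else: all remaining rows

# The assigned value can also be computed from another column of the same rows.
df.loc[mask, "C"] = df.loc[mask, "A"] * 10

print(df)
```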

c.2 Using numpy.where
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# new column "then": 1 where A < 3, otherwise 0
df["then"] = np.where(df.A < 3, 1, 0)
print(df)

np.where(condition, then, else)
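The title wonders whether this if-then result could feed a grouping. One way that might look, as an untested-in-the-original sketch: label each row with `np.where`, then group on the label. The column name "bucket" and the labels "low"/"high" are made up here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})

# if A < 3 then "low" else "high" ("bucket" is a hypothetical column name)
df["bucket"] = np.where(df["A"] < 3, "low", "high")

# Group on the derived label and sum B within each bucket.
b_sum = df.groupby("bucket")["B"].sum()
print(b_sum)
```

Because `np.where` returns a plain array of the same length as the frame, it can be assigned directly as a new column and used like any other grouping key.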

d. Selecting rows of a DataFrame by condition

d.1 Direct indexing with df[]
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# keep only the rows where A >= 2
df = df[df.A >= 2]
print(df)

d.2 Using .loc[]
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})
# keep only the rows where A > 2
df = df.loc[df.A > 2]
print(df)

(There are many other ways to do this; I won't list them all here.)
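A few of those other ways, sketched on the same sample frame: combined boolean conditions, `isin`, and `query`. These are standard pandas calls, but the particular conditions are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [1, 1, 1, 1]})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
both = df[(df["A"] >= 2) & (df["B"] < 8)]

# Membership test against a list of values.
picked = df[df["A"].isin([1, 4])]

# The same kind of filter written as a query string.
queried = df.query("A > 2")

print(both, picked, queried, sep="\n\n")
```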

e. Grouping

e.1 Forming groups with groupby
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})
# for each animal, list the size of its heaviest individual
group = df.groupby("animal").apply(lambda subf: subf['size'][subf['weight'].idxmax()])
print(group)
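An alternative way to get the same answer without `apply`, offered as a sketch rather than the original author's method: take `idxmax` per group to find the row labels, then look those rows up with `.loc`. The names `heaviest_idx` and `heaviest` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})

# Row label of the heaviest individual per animal (the first one wins on ties).
heaviest_idx = df.groupby('animal')['weight'].idxmax()

# Look those rows up in the original frame.
heaviest = df.loc[heaviest_idx, ['animal', 'size', 'weight']]
print(heaviest)
```

This keeps the full matching rows, so other columns of the heaviest individual are available too.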
e.2 Using get_group to pull out a single group

import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})

group = df.groupby("animal")
# the rows of df whose animal is "cat"
cat = group.get_group("cat")
print(cat)
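To see how `get_group` relates to the grouping as a whole, the following sketch iterates over all groups and then fetches one by key. The names `grouped`, `sizes`, and `dog` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                   'size': list('SSMMMLL'),
                   'weight': [8, 10, 11, 1, 20, 12, 12],
                   'adult': [False] * 5 + [True] * 2})

grouped = df.groupby('animal')

# Iterating yields (group key, sub-DataFrame) pairs, sorted by key by default.
sizes = {name: len(sub) for name, sub in grouped}
print(sizes)

# get_group returns the sub-DataFrame for one key, with the original index labels.
dog = grouped.get_group('dog')
print(dog)
```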

For other operations, see the pandas Cookbook:

http://pandas.pydata.org/pandas-docs/stable/cookbook.html

<Original article; please credit the source when reposting>

