Python notes

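1. Using eval
eval() parses a string as a Python expression, evaluates it, and returns the result. In other words, a formula or list stored as a string becomes, after eval(), a value or list that Python can work with directly.

Syntax: eval(expression, globals=None, locals=None), which returns the result of the evaluation.

Where:

expression is the Python expression to evaluate, given as a string
globals is optional; if it is provided and is not None, it must be a dictionary
locals is optional; if it is provided, it can be any mapping object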
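
A minimal sketch of the points above. The variable names (scope, scaled, and so on) are illustrative only. The last call uses ast.literal_eval, which these notes do not cover; it is shown as a safer alternative when the string holds only a literal, because eval() will run any expression it is given.

```python
import ast

# Evaluate an arithmetic expression held in a string.
result = eval("1 + 2 * 3")
print(result)

# Turn the text form of a list into a real list object.
numbers = eval("[1, 2, 3]")
print(type(numbers), numbers)

# Supply names through the optional globals and locals arguments.
scope = {"x": 10}                            # globals must be a dict
scaled = eval("x * 2 + y", scope, {"y": 1})  # locals can be any mapping
print(scaled)

# For plain literals, ast.literal_eval avoids executing arbitrary code.
literal = ast.literal_eval("[('eighty', 0.8)]")
print(literal)
```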

2. Unpacking the tuples in a list directly in a for loop
ex:

dup_amount_proportion = [('eighty', 0.8)]  # a list whose only element is a tuple

for name, proportion in dup_amount_proportion:
    print(name, proportion)
eighty 0.8
type(('eighty', 0.8))
<class 'tuple'>

3. Finding duplicate rows in pandas
ex:

import pandas as pd
data={'key1':[1,2,3,1,2,3,2,2],'key2':[2,2,1,2,2,4,2,2],'data':[5,6,2,6,1,6,2,8]}
frame=pd.DataFrame(data,columns=['key1','key2','data'])
print (frame)

result:

   key1  key2  data
0     1     2     5
1     2     2     6
2     3     1     2
3     1     2     6
4     2     2     1
5     3     4     6
6     2     2     2
7     2     2     8

## Indexing with dataframe.duplicated(["column1", "column2"]) returns every duplicate row except the first occurrence of each

frame[frame.duplicated(['key1','key2'])]

result:

   key1  key2  data
3     1     2     6
4     2     2     1
6     2     2     2
7     2     2     8

Official documentation for duplicated:
DataFrame.duplicated(subset=None, keep='first')
Return boolean Series denoting duplicate rows, optionally only considering certain columns. In other words, it marks duplicate rows according to the conditions you pass in.

Parameters:
subset : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns

keep : {'first', 'last', False}, default 'first' (selects which occurrence is left unmarked)
first : Mark duplicates as True except for the first occurrence. Filtering with this mask returns every duplicate row except the first one.
last : Mark duplicates as True except for the last occurrence. Filtering with this mask returns every duplicate row except the last one.
False : Mark all duplicates as True. Filtering with this mask returns every duplicated row.
Returns:
duplicated : Series
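
To make the three keep settings concrete, here is a small sketch that reuses the frame from the example above and prints the row labels each setting selects. The variable names (keys, first_rows, and so on) are illustrative.

```python
import pandas as pd

data = {'key1': [1, 2, 3, 1, 2, 3, 2, 2],
        'key2': [2, 2, 1, 2, 2, 4, 2, 2],
        'data': [5, 6, 2, 6, 1, 6, 2, 8]}
frame = pd.DataFrame(data, columns=['key1', 'key2', 'data'])
keys = ['key1', 'key2']

# keep='first' (the default): the first occurrence is not marked.
first_rows = frame[frame.duplicated(keys)].index.tolist()

# keep='last': the last occurrence is not marked.
last_rows = frame[frame.duplicated(keys, keep='last')].index.tolist()

# keep=False: every row that has a duplicate is marked.
all_rows = frame[frame.duplicated(keys, keep=False)].index.tolist()

print(first_rows)
print(last_rows)
print(all_rows)
```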

4. Using select_dtypes to return the DataFrame columns of a particular type
Official documentation:
DataFrame.select_dtypes(include=None, exclude=None)
Return a subset of the DataFrame's columns based on the column dtypes. That is, it returns the columns whose types match the selection.

Parameters:
include, exclude : scalar or list-like (the two main parameters)
A selection of dtypes or strings to be included/excluded. At least one of these parameters must be supplied.

Returns:
subset : DataFrame (a sub-DataFrame of the original)
The subset of the frame including the dtypes in include and excluding the dtypes in exclude.

Raises:
ValueError
If both of include and exclude are empty
If include and exclude have overlapping elements
If any kind of string dtype is passed in.

ex:

>>> df = pd.DataFrame({'a': [1, 2] * 3,
...                    'b': [True, False] * 3,
...                    'c': [1.0, 2.0] * 3})
>>> df
        a      b  c
0       1   True  1.0
1       2  False  2.0
2       1   True  1.0
3       2  False  2.0
4       1   True  1.0
5       2  False  2.0

>>> df.select_dtypes(include='bool')
   b
0  True
1  False
2  True
3  False
4  True
5  False

>>> df.select_dtypes(include=['float64'])
   c
0  1.0
1  2.0
2  1.0
3  2.0
4  1.0
5  2.0

>>> df.select_dtypes(exclude=['int'])
       b    c
0   True  1.0
1  False  2.0
2   True  1.0
3  False  2.0
4   True  1.0
5  False  2.0
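
As a further sketch on the same df: the generic 'number' selector picks up every numeric column at once, and the first two ValueError cases listed under Raises can be triggered directly. The names numeric_cols and errors are illustrative, and the exact error messages may vary between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2] * 3,
                   'b': [True, False] * 3,
                   'c': [1.0, 2.0] * 3})

# 'number' matches integer and float columns, but not bool.
numeric_cols = df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)

# Both of these calls raise ValueError: nothing supplied, and overlapping sets.
errors = []
for kwargs in ({}, {'include': ['bool'], 'exclude': ['bool']}):
    try:
        df.select_dtypes(**kwargs)
    except ValueError as err:
        errors.append(str(err))
        print('ValueError:', err)
```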

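5. Using groupby in pandas
Any groupby operation involves one or more of the following three steps:

Splitting: split the data into groups
Applying: apply a function to each group
Combining: combine the results

In many cases we split the data into groups and apply some function to each subset. In the apply step we can perform the following operations:

Aggregation: compute a summary statistic
Transformation: perform some group-specific operation
Filtration: discard data based on some condition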
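
A rough sketch of the three kinds of apply step, using a five-row subset of the team data from this section. The lambdas are only one possible choice of group-wise function, and the result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings'],
                   'Points': [876, 789, 863, 673, 741]})
grouped = df.groupby('Team')

# Aggregation: one summary value per group.
mean_points = grouped['Points'].mean()

# Transformation: same length as the input, computed within each group.
centered = grouped['Points'].transform(lambda s: s - s.mean())

# Filtration: keep only the groups that satisfy a condition.
multi_row = grouped.filter(lambda g: len(g) >= 2)

print(mean_points)
print(centered)
print(multi_row)
```

Aggregation shrinks each group to one row, transformation keeps the original shape, and filtration drops whole groups (here the single-row Kings group).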

import pandas as pd
import numpy as np
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
         'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
         'Rank': [1, 2, 2, 3, 3,4 ,1 ,1,2 , 4,1,2],
         'Year': [2014,2015,2014,2015,2014,2015,2016,2017,2016,2014,2015,2017],
         'Points':[876,789,863,673,741,812,756,788,694,701,804,690]}
df = pd.DataFrame(ipl_data)
print(df)
      Team  Rank  Year  Points
0   Riders     1  2014     876
1   Riders     2  2015     789
2   Devils     2  2014     863
3   Devils     3  2015     673
4    Kings     3  2014     741
5    kings     4  2015     812
6    Kings     1  2016     756
7    Kings     1  2017     788
8   Riders     2  2016     694
9   Royals     4  2014     701
10  Royals     1  2015     804
11  Riders     2  2017     690

A pandas object can be split on any of its keys. There are several ways to split an object:

obj.groupby('key')
obj.groupby(['key1', 'key2'])
obj.groupby(key, axis=1)

df.groupby('Team')

# This returns a GroupBy object

<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x000001B33FFA0DA0>

# View the groups
df.groupby('Team').groups

{'Devils': Int64Index([2, 3], dtype='int64'),
 'Kings': Int64Index([4, 6, 7], dtype='int64'),
 'Riders': Int64Index([0, 1, 8, 11], dtype='int64'),
 'Royals': Int64Index([9, 10], dtype='int64'),
 'kings': Int64Index([5], dtype='int64')}

Grouping by multiple columns

df.groupby(['Team','Year']).groups

{('Devils', 2014): Int64Index([2], dtype='int64'),
 ('Devils', 2015): Int64Index([3], dtype='int64'),
 ('Kings', 2014): Int64Index([4], dtype='int64'),
 ('Kings', 2016): Int64Index([6], dtype='int64'),
 ('Kings', 2017): Int64Index([7], dtype='int64'),
 ('Riders', 2014): Int64Index([0], dtype='int64'),
 ('Riders', 2015): Int64Index([1], dtype='int64'),
 ('Riders', 2016): Int64Index([8], dtype='int64'),
 ('Riders', 2017): Int64Index([11], dtype='int64'),
 ('Royals', 2014): Int64Index([9], dtype='int64'),
 ('Royals', 2015): Int64Index([10], dtype='int64'),
 ('kings', 2015): Int64Index([5], dtype='int64')}

Iterating over the groups

grouped = df.groupby('Team')
for name,group in grouped:
    print(name)
    print(group)

Devils
     Team  Rank  Year  Points
2  Devils     2  2014     863
3  Devils     3  2015     673
Kings
    Team  Rank  Year  Points
4  Kings     3  2014     741
6  Kings     1  2016     756
7  Kings     1  2017     788
Riders
      Team  Rank  Year  Points
0   Riders     1  2014     876
1   Riders     2  2015     789
8   Riders     2  2016     694
11  Riders     2  2017     690
Royals
      Team  Rank  Year  Points
9   Royals     4  2014     701
10  Royals     1  2015     804
kings
    Team  Rank  Year  Points
5  kings     4  2015     812

Selecting a single group with get_group()

grouped = df.groupby('Year')
print(grouped.get_group(2014))

     Team  Rank  Year  Points
0  Riders     1  2014     876
2  Devils     2  2014     863
4   Kings     3  2014     741
9  Royals     4  2014     701
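
When the grouping uses several columns, get_group() takes a tuple with one value per key. The sketch below assumes a four-row subset of the data above; the name riders_2014 is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Team': ['Riders', 'Riders', 'Devils', 'Devils'],
                   'Year': [2014, 2015, 2014, 2015],
                   'Points': [876, 789, 863, 673]})

# Group on two columns, then pick one (Team, Year) combination.
grouped = df.groupby(['Team', 'Year'])
riders_2014 = grouped.get_group(('Riders', 2014))
print(riders_2014)
```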