[Pandas Day2] Indexing

Latest recommended article published on 2020-11-21 23:05:27

double-le

Tags: python, indexing

Article link: https://blog.csdn.net/qq_40545229/article/details/105682799

1. How do you change the order of columns or rows? How do you swap odd and even rows (columns)?

Changing the order:

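# 1. Custom order: list the column names in the order you want
order = [' ', ' ', '', ...]
df = df[order]
# 2. Drop the column, then insert it at the target position
df_id = df['id']
df = df.drop('id', axis=1)
df.insert(0, 'id', df_id)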
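
Both approaches can be tried on a toy frame. The sketch below is only an illustration: the column names `name`, `score` and `id` are made up, and the last line shows that rows can be reordered the same way by passing a list of labels to `loc`.

```python
import pandas as pd

# Hypothetical toy frame; the column names are illustrative only.
df = pd.DataFrame({"name": ["a", "b"], "score": [90, 85], "id": [1, 2]})

# Method 1: index with the full list of column names in the desired order.
by_list = df[["id", "name", "score"]]

# Method 2: drop the column, then insert it back at position 0.
moved = df.drop("id", axis=1)
moved.insert(0, "id", df["id"])

# Rows can be reordered with a list of index labels passed to loc.
rows_reversed = df.loc[[1, 0]]

print(by_list.columns.tolist())
print(moved.columns.tolist())
print(rows_reversed.index.tolist())
```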

Swapping the order:

# 1. reorder_levels method (reorders the levels of a multi-level index)
df.reorder_levels([1,0,3,2],axis=0).sort_index().head()
# Levels can also be referred to by name
df.reorder_levels(['Address','School','Class'],axis=0).sort_index().head()
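
reorder_levels rearranges the levels of a multi-level index, not the rows themselves. For the odd/even part of the question, one possible approach (a sketch, not the only way) is to build a positional order in which each adjacent pair is swapped and pass it to `iloc`. The helper name `swap_adjacent` and the toy frame are assumptions made for this example.

```python
import numpy as np
import pandas as pd


def swap_adjacent(n):
    """Return positions 0..n-1 with each pair (0,1), (2,3), ... swapped."""
    pos = np.arange(n)
    # Even positions move one step forward, odd positions one step back.
    swapped = pos + np.where(pos % 2 == 0, 1, -1)
    # With an odd n the last position has no partner and stays in place.
    return np.where(swapped >= n, pos, swapped)


# Hypothetical toy frame.
df = pd.DataFrame(
    {"x": [10, 11, 12, 13, 14], "y": [20, 21, 22, 23, 24]},
    index=list("abcde"),
)

rows_swapped = df.iloc[swap_adjacent(len(df))]         # swap odd/even rows
cols_swapped = df.iloc[:, swap_adjacent(df.shape[1])]  # swap odd/even columns

print(rows_swapped.index.tolist())
print(cols_swapped.columns.tolist())
```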

2. If you want to select a subset of a DataFrame, give as many ways of doing it as you can.

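iloc: position-based indexing (the right endpoint of a slice is excluded)
loc: label-based indexing (every slice used in loc includes the right endpoint)
[]: the plain indexing and slicing operator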
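
The three indexers above can be compared side by side. This is a minimal sketch on a made-up frame (the names and scores are illustrative), with boolean masks added as a fourth common option.

```python
import pandas as pd

# Hypothetical toy frame.
df = pd.DataFrame(
    {"math": [90, 72, 65, 88], "physics": [81, 60, 77, 93]},
    index=["amy", "bob", "cat", "dan"],
)

by_position = df.iloc[0:2]        # positions 0 and 1, right endpoint excluded
by_label = df.loc["amy":"cat"]    # label slice, "cat" is included
one_column = df["math"]           # a single column as a Series
row_slice = df[1:3]               # [] with a slice selects rows by position
by_mask = df[df["math"] > 80]     # boolean mask keeps the rows that are True
block = df.loc[["bob", "dan"], ["physics"]]  # rows and columns by label lists

print(by_label.index.tolist())
print(by_mask.index.tolist())
```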

3. Is the query function slower than other indexing methods? Which kind of indexing is most efficient in which situation?

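The query function evaluates a boolean expression written in terms of the DataFrame's columns, so it is suited to filtering rows by a rule on one or more columns. According to the pandas documentation, query is only expected to outperform ordinary boolean indexing on large frames (roughly 100,000 rows or more) with the numexpr engine; on small frames the cost of parsing the expression usually makes it the slower option.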
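
As a small illustration, the same filter can be written with query and with a boolean mask. The frame and the `threshold` variable below are made up for the example; `@threshold` is how query refers to a local Python variable.

```python
import pandas as pd

# Hypothetical toy frame.
df = pd.DataFrame(
    {"duration": [30.0, 120.0, 600.0], "shape": ["disk", "light", "disk"]}
)
threshold = 60

with_query = df.query("duration > @threshold")  # expression given as a string
with_mask = df[df["duration"] > threshold]      # equivalent boolean mask

print(with_query.equals(with_mask))
```

Timing both forms on your own data (for example with `%timeit` in a notebook) is the most reliable way to decide which one to use.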

4. Can a single-level index be used with a Slice object? If so, how? Give an example.

idx = pd.IndexSlice
row = 1103
df.loc[idx[row:row + 1], :]  # label slice, both endpoints are included
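
A runnable version of this idea, assuming a frame whose index holds integer IDs (the values here are invented): `pd.IndexSlice` just produces an ordinary `slice` object, so the built-in `slice` gives the same result.

```python
import pandas as pd

# Hypothetical frame indexed by integer IDs.
df = pd.DataFrame({"score": [55, 67, 78, 89]}, index=[1101, 1102, 1103, 1104])

row = 1103
idx = pd.IndexSlice

with_index_slice = df.loc[idx[row:row + 1], :]        # labels 1103 and 1104
with_builtin_slice = df.loc[slice(row, row + 1), :]   # same selection

print(with_index_slice.index.tolist())
```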

5. How do you quickly find the index labels where a column has missing values?

data.loc[data['col'].isna()].index  # 'col' is a placeholder for the column name
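
One common way to do this is to build a boolean mask with `isna` and read off the index of the matching rows. The sketch below uses an invented `height` column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing heights.
data = pd.DataFrame(
    {"height": [1.70, np.nan, 1.80, np.nan]},
    index=["a", "b", "c", "d"],
)

missing_mask = data["height"].isna()        # True where the value is missing
missing_index = data.loc[missing_mask].index

print(missing_index.tolist())
```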

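6. Which situations is each of the index-setting methods suited to? How do you directly replace a DataFrame's index with an arbitrary index of the same length?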

df.set_index(pd.Series(np.random.rand(df.shape[0])))
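
To make the comparison concrete, here is a hedged sketch of the usual index-setting methods on a made-up frame: `set_index` with an external index replaces the labels, `set_index` with a column name promotes that column, `reset_index` undoes it, and `reindex` aligns existing rows to new labels instead of relabelling them.

```python
import pandas as pd

# Hypothetical toy frame.
df = pd.DataFrame({"name": ["amy", "bob", "cat"], "score": [90, 72, 65]})
new_labels = ["x", "y", "z"]

replaced = df.set_index(pd.Index(new_labels))  # relabel rows with a given index
from_column = df.set_index("name")             # promote a column to the index
restored = from_column.reset_index()           # move the index back to a column
# reindex looks rows up by label; labels that do not exist produce NaN rows.
aligned = from_column.reindex(["cat", "amy", "zed"])

print(replaced.index.tolist())
print(aligned["score"].tolist())
```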

7. When is a multi-level index appropriate?

When the data needs to be sliced, extracted, and compared across several levels.
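
A minimal sketch of those three uses, on an invented two-level School/Class index (the labels and scores are illustrative only):

```python
import pandas as pd

# Hypothetical frame with a two-level index.
df = pd.DataFrame(
    {"score": [80, 85, 70, 95]},
    index=pd.MultiIndex.from_tuples(
        [("S_1", "C_1"), ("S_1", "C_2"), ("S_2", "C_1"), ("S_2", "C_2")],
        names=["School", "Class"],
    ),
).sort_index()  # slicing a multi-level index requires a sorted index

idx = pd.IndexSlice

one_school = df.loc["S_1"]                            # extract by outer level
one_class = df.xs("C_1", level="Class")               # extract by inner level
sliced = df.loc[idx[:, "C_2"], :]                     # slice across levels
by_school = df.groupby(level="School")["score"].mean()  # compare groups

print(by_school)
```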

8. When do duplicate elements need to be handled?

When computing statistics; see also the previous installment on Pandas basics.
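
For instance, repeated rows inflate counts and sums, so they are usually flagged or dropped before aggregating. The sketch below uses a made-up frame.

```python
import pandas as pd

# Hypothetical toy frame; row 2 repeats row 0 exactly.
df = pd.DataFrame(
    {"city": ["a", "b", "a", "c", "b"], "value": [1, 2, 1, 3, 5]}
)

is_repeat = df.duplicated()                       # flags fully repeated rows
deduped = df.drop_duplicates()                    # drops them
first_per_city = df.drop_duplicates(subset="city", keep="first")
n_cities = df["city"].nunique()                   # number of distinct cities

print(is_repeat.tolist())
print(first_per_city.index.tolist())
```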

ex 2-1

import numpy as np
import pandas as pd

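df = pd.read_csv('data/UFO.csv')

# Among all sightings observed for more than 60 s, which shape is the most common?
df.rename(columns={'duration (seconds)':'duration'},inplace=True)
df['duration'] = df['duration'].astype('float')
print(df.query('duration > 60')['shape'].value_counts().index[0])
# query keeps the rows whose duration exceeds 60; value_counts sorts the shapes by frequency, so index[0] is the most common one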

# Partition longitude from -180° to 180° into 30° bins and latitude from -90° to 90° into 18° bins. Which cell has the most reported UFO events?
bins_long = np.linspace(-180,180,13).tolist()
bins_la = np.linspace(-90,90,11).tolist()
cuts_long = pd.cut(df['longitude'],bins=bins_long)
df['cuts_long'] = cuts_long
cuts_la = pd.cut(df['latitude'],bins=bins_la)
df['cuts_la'] = cuts_la
print(df.head())
print(df.set_index(['cuts_long','cuts_la']).index.value_counts().head())
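
The binning step can be checked on a handful of points without the CSV. In this sketch the coordinates are invented, and `groupby(...).size()` is used as an alternative way to count the cells; it is not the code from the exercise.

```python
import numpy as np
import pandas as pd

# Hypothetical coordinates, not taken from UFO.csv.
df = pd.DataFrame(
    {
        "longitude": [-122.3, -118.2, -80.2, 2.3, -121.9],
        "latitude": [47.6, 34.1, 25.8, 48.9, 37.3],
    }
)

lon_bins = np.linspace(-180, 180, 13)  # 12 bins of 30 degrees
lat_bins = np.linspace(-90, 90, 11)    # 10 bins of 18 degrees

df["lon_bin"] = pd.cut(df["longitude"], bins=lon_bins)
df["lat_bin"] = pd.cut(df["latitude"], bins=lat_bins)

# Count points per (longitude bin, latitude bin) cell, skipping empty cells.
cell_counts = df.groupby(["lon_bin", "lat_bin"], observed=True).size()
busiest = cell_counts.idxmax()  # tuple of two Interval objects

print(busiest, cell_counts.max())
```

Note that `pd.cut` builds intervals that are open on the left by default, so a value exactly equal to the lowest edge falls outside every bin unless `include_lowest=True` is passed.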

ex 2-2

import numpy as np
import pandas as pd

df = pd.read_csv('data/Pokemon.csv')
print(df.head())

# What fraction of all Pokemon have two types?
print(df['Type 2'].count()/df.shape[0])

# Among all Pokemon whose base stat total (Total) is at least 580, what proportion is non-legendary (Legendary=False)?
print(df.query('Total >= 580')['Legendary'].value_counts(normalize=True))

# Among Pokemon whose first type is Fighting, which three have the highest Attack?
# print(df[df['Type 1']=='Fighting'].sort_values(by='Attack',ascending=False).iloc[:3])
print(df[df['Type 1']=='Fighting'].sort_values(by='Attack',ascending=False).head(3))

# Across the six base stats (HP, Attack, Sp. Atk, Defense, Sp. Def, Speed), which type has the largest mean range (max minus min)? Only Type 1 is considered, and the mean is taken per type.
df['range'] = df.iloc[:,5:11].max(axis=1)-df.iloc[:,5:11].min(axis=1)
# Average the per-Pokemon range within each Type 1 and report the type with the largest mean
print(df.groupby('Type 1')['range'].mean().idxmax())

# Which type (Type 1 only) has the highest proportion of Legendary Pokemon? Is the Total of that type's Legendary Pokemon also the highest?
# Share of all Legendary Pokemon that belongs to each Type 1; index[0] is the largest share
print(df.query('Legendary == True')['Type 1'].value_counts(normalize=True).index[0])

# Mean Total of the Legendary Pokemon within each Type 1; groupby also handles types with a single Legendary row
print(df.query('Legendary == True').groupby('Type 1')['Total'].mean().idxmax())
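
The "proportion of Legendary Pokemon" in the last question can be read in two ways: the share of all Legendary Pokemon that belongs to a type, or the fraction of a type's Pokemon that are Legendary. The two can disagree, as this sketch on made-up rows (not from Pokemon.csv) shows.

```python
import pandas as pd

# Hypothetical rows, invented so that the two readings give different answers.
df = pd.DataFrame(
    {
        "Type 1": ["Psychic"] * 5 + ["Dragon"] * 2 + ["Fire"],
        "Legendary": [True, True, False, False, False, True, False, False],
        "Total": [680, 600, 400, 300, 350, 700, 500, 450],
    }
)

legendary = df[df["Legendary"]]

# Reading 1: each type's share of all Legendary Pokemon.
share_of_legendaries = legendary["Type 1"].value_counts(normalize=True)
# Reading 2: fraction of each type's Pokemon that are Legendary.
rate_within_type = df.groupby("Type 1")["Legendary"].mean()
# Mean Total of the Legendary Pokemon in each type.
mean_total = legendary.groupby("Type 1")["Total"].mean()

print(share_of_legendaries.idxmax(), rate_within_type.idxmax(), mean_total.idxmax())
```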

Data & code

Reference blog

https://blog.csdn.net/laicikankna/article/details/105646643
