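[Exercise 1] Given a dataset of UFO sightings, answer the following questions:
import pandas as pd
df = pd.read_csv('data/UFO.csv')
df.head()
(a) Among all sightings that were observed for more than 60 seconds, which shape occurs most often?
# Rename the column so it can be referenced by a plain name inside query()
df.rename(columns={'duration (seconds)': 'duration'}, inplace=True)
# astype returns a new Series, so the result has to be assigned back
df['duration'] = df['duration'].astype('float')
df.head()
df.query('duration > 60').head()
# value_counts sorts in descending order, so the first index label is the most common shape
df.query('duration > 60')['shape'].value_counts()
df.query('duration > 60')['shape'].value_counts().index[0]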
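The filter-then-count pattern in part (a) can be tried on a few made-up rows. The sketch below is only an illustration: the `toy` frame and its values are invented, and the presence of unparseable duration strings in the real file is an assumption, handled here with `pd.to_numeric(..., errors='coerce')`.

```python
import pandas as pd

# Made-up rows for illustration only; they are not taken from data/UFO.csv.
toy = pd.DataFrame({
    'shape': ['light', 'circle', 'light', 'disk', 'circle', 'light'],
    'duration': ['120', '30', '300', 'unknown', '90', '61'],
})

# Coerce strings to numbers; entries that cannot be parsed become NaN
# and therefore never satisfy the comparison below.
toy['duration'] = pd.to_numeric(toy['duration'], errors='coerce')

# Keep sightings longer than 60 seconds, then take the most frequent shape.
long_sightings = toy[toy['duration'] > 60]
most_common_shape = long_sightings['shape'].value_counts().idxmax()
print(most_common_shape)
```

`idxmax()` returns the label with the highest count directly, which avoids relying on the sort order of `value_counts()`.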
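(b) Partition longitude and latitude into bins: longitude from -180° to 180° in steps of 30°, and latitude from -90° to 90° in steps of 18°. Which region has the largest number of reported UFO events?
import numpy as np
bins_long = np.linspace(-180, 180, 13).tolist()  # 13 edges give 12 intervals of 30°
bins_la = np.linspace(-90, 90, 11).tolist()  # 11 edges give 10 intervals of 18°
cuts_long = pd.cut(df['longitude'], bins=bins_long)
df['cuts_long'] = cuts_long
cuts_la = pd.cut(df['latitude'], bins=bins_la)
df['cuts_la'] = cuts_la
df.head()
# Count the rows in each (longitude interval, latitude interval) cell; the first row is the busiest region
df.set_index(['cuts_long', 'cuts_la']).index.value_counts().head()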
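As a cross-check on part (b), the same two-dimensional binning can be written with `groupby`. This is a minimal sketch on invented coordinates; the `toy` frame, `lon_cut`, `lat_cut`, and `cell_counts` are names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up coordinates for illustration only.
toy = pd.DataFrame({
    'longitude': [-122.3, -118.2, -74.0, 2.35, -121.9, 139.7],
    'latitude': [47.6, 34.1, 40.7, 48.9, 37.3, 35.7],
})

lon_bins = np.linspace(-180, 180, 13)  # 12 intervals of 30 degrees
lat_bins = np.linspace(-90, 90, 11)    # 10 intervals of 18 degrees

# pd.cut uses right-closed intervals by default, e.g. (-150.0, -120.0].
lon_cut = pd.cut(toy['longitude'], bins=lon_bins)
lat_cut = pd.cut(toy['latitude'], bins=lat_bins)

# Count rows per (longitude interval, latitude interval) cell,
# keeping only cells that actually occur.
cell_counts = toy.groupby([lon_cut, lat_cut], observed=True).size()
busiest_cell = cell_counts.idxmax()
print(busiest_cell, cell_counts.max())
```

Because the intervals are right-closed, a value exactly equal to the lowest edge (-180 or -90) falls outside every bin unless `include_lowest=True` is passed to `pd.cut`.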
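[Exercise 2] Given a dataset of Pokemon, answer the following questions:
df = pd.read_csv('data/Pokemon.csv')
df.head()
(a) What proportion of all Pokemon have two types?
# count() skips missing values, so it counts only the rows that have a second type
df['Type 2'].count() / df.shape[0]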
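The proportion in part (a) can also be read as the mean of a boolean mask. A small sketch on invented rows, assuming that a missing second type is stored as NaN:

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; NaN marks a missing second type.
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Type 1': ['Grass', 'Fire', 'Water', 'Bug'],
    'Type 2': ['Poison', np.nan, np.nan, 'Flying'],
})

# notna() yields True/False per row; the mean of booleans is the share of True.
dual_share = toy['Type 2'].notna().mean()
print(dual_share)
```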
(b) Among all Pokemon whose base stat total (Total) is at least 580, what proportion are non-legendary (Legendary=False)?
# Filter once and reuse the subset
strong = df[df['Total'] >= 580]
strong['Legendary'].count()
strong['Legendary'].value_counts()
# Compare against False explicitly instead of picking a value_counts entry by position
per = (strong['Legendary'] == False).mean()
per
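To see why a positional lookup into `value_counts()` is fragile for part (b), it helps to compute the share from a boolean mask instead. The sketch below uses invented rows and assumes `Legendary` is a boolean column.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    'Total': [600, 580, 680, 500, 590],
    'Legendary': [False, False, True, False, True],
})

# "Not less than 580" means the boundary value 580 is included.
strong = toy[toy['Total'] >= 580]

# ~ flips the boolean column; its mean is the non-legendary share.
non_legendary_share = (~strong['Legendary']).mean()
print(non_legendary_share)
```

`strong['Legendary'].value_counts(normalize=True)` gives both shares at once, indexed by the labels `True` and `False` rather than by position.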
(c) Among Pokemon whose first type is Fighting, which three have the highest Attack?
df[df['Type 1'] == 'Fighting'].sort_values(by='Attack', ascending=False).iloc[0:3]['Name']
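A shorter way to express the sort-and-slice step in part (c) is `nlargest`. This is a sketch on invented rows; the names `P1` to `P5` are placeholders, not real entries.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    'Name': ['P1', 'P2', 'P3', 'P4', 'P5'],
    'Type 1': ['Fighting', 'Fighting', 'Fire', 'Fighting', 'Fighting'],
    'Attack': [80, 130, 150, 105, 120],
})

# Restrict to Fighting first, then take the three rows with the largest Attack.
top3 = toy[toy['Type 1'] == 'Fighting'].nlargest(3, 'Attack')['Name'].tolist()
print(top3)
```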
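# Part (c), continued: the same top three rows with every column shown
df[df['Type 1'] == 'Fighting'].sort_values(by='Attack', ascending=False).iloc[:3]
(d) For which type is the mean range of the six base stats (HP, Attack, Sp. Atk, Defense, Sp. Def, Speed) the largest? Consider only the first type; the mean is taken over the Pokemon of each type.
# Per-row range (max minus min); columns 5 through 10 are assumed to be the six stat columns
df['range'] = df.iloc[:, 5:11].max(axis=1) - df.iloc[:, 5:11].min(axis=1)
attribute = df[['Type 1', 'range']].set_index('Type 1')
max_range = 0
result = ''
for i in attribute.index.unique():
    # Selecting with [i] always returns a Series, even when a type has a single row
    temp = attribute.loc[[i], 'range'].mean()
    if temp > max_range:
        max_range = temp
        result = i
# Same result as df.groupby('Type 1')['range'].mean().idxmax()
result
(e) Which type (first type only) has the highest proportion of legendary Pokemon? Do the legendaries of that type also have the highest base stat total?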
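No solution is given for part (e), so the following is only one possible approach, sketched on invented rows. It assumes that "highest base stat total" means the highest mean `Total` among the legendaries of a type; other readings, such as the maximum, are equally plausible.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    'Type 1': ['Dragon', 'Dragon', 'Psychic', 'Psychic', 'Psychic', 'Bug'],
    'Total': [600, 680, 700, 400, 300, 250],
    'Legendary': [False, True, True, False, False, False],
})

# Share of legendaries within each first type (mean of a boolean column).
legendary_share = toy.groupby('Type 1')['Legendary'].mean()
top_type = legendary_share.idxmax()

# Mean Total of the legendaries only, per first type (assumed interpretation).
legendary_total = toy[toy['Legendary']].groupby('Type 1')['Total'].mean()
strongest_type = legendary_total.idxmax()

print(top_type, strongest_type, top_type == strongest_type)
```

In this toy data the two answers differ, which shows that the second half of the question has to be checked separately rather than inferred from the first.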