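Hi everyone, I'm back. Here is some code I worked hard on, shared with all of you. pandas has a lot of methods, but once you remember the main ones, most tasks become straightforward.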
import numpy as np
import pandas as pd

# 1. Use the dictionary data and the list labels to complete the following tasks
data = {
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', np.nan, 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

# (1) Create the DataFrame df
df = pd.DataFrame(data, index=labels)
# The missing value is np.nan, not the string 'NaN', so replace('NaN', 'yes')
# matches nothing. df['priority'].fillna('yes') or a direct assignment both work.
df.iloc[1, 3] = 'yes'
print(df)

# (2) Print the first three rows of df, then select all rows whose visits value is greater than 2
print(df.iloc[:3])
print(df[df['visits'] > 2])

# (3) Print the rows of df that contain missing values, then print the 'age' and 'animal' columns
print(df[df.isnull().any(axis=1)])
print(df[['age', 'animal']])

# (4) Print all rows where animal == 'cat' and age < 3, then set the element
#     in row 'f', column 'age' to 1.5
print(df[(df['animal'] == 'cat') & (df['age'] < 3)])
df.loc['f', 'age'] = 1.5

# (5) Count how many times each value in the animal column occurs
print(df['animal'].value_counts())

# (6) Replace every 'snake' in the animal column with 'tangyudi'
df['animal'] = df['animal'].replace('snake', 'tangyudi')

# (7) Sort df by the animal column
print(df.sort_values(by='animal'))
# df.sort_index(axis=1) would instead sort the columns by their labels

# (8) Append a column named 'No.' after the last column of df, holding 0,1,2,3,4,5,6,7,8,9
df['No.'] = pd.Series([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], index=df.index)
print(df)

# (9) Compute the mean, product and sum of the 'visits' column
visits_mean = df['visits'].mean()
visits_sum = df['visits'].sum()
# prod() returns a single product; cumprod() and cumsum() return the running result at every row
visits_prod = df['visits'].prod()
print(visits_mean)
print(visits_sum)
print(visits_prod)

# (10) Convert every string in the animal column to upper case
#      (str.capitalize() would only upper-case the first letter)
df['animal'] = df['animal'].str.upper()
print(df['animal'])

# (11) Create a shallow copy df2 of df and fill all of its missing values with 3
#      (copy() is a deep copy by default; fillna returns a new object, so assign the result)
df2 = df.copy(deep=False)
df2 = df2.fillna(value=3)
print(df2)

# (12) Create a shallow copy df3 of df and drop the rows that contain missing values
df3 = df.copy(deep=False)
df3 = df3.dropna(how='any')
print(df3)

# (13) Write df to the file animal.csv
df.to_csv('animal.csv', encoding='utf_8_sig')

# 2. Read the file "haberman-kmes.dat" into a DataFrame named df and perform
#    the following operations:
# import csv
# df = pd.read_csv('haberman-kmes.dat', header=None, encoding='utf-8', delimiter="\t", quoting=csv.QUOTE_NONE)
# print(df)
# (1) In the column named "Class", replace the values "negative" and "positive"
#     with the numbers 0 and 1, and count how often 0 and 1 each occur;
# (2) Create a copy df2 of df that contains every column of df except the last one;
# (3) Normalise each column of df2, i.e.
#         (x - x_min) / (x_max - x_min)
#     where x is any value in the column, and x_min and x_max are the minimum
#     and maximum of all values in that column;
# (4) Compute the Euclidean distance between every pair of rows (samples or
#     observations) of df2 and collect the results in a new distance array df3.
# (5) Sort the data in every row of df3 in ascending order.

# 3. Count how many times each word occurs in the text below, and show the
#    five most frequent words in a pie chart.
text = '''Hooray! It's snowing! It's time to make a snowman.James runs out. He
makes a big pile of snow. He puts a big snowball on top. He adds a
scarf and a hat. He adds an orange for the nose. He adds coal for the
eyes and buttons.In the evening, James opens the door. What does he
see? The snowman is moving! James invites him in. The snowman has
never been inside a house. He says hello to the cat. He plays with
paper towels.A moment later, the snowman takes James's hand and
goes out.They go up, up, up into the air ! They are flying ! What a
wonderful night!The next morning, James jumps out of bed. He runs
to the door.He wants to thank the snowman. But he's gone.'''
# Replace punctuation with spaces rather than deleting it: the text has no space
# after some full stops, so deleting them would glue words together ("snowmanJames").
for ch in ',.!?':
    text = text.replace(ch, ' ')
# Lower-case first so that "He" and "he" are counted as the same word
words = text.lower().split()
print(words)
for word in set(words):
    print(word, 'occurrences:', words.count(word))
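Exercise 2 is stated above without a solution. The following is a minimal sketch of one way to approach steps (1) to (5), not a definitive answer. Because haberman-kmes.dat is not reproduced here, the sketch runs on a small made-up DataFrame: the column names `Age`, `Year`, `Positive` and all the values are assumptions for illustration only, and only the `Class` column with its "negative"/"positive" labels comes from the task description.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for haberman-kmes.dat. Column names and values are
# invented for illustration; only "Class" is named in the exercise.
df = pd.DataFrame({
    "Age": [30, 34, 38, 42],
    "Year": [64, 60, 66, 58],
    "Positive": [1, 0, 9, 3],
    "Class": ["negative", "positive", "negative", "positive"],
})

# (1) Map the labels to 0/1 and count how often each value occurs.
df["Class"] = df["Class"].map({"negative": 0, "positive": 1})
class_counts = df["Class"].value_counts()

# (2) Copy every column except the last one.
df2 = df.iloc[:, :-1].copy()

# (3) Min-max normalisation per column: (x - x_min) / (x_max - x_min).
df2 = (df2 - df2.min()) / (df2.max() - df2.min())

# (4) Pairwise Euclidean distances between rows, using NumPy broadcasting.
values = df2.to_numpy()
diff = values[:, None, :] - values[None, :, :]   # shape (n, n, n_features)
df3 = np.sqrt((diff ** 2).sum(axis=2))           # shape (n, n)

# (5) Sort the distances within each row in ascending order.
df3_sorted = np.sort(df3, axis=1)

print(class_counts)
print(df2)
print(df3_sorted)
```

Note that the normalisation divides by `x_max - x_min`, so a column whose values are all identical would produce NaN and needs separate handling. The broadcasting step builds an n × n × features array, which is fine for a few hundred rows but grows quickly for larger data.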
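Exercise 3 also asks for a pie chart of the five most frequent words. A possible sketch, assuming matplotlib is installed and using `collections.Counter` for the counting: to keep it short it runs on only the first three lines of the exercise text, and the output file name `top5_words.png` is an arbitrary choice.

```python
import re
from collections import Counter

import matplotlib.pyplot as plt

# Shortened excerpt of the exercise text, used only to keep the sketch small.
text = """Hooray! It's snowing! It's time to make a snowman.James runs out. He
makes a big pile of snow. He puts a big snowball on top. He adds a
scarf and a hat. He adds an orange for the nose."""

# Lower-case the text and pull out runs of letters and apostrophes, so that
# "snowman.James" splits into two words and "He"/"he" are counted together.
words = re.findall(r"[a-z']+", text.lower())
counts = Counter(words)

# The five most frequent words with their counts.
top5 = counts.most_common(5)
labels = [word for word, _ in top5]
sizes = [n for _, n in top5]

# Draw the pie chart and save it to a file (hypothetical file name).
fig, ax = plt.subplots()
ax.pie(sizes, labels=labels, autopct="%1.1f%%")
ax.set_title("Five most frequent words")
fig.savefig("top5_words.png")
```

Replacing the excerpt with the full text from the exercise gives the chart the task asks for. In an interactive session, `plt.show()` can be used instead of `fig.savefig(...)`.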