多个分类的比例或数值,需要找到每一行最大的5个(或者n个)比例或数值,以及它们对应的类别,并输出。
比如:
ID1:类别5 11%,类别10 10%,类别1 9%,类别15 9%,类别3 5%
……
id | 类别1 | 类别2 | 类别3 | 类别4 | 类别5 | 类别6 | 类别7 | 类别8 | 类别9 | 类别10 | 类别11 | 类别12 | 类别13 | 类别14 | 类别15 |
1 | 9% | 2% | 5% | 1% | 11% | 1% | 5% | 1% | 3% | 10% | 1% | 4% | 2% | 1% | 9% |
2 | 2% | 6% | 2% | 3% | 18% | 0% | 3% | 0% | 0% | 9% | 5% | 2% | 2% | 1% | 24% |
3 | 3% | 6% | 4% | 2% | 17% | 0% | 1% | 1% | 0% | 5% | 3% | 15% | 2% | 2% | 22% |
4 | 1% | 8% | 3% | 1% | 40% | 0% | 0% | 0% | 0% | 6% | 4% | 3% | 0% | 0% | 23% |
5 | 1% | 10% | 7% | 2% | 2% | 0% | 9% | 0% | 2% | 16% | 1% | 1% | 1% | 3% | 0% |
6 | 5% | 6% | 6% | 8% | 17% | 1% | 1% | 3% | 1% | 8% | 2% | 7% | 2% | 1% | 11% |
7 | 2% | 6% | 13% | 0% | 17% | 2% | 2% | 1% | 1% | 7% | 4% | 1% | 3% | 2% | 15% |
8 | 0% | 0% | 0% | 0% | 0% | 0% | 13% | 0% | 0% | 63% | 0% | 0% | 25% | 0% | 0% |
9 | 24% | 5% | 30% | 6% | 5% | 6% | 5% | 3% | 2% | 3% | 2% | 2% | 2% | 2% | 2% |
10 | 3% | 16% | 0% | 0% | 29% | 0% | 0% | 0% | 0% | 9% | 0% | 9% | 0% | 0% | 15% |
# -*- coding: utf-8 -*-
"""
Created on Tue May 17 19:18:57 2022
@author: weisssun
"""
import pandas as pd
# 原始数据路径
input_path = r'D:\输入数据.xlsx'
# 输出数据路径
output_path = r'D:\输出数据.xlsx'
# 读取输入数据
input_data = pd.read_excel(input_path)
# 定义输出数据的DataFrame
output_data = pd.DataFrame()
# 读取输入数据的列名并输出,可省略
input_data_col_name = input_data.columns.tolist()
print(input_data_col_name)
# 获取作为id列的所有id,去重,转换为列表
id_list=input_data['id'].drop_duplicates().tolist()
print(id_list)
# 对每个id循环
for the_id in id_list:
i = id_list.index(the_id)
output_data.loc[i, 'id'] = the_id
# 对每个id,将列名和id对应的行转化为字典
dict1 = dict(zip(input_data.columns,input_data.loc[i]))
# 从字典里去除 id 的那组元素
del dict1['id']
#print(dict1)
# 把字典转换为列表并排序
sort_list = sorted(dict1.items(),key=lambda item:item[1],reverse=True)
# 截断列表的Top n 元素
n = 5
sort_list = sort_list[:n]
result = ''
# 将列表的Top n 元素取出来,转换为输出的字符串
for item in sort_list:
result = result + item[0] + '%.0f%%'%(item[1]* 100) + ','
# 数值需要输出为百分比形式,%.0f%% 表示不保留小数的百分比,%.2f%% 保留2位小苏的百分比
# result = result + item[0] + '%.1f'%item[1] + ','
# 数值输出普通数值形式,%.1f 表示保留1位小数
print(result)
output_data.loc[i, 'top5'] = result
# 保存到输出文档
output_data.to_excel(output_path)