Two Advanced Applications of Python dict

1 Dictionary Example 1: Grade-Band Statistics

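1.1 Task breakdown

● Build the class score data: randomly generate scores for 10 students, in the following format: {'stu0':{'Math':90,'Python':98,'En':90,'PE':60}}
● Count each subject's scores by band: A: score>=90; B: 80<=score<90; C: 70<=score<80; D: 60<=score<70; E: fail (score<60)

Hints:

★ Use random.randint to generate random scores
★ Use itertools.groupby() to group the scores into bands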

1.2 Understanding groupby()

from itertools import groupby

# Grading criteria
def split_score(score):
    '''
    Classify a score into a letter band
    '''
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    elif score >= 60:
        return 'D'
    else:
        return 'E'
score_s = [88, 52, 68, 45, 92, 71, 50, 77, 34, 43]
# groupby only merges adjacent items that share a key, so sort first
scores_grade = groupby(sorted(score_s), key=split_score)
print(scores_grade)

<itertools.groupby object at 0x000001ECC7E5DCA8>

for grade,group in scores_grade:
    print(grade,"->",list(group))   # tuple() works too

E -> [34, 43, 45, 50, 52]
D -> [68]
C -> [71, 77]
B -> [88]
A -> [92]
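
The call to sorted() above matters: groupby() only merges adjacent items that share a key. The sketch below, which reuses the same split_score thresholds and score list, compares grouping with and without sorting. Sorting by raw score is enough here only because split_score never decreases as the score increases; for a key without that property you would sort by the key itself.

```python
from itertools import groupby

def split_score(score):
    """Map a numeric score to a letter band (same thresholds as above)."""
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    elif score >= 60:
        return 'D'
    return 'E'

score_s = [88, 52, 68, 45, 92, 71, 50, 77, 34, 43]

# Without sorting, every change of key starts a new group,
# so the same band shows up several times.
unsorted_groups = [(grade, list(group))
                   for grade, group in groupby(score_s, key=split_score)]

# After sorting, each band appears exactly once.
sorted_groups = [(grade, list(group))
                 for grade, group in groupby(sorted(score_s), key=split_score)]

print(len(unsorted_groups), "groups without sorting")
print(len(sorted_groups), "groups after sorting")
```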

1.3 Example code

1.3.1 Generating simulated data

from random import randint
def creat_simulate_data(n=10):
    '''
    Generate n simulated student records
    '''
    simulate_data = {'stu'+ str(i):{'MA':randint(30,100),
                                    'PY':randint(30,100),
                                    'EN':randint(30,100),
                                    'PE':randint(30,100)
                                   } 
                     for i in range(n)
                    }
    return simulate_data
NUM = 10   # number of students
stu_scores_info = creat_simulate_data(n = NUM)
for stu, scores in stu_scores_info.items():
    print(f"{stu}-->{scores}")

stu0-->{'MA': 74, 'PY': 44, 'EN': 40, 'PE': 77}
stu1-->{'MA': 31, 'PY': 97, 'EN': 33, 'PE': 97}
stu2-->{'MA': 99, 'PY': 60, 'EN': 62, 'PE': 38}
stu3-->{'MA': 48, 'PY': 89, 'EN': 85, 'PE': 54}
stu4-->{'MA': 48, 'PY': 92, 'EN': 61, 'PE': 89}
stu5-->{'MA': 51, 'PY': 62, 'EN': 55, 'PE': 75}
stu6-->{'MA': 61, 'PY': 76, 'EN': 36, 'PE': 48}
stu7-->{'MA': 79, 'PY': 78, 'EN': 90, 'PE': 60}
stu8-->{'MA': 73, 'PY': 82, 'EN': 30, 'PE': 58}
stu9-->{'MA': 65, 'PY': 63, 'EN': 44, 'PE': 60}

1.3.2 Grading criteria

# Grading criteria
def split_score(score):
    '''
    Classify a score into a letter band
    '''
    if score >= 90:
        return 'A'
    elif score >=80:
        return 'B'
    elif score >=70:
        return 'C'
    elif score >=60:
        return 'D'
    else:
        return 'E'
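
As an aside, the same banding can be written as a table lookup. The sketch below is an alternative, not the approach used in this post: it relies on the standard-library bisect module, and the names BREAKPOINTS, GRADES, and split_score_bisect are made up for illustration.

```python
from bisect import bisect_right

# Hypothetical names, introduced only for this sketch.
BREAKPOINTS = [60, 70, 80, 90]   # lower bounds of bands D, C, B, A
GRADES = 'EDCBA'                 # one more entry than BREAKPOINTS

def split_score_bisect(score):
    """Return the letter band by counting how many breakpoints are <= score."""
    return GRADES[bisect_right(BREAKPOINTS, score)]

for s in (59, 60, 79, 80, 90):
    print(s, '->', split_score_bisect(s))
```

With this layout, changing a threshold means editing one list rather than a chain of elif branches.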

1.3.3 Statistical analysis

from itertools import groupby

def statis_stu_score_grade(stu_scores_info:dict):
    '''
    1 Collect the students' scores by subject
    2 Convert each score to a band and count the bands
    3 Result: {'MA':{'A':2,'B':3,..}}
    '''
    # S1: turn the per-student score sheets into per-subject score lists
    subject_scores = dict() # per-subject scores: {'MA':[55,89,90...],'PY':[90,88,60...]}
    for stu,stu_scores in stu_scores_info.items():
        # stu_scores looks like {'MA': 55, 'PY': 93, 'EN': 88, 'PE': 75}
        for subject,score in stu_scores.items():
            # setdefault creates the empty list the first time a subject is seen
            subject_scores.setdefault(subject,[]).append(score)
    # S2: band each subject's scores and count the members of each band
    statis_result = dict()  # band counts: {'MA':{'A':2,'B':3,..},'PY':{'A':2,'B':3,..}}
    for subject,score_s in subject_scores.items():
        statis_result[subject] = dict()
        # groupby needs sorted input; descending order yields the bands from A to E
        for grade,group in groupby(sorted(score_s,reverse=True), split_score):
            statis_result[subject][grade] = len(list(group))
    return statis_result
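
If only the counts are needed, sorting and groupby() can be skipped. The sketch below is one possible alternative using collections.Counter; the function name count_grades and the three-student sample are assumptions made for illustration, and split_score repeats the thresholds from 1.3.2.

```python
from collections import Counter

def split_score(score):
    """Same thresholds as in 1.3.2."""
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    elif score >= 60:
        return 'D'
    return 'E'

def count_grades(stu_scores_info):
    """Return {subject: Counter({band: count})} in a single pass."""
    result = {}
    for scores in stu_scores_info.values():
        for subject, score in scores.items():
            # One Counter per subject; a band that never occurs reads as 0.
            result.setdefault(subject, Counter())[split_score(score)] += 1
    return result

# Small hand-made sample in the same shape as stu_scores_info.
sample = {
    'stu0': {'MA': 74, 'PY': 44},
    'stu1': {'MA': 31, 'PY': 97},
    'stu2': {'MA': 99, 'PY': 60},
}
counts = count_grades(sample)
print(counts)
```

Unlike the groupby() version, the bands come out in first-seen order rather than from A to E, so sort the keys before printing if the order matters.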

1.3.4 Output

grade_result = statis_stu_score_grade(stu_scores_info)
#print(grade_result)
for subject,grade_sta in grade_result.items():
    print(subject,':')
    for grade,num in grade_sta.items():
        print(f"{grade}->{num}")

MA :
A->1
C->3
D->2
E->4
PY :
A->2
B->2
C->2
D->3
E->1
EN :
A->1
B->1
D->2
E->6
PE :
A->1
B->1
C->2
D->2
E->4

2 Dictionary Example 2: A Recommendation Algorithm

2.1 Task breakdown

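● Generate 50 records of the books each student read and the ratings given (the sample run in 2.2.1 uses 20 students):

Books read: drawn at random from book1~book20
Number of books read: about 8 per student, chosen at random
Rating given after reading: 1~5, chosen at random

{'stu0':{'book1':3,'book5':2,'book6':5,'book7':2}}
{'stu1':{'book3':3,'book4':2,'book7':5,'book9':2,'book10':5}}

● Measure each book's popularity, that is, which students read it and how many readers it has:

book1 : {'stu17', 'stu6', 'stu11', 'stu8', 'stu0', 'stu13', 'stu10'}
{'book3':20,'book9':15,… }

● Generate one more test record as the subject of study and, based on the existing data, recommend books for it:

test_stu = {'book3':3,'book4':2,'book7':5,'book9':2,'book10':5}
recommend_result: recommend_book,book_rating
The recommended books and their recommendation scores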

2.2 Example code

2.2.1 Generating simulated data

from random import randint
STU_NUM = 20   # number of students
BOOK_NUM = 8   # approximate number of books per student
simulate_data = {"stu"+str(i):{"book"+str(randint(1,20)):randint(1,5) for j in range(randint(BOOK_NUM-1,BOOK_NUM+1))}
                 for i in range(STU_NUM)}
for stu,books_scores in simulate_data.items():
    print(stu,"->",books_scores)

stu0 -> {'book16': 1, 'book19': 3, 'book8': 5, 'book5': 1, 'book20': 5, 'book18': 4, 'book4': 5}
stu1 -> {'book18': 2, 'book19': 5, 'book5': 2, 'book3': 3, 'book4': 5, 'book7': 5, 'book14': 4}
stu2 -> {'book18': 1, 'book17': 3, 'book14': 1, 'book15': 5, 'book13': 4, 'book6': 1, 'book9': 5}
stu3 -> {'book10': 3, 'book11': 3, 'book16': 1, 'book5': 5, 'book15': 4, 'book20': 1, 'book1': 3}
stu4 -> {'book17': 3, 'book16': 3, 'book18': 1, 'book9': 1, 'book11': 3, 'book20': 4}
stu5 -> {'book12': 4, 'book3': 1, 'book10': 2, 'book9': 3, 'book14': 5, 'book17': 4, 'book15': 5}
stu6 -> {'book9': 4, 'book7': 4, 'book1': 4, 'book15': 3, 'book20': 2, 'book16': 5, 'book4': 5, 'book17': 2, 'book5': 2}
stu7 -> {'book4': 4, 'book20': 4, 'book6': 2, 'book18': 5, 'book14': 5, 'book19': 1, 'book1': 1, 'book5': 3}
stu8 -> {'book6': 2, 'book18': 5, 'book17': 1, 'book12': 2, 'book19': 2, 'book7': 2}
stu9 -> {'book14': 2, 'book6': 2, 'book10': 4, 'book1': 5, 'book18': 3, 'book20': 5}
stu10 -> {'book9': 5, 'book2': 2, 'book10': 4, 'book11': 4, 'book8': 2, 'book16': 3}
stu11 -> {'book1': 4, 'book15': 5, 'book11': 1, 'book14': 1, 'book10': 2, 'book9': 2, 'book12': 1}
stu12 -> {'book10': 1, 'book6': 5, 'book20': 2, 'book17': 2, 'book11': 3, 'book5': 3, 'book12': 5, 'book15': 5}
stu13 -> {'book11': 3, 'book5': 2, 'book17': 1, 'book8': 4, 'book3': 4, 'book18': 1}
stu14 -> {'book17': 1, 'book1': 5, 'book10': 4, 'book18': 4, 'book11': 5, 'book7': 1, 'book19': 4, 'book6': 5}
stu15 -> {'book8': 4, 'book13': 1, 'book18': 2, 'book12': 2, 'book9': 2, 'book16': 4, 'book14': 5}
stu16 -> {'book14': 5, 'book1': 5, 'book16': 3, 'book11': 2, 'book5': 2, 'book12': 5, 'book17': 4, 'book13': 3}
stu17 -> {'book12': 1, 'book16': 4, 'book13': 2, 'book7': 4, 'book20': 2, 'book11': 5, 'book17': 4, 'book6': 5, 'book5': 3}
stu18 -> {'book11': 2, 'book15': 4, 'book1': 3, 'book2': 1, 'book7': 3, 'book16': 5, 'book6': 4, 'book17': 2}
stu19 -> {'book18': 2, 'book5': 4, 'book12': 2, 'book3': 4, 'book2': 4, 'book15': 2, 'book17': 3}
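
Note that the comprehension above can draw the same title twice for one student; the later rating then overwrites the earlier one, which is why several readers in the output (stu4 and stu8, for example) list only 6 books. If the per-student count should stay within BOOK_NUM-1 to BOOK_NUM+1, one option is to draw distinct titles with random.sample. This is a sketch under that assumption; ALL_BOOKS, make_reader, and unique_data are names invented here.

```python
from random import randint, sample

STU_NUM = 20    # number of students
BOOK_NUM = 8    # approximate number of books per student
ALL_BOOKS = ["book" + str(i) for i in range(1, 21)]   # book1 .. book20

def make_reader(book_num=BOOK_NUM):
    """Return {title: rating} with book_num-1 .. book_num+1 distinct titles."""
    n = randint(book_num - 1, book_num + 1)
    # sample() never repeats an element, so no rating gets overwritten.
    return {book: randint(1, 5) for book in sample(ALL_BOOKS, n)}

unique_data = {"stu" + str(i): make_reader() for i in range(STU_NUM)}
for stu, books in unique_data.items():
    print(stu, "->", len(books), "books")
```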

2.2.2 Measuring book popularity

First find which students read each book: {'book3':{'stu0','stu3',…},'book9':{'stu1','stu2',…},… }
Then count the readers, giving a result of the form: {'book3':20,'book9':15,… }

book_readers = dict()  #{"book10":{'stu4', 'stu5', 'stu9', 'stu3', 'stu0'}}
for stu, books in simulate_data.items():
    #  stu1 -> {'book8': 5, 'book4': 4, 'book16': 1, 'book18': 5, 'book3': 2}
    for book in books.keys():
        # {'book8': 5, 'book4': 4, 'book16': 1, 'book18': 5, 'book3': 2}
        book_readers[book] = book_readers.get(book,set())   
        book_readers[book].add(stu)
        #{'book3':{'stu0','stu3',...},'book9':{'stu1','stu2',...},... }
        
for book,readers in book_readers.items():
    print(book,"->",readers) 

book16 -> {'stu4', 'stu6', 'stu17', 'stu18', 'stu16', 'stu0', 'stu15', 'stu3', 'stu10'}
book19 -> {'stu7', 'stu0', 'stu14', 'stu8', 'stu1'}
book8 -> {'stu0', 'stu13', 'stu15', 'stu10'}
book5 -> {'stu7', 'stu19', 'stu3', 'stu17', 'stu12', 'stu16', 'stu0', 'stu13', 'stu6', 'stu1'}
book20 -> {'stu4', 'stu7', 'stu6', 'stu17', 'stu12', 'stu0', 'stu9', 'stu3'}
book18 -> {'stu4', 'stu7', 'stu19', 'stu2', 'stu14', 'stu15', 'stu0', 'stu13', 'stu9', 'stu8', 'stu1'}
book4 -> {'stu0', 'stu7', 'stu6', 'stu1'}
book3 -> {'stu5', 'stu13', 'stu19', 'stu1'}
book7 -> {'stu6', 'stu17', 'stu18', 'stu14', 'stu8', 'stu1'}
book14 -> {'stu11', 'stu7', 'stu5', 'stu2', 'stu16', 'stu15', 'stu9', 'stu1'}
book17 -> {'stu6', 'stu19', 'stu5', 'stu2', 'stu17', 'stu14', 'stu12', 'stu18', 'stu16', 'stu4', 'stu13', 'stu8'}
book15 -> {'stu11', 'stu19', 'stu3', 'stu5', 'stu2', 'stu18', 'stu12', 'stu6'}
book13 -> {'stu16', 'stu2', 'stu17', 'stu15'}
book6 -> {'stu7', 'stu2', 'stu17', 'stu18', 'stu12', 'stu14', 'stu9', 'stu8'}
book9 -> {'stu11', 'stu5', 'stu2', 'stu4', 'stu15', 'stu6', 'stu10'}
book10 -> {'stu11', 'stu5', 'stu12', 'stu14', 'stu9', 'stu3', 'stu10'}
book11 -> {'stu11', 'stu17', 'stu14', 'stu12', 'stu18', 'stu16', 'stu4', 'stu13', 'stu3', 'stu10'}
book1 -> {'stu11', 'stu7', 'stu3', 'stu18', 'stu16', 'stu14', 'stu9', 'stu6'}
book12 -> {'stu11', 'stu19', 'stu5', 'stu17', 'stu12', 'stu16', 'stu15', 'stu8'}
book2 -> {'stu19', 'stu18', 'stu10'}

# Count how many readers each book has
book_read_count = {book:len(readers) for book,readers in book_readers.items()}  # {'book12': 1, 'book10': 5, ...}
print(book_read_count)  

{'book16': 9, 'book19': 5, 'book8': 4, 'book5': 10, 'book20': 8, 'book18': 11, 'book4': 4, 'book3': 4, 'book7': 6, 'book14': 8, 'book17': 12, 'book15': 8, 'book13': 4, 'book6': 8, 'book9': 7, 'book10': 7, 'book11': 10, 'book1': 8, 'book12': 8, 'book2': 3}

# Sort by read count in descending order
sorted_book_counter = sorted(book_read_count.items(),key=lambda item:item[1],reverse=True)
print("sorted_book_counter:\n",sorted_book_counter)
max_read_counter = max(book_read_count.items(),key=lambda item:item[1])
print("max_read_counter:",max_read_counter) # the most popular book

sorted_book_counter:
[('book17', 12), ('book18', 11), ('book5', 10), ('book11', 10), ('book16', 9), ('book20', 8), ('book14', 8), ('book15', 8), ('book6', 8), ('book1', 8), ('book12', 8), ('book9', 7), ('book10', 7), ('book7', 6), ('book19', 5), ('book8', 4), ('book4', 4), ('book3', 4), ('book13', 4), ('book2', 3)]
max_read_counter: ('book17', 12)
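
When the reader sets themselves are not needed, the counting and ranking steps can be combined. The sketch below uses collections.Counter on a small hand-made sample (sample_data and read_counter are names assumed for illustration) and is an alternative to the dict-based steps above, not a replacement for them.

```python
from collections import Counter

# Hand-made sample in the same shape as simulate_data.
sample_data = {
    'stu0': {'book1': 3, 'book5': 2},
    'stu1': {'book1': 4, 'book3': 5},
    'stu2': {'book1': 1, 'book3': 2, 'book5': 5},
}

# Each student contributes one count per book read.
read_counter = Counter(book for books in sample_data.values() for book in books)

print(read_counter.most_common())    # all books, most read first
print(read_counter.most_common(1))   # only the most popular book
```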

2.2.3 Recommendation algorithm

2.2.3.1 Generating a test sample
from  random import randint
test_stu = {"book"+str(randint(1,20)):randint(1,5) for j in range(randint(BOOK_NUM-1,BOOK_NUM+1))}
print(test_stu)

{'book5': 2, 'book11': 4, 'book10': 1, 'book18': 5, 'book17': 5, 'book6': 2, 'book16': 3}

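2.2.3.2 Finding the most similar reader

What does "most similar" mean?

(1) The two readers have the most books in common: intersect the sets of books read, take len(), and maximize it
(2) The two readers gave the closest ratings to those shared books: sum the squared differences of the corresponding ratings and minimize it

Note:

(1) is a maximum and (2) is a minimum, but both can be expressed as a minimum,
since max(len()) <=> min(-len()). Because the key is a tuple, condition (1) is compared first and condition (2) only breaks ties.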
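
To see how the two conditions combine, here is a small worked sketch with three made-up readers (stuA, stuB, and stuC, and the helper name similarity_key, are illustrative and not part of the data above). Each reader gets the key (-number of shared books, sum of squared rating differences), and min() picks the smallest tuple.

```python
test_stu = {'book1': 5, 'book2': 3, 'book3': 4}
readers = {
    'stuA': {'book1': 5, 'book2': 1},               # 2 shared, squared diffs 0 + 4
    'stuB': {'book1': 4, 'book2': 3, 'book9': 2},   # 2 shared, squared diffs 1 + 0
    'stuC': {'book3': 4},                           # 1 shared, squared diffs 0
}

def similarity_key(item):
    """Key for min(): more shared books first, then closer ratings."""
    _, books = item
    shared = test_stu.keys() & books.keys()
    ssd = sum((test_stu[b] - books[b]) ** 2 for b in shared)
    return (-len(shared), ssd)

keys = {name: similarity_key((name, books)) for name, books in readers.items()}
best_reader, best_books = min(readers.items(), key=similarity_key)

print(keys)
print(best_reader)
```

stuC matches its single shared rating perfectly but still loses, because the overlap count is compared before the rating distance.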

# Apply condition (1) first, then use condition (2) to break ties
similar_reader,similar_books = min(simulate_data.items(),
                                   key = lambda item:(
                                                      -len(test_stu.keys() & item[1].keys()),
                                                      sum([(test_stu[book]-item[1][book])**2  for book in test_stu.keys()&item[1].keys()])                          
                                                     )
                                   )

print(similar_reader,similar_books)

stu17 {'book12': 1, 'book16': 4, 'book13': 2, 'book7': 4, 'book20': 2, 'book11': 5, 'book17': 4, 'book6': 5, 'book5': 3}

# Compare the test sample with its most similar reader
print(f"Books and ratings of test_stu: {test_stu}")
print(f"Books and ratings of {similar_reader}: {similar_books}")

Books and ratings of test_stu: {'book5': 2, 'book11': 4, 'book10': 1, 'book18': 5, 'book17': 5, 'book6': 2, 'book16': 3}
Books and ratings of stu17: {'book12': 1, 'book16': 4, 'book13': 2, 'book7': 4, 'book20': 2, 'book11': 5, 'book17': 4, 'book6': 5, 'book5': 3}

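2.2.3.3 Making the recommendation

(1) Recommend only books not yet read: take the set difference of the two readers' books
(2) Order the recommendations by rating, highest first: sort by rating in descending order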

# Books that can be recommended
recommend_book_names = similar_books.keys()- test_stu.keys()
print(recommend_book_names)

{'book20', 'book7', 'book13', 'book12'}

# Recommendable books and their ratings
recomend_books = {book:similar_books[book] for book in similar_books.keys()-test_stu.keys()}
recomend_books

{'book20': 2, 'book7': 4, 'book13': 2, 'book12': 1}

# Recommendation list sorted by rating in descending order
recommend_result = sorted(recomend_books.items(),key=lambda item:item[1],reverse=True )
recommend_result

[('book7', 4), ('book20', 2), ('book13', 2), ('book12', 1)]

# Recommend only 2 books
recommend_result[:2]

[('book7', 4), ('book20', 2)]
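
The steps of 2.2.3 can be collected into one function. The sketch below is one way to package them, assuming the same similarity rule and ranking as above; the function name recommend, its top_n parameter, and the two-reader sample are illustrative choices, not part of the original code.

```python
def recommend(test_stu, data, top_n=2):
    """Return (most similar reader, up to top_n unread books ranked by rating)."""
    def key(item):
        shared = test_stu.keys() & item[1].keys()
        ssd = sum((test_stu[b] - item[1][b]) ** 2 for b in shared)
        return (-len(shared), ssd)   # most overlap first, then closest ratings

    reader, books = min(data.items(), key=key)
    # Only books the test student has not read yet are candidates.
    candidates = {b: books[b] for b in books.keys() - test_stu.keys()}
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    return reader, ranked[:top_n]

# Hand-made sample in the same shape as simulate_data.
sample_data = {
    'stuA': {'book1': 5, 'book2': 3, 'book7': 4, 'book8': 2, 'book9': 5},
    'stuB': {'book1': 1, 'book4': 5},
}
reader, picks = recommend({'book1': 5, 'book2': 3}, sample_data)
print(reader, picks)
```

Books with equal ratings keep whatever order the set difference yields, so ties between candidates may come out in a different order from run to run.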

3 Summary

Topics covered

  1. Dictionaries
  2. The dict methods keys(), values(), and items()
  3. Dict comprehensions
  4. Using itertools.groupby()
  5. A first look at recommendation algorithms