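1 Dictionary example 1: grade-band statistics
1.1 Task breakdown
● Build the class score data: randomly generate scores for 10 students, in the following format: {'stu0':{'Math':90,'Python':98,'En':90,'PE':60}}
● Count the class scores per subject by grade band: A: score>=90; B: 80<=score<90; C: 70<=score<80; D: 60<=score<70; E: fail (score<60)
Hints:
★ Use random.randint to generate random scores
★ Use itertools.groupby() to do the grouping
1.2 Understanding groupby()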
# Grading criteria
from itertools import groupby

def split_score(score):
    '''
    Map a score to its grade band
    '''
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    elif score >= 60:
        return 'D'
    else:
        return 'E'

score_s = [88, 52, 68, 45, 92, 71, 50, 77, 34, 43]
# groupby() only merges adjacent items with equal keys, so sort first
scores_grade = groupby(sorted(score_s), key=split_score)
print(scores_grade)
<itertools.groupby object at 0x000001ECC7E5DCA8>
for grade, group in scores_grade:
    print(grade, "->", list(group))  # tuple() works as well
E -> [34, 43, 45, 50, 52]
D -> [68]
C -> [71, 77]
B -> [88]
A -> [92]
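Why the call to sorted() matters: groupby() starts a new group every time the key changes, so it only merges neighbouring items. The sketch below reuses the same score list and thresholds to compare the unsorted and sorted cases; the names unsorted_groups and sorted_groups are introduced here purely for illustration.

```python
from itertools import groupby

def split_score(score):
    """Map a numeric score to a grade band (same thresholds as above)."""
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    elif score >= 60:
        return 'D'
    return 'E'

score_s = [88, 52, 68, 45, 92, 71, 50, 77, 34, 43]

# Unsorted input: equal keys that are not adjacent land in separate groups,
# so the same grade shows up several times.
unsorted_groups = [(grade, list(group))
                   for grade, group in groupby(score_s, key=split_score)]

# Sorted input: every grade appears exactly once.
sorted_groups = [(grade, list(group))
                 for grade, group in groupby(sorted(score_s), key=split_score)]

print(len(unsorted_groups), "groups without sorting")
print(len(sorted_groups), "groups after sorting")
```

Without sorting, a dict built from these groups would silently keep only the last fragment for each grade, which is why the counting code sorts first.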
1.3 Example code
1.3.1 Generate simulated data
from random import randint

def creat_simulate_data(n=10):
    '''
    Generate n simulated student records
    '''
    simulate_data = {'stu' + str(i): {'MA': randint(30, 100),
                                      'PY': randint(30, 100),
                                      'EN': randint(30, 100),
                                      'PE': randint(30, 100)
                                      }
                     for i in range(n)
                     }
    return simulate_data

NUM = 10  # number of students
stu_scores_info = creat_simulate_data(n=NUM)
for stu, scores in stu_scores_info.items():
    print(f"{stu}-->{scores}")
stu0-->{'MA': 74, 'PY': 44, 'EN': 40, 'PE': 77}
stu1-->{'MA': 31, 'PY': 97, 'EN': 33, 'PE': 97}
stu2-->{'MA': 99, 'PY': 60, 'EN': 62, 'PE': 38}
stu3-->{'MA': 48, 'PY': 89, 'EN': 85, 'PE': 54}
stu4-->{'MA': 48, 'PY': 92, 'EN': 61, 'PE': 89}
stu5-->{'MA': 51, 'PY': 62, 'EN': 55, 'PE': 75}
stu6-->{'MA': 61, 'PY': 76, 'EN': 36, 'PE': 48}
stu7-->{'MA': 79, 'PY': 78, 'EN': 90, 'PE': 60}
stu8-->{'MA': 73, 'PY': 82, 'EN': 30, 'PE': 58}
stu9-->{'MA': 65, 'PY': 63, 'EN': 44, 'PE': 60}
1.3.2 Grading criteria
# Grading criteria
def split_score(score):
    '''
    Map a score to its grade band
    '''
    if score >= 90:
        return 'A'
    elif score >= 80:
        return 'B'
    elif score >= 70:
        return 'C'
    elif score >= 60:
        return 'D'
    else:
        return 'E'
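The if/elif chain is easy to read, but the thresholds can also be kept in a table. A minimal sketch of that alternative, assuming the standard-library bisect module is acceptable here; BREAKPOINTS, GRADES and split_score_bisect are names made up for this illustration and are not part of the original code.

```python
from bisect import bisect_right

# Lower bounds of bands D, C, B, A; anything below 60 falls into E.
BREAKPOINTS = [60, 70, 80, 90]
GRADES = "EDCBA"

def split_score_bisect(score):
    """Return the grade band by locating score among the breakpoints."""
    # bisect_right counts how many breakpoints are <= score,
    # which is exactly the index of the band in GRADES.
    return GRADES[bisect_right(BREAKPOINTS, score)]

for s in (59, 60, 79, 80, 100):
    print(s, "->", split_score_bisect(s))
```

Changing a band then only means editing the two constants, at the cost of being slightly less obvious to a beginner than the explicit chain.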
1.3.3 Statistical analysis
from itertools import groupby

def statis_stu_score_grade(stu_scores_info: dict):
    '''
    1 Collect the students' scores per subject
    2 Convert each score to a grade band and count
    3 Result: {'MA':{'A':2,'B':3,..}}
    '''
    #=====S1: per-student score sheets ===> per-subject score lists=====
    subject_scores = dict()  # per-subject scores: {'MA':[55,89,90...],'PY':[90,88,60...]}
    for stu, stu_scores in stu_scores_info.items():
        # {stu0:{'MA': 55, 'PY': 93, 'EN': 88, 'PE': 75}}
        #print(stu,stu_scores)
        for subject, score in stu_scores.items():
            # {'MA': 55, 'PY': 93, 'EN': 88, 'PE': 75}
            #print(subject,score)
            subject_scores[subject] = subject_scores.get(subject, [])  # first {'MA':[]}, then append the score to that list
            subject_scores[subject].append(score)
    #{'MA':[55,89,90...],'PY':[90,88,60...]}
    #print("subject_scores:\n",subject_scores)
    #======S2: grade each subject's scores and count per band======
    statis_result = dict()  # per-band counts: {'MA':{'A':2,'B':3,..},'PY':{'A':2,'B':3,..}}
    for subject, score_s in subject_scores.items():
        #print(subject, score_s)
        statis_result[subject] = dict()
        # sort first: groupby() only merges adjacent items with equal keys
        for grade, group in groupby(sorted(score_s, reverse=True), split_score):
            statis_result[subject][grade] = len(list(group))
    return statis_result
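The task hint asks for groupby(), but the same counts can be produced in a single pass without sorting. The following is only a sketch of that alternative, assuming collections.Counter is allowed; count_grades_with_counter and the three-student sample are hypothetical and exist only for this example.

```python
from collections import Counter

def split_score(score):
    """Same thresholds as the grading function above, written as a table."""
    for threshold, grade in ((90, 'A'), (80, 'B'), (70, 'C'), (60, 'D')):
        if score >= threshold:
            return grade
    return 'E'

def count_grades_with_counter(stu_scores_info):
    """Return {subject: {grade: count}} using one Counter per subject."""
    counters = {}
    for stu_scores in stu_scores_info.values():
        for subject, score in stu_scores.items():
            # setdefault creates the Counter the first time a subject is seen
            counters.setdefault(subject, Counter())[split_score(score)] += 1
    # sort each subject's grades alphabetically for stable printing
    return {subject: dict(sorted(c.items())) for subject, c in counters.items()}

sample = {
    'stu0': {'MA': 95, 'PY': 61},
    'stu1': {'MA': 91, 'PY': 45},
    'stu2': {'MA': 72, 'PY': 67},
}
print(count_grades_with_counter(sample))
```

Because nothing has to be adjacent, this version skips the sort and the intermediate per-subject lists.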
1.3.4 Print the result
grade_result = statis_stu_score_grade(stu_scores_info)
#print(grade_result)
for subject, grade_sta in grade_result.items():
    print(subject, ':')
    for grade, num in grade_sta.items():
        print(f"{grade}->{num}")
MA :
A->1
C->3
D->2
E->4
PY :
A->2
B->2
C->2
D->3
E->1
EN :
A->1
B->1
D->2
E->6
PE :
A->1
B->1
C->2
D->2
E->4
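The output above leaves out bands that no student fell into (MA has no B row, for example). If a report should always list all five bands, the missing ones can be filled with zeros afterwards. A small sketch under that assumption; fill_missing_grades is a hypothetical helper, and sample_result copies the MA counts printed above.

```python
GRADES = "ABCDE"

def fill_missing_grades(grade_result):
    """Return a copy in which every subject lists all five bands."""
    return {subject: {g: counts.get(g, 0) for g in GRADES}
            for subject, counts in grade_result.items()}

sample_result = {'MA': {'A': 1, 'C': 3, 'D': 2, 'E': 4}}
filled = fill_missing_grades(sample_result)
print(filled)
```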
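2 Dictionary example 2: a recommendation algorithm
2.1 Task breakdown
● Generate the simulated records (the task statement asks for 50; the code below uses STU_NUM = 20): the books each student read and the ratings given
Books read by a student: drawn at random from book1~book20
Number of books per student: about 8, chosen at random
Rating given after reading: 1~5, chosen at random
{'stu0':{'book1':3,'book5':2,'book6':5,'book7':2}}
{'stu1':{'book3':3,'book4':2,'book7':5,'book9':2,'book10':5}}
● Measure each book's popularity, i.e. which students read a given book and how many readers it has:
book1 : {'stu17', 'stu6', 'stu11', 'stu8', 'stu0', 'stu13', 'stu10'}
{'book3':20,'book9':15,… }
● Generate one more test record as the subject and recommend books for that reader based on the existing data:
test_stu = {'book3':3,'book4':2,'book7':5,'book9':2,'book10':5}
recommend_result: recommend_book, book_rating
(the recommended titles and their recommendation scores)
2.2 Example code
2.2.1 Generate simulated data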
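from random import randint
STU_NUM = 20  # number of students
BOOK_NUM = 8  # books read per student (approximate)
# A title drawn twice collapses into one key, so a student can end up with fewer books than were drawn
simulate_data = {"stu" + str(i): {"book" + str(randint(1, 20)): randint(1, 5)
                                  for j in range(randint(BOOK_NUM - 1, BOOK_NUM + 1))}
                 for i in range(STU_NUM)}
for stu, books_scores in simulate_data.items():
    print(stu, "->", books_scores)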
stu0 -> {'book16': 1, 'book19': 3, 'book8': 5, 'book5': 1, 'book20': 5, 'book18': 4, 'book4': 5}
stu1 -> {'book18': 2, 'book19': 5, 'book5': 2, 'book3': 3, 'book4': 5, 'book7': 5, 'book14': 4}
stu2 -> {'book18': 1, 'book17': 3, 'book14': 1, 'book15': 5, 'book13': 4, 'book6': 1, 'book9': 5}
stu3 -> {'book10': 3, 'book11': 3, 'book16': 1, 'book5': 5, 'book15': 4, 'book20': 1, 'book1': 3}
stu4 -> {'book17': 3, 'book16': 3, 'book18': 1, 'book9': 1, 'book11': 3, 'book20': 4}
stu5 -> {'book12': 4, 'book3': 1, 'book10': 2, 'book9': 3, 'book14': 5, 'book17': 4, 'book15': 5}
stu6 -> {'book9': 4, 'book7': 4, 'book1': 4, 'book15': 3, 'book20': 2, 'book16': 5, 'book4': 5, 'book17': 2, 'book5': 2}
stu7 -> {'book4': 4, 'book20': 4, 'book6': 2, 'book18': 5, 'book14': 5, 'book19': 1, 'book1': 1, 'book5': 3}
stu8 -> {'book6': 2, 'book18': 5, 'book17': 1, 'book12': 2, 'book19': 2, 'book7': 2}
stu9 -> {'book14': 2, 'book6': 2, 'book10': 4, 'book1': 5, 'book18': 3, 'book20': 5}
stu10 -> {'book9': 5, 'book2': 2, 'book10': 4, 'book11': 4, 'book8': 2, 'book16': 3}
stu11 -> {'book1': 4, 'book15': 5, 'book11': 1, 'book14': 1, 'book10': 2, 'book9': 2, 'book12': 1}
stu12 -> {'book10': 1, 'book6': 5, 'book20': 2, 'book17': 2, 'book11': 3, 'book5': 3, 'book12': 5, 'book15': 5}
stu13 -> {'book11': 3, 'book5': 2, 'book17': 1, 'book8': 4, 'book3': 4, 'book18': 1}
stu14 -> {'book17': 1, 'book1': 5, 'book10': 4, 'book18': 4, 'book11': 5, 'book7': 1, 'book19': 4, 'book6': 5}
stu15 -> {'book8': 4, 'book13': 1, 'book18': 2, 'book12': 2, 'book9': 2, 'book16': 4, 'book14': 5}
stu16 -> {'book14': 5, 'book1': 5, 'book16': 3, 'book11': 2, 'book5': 2, 'book12': 5, 'book17': 4, 'book13': 3}
stu17 -> {'book12': 1, 'book16': 4, 'book13': 2, 'book7': 4, 'book20': 2, 'book11': 5, 'book17': 4, 'book6': 5, 'book5': 3}
stu18 -> {'book11': 2, 'book15': 4, 'book1': 3, 'book2': 1, 'book7': 3, 'book16': 5, 'book6': 4, 'book17': 2}
stu19 -> {'book18': 2, 'book5': 4, 'book12': 2, 'book3': 4, 'book2': 4, 'book15': 2, 'book17': 3}
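Some students in the output above have only 6 books (stu4 and stu8, for instance) even though 7 to 9 titles were drawn. A title drawn twice simply overwrites the earlier dict key. If exact counts matter, distinct titles can be drawn with random.sample. This is a sketch under that assumption; make_reader and simulate_data_fixed are names invented for the example, and the seed is fixed only so that repeated runs match.

```python
import random

def make_reader(book_num=8, total_books=20, rng=random):
    """Build one reader record with book_num-1 .. book_num+1 distinct titles."""
    count = rng.randint(book_num - 1, book_num + 1)
    # sample() draws without replacement, so no title can repeat
    titles = rng.sample(range(1, total_books + 1), count)
    return {f"book{n}": rng.randint(1, 5) for n in titles}

rng = random.Random(0)  # seeded generator, for reproducible demo data only
simulate_data_fixed = {f"stu{i}": make_reader(rng=rng) for i in range(20)}

for stu, books in simulate_data_fixed.items():
    print(stu, "->", len(books), "books")
```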
2.2.2 Measure each book's popularity
Which students read each book: {'book3':{'stu0','stu3',…},'book9':{'stu1','stu2',…},… }
Then count the readers, giving a result of the form {'book3':20,'book9':15,… }
book_readers = dict()  # {"book10":{'stu4', 'stu5', 'stu9', 'stu3', 'stu0'}}
for stu, books in simulate_data.items():
    # stu1 -> {'book8': 5, 'book4': 4, 'book16': 1, 'book18': 5, 'book3': 2}
    for book in books.keys():
        # {'book8': 5, 'book4': 4, 'book16': 1, 'book18': 5, 'book3': 2}
        book_readers[book] = book_readers.get(book, set())
        book_readers[book].add(stu)
#{'book3':{'stu0','stu3',...},'book9':{'stu1','stu2',...},... }
for book, readers in book_readers.items():
    print(book, "->", readers)
book16 -> {'stu4', 'stu6', 'stu17', 'stu18', 'stu16', 'stu0', 'stu15', 'stu3', 'stu10'}
book19 -> {'stu7', 'stu0', 'stu14', 'stu8', 'stu1'}
book8 -> {'stu0', 'stu13', 'stu15', 'stu10'}
book5 -> {'stu7', 'stu19', 'stu3', 'stu17', 'stu12', 'stu16', 'stu0', 'stu13', 'stu6', 'stu1'}
book20 -> {'stu4', 'stu7', 'stu6', 'stu17', 'stu12', 'stu0', 'stu9', 'stu3'}
book18 -> {'stu4', 'stu7', 'stu19', 'stu2', 'stu14', 'stu15', 'stu0', 'stu13', 'stu9', 'stu8', 'stu1'}
book4 -> {'stu0', 'stu7', 'stu6', 'stu1'}
book3 -> {'stu5', 'stu13', 'stu19', 'stu1'}
book7 -> {'stu6', 'stu17', 'stu18', 'stu14', 'stu8', 'stu1'}
book14 -> {'stu11', 'stu7', 'stu5', 'stu2', 'stu16', 'stu15', 'stu9', 'stu1'}
book17 -> {'stu6', 'stu19', 'stu5', 'stu2', 'stu17', 'stu14', 'stu12', 'stu18', 'stu16', 'stu4', 'stu13', 'stu8'}
book15 -> {'stu11', 'stu19', 'stu3', 'stu5', 'stu2', 'stu18', 'stu12', 'stu6'}
book13 -> {'stu16', 'stu2', 'stu17', 'stu15'}
book6 -> {'stu7', 'stu2', 'stu17', 'stu18', 'stu12', 'stu14', 'stu9', 'stu8'}
book9 -> {'stu11', 'stu5', 'stu2', 'stu4', 'stu15', 'stu6', 'stu10'}
book10 -> {'stu11', 'stu5', 'stu12', 'stu14', 'stu9', 'stu3', 'stu10'}
book11 -> {'stu11', 'stu17', 'stu14', 'stu12', 'stu18', 'stu16', 'stu4', 'stu13', 'stu3', 'stu10'}
book1 -> {'stu11', 'stu7', 'stu3', 'stu18', 'stu16', 'stu14', 'stu9', 'stu6'}
book12 -> {'stu11', 'stu19', 'stu5', 'stu17', 'stu12', 'stu16', 'stu15', 'stu8'}
book2 -> {'stu19', 'stu18', 'stu10'}
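The get(book, set()) idiom creates an empty set the first time a title is seen. A roughly equivalent way to write the inversion, assuming collections.defaultdict is acceptable, is sketched below; sample_data is a tiny hypothetical dataset used so the result can be checked by eye.

```python
from collections import defaultdict

sample_data = {
    'stu0': {'book1': 3, 'book2': 5},
    'stu1': {'book2': 4, 'book3': 1},
    'stu2': {'book2': 2, 'book1': 5},
}

# defaultdict(set) builds the empty set automatically on first access
book_readers = defaultdict(set)
for stu, books in sample_data.items():
    for book in books:  # iterating a dict yields its keys
        book_readers[book].add(stu)

for book, readers in book_readers.items():
    print(book, "->", sorted(readers))
```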
# Count how many readers each book has
book_read_count = {book: len(readers) for book, readers in book_readers.items()}  # {'book12': 1, 'book10': 5, ...}
print(book_read_count)
{'book16': 9, 'book19': 5, 'book8': 4, 'book5': 10, 'book20': 8, 'book18': 11, 'book4': 4, 'book3': 4, 'book7': 6, 'book14': 8, 'book17': 12, 'book15': 8, 'book13': 4, 'book6': 8, 'book9': 7, 'book10': 7, 'book11': 10, 'book1': 8, 'book12': 8, 'book2': 3}
# Sort by reader count, descending
sorted_book_counter = sorted(book_read_count.items(), key=lambda item: item[1], reverse=True)
print("sorted_book_counter:\n", sorted_book_counter)
max_read_counter = max(book_read_count.items(), key=lambda item: item[1])
print("max_read_counter:", max_read_counter)  # the most popular book
sorted_book_counter:
[('book17', 12), ('book18', 11), ('book5', 10), ('book11', 10), ('book16', 9), ('book20', 8), ('book14', 8), ('book15', 8), ('book6', 8), ('book1', 8), ('book12', 8), ('book9', 7), ('book10', 7), ('book7', 6), ('book19', 5), ('book8', 4), ('book4', 4), ('book3', 4), ('book13', 4), ('book2', 3)]
max_read_counter: ('book17', 12)
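When only the reader counts are needed (not the sets of readers), collections.Counter can tally and rank in one step. A minimal sketch, assuming each student lists a title at most once, which holds here because titles are dict keys; sample_data is hypothetical.

```python
from collections import Counter

sample_data = {
    'stu0': {'book1': 3, 'book2': 5},
    'stu1': {'book2': 4, 'book3': 1},
    'stu2': {'book2': 2, 'book1': 5},
}

# One tally per (student, title) pair
book_counter = Counter(book for books in sample_data.values() for book in books)

# most_common(n) returns the n highest counts, already sorted descending
top = book_counter.most_common(2)
print(top)
```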
2.2.3 Recommendation algorithm
2.2.3.1 Generate a test sample
from random import randint
test_stu = {"book" + str(randint(1, 20)): randint(1, 5) for j in range(randint(BOOK_NUM - 1, BOOK_NUM + 1))}
print(test_stu)
{'book5': 2,
'book11': 4,
'book10': 1,
'book18': 5,
'book17': 5,
'book6': 2,
'book16': 3}
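2.2.3.2 Find the most similar reader
What does "most similar" mean here?
(1) The two readers share the most books: intersect the two sets of titles, take len(), and maximize it
(2) The two readers gave the closest ratings to the shared books: sum the squared rating differences and minimize it
Note:
Criterion (1) maximizes and criterion (2) minimizes; both can be expressed as a minimization,
because max(len()) <=> min(-len()). Comparing the tuple (-len, squared difference) then ranks by (1) first and breaks ties with (2).
# Apply criteria (1) and (2) together
similar_reader, similar_books = min(simulate_data.items(),
                                    key=lambda item: (
                                        -len(test_stu.keys() & item[1].keys()),
                                        sum([(test_stu[book] - item[1][book])**2 for book in test_stu.keys() & item[1].keys()])
                                    )
                                    )
print(similar_reader, similar_books)
stu17 {'book12': 1, 'book16': 4, 'book13': 2, 'book7': 4, 'book20': 2, 'book11': 5, 'book17': 4, 'book6': 5, 'book5': 3}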
# Compare the test sample with its most similar reader
print(f"test_stu read and rated: {test_stu}")
print(f"{similar_reader} read and rated: {similar_books}")
test_stu read and rated: {'book5': 2, 'book11': 4, 'book10': 1, 'book18': 5, 'book17': 5, 'book6': 2, 'book16': 3}
stu17 read and rated: {'book12': 1, 'book16': 4, 'book13': 2, 'book7': 4, 'book20': 2, 'book11': 5, 'book17': 4, 'book6': 5, 'book5': 3}
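The tuple key is easier to trust after checking it by hand on a few readers. In the sketch below, test_reader, readers and similarity_key are hypothetical names; the data is chosen so that two readers tie on the number of shared books and the squared rating difference decides between them.

```python
test_reader = {'book1': 5, 'book2': 3, 'book3': 4}
readers = {
    'stuA': {'book1': 5, 'book4': 2},              # 1 shared book, distance 0
    'stuB': {'book1': 1, 'book2': 3, 'book5': 4},  # 2 shared books, distance 16
    'stuC': {'book1': 4, 'book2': 2, 'book6': 1},  # 2 shared books, distance 2
}

def similarity_key(item):
    """Return (-shared_count, squared_rating_distance) for one reader."""
    _, books = item
    shared = test_reader.keys() & books.keys()
    distance = sum((test_reader[b] - books[b]) ** 2 for b in shared)
    return (-len(shared), distance)

for item in readers.items():
    print(item[0], similarity_key(item))

# Tuples compare element by element: more shared books wins first,
# then the smaller distance breaks the tie.
best_name, best_books = min(readers.items(), key=similarity_key)
print("most similar:", best_name)
```

stuA matches its single shared rating perfectly but still loses, because the first tuple element gives priority to the overlap size.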
2.2.3.3 Make the recommendation
(1) Recommend only books the test reader has not read yet: take the set difference of the two readers' titles
(2) Order the recommendations by rating, highest first: sort by rating in descending order
# Titles that can be recommended
recommend_book_names = similar_books.keys() - test_stu.keys()
print(recommend_book_names)
{'book20', 'book7', 'book13', 'book12'}
# Recommendable books with the similar reader's ratings
recomend_books = {book: similar_books[book] for book in similar_books.keys() - test_stu.keys()}
recomend_books
{'book20': 2, 'book7': 4, 'book13': 2, 'book12': 1}
# Recommendation list sorted by rating, descending
recommend_result = sorted(recomend_books.items(), key=lambda item: item[1], reverse=True)
recommend_result
[('book7', 4), ('book20', 2), ('book13', 2), ('book12', 1)]
# Recommend only 2 books
recommend_result[:2]
[('book7', 4), ('book20', 2)]
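The steps of this section can be gathered into one function. What follows is a sketch rather than a definitive implementation: recommend, its top_n parameter and the toy data are all assumptions made for illustration. It iterates over the similar reader's dict instead of a set difference, so ties keep that reader's insertion order.

```python
def recommend(target, readers, top_n=2):
    """Recommend up to top_n unread books from the most similar reader."""
    def key(item):
        shared = target.keys() & item[1].keys()
        # more shared books first, then smaller squared rating distance
        return (-len(shared), sum((target[b] - item[1][b]) ** 2 for b in shared))

    _, similar_books = min(readers.items(), key=key)
    # keep only the titles the target reader has not rated yet
    unread = {b: r for b, r in similar_books.items() if b not in target}
    return sorted(unread.items(), key=lambda item: item[1], reverse=True)[:top_n]

target = {'book1': 5, 'book2': 3, 'book3': 4}
readers = {
    'stuA': {'book1': 5, 'book4': 2},
    'stuB': {'book1': 1, 'book2': 3, 'book5': 4},
    'stuC': {'book1': 4, 'book2': 2, 'book6': 1, 'book7': 5, 'book8': 3},
}
result = recommend(target, readers)
print(result)
```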
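3 Summary
Topics covered
- Dictionaries
- The dictionary methods keys(), values(), items()
- Dictionary comprehensions
- Using itertools.groupby()
- A first look at recommendation algorithms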