Lab 4: Working with Lists and Dictionaries

向阳733

Last modified 2025-04-25 10:50:36

Licensed under CC 4.0 BY-SA

Tags: vscode, programming languages, big data, python

First published 2025-04-25 10:03:52

Article link: https://blog.csdn.net/2402_82403595/article/details/147499020

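Objective: become proficient with composite data types.
Lab tasks:

1. Basic: birthday paradox analysis. In a room of 23 or more people, the probability that at least two of them share a birthday exceeds 50%. Write a program that prints, for several numbers of random trials, the estimated probability that at least two of 23 people share a birthday.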
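
The 50% figure can be checked in closed form before simulating. Assuming 365 equally likely birthdays and ignoring leap years (the same assumptions the simulation makes), the probability that all n birthdays differ is the product of (365 - k)/365 for k = 0 to n - 1, and the probability of a shared birthday is one minus that. The function name `exact_shared_birthday_probability` is a hypothetical helper used only for this sketch.

```python
def exact_shared_birthday_probability(num_people, days=365):
    """Exact probability that at least two of num_people share a birthday.

    Assumes `days` equally likely birthdays (hypothetical helper, not part
    of the lab code).
    """
    if num_people > days:
        return 1.0  # pigeonhole: more people than days forces a repeat
    p_all_distinct = 1.0
    for k in range(num_people):
        # The (k+1)-th person must avoid the k birthdays already taken
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

for n in (10, 22, 23, 30):
    print(f"{n} people: {exact_shared_birthday_probability(n):.4f}")
```

For 23 people this gives about 0.507 and for 22 people about 0.476, so 23 is the smallest group size that crosses 50%. The simulated values should settle near 0.507 as the number of trials grows.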

Code:

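import random
import matplotlib.pyplot as plt

def birthday_paradox(num_people, num_simulations):
    """
    Birthday paradox simulation.
    Args:
        num_people: number of people in the room
        num_simulations: number of simulated rooms (trials)
    Returns:
        Estimated probability that at least two people share a birthday
    """
    count = 0
    for _ in range(num_simulations):
        # Draw one birthday per person, uniform over 365 days (leap years ignored)
        birthdays = [random.randint(1, 365) for _ in range(num_people)]
        # A set drops duplicates, so a shorter set means a shared birthday
        if len(birthdays) != len(set(birthdays)):
            count += 1
    return count / num_simulations

# Estimate the probability for different numbers of trials
sample_sizes = [100, 1000, 10000, 100000]
probabilities = []
for size in sample_sizes:
    prob = birthday_paradox(23, size)
    probabilities.append(prob)
    print(f"Trials: {size}, probability: {prob:.4f}")

# Plot the results
plt.plot(sample_sizes, probabilities, 'bo-')
plt.xscale('log')
plt.xlabel('Number of trials (log scale)')
plt.ylabel('Probability')
plt.title('Birthday paradox simulation (23 people)')
plt.grid(True)
plt.show()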
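
The estimates printed for small trial counts can differ noticeably from run to run. One rough way to see why, sketched here under the assumption that the trials are independent, is the binomial standard error sqrt(p(1 - p)/n) of a proportion estimated from n trials. The helper name `standard_error` and the value 0.507 (the approximate exact probability for 23 people) are used for illustration only.

```python
import math

def standard_error(p, num_trials):
    """Standard error of a proportion estimated from independent trials."""
    return math.sqrt(p * (1 - p) / num_trials)

p = 0.507  # approximate exact probability for 23 people (illustrative value)
for n in [100, 1000, 10000, 100000]:
    se = standard_error(p, n)
    print(f"Trials: {n}, standard error: {se:.4f}")
```

With 100 trials the standard error is about 0.05, so an estimate of 0.45 or 0.56 is unsurprising. Each 100-fold increase in trials shrinks it by a factor of 10.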

Screenshot of the run:

2. Advanced: find the 10 most frequent words in the text of 《一句顶一万句》 and generate a word cloud.

Code:

from wordcloud import WordCloud
import jieba
from collections import Counter
import matplotlib.pyplot as plt

# Assumes the text has already been saved to a UTF-8 file
def analyze_text(text_path, top_n=10):
    with open(text_path, 'r', encoding='utf-8') as f:
        text = f.read()

    # Segment the text into words with jieba
    words = jieba.lcut(text)

    # Drop stop words and single-character tokens (including most punctuation)
    stopwords = {'的', '了', '和', '是', '在', '我', '有', '他', '她', '它'}
    words = [word for word in words if len(word) > 1 and word not in stopwords]

    # Count word frequencies
    word_counts = Counter(words)
    top_words = word_counts.most_common(top_n)

    # Build the word cloud from the same counts; font_path must point to a
    # font with Chinese glyphs (simhei.ttf is assumed to be available)
    wordcloud = WordCloud(font_path='simhei.ttf',
                          background_color='white',
                          width=800, height=600).generate_from_frequencies(word_counts)

    # Show the result
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis('off')
    plt.show()

    return top_words

# Usage example
# top_words = analyze_text('一句顶一万句.txt')
# print("Top 10 words:", top_words)
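
Because this lab is about lists and dictionaries, it may help to see the counting step written out with a plain dict. The sketch below does roughly what `Counter(words).most_common(top_n)` does for an already tokenized list. The function name `top_n_words` and the sample tokens are made up for illustration, and ties are broken alphabetically here, which is a choice of this sketch rather than Counter's behavior.

```python
def top_n_words(tokens, top_n=10, stopwords=()):
    """Return the top_n (word, count) pairs from a list of tokens."""
    counts = {}
    for token in tokens:
        # Same filter as the lab code: skip single characters and stop words
        if len(token) > 1 and token not in stopwords:
            counts[token] = counts.get(token, 0) + 1
    # Sort by count descending, then by word to make ties deterministic
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]

# Made-up tokens standing in for jieba output
sample_tokens = ["说话", "的", "说话", "朋友", "一句", "朋友", "说话", "了"]
print(top_n_words(sample_tokens, top_n=2))  # [('说话', 3), ('朋友', 2)]
```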

Screenshot of the run:

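3. Extension: writing-style analysis of wuxia novels by Jin Yong (金庸), Gu Long (古龙), and others. Output the 10 most frequent words for at least 3 works by Jin Yong (or Gu Long), identify what they have in common, and summarize the author's style.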

Code:

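import jieba
from collections import Counter
import matplotlib.pyplot as plt

# Let matplotlib render Chinese tick labels (assumes the SimHei font is installed)
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

def analyze_author_style(book_paths, author_name):
    """
    Analyze an author's style from word frequencies.
    Args:
        book_paths: list of paths to several books by the author
        author_name: the author's name
    Returns:
        The 10 most frequent words across all books, as (word, count) pairs
    """
    # Defined once, outside the loop, instead of once per book
    stopwords = {'的', '了', '和', '是', '在', '我', '有', '他', '她', '它'}
    all_words = []

    for path in book_paths:
        with open(path, 'r', encoding='utf-8') as f:
            text = f.read()
        words = jieba.lcut(text)
        # Drop stop words and single-character tokens
        words = [word for word in words if len(word) > 1 and word not in stopwords]
        all_words.extend(words)

    # Count word frequencies over all books combined
    word_counts = Counter(all_words)
    top_words = word_counts.most_common(10)

    # Bar chart of the top words
    words, counts = zip(*top_words)
    plt.figure(figsize=(10, 6))
    plt.bar(words, counts)
    plt.title(f'High-frequency words in works by {author_name}')
    plt.xticks(rotation=45)
    plt.show()

    return top_words

# Usage example
# Analysis of Jin Yong's works
# jin_yong_books = ['book1.txt', 'book2.txt', 'book3.txt']
# jin_yong_top_words = analyze_author_style(jin_yong_books, '金庸')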
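
The task asks for the top 10 words of each work and for what they have in common, whereas `analyze_author_style` pools all books into one count. A possible complement, sketched under the assumption that each book has already been tokenized and filtered into a list of words, is to count each book separately and intersect the top-word sets. The names `per_book_top_words` and `shared_top_words`, and the toy token lists, are hypothetical.

```python
from collections import Counter

def per_book_top_words(book_tokens, top_n=10):
    """Map each book title to its top_n (word, count) pairs.

    book_tokens: dict of title -> list of already filtered tokens.
    """
    return {title: Counter(tokens).most_common(top_n)
            for title, tokens in book_tokens.items()}

def shared_top_words(top_by_book):
    """Words that appear in the top list of every book."""
    word_sets = [{word for word, _ in top} for top in top_by_book.values()]
    return set.intersection(*word_sets) if word_sets else set()

# Toy data standing in for three tokenized novels (not real counts)
books = {
    "book1": ["武功", "师父", "武功", "江湖", "师父", "武功"],
    "book2": ["江湖", "武功", "江湖", "剑法", "武功"],
    "book3": ["武功", "内力", "武功", "江湖", "江湖"],
}
top_by_book = per_book_top_words(books, top_n=2)
print(shared_top_words(top_by_book))  # {'武功'}
```

Words shared by every book may hint at the author's recurring vocabulary, while words that rank highly in only one book are more likely to be character names or plot-specific terms.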