Exam Score Analysis: The Effect of Parental Education and Test Preparation Courses - CSDN Blog

Article link: https://blog.csdn.net/m0_66235114/article/details/129885641

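This post analyzes how the test preparation course affects students' exam scores. It finds that scores are strongly related to parental education level. Children of parents with a master's degree score highest, and students who took the test preparation course have a noticeably higher average reading score. Scores in the three subjects are strongly positively correlated, with a correlation coefficient of 0.95 between reading and writing. The test preparation course raises scores markedly for students from families with lower education levels.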


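Exam Score Analysis

📖 Background

Your best friend is an administrator at a large school. The school requires every student to take year-end exams in math, reading, and writing.

Since you have recently learned data manipulation and visualization, you offer to help your friend analyze the results. The school's principal wants to know whether the test preparation course is helpful. She also wants to explore how parental education level affects test scores.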

💾 The data

The file has the following fields:

“gender” - male / female
“race/ethnicity” - one of 5 combinations of race/ethnicity
“parent_education_level” - highest education level of either parent
“lunch” - whether the student receives free/reduced or standard lunch
“test_prep_course” - whether the student took the test preparation course
“math” - exam score in math
“reading” - exam score in reading
“writing” - exam score in writing

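💪 Challenge

Create a report that answers the principal's questions. Include:

What are the average reading scores for students with and without the test preparation course?
What are the average scores for each parental education level?
Compare the average scores of students with and without the test preparation course across the different parental education levels.
The principal wants to know whether children who excel in one subject also do well in the others. Look at the correlations between the scores.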
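For the last question, here is a minimal sketch of the correlation step. `DataFrame.corr()` returns the pairwise Pearson coefficients between numeric columns. The five rows are made-up toy scores, not the exam data. With the real file you would call the same method on `df[['math', 'reading', 'writing']]`.

```python
import pandas as pd

# Hypothetical toy scores (NOT taken from exams.csv), only to illustrate the API
scores = pd.DataFrame({
    'math':    [50, 60, 70, 80, 90],
    'reading': [55, 62, 71, 79, 93],
    'writing': [54, 63, 70, 80, 92],
})

# Pairwise Pearson correlation between every pair of subjects
corr = scores.corr()
print(corr.round(2))
```

The result is a symmetric 3 x 3 matrix with 1.0 on the diagonal. Values close to 1 off the diagonal mean that students who score high in one subject tend to score high in the other.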

Exploratory Data Analysis

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

df = pd.read_csv('exams.csv')

df.head()

	gender	race/ethnicity	parent_education_level	lunch	test_prep_course	math	reading	writing
0	female	group B	bachelor's degree	standard	none	72	72	74
1	female	group C	some college	standard	completed	69	90	88
2	female	group B	master's degree	standard	none	90	95	93
3	male	group A	associate's degree	free/reduced	none	47	57	44
4	male	group C	some college	standard	none	76	78	75

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   gender                  1000 non-null   object
 1   race/ethnicity          1000 non-null   object
 2   parent_education_level  1000 non-null   object
 3   lunch                   1000 non-null   object
 4   test_prep_course        1000 non-null   object
 5   math                    1000 non-null   int64 
 6   reading                 1000 non-null   int64 
 7   writing                 1000 non-null   int64 
dtypes: int64(3), object(5)
memory usage: 62.6+ KB
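`df.info()` shows that the five descriptive fields are stored as plain `object` columns. One option is to declare `parent_education_level` as an ordered categorical. Groupbys and plots then follow education order instead of alphabetical order. The sketch below uses made-up rows and an assumed lowest-to-highest ordering of the six levels that appear in the data.

```python
import pandas as pd

# Assumed ordering of the six parent_education_level values, lowest to highest
EDU_ORDER = [
    'some high school', 'high school', 'some college',
    "associate's degree", "bachelor's degree", "master's degree",
]

# Hypothetical toy rows, NOT taken from exams.csv
toy = pd.DataFrame({
    'parent_education_level': ["master's degree", 'high school',
                               'some college', 'high school'],
    'math': [90, 60, 70, 64],
})

# Convert the object column to an ordered categorical
toy['parent_education_level'] = pd.Categorical(
    toy['parent_education_level'], categories=EDU_ORDER, ordered=True)

# Groups now come out in education order; observed=True drops levels with no rows
means = toy.groupby('parent_education_level', observed=True)['math'].mean()
print(means)
```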

Q1: What are the average reading scores for students with and without the test preparation course?

df.groupby('test_prep_course')['reading'].mean()

test_prep_course
completed    73.893855
none         66.534268
Name: reading, dtype: float64
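The same `groupby` can report all three subjects at once, together with the number of students in each group. The group sizes help judge how much weight each mean deserves. Below is a minimal sketch on made-up rows with the same column names. On the real data, replace `toy` with `df`.

```python
import pandas as pd

# Hypothetical toy rows (NOT from exams.csv) using the dataset's column names
toy = pd.DataFrame({
    'test_prep_course': ['completed', 'none', 'none', 'completed', 'none'],
    'math':    [70, 60, 65, 80, 55],
    'reading': [75, 62, 68, 85, 60],
    'writing': [74, 60, 66, 86, 58],
})

# Mean of every subject per prep-course group
summary = toy.groupby('test_prep_course')[['math', 'reading', 'writing']].mean()
# Add the number of students in each group (aligned on the group index)
summary['n_students'] = toy.groupby('test_prep_course').size()
print(summary)
```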

df.groupby('test_prep_course')['reading'].mean().plot(kind='bar')
plt.ylabel('score')
plt.title('the average reading scores for students with/without the test preparation course');

[Figure: bar chart of the average reading scores for students with/without the test preparation course]

with_test_prep = df[df['test_prep_course'] == 'completed']
without_test_prep = df[df['test_prep_course'] == 'none']

fig, ax = plt.subplots(1, 3, figsize=(15,6), sharey=True)
cols = ['math', 'reading', 'writing']
for i, col in enumerate(cols):
    # Solid line: students who completed the course; dashed line: students who did not
    sns.kdeplot(with_test_prep[col], ax=ax[i], label=f'{col} with test prep')
    sns.kdeplot(without_test_prep[col], linestyle='--', ax=ax[i], label=f'{col} without test prep')
    ax[i].legend()

plt.suptitle('KDE Plots of Exam Scores With and Without Preparation', fontsize = 20, fontweight = 'bold');

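[Figure: KDE Plots of Exam Scores With and Without Preparation, one panel each for math, reading and writing]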

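Students who completed the test preparation course scored higher in all three subjects.
The average reading score is about 73.9 for students who completed the course, versus about 66.5 for those who did not.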
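To express the Q1 result as a single number, the snippet below takes the two reading means printed by the `groupby` and computes the gap in points and in relative terms.

```python
# Average reading scores reported in the Q1 groupby output
mean_completed = 73.893855
mean_none = 66.534268

gap = mean_completed - mean_none   # absolute difference in points
relative = gap / mean_none         # difference relative to the 'none' group

print(f'gap = {gap:.2f} points ({relative:.1%} higher)')
# prints: gap = 7.36 points (11.1% higher)
```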

Q2: What are the average scores for each parental education level?

df.groupby('parent_education_level')[['math', 'reading', 'writing']].mean().style.background_gradient(cmap='RdYlGn_r')

	math	reading	writing
parent_education_level
associate's degree	67.882883	70.927928	69.896396
bachelor's degree	69.389831	73.000000	73.381356
high school	62.137755	64.704082	62.448980
master's degree	69.745763	75.372881	75.677966
some college	67.128319	69.460177	68.840708
some high school	63.497207	66.938547	64.888268

# x = df.groupby('parent_education_level')[['math', 'reading', 'writing']].mean()
# x.style.apply(lambda m: ["background: red" if i == m.argmax() else '' for i,_ in enumerate(m)])
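The commented-out lines above try to highlight each column's maximum. If the goal is only to find which education level has the highest mean in each subject, `DataFrame.idxmax()` answers that directly. For styling, pandas also provides `Styler.highlight_max`. The values below are copied from the Q2 table.

```python
import pandas as pd

# Group means copied from the Q2 table
avg = pd.DataFrame(
    {
        'math':    [67.882883, 69.389831, 62.137755, 69.745763, 67.128319, 63.497207],
        'reading': [70.927928, 73.000000, 64.704082, 75.372881, 69.460177, 66.938547],
        'writing': [69.896396, 73.381356, 62.448980, 75.677966, 68.840708, 64.888268],
    },
    index=["associate's degree", "bachelor's degree", 'high school',
           "master's degree", 'some college', 'some high school'],
)

# Row label (education level) holding the maximum of each column
best = avg.idxmax()
print(best)
```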

df.groupby('parent_education_level')[['math', 'reading', 'writing']].mean().plot(kind='bar')
plt.legend(loc='upper left', bbox_to_anchor=(1, 1));

[Figure: bar chart of average math, reading and writing scores by parent_education_level]

# Average score per subject for each parental education level
avg_scores = df.groupby('parent_education_level')[['math', 'reading', 'writing']].mean()

# Long format: one row per (education level, subject), so seaborn draws one line
# per subject and builds the legend itself
long_scores = (avg_scores.reset_index()
               .melt(id_vars='parent_education_level', var_name='subject', value_name='score'))

# Show education levels from lowest to highest instead of alphabetically
edu_order = ['some high school', 'high school', 'some college',
             "associate's degree", "bachelor's degree", "master's degree"]

fig, ax = plt.subplots()
sns.pointplot(data=long_scores, x='parent_education_level', y='score',
              hue='subject', order=edu_order, ax=ax)
plt.xticks(rotation=45);

[Figure: point plot of average math, reading and writing scores by parent_education_level]

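Overall, the higher the parents' education level, the higher the children's scores. The one exception is high school: children in this group score slightly below the some high school group in all three subjects.
Children of parents with a master's degree have the highest average score in every subject.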
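This ranking can also be checked numerically. Average the three subject means for each education level, then sort. High school comes out lowest, just below some high school, and the remaining levels follow the expected order up to master's degree. The values are copied from the Q2 table.

```python
import pandas as pd

# Group means copied from the Q2 table
avg = pd.DataFrame(
    {
        'math':    [67.882883, 69.389831, 62.137755, 69.745763, 67.128319, 63.497207],
        'reading': [70.927928, 73.000000, 64.704082, 75.372881, 69.460177, 66.938547],
        'writing': [69.896396, 73.381356, 62.448980, 75.677966, 68.840708, 64.888268],
    },
    index=["associate's degree", "bachelor's degree", 'high school',
           "master's degree", 'some college', 'some high school'],
)

# Mean over the three subjects per education level, sorted lowest to highest
overall = avg.mean(axis=1).sort_values()
print(overall.round(2))
```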

Q3: Compare the average scores of students with and without the test preparation course across parental education levels.
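The boxplots that follow show medians and spread. The question itself asks for averages, which `pivot_table` can tabulate for every combination of education level and prep course. Below is a minimal sketch on made-up rows for the reading score. With the real data, pass `df` and repeat for `math` and `writing`.

```python
import pandas as pd

# Hypothetical toy rows (NOT from exams.csv) using the dataset's column names
toy = pd.DataFrame({
    'parent_education_level': ['high school', 'high school', 'high school',
                               "master's degree", "master's degree"],
    'test_prep_course': ['completed', 'none', 'none', 'completed', 'none'],
    'reading': [70, 58, 62, 84, 76],
})

# Mean reading score for each education level x prep-course combination
table = toy.pivot_table(index='parent_education_level',
                        columns='test_prep_course',
                        values='reading', aggfunc='mean')

# Points by which the 'completed' group exceeds the 'none' group
table['gap'] = table['completed'] - table['none']
print(table)
```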

cols = ['math', 'reading', 'writing']
fig, axes = plt.subplots(3, 1, figsize=(10, 6), sharex=True, gridspec_kw={'hspace': 0.5})
for i, col in enumerate(cols):
    sns.boxplot(x='parent_education_level', y=col, hue='test_prep_course', data=df, ax=axes[i])
    axes[i].set_title(col.capitalize() + ' Scores')  
    #axes[i].set_xlabel('Parent Education Level')  
    axes[i].set_ylabel(col.capitalize() + ' Score') 
    axes[i].set_xticklabels(axes[i].get_xticklabels(), rotation