Coursera Course Attributes and Review Sentiment Analysis

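This article analyzes course data from the Coursera platform, including the distribution of course ratings, difficulty levels and student enrollment. The average course rating is high, and intermediate courses may score slightly lower, possibly because their learners have stricter expectations. It also examines the relationships among course difficulty, rating and certificate type, and ranks course providers by rating.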


Introduction

Coursera

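Coursera is one of the world's largest online education platforms, with more than 40 million registered users. It was founded in California in 2012 by two Stanford University computer science professors, Daphne Koller and Andrew Ng, with the vision of providing transformative learning experiences to learners around the world.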
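All Coursera courses are offered by well-known universities, leading companies or other academic institutions, so they are widely recognized. Coursera also offers degrees from world-class universities such as Yale University, Imperial College London and the University of Geneva: learners can complete a full bachelor's or master's degree and earn the credential much as they would at a conventional university. Through close partnerships with leading institutions, Coursera has driven transformative change in online education. To date, it has attracted more than 76 million learners, over 100 Fortune 500 companies, and more than 6,400 schools, businesses and governments.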

Dataset

The course dataset was scraped from the Coursera website. It contains 6 main columns and 891 courses. Column descriptions:

course_title : the course title.
course_organization : the organization offering the course.
course_Certificate_type : the type of certificate the course leads to (e.g. COURSE, SPECIALIZATION).
course_rating : the rating of the course.
course_difficulty : the difficulty level of the course (Beginner, Intermediate, Advanced or Mixed).
course_students_enrolled : the number of enrolled students, stored as text with a suffix (e.g. "120k").

Course Attribute Analysis

import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

warnings.filterwarnings('ignore')
df = pd.read_csv("/kaggle/input/coursera-course-dataset/coursea_data.csv")
df = df.drop("Unnamed: 0", axis=1)  # drop the leftover index column
# Summarize the text columns: count, number of unique values, most frequent value and its frequency
df.describe(include=['object']).T
                          count  unique  top                           freq
course_title              891    888     Developing Your Musicianship  2
course_organization       891    154     University of Pennsylvania    59
course_Certificate_type   891    3       COURSE                        582
course_difficulty         891    4       Beginner                      487
course_students_enrolled  891    205     120k                          22

The mean course rating is 4.68, with a minimum of 3.3 and a maximum of 5.0, so courses are rated highly overall.
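The rating statistics quoted above come from the numeric rating column, not from the text-column summary shown earlier. A minimal sketch of how to obtain them with `Series.describe()`, using a few made-up ratings in place of the real CSV:

```python
import pandas as pd

# Hypothetical toy ratings standing in for df['course_rating'] from coursea_data.csv
toy = pd.DataFrame({"course_rating": [4.8, 4.7, 3.3, 5.0, 4.6, 4.7]})

# describe() on a numeric column reports count, mean, std, min, quartiles and max
summary = toy["course_rating"].describe()
print(summary[["mean", "min", "max"]].round(2))
```

On the real data, `df['course_rating'].describe()` should reproduce the mean of about 4.68 and the 3.3 to 5.0 range.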
Rating Distribution

sns.set(rc={'figure.figsize':(10,10)})
ax = sns.boxenplot( y="course_rating", data=df,
                   showfliers=False,color='orange')
ax = sns.stripplot( y="course_rating", data=df,
                   size=4, color="maroon",alpha=0.2)
ax.axes.set_title("\ncourse Rating Distribution\n",fontsize=30)

[Figure: course rating distribution (boxen plot with strip plot)]

sns.set(rc={'figure.figsize':(10,5)})
# Histogram with a fitted gamma density overlaid (distplot is deprecated in recent seaborn releases)
p=sns.distplot(df['course_rating'],color='darkcyan',fit_kws={"color":"red"},fit=stats.gamma)
p.axes.set_title("\ncourse Rating Distribution\n",fontsize=30)

[Figure: course rating histogram with fitted gamma curve]
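Passing `fit=stats.gamma` makes `distplot` estimate gamma parameters by maximum likelihood (through the distribution's `fit` method) and draw the resulting density. A sketch of the same step done by hand with SciPy, on synthetic ratings (an assumption, not the course data):

```python
import numpy as np
from scipy import stats

# Synthetic ratings roughly shaped like the course data (assumption: mean 4.68, range 3.3 to 5.0)
rng = np.random.default_rng(0)
ratings = np.clip(rng.normal(4.68, 0.16, size=300), 3.3, 5.0)

# Maximum-likelihood estimates of shape a, location and scale
a, loc, scale = stats.gamma.fit(ratings)

# Evaluate the fitted density on a grid, e.g. to overlay it on a histogram
grid = np.linspace(ratings.min(), ratings.max(), 100)
density = stats.gamma.pdf(grid, a, loc=loc, scale=scale)
print(f"shape={a:.2f}, loc={loc:.2f}, scale={scale:.3f}")
```

Since ratings are capped at 5, the fitted curve is best read as a rough visual guide rather than a model of the data.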
Joint Distribution of Difficulty and Rating

sns.set(rc={'figure.figsize':(20,10)})
ax = sns.countplot(hue="course_rating", x="course_difficulty", data=df,palette="mako")
ax.axes.set_title("\nFrequency Distribution based on difficulty\n",fontsize=20)

[Figure: frequency of each rating by course difficulty]
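Advanced courses are few, so their ratings vary little; the rating distribution of beginner courses is close to the overall distribution; intermediate courses show a lower peak, possibly because learners who already have some background knowledge rate more strictly.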
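The visual comparison above can be backed with per-level numbers. A sketch using `groupby` on toy rows (the values are invented for illustration; on the real data the same call would run on `df`):

```python
import pandas as pd

# Hypothetical toy rows standing in for the course table
toy = pd.DataFrame({
    "course_difficulty": ["Beginner", "Beginner", "Intermediate", "Intermediate", "Mixed", "Advanced"],
    "course_rating":     [4.8,        4.6,        4.5,            4.7,            4.7,     4.6],
})

# Count, mean and spread of ratings within each difficulty level
by_level = toy.groupby("course_difficulty")["course_rating"].agg(["count", "mean", "std"])
print(by_level)
```

Small groups (such as the Advanced courses here) give unstable means and a NaN standard deviation for a single course, which is worth keeping in mind before reading much into level-to-level differences.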
Joint Distribution of Rating and Certificate Type

sns.set(rc={'figure.figsize':(20,10)})
ax = sns.countplot(hue="course_rating", x="course_Certificate_type", data=df,palette="ch:s=-.2,r=.6")
ax.axes.set_title("\nRating distribution per course type\n",fontsize=20)

[Figure: rating distribution per certificate type]
Joint Distribution of Certificate Type and Difficulty

sns.set(rc={'figure.figsize':(10,5)})
ax = sns.countplot(hue="course_difficulty", x="course_Certificate_type", data=df,palette="rocket_r")
ax.axes.set_title("\nDifficulty distribution per certificate type\n",fontsize=20)

[Figure: number of courses by certificate type and difficulty]
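The share of Mixed-difficulty courses varies noticeably across certificate types, while the other difficulty levels are distributed fairly consistently.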
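Raw counts make it hard to compare difficulty levels of very different sizes; normalizing the contingency table by column shows proportions instead. A sketch with `pd.crosstab` on toy rows (invented values):

```python
import pandas as pd

# Hypothetical toy rows standing in for the real course table
toy = pd.DataFrame({
    "course_Certificate_type": ["COURSE", "COURSE", "SPECIALIZATION", "COURSE",
                                "PROFESSIONAL CERTIFICATE", "SPECIALIZATION"],
    "course_difficulty":       ["Beginner", "Mixed", "Beginner", "Intermediate", "Beginner", "Mixed"],
})

# Share of each certificate type within every difficulty level (each column sums to 1)
shares = pd.crosstab(toy["course_Certificate_type"], toy["course_difficulty"], normalize="columns")
print(shares.round(2))
```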
Difficulty and Certificate Distributions

sns.set(rc={'figure.figsize':(10,5)})
ax = sns.countplot( x="course_difficulty", data=df,palette="crest")
ax.axes.set_title("\nDistribution per course difficulty\n",fontsize=20)

[Figure: number of courses per difficulty level]

sns.set(rc={'figure.figsize':(10,5)})
ax = sns.countplot( x="course_Certificate_type", data=df,palette="ch:s=.8,r=.1")
ax.axes.set_title("\nDistribution per course certificate type\n",fontsize=20)

[Figure: number of courses per certificate type]
Data Cleaning

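  • Drop the first (unnamed index) column
  • Drop the course title attribute (titles are not unique; a row id can serve instead)
  • Other conversions, described below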

Convert enrollment counts to numbers

df_fe1=df.copy()
def course_students_enrolled_modifier(x):
    # Note: drops the last two characters, not just the unit suffix ('5.3k' -> '5.', '120k' -> '12')
    return x[:-2]
df_fe1['course_students_enrolled_modified']=df_fe1['course_students_enrolled'].apply(course_students_enrolled_modifier)
df_fe1['course_students_enrolled_modified']=df_fe1['course_students_enrolled_modified'].apply(pd.to_numeric)
df_fe1 =df_fe1.drop(['course_students_enrolled'],axis=1)
df_fe1
   course_title                                      course_organization         course_Certificate_type  course_rating  course_difficulty  course_students_enrolled_modified
0  (ISC)² Systems Security Certified Practitioner…  (ISC)²                      SPECIALIZATION           4.7            Beginner           5.0
1  A Crash Course in Causality: Inferring Causal…   University of Pennsylvania  COURSE                   4.7            Intermediate       1.0
2  A Crash Course in Data Science                   Johns Hopkins University    COURSE                   4.5            Mixed              13.0
3  A Law Student’s Toolkit                          Yale University             COURSE                   4.7            Mixed              9.0
4  A Life of Happiness and Fulfillment              Indian School of Business   COURSE                   4.8            Mixed              32.0
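As written, `x[:-2]` removes one character before the unit suffix as well, so '5.3k' becomes '5.' (5.0) and '120k' becomes '12': the modified column is a rough, inconsistently scaled proxy rather than an enrollment count, and the enrollment statistics below inherit that. A sketch of a parser that expands the suffix instead, assuming values carry only a 'k' or 'm' suffix or none (the helper name `parse_enrollment` is hypothetical):

```python
import pandas as pd

def parse_enrollment(value):
    """Hypothetical helper: convert strings like '5.3k' or '1.5m' to plain numbers."""
    if pd.isna(value):
        return float("nan")
    text = str(value).strip().lower()
    # Assumed suffixes: 'k' = thousand, 'm' = million
    multipliers = {"k": 1_000, "m": 1_000_000}
    if text and text[-1] in multipliers:
        return float(text[:-1]) * multipliers[text[-1]]
    return float(text)

enrolled = pd.Series(["5.3k", "17k", "1.5m", "320k"])
print(enrolled.apply(parse_enrollment).tolist())
```

Applied as `df['course_students_enrolled'].apply(parse_enrollment)`, this would give true counts, at the cost of a heavily right-skewed column that may be easier to plot on a log scale.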

Convert course difficulty to a numeric value

# Encode difficulty as ordinal numbers; Mixed is placed between Beginner and Intermediate
def course_difficulty_modifier(x):
    if x == "Beginner":
        return 0.0
    elif x == "Intermediate":
        return 1.0
    elif x == "Mixed":
        return 0.5
    elif x == "Advanced":
        return 2.0
    else:
        # As most courses are beginner level, undefined levels are assumed to be beginner too
        return 0.0
df_fe1['course_difficulty_modified'] = df_fe1['course_difficulty'].apply(course_difficulty_modifier)
df_fe1 = df_fe1.drop(['course_difficulty'], axis=1)
df_fe1
   course_title                                      course_organization         course_Certificate_type  course_rating  course_difficulty_modified  course_students_enrolled_modified
0  (ISC)² Systems Security Certified Practitioner…  (ISC)²                      SPECIALIZATION           4.7            0.0                         5.0
1  A Crash Course in Causality: Inferring Causal…   University of Pennsylvania  COURSE                   4.7            1.0                         1.0
2  A Crash Course in Data Science                   Johns Hopkins University    COURSE                   4.5            0.5                         13.0
3  A Law Student’s Toolkit                          Yale University             COURSE                   4.7            0.5                         9.0
4  A Life of Happiness and Fulfillment              Indian School of Business   COURSE                   4.8            0.5                         32.0
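The if/elif chain can also be written as a lookup table with `Series.map`: unmatched labels become NaN and are then filled with 0 to mirror the "assume beginner" default. A sketch on toy labels ('Unknown' is a made-up label used only to show the fallback):

```python
import pandas as pd

# Ordinal codes used above; Mixed sits halfway between Beginner and Intermediate
DIFFICULTY_CODES = {"Beginner": 0.0, "Mixed": 0.5, "Intermediate": 1.0, "Advanced": 2.0}

levels = pd.Series(["Beginner", "Intermediate", "Mixed", "Advanced", "Unknown"])
# Unmatched labels map to NaN; fill them with 0.0 (treated as Beginner)
codes = levels.map(DIFFICULTY_CODES).fillna(0.0)
print(codes.tolist())
```

Either way, the encoding treats the levels as numbers with fixed spacing, which is itself a modeling assumption that carries into the correlation analysis.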

Enrollment Distribution

df_fe1[['course_difficulty_modified','course_students_enrolled_modified']].describe()
       course_difficulty_modified  course_students_enrolled_modified
count  891.000000                  881.000000
mean   0.369809                    8.511918
std    0.472738                    10.731756
min    0.000000                    1.000000
25%    0.000000                    2.000000
50%    0.000000                    5.000000
75%    0.500000                    9.000000
max    2.000000                    83.000000
sns.set(rc={'figure.figsize':(10,5)})
p=sns.distplot(df_fe1['course_students_enrolled_modified'],color='indigo')
p.axes.set_title("\n Course_students_enrolled Distribution\n",fontsize=20)

[Figure: Course_students_enrolled Distribution]

Correlation Analysis

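# Select the numeric columns (assumed to match the matrix below); corr() skips missing values pairwise
df_numeric = df_fe1[['course_rating','course_students_enrolled_modified','course_difficulty_modified']]
corrM = df_numeric.corr()
corrM
                                   course_rating  course_students_enrolled_modified  course_difficulty_modified
course_rating                      1.000000       0.015939                           -0.089810
course_students_enrolled_modified  0.015939       1.000000                           -0.011343
course_difficulty_modified         -0.089810      -0.011343                          1.000000
sns.set(rc={'figure.figsize':(10,5)})
ax = sns.scatterplot(x='course_rating', y='course_difficulty_modified', data=df_numeric)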

[Figure: scatter plot of course_rating vs. course_difficulty_modified]
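Course difficulty and rating show no meaningful linear correlation (r ≈ -0.09, a very weak negative relationship).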
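The correlation matrix reports r only; a significance test adds a p-value to back the "no meaningful correlation" reading. A sketch with `scipy.stats.pearsonr` and `spearmanr` (Spearman is arguably better suited to ordinal codes such as the difficulty column), run on synthetic data since the real columns are not reproduced here:

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for course_rating and course_difficulty_modified (assumption)
rng = np.random.default_rng(42)
difficulty = rng.choice([0.0, 0.5, 1.0, 2.0], size=200, p=[0.55, 0.2, 0.2, 0.05])
rating = np.clip(rng.normal(4.68, 0.16, size=200), 3.3, 5.0)

# Both functions return (statistic, p-value)
r, p_pearson = stats.pearsonr(rating, difficulty)
rho, p_spearman = stats.spearmanr(rating, difficulty)
print(f"Pearson r={r:.3f} (p={p_pearson:.3f}); Spearman rho={rho:.3f} (p={p_spearman:.3f})")
```

On the real data the equivalent call would be `stats.pearsonr(df_fe1['course_rating'], df_fe1['course_difficulty_modified'])`.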
Top-Rated Course Providers

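# Reconstructed (assumption): per-provider mean enrollment, mean rating and course count,
# inferred from the columns of the output table below
g_uni = df_fe1.groupby('course_organization').agg(
    course_students_enrolled_modified=('course_students_enrolled_modified', 'mean'),
    course_rating=('course_rating', 'mean'),
    size=('course_rating', 'size'),
).reset_index()

# Score out of 10: 3 points for scaled enrollment + 7 points for scaled rating (each divided by its maximum)
g_uni['overall_rating']=(g_uni['course_students_enrolled_modified']/g_uni['course_students_enrolled_modified'].max())*3+(g_uni['course_rating']/g_uni['course_rating'].max())*7
g_uni=g_uni.sort_values(by='overall_rating',ascending=False)

g_uni.overall_rating.describe()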

[Figure: summary statistics of overall_rating]

g_unix=g_uni[g_uni['overall_rating']>=8.5]
g_unix
     course_organization                           course_students_enrolled_modified  course_rating  size  overall_rating
58   McMaster University                           23.000000                          4.800000       1     9.857143
33   Google - Spectrum Sharing                     21.000000                          4.900000       1     9.739130
151  École Polytechnique                           19.000000                          4.800000       1     9.335404
52   Ludwig-Maximilians-Universität München (LMU)  19.000000                          4.750000       2     9.263975
150  deeplearning.ai                               18.344495                          4.743750       16    9.169546
30   Georgia Institute of Technology               17.700000                          4.660000       10    8.965839
142  University of Washington                      16.600000                          4.660000       5     8.822360
48   Johns Hopkins University                      15.678571                          4.660714       28    8.703194
123  University of California, Irvine              16.148148                          4.596296       27    8.672418
79   SAS                                           13.666667                          4.766667       3     8.592133
149  Yonsei University                             13.750000                          4.750000       4     8.579193
20   Duke University                               14.500000                          4.664286       28    8.554570
145  Vanderbilt University                         14.333333                          4.666667       3     8.536232
92   The Museum of Modern Art                      13.000000                          4.783333       6     8.528986
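The `overall_rating` formula divides each column by its maximum, then weights enrollment by 3 and rating by 7, so scores fall between 0 and 10. The sketch below recomputes it for three rows of the table above; these rows happen to contain the largest mean enrollment (23) and mean rating (4.9) implied by the table, so the scores should match the listed values:

```python
import pandas as pd

# Three providers copied from the ranking table (mean enrollment proxy, mean rating)
g = pd.DataFrame({
    "course_organization": ["McMaster University", "Google - Spectrum Sharing", "deeplearning.ai"],
    "course_students_enrolled_modified": [23.0, 21.0, 18.344495],
    "course_rating": [4.8, 4.9, 4.74375],
})

# Same weighting as above: up to 3 points for scaled enrollment, up to 7 for scaled rating
g["overall_rating"] = (
    g["course_students_enrolled_modified"] / g["course_students_enrolled_modified"].max() * 3
    + g["course_rating"] / g["course_rating"].max() * 7
)
print(g[["course_organization", "overall_rating"]].round(6))
```

Because providers are scored on averages, those with a single course (McMaster University, Google - Spectrum Sharing and École Polytechnique all have size 1) can top the ranking; filtering on a minimum `size` would be one way to make the ranking more robust.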
sns.set(rc={'figure.figsize':(25,5)})
# Bars show each provider's mean rating, in the table's descending overall_rating order
p=sns.barplot(x='course_organization',y="course_rating",data=g_unix)
plt.xticks(fontsize=20,rotation='vertical')
p.axes.set_title("\nBest course providers\n\n",fontsize=30)

[Figure: Best course providers (mean rating per provider)]

Course Review Analysis

r=pd.read_csv('/kaggle/input/course-reviews-on-coursera/Coursera_reviews.csv')
r.head()
   reviews                                             reviewers       date_reviews  rating  course_id
0  Pretty dry, but I was able to pass with just t…    By Robert S     Feb 12, 2020  4       google-cbrs-cpi-training
1  would be a better experience if the video and …    By Gabriel E R  Sep 28, 2020  4       google-cbrs-cpi-training
2  Information was perfect! The program itself wa…    By Jacob D      Apr 08, 2020  4       google-cbrs-cpi-training
3  A few grammatical mistakes on test made me do …    By Dale B       Feb 24, 2020  4       google-cbrs-cpi-training
4  Excellent course and the training provided was…    By Sean G       Jun 18, 2020  4       google-cbrs-cpi-training
from wordcloud import WordCloud
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyser = SentimentIntensityAnalyzer()

# Word cloud of 10 randomly sampled reviews (WordCloud removes its built-in STOPWORDS by default)
wordcloud = WordCloud(max_font_size=50, max_words=100, background_color="white").generate(' '.join(r.sample(10).reviews.astype(str)))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

[Figure: word cloud of 10 sampled reviews]
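A word cloud sizes words by how often they occur once stop words are removed. A sketch of that counting step with `collections.Counter`, using a small hand-written stop-word set (an assumption; real pipelines use a fuller list such as NLTK's) and fragments of reviews printed later in this article:

```python
import re
from collections import Counter

# Hypothetical minimal stop-word list
STOP = {"the", "and", "to", "was", "were", "all", "way", "a", "of", "through"}

reviews = [
    "Great course, lectures were straight forward and easy to follow along.",
    "Great, easy way to learn.",
    "Solid presentation all the way through.",
]

# Lower-case, split into alphabetic tokens, drop stop words, then count
tokens = [w for text in reviews for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP]
freq = Counter(tokens)
print(freq.most_common(3))
```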
Review Sentiment Analysis with VADER

def sentiment_analyzer_scores(sentence):
    score = analyser.polarity_scores(sentence)
    return score
neg=[]
neu=[]
pos=[]
comp=[]
for review in r.reviews:
    scores=sentiment_analyzer_scores(str(review))
    neg.append(scores['neg'])
    pos.append(scores['pos'])
    neu.append(scores['neu'])
    comp.append(scores['compound'])
r['s_pos']=pos
r['s_neu']=neu
r['s_neg']=neg
r['s_comp']=comp

r.head()
   reviews                                             reviewers       date_reviews  rating  course_id                 s_pos  s_neu  s_neg  s_comp
0  Pretty dry, but I was able to pass with just t…    By Robert S     Feb 12, 2020  4       google-cbrs-cpi-training  0.198  0.707  0.094  0.8504
1  would be a better experience if the video and …    By Gabriel E R  Sep 28, 2020  4       google-cbrs-cpi-training  0.056  0.944  0.000  0.4404
2  Information was perfect! The program itself wa…    By Jacob D      Apr 08, 2020  4       google-cbrs-cpi-training  0.161  0.746  0.093  0.6572
3  A few grammatical mistakes on test made me do …    By Dale B       Feb 24, 2020  4       google-cbrs-cpi-training  0.175  0.743  0.081  0.4633
4  Excellent course and the training provided was…    By Sean G       Jun 18, 2020  4       google-cbrs-cpi-training  0.384  0.616  0.000  0.7823
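VADER's `compound` score lies in [-1, 1], and the VADER authors suggest treating compound >= 0.05 as positive, <= -0.05 as negative, and anything in between as neutral. A sketch that turns the `s_comp` column into labels with those cut-offs (the function name `label_sentiment` is hypothetical), using the five scores shown above:

```python
import pandas as pd

def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score to a label using the +/-0.05 cut-offs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# s_comp values of the first five reviews in the table above
s_comp = pd.Series([0.8504, 0.4404, 0.6572, 0.4633, 0.7823])
print(s_comp.apply(label_sentiment).value_counts())
```

On the full data, `r['s_comp'].apply(label_sentiment)` would give a label column that can be cross-tabulated against the star `rating`.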
for i in range(6,11):
    print ("----------------------------------------")
    print("Review : ",r['reviews'][i])
    print("Positivity:",r['s_pos'][i])
    print("Negativity:",r['s_neg'][i])
    print ("----------------------------------------")
----------------------------------------
Review :  Solid presentation all the way through. I really appreciated the intermittent questions that popped up to check on learning as well the regular (but not needless) quizzing. There was visuals such as charts / .ppt for those of us more visually inclined as well as a transcript below the video that followed along with the presentation!
Positivity: 0.135
Negativity: 0.0
----------------------------------------
----------------------------------------
Review :  Probably the best certification course I've taken in this respect. The course is planned out carefully, and I believe gave me everything I needed to ace my exam the first time around. The trainer for the majority of the course was awesome. She delivered the material in a great, professional mannor, but was never boring or monotoned. 
Positivity: 0.212
Negativity: 0.0
----------------------------------------
----------------------------------------
Review :  The ProctorU.com system took 2 times the amount of time spent on this course over 3 days to complete.  It is the worse production user system I have used in 20+ years of my IT career.    You should switch to another vendor.
Positivity: 0.0
Negativity: 0.07
----------------------------------------
----------------------------------------
Review :  Covered all of the required information in an easy to understand way and WITH VIDEO! Great, easy way to learn. The exam process was a bit drawn out and more extensive then it needed to be, but over all a great experience
Positivity: 0.246
Negativity: 0.0
----------------------------------------
----------------------------------------
Review :  Great course, lectures were straight forward and easy to follow along.  The course provided all the information necessary to pass the CPI examination for certification.
Positivity: 0.288
Negativity: 0.0
----------------------------------------

Sentiment Distribution

sns.set(rc={'figure.figsize':(20,5)})
plt.xticks(fontsize=12)
p=sns.distplot(r['s_pos'],color='green')
p.axes.set_title("Positive Reviews",fontsize=20)

[Figure: distribution of positive scores (s_pos)]

sns.set(rc={'figure.figsize':(20,5)})
plt.xticks(fontsize=12)
p=sns.distplot(r['s_neg'],color='red')
p.axes.set_title("Negative Reviews",fontsize=20)

[Figure: distribution of negative scores (s_neg)]

sns.set(rc={'figure.figsize':(20,5)})
plt.xticks(fontsize=12)
p=sns.distplot(r['s_neu'],color='blue')
p.axes.set_title("Neutral Reviews",fontsize=20)

[Figure: distribution of neutral scores (s_neu)]
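VADER's `pos`, `neu` and `neg` scores are proportions of the text that add up to about 1, so the three histograms are not independent: a review with a large neutral share necessarily has small positive and negative shares. A quick check on the five rows of the sentiment table shown earlier:

```python
import pandas as pd

# s_pos, s_neu, s_neg of the first five reviews from the sentiment table
scores = pd.DataFrame({
    "s_pos": [0.198, 0.056, 0.161, 0.175, 0.384],
    "s_neu": [0.707, 0.944, 0.746, 0.743, 0.616],
    "s_neg": [0.094, 0.000, 0.093, 0.081, 0.000],
})

# Each row's three proportions should sum to roughly 1 (VADER rounds them to 3 decimals)
row_sums = scores.sum(axis=1)
print(row_sums.round(3).tolist())
```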
