Udemy Courses Data Analysis

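This article analyzes 3,678 course records from the Udemy platform, covering four subjects: Business Finance, Graphic Design, Musical Instruments, and Web Development. Most courses are paid, and most target either all levels or beginners. Price shows no clear relationship with subscriber count or review count, while content duration may be related to price.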


Dataset Introduction

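  • Udemy is a massive open online course (MOOC) platform that offers both free and paid courses. Anyone can create a course, and this business model has given Udemy hundreds of thousands of courses.
  • The Udemy Courses dataset is described as containing 3,682 course records from four Udemy subjects (Business Finance, Graphic Design, Musical Instruments, and Web Development); the CSV loaded below has 3,678 rows.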

Data Analysis

Raw Data Preview

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset into df, the name used throughout the rest of the analysis
df = pd.read_csv('/kaggle/input/udemy-courses/udemy_courses.csv')
df.head(10)

| | course_id | course_title | url | is_paid | price | num_subscribers | num_reviews | num_lectures | level | content_duration | published_timestamp | subject |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1070968 | Ultimate Investment Banking Course | https://www.udemy.com/ultimate-investment-bank… | True | 200 | 2147 | 23 | 51 | All Levels | 1.5 | 2017-01-18T20:58:58Z | Business Finance |
| 1 | 1113822 | Complete GST Course & Certification - Grow You… | https://www.udemy.com/goods-and-services-tax/ | True | 75 | 2792 | 923 | 274 | All Levels | 39.0 | 2017-03-09T16:34:20Z | Business Finance |
| 2 | 1006314 | Financial Modeling for Business Analysts and C… | https://www.udemy.com/financial-modeling-for-b… | True | 45 | 2174 | 74 | 51 | Intermediate Level | 2.5 | 2016-12-19T19:26:30Z | Business Finance |
| 3 | 1210588 | Beginner to Pro - Financial Analysis in Excel … | https://www.udemy.com/complete-excel-finance-c… | True | 95 | 2451 | 11 | 36 | All Levels | 3.0 | 2017-05-30T20:07:24Z | Business Finance |
| 4 | 1011058 | How To Maximize Your Profits Trading Options | https://www.udemy.com/how-to-maximize-your-pro… | True | 200 | 1276 | 45 | 26 | Intermediate Level | 2.0 | 2016-12-13T14:57:18Z | Business Finance |
| 5 | 192870 | Trading Penny Stocks: A Guide for All Levels I… | https://www.udemy.com/trading-penny-stocks-a-g… | True | 150 | 9221 | 138 | 25 | All Levels | 3.0 | 2014-05-02T15:13:30Z | Business Finance |
| 6 | 739964 | Investing And Trading For Beginners: Mastering… | https://www.udemy.com/investing-and-trading-fo… | True | 65 | 1540 | 178 | 26 | Beginner Level | 1.0 | 2016-02-21T18:23:12Z | Business Finance |
| 7 | 403100 | Trading Stock Chart Patterns For Immediate, Ex… | https://www.udemy.com/trading-chart-patterns-f… | True | 95 | 2917 | 148 | 23 | All Levels | 2.5 | 2015-01-30T22:13:03Z | Business Finance |
| 8 | 476268 | Options Trading 3 : Advanced Stock Profit and … | https://www.udemy.com/day-trading-stock-option… | True | 195 | 5172 | 34 | 38 | Expert Level | 2.5 | 2015-05-28T00:14:03Z | Business Finance |
| 9 | 1167710 | The Only Investment Strategy You Need For Your… | https://www.udemy.com/the-only-investment-stra… | True | 200 | 827 | 14 | 15 | All Levels | 1.0 | 2017-04-18T18:13:32Z | Business Finance |

df.shape

(3678, 12)

df.info()

[Figure: output of df.info()]

df.describe()

| | course_id | price | num_subscribers | num_reviews | num_lectures | content_duration |
|---|---|---|---|---|---|---|
| count | 3.678000e+03 | 3678.000000 | 3678.000000 | 3678.000000 | 3678.000000 | 3678.000000 |
| mean | 6.759720e+05 | 66.049483 | 3197.150625 | 156.259108 | 40.108755 | 4.094517 |
| std | 3.432732e+05 | 61.005755 | 9504.117010 | 935.452044 | 50.383346 | 6.053840 |
| min | 8.324000e+03 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 4.076925e+05 | 20.000000 | 111.000000 | 4.000000 | 15.000000 | 1.000000 |
| 50% | 6.879170e+05 | 45.000000 | 911.500000 | 18.000000 | 25.000000 | 2.000000 |
| 75% | 9.613555e+05 | 95.000000 | 2546.000000 | 67.000000 | 45.750000 | 4.500000 |
| max | 1.282064e+06 | 200.000000 | 268923.000000 | 27445.000000 | 779.000000 | 78.500000 |

Data Preprocessing

  • Check for null values
  • Drop unneeded columns
  • Remove duplicate rows
  • Convert column types
df.isnull().sum()  # check for null values

[Figure: output of df.isnull().sum()]
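The null check above can be tried on a small stand-in table. This is a minimal sketch: the `toy` DataFrame and its values are made up for illustration and are not taken from the Udemy data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table standing in for the course data (values are made up).
toy = pd.DataFrame({
    "course_title": ["A", "B", None, "D"],
    "price": [20.0, np.nan, 45.0, 95.0],
    "level": ["All Levels", "Beginner Level", "All Levels", "Expert Level"],
})

# Count missing values per column, the same call used on df above.
null_counts = toy.isnull().sum()

# Keep only the columns that actually contain missing values.
cols_with_nulls = null_counts[null_counts > 0]
print(cols_with_nulls)
```

Filtering to the non-zero counts keeps the report short when a table has many columns and only a few of them have gaps.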

df.drop('url',axis=1,inplace=True)  # drop the unneeded "url" column
df['published_timestamp']=pd.to_datetime(df['published_timestamp'])  # convert "published_timestamp" to a datetime type
df[df.duplicated()]  # show the duplicated rows

| | course_id | course_title | is_paid | price | num_subscribers | num_reviews | num_lectures | level | content_duration | published_timestamp | subject |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 787 | 837322 | Essentials of money value: Get a financial Life ! | True | 20 | 0 | 0 | 20 | All Levels | 0.616667 | 2016-05-16 18:28:30+00:00 | Business Finance |
| 788 | 1157298 | Introduction to Forex Trading Business For Beg… | True | 20 | 0 | 0 | 27 | Beginner Level | 1.500000 | 2017-04-23 16:19:01+00:00 | Business Finance |
| 894 | 1035638 | Understanding Financial Statements | True | 25 | 0 | 0 | 10 | All Levels | 1.000000 | 2016-12-15 14:56:17+00:00 | Business Finance |
| 1100 | 1084454 | CFA Level 2- Quantitative Methods | True | 40 | 0 | 0 | 35 | All Levels | 5.500000 | 2017-07-02 14:29:35+00:00 | Business Finance |
| 1473 | 185526 | MicroStation - Células | True | 20 | 0 | 0 | 9 | Beginner Level | 0.616667 | 2014-04-15 21:48:55+00:00 | Graphic Design |
| 2561 | 28295 | Learn Web Designing & HTML5/CSS3 Essentials in… | True | 75 | 43285 | 525 | 24 | All Levels | 4.000000 | 2013-01-03 00:55:31+00:00 | Web Development |

df.drop_duplicates(inplace=True)
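To see what the type conversion and de-duplication steps do, here is a minimal sketch on a hypothetical four-row table (the `toy` rows are invented; only the column names and timestamp format mirror the dataset). It also records how many rows were removed, which the in-place call above does not report.

```python
import pandas as pd

# Hypothetical toy table: rows 1 and 2 are exact duplicates.
toy = pd.DataFrame({
    "course_id": [1, 2, 2, 3],
    "course_title": ["A", "B", "B", "C"],
    "published_timestamp": [
        "2017-01-18T20:58:58Z",
        "2016-12-19T19:26:30Z",
        "2016-12-19T19:26:30Z",
        "2014-05-02T15:13:30Z",
    ],
})

# ISO 8601 strings ending in "Z" parse to timezone-aware UTC timestamps.
toy["published_timestamp"] = pd.to_datetime(toy["published_timestamp"])

# duplicated() marks the second and later copies of a fully identical row.
dup_mask = toy.duplicated()

n_before = len(toy)
deduped = toy.drop_duplicates().reset_index(drop=True)
n_removed = n_before - len(deduped)
print(f"removed {n_removed} duplicate row(s)")
```

By default both `duplicated()` and `drop_duplicates()` keep the first occurrence; passing `keep=False` to `duplicated()` would flag every copy instead.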

Visualization

# Word cloud of course titles
from wordcloud import WordCloud 
text = " ".join(subject_titles for subject_titles in df["course_title"])
word_cloud = WordCloud(collocations = False,background_color='white', colormap = 'YlGnBu', min_font_size = 8).generate(text)
plt.figure(figsize = (20, 8))
plt.imshow(word_cloud, interpolation = 'bilinear')
plt.axis("off")
plt.show()

[Figure: word cloud of course titles]

# Pass the column as x=; recent seaborn versions no longer accept it positionally
sns.countplot(x='is_paid', data=df, palette=['darkblue', 'lightseagreen'])
plt.title('Paid vs Free courses')

[Figure: count plot of paid vs. free courses]
Most courses are paid.

df['is_paid'].value_counts()

[Figure: output of df['is_paid'].value_counts()]
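Raw counts are easier to compare as shares. A minimal sketch, using a made-up five-row `toy` column rather than the real data, shows how `value_counts(normalize=True)` turns the paid/free counts into proportions; the "Paid"/"Free" labels are introduced here only for readability.

```python
import pandas as pd

# Hypothetical toy column: four paid courses and one free course.
toy = pd.DataFrame({"is_paid": [True, True, True, False, True]})

# Map the boolean flag to readable labels before counting.
labels = toy["is_paid"].map({True: "Paid", False: "Free"})

counts = labels.value_counts()                 # absolute counts
shares = labels.value_counts(normalize=True)   # fractions that sum to 1

print(counts)
print(shares.round(3))
```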

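# numeric_only=True (pandas >= 1.5) skips text and datetime columns, which otherwise raise an error in pandas 2.x
sns.heatmap(df.corr(numeric_only=True)[["price"]], cmap="Blues", annot=True);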

[Figure: heatmap of each numeric column's correlation with price]

plt.figure(figsize=(16,8))
sns.heatmap(df.corr(numeric_only=True),annot=True,cmap="Blues")  # numeric columns only
plt.show()

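[Figure: correlation heatmap of all numeric columns]
Most attributes show some correlation with price, although the strength varies by attribute.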
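A heatmap is easier to read alongside a ranked list of coefficients. The following is a minimal sketch on a hypothetical `toy` table (all numbers are invented, so the resulting ranking says nothing about the real dataset). It selects numeric columns with `select_dtypes`, which works regardless of pandas version.

```python
import pandas as pd

# Hypothetical toy table; the values are made up for illustration only.
toy = pd.DataFrame({
    "price": [20, 45, 95, 200, 150],
    "num_lectures": [10, 25, 40, 90, 60],
    "num_subscribers": [500, 30, 1200, 80, 300],
    "level": ["All Levels", "Beginner Level", "All Levels",
              "Expert Level", "Intermediate Level"],
})

# Keep numeric columns only, so the text column cannot break corr().
numeric = toy.select_dtypes("number")

# Pearson correlation of every numeric column with price, strongest first.
price_corr = (
    numeric.corr()["price"]
    .drop("price")                 # the self-correlation is always 1
    .sort_values(ascending=False)
)
print(price_corr)
```

Pearson correlation only measures linear association, so a low coefficient does not rule out a non-linear relationship.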

import plotly.express as px
%matplotlib inline
# Donut chart: share of courses at each level
df['tmp'] = 1  # helper column so that every course counts once
fig = px.pie(df, names='level',values='tmp',hole = 0.8,color_discrete_sequence=px.colors.diverging.Portland)
fig.update_traces(textposition='outside', textinfo='percent+label')
fig.update_layout(
    title_text="level percentage",
    annotations=[dict(text='Course Levels', x=0.5, y=0.5, font_size=20, showarrow=False)])
fig.show()  # display the plotly figure explicitly
# Bar chart: number of courses at each level
plt.figure(figsize=(8,6))
sns.countplot(x='level',data=df, palette = [ 'darkblue', 'lightseagreen', 'teal', 'cadetblue'] )

[Figure: donut chart and count plot of course levels]
Most courses target either all levels or beginners.

df['subject'].value_counts()

[Figure: output of df['subject'].value_counts()]
Most courses are in Web Development or Business Finance.

# Count the courses for every (subject, level) pair
levels = ['All Levels','Beginner Level','Intermediate Level','Expert Level']
numbers = []
for subject in df['subject'].unique():
    subject_df = df[df['subject']==subject]
    for level in levels:
        numbers.append([subject,level,len(subject_df[subject_df['level']==level])])
# Long-format table with one row per (subject, level) pair
splitDF = pd.DataFrame(numbers,columns=['Subject','Level','Count'])
plt.figure(figsize=(16,8))
sns.barplot(data=splitDF,x='Subject',y='Count',hue='Level',palette = [ 'darkblue', 'lightseagreen', 'teal', 'cadetblue'])
plt.title(label='Level distribution matched to each subject.')
plt.show()

[Figure: grouped bar chart of level counts per subject]
Distribution of course levels within each subject.
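The nested loop that builds the per-subject level counts can also be written with `pd.crosstab`. This is an alternative sketch, not the article's method, shown on a hypothetical five-row `toy` table; `melt` then reshapes the result into the long Subject/Level/Count layout that `sns.barplot` expects.

```python
import pandas as pd

# Hypothetical toy table with the same two columns used above.
toy = pd.DataFrame({
    "subject": ["Web Development", "Web Development", "Business Finance",
                "Business Finance", "Graphic Design"],
    "level": ["All Levels", "Beginner Level", "All Levels",
              "All Levels", "Expert Level"],
})

# Wide table: one row per subject, one column per level, zeros where absent.
level_by_subject = pd.crosstab(toy["subject"], toy["level"])

# Long table: one row per (subject, level) pair, ready for a grouped bar plot.
long_counts = level_by_subject.reset_index().melt(
    id_vars="subject", var_name="level", value_name="Count"
)
print(level_by_subject)
print(long_counts)
```

Unlike the loop, `crosstab` discovers the level names from the data, so a level that never occurs will not appear as a column.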

plt.figure(figsize=(12,8))
sns.regplot(x='price',y='num_subscribers',data=df)

[Figure: regression plot of num_subscribers against price]
Subscriber count vs. course price.

plt.figure(figsize=(12,8))
sns.scatterplot(x='price',y='num_subscribers',data=df)

[Figure: scatter plot of num_subscribers against price]
Price shows no clear relationship with subscriber count.

plt.figure(figsize=(12,8))
sns.regplot(x='price',y='num_reviews',data=df)

[Figure: regression plot of num_reviews against price]
Price shows no clear relationship with review count.
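The visual impression of "no clear relationship" can be backed by a number. As a minimal sketch on invented data (the `toy` values are hypothetical and the helper `spearman` is introduced here, not part of the article), the code below computes Pearson and rank-based Spearman coefficients. Spearman is less sensitive to a handful of courses with very large counts, and computing it by ranking first avoids any dependency beyond pandas.

```python
import pandas as pd

# Hypothetical toy table; values are made up for illustration only.
toy = pd.DataFrame({
    "price": [0, 20, 50, 95, 150, 200],
    "num_subscribers": [5000, 100, 2500, 40, 300, 1200],
    "num_reviews": [200, 5, 90, 2, 30, 60],
})

def spearman(a: pd.Series, b: pd.Series) -> float:
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return a.rank().corr(b.rank())

pearson_subs = toy["price"].corr(toy["num_subscribers"])
spearman_subs = spearman(toy["price"], toy["num_subscribers"])
spearman_reviews = spearman(toy["price"], toy["num_reviews"])

print(f"price vs subscribers: pearson={pearson_subs:.3f}, spearman={spearman_subs:.3f}")
print(f"price vs reviews:     spearman={spearman_reviews:.3f}")
```

Coefficients near zero would support the reading of the plots; values well away from zero would call for a closer look.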

plt.figure(figsize=(12,8))
sns.regplot(x='price',y='content_duration',data=df)

[Figure: regression plot of content_duration against price]
Content duration may influence price.

plt.figure(figsize=(12,8))
sns.regplot(x='num_subscribers',y='content_duration',data=df)

[Figure: regression plot of content_duration against num_subscribers]
Longer courses tend to have more subscribers.
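One way to check a trend like this without relying on a fitted line is to bin courses by duration and compare a robust statistic per bin. The sketch below uses invented numbers and assumes `content_duration` is measured in hours; the bin edges and labels are arbitrary choices made for illustration.

```python
import pandas as pd

# Hypothetical toy table; values are made up for illustration only.
toy = pd.DataFrame({
    "content_duration": [0.5, 1.0, 2.5, 3.0, 6.0, 12.0],
    "num_subscribers": [100, 300, 900, 700, 2500, 4000],
})

# Arbitrary duration bins (assumed hours); include_lowest keeps 0-hour courses.
bins = [0, 2, 5, float("inf")]
labels = ["<=2h", "2-5h", ">5h"]
toy["duration_bin"] = pd.cut(
    toy["content_duration"], bins=bins, labels=labels, include_lowest=True
)

# The median is less affected than the mean by a few very popular courses.
median_subs = toy.groupby("duration_bin", observed=True)["num_subscribers"].median()
print(median_subs)
```

If the medians rise from bin to bin on the real data, that would be consistent with the trend described above; it would still not show that duration causes the difference.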
