NetEase Cloud Music Data Analysis in Practice

高中不复，大学纷飞

Article link: https://blog.csdn.net/qq_44760912/article/details/111499667

NetEase Cloud Music data analysis

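Fields: title (playlist title), tag (genre tags joined with '-'), text (playlist description), collection (number of saves), play (play count), songs (number of tracks), comments (number of comments)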

Import the modules and read the data

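import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import squarify  # third-party package for treemap plots

# Read the Excel file and replace its header row with English column names
df = pd.read_excel('D:/Pandas/music_message.xlsx',header=0,names=['title','tag','text','collection','play','songs','comments'])
df.head()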

	title	tag	text	collection	play	songs	comments
0	梦想无处安放的日子也要时常拿出来晾晒	华语-治愈-感动	介绍：和过去的时光，聊一聊用文字记下的回忆和许下的心愿，如今正裹挟着时光的风霜，与你撞个满...	824	339509	30	8
1	治愈说唱 I 越过黑暗的那道光	华语-说唱-治愈	介绍：今天，你又是为何戴上耳机？才发现，音乐真的是有力量的，治愈的力量。愿你有好运气，如果没...	2657	387541	42	12
2	《姐姐的爱乐之程》云南站路演	华语-流行-综艺	介绍：一个热爱音乐的纯朴村寨，一颗生长在茶山上的苍天大树，一轮雨后艳丽的彩虹，一段闲适的时光...	1175	204799	8	31
3	你搜不到的土嗨神曲	欧美-摇滚-流行	介绍：【不喜勿喷，自用歌单+村民推荐】这歌单容易嗨上头，如果你开车/走路请注意安全！现在很多...	167万	97074200	212	3972
4	是你的垚/刘大壮/王小帅/王泽科	华语-伤感-翻唱	介绍：喜欢歌单的可以点个关注哟歌单制作：小攀哟歌单创建：2020.4.1歌单修改：2020....	12万	7869796	115	280

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1289 entries, 0 to 1288
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   title       1289 non-null   object
 1   tag         1289 non-null   object
 2   text        1289 non-null   object
 3   collection  1289 non-null   object
 4   play        1289 non-null   int64 
 5   songs       1289 non-null   int64 
 6   comments    1289 non-null   object
dtypes: int64(2), object(5)
memory usage: 70.6+ KB
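The `df.info()` output shows `collection` and `comments` stored as `object` rather than integers, which points to non-numeric entries in those columns. A minimal sketch for listing such entries before deciding how to convert them; `find_non_numeric` is a hypothetical helper, not part of the original code, and the sample values are copied from the `df.head()` preview:

```python
def find_non_numeric(values):
    """Return the distinct raw values that are not plain integers, in first-seen order."""
    seen = []
    for v in values:
        s = str(v).strip()
        # str.isdigit() is False for entries such as '167万' or '评论'
        if not s.isdigit() and s not in seen:
            seen.append(s)
    return seen

# Sample values copied from the preview of the collection column
sample = ['824', '2657', '1175', '167万', '12万', 824]
print(find_non_numeric(sample))  # ['167万', '12万']
```

On the real data it could be called as `find_non_numeric(df['collection'])`, since iterating a pandas Series yields its values.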

Data processing and cleaning

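# Remove surrounding whitespace from the collection counts
df['collection'] = df['collection'].astype('string').str.strip()
# Expand the unit '万' (10,000): '167万' -> 1670000; float() also accepts decimals such as '1.5万'
df['collection'] = [int(round(float(s[:-1]) * 10000)) if s.endswith('万') else int(s) for s in df['collection']]
# Drop the 3-character prefix '介绍：' ("Introduction:") from each description
df['text'] = [str(i)[3:] for i in df['text']]
# A comments cell holding the label '评论' ("comments") instead of a number is set to 0
df['comments'] = [0 if '评论' in str(i).strip() else int(i) for i in df['comments']]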
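The list comprehensions above can also be packaged as reusable helpers that tolerate decimal counts and only strip the description prefix when it is actually there. A minimal sketch; `parse_count`, `strip_intro` and the `UNITS` table are hypothetical names, not from the original post, and treating 亿 (10^8) as a possible unit is an assumption about what else the page might display:

```python
# Hypothetical helpers (not from the original post) for the cleaning step.
UNITS = {'万': 10_000, '亿': 100_000_000}  # Chinese count units: 万 = 10^4, 亿 = 10^8

def parse_count(value):
    """Convert a count such as '824', '12万' or '1.5万' to an int.

    Anything that is not a number (e.g. the label '评论') becomes 0.
    """
    s = str(value).strip()
    try:
        if s and s[-1] in UNITS:
            # round() guards against floating-point error before int() truncates
            return int(round(float(s[:-1]) * UNITS[s[-1]]))
        return int(round(float(s)))
    except ValueError:
        return 0

def strip_intro(text, prefix='介绍：'):
    """Remove the leading '介绍：' ("Introduction:") label only if it is present."""
    s = str(text)
    return s[len(prefix):] if s.startswith(prefix) else s

print(parse_count('167万'), parse_count('824'), parse_count('评论'))  # 1670000 824 0
```

Applied to the DataFrame this would look like `df['collection'] = df['collection'].map(parse_count)`, likewise for `comments`, and `df['text'] = df['text'].map(strip_intro)`. The trade-off is that any unexpected string is silently mapped to 0, so it is worth inspecting the raw values first.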

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1289 entries, 0 to 1288
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   title       1289 non-null   object
 1   tag         1289 non-null   object
 2   text        1289 non-null   object
 3   collection  1289 non-null   int64 
 4   play        1289 non-null   int64 
 5   songs       1289 non-null   int64 
 6   comments    1289 non-null   int64 
dtypes: int64(4), object(3)
memory usage: 70.6+ KB

df.shape

(1289, 7)

Data preview

df.head()

	title	tag	text	collection	play	songs	comments
0	梦想无处安放的日子也要时常拿出来晾晒	华语-治愈-感动	和过去的时光，聊一聊用文字记下的回忆和许下的心愿，如今正裹挟着时光的风霜，与你撞个满怀。那...	824	339509	30	8
1	治愈说唱 I 越过黑暗的那道光	华语-说唱-治愈	今天，你又是为何戴上耳机？才发现，音乐真的是有力量的，治愈的力量。愿你有好运气，如果没有，愿...	2657	387541	42	12
2	《姐姐的爱乐之程》云南站路演	华语-流行-综艺	一个热爱音乐的纯朴村寨，一颗生长在茶山上的苍天大树，一轮雨后艳丽的彩虹，一段闲适的时光。一场...	1175	204799	8	31
3	你搜不到的土嗨神曲	欧美-摇滚-流行	【不喜勿喷，自用歌单+村民推荐】这歌单容易嗨上头，如果你开车/走路请注意安全！现在很多年轻人...	1670000	97074200	212	3972
4	是你的垚/刘大壮/王小帅/王泽科	华语-伤感-翻唱	喜欢歌单的可以点个关注哟歌单制作：小攀哟歌单创建：2020.4.1歌单修改：2020.9.1...	120000	7869796	115	280

Build a general-purpose plotting function to simplify the code

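get_matplot(x,y,chart,title,ha,size,color)
x: the category labels, used as the index of the plotted data (pass 0 when no labels are needed, e.g. for a histogram);
y: the values to plot;
chart: the chart type, one of three: barh, hist or squarify.plot;
title: the chart title;
ha: the horizontal alignment of the value labels relative to their anchor point;
size: the font size;
color: the chart color;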
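For a horizontal bar chart, `x` and `y` are two parallel lists of labels and counts. One way to build them, sketched under the assumption that the `tag` column is what gets counted: split each tag string on '-' and count the pieces. `top_tags` is a hypothetical helper, not part of the original code, and the sample tags are copied from the preview:

```python
from collections import Counter

def top_tags(tags, n=10):
    """Split 'A-B-C' tag strings on '-' and return the n most common tags.

    Returns two lists, labels and counts, in descending order of count.
    Ties keep the order in which the tags were first seen.
    """
    counter = Counter(t for tag in tags for t in str(tag).split('-') if t)
    pairs = counter.most_common(n)
    labels = [label for label, _ in pairs]
    counts = [count for _, count in pairs]
    return labels, counts

# Tags copied from the first five rows of the preview
tags = ['华语-治愈-感动', '华语-说唱-治愈', '华语-流行-综艺', '欧美-摇滚-流行', '华语-伤感-翻唱']
x, y = top_tags(tags, n=3)
print(x, y)  # ['华语', '治愈', '流行'] [4, 2, 2]
```

On the full data, `x, y = top_tags(df['tag'], n=10)` would give descending lists; because `get_matplot` reverses them before building the Series, the most common tag would end up at the top of the `barh` chart.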

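# Build the plotting function
def get_matplot(x,y,chart,title,ha,size,color):
    # Figure display settings: a font that can render Chinese, and the font size
    plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
    plt.rcParams['font.size'] = size
    plt.rcParams['axes.unicode_minus'] = False
    # Figure size, resolution and background
    fig = plt.figure(figsize=(16, 8), dpi=80)
    ax = plt.subplot(1, 1, 1)
    ax.patch.set_color('white')
    # Handle to the current axes (the same axes as ax), used to style spines and ticks
    lines = plt.gca()
    # Build the data to plot; x == 0 means no labels are needed (e.g. for the histogram)
    if x == 0:
        pass
    else:
        # Reverse copies rather than the caller's lists, so the first item is drawn at the top of barh
        x = list(x)[::-1]
        y = list(y)[::-1]
        data = pd.Series(y, index=x)
    # Spine colors: hide the right and top spines, draw the left and bottom in dark grey
    lines.spines['right'].set_color('none')
    lines.spines['top'].set_color('none')
    lines.spines['left'].set_color((64/255, 64/255, 64/255))
    lines.spines['bottom'].set_color((64/255, 64/255, 64/255))
    # Hide the tick marks on both axes
    lines.xaxis.set_ticks_position('none')
    lines.yaxis.set_ticks_position('none')
    if chart == 'barh':
        # Horizontal bar chart in the requested color
        data.plot.barh(ax=ax, width=0.7, alpha=0.7, color=color)
        # Chart title and its font size
        ax.set_title(title, fontsize=18, fontweight='light')
        # Write each bar's count just past the end of the bar
        for i, v in enumerate(data.values):
            plt.text(v+0.3, i-0.12, '%s' % v, ha=ha)
    elif chart == 'hist':
        # Histogram in the requested color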
        ax.hist(y, bins=30, alpha=0.7, color=(21