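First, get a dataset from Kaggle: https://www.kaggle.com/c/talkingdata-mobile-user-demographics. The competition started in 2016, so the data is probably from around 2015.
Download phone_brand_device_model.csv.zip directly and unzip it, then compute the fraction of records belonging to each phone brand as a stand-in for market share, and finally plot the ten largest brands as a bar chart: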
# coding=utf-8
# Only pandas and matplotlib are needed for this script
import pandas as pd
import matplotlib.pyplot as plt
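# Read the data (the CSV extracted from phone_brand_device_model.csv.zip)
gdata = pd.read_csv("phone_brand_device_model.csv", header=0)
print(gdata.shape)  # check the number of rows and columns
print(gdata.head(5))
# Fraction of rows per brand, sorted from most to least common
brand_share = gdata.phone_brand.value_counts(normalize=True)
top10 = brand_share[:10]  # keep the ten most common brands
name_list = top10.index
print(name_list)
num_list = top10.values
print(num_list)
# The brand labels contain Chinese text, so pick a font with CJK glyphs (it must be installed)
plt.rcParams['font.sans-serif'] = ['WenQuanYi Micro Hei']
# Cycle through red, green and blue explicitly; newer matplotlib versions may reject color='rgb'
bar_colors = [('r', 'g', 'b')[i % 3] for i in range(len(num_list))]
plt.bar(range(len(num_list)), num_list, align="center", color=bar_colors, tick_label=name_list)
plt.show()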
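Looking at this chart is a little sobering: back then Xiaomi was far ahead of everyone else. If you drew the same chart with current data, it would probably look completely different.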
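To make the "share" calculation concrete, here is a minimal, dependency-free sketch of the same idea: count each label, divide by the total, and keep the largest few. The function name `brand_shares` and the sample labels are made up for illustration and are not part of the Kaggle data; on the real file the pandas call in the script is the simpler way to get the result.

```python
from collections import Counter


def brand_shares(brands, top_n=10):
    """Return (brand, share) pairs for the top_n most frequent brands."""
    counts = Counter(brands)
    total = sum(counts.values())
    # most_common() sorts by count, largest first
    return [(brand, count / total) for brand, count in counts.most_common(top_n)]


# Made-up labels, for illustration only (not the Kaggle data)
sample = ["A", "A", "A", "B", "B", "C", "A", "B", "D", "A"]
shares = brand_shares(sample, top_n=3)
for brand, share in shares:
    print(f"{brand}: {share:.0%}")
```

Keep in mind that this measures the share of records in the dataset, which is only a rough proxy for real market share.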
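Note:
By default matplotlib cannot display Chinese characters. The script relies on the WenQuanYi Micro Hei font being installed; if it is not available on your system, search online for how to set up a Chinese font for matplotlib.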
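One possible way to handle the font issue is to check which fonts matplotlib can see and use the first one from a list of candidates. This is only a sketch: the helper names `pick_font` and `configure_cjk_font` are hypothetical, and the candidate font names are examples that may not match what is installed on your machine.

```python
def pick_font(candidates, installed):
    """Return the first candidate font name that is installed, or None."""
    installed = set(installed)
    for name in candidates:
        if name in installed:
            return name
    return None


def configure_cjk_font(candidates=("WenQuanYi Micro Hei", "SimHei", "Noto Sans CJK SC")):
    """Point matplotlib at an installed CJK-capable font, if one is found.

    The default candidate names are assumptions; adjust them for your system.
    """
    # Imported here so that pick_font can be used without matplotlib installed
    import matplotlib
    from matplotlib import font_manager

    # Names of all fonts that matplotlib has discovered
    installed = {font.name for font in font_manager.fontManager.ttflist}
    chosen = pick_font(candidates, installed)
    if chosen is not None:
        # Put the chosen font first and keep the existing ones as fallbacks
        current = list(matplotlib.rcParams["font.sans-serif"])
        matplotlib.rcParams["font.sans-serif"] = [chosen] + current
        # Use an ASCII hyphen for minus signs in case the font lacks U+2212
        matplotlib.rcParams["axes.unicode_minus"] = False
    return chosen
```

Call `configure_cjk_font()` once before plotting. If it returns None, none of the candidate fonts were found and a suitable font still needs to be installed.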