[Making geographic maps with pyecharts]

wx1871428

Category: Python

Article link: https://blog.csdn.net/wx1871428/article/details/118415518

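This article works through a project from the Tianchi big-data platform: it cleans and explores the Airbnb listings table and uses the latitude and longitude fields to draw a geographic price map with pyecharts. It covers the data-cleaning steps in detail, including the handling of null values and outliers, and shows how prices are distributed across districts and room types.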

Project overview

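Project source: Tianchi big-data platform
Project approach: clean the listings table from the Airbnb dataset, run an exploratory analysis, and build a geographic price map from latitude, longitude, and price (pyecharts)
python: 3.7.1
pyecharts: 1.2.0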

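This Tianchi competition is a fairly common one; this article offers a different approach to the geographic visualization.
(If you only want to see the charts, skip to about three quarters of the way down.)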

Module imports

Analysis approach
![Analysis approach](https://img-blog.csdnimg.cn/20200411221638743.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MjIyNTEyMg==,size_16,color_FFFFFF,t_70)

    # Data-processing packages
    import pandas as pd
    import numpy as np
    from scipy import stats

    # Plotting packages
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Date-handling packages
    import calendar
    from datetime import datetime

    # Render plots inline in Jupyter Notebook
    %matplotlib inline

    # Display Chinese characters correctly (requires the SimHei font)
    plt.rcParams['font.sans-serif']=['SimHei']
    plt.rcParams['axes.unicode_minus']=False

    # Suppress warnings
    import warnings
    warnings.simplefilter(action="ignore", category=FutureWarning)
    warnings.filterwarnings('ignore')

    # Show the output of every expression in a cell, not just the last one
    from IPython.core.interactiveshell import InteractiveShell
    InteractiveShell.ast_node_interactivity = "all"

    # Fix minus signs and Chinese characters not rendering
    plt.rcParams['axes.unicode_minus'] = False
    sns.set_style('darkgrid', {'font.sans-serif':['SimHei', 'Arial']})

Data processing

Loading the data

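    # Parse last_review as dates and read the id columns as strings.
    # keep_default_na=False (not used here) would keep empty cells as empty strings instead of NaN.
    listings = pd.read_csv('listings.csv',parse_dates=['last_review'],dtype={'id':str,'host_id':str})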
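To see what `parse_dates`, `dtype`, and `keep_default_na` do, here is a small self-contained sketch that reads a made-up three-row CSV from memory instead of the real `listings.csv`. The rows and values are invented purely for illustration; only the column names come from the article.

```python
import io
import pandas as pd

# Made-up sample standing in for listings.csv (values are illustrative only).
csv_text = """id,host_id,name,last_review,reviews_per_month
00123,0456,Cozy room,2019-05-01,1.5
00124,0457,,,
00125,0458,Loft,2019-06-15,0.3
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["last_review"],        # parsed as datetimes; empty cells become NaT
    dtype={"id": str, "host_id": str},  # keep IDs as text so leading zeros survive
)

# Same data read as plain text with default NaN detection switched off:
# empty cells stay as empty strings instead of becoming NaN.
raw = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)

print(sample.dtypes)
print(sample.isna().sum())
print(repr(raw.loc[1, "name"]))
```

Reading IDs as strings avoids treating them as numbers, which matters whenever they are used as join keys or labels rather than quantities.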

Processing

    listings.info()

![listings.info()](https://img-blog.csdnimg.cn/2020040723453254.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MjIyNTEyMg==,size_16,color_FFFFFF,t_70)

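Inspecting the listings table turns up several issues and things to check:
The neighbourhood_group column contains many null values; look at its summary statistics
The neighbourhood column mixes Chinese and English; keep only the Chinese part
Check latitude/longitude, price, minimum nights, 365-day availability, and number of listings per host for outliers
Check how many room types there are
Find the top 10 ids by number of reviews
Find the top 10 ids by reviews per month
name, last_review, and reviews_per_month all contain null values, but the impact is small: name has no naming standard, a null last_review most likely means the listing has never been reviewed, and without a last_review there is no reviews_per_month either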
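The checks in this list can be sketched on a tiny invented table as follows. This is only an illustration under assumptions: the column names `room_type`, `price`, `minimum_nights`, and `number_of_reviews` are assumed to match the listings schema, and every value is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical mini version of the listings table (all values invented).
df = pd.DataFrame({
    "id": ["1", "2", "3", "4"],
    "neighbourhood_group": [np.nan] * 4,
    "room_type": ["Entire home/apt", "Private room", "Private room", "Shared room"],
    "price": [450, 120, 68000, 0],
    "minimum_nights": [1, 2, 1, 1125],
    "number_of_reviews": [12, 0, 87, 3],
})

# 1. Null counts per column: a column that is entirely null can be dropped.
null_counts = df.isna().sum()
all_null_cols = null_counts[null_counts == len(df)].index.tolist()

# 2. Summary statistics make extreme minimum and maximum values easy to spot.
summary = df[["price", "minimum_nights"]].describe()

# 3. How many room types are there, and how common is each?
room_type_counts = df["room_type"].value_counts()

# 4. Top listings by review count (top 10 on real data; top 2 on this sample).
top_reviewed = df.nlargest(2, "number_of_reviews")[["id", "number_of_reviews"]]

print(all_null_cols)
print(summary.loc[["min", "max"]])
print(room_type_counts)
print(top_reviewed)
```

On the real table, the same calls would be applied to `listings`, with `nlargest(10, ...)` for the top-10 rankings.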

    listings.head()

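![listings.head()](https://img-blog.csdnimg.cn/20200407234754670.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MjIyNTEyMg==,size_16,color_FFFFFF,t_70)

The name column is messy, mixing Chinese and English, but it is not important and needs no processing. host_name also contains odd entries such as East Apartments, but it is not important either and needs no real handling. neighbourhood_group is entirely null, so the column can be dropped, and neighbourhood can be trimmed down to its Chinese part.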

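    # Drop the all-null neighbourhood_group column
    listings.drop(['neighbourhood_group'],axis=1,inplace=True)
    # Keep only the text before '/' in neighbourhood (the Chinese part)
    listings['neighbourhood'] = listings['neighbourhood'].str.split('/',expand=True)[0]
    listings.sample(3)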
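As a quick check of how the split behaves, the sketch below applies the same `str.split('/', expand=True)[0]` pattern to a few invented values. The "Chinese / English" format and the district names are assumptions made for illustration, and the extra `.str.strip()` is an optional addition that is not in the original code.

```python
import pandas as pd

# Hypothetical values in the assumed "Chinese / English" format.
s = pd.Series(["朝阳区 / Chaoyang", "海淀区 / Haidian", "东城区"])

# expand=True returns a DataFrame with one column per piece;
# column 0 holds the text before the first '/'.
parts = s.str.split("/", expand=True)

# The piece before the slash may carry a trailing space; strip() removes it.
chinese = parts[0].str.strip()

print(chinese.tolist())
```

Values without a slash pass through unchanged, so the pattern is safe to apply even if some rows are already Chinese-only.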

![listings.sample(3)](https://img-blog.csdnimg.cn/20200407234936608.png?x-oss-process=image/watermark,t