Guangzhou's Dining Landscape: Data Analysis and Visualization of 1,000 Restaurants (Python)

UD3

Published 2024-06-23 22:47:09

Tags: data analysis, python, data mining, echarts, data visualization

Article link: https://blog.csdn.net/weixin_45535473/article/details/139907338

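1. Project Overview

1.1 Project Goals

This project analyzes and mines restaurant data to provide information and insights in the following areas:
1. Restaurant market analysis: use the data to examine supply and demand, the competitive landscape, and potential trends
2. Consumer behavior analysis: understand consumers' preferences for different restaurant types, their spending habits, and their reviewing behavior
3. Recommender system development: build a recommender system based on user preferences and restaurant features to provide personalized recommendations
4. District popularity analysis: analyze the number of restaurants and their ratings in each district to explore the growth potential of local dining markets

1.2 Project Description

This project uses Python to process, analyze, and visualize data on 1,000 restaurants in Guangzhou. Through data mining and statistical analysis it aims to reveal market trends, consumer behavior, and cuisine preferences, supporting decisions by restaurant businesses and consumers and enabling personalized recommendations.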

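2. Data Overview

2.1 Data source:

Guangzhou food dataset (广州美食数据集)

2.2 Data contents

The dataset contains information on 1,000 restaurants in Guangzhou. Each restaurant has the following attributes:

id: shop ID, uniquely identifies each restaurant
name: shop name, i.e. the name of the restaurant
avg_price: average spend per person, the amount a customer typically spends at the restaurant
score: shop rating, reflecting customers' overall evaluation of service and food
comment_count: number of reviews the restaurant has received
cuis_name: cuisine category, describing the main cuisine or style the restaurant serves
city: city where the restaurant is located, here Guangzhou
county: district where the restaurant is located, i.e. the specific administrative area
address: detailed address, usually street and house number
dis_stus: opening hours, describing the restaurant's operating status and business hours
phone: shop telephone number for customers to contact the restaurant
These attributes can be used to analyze spending levels, rating trends, and the distribution of major cuisines in Guangzhou's restaurant sector, as well as differences in restaurant counts and characteristics between districts.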
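Because the analysis code later in this post reads fields by position (for example `row[5]` for the cuisine and `row[7]` for the district), it can help to make that mapping explicit. The following is a minimal sketch, assuming the spreadsheet columns appear in exactly the order listed above; the `Restaurant` class and the sample row are hypothetical and made up purely for illustration.

```python
from typing import NamedTuple, Optional


class Restaurant(NamedTuple):
    """One spreadsheet row, with fields in the column order listed above."""
    id: object
    name: Optional[str]
    avg_price: object
    score: object
    comment_count: object
    cuis_name: Optional[str]
    city: Optional[str]
    county: Optional[str]
    address: Optional[str]
    dis_stus: Optional[str]
    phone: object


# Hypothetical sample row (not taken from the dataset)
sample = Restaurant(1, "Example Diner", 58.0, 4.5, 120, "粤菜", "广州",
                    "天河区", "Example Road 1", "10:00-22:00", "000-0000")

# Named access and positional access refer to the same values
print(sample.cuis_name == sample[5])   # True
print(sample.county == sample[7])      # True
```

A raw row tuple with all eleven columns could be wrapped with `Restaurant(*row)`, after which `r.county` reads more clearly than `row[7]`.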

Data preview: [image: preview of the dataset]

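3. Requirements Analysis

Several interesting analyses can be built on this dataset:

3.1 Overall shop analysis:

Average spend per person: the avg_price field shows each shop's price positioning, which allows price levels to be compared across shops
Rating distribution: the distribution of shop ratings (score) indicates how satisfied customers are with overall service and quality
Review count analysis: the number of reviews per shop (comment_count) is an indicator of a shop's popularity and customer engagement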
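The three indicators above all reduce to summary statistics over one numeric column. Here is a minimal sketch using only the standard library; the `summarize` helper is hypothetical, and the sample values are made up rather than taken from the dataset. It ignores missing or non-numeric cells, which is an assumption about how gaps should be treated.

```python
import statistics


def summarize(values):
    """Return count, mean, median, min and max of the numeric entries in values."""
    nums = [
        float(v) for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if not nums:
        return None
    return {
        "count": len(nums),
        "mean": round(statistics.mean(nums), 2),
        "median": statistics.median(nums),
        "min": min(nums),
        "max": max(nums),
    }


# Made-up avg_price column containing a missing cell and a text cell
prices = [30, 60, None, 120, "n/a", 50]
summary = summarize(prices)
print(summary)
```

The same helper could be applied to the score and comment_count columns, for example `summarize(row[3] for row in data_list)` once the rows are loaded.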

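3.2 Cuisine analysis:

Popular cuisines: the cuisine category (cuis_name) shows which cuisines are most common in the market and how cuisine preferences differ across cities or districts
Rating comparison by cuisine: the average rating of each cuisine shows how well customers like the different cuisines overall

3.3 Geographic distribution analysis:

Shop distribution by city and district: the distribution of shops across cities and districts shows which areas have a high shop density and which may be gaps in the market
Location and operating status: the address, district (county), and opening hours (dis_stus) fields can be used to analyze the geographic distribution of shops and their operating status

3.4 User review analysis:

Review content analysis: the text of user reviews shows what users think about service, food quality, environment, and other specifics
Rating breakdown analysis: users' detailed sub-ratings (scores) show what they emphasize when rating, such as taste or service attitude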

4. Python Implementation

4.1 Loading the data

The rows are stored in a list (the first row is skipped because it holds the column names)

from openpyxl import load_workbook

# Load the Excel file
workbook = load_workbook(filename="guangzhou_1000restaurants.xlsx")

# Get the active worksheet
sheet = workbook.active

# Read every data row as a tuple of cell values;
# min_row=2 skips the header row, which holds the column names
data_list = []
for row in sheet.iter_rows(min_row=2, values_only=True):
    data_list.append(row)
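Instead of discarding the header row, it can also be used to label each value. The sketch below is one possible variant, not the approach used in the rest of this post; `rows_to_dicts` is a hypothetical helper, and the sample rows are made up in the shape that `sheet.iter_rows(values_only=True)` yields.

```python
def rows_to_dicts(rows):
    """Turn an iterable of row tuples, header first, into a list of dicts keyed by column name."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return []
    return [dict(zip(header, row)) for row in rows]


# Made-up rows: a header tuple followed by data tuples
raw_rows = [
    ("id", "name", "avg_price"),
    (1, "Example Diner", 58),
    (2, "Sample Cafe", None),
]
records = rows_to_dicts(raw_rows)
print(records[0]["name"])  # Example Diner
```

With dict records the later steps could read `record["cuis_name"]` instead of a positional index, at the cost of slightly more memory per row.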

4.2 Cuisine distribution of Guangzhou restaurants (pie chart)

Source code:

from collections import Counter

from pyecharts import options as opts
from pyecharts.charts import Pie

# Collect the cuisine of every restaurant (sixth column of each row),
# labelling missing values as '其他' ("other")
cuisine_lis = ['其他' if row[5] is None else row[5] for row in data_list]

# Count the number of restaurants per cuisine
cuisine_count = Counter(cuisine_lis)

# Split the counts into cuisine names and their totals
x_data = list(cuisine_count.keys())
y_data = list(cuisine_count.values())

(
    Pie()
    .add(
        series_name="菜系",
        data_pair=[list(z) for z in zip(x_data, y_data)],
        radius=["50%", "70%"],
        label_opts=opts.LabelOpts(is_show=False, position="center"),
    )
    # Vertical legend on the left (the original value "legft" was a typo)
    .set_global_opts(legend_opts=opts.LegendOpts(pos_left="left", orient="vertical"))
    .set_series_opts(
        # Tooltip shows series name, cuisine, count, and percentage
        tooltip_opts=opts.TooltipOpts(
            trigger="item", formatter="{a} <br/>{b}: {c} ({d}%)"
        ),
    )
    .render("广州美食菜系分布.html")
)

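Visualization: [image: pie chart of the cuisine distribution]
Result analysis: Guangzhou restaurants cover a wide variety of cuisines. Four categories (Cantonese cuisine, fast food and light meals, snacks, and bakery and desserts) together account for roughly 75% of the restaurants, with Cantonese cuisine the largest at 31.5%.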
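The percentages quoted in the result analysis can also be computed directly instead of being read off the chart. This is a small sketch under stated assumptions: `cuisine_shares` is a hypothetical helper, and the sample list is made up, so its numbers are not the dataset's.

```python
from collections import Counter


def cuisine_shares(cuisines, top_n=4):
    """Return the top_n (label, count, percent) tuples and their combined percentage."""
    labels = ["Other" if c is None else c for c in cuisines]
    counts = Counter(labels)
    total = sum(counts.values())
    top = [
        (label, n, round(100 * n / total, 1))
        for label, n in counts.most_common(top_n)
    ]
    combined = round(100 * sum(n for _, n, _ in top) / total, 1)
    return top, combined


# Made-up sample: 10 restaurants, one with a missing cuisine
sample = ["Cantonese"] * 5 + ["Snacks"] * 3 + ["Sichuan"] + [None]
top, combined = cuisine_shares(sample, top_n=2)
print(top)       # [('Cantonese', 5, 50.0), ('Snacks', 3, 30.0)]
print(combined)  # 80.0
```

On the real data the input would be the cuisine column, for example `cuisine_shares(row[5] for row in data_list)`.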

4.3 Average spend per person by district (bar chart)

Source code:

import random

from pyecharts import options as opts
from pyecharts.charts import Bar

# Running total of avg_price and number of shops for each district
county_price_sum = {}
county_shop_count = {}

# Walk through the rows and accumulate the price per district
for row in data_list:
    price = row[2]
    county = row[7]

    # Accept numeric prices and numeric strings; skip everything else
    if isinstance(price, (int, float)):
        price = float(price)
    elif isinstance(price, str) and price.replace('.', '', 1).isdigit():
        price = float(price)
    else:
        print(f"Warning: price is not a valid number, skipping: price={price}, county={county}")
        continue

    if isinstance(county, str):
        county_price_sum[county] = county_price_sum.get(county, 0) + price
        county_shop_count[county] = county_shop_count.get(county, 0) + 1
    else:
        print(f"Warning: unexpected district value, county={county}")

# Average price per district, rounded to two decimals.
# Every district in county_price_sum has at least one shop, so the division is safe.
county_list = list(county_price_sum)
avg_price_list_formatted = [
    round(county_price_sum[county] / county_shop_count[county], 2)
    for county in county_list
]

# Random hex colour for each bar
def generate_random_color():
    return "#{:02x}{:02x}{:02x}".format(
        random.randint(0, 255), random.randint(0, 255), random.randint(0, 255)
    )

# Create the bar chart (the original created the Bar object twice)
bar = Bar()
bar.add_xaxis(county_list)

# ItemStyleOpts takes a single colour rather than a list, so each bar
# is passed as a BarItem carrying its own item style
bar.add_yaxis(
    "",
    [
        opts.BarItem(
            name=county,
            value=price,
            itemstyle_opts=opts.ItemStyleOpts(color=generate_random_color()),
        )
        for county, price in zip(county_list, avg_price_list_formatted)
    ],
    label_opts=opts.LabelOpts(is_show=True, position="top"),
)

# Set global options and render
bar.set_global_opts(
    title_opts=opts.TitleOpts(title="各区平均消费价格"),
    datazoom_opts=opts.DataZoomOpts(),
)
bar.render("各区美食人均消费.html")

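Visualization: [image: bar chart of average spend per person by district]
Result analysis: average spend per person exceeds 44 yuan in every district, which suggests that dining out in Guangzhou is fairly expensive. Tianhe District is the most expensive at 97.46 yuan per person, well ahead of the other districts.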
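Statements such as "the most expensive district" are easier to verify when the per-district averages are sorted before plotting. The sketch below shows one way to group, average, and rank with the standard library; `mean_by_group` is a hypothetical helper, and the district names and prices are made up.

```python
from collections import defaultdict


def mean_by_group(pairs):
    """Average the numeric values per group key, sorted from highest to lowest mean."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for key, value in pairs:
        if not isinstance(key, str):
            continue
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        totals[key][0] += value
        totals[key][1] += 1
    means = {key: round(s / n, 2) for key, (s, n) in totals.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


# Made-up (district, avg_price) pairs, including two unusable rows
sample = [
    ("District A", 40), ("District B", 100), ("District A", 60),
    ("District B", 90), (None, 10), ("District A", None),
]
ranked = mean_by_group(sample)
print(ranked)  # [('District B', 95.0), ('District A', 50.0)]
```

Feeding the sorted names and values to the x and y axes in this order would put the most expensive district first in the chart.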

4.4 Average rating by cuisine (word cloud)

Source code:

from pyecharts import options as opts
from pyecharts.charts import WordCloud
from pyecharts.globals import SymbolType

# Total rating and number of rated shops for each cuisine
cuisine_scores = {}
cuisine_counts = {}

# Walk through the shops, skipping rows with a missing or zero score
# or with no cuisine
for shop in data_list:
    score = shop[3]
    cuisine = shop[5]

    if score is None or score == 0 or not cuisine:
        continue

    # Accumulate the rating total and shop count for this cuisine
    cuisine_scores[cuisine] = cuisine_scores.get(cuisine, 0) + score
    cuisine_counts[cuisine] = cuisine_counts.get(cuisine, 0) + 1

# (cuisine, average rating) pairs; every cuisine here has at least one rated shop
avg_scores = [
    (cuisine, cuisine_scores[cuisine] / cuisine_counts[cuisine])
    for cuisine in cuisine_scores
]

# Build the word cloud; a higher average rating gives a larger word
(
    WordCloud()
    .add("", avg_scores, word_size_range=[20, 100], shape=SymbolType.DIAMOND)
    .set_global_opts(title_opts=opts.TitleOpts(title="各菜系的平均评分"))
    .render("各菜系的平均评分.html")
)

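Visualization: [image: word cloud of average rating by cuisine]
Result analysis: Korean cuisine, creative cuisine, and Hubei cuisine have the highest average ratings.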
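A plain average can put a cuisine at the top on the strength of very few restaurants. One way to check how robust the ranking is would be to require a minimum number of rated shops per cuisine. The sketch below illustrates the idea; `rank_cuisines` is a hypothetical helper, the threshold of 3 is an arbitrary assumption, and the sample data is made up.

```python
from collections import defaultdict


def rank_cuisines(pairs, min_count=3):
    """Rank cuisines by mean score, keeping only those with at least min_count rated shops."""
    scores = defaultdict(list)
    for cuisine, score in pairs:
        # Skip missing cuisines and missing, non-numeric, or zero scores
        if cuisine and isinstance(score, (int, float)) and score > 0:
            scores[cuisine].append(score)
    ranked = [
        (cuisine, round(sum(vals) / len(vals), 2), len(vals))
        for cuisine, vals in scores.items()
        if len(vals) >= min_count
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Made-up (cuisine, score) pairs: "X" has a single perfect score
sample = [
    ("X", 5.0),
    ("Y", 3.5), ("Y", 4.0), ("Y", 4.5),
    ("Z", 4.0), ("Z", 4.5), ("Z", 5.0),
    (None, 4.0), ("Z", 0),
]
print(rank_cuisines(sample))               # [('Z', 4.5, 3), ('Y', 4.0, 3)]
print(rank_cuisines(sample, min_count=1))  # 'X' now ranks first with one shop
```

Comparing the two outputs shows how a single-shop cuisine can lead the ranking when no threshold is applied.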

4.5 Number of restaurants by district (donut chart)

Source code:

from pyecharts import options as opts
from pyecharts.charts import Pie

# Dictionary holding the number of restaurants in each district
restaurant_count_by_county = {}

# Walk through data_list and count the restaurants per district
for line in data_list:
    county = line[7]  # district, assumed to be the eighth column of each row
    if county:
        # dict.get returns the current count, or 0 if the district is new
        restaurant_count = restaurant_count_by_county.get(county, 0)
        # Increment the count for this district
        restaurant_count_by_county[county] = restaurant_count + 1


# Split the dictionary into district names and their counts
counties = list(restaurant_count_by_county.keys())
counts = list(restaurant_count_by_county.values())

# Donut chart: the inner and outer radius create the ring shape
c = (
    Pie()
    .add(
        "",
        [list(z) for z in zip(counties, counts)],
        radius=["40%", "75%"],
    )
    .set_global_opts(
        title_opts=opts.TitleOpts(title="各区县餐厅数量"),
        legend_opts=opts.LegendOpts(orient="vertical", pos_top="15%", pos_left="2%"),
    )
    .set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c}"))
    .render("各区县餐厅数量.html")
)

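Visualization: [image: donut chart of restaurant counts by district]
Result analysis: among the 1,000 restaurants in this dataset, Yuexiu District has the most with 280, and Conghua District has the fewest, with only 5 recorded.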

4.6 Top ten restaurants by number of reviews (word cloud)

Source code:

from pyecharts import options as opts
from pyecharts.charts import WordCloud

# Build a list of (restaurant name, comment count) pairs,
# skipping rows without a comment count so the sort cannot fail on None
restaurant_list = []
for line in data_list:
    name = line[1]
    comment = line[4]
    if comment is not None:
        restaurant_list.append((name, comment))

# Sort by comment count (the second element) in descending order
restaurant_list.sort(key=lambda x: x[1], reverse=True)

# Keep the ten restaurants with the most comments
top_ten_restaurants = restaurant_list[:10]

print(top_ten_restaurants)

# Build the word cloud
(
    WordCloud()
    .add(series_name="评论数前十餐厅", data_pair=top_ten_restaurants, word_size_range=[6, 50])
    .set_global_opts(
        title_opts=opts.TitleOpts(
            title="评论数前十餐厅", title_textstyle_opts=opts.TextStyleOpts(font_size=23)
        ),
        tooltip_opts=opts.TooltipOpts(is_show=True),
    )
    .render("评论数前十餐厅.html")
)
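As an alternative to sorting the full list, the standard library's `heapq.nlargest` selects the top entries directly. The sketch below also filters out rows whose comment count is missing or non-numeric, since comparing `None` with a number raises a `TypeError` in Python 3. `top_by_comments` is a hypothetical helper and the sample rows are made up.

```python
import heapq


def top_by_comments(rows, n=10):
    """Return the n (name, comment_count) pairs with the largest counts, skipping non-numeric counts."""
    valid = [
        (name, count) for name, count in rows
        if isinstance(count, (int, float)) and not isinstance(count, bool)
    ]
    return heapq.nlargest(n, valid, key=lambda pair: pair[1])


# Made-up (name, comment_count) pairs, one with a missing count
sample = [("A", 120), ("B", None), ("C", 300), ("D", 45)]
top_two = top_by_comments(sample, n=2)
print(top_two)  # [('C', 300), ('A', 120)]
```

For 1,000 rows the performance difference from a full sort is negligible; the main benefit here is the explicit handling of missing values.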