Python Project Walkthrough: Hotel Demand Analysis (hotel booking demand)

Author: 菜鸟努力点吧

Article link: https://blog.csdn.net/qq_45093647/article/details/106986855


Hotel Booking Demand Analysis Report

Data source: https://www.kaggle.com/jessemostipak/hotel-booking-demand
The dataset contains booking information for a city hotel and a resort hotel.

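hotel: hotel type (H1 = resort hotel, H2 = city hotel)
is_canceled: flag indicating whether the booking was canceled (1) or not (0)
lead_time: number of days between the date the booking was entered into the system and the arrival date
arrival_date_year: year of the arrival date
arrival_date_month: month of the arrival date
arrival_date_week_number: week number of the year for the arrival date
arrival_date_day_of_month: day of the month of the arrival date
stays_in_weekend_nights: number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel
stays_in_week_nights: number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel
adults: number of adults
children: number of children
babies: number of babies
meal: type of meal booked. Categories are presented as standard hospitality meal packages:
- Undefined/SC: no meal package;
- BB: bed and breakfast;
- HB: half board (breakfast and one other meal, usually dinner);
- FB: full board (breakfast, lunch and dinner)
country: country of origin, given as three-letter country codes (the dataset description cites the format as ISO 3155-3:2013, apparently meaning ISO 3166)
market_segment: market segment designation
distribution_channel: booking distribution channel. "TA" means travel agents and "TO" means tour operators
is_repeated_guest: flag indicating whether the booking came from a repeated guest (1) or not (0)
previous_cancellations: number of previous bookings that the customer canceled before the current booking
previous_bookings_not_canceled: number of previous bookings that the customer did not cancel before the current booking
reserved_room_type: code of the room type reserved. A code is shown instead of the room designation for anonymity reasons.
assigned_room_type: code of the room type assigned to the booking. The assigned room type sometimes differs from the reserved one for hotel operation reasons (e.g. overbooking) or at the customer's request. A code is shown instead of the room designation for anonymity reasons.
booking_changes: number of changes/amendments made to the booking from the moment it was entered in the PMS until check-in or cancellation
deposit_type: indicates whether the customer made a deposit to guarantee the booking. This variable takes one of three categories:
- No Deposit: no deposit was made;
- Non Refund: the full cost of the stay was paid in advance and is not refunded on cancellation;
- Refundable: part of the cost of the stay was paid in advance and is refunded on cancellation.
agent: ID of the travel agency that made the booking
company: ID of the company/entity that made the booking or is responsible for paying for it. An ID is shown instead of the name for anonymity reasons
days_in_waiting_list: number of days the booking was on the waiting list before it was confirmed to the customer
customer_type: type of booking, one of four categories:
- Contract: the booking has an allotment or another type of contract associated with it;
- Group: the booking is associated with a group;
- Transient: the booking is not part of a group or contract and is not associated with other transient bookings;
- Transient-party: the booking is transient but is associated with at least one other transient booking
adr: average daily rate (ADR), i.e. the sum of all lodging transactions divided by the total number of staying nights
required_car_parking_spaces: number of car parking spaces requested by the customer
total_of_special_requests: number of special requests made by the customer (e.g. twin bed or high floor)
reservation_status: last status of the reservation, one of three categories:
- Canceled: the booking was canceled by the customer;
- Check-Out: the customer checked in and has already departed;
- No-Show: the customer did not check in and did inform the hotel of the reason why
reservation_status_date: date on which the last status was set. It can be used together with reservation_status to find out when the booking was canceled or when the customer checked out.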
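
To make the ADR definition concrete, here is a minimal sketch on made-up numbers. The three bookings and the `lodging_total` column are hypothetical (the Kaggle dataset only ships the already-computed `adr`); the sketch simply divides the lodging amount by the total number of nights, where total nights is assumed to be the sum of the two stay columns.

```python
import pandas as pd

# Hypothetical bookings; `lodging_total` is an illustrative assumption
# and does not exist in the dataset.
toy = pd.DataFrame({
    "stays_in_weekend_nights": [2, 0, 1],
    "stays_in_week_nights": [3, 2, 4],
    "lodging_total": [500.0, 150.0, 400.0],
})

# Total nights per booking = weekend nights + week nights
toy["total_nights"] = toy["stays_in_weekend_nights"] + toy["stays_in_week_nights"]

# ADR = sum of lodging transactions / total number of staying nights
toy["adr"] = toy["lodging_total"] / toy["total_nights"]

print(toy[["total_nights", "adr"]])
```

Read the other way round, `adr * total_nights` gives a rough per-booking lodging amount, which is one possible way to approximate revenue from this dataset.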

2. Data processing

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
# Render plots inline (Jupyter/IPython magic; remove it when running as a plain script)
%matplotlib inline
# Use a font that can display Chinese characters
plt.rcParams['font.sans-serif']=['SimHei']
# Display minus signs correctly
plt.rcParams['axes.unicode_minus']=False
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Load the data
df=pd.read_csv('hotel_booking_demand.csv',encoding='gbk')
df.head()

	hotel	is_canceled	lead_time	arrival_date_year	arrival_date_month	arrival_date_week_number	arrival_date_day_of_month	stays_in_weekend_nights	stays_in_week_nights	adults	...	deposit_type	agent	company	days_in_waiting_list	customer_type	adr	required_car_parking_spaces	total_of_special_requests	reservation_status	reservation_status_date
0	Resort Hotel	0	342	2015	July	27	1	0	0	2	...	No Deposit	NaN	NaN	0	Transient	0.0	0	0	Check-Out	2015-07-01
1	Resort Hotel	0	737	2015	July	27	1	0	0	2	...	No Deposit	NaN	NaN	0	Transient	0.0	0	0	Check-Out	2015-07-01
2	Resort Hotel	0	7	2015	July	27	1	0	1	1	...	No Deposit	NaN	NaN	0	Transient	75.0	0	0	Check-Out	2015-07-02
3	Resort Hotel	0	13	2015	July	27	1	0	1	1	...	No Deposit	304.0	NaN	0	Transient	75.0	0	0	Check-Out	2015-07-02
4	Resort Hotel	0	14	2015	July	27	1	0	2	2	...	No Deposit	240.0	NaN	0	Transient	98.0	0	1	Check-Out	2015-07-03
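
The `...` in the preview above means pandas hid the middle columns. If you want to see all 32 of them, one option is to lift the display limit temporarily. The sketch below uses a toy frame with placeholder column names (`col_0` to `col_31` are assumptions, not the real column names) and the pandas display options `display.max_columns` and `display.width`.

```python
import pandas as pd

# Toy frame with 32 columns, standing in for the real dataset (hypothetical data)
wide = pd.DataFrame({f"col_{i}": [i, i + 1] for i in range(32)})

# Temporarily lift the column limit so that no column is elided
with pd.option_context("display.max_columns", None, "display.width", 1000):
    full_repr = repr(wide.head())

print(full_repr)
```

Using `pd.option_context` keeps the change local to the `with` block, so the rest of the notebook keeps its default display settings.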

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 119390 entries, 0 to 119389
Data columns (total 32 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   hotel                           119390 non-null  object 
 1   is_canceled                     119390 non-null  int64  
 2   lead_time                       119390 non-null  int64  
 3   arrival_date_year               119390 non-null  int64  
 4   arrival_date_month              119390 non-null  object 
 5   arrival_date_week_number        119390 non-null  int64  
 6   arrival_date_day_of_month       119390 non-null  int64  
 7   stays_in_weekend_nights         119390 non-null  int64  
 8   stays_in_week_nights            119390 non-null  int64  
 9   adults                          119390 non-null  int64  
 10  children                        119386 non-null  float64
 11  babies                          119390 non-null  int64  
 12  meal                            119390 non-null  object 
 13  country                         118902 non-null  object 
 14  market_segment                  119390 non-null  object 
 15  distribution_channel            119390 non-null  object 
 16  is_repeated_guest               119390 non-null  int64  
 17  previous_cancellations          119390 non-null  int64  
 18  previous_bookings_not_canceled  119390 non-null  int64  
 19  reserved_room_type              119390 non-null  object 
 20  assigned_room_type              119390 non-null  object 
 21  booking_changes                 119390 non-null  int64  
 22  deposit_type                    119390 non-null  object 
 23  agent                           103050 non-null  float64
 24  company                         6797 non-null    float64
 25  days_in_waiting_list            119390 non-null  int64  
 26  customer_type                   119390 non-null  object 
 27  adr                             119390 non-null  float64
 28  required_car_parking_spaces     119390 non-null  int64  
 29  total_of_special_requests       119390 non-null  int64  
 30  reservation_status              119390 non-null  object 
 31  reservation_status_date         119390 non-null  object 
dtypes: float64(4), int64(16), object(12)
memory usage: 29.1+ MB

The dataset has 119,390 rows (index 0 to 119389) and 32 columns, and some columns contain missing values.
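
The row and column counts can also be read directly from `df.shape`. Note that a default `RangeIndex` starts at 0, so the last label is one less than the number of rows; `0 to 119389` in the `info()` output therefore corresponds to 119390 entries. A minimal sketch on a toy frame (the data is made up):

```python
import pandas as pd

toy = pd.DataFrame({"a": [10, 20, 30], "b": ["x", "y", "z"]})

n_rows, n_cols = toy.shape      # (number of rows, number of columns)
last_label = toy.index[-1]      # last label of the default RangeIndex

# The row count is the last label plus one
print(n_rows, n_cols, last_label)
```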

# Find the columns that contain missing values
df.isnull().sum()[df.isnull().sum()!=0]

children         4
country        488
agent        16340
company     112593
dtype: int64

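The columns children, country, agent and company contain missing values.

children is most likely missing because no children were part of the booking, so the missing values can be filled with 0;

country can be filled with the mode (the most frequent value);

agent and company have too many missing values (about 13.7% and 94.3% of the rows, respectively), so these two columns can be dropped.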
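
When deciding whether to fill or drop a column, the share of missing values is often easier to judge than the raw count. The sketch below shows one way to compute it with `isnull().mean()`; the toy values are made up and only the column names mirror the dataset.

```python
import numpy as np
import pandas as pd

# Toy data with made-up values; only the column names follow the dataset
toy = pd.DataFrame({
    "children": [0.0, np.nan, 1.0, 0.0],
    "agent": [9.0, np.nan, np.nan, 240.0],
    "company": [np.nan, np.nan, np.nan, 40.0],
    "adults": [2, 2, 1, 2],
})

# The mean of a boolean mask is the fraction of missing values per column
missing_ratio = toy.isnull().mean()

# Keep only columns with missing values, largest share first
missing_ratio = missing_ratio[missing_ratio > 0].sort_values(ascending=False)
print(missing_ratio)
```

On the real data the same two lines applied to `df` would give the shares behind the counts listed above.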

# Handle missing values
df['children']=df['children'].fillna(0)
df['country']=df['country'].fillna(value=df.country.mode()[0])
df.drop(['agent'],axis=1,inplace=True)
df.drop(['company'],axis=1,inplace=True)

# Verify that no missing values remain
df.isnull().sum()[df.isnull().sum()!=0]

Series([], dtype: int64)
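
The same cleaning steps can also be written without `inplace=True`, which leaves the original frame untouched and makes the steps easy to re-run. This is an alternative sketch on a small made-up frame, not the code used in this analysis.

```python
import numpy as np
import pandas as pd

# Made-up rows; only the column names follow the dataset
toy = pd.DataFrame({
    "children": [0.0, np.nan, 2.0],
    "country": ["PRT", None, "PRT"],
    "agent": [np.nan, 304.0, np.nan],
    "company": [np.nan, np.nan, np.nan],
})

cleaned = (
    toy.assign(
        # No value recorded is treated as "no children"
        children=toy["children"].fillna(0),
        # Fill with the most frequent country code
        country=toy["country"].fillna(toy["country"].mode()[0]),
    )
    # Drop both sparse columns in a single call
    .drop(columns=["agent", "company"])
)

remaining = int(cleaned.isnull().sum().sum())
print(cleaned)
print("missing values left:", remaining)
```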

# Change the data type (convert reservation_status_date to datetime)
df['reservation_status_date']=df['reservation_status_date'].astype('datetime64[ns]')
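
An alternative to `astype` is `pd.to_datetime`, which accepts an explicit format and can turn unparseable strings into `NaT` instead of raising an error. The sketch below assumes the `YYYY-MM-DD` format seen in the preview rows; the third value is deliberately invalid and made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "reservation_status_date": ["2015-07-01", "2015-07-02", "not a date"],
})

# errors="coerce" converts values that do not match the format into NaT
parsed = pd.to_datetime(
    toy["reservation_status_date"], format="%Y-%m-%d", errors="coerce"
)

print(parsed)
print("unparseable values:", int(parsed.isna().sum()))
```

Once the column is a datetime type, the `.dt` accessor (for example `parsed.dt.year` or `parsed.dt.month`) is available for grouping by year or month.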
