What's the difference between high-price and low-price houses on Airbnb?



1. Business understanding

Problem I want to solve:
I split the Seattle houses into two groups by price. High-price houses cost more than the median price (119); low-price houses cost less than the median price (119).
I then want to answer three questions:
Question 1. What is the difference between high-price and low-price houses?
Question 2. If you host a low-price or high-price house, what should you do to improve the review score value?
Question 3. If we are hosts and want to become superhosts, what should we do, depending on whether our house is high-price or low-price?
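The split itself can be sketched in a few lines of pandas. This is a minimal illustration on toy prices, not the original code: the frame `listings`, the column `price_group`, and the choice to put listings priced exactly at the median into the low-price group are all assumptions, since the text does not say how ties are handled.

```python
import pandas as pd

# Toy stand-in for the Seattle listings table; the real data has many more columns.
listings = pd.DataFrame({"price": [60.0, 95.0, 119.0, 150.0, 300.0]})

median_price = listings["price"].median()

# Assumption: a listing priced exactly at the median goes to the low-price group.
is_high = listings["price"] > median_price
listings["price_group"] = is_high.map({True: "high", False: "low"})

high = listings[listings["price_group"] == "high"]
low = listings[listings["price_group"] == "low"]
```

Keeping the group label as a column, rather than two separate frames only, makes later group-wise comparisons a single `groupby` call.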

2. Data understanding

2.1 Load the data

[Figure: data preview]

2.2 Preview the data

The data fall mainly into the following groups:
Host information
host_response_time, host_response_rate, host_is_superhost, host_listings_count, host_total_listings_count
House hardware information
neighbourhood_group_cleansed, zipcode, property_type, room_type, accommodates, bathrooms, bedrooms
House other information
price, security_deposit, cleaning_fee, minimum_nights, maximum_nights, availability_365, instant_bookable, cancellation_policy
House score information
review_scores_rating, review_scores_accuracy, review_scores_cleanliness, review_scores_checkin, review_scores_communication, review_scores_location, review_scores_value

The table below shows, for each column:

n. The length of the column (its number of values).

type. The data type of the column.

mean, std, min. The mean, standard deviation, and minimum of the column; these are null if the column is of object type.

25%, 50%, 75%. The quartiles of the column.

value0, value1, value2, value3, value4, value5. The values with the highest proportions in the column (the top five).

freq0, freq1, freq2, freq3, freq4. The counts of those top values.

freqNull, freqOther. The counts of null values and of all remaining (other) values in the column.

[Figure: data preview table]
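A per-column summary like the one described above could be built roughly as follows. This is a sketch under assumptions: the function name `summarize` and the toy frame `sample` are hypothetical, and the original table may have been produced differently.

```python
import pandas as pd


def summarize(df: pd.DataFrame, top_k: int = 5) -> pd.DataFrame:
    """One row per column: size, dtype, basic stats, top values and their counts."""
    rows = {}
    for col in df.columns:
        s = df[col]
        row = {"n": len(s), "type": str(s.dtype)}
        is_number = pd.api.types.is_numeric_dtype(s) and not pd.api.types.is_bool_dtype(s)
        if is_number:
            desc = s.describe()
            for key in ["mean", "std", "min", "25%", "50%", "75%"]:
                row[key] = desc[key]
        counts = s.value_counts()  # missing values are excluded by default
        for i, (value, freq) in enumerate(counts.head(top_k).items()):
            row[f"value{i}"] = value
            row[f"freq{i}"] = freq
        row["freqNull"] = int(s.isna().sum())
        row["freqOther"] = int(counts.iloc[top_k:].sum())
        rows[col] = row
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy data for illustration only.
sample = pd.DataFrame({
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt", None],
    "bedrooms": [1.0, 1.0, 2.0, 3.0],
})
summary = summarize(sample)
```

Statistics that do not apply to a column (for example `mean` for an object column) are simply left empty in the resulting table.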

Discussion:
The table above shows that several features have only a single value, a high missing rate, or one value that dominates the column. Such features add little to the analysis, so we handle them first.

3. Data preparation: data cleaning

3.1 First pass

1. Single-value process.

If a feature has only one unique value, it carries no information for our analysis. This step removes scrape_id, experiences_offered, and so on.

[Figure: features removed by the single-value process]

2. Null-value process.

If a feature has a missing rate above 0.85, it has no value for our analysis. This step removes thumbnail_url, xl_picture_url, and so on.

[Figure: features removed by the null-value process]

3. Big-proportion process.

If a single value accounts for more than 0.9 of a feature, the feature has little value for our analysis. This step removes host_has_profile_pic, street, and so on.

[Figure: features removed by the big-proportion process]

[Figure: data review after the first cleaning step]
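The three filters above (single value, missing rate above 0.85, one value above 0.9) can be sketched as a single function. The function name `first_pass_drop` and the toy values are hypothetical; only the thresholds and the example column names come from the text.

```python
import pandas as pd


def first_pass_drop(df, max_missing=0.85, max_dominant=0.9):
    """Return the cleaned frame and the dropped columns, grouped by reason."""
    dropped = {"single_value": [], "missing": [], "dominant": []}
    for col in df.columns:
        s = df[col]
        if s.nunique(dropna=False) <= 1:
            dropped["single_value"].append(col)
        elif s.isna().mean() > max_missing:
            dropped["missing"].append(col)
        elif s.value_counts(normalize=True, dropna=False).iloc[0] > max_dominant:
            dropped["dominant"].append(col)
    to_drop = [c for cols in dropped.values() for c in cols]
    return df.drop(columns=to_drop), dropped


# Toy data: 20 rows, values invented for illustration.
n = 20
raw = pd.DataFrame({
    "scrape_id": [1] * n,                                  # one unique value
    "thumbnail_url": ["a.jpg", "b.jpg"] + [None] * (n - 2),  # 90% missing
    "host_has_profile_pic": ["t"] * (n - 1) + ["f"],       # 95% one value
    "price": list(range(50, 50 + n)),
})
clean, dropped = first_pass_drop(raw)
```

Returning the dropped columns by reason makes it easy to print the "removed features" lists for each step.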

3.2 Choose variables to keep observing

To find the factors that most affect repeat bookings within each price range, variables are chosen from the following aspects.

After the first cleaning step, we keep the following features to observe:

  1. Host information. host_response_time, host_response_rate, host_is_superhost, host_listings_count, host_total_listings_count
  2. House hardware information. neighbourhood_group_cleansed, zipcode, property_type, room_type, accommodates, bathrooms, bedrooms
  3. House other information. price, security_deposit, cleaning_fee, minimum_nights, maximum_nights, availability_365, instant_bookable, cancellation_policy
  4. House score information. review_scores_rating, review_scores_accuracy, review_scores_cleanliness, review_scores_checkin, review_scores_communication, review_scores_location, review_scores_value

[Figure: data preview after variable selection]

3.3 Variable transformation (targeted processing)

  1. host_response_time. A faster response time suggests better service, so I convert this feature into an ordinal variable: the larger the value, the better the service.
  2. host_response_rate. This should be a numerical value, so I strip the "%" from it.
  3. price, security_deposit, cleaning_fee. These three columns are monetary values, so I strip the "$" from them.
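The three transformations above might look like this in pandas. It is a sketch, not the original code: the category labels in `RESPONSE_ORDER` are assumed to match the dataset, the helper `to_number` is hypothetical, and removing thousands separators (as in "$1,000.00") is an extra step the text does not mention.

```python
import pandas as pd

# Assumed category labels, ordered so that a faster response gets a larger value.
RESPONSE_ORDER = {
    "a few days or more": 1,
    "within a day": 2,
    "within a few hours": 3,
    "within an hour": 4,
}


def to_number(series: pd.Series) -> pd.Series:
    """Strip '$', '%' and thousands separators, then convert to float."""
    cleaned = series.str.replace(r"[$%,]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


# Toy rows for illustration only.
df = pd.DataFrame({
    "host_response_time": ["within an hour", "within a day", None],
    "host_response_rate": ["96%", "100%", None],
    "price": ["$85.00", "$1,000.00", "$150.00"],
    "security_deposit": ["$100.00", None, "$1,000.00"],
    "cleaning_fee": ["$25.00", None, "$40.00"],
})

df["host_response_time"] = df["host_response_time"].map(RESPONSE_ORDER)
df["host_response_rate"] = to_number(df["host_response_rate"])
for col in ["price", "security_deposit", "cleaning_fee"]:
    df[col] = to_number(df[col])
```

Missing values stay missing after these steps; they are handled in the later numerical and categorical processing.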

3.4 Numerical variable processing

We process the selected numerical features (num_features) as follows:

  1. If the missing rate is more than 0.6, delete the variable and add a column indicating whether the value was null.
  2. If the missing rate is less than 0.6, fill each missing value with a value drawn at random from the non-missing values.
[Figure: features removed during numerical variable processing]
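The two numerical rules can be sketched as below. The function name `process_numeric`, the `_is_null` suffix for the indicator column, the fixed random seed, and the toy data are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def process_numeric(df, num_features, max_missing=0.6, seed=0):
    rng = np.random.default_rng(seed)
    out = df.copy()
    for col in num_features:
        missing = out[col].isna()
        if missing.mean() > max_missing:
            # Keep only the information "was this value missing?" (hypothetical column name).
            out[f"{col}_is_null"] = missing.astype(int)
            out = out.drop(columns=col)
        elif missing.any():
            # Fill each gap with a value drawn at random from the observed values.
            observed = out.loc[~missing, col].to_numpy()
            out.loc[missing, col] = rng.choice(observed, size=int(missing.sum()))
    return out


# Toy data for illustration only.
df = pd.DataFrame({
    "square_feet": [np.nan, np.nan, np.nan, 800.0],  # 75% missing
    "bathrooms": [1.0, np.nan, 2.0, 1.0],            # 25% missing
})
result = process_numeric(df, ["square_feet", "bathrooms"])
```

Random filling keeps the observed distribution of a column roughly intact, whereas filling with the mean or median would pile values up at one point.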

3.5 Categorical variable processing

We process the categorical variables as follows:

  1. If the missing rate is more than 0.8, delete the variable; otherwise fill the missing values with the special value "-1".
  2. Apply one-hot encoding.

[Figure: data review after categorical variable processing]
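A minimal sketch of the categorical step, assuming `pd.get_dummies` is used for the one-hot encoding (the text does not name the tool). The function `process_categorical` and the column `rare_note` are hypothetical.

```python
import pandas as pd


def process_categorical(df, cat_features, max_missing=0.8, fill_value="-1"):
    out = df.copy()
    keep = []
    for col in cat_features:
        if out[col].isna().mean() > max_missing:
            out = out.drop(columns=col)
        else:
            out[col] = out[col].fillna(fill_value)
            keep.append(col)
    # One-hot encode the surviving categorical columns as 0/1 integers.
    return pd.get_dummies(out, columns=keep, dtype=int)


# Toy data for illustration only.
df = pd.DataFrame({
    "price": [85.0, 150.0, 60.0, 300.0, 120.0, 99.0],
    "host_is_superhost": ["t", "f", "f", None, "t", "f"],
    "rare_note": [None, None, None, None, None, "x"],  # about 83% missing
})
encoded = process_categorical(df, ["host_is_superhost", "rare_note"])
```

With the default separator, the encoded columns are named like host_is_superhost_t and host_is_superhost_f, and the filled missing values get their own column, host_is_superhost_-1.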

3.6 Review again (second pass)

After processing the features as described above, we apply the single-value and big-proportion processes again.

  • 1) Missing values.
  • 2) Single-value process.

[Figure: data preview after the second pass]
Because host_is_superhost_f and host_is_superhost_t are strongly correlated, we keep only one of them.
We do the same for instant_bookable_f and instant_bookable_t.
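The redundancy between the paired dummy columns can be checked directly. In this toy sketch the `_f` columns are dropped and the `_t` columns kept; which column of each pair is kept is an assumption for the example.

```python
import pandas as pd

# Toy one-hot columns for illustration only.
df = pd.DataFrame({
    "host_is_superhost_f": [1, 0, 1, 1],
    "host_is_superhost_t": [0, 1, 0, 0],
    "instant_bookable_f": [1, 1, 0, 1],
    "instant_bookable_t": [0, 0, 1, 0],
})

# When every row is either "f" or "t", the two dummies always sum to 1,
# so their correlation is -1 and one of them carries no extra information.
corr = df["host_is_superhost_f"].corr(df["host_is_superhost_t"])

# Assumption: keep the "_t" column of each pair.
df = df.drop(columns=["host_is_superhost_f", "instant_bookable_f"])
```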

4. EDA (data exploration)

I want to find out:

  1. What is the difference between high-price and low-price houses?
  2. If we are hosts, what should we do to improve the review score, depending on whether our house is high-price or low-price?
  3. For superhosts, what is the difference between high-price and low-price houses?

Question 1: What's the difference between high-price and low-price houses?

Figure explanation:
In the following figures, I take the five most frequent values of each feature, plus the Null and Other categories, and compare their proportions between high-price and low-price houses.
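The numbers behind such a figure can be computed per price group as sketched here. The function `top_shares`, the column `price_group`, and the toy data are hypothetical, and `top_k=2` is used only to keep the example small.

```python
import pandas as pd


def top_shares(series, top_k=5):
    """Shares of the top_k most frequent values, plus the Null and Other shares."""
    shares = series.value_counts(normalize=True, dropna=False)
    null_share = float(series.isna().mean())
    top = shares[shares.index.notna()].head(top_k)
    other_share = max(0.0, 1.0 - null_share - float(top.sum()))
    return top, null_share, other_share


# Toy data for illustration only.
listings = pd.DataFrame({
    "price_group": ["high"] * 6 + ["low"] * 6,
    "accommodates": [4, 4, 2, 6, 4, 2, 2, 2, 1, 2, 4, 1],
})

report = {
    group: top_shares(part["accommodates"], top_k=2)
    for group, part in listings.groupby("price_group")
}
```

Each entry of `report` holds the top values with their proportions, the Null share, and the Other share for one price group, ready to be drawn side by side as bars.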

accommodates
The five most frequent accommodates values for high-price houses are (4, 2, 6, 3, 5), while for low-price houses they are (2, 4, 3, 1, 5).
So high-price houses accommodate more guests than low-price houses.

[Figure: accommodates comparison]

bathrooms
In general, most high-price and low-price houses have only one bathroom, but on average high-price houses have more bathrooms than low-price houses.

[Figure: bathrooms comparison]

bedrooms
As with bathrooms, most houses, whether high-price or low-price, have only one bedroom, but on average high-price houses have more bedrooms than low-price houses.

[Figure: bedrooms comparison]

beds
In general, most high-price and low-price houses have only one bed, but on average high-price houses have more beds than low-price houses.

[Figure: beds comparison]

security deposit
In general, the security deposit of high-price houses is much higher than that of low-price houses.

[Figure: security deposit comparison]

cleaning fee
In general, the cleaning fee of high-price houses is much higher than that of low-price houses.

[Figure: cleaning fee comparison]

minimum_nights
In general, the minimum nights of high-price houses is slightly higher than that of low-price houses.

[Figure: minimum nights comparison]

review scores cleanliness
In general, the cleanliness review scores of high-price houses are slightly lower than those of low-price houses.

[Figure: review scores cleanliness comparison]

host_is_superhost_t
In general, a larger proportion of low-price houses than high-price houses have superhosts.

[Figure: host superhost comparison]

cancellation_policy_flexible
In general, a flexible cancellation policy is more common among low-price houses than among high-price houses.

[Figure: cancellation_policy_flexible comparison]

review_scores_value
In general, the review scores value of high-price houses is slightly higher than that of low-price houses, but the gap is small.

[Figure: review_scores_value comparison]

Question 1: What's the difference between high-price and low-price houses?
Conclusion

  1. House facilities. High-price houses offer more capacity than low-price houses: they accommodate more guests and have more bedrooms, bathrooms, and beds.
  2. House service. Low-price houses perform better than high-price houses here: for example, they charge lower cleaning fees, and a larger proportion of their hosts are superhosts.
  3. Review score value. Price does not influence the review score value very much.

5. Build the model

Question 2: If you host a low-price or high-price house, what should you do to improve the review score value?

[Figure: feature importance for the review score value, low-price houses]

[Figure: feature importance for the review score value, high-price houses]
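The post does not say which model produced these importances. As one possible approach, the sketch below ranks features with a scikit-learn random forest; the choice of `RandomForestRegressor`, the function `rank_features`, and the synthetic data (where the target depends on cleanliness by construction) are all assumptions, so the output illustrates the mechanics rather than any finding about the Seattle data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def rank_features(X: pd.DataFrame, y: pd.Series, seed: int = 0) -> pd.Series:
    """Fit a random forest and return feature importances, largest first."""
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X, y)
    importances = pd.Series(model.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False)


# Synthetic data: the target is built from one feature plus noise.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "review_scores_cleanliness": rng.integers(6, 11, size=n),
    "cleaning_fee": rng.uniform(0, 100, size=n),
    "noise": rng.normal(size=n),
})
y = 0.8 * X["review_scores_cleanliness"] + rng.normal(scale=0.3, size=n)

importance = rank_features(X, y)
```

To compare the two groups, the same function would be called once on the low-price rows and once on the high-price rows, with review_scores_value as the target.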

Question 2: If you host a low-price or high-price house, what should you do to improve the review score value?
Conclusion

  1. The figures above show that guests of both high-price and low-price houses care about review_scores_cleanliness, cleaning_fee, security_deposit, maximum_nights, minimum_nights, and accommodates.
  2. If you host a low-price house, first try to become a superhost; then consider avoiding a strict cancellation policy with a grace period.
  3. If you host a high-price house, pay more attention to beds and bedrooms, and to whether the house is an Apartment.

Question 3: Which features influence whether a host is a superhost, for high-price and low-price houses?

[Figure: feature importance for superhost status, low-price houses]

[Figure: feature importance for superhost status, high-price houses]

Question 3: If we are hosts and want to become superhosts, what should we do, depending on whether our house is high-price or low-price?
Conclusion

The figures above show that hosts of both low-price and high-price houses are influenced by cleaning_fee, maximum_nights, review_scores_value, security_deposit, review_scores_cleanliness, host_reponse_rate_flag, and host_reponse_time_flag.
So for becoming a superhost, there is not much difference between low-price and high-price houses.
