ECON235 Report PART 1

Report Assignment PART 1 (40 marks)

Due date: 5 pm, August 30, 2024. Upload your answers and R script in the Assessment 2 section on Moodle.

Instructions: This is an individual assignment. Use the starter R script from Moodle to create the dataset of hotel prices for your assigned city (London, Paris or Rome). Then answer the questions below and submit your answers along with the full R script. Include your name, student number and tutorial group.

Style: We will not follow any particular style for this part. Just type your answers, include the required tables and graphs, save the document as a PDF file and upload it to Moodle along with your R script.

1.1 (2 marks) Make sure that the 'hotelbookingdata.csv' file is in your Working Directory and then run the starter R script to create a dataset for your assigned city. How many observations does it have?

1.2 (10 marks) Let’s have a look at the following variables:

Variable             Typical value        Type
accommodationtype    _ACCOM_TYPE@Hotel    character string
guestreviewsrating   3.7 /5               character string
distance1            9.0 miles            character string
distance2            7.5 miles            character string

All of them are character strings which contain superfluous characters that we don’t need. Clean these variables so that they look as follows (pay attention to the variable type). Which commands will you use to do this?

Variable     Typical value   Type
acc_type     Hotel, etc.     character string
rating       3.7             double
distance1    9.0             double
distance2    7.5             double
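
One possible approach to this kind of cleaning, sketched here on a small made-up data frame rather than the real dataset: base R's sub() strips the unwanted prefix or suffix and as.numeric() converts the result to a double. The data frame names raw and clean and the second row of values are assumptions for illustration only; the column names and the first row follow the tables above.

```r
# Toy data frame mimicking the raw string formats shown in the first table.
# The second row is invented for illustration.
raw <- data.frame(
  accommodationtype  = c("_ACCOM_TYPE@Hotel", "_ACCOM_TYPE@Apartment"),
  guestreviewsrating = c("3.7 /5", "4.2 /5"),
  distance1          = c("9.0 miles", "0.4 miles"),
  distance2          = c("7.5 miles", "1.1 miles"),
  stringsAsFactors   = FALSE
)

clean <- data.frame(
  # Drop everything up to and including the "@"
  acc_type  = sub("^.*@", "", raw$accommodationtype),
  # Drop the "/5" suffix (and any spaces before it), then convert to double
  rating    = as.numeric(sub(" */5$", "", raw$guestreviewsrating)),
  # Drop the "miles" suffix (and any spaces before it), then convert to double
  distance1 = as.numeric(sub(" *miles$", "", raw$distance1)),
  distance2 = as.numeric(sub(" *miles$", "", raw$distance2)),
  stringsAsFactors = FALSE
)

# Inspect the resulting variable types
str(clean)
```

Other tools (for example the stringr or tidyr packages) can do the same job; the base R version is shown only because it needs no extra packages.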

1.3 (4 marks) Tabulate these four variables to check if there are any missing (NA) values. Restrict the dataset only to observations with non-missing values. How many observations did you have to drop (if any)?
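
As a rough illustration of the missing-value check, the sketch below counts NA values per variable and keeps only complete rows. The data frame hotels and its values are made up for the example; only the four variable names come from the table above.

```r
# Toy data with a few deliberately missing values (invented for illustration)
hotels <- data.frame(
  acc_type  = c("Hotel", "Hotel", NA, "Apartment"),
  rating    = c(3.7, NA, 4.1, 4.5),
  distance1 = c(9.0, 0.4, 1.2, 2.3),
  distance2 = c(7.5, 1.1, 0.9, NA),
  stringsAsFactors = FALSE
)

vars <- c("acc_type", "rating", "distance1", "distance2")

# Number of missing values in each of the four variables
na_counts <- colSums(is.na(hotels[vars]))
print(na_counts)

# Tabulating a single variable with NA shown as its own category
print(table(hotels$acc_type, useNA = "ifany"))

# Keep only rows where all four variables are non-missing
n_before        <- nrow(hotels)
hotels_complete <- hotels[complete.cases(hotels[vars]), ]
n_dropped       <- n_before - nrow(hotels_complete)
print(n_dropped)
```

Recording the row count before and after the restriction, as done here, gives the number of dropped observations directly.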

1.4 (2 marks) Now restrict your dataset only to observations with accommodation type “Hotel” and a star rating of 3 or 4 stars (use the star variable). How many observations did you end up with?

1.5 (2 marks) Produce a summary table showing the standard summary statistics for all variables in your dataset (include the table in your answer).

1.6 (3 marks) Now produce a more detailed table for the two key variables price and distance1. It should include the following statistics: Mean, Median, SD, Min, Max, P25, P75 (include the table in your answer). How do these statistics in your city compare with those in Vienna?
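
A minimal sketch of how such a detailed table could be built in base R is given below. The helper detailed_stats and the toy values are hypothetical, not part of the starter script; with real data containing missing values, na.rm = TRUE would need to be added to each call.

```r
# Hypothetical helper returning the seven requested statistics for one variable
detailed_stats <- function(x) {
  c(
    Mean   = mean(x),
    Median = median(x),
    SD     = sd(x),
    Min    = min(x),
    Max    = max(x),
    P25    = unname(quantile(x, 0.25)),
    P75    = unname(quantile(x, 0.75))
  )
}

# Toy values, invented for illustration
toy <- data.frame(
  price     = c(80, 95, 110, 120, 300),
  distance1 = c(0.3, 0.8, 1.5, 2.0, 9.0)
)

# One column per variable, one row per statistic
stats_table <- sapply(toy[c("price", "distance1")], detailed_stats)
print(round(stats_table, 2))
```

Note that quantile() uses R's default interpolation rule (type 7); other software may report slightly different P25 and P75 values for small samples.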

1.7 (4 marks) Produce histograms showing the distribution of the two variables price and distance1 in your sample. Do you see any extreme values in these two variables in your city? Argue whether you want to keep or discard these extreme values (if there are any) and modify your dataset accordingly (include the graphs in your answer).

1.8 (3 marks) Reproduce the histograms for the two variables with your new dataset, but now add the kernel density graph to each histogram.
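
For orientation, one way to overlay a kernel density on a histogram in base R is sketched below, using simulated prices rather than the hotel data. The key detail is freq = FALSE, which puts the histogram on the density scale so that the two graphs are comparable. The simulated values and the number of breaks are arbitrary assumptions; the same idea can be implemented in ggplot2.

```r
# Simulated prices, for illustration only
set.seed(1)
price <- rlnorm(200, meanlog = log(120), sdlog = 0.4)

# Kernel density estimate with R's default Gaussian kernel and bandwidth
dens <- density(price)

# Histogram on the density scale, then the kernel density drawn on top
hist(price, freq = FALSE, breaks = 20,
     main = "Simulated prices", xlab = "Price")
lines(dens, lwd = 2)
```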

Challenge question (optional): Can you produce a graph of kernel density plots comparing the distribution of hotel prices in your city and Vienna (see Lecture 3 for the London-Vienna comparison)? We have not done this in tutorials, so you will need to figure out how to do it.

1.9 (10 marks) Now let’s make some basic conditional mean comparisons. Split your sample into two parts based on the median distance from the city centre (median of distance1 in your dataset):

Near = all the hotels with a distance from the city centre below or equal to the median distance

Far = all the hotels with a distance from the city centre larger than the median distance

Now compute the mean hotel price in the Near and Far subsamples. What did you find? Does your finding make sense to you? How would you make this comparison more meaningful? (You don’t need to implement it, just discuss what you need to do.)
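
The mechanics of the Near/Far split can be sketched as follows, again on invented numbers. The data frame toy and the column name location are assumptions for the example; the split rule (below or equal to the median is Near, above is Far) follows the definitions above.

```r
# Toy data, invented for illustration
toy <- data.frame(
  price     = c(200, 180, 150, 90, 100, 300),
  distance1 = c(0.2, 0.5, 1.0, 2.0, 3.5, 6.0)
)

# Median distance from the city centre
med_dist <- median(toy$distance1)

# Near: distance <= median; Far: distance > median
toy$location <- ifelse(toy$distance1 <= med_dist, "Near", "Far")

# Mean price within each subsample
group_means <- tapply(toy$price, toy$location, mean)
print(group_means)
```

In this toy example a single expensive hotel far from the centre pulls up the Far mean, which shows how sensitive a simple comparison of means can be to extreme values.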
