Rating Prediction of Restaurants


weixin_26752765




Tags: python

Original article: https://medium.com/@kishanraj_16649/rating-prediction-of-restaurants-b51afd857e30


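Summary (auto-generated by CSDN): This blog post explores how to use Python to predict restaurant ratings, building a predictive model for restaurant ratings from a specific dataset with machine learning algorithms.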


Description

Restaurants from all over the world can be found here in Bengaluru. From the United States to Japan, Russia to Antarctica, you get all types of cuisines here: delivery, dine-out, pubs, bars, drinks, buffets, desserts. You name it and Bengaluru has it. The number of restaurants is increasing day by day and currently stands at approximately 12,000. Despite this high number, the industry hasn't been saturated yet, and new restaurants are opening every day. However, it has become difficult for them to compete with already established restaurants. The key issues that continue to challenge them include high real-estate costs, rising food costs, a shortage of quality manpower, a fragmented supply chain and over-licensing. This Zomato data aims at analyzing the demography of each location. Most importantly, it will help new restaurants decide their theme, menu, cuisine, cost, etc. for a particular location. It also aims at finding similarities between the neighborhoods of Bengaluru on the basis of food. The dataset also contains reviews for each restaurant, which help in finding the overall rating of the place.


The basic idea of analyzing the Zomato dataset is to get a fair idea of the factors affecting the establishment of different types of restaurants at different places in Bengaluru, and of the aggregate rating of each restaurant. Bengaluru has more than 12,000 restaurants serving dishes from all over the world. With new restaurants opening every day, the industry hasn't been saturated yet and demand is increasing day by day. In spite of the increasing demand, however, it has become difficult for new restaurants to compete with established ones, most of which serve the same food. Bengaluru is the IT capital of India, and most of the people here depend mainly on restaurant food because they don't have time to cook for themselves. With such overwhelming demand, it has become important to study the demography of a location. What kind of food is more popular in a locality? Does the entire locality love vegetarian food? If yes, is that locality populated by a particular community, e.g. Jains, Marwaris or Gujaratis, who are mostly vegetarian?


Objective: Design a machine learning model to predict the rating of restaurants that accept orders through Zomato.


Prerequisites: This post assumes familiarity with basic machine learning concepts such as Linear Regression, Decision Trees, Random Forest, Gradient Boosted Decision Trees, One-vs-Rest classifiers, multicollinearity, model-based imputation, CNN, CNN-LSTM, hyperparameter tuning and mean squared error.


Index:

Reading Data: reading the CSV file and storing it in a dataframe
Missing Value Imputation: replacing NULL values using model-based, mean-based and frequency-based imputation
Exploratory Data Analysis: plots such as pie plots, count plots and bar plots
Data Preprocessing: removing stopwords and unnecessary characters from the text data
Vectorization: using CountVectorizer, TfidfVectorizer and Normalizer to vectorize the data
Building Models: building different machine learning and deep learning models

Dataset Overview: Each row describes a restaurant listed on Zomato and has the following features.


-> url: contains the URL of the restaurant on the Zomato website

-> address: contains the address of the restaurant in Bengaluru

-> name: contains the name of the restaurant

-> online_order: whether online ordering is available at the restaurant

-> book_table: whether the table-booking option is available

-> rate: the overall rating of the restaurant out of 5 (the target variable)

-> votes: contains the total number of ratings for the restaurant as of the date the data was collected

-> phone: contains the phone number of the restaurant

-> location: contains the neighborhood in which the restaurant is located

-> rest_type: restaurant type, such as Quick Bites or Casual Dining

-> dish_liked: dishes people liked at the restaurant

-> cuisines: food styles, separated by commas

-> approx_cost(for two people): contains the approximate cost of a meal for two people

-> reviews_list: list of tuples containing reviews of the restaurant; each tuple holds two values, the customer's rating and review

-> menu_item: contains the list of menu items available at the restaurant

-> listed_in(type): type of meal

-> listed_in(city): contains the neighborhood in which the restaurant is listed

Real-world/Business objectives and constraints:


-> No strict latency requirement.


-> Interpretability is not important.


Performance Metrics: Since this is a regression problem, our performance metric is Mean Squared Error (MSE). We will try to reduce the MSE value as much as possible.

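MSE is the average of the squared differences between the true and predicted ratings. Here is a minimal plain-Python sketch. The ratings below are made-up examples, not values from the dataset.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of (true - predicted) ** 2 over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical true vs. predicted restaurant ratings
print(mse([4.1, 3.7, 3.8], [4.0, 3.7, 3.6]))
```

Squaring penalizes large misses more heavily than small ones. A perfect model scores 0.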

Reading Data

(51717, 17)

This dataset has 51717 rows and 17 columns.

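This step can be sketched with pandas. The file name `zomato.csv` and the helper `load_zomato` are assumptions. An in-memory two-row sample with a few of the dataset's columns and hypothetical values stands in for the real file, so the snippet runs on its own.

```python
import io

import pandas as pd

def load_zomato(source):
    """Read the Zomato CSV (path or file-like object) into a DataFrame and print its shape."""
    df = pd.read_csv(source)
    print(df.shape)  # (rows, columns)
    return df

# Hypothetical stand-in for the real file
sample_csv = io.StringIO(
    "name,online_order,book_table,rate,votes\n"
    "Restaurant A,Yes,Yes,4.1,775\n"
    "Restaurant B,Yes,No,3.8,120\n"
)
df = load_zomato(sample_csv)
```

On the full file, assuming it is the same one the post used, `load_zomato("zomato.csv")` should print the shape reported above.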

Checking the percentage of NULL values for each feature
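The percentage of NULL values per column is the column mean of `isnull()` multiplied by 100. Here is a sketch on a small hypothetical frame. `null_percentage` is an illustrative helper name.

```python
import numpy as np
import pandas as pd

def null_percentage(df):
    """Percentage of NULL (NaN/None) values in each column, highest first."""
    return (df.isnull().mean() * 100).sort_values(ascending=False)

df = pd.DataFrame({
    "rate": [4.1, np.nan, 3.8, np.nan],
    "dish_liked": ["Pasta", None, None, None],
    "votes": [775, 12, 0, 40],
})
print(null_percentage(df))
```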

Filling the Missing Values

We use 3 different approaches to fill the missing values: model-based, mean-based and frequency-based imputation.


i. Model-based imputation: to fill the missing values of the columns “rate” and “dish_liked”, we use model-based imputation.


Initially, we divided the original dataframe into 2 dataframes. The first contains no null values and the second contains only the rows with null values. We built the model on the first dataframe and used it to predict the missing values of the second.

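The split-fit-predict idea can be sketched for the numeric `rate` column. The post does not say which model or predictors it used, so a least-squares linear model on `votes` (via NumPy) is an assumption here. The text column `dish_liked` would need a classifier instead and is not shown.

```python
import numpy as np
import pandas as pd

def impute_with_linear_model(df, target, features):
    """Fill NaNs in `target` using a linear model fitted on the rows where it is known.

    Assumption: a plain least-squares linear model stands in for the post's (unnamed) model.
    """
    known = df[df[target].notnull()]    # dataframe with no missing target values
    unknown = df[df[target].isnull()]   # dataframe containing only the missing rows
    if unknown.empty:
        return df.copy()
    # Design matrices with an intercept column
    X_known = np.column_stack([np.ones(len(known)), known[features].to_numpy(dtype=float)])
    X_unknown = np.column_stack([np.ones(len(unknown)), unknown[features].to_numpy(dtype=float)])
    coef, *_ = np.linalg.lstsq(X_known, known[target].to_numpy(dtype=float), rcond=None)
    out = df.copy()
    out.loc[unknown.index, target] = X_unknown @ coef  # predictions for the missing rows
    return out

df = pd.DataFrame({
    "votes": [10, 20, 30, 40, 25],
    "rate": [3.1, 3.2, 3.3, 3.4, np.nan],
})
filled = impute_with_linear_model(df, "rate", ["votes"])
print(filled)
```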

Output of the model that predicts the missing values of the “dish_liked” column:


array(['Murgh Ghee Roast, Egg Fried Rice, Thali, Mutton Biryani, Naan, Andhra Meal', 'Pizza, Mocktails, Coffee, Nachos, Salad, Pasta, Sandwiches', 'Pizza, Potato Wedges, Country Feast, Pasta, Garlic Bread, Lemonade', ..., 'Ferrero Rocher Cake, Chocolate Cake', 'Ferrero Rocher Cake, Chocolate Cake', 'Ferrero Rocher Cake, Chocolate Cake'], dtype='<U134')


Output of the model that predicts the missing values of the “rate” column:


array([3.47331694, 3.48851577, 3.44792981, …, 3.58956974, 3.58956974, 3.58956974])


ii. Mean-based imputation to fill missing values

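Mean-based imputation replaces each missing value with the column mean. The post does not name the column. `approx_cost` below is a hypothetical shortened stand-in for a numeric column such as `approx_cost(for two people)`.

```python
import numpy as np
import pandas as pd

def impute_mean(df, column):
    """Replace NaNs in a numeric column with that column's mean (NaNs are skipped when averaging)."""
    out = df.copy()
    out[column] = out[column].fillna(out[column].mean())
    return out

df = pd.DataFrame({"approx_cost": [300.0, np.nan, 500.0, 700.0]})
print(impute_mean(df, "approx_cost")["approx_cost"].tolist())  # the gap is filled with mean(300, 500, 700)
```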

iii. Frequency-based imputation to fill missing values


We checked the most frequently occurring value of each of these columns and replaced the missing values in each column with it.

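Frequency-based imputation fills gaps with the column's mode. The post doesn't list which columns it imputed this way, so `rest_type` and `location` below are hypothetical examples.

```python
import pandas as pd

def impute_most_frequent(df, columns):
    """Replace missing values in each column with that column's most frequent value."""
    out = df.copy()
    for col in columns:
        most_frequent = out[col].mode(dropna=True).iloc[0]  # mode() may return ties; take the first
        out[col] = out[col].fillna(most_frequent)
    return out

df = pd.DataFrame({
    "rest_type": ["Quick Bites", None, "Quick Bites", "Casual Dining"],
    "location": ["BTM", "BTM", None, "Indiranagar"],
})
print(impute_most_frequent(df, ["rest_type", "location"]))
```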

Exploratory Data Analysis

i. Analysis of restaurant location

Conclusion: The number of restaurants varies by location. BTM has the highest number of restaurants in Bangalore, 3108. New BEL Road has the fewest restaurants, followed by Banashankari. BTM has 17.24% of the total restaurants in Bangalore.

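Per-location counts and shares like these come from `value_counts()` on the `location` column. The sketch below uses made-up rows, and `restaurants_per_location` is an illustrative name. A percentage depends on its denominator. Here it is computed over all rows before `top_n` trims the table for plotting.

```python
import pandas as pd

def restaurants_per_location(df, top_n=10):
    """Count restaurants per location and each location's share (in %) of all rows."""
    counts = df["location"].value_counts()
    shares = (counts / counts.sum() * 100).round(2)
    return pd.DataFrame({"count": counts, "percent": shares}).head(top_n)

df = pd.DataFrame({"location": ["BTM", "BTM", "HSR", "BTM", "Indiranagar", "HSR"]})
print(restaurants_per_location(df))
```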

ii. Analysis of online_order

Conclusion: More restaurants allow online orders than don't. There are 29342 restaurants in Bangalore that accept online orders and 20098 that don't, so 59.35% of restaurants allow online ordering.

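The percentage follows directly from the two counts reported above. `percent_share` is an illustrative helper name.

```python
def percent_share(counts):
    """Turn a {category: count} mapping into percentages of the total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 2) for category, n in counts.items()}

# Counts reported in the post for online_order
online_order = {"Yes": 29342, "No": 20098}
print(percent_share(online_order))  # {'Yes': 59.35, 'No': 40.65}
```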

iii. Analysis of ratings

Conclusion: The majority of restaurants have ratings between 3.6 and 3.9. About 15% of the restaurants have a rating of 3.7. The minimum rating is 1.8, and not a single restaurant in Bangalore has a rating of 5.


iv. Analysis of the number of stores for each restaurant

Conclusion: The number of stores varies across restaurants in Bangalore. CCD has the maximum number of stores, followed by Onesta and Just Bake. Various restaurants, such as SV Juice Corner Tiffin and Brown Box, have only 1 store. CCD's stores make up 9.26% of all the stores in Bangalore.


v. Analysis of restaurants that allow table booking

Conclusion: 6320 restaurants accept table bookings and 43120 do not, so 87.22% of the restaurants do not allow table booking. The majority of restaurants may be street-food-type restaurants, as they do not allow table booking.


vi. Types of cuisines sold by most restaurants

Conclusion: North Indian and Chinese are the two most widely offered cuisines in Bangalore. Close to 20,000 restaurants offer North Indian cuisine and close to 14,000 offer Chinese food.

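Since `cuisines` holds comma-separated values, per-cuisine counts like these can be obtained by splitting, exploding and counting. The sketch uses hypothetical rows, and `count_cuisines` is an illustrative name. The same approach works for `dish_liked`.

```python
import pandas as pd

def count_cuisines(cuisines):
    """Count how many restaurants offer each cuisine, given comma-separated strings."""
    return (
        cuisines.dropna()
        .str.split(",")   # "North Indian, Chinese" -> ["North Indian", " Chinese"]
        .explode()        # one row per cuisine
        .str.strip()      # drop the space left after each comma
        .value_counts()
    )

cuisines = pd.Series(["North Indian, Chinese", "North Indian, Mughlai", "Chinese, Thai", None])
print(count_cuisines(cuisines))
```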

vii. Items liked by people in Bangalore

Conclusion: Biryani is the dish most liked by the people of Bangalore. At around 12,000 restaurants, biryani is one of the most famous dishes. Chicken is the second most liked dish in Bangalore.


viii. Analysis of dining cost

Conclusion: The average cost for two people at restaurants in Bangalore is 561. The minimum dining cost is 40 and the maximum is 6000, so food at all sorts of prices is available in Bangalore.


ix. Analysis of votes

Conclusion: Restaurants in Bangalore have an average of 296.76 votes. The minimum number of votes for a restaurant is 0 and the maximum is 16832. Very few restaurants in Bangalore have more than 1700 votes.


x. Rating of restaurants vs online_order

Conclusion: Only for restaurants whose rating is 3.7, the number of restaurants accepting online orders is greater than the number that don't. For all restaurants whose rating is other than 3.7, more restaurants accept online orders than don't.

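The comparison behind this conclusion is a count of restaurants at each rating, split by `online_order`, which `pd.crosstab` produces directly. The sketch uses made-up rows, and `online_order_by_rating` is an illustrative name.

```python
import pandas as pd

def online_order_by_rating(df):
    """Number of restaurants at each rating, split by whether they accept online orders."""
    return pd.crosstab(df["rate"], df["online_order"])

df = pd.DataFrame({
    "rate": [3.7, 3.7, 3.7, 4.1, 4.1, 3.9],
    "online_order": ["Yes", "No", "Yes", "Yes", "No", "No"],
})
print(online_order_by_rating(df))
```

Comparing the `Yes` and `No` columns row by row shows, for each rating, which group is larger.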

xi. Type of restaurant

Conclusion: Around 50% of the restaurants in Bangalore are of the delivery type, 24728 restaurants in total. Many restaurants (34%) also offer dine-out service. The least common types are pubs and bars, buffet, and drinks and nightlife. Pubs and bars, with 669 restaurants, are the smallest of all the types.


xii. Pairplots

Conclusions from the pairplot:


In the plot of votes vs rate, most restaurants with a higher number of votes also have better ratings.
In the plot of approx_cost vs rate, restaurants with higher ratings tend to have higher prices.
In the plots of rate vs cost and rate vs votes, the data points are linearly separable.

EDA Summary

BTM alone has 3108 restaurants, the highest number of restaurants of any location in Bangalore. BEL has the fewest restaurants, i.e. 725. Restaurants in BTM make up 17% of the total.
More restaurants take online orders than don't: 29342 restaurants accept online orders and 20098 do not.
Restaurant ratings vary between 1.8 and 4.9. The average rating is 3.7.
CCD has 93 stores in Bangalore, the highest number of stores of any restaurant in Bangalore, followed by Onesta with 85.
6320 restaurants accept table bookings and 43120 do not. The majority of restaurants may be street-food-type restaurants, as they do not allow table booking.
North Indian, Chinese and South Indian are the top 3 cuisines, available in most restaurants.
Chicken is the dish most liked by the people of Bangalore, followed by Biryani and rice.
The average dining cost is 561. The minimum cost is 40 and the maximum cost is 4000. Overall, 87.22% of the restaurants do not allow table booking.
Only for restaurants whose rating is 3.7, the number of restaurants accepting online orders is greater than the number that don't. For all other restaurants (whose rating is other than 3.7), more restaurants accept online orders than don't.
Around 50% of the restaurants in Bangalore are of the delivery type, 24728 restaurants in total. Many restaurants (34%) also offer dine-out service. The least common types are pubs and bars, buffet, and drinks and nightlife. Pubs and bars, with 669 restaurants, are the smallest of all the types.
Among restaurants that allow table booking, the largest number have a rating of 4.2. Among those that don't, the largest number have a rating of 3.7. Irrespective of rating, fewer restaurants allow table booking than don't.

Checking for multicollinearity

Defining a function to check multicollinearity using the VIF (variance inflation factor) method

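Here is a hand-rolled NumPy sketch of the VIF check. It assumes the usual definition VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing feature j on all other features plus an intercept. The name `calculate_vif` and the synthetic data are illustrative. statsmodels also provides a ready-made `variance_inflation_factor` in `statsmodels.stats.outliers_influence`.

```python
import numpy as np
import pandas as pd

def calculate_vif(X):
    """VIF for each column of X: 1 / (1 - R^2) of that column regressed on the others."""
    values = X.to_numpy(dtype=float)
    n = values.shape[0]
    vifs = {}
    for j, col in enumerate(X.columns):
        y = values[:, j]
        others = np.column_stack([np.ones(n), np.delete(values, j, axis=1)])  # intercept + other columns
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        residuals = y - others @ coef
        r2 = 1 - (residuals @ residuals) / ((y - y.mean()) ** 2).sum()
        vifs[col] = np.inf if r2 >= 1 else 1 / (1 - r2)
    return pd.Series(vifs, name="VIF")

rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = pd.DataFrame({
    "a": a,
    "b": rng.normal(size=200),                       # independent of a
    "a_copy": a + rng.normal(scale=0.05, size=200),  # nearly a duplicate of a
})
print(calculate_vif(X).round(2))
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign of multicollinearity. Here `a` and `a_copy` exceed that, while `b` stays close to 1.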

Categorical columns are label-encoded before the VIF values are computed.

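Here is a pandas-only sketch of label encoding. `pd.factorize(..., sort=True)` assigns integer codes in sorted category order, which is what scikit-learn's `LabelEncoder` produces for string categories. The helper name and the columns shown are illustrative. Note that `factorize` maps missing values to -1.

```python
import pandas as pd

def label_encode(df, columns):
    """Replace each categorical column with integer codes (categories in sorted order)."""
    out = df.copy()
    for col in columns:
        codes, _categories = pd.factorize(out[col], sort=True)
        out[col] = codes
    return out

df = pd.DataFrame({"online_order": ["Yes", "No", "Yes"], "book_table": ["No", "No", "Yes"]})
print(label_encode(df, ["online_order", "book_table"]))
```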

Conclusion: By analyzing the VIF values, we can conclude that there is no multicollinearity among the independent variables, because the VIF value of each independent variable is very small.


Feature Engineering

1. Total number of cuisines available in each restaurant

2. Total number of dishes liked by customers, which may be directly proportional to the rating.


3. Facilities offered by restaurants: the 2 major facilities a restaurant can provide are online ordering and table booking, so we sum them to gauge the restaurant's overall quality of service.


4. Response coding: this function converts categorical features into response-coded features. It simply performs mean value replacement, i.e. each category is replaced by the mean of the target (rate) for that category.

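The four features above can be sketched in pandas. The helper names, the new column names (`no_of_cuisines`, `no_of_dish_liked`, `facilities_offered`, `rest_type_rc`), the use of `rest_type` for response coding and the sample rows are all assumptions for illustration. The response coding computes category means on the training frame only and falls back to the overall training mean for categories it has not seen.

```python
import pandas as pd

def count_items(series):
    """Number of comma-separated items per row (0 for missing values)."""
    return series.fillna("").apply(lambda s: sum(1 for item in s.split(",") if item.strip()))

def facilities_offered(df):
    """2 if both online_order and book_table are 'Yes', 1 if only one is, 0 if neither."""
    return (df["online_order"] == "Yes").astype(int) + (df["book_table"] == "Yes").astype(int)

def response_code(train, other, column, target="rate"):
    """Mean value replacement: map each category to its mean target value in `train`.

    Categories unseen in `train` fall back to the overall training mean.
    """
    category_means = train.groupby(column)[target].mean()
    fallback = train[target].mean()
    encode = lambda frame: frame[column].map(category_means).fillna(fallback)
    return encode(train), encode(other)

train = pd.DataFrame({
    "cuisines": ["North Indian, Chinese", "Cafe", None],
    "dish_liked": ["Biryani, Chicken, Naan", None, "Pasta"],
    "online_order": ["Yes", "No", "Yes"],
    "book_table": ["Yes", "No", "No"],
    "rest_type": ["Casual Dining", "Quick Bites", "Quick Bites"],
    "rate": [4.2, 3.6, 3.8],
})
test = pd.DataFrame({"rest_type": ["Cafe", "Quick Bites"]})

train["no_of_cuisines"] = count_items(train["cuisines"])
train["no_of_dish_liked"] = count_items(train["dish_liked"])
train["facilities_offered"] = facilities_offered(train)
train["rest_type_rc"], test["rest_type_rc"] = response_code(train, test, "rest_type")
print(train[["no_of_cuisines", "no_of_dish_liked", "facilities_offered", "rest_type_rc"]])
print(test)
```

Computing the category means on the training split only keeps the target values of the CV and test rows out of their own features.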

Feature Engineering Summary

Mean value replacement for dish_liked: first we applied response coding, followed by mean value replacement for the dish_liked column. We found that its values are almost the same as those of the rate column.
Mean value replacement for cuisines: here too, we first applied response coding, followed by mean value replacement for the cuisines column.
Number of cuisines available: the total number of cuisines available in each restaurant.
Number of dish_liked: the total number of dishes liked by customers in each restaurant.
Facilities offered: 2 if the restaurant allows both online_order and book_table, 1 if it allows only one of them, and 0 if it allows neither.

Preprocessing of Features

We remove stopwords and other non-essential special characters from the reviews and store the result in the preprocessed_reviews column. Finally, we replace the original review column with the preprocessed_reviews column.

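The review cleaning can be sketched with the standard library. The stopword set is a small hypothetical stand-in; NLTK's English stopword list is a common choice in practice. `preprocess_review` is an illustrative name.

```python
import re

# Hypothetical, abbreviated stopword list
STOPWORDS = {"a", "an", "the", "is", "was", "are", "and", "to", "of", "in", "it", "this", "for"}

def preprocess_review(text):
    """Lower-case the text, replace non-letters with spaces, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # remove digits, punctuation and other special characters
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)

print(preprocess_review("The biryani was AMAZING!!! Service is quick, 10/10."))
```

Applied column-wise, e.g. `df["preprocessed_reviews"] = df["reviews_list"].astype(str).apply(preprocess_review)`, the result can then replace the original review column.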

Vectorization

Here we use CountVectorizer for categorical features, TF-IDF for text features and Normalizer for numerical features.


CountVectorizer for categorical features:


['no', 'yes']
Shape of training dataset one hot encoding & corresponding class label (23215, 2) (23215,)
Shape of cv dataset one hot encoding & corresponding class label (11435, 2) (11435,)
Shape of test dataset one hot encoding & corresponding class label (17067, 2) (17067,)


Normalizer for numerical features:


TF-IDF for text features:

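The numeric and text vectorization can be sketched with scikit-learn on tiny made-up inputs. `Normalizer` rescales each row to unit L2 norm, so a single numeric column (here `votes`) is passed as one row and reshaped back. Because `Normalizer` learns nothing from the data, `fit_transform` is applied to each split separately. The TF-IDF vocabulary is fitted on the training reviews only, and the blocks are stacked into one sparse matrix per split. `normalize_column` is an illustrative helper name.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import Normalizer

# Made-up numeric feature (votes) and preprocessed reviews for a train and a test split
train_votes = np.array([775.0, 787.0, 918.0, 88.0])
test_votes = np.array([166.0, 286.0])
train_reviews = ["biryani amazing service quick", "pasta good ambience",
                 "service slow food cold", "great biryani"]
test_reviews = ["biryani good", "slow service"]

def normalize_column(values):
    """Scale a 1-D numeric column to unit L2 norm and return it as an (n, 1) array.

    Normalizer works row-wise, so the column is passed in as a single row.
    """
    return Normalizer().fit_transform(values.reshape(1, -1)).reshape(-1, 1)

train_votes_norm = normalize_column(train_votes)
test_votes_norm = normalize_column(test_votes)

# Vocabulary and IDF weights are learned from the training reviews only
tfidf = TfidfVectorizer()
train_text = tfidf.fit_transform(train_reviews)
test_text = tfidf.transform(test_reviews)

# Numeric and text features side by side in one sparse matrix per split
X_train = hstack([csr_matrix(train_votes_norm), train_text]).tocsr()
X_test = hstack([csr_matrix(test_votes_norm), test_text]).tocsr()
print(X_train.shape, X_test.shape)
```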

Hyperparameter tuning for the Random Forest algorithm

Here we try to find the values of n_estimators and max_depth that give the minimum MSE for the regression model.


RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse', max_depth=None, max_features='auto', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=None, oob_score=False, random_state=None, verbose=0, warm_start=False)

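The tuning loop can be sketched as follows. It tries each (n_estimators, max_depth) pair, scores it by MSE on the CV split, keeps the best, then refits and evaluates on the test split. Synthetic data from `make_regression` stands in for the vectorized features, and the grid values are assumptions. The estimator printed above ended up with n_estimators=100 and max_depth=None. Its `criterion='mse'` is left at the default here, because recent scikit-learn versions name that criterion `'squared_error'`.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the vectorized features and ratings
X, y = make_regression(n_samples=600, n_features=8, noise=0.5, random_state=0)
X_tr, X_test, y_tr, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
X_train, X_cv, y_train, y_cv = train_test_split(X_tr, y_tr, test_size=0.33, random_state=0)

def tune_random_forest(n_estimators_grid, max_depth_grid):
    """Grid search scored by MSE on the CV split; returns (best_params, best_cv_mse)."""
    best_params, best_mse = None, float("inf")
    for n_estimators in n_estimators_grid:
        for max_depth in max_depth_grid:
            model = RandomForestRegressor(n_estimators=n_estimators, max_depth=max_depth, random_state=0)
            model.fit(X_train, y_train)
            mse = mean_squared_error(y_cv, model.predict(X_cv))
            if mse < best_mse:
                best_params = {"n_estimators": n_estimators, "max_depth": max_depth}
                best_mse = mse
    return best_params, best_mse

best_params, best_cv_mse = tune_random_forest([10, 50, 100], [5, 10, None])
print("best:", best_params, "CV MSE:", best_cv_mse)

# Apply the model with the best hyperparameters to the held-out test split
best_model = RandomForestRegressor(**best_params, random_state=0).fit(X_train, y_train)
test_mse = mean_squared_error(y_test, best_model.predict(X_test))
print("test MSE:", test_mse)
```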

Applying the Random Forest model with the best hyperparameters

0.027927709527412244


Deep learning models:

Next, we used a few deep learning models to predict the rating: LSTM, CNN-LSTM and a CNN with Conv1D. However, for this problem the machine learning model performs better than the deep learning models.
