Airline Passenger Satisfaction Prediction Project

Note: this article is reproduced from the AI community Venus AI.

For more AI material, see the original site ([www.aideeplearning.cn])

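Problem statement:

Airline Passenger Satisfaction
Many factors affect a company's viability, from competitiveness to reputation and customer satisfaction.
This study aims to determine passengers' satisfaction levels, understand the quality of service airlines provide and the key drivers of customer satisfaction, and identify how the airline industry can improve its service quality.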

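Models and dependencies used in the project:

  • Random Forest Classifier
  • Support Vector Machine
  • Decision Tree Classifier
  • K-Neighbors Classifier
  • Gaussian Naive Bayes

Libraries used during development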

pandas==2.0.2
scikit_learn==1.2.2

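Project structure

1) Import all libraries
2) Read the training/test data from CSV files
3) Data analysis
4) Data cleaning/preprocessing
5) Model training
6) Model evaluation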

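Conclusion: we compared the following models: Random Forest Classifier, Support Vector Machine, Decision Tree Classifier, KNeighbors Classifier, and Gaussian Naive Bayes. Of these, the Random Forest Classifier reached the highest accuracy on the test set (about 0.964).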
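Since the stated goal includes finding the key drivers of satisfaction, one possible follow-up is to inspect the fitted forest's feature importances. The sketch below is not part of the original project: it uses synthetic stand-in data (the column names signal, noise_a and noise_b are made up) purely to show the mechanics of feature_importances_.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (an assumption, not the airline dataset):
# the label depends only on "signal"; the two noise columns are random.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "signal": rng.integers(0, 6, size=500),
    "noise_a": rng.integers(0, 6, size=500),
    "noise_b": rng.integers(0, 6, size=500),
})
toy_label = (toy["signal"] >= 3).astype(int)

forest = RandomForestClassifier(n_estimators=50, random_state=42)
forest.fit(toy, toy_label)

# feature_importances_ is normalized to sum to 1; a larger value means
# the feature contributed more impurity reduction across the trees.
importances = pd.Series(forest.feature_importances_, index=toy.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

On the real data, the same two lines at the end could be applied to the trained Random Forest and the column names of the feature frame, keeping in mind that impurity-based importances are only a rough ranking.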

Project conclusion

[Figure 1: Airline Satisfaction Prediction Project, VenusAI]

Project details

Import the required libraries

import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

Read the data

data=pd.read_csv('train.csv')
data.head()

5 rows × 25 columns

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 103904 entries, 0 to 103903
Data columns (total 25 columns):
 #   Column                             Non-Null Count   Dtype  
---  ------                             --------------   -----  
 0   Unnamed: 0                         103904 non-null  int64  
 1   id                                 103904 non-null  int64  
 2   Gender                             103904 non-null  object 
 3   Customer Type                      103904 non-null  object 
 4   Age                                103904 non-null  int64  
 5   Type of Travel                     103904 non-null  object 
 6   Class                              103904 non-null  object 
 7   Flight Distance                    103904 non-null  int64  
 8   Inflight wifi service              103904 non-null  int64  
 9   Departure/Arrival time convenient  103904 non-null  int64  
 10  Ease of Online booking             103904 non-null  int64  
 11  Gate location                      103904 non-null  int64  
 12  Food and drink                     103904 non-null  int64  
 13  Online boarding                    103904 non-null  int64  
 14  Seat comfort                       103904 non-null  int64  
 15  Inflight entertainment             103904 non-null  int64  
 16  On-board service                   103904 non-null  int64  
 17  Leg room service                   103904 non-null  int64  
 18  Baggage handling                   103904 non-null  int64  
 19  Checkin service                    103904 non-null  int64  
 20  Inflight service                   103904 non-null  int64  
 21  Cleanliness                        103904 non-null  int64  
 22  Departure Delay in Minutes         103904 non-null  int64  
 23  Arrival Delay in Minutes           103594 non-null  float64
 24  satisfaction                       103904 non-null  object 
dtypes: float64(1), int64(19), object(5)
memory usage: 19.8+ MB
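The listing above includes two columns, Unnamed: 0 and id, that look like row identifiers rather than passenger attributes. The article keeps them as features. As a hedged alternative, the sketch below shows how such columns could be dropped after loading; the CSV text and its values are made up for illustration and only mimic the layout of train.csv.

```python
import io
import pandas as pd

# Hypothetical two-row CSV (made-up values) with a header-less first column,
# which is how a saved DataFrame index typically appears in a CSV file.
csv_text = """,id,Age,satisfaction
0,101,13,dissatisfied
1,102,25,satisfied
"""

raw = pd.read_csv(io.StringIO(csv_text))
# pandas names a column with an empty header "Unnamed: <position>".
print(raw.columns.tolist())

# Drop the identifier-like columns before building the feature matrix.
features = raw.drop(columns=["Unnamed: 0", "id"])
print(features.columns.tolist())
```

Whether dropping these columns changes the reported accuracies is not something the article tests, so treat this as an option to evaluate rather than a correction.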

data.shape
(103904, 25)
data.describe()

Data cleaning

data.isna().sum()
Unnamed: 0                             0
id                                     0
Gender                                 0
Customer Type                          0
Age                                    0
Type of Travel                         0
Class                                  0
Flight Distance                        0
Inflight wifi service                  0
Departure/Arrival time convenient      0
Ease of Online booking                 0
Gate location                          0
Food and drink                         0
Online boarding                        0
Seat comfort                           0
Inflight entertainment                 0
On-board service                       0
Leg room service                       0
Baggage handling                       0
Checkin service                        0
Inflight service                       0
Cleanliness                            0
Departure Delay in Minutes             0
Arrival Delay in Minutes             310
satisfaction                           0
dtype: int64
data.dropna(axis=0, inplace=True)
data.isna().sum()
Unnamed: 0                           0
id                                   0
Gender                               0
Customer Type                        0
Age                                  0
Type of Travel                       0
Class                                0
Flight Distance                      0
Inflight wifi service                0
Departure/Arrival time convenient    0
Ease of Online booking               0
Gate location                        0
Food and drink                       0
Online boarding                      0
Seat comfort                         0
Inflight entertainment               0
On-board service                     0
Leg room service                     0
Baggage handling                     0
Checkin service                      0
Inflight service                     0
Cleanliness                          0
Departure Delay in Minutes           0
Arrival Delay in Minutes             0
satisfaction                         0
dtype: int64
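The only column with missing values is Arrival Delay in Minutes (310 of 103904 rows, roughly 0.3%), so dropping those rows loses very little data. For comparison, the sketch below contrasts dropna with a median fill on a made-up four-row frame; the median fill is an assumption offered as an alternative, not what the article does.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values and one missing arrival delay.
toy = pd.DataFrame({
    "Departure Delay in Minutes": [0, 12, 45, 3],
    "Arrival Delay in Minutes": [0.0, np.nan, 50.0, 5.0],
})

# Approach used in the article: drop every row that has a missing value.
dropped = toy.dropna(axis=0)

# Alternative (assumption): keep all rows and fill with the column median.
median_delay = toy["Arrival Delay in Minutes"].median()
filled = toy.fillna({"Arrival Delay in Minutes": median_delay})

print(len(dropped), len(filled))
print(filled["Arrival Delay in Minutes"].tolist())
```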
### Encoding ###
le = LabelEncoder()
data["Gender"] = le.fit_transform(data["Gender"])
data["Customer Type"] = le.fit_transform(data["Customer Type"])
data["Type of Travel"] = le.fit_transform(data["Type of Travel"])
data["satisfaction"] = le.fit_transform(data["satisfaction"])

### Labeling ###
data["Class"] = data["Class"].replace({"Eco":1,"Eco Plus":2,"Business":3})

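Instantiating the LabelEncoder

le = LabelEncoder()

This step creates a LabelEncoder object, le, which is used for the label encoding that follows.

Encoding each feature

For each categorical column in data, LabelEncoder's fit_transform method converts the text values to integers.

data["Gender"] = le.fit_transform(data["Gender"]): encodes the Gender feature, turning text labels into integers. LabelEncoder assigns codes in sorted order, so for labels such as "Female" and "Male", "Female" becomes 0 and "Male" becomes 1.

data["Customer Type"] = le.fit_transform(data["Customer Type"]): applies the same treatment to the Customer Type feature.

data["Type of Travel"] = le.fit_transform(data["Type of Travel"]): encodes the Type of Travel feature.

data["satisfaction"] = le.fit_transform(data["satisfaction"]): encodes the satisfaction column, which is the prediction target rather than an input feature.

fit_transform first learns the distinct labels in the column (fit), then replaces each label with its integer index in the sorted list of labels (transform). Because the same le object is refit for every column, it retains only the mapping of the last column it was fitted on.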
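To make the encoding concrete, here is a minimal sketch on a made-up Gender column. It shows that the learned classes are stored in sorted order in classes_ and that inverse_transform recovers the original text labels, which only works while the encoder still holds the mapping for that column.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up sample values for illustration.
genders = pd.Series(["Male", "Female", "Female", "Male"])

encoder = LabelEncoder()
codes = encoder.fit_transform(genders)

# classes_ holds the distinct labels in sorted order; a label's code is its index here.
print(list(encoder.classes_))
print(list(codes))

# inverse_transform maps the integer codes back to the original labels.
restored = list(encoder.inverse_transform(codes))
print(restored)
```

If the original labels are needed later, keeping one encoder per column (for example in a dictionary keyed by column name) is one way to preserve every mapping.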

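Manual label replacement

In addition to LabelEncoder, the code replaces the labels of the Class feature by hand.

data["Class"] = data["Class"].replace({"Eco":1,"Eco Plus":2,"Business":3})

This line maps each category of the Class feature ("Eco", "Eco Plus", "Business") to a specific number (1, 2, 3). It is a more direct way to encode categories, and it is useful when there are only a few categories and you want to control the values yourself. Here the chosen numbers follow the cabin hierarchy, whereas LabelEncoder would order the classes alphabetically (Business, Eco, Eco Plus).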
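The difference between the manual mapping and alphabetical encoding can be seen on a small made-up Class column. The sketch uses Series.map instead of replace, which is an assumption on my part: map turns any label missing from the dictionary into NaN, which makes unexpected categories easy to spot, while replace leaves them unchanged.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up sample values for illustration.
cabin = pd.Series(["Eco", "Business", "Eco Plus", "Eco"])

# Manual ordinal mapping, same dictionary as in the article.
class_order = {"Eco": 1, "Eco Plus": 2, "Business": 3}
manual = cabin.map(class_order)

# LabelEncoder sorts labels alphabetically: Business=0, Eco=1, Eco Plus=2.
alphabetical = LabelEncoder().fit_transform(cabin)

print(list(manual))
print(list(alphabetical))
```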

Model training

X_train = data.drop("satisfaction", axis=1)
y_train= data["satisfaction"]
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
models = pd.DataFrame(columns=["Model Name","Accuracy Score"])
model_list = [("Random Forest Classifier",RandomForestClassifier(random_state=42)),
             ("Support Vector Machines",SVC(random_state=42)),
             ("Decision Tree Classifier", DecisionTreeClassifier(random_state=42)),
             ("KNeighbors Classifier",KNeighborsClassifier(n_neighbors=2)),
             ("Gaussian Naive Bayes", GaussianNB())]
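The StandardScaler step matters mostly for the distance-based and margin-based models in the list (KNeighborsClassifier and SVC), since a feature measured in thousands, such as Flight Distance, would otherwise dominate features rated on a small scale. Tree-based models are largely insensitive to it. A minimal sketch on made-up numbers shows what the scaler does:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values: column 0 resembles an age, column 1 a flight distance.
toy_features = np.array([
    [25.0, 200.0],
    [40.0, 1000.0],
    [55.0, 3000.0],
])

toy_scaler = StandardScaler()
scaled = toy_scaler.fit_transform(toy_features)

# After scaling, every column has mean 0 and standard deviation 1.
print(scaled.mean(axis=0))
print(scaled.std(axis=0))

# The statistics learned during fit are stored on the scaler.
print(toy_scaler.mean_)
```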

Model evaluation

testData=pd.read_csv('test.csv')
testData.dropna(axis=0, inplace=True)

### Encoding ###
le = LabelEncoder()
testData["Gender"] = le.fit_transform(testData["Gender"])
testData["Customer Type"] = le.fit_transform(testData["Customer Type"])
testData["Type of Travel"] = le.fit_transform(testData["Type of Travel"])
testData["satisfaction"] = le.fit_transform(testData["satisfaction"])

### Labeling ###
testData["Class"] = testData["Class"].replace({"Eco":1,"Eco Plus":2,"Business":3})

Xtest = testData.drop("satisfaction", axis=1)
ytest= testData["satisfaction"]
Xtest = scaler.fit_transform(Xtest)
# Create an empty DataFrame to store the results
models = pd.DataFrame(columns=["Model Name", "Accuracy Score"])

# Loop over the models: fit on the training set, score on the test set
for algoName, model in model_list:
    model.fit(X_train, y_train)
    predictions = model.predict(Xtest)
    score = accuracy_score(ytest, predictions)
    new_row = {"Model Name": algoName, "Accuracy Score": score}

    # Use pd.concat, since DataFrame.append was removed in pandas 2.0
    models = pd.concat([models, pd.DataFrame([new_row])], ignore_index=True)

# Sort the models by accuracy in descending order
models = models.sort_values(by="Accuracy Score", ascending=False)

# Display the models and their accuracy
models
   Model Name                Accuracy Score
0  Random Forest Classifier  0.963890
1  Support Vector Machines   0.956050
2  Decision Tree Classifier  0.945700
3  KNeighbors Classifier     0.909319
4  Gaussian Naive Bayes      0.861970
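One detail of the evaluation code is worth a second look: the encoder and the scaler are fitted again on the test file. A common alternative is to fit them on the training data only and reuse them on the test data, so both sets share the same label codes and the same scaling statistics. The sketch below illustrates this on made-up frames; it is an assumption about how the pipeline could be arranged, not the article's code, and the reported accuracies were not produced this way.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Made-up frames; the test frame happens to contain only one Gender value.
train = pd.DataFrame({"Gender": ["Male", "Female", "Male", "Female"],
                      "Age": [20, 30, 40, 50]})
test = pd.DataFrame({"Gender": ["Male", "Male"], "Age": [30, 40]})

# Refitting on the test column alone would assign "Male" the code 0 ...
refit_codes = list(LabelEncoder().fit_transform(test["Gender"]))

# ... whereas the encoder fitted on the training column assigns it 1.
gender_encoder = LabelEncoder().fit(train["Gender"])
train["Gender"] = gender_encoder.transform(train["Gender"])
test["Gender"] = gender_encoder.transform(test["Gender"])

# Fit the scaler on the training features only, then apply it to both sets.
fitted_scaler = StandardScaler().fit(train)
train_scaled = fitted_scaler.transform(train)
test_scaled = fitted_scaler.transform(test)

print(refit_codes, list(test["Gender"]))
print(fitted_scaler.mean_)
```

When the training and test files contain the same categories, as is likely here, refitting the encoder yields the same codes, so the main practical difference is in the scaling statistics.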

Project resources

For details, see the Airline Satisfaction Prediction Project page on VenusAI (aideeplearning.cn)
