Machine Learning Basics: Logistic Regression (Titanic Dataset)

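This code shows how to use the TensorFlow 1.x API (through tf.compat.v1) to predict whether a Titanic passenger survived. It first loads the data from a CSV file, fills in missing values, and encodes the categorical features. It then builds a logistic regression model (a single linear layer followed by a sigmoid), trains it with a gradient descent optimizer, and evaluates its accuracy on the test data.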
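To make the model behind the summary concrete, here is a minimal NumPy sketch of logistic regression trained by gradient descent. It is an illustration under stated assumptions, not the TensorFlow script from this post: the helper names (sigmoid, cross_entropy_from_logits, train_logistic_regression), the learning rate, and the synthetic toy data are all hypothetical. The loss uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)), which is the formulation documented for tf.nn.sigmoid_cross_entropy_with_logits.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping logits to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def cross_entropy_from_logits(logits, labels):
    """Mean binary cross-entropy, written in a numerically stable form."""
    return np.mean(np.maximum(logits, 0) - logits * labels
                   + np.log1p(np.exp(-np.abs(logits))))


def train_logistic_regression(features, labels, lr=0.1, epochs=500, seed=0):
    """Full-batch gradient descent on weights w and bias b (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=(features.shape[1], 1))
    b = np.zeros(1)
    for _ in range(epochs):
        logits = features @ w + b
        error = sigmoid(logits) - labels  # gradient of the loss w.r.t. the logits
        w -= lr * features.T @ error / len(labels)
        b -= lr * error.mean(axis=0)
    return w, b


# Synthetic, linearly separable toy data (assumption: not the Titanic data).
rng = np.random.default_rng(42)
toy_x = rng.normal(size=(200, 2))
toy_y = (toy_x[:, :1] + toy_x[:, 1:] > 0).astype(np.float64)

w, b = train_logistic_regression(toy_x, toy_y)
toy_logits = toy_x @ w + b
toy_pred = (sigmoid(toy_logits) > 0.5).astype(np.float64)
toy_loss = float(cross_entropy_from_logits(toy_logits, toy_y))
toy_accuracy = float((toy_pred == toy_y).mean())
print(toy_loss, toy_accuracy)
```

The TensorFlow script follows the same structure: a matrix product plus bias gives the logits, the sigmoid thresholded at 0.5 gives the prediction, and the optimizer applies the gradient step automatically.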

Dataset
Baidu Netdisk extraction code: 0000
GitHub

import numpy as np
import pandas as pd
import tensorflow as tf

# Use the TensorFlow 1.x API through the compat.v1 module
tf = tf.compat.v1
tf.disable_v2_behavior()

data = pd.read_csv('../../dataset/泰坦尼克数据集/train.csv')
# Keep only the label and the columns used as features
data = data[['Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']]
# Fill missing values (NaN) with 0
data = data.fillna(0)
# Map the Sex field to integer codes 0/1 (in order of first appearance)
data['Sex'] = pd.factorize(data.Sex)[0]
# One-hot encode the ticket class
data['p1'] = np.array(data['Pclass'] == 1).astype(np.float64)
data['p2'] = np.array(data['Pclass'] == 2).astype(np.float64)
data['p3'] = np.array(data['Pclass'] == 3).astype(np.float64)

# Drop the original Pclass field
del data['Pclass']

print(data.Embarked.unique())

data['e1'] = np.array(data['Embarked'] == 'S').astype(np.float64)
data['e2'] = np.array(data['Embarked'] == 'C').astype(np.float64)
data['e3'] = np.array(data['Embarked'] == 'Q').astype(np.float64)

del data['Embarked']

data_data = np.stack(
    [data.Sex.values.astype(np.float64), data.Age.values.astype(np.float64), data.SibSp.values.astype(np.float64),
     data.Parch.values.astype(np.float64), data.Fare.values.astype(np.float64), data.p1.values,
     data.p2.values, data.p3.values, data.e1.values, data.e2.values, data.e3.values]).T

# Reshape the labels into a column vector; -1 infers the row count instead of hard-coding 891
data_target = np.reshape(data.Survived.values.astype(np.float64), (-1, 1))
print(np.shape(data_target), np.shape(data_data))

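# Define the model: 11 input features, 1 output logit
x = tf.placeholder('float', shape=[None, 11])
y = tf.placeholder('float', shape=[None, 1])
weight = tf.Variable(tf.random_normal([11, 1]))
bias = tf.Variable(tf.random_normal([1]))
# Matrix product plus bias gives the logits
output = tf.matmul(x, weight) + bias
# Prediction: 1 if the sigmoid probability exceeds 0.5, else 0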
pred = tf.cast(tf.sigmoid(output) > 0.5, tf.float32)
loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=output))
train_step = tf.train.GradientDescentOptimizer(0.0001).minimize(loss)
accuracy = tf.reduce_mean(tf.cast(tf.equal(pred, y), tf.float32))

sess = tf.Session()
sess.run(tf.global_variables_initializer())

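batch_size = 100
for i in range(10000):
    # Shuffle once per epoch, keeping features and labels aligned
    index = np.random.permutation(len(data_target))
    data_data = data_data[index]
    data_target = data_target[index]
    for n in range(len(data_target) // batch_size):
        # Take the n-th mini-batch of the shuffled data
        batch_xs = data_data[n * batch_size:(n + 1) * batch_size]
        batch_ys = data_target[n * batch_size:(n + 1) * batch_size]
        sess.run(train_step, feed_dict={x: batch_xs, y: batch_ys})
    if i % 1000 == 0:
        # Loss and accuracy on the last mini-batch of this epoch
        print(sess.run((loss, accuracy), feed_dict={x: batch_xs, y: batch_ys}))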

# Apply the same preprocessing to the test set
data_test = pd.read_csv('../../dataset/泰坦尼克数据集/test.csv')
data_test = data_test.fillna(0)
# pd.factorize numbers categories in order of first appearance, so this matches
# the training encoding only if both files start with the same Sex value
data_test['Sex'] = pd.factorize(data_test.Sex)[0]
data_test['p1'] = np.array(data_test['Pclass'] == 1).astype(np.float64)
data_test['p2'] = np.array(data_test['Pclass'] == 2).astype(np.float64)
data_test['p3'] = np.array(data_test['Pclass'] == 3).astype(np.float64)
data_test['e1'] = np.array(data_test['Embarked'] == 'S').astype(np.float64)
data_test['e2'] = np.array(data_test['Embarked'] == 'C').astype(np.float64)
data_test['e3'] = np.array(data_test['Embarked'] == 'Q').astype(np.float64)
test_data = np.stack(
    [data_test.Sex.values.astype(np.float64), data_test.Age.values.astype(np.float64),
     data_test.SibSp.values.astype(np.float64), data_test.Parch.values.astype(np.float64),
     data_test.Fare.values.astype(np.float64), data_test.p1.values, data_test.p2.values,
     data_test.p3.values, data_test.e1.values, data_test.e2.values, data_test.e3.values]).T
# Load the test labels as a column vector; -1 infers the row count instead of hard-coding 418
test_label = pd.read_csv('../../dataset/泰坦尼克数据集/gender.csv')
test_label = np.reshape(test_label.Survived.values.astype(np.float64), (-1, 1))
# Accuracy on the test set
print(sess.run(accuracy, feed_dict={x: test_data, y: test_label}))
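
The script above repeats the same preprocessing steps for the training and test files. One way to keep the two in sync is a shared helper, sketched below. The names build_features and FEATURE_COLUMNS are hypothetical and not part of the original script, and the explicit {'male': 0, 'female': 1} mapping is an assumption swapped in for pd.factorize so that the codes do not depend on row order. The small toy frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Column order matches the 11 features stacked in the script above.
FEATURE_COLUMNS = ['Sex', 'Age', 'SibSp', 'Parch', 'Fare',
                   'p1', 'p2', 'p3', 'e1', 'e2', 'e3']


def build_features(frame):
    """Return an (n, 11) float64 feature matrix (hypothetical helper)."""
    out = frame[['Sex', 'Age', 'SibSp', 'Parch', 'Fare']].copy()
    # Explicit mapping, so the codes do not depend on row order.
    out['Sex'] = out['Sex'].map({'male': 0, 'female': 1})
    # One-hot columns for ticket class and port of embarkation.
    for i, pclass in enumerate([1, 2, 3], start=1):
        out[f'p{i}'] = frame['Pclass'] == pclass
    for i, port in enumerate(['S', 'C', 'Q'], start=1):
        out[f'e{i}'] = frame['Embarked'] == port
    # Same missing-value policy as the script: fill with 0.
    return out[FEATURE_COLUMNS].fillna(0).to_numpy(dtype=np.float64)


# Made-up rows with the same column names as the Titanic CSV files.
toy = pd.DataFrame({
    'Pclass': [3, 1, 2],
    'Sex': ['male', 'female', 'female'],
    'Age': [22.0, np.nan, 30.0],
    'SibSp': [1, 0, 0],
    'Parch': [0, 0, 2],
    'Fare': [7.25, 71.28, 13.0],
    'Embarked': ['S', 'C', np.nan],
})
toy_features = build_features(toy)
print(toy_features.shape)
```

With such a helper, both files could be passed through build_features before being fed to the placeholders, so a change to the encoding only has to be made in one place.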