[Reading Notes] Head First Data Analysis, All Surface and No Depth (Part 10)

Latest recommended article published on 2021-08-12 09:30:04

Canon__


Article link: https://blog.csdn.net/Canon__/article/details/82686930


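This chapter picks up where the previous one left off. It is mainly about regression: we design a "raise calculator", that is, an algorithm for predicting raises.

The book again provides data. First, let's plot histograms of the raises that negotiators received and the raises they requested: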

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('./hfda_data/hfda_ch10_employees.csv')

# Keep only the employees who negotiated
negotiated = df[df.negotiated == True]
bins = np.arange(0, 25, 0.5)

# Columns are selected by position, as used throughout this post:
# column 1 is the raise received, column 2 is the raise requested
plt.figure(1)
plt.hist(negotiated.iloc[:, 1], bins=bins, histtype='bar', facecolor='blue',
         edgecolor='black')

plt.figure(2)
plt.hist(negotiated.iloc[:, 2], bins=bins, histtype='bar', facecolor='blue',
         edgecolor='black')

plt.show()

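The two histograms look very similar, but because they do not pair each request with the raise actually received, they cannot tell us how the two are related.

So we use a scatter plot to look at the relationship between the two: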

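import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('./hfda_data/hfda_ch10_employees.csv')
negotiated = df[df.negotiated == True]

# x axis: raise requested (column 2); y axis: raise received (column 1)
plt.figure(1)
plt.scatter(negotiated.iloc[:, 2], negotiated.iloc[:, 1],
            c='b', s=20, linewidths=0.5, marker='o', edgecolors='black')

plt.show()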

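This is where linear regression comes in. The book uses R; here I use gradient descent to fit a simple (one-variable) regression. (You can also try other libraries.)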

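import numpy as np
import pandas as pd
# placeholder/Session are TensorFlow 1.x graph APIs; the compat.v1 module
# keeps them available when TensorFlow 2.x is installed
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

df = pd.read_csv('./hfda_data/hfda_ch10_employees.csv')
negotiated = df[df.negotiated == True]

# X: raise requested (column 2); Y: raise received (column 1), both shaped (n, 1)
X = np.array(negotiated.iloc[:, 2])[:, np.newaxis]
Y = np.array(negotiated.iloc[:, 1])[:, np.newaxis]

xs = tf.placeholder(tf.float32, [None, 1])
ys = tf.placeholder(tf.float32, [None, 1])

# Slope starts at a random value (normal, mean -1, stddev 1); intercept at 0.1
Weights = tf.Variable(tf.random_normal([1], -1, 1))
biases = tf.Variable(tf.zeros([1]) + 0.1)

# Model: predicted raise = Weights * requested + biases
Wx_b = xs * Weights + biases

# Mean squared error over all samples
loss = tf.reduce_mean(tf.reduce_sum(tf.square(ys - Wx_b), axis=1))

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

init = tf.global_variables_initializer()

sess = tf.Session()

sess.run(init)
for i in range(5000):
    sess.run(train_step, feed_dict={xs: X, ys: Y})
    if i % 50 == 0:
        # Print the current slope and intercept every 50 steps
        print(i, sess.run(Weights), sess.run(biases))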

The book's result is y = 0.7x + 2.3. I got y = 0.72507244x + 2.3120737, which is essentially the same.

Finally, let's verify the result by hand:
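To make the gradient-descent step more concrete, here is a minimal NumPy sketch of the same idea: minimize the mean squared error of y ≈ w·x + b, then cross-check against the closed-form least-squares fit from `np.polyfit`. The data is synthetic, generated around the book's line 0.7x + 2.3 purely for illustration (it is not the employee data), and the function name `fit_line_gd` and its default settings are my own assumptions.

```python
import numpy as np


def fit_line_gd(x, y, lr=0.01, steps=5000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        err = w * x + b - y              # residuals of the current line
        grad_w = 2.0 * np.mean(err * x)  # d(MSE)/dw
        grad_b = 2.0 * np.mean(err)      # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic stand-in data (an assumption, NOT the book's dataset)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
y = 0.7 * x + 2.3 + rng.normal(0.0, 1.0, size=200)

w_gd, b_gd = fit_line_gd(x, y)
w_ls, b_ls = np.polyfit(x, y, deg=1)  # closed-form least squares

print("gradient descent:", w_gd, b_gd)
print("np.polyfit:      ", w_ls, b_ls)
```

If the two printed lines agree closely, gradient descent has converged to the least-squares solution. With a learning rate that is too large for the scale of x, the updates diverge instead, so the step size is worth checking against the data.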

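import pandas as pd

df = pd.read_csv('./hfda_data/hfda_ch10_employees.csv')
df1 = df[df.negotiated == True]
# Correlate only the received (column 1) and requested (column 2) raises,
# so any non-numeric columns cannot break corr()
print(df1.iloc[:, [1, 2]].corr())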

The correlation coefficient between the two is 0.665648. Then we use:

df1.describe()

to get the standard deviation of each (describe() reports the standard deviation, std, not the variance).

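The slope is then r * std(y) / std(x), where r is the correlation coefficient, std(y) is the standard deviation of the raise actually received, and std(x) is the standard deviation of the raise requested.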
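This hand check can be written out in a few lines. The sketch below uses synthetic data (an assumption made for illustration, not the employee file), and the variable names are my own. The intercept formula, mean(y) − slope · mean(x), is the standard least-squares companion to the slope formula; the post itself only states the slope.

```python
import numpy as np

# Synthetic stand-in data (an assumption, NOT the book's dataset)
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=200)                  # "requested"
y = 0.7 * x + 2.3 + rng.normal(0.0, 1.0, size=200)    # "received"

# Correlation coefficient between x and y
r = np.corrcoef(x, y)[0, 1]

# Slope from r and the sample standard deviations
# (ddof=1 matches the std reported by pandas describe())
slope = r * np.std(y, ddof=1) / np.std(x, ddof=1)

# The least-squares line passes through (mean(x), mean(y))
intercept = np.mean(y) - slope * np.mean(x)

# Cross-check against a direct least-squares fit
slope_ls, intercept_ls = np.polyfit(x, y, deg=1)

print("from r and std:", slope, intercept)
print("np.polyfit:    ", slope_ls, intercept_ls)
```

The two results should agree up to floating-point error, which is why the correlation coefficient and the two standard deviations are enough to recover the regression slope.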

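The "raise calculator" we worked so hard to design did not behave as planned, but this chapter ends here. The remaining problems will be covered in the next chapter.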
