Bayesian Statistics, Second Edition, Chapter 5 Answers: Bayesian Statistics, Part 2

In this post, I will compare the output of frequentist and Bayesian statistics and explain how the two approaches can complement each other, in particular when a frequentist analysis yields inconclusive results.

As a first proof of concept, I will use the famous Titanic data set, which most new Kaggle users encounter when they first register. The same statistics can of course be applied to any other data set. I selected the Titanic data set because it has a wide range of variables and readers might already know the data.

For those not familiar with this data set, it offers a range of variables that can be used to predict the likelihood of surviving the sinking of the ship. You will find all kinds of approaches online for analyzing this data set, as well as machine learning techniques to predict survival.

I downloaded it from some source on the net, and you can find the exact data set I used here.

FYI, the variables are listed below:

for col in df.columns:
    print(col)

PassengerId
Survived
Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Fare
Cabin
Embarked

Absence of Evidence

If you do the analysis yourself, you will find that some variables are pretty good at predicting survival. For the sake of argument, and because I think it has nice explanatory power, let's look at the variable Age:

df.Age.plot(kind='hist')

Since we want to investigate the effect of age on survival, let's split the ages by survival status:

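import pandas as pd

# Per-group spread, standard error and mean of Age, then a horizontal bar plot.
(
    df.groupby("Survived")
    .apply(lambda d: pd.Series({
        "std": d.Age.std(),
        "sem": d.Age.sem(),  # standard error of the mean: std / sqrt(n)
        "avg": d.Age.mean(),
    }))
    .plot(
        kind="barh",
        y="avg",
        legend=False,
        title="Mean Age per Survival Class +/- std",
        xerr="std",
    )
);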
Figure 2 | Mean age per survival category. Error bars show the standard deviation of the distribution.

From a simple bar plot, there does not seem to be a large difference in age between passengers who survived and those who did not. Looking at the error bars, we might think that these distributions are not significantly different.
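When reading such error bars, it helps to keep the standard deviation (the spread of individual ages) apart from the standard error of the mean (the uncertainty about the mean itself). The following is a minimal sketch using only the Python standard library. The helper name `spread_and_sem` and the toy ages are made up for illustration and are not taken from the Titanic data.

```python
import math
import statistics

def spread_and_sem(values):
    """Return (mean, std, sem) for a list of numbers, skipping None entries."""
    clean = [v for v in values if v is not None]
    n = len(clean)
    mean = statistics.fmean(clean)
    std = statistics.stdev(clean)   # sample standard deviation (n - 1 in the denominator)
    sem = std / math.sqrt(n)        # standard error of the mean
    return mean, std, sem

# Toy values for illustration only (hypothetical, not the Titanic ages).
toy_ages = [20.0, 30.0, 40.0, None, 50.0, 60.0]
mean, std, sem = spread_and_sem(toy_ages)
print(mean, std, sem)
```

The standard error shrinks as the group grows while the standard deviation does not, so overlapping standard-deviation bars say little on their own about whether two means differ.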

Let’s test that statistically.
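For readers who prefer code, a frequentist comparison of two group means could be sketched as below. This is an illustration under assumptions, not the analysis run in this post: it computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom with the standard library only, and the function name `welch_t` and the two toy groups are hypothetical.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Squared standard error of each group mean.
    se2_a = statistics.variance(a) / na
    se2_b = statistics.variance(b) / nb
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, dof

# Toy groups for illustration only (hypothetical, not the Titanic ages).
group_a = [20.0, 30.0, 40.0, 50.0, 60.0]
group_b = [25.0, 35.0, 45.0, 55.0, 65.0]
t_stat, dof = welch_t(group_a, group_b)
print(t_stat, dof)
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns a p-value.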

For the demonstration of Bayesian statistics, I will use the open-source software JASP, which offers a user-friendly interface. Many other packages let you run Bayesian statistics from code. Since readers might not be well versed in programming, I use this software to show how to run basic Bayesian tests.
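To give a rough idea of what a Bayesian two-group test looks like in code, here is a minimal sketch of a Bayes factor in favor of the null hypothesis (equal means), computed with the BIC approximation. This is a swapped-in technique for illustration and is not the method JASP uses; JASP's results will generally differ. The function name `bf01_bic` and the toy groups are hypothetical.

```python
import math
import statistics

def bf01_bic(a, b):
    """Approximate Bayes factor for H0 (equal means) over H1, via BIC.

    Uses BF01 ~ sqrt(n) * (1 + t^2 / df) ** (-n / 2), where t is the
    pooled-variance Student t statistic and df = n - 2.
    """
    na, nb = len(a), len(b)
    n = na + nb
    df = n - 2
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / df
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))
    return math.sqrt(n) * (1 + t * t / df) ** (-n / 2)

# Toy groups for illustration only (hypothetical, not the Titanic ages).
group_a = [20.0, 30.0, 40.0, 50.0, 60.0]
group_b = [25.0, 35.0, 45.0, 55.0, 65.0]
bf01 = bf01_bic(group_a, group_b)
print(bf01)
```

A value above 1 means the data are more compatible with equal means than with a difference, and a value below 1 means the opposite. This is what lets a Bayesian test quantify support for the null rather than only fail to reject it.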

Let’s first load the Titanic data set in JASP:
