Bayesian Statistics, Part II

Latest recommended article published on 2021-12-03 18:51:51

weixin_26730921


Tags: python, algorithms

Original article: https://medium.com/@julien.her/statistics-part-ii-bayesian-to-the-rescue-877cc18c8bfd


In this post, I compare the output of frequentist and Bayesian statistics and explain how the two approaches complement each other, particularly when a frequentist analysis yields an inconclusive result.

As a first proof of concept, I will use the famous Titanic data set, which most new Kaggle users encounter soon after registering. The same statistics can of course be applied to any other data set. I chose the Titanic data because it contains a wide range of variables and because readers may already be familiar with it.

For those not familiar with it, the data set offers a range of variables that can be used to predict a passenger's likelihood of surviving the sinking of the ship. You will find all kinds of approaches online for analyzing this data set, as well as machine learning techniques for predicting survival.

I downloaded it from an online source, and you can find the exact data set I used here.
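The snippets in this post assume the data is already in a pandas DataFrame called `df`. As a minimal sketch of that step: the helper `load_titanic` and the two sample rows below are hypothetical and only illustrate the file layout. With the real file you would pass its path instead of the in-memory sample.

```python
import io
import pandas as pd

# Column names as listed in this post.
EXPECTED_COLUMNS = ["PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
                    "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"]

def load_titanic(source):
    """Read the Titanic CSV from a path or file-like object and check its columns."""
    df = pd.read_csv(source)
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df

# Stand-in for the real file: two made-up placeholder rows, not actual passengers.
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n"
    '1,0,3,"Doe, Mr. John",male,30,0,0,T1,8.05,,S\n'
    '2,1,1,"Roe, Mrs. Jane",female,,1,0,T2,70.0,C1,C\n'
)
df = load_titanic(sample_csv)

# Empty cells are read as missing values (NaN), which matters later for Age.
print(df.Age.isna().sum())
```

Checking the columns up front is a design choice: it fails early if a differently formatted copy of the data set is used.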

FYI, the variables are listed below:


for col in df.columns:
    print(col)

PassengerId
Survived
Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Fare
Cabin
Embarked

Absence of Evidence

If you run the analysis yourself, you will find that some variables predict survival quite well. For the sake of argument, and because I think it has good explanatory power, let's look at the variable Age:

df.Age.plot(kind='hist')

Since we want to investigate the effect of age on survival, let's split the data by survival status:

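import numpy as np
import pandas as pd

(df.groupby('Survived')
   .apply(lambda d: pd.Series({
       "std": d.Age.std(),
       # standard error of the mean: std / sqrt(n), counting non-missing ages only
       "sem": d.Age.std() / np.sqrt(d.Age.count()),
       "avg": d.Age.mean()
   }))
   # horizontal bars of the mean age, with the standard deviation as error bars
   .plot(kind='barh',
         y="avg",
         legend=False,
         title="Mean Age per Survival Class +/- std",
         xerr="std"));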

Figure 2 | Mean age per survival category. Error bars show the standard deviation of each distribution.

Judging from a simple bar plot, there does not seem to be a large difference in age between passengers who survived the accident and those who did not. Looking at the error bars, we might suspect that the two distributions are not significantly different.

Let’s test that statistically.
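For readers who prefer code, the frequentist side of this comparison can be sketched with the standard library alone. The function below computes Welch's t statistic and its approximate degrees of freedom for two samples. The name `welch_t` and the example ages are hypothetical, and this is an illustration rather than the analysis run in this post. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False, nan_policy='omit')` also returns the p-value.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test, ignoring NaN values."""
    a = [x for x in sample_a if not math.isnan(x)]
    b = [x for x in sample_b if not math.isnan(x)]
    # Squared standard error of each group mean: s^2 / n
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df

# Made-up ages for illustration only, not taken from the Titanic data.
ages_died = [22.0, 35.0, float("nan"), 54.0, 2.0, 27.0]
ages_survived = [38.0, 26.0, 35.0, 14.0, 4.0, 58.0]
t_stat, dof = welch_t(ages_died, ages_survived)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

With the DataFrame used in this post, the two samples would be `df.loc[df.Survived == 0, "Age"]` and `df.loc[df.Survived == 1, "Age"]`. Missing ages are skipped, as in the bar plot.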


To demonstrate Bayesian statistics, I will use the open-source software JASP, which offers a user-friendly interface. Many other packages let you run Bayesian statistics from code. Since some readers may not be comfortable with code, I use JASP to show how to run basic Bayesian tests.
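As a rough idea of what a Bayesian test reports, the sketch below approximates a Bayes factor for "no difference in means" from the BIC of two models: one shared mean versus one mean per group, with equal variances assumed. This is a swapped-in approximation, not the computation JASP performs, and the function name `bic_bayes_factor_01` is hypothetical. Under these assumptions the result reduces to BF01 ≈ sqrt(n) · (1 + t²/df)^(−n/2), where t is the pooled-variance t statistic and df = n − 2.

```python
import math
from statistics import mean

def bic_bayes_factor_01(group_a, group_b):
    """Return (t, BF01): pooled t statistic and BIC-approximated evidence for the null."""
    a = [x for x in group_a if not math.isnan(x)]
    b = [x for x in group_b if not math.isnan(x)]
    n_a, n_b = len(a), len(b)
    n = n_a + n_b
    df = n - 2
    mean_a, mean_b = mean(a), mean(b)
    # Residual sum of squares when each group has its own mean (H1)
    ss_within = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    pooled_var = ss_within / df
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    # BIC1 - BIC0 = ln(n) - n * ln(1 + t^2 / df); BF01 is approximately exp of half of it
    log_bf01 = 0.5 * math.log(n) - (n / 2) * math.log1p(t * t / df)
    return t, math.exp(log_bf01)

# Made-up samples for illustration only.
t_stat, bf01 = bic_bayes_factor_01([21.0, 30.0, 45.0, 28.0], [25.0, 33.0, 40.0, 29.0])
print(f"t = {t_stat:.3f}, BF01 = {bf01:.3f}")
```

A BF01 above 1 means the data favor the null hypothesis of no difference, and a value below 1 favors a difference. That is the kind of statement a non-significant p-value cannot make on its own.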

Let’s first load the Titanic data set in JASP:
