Building deep learning models: all deep learning is the construction of a statistical model

Deep learning is often used to make predictions for data driven analysis. But what are the meanings of these predictions?

This post explains how neural networks used in deep learning provide the parameters of a statistical model describing the probability of the occurrence of events.

The occurrence of events and aleatoric uncertainty

Data, observables, events or any other way of describing the things we can see and/or collect is absolute: we roll two sixes on a pair of six-sided dice or we get some other combination of outcomes; we toss a coin 10 times and we get heads each time or we get some other mixture of heads and tails; our universe evolves some way and we observe it, or it doesn’t — and we don’t. We do not know, a priori, whether we will get two sixes with our dice roll or heads each time we toss a coin or what possible universes could exist for us to come into being and observe it. We describe the uncertainty due to this lack of knowledge as aleatoric. It is due to fundamental missing information about the generation of such data — we can never exactly know what outcome we will obtain. We can think of aleatoric uncertainty as not being able to know the random seed of some random number generating process.

We describe the probability of the occurrence of events using a function, P : d ∈ E ↦ P(d) ∈ [0, 1], i.e. the probability distribution function, P, assigns a value between 0 and 1 to any event, d, in the space of all possible events, E. If an event is impossible then P(d) = 0, whilst a certain outcome has a probability P(d) = 1. This probability is additive such that the union of all possible events d ∈ E is certain, i.e. P(E) = 1.

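These properties can be checked directly for a small finite event space. A minimal sketch in Python, using one roll of a fair six-sided die as the illustrative process (the dictionary representation of P is our own choice):

```python
# A finite event space E for one roll of a fair six-sided die,
# with the distribution P represented as a plain mapping d -> P(d).
E = [1, 2, 3, 4, 5, 6]
P = {d: 1 / 6 for d in E}

# Each probability lies in [0, 1] ...
assert all(0.0 <= P[d] <= 1.0 for d in E)

# ... and the union of all possible events is certain: P(E) = 1.
total = sum(P.values())
assert abs(total - 1.0) < 1e-12

# An impossible event (e.g. rolling a 7) is assigned probability 0.
print(P.get(7, 0.0))
```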
Using a slight abuse of notation we can write d ~ P, which means that some event, d, is drawn from the space of all possible events, E, with a probability P(d). This means that there is a 100×P(d)% chance that event d is observed. d could be any observation, event or outcome of a process, for example, when rolling n = 2 six-sided dice and obtaining a six with both, d = (d¹ = 0, d² = 0, d³ = 0, d⁴ = 0, d⁵ = 0, d⁶ = 2). We do not know, beforehand, exactly what result we will obtain by rolling these two dice, but we know there is a certain probability that any particular outcome will be obtained. Under many repetitions of the dice roll experiment (with perfectly balanced dice and identical conditions) we should see that the probability of d occurring is P(d) ≈ ¹/₃₆. Even without performing many repetitions of the dice roll we could provide our believed estimate of the distribution of how likely we are to see particular outcomes.

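The "many repetitions" reading of P(d) ≈ ¹/₃₆ can be illustrated with a quick simulation. A sketch, where the trial count and the fixed seed are arbitrary illustrative choices:

```python
import random

random.seed(0)  # in a real process, this "seed" is exactly what we cannot know

# One realisation of the data-generating process: rolling two fair six-sided dice.
def roll_two_dice():
    return random.randint(1, 6), random.randint(1, 6)

# Repeat the experiment many times and count occurrences of d = "two sixes".
n_trials = 200_000
hits = sum(1 for _ in range(n_trials) if roll_two_dice() == (6, 6))

estimate = hits / n_trials
print(f"estimated P(d) = {estimate:.4f}, exact 1/36 = {1/36:.4f}")
```

With enough trials the empirical frequency settles near ¹/₃₆, while any single trial remains aleatorically uncertain.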
Statistical models

To make statistical predictions we model the distribution of data using parameterisable distributions, Pₐ. We can think of a as defining a statistical model which contains a description of the distribution of data and any possible unobservable parameters, v ∈ Eᵥ, of the model. The distribution function then attributes values of probability to the occurrence of observable/unobservable events Pₐ : (d, v) ∈ (E, Eᵥ) ↦ Pₐ(d, v) ∈ [0, 1]. It is useful to note that we can write this joint probability distribution as a conditional statement, Pₐ = Lₐ · pₐ = ρₐ · eₐ. These probability distribution functions are:

  • The likelihood — Lₐ : (d, v) ∈ (E, Eᵥ) ↦ Lₐ(d|v) ∈ [0, 1]

  • The prior — pₐ : v ∈ Eᵥ ↦ pₐ(v) ∈ [0, 1]

  • The posterior — ρₐ : (d, v) ∈ (E, Eᵥ) ↦ ρₐ(v|d) ∈ [0, 1]

  • The evidence — eₐ : d ∈ E ↦ eₐ(d) ∈ [0, 1]

The introduction of these functions allows us to interpret the probability of observing d and v as being equal to the probability of observing d given the value, v, of the model parameters multiplied by how likely these model parameter values are. Likewise, it is equal to the probability of the value, v, of the model parameters given that d is observed multiplied by how likely d is to be observed in the model.

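The two factorisations of the joint, Pₐ = Lₐ · pₐ = ρₐ · eₐ, can be verified numerically on a toy model. A sketch assuming a coin whose unobservable bias v takes one of two values; the prior weights and bias values are invented for illustration:

```python
# Unobservable parameter v: the coin's heads probability, in a toy space E_v = {0.5, 0.8}.
prior = {0.5: 0.7, 0.8: 0.3}        # p_a(v): belief before seeing data (illustrative values)
likelihood = {0.5: 0.5, 0.8: 0.8}   # L_a(d|v) for the observed event d = "heads"

# Evidence e_a(d): marginalise the joint L_a(d|v) p_a(v) over the unobservable v.
evidence = sum(likelihood[v] * prior[v] for v in prior)

# Posterior rho_a(v|d) from Bayes' theorem.
posterior = {v: likelihood[v] * prior[v] / evidence for v in prior}

# Both factorisations assign the same joint probability P_a(d, v).
for v in prior:
    assert abs(likelihood[v] * prior[v] - posterior[v] * evidence) < 1e-12

print(evidence, posterior)
```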
For the dice roll experiment we could (and do) model the distribution of data using a multinomial distribution, Pₐ(d|v) = n! ∏ᵢ pᵢᵈⁱ/dⁱ!, where the fixed parameters of the multinomial model are v = {p₁, p₂, p₃, p₄, p₅, p₆, n} = {pᵢ, n | i ∈ [1, 6]} with pᵢ as the probabilities of obtaining value i ∈ [1, 6] from a die and n as the number of rolls. If we are considering completely unbiased dice then p₁ = p₂ = p₃ = p₄ = p₅ = p₆ = ¹/₆. The probability of observing two sixes, d = (d¹ = 0, d² = 0, d³ = 0, d⁴ = 0, d⁵ = 0, d⁶ = 2), is then Pₐ(d|v) = 2! (¹/₆)²/2! = ¹/₃₆.
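This multinomial probability can be evaluated directly with the standard library. A minimal sketch; the helper function name is our own:

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P_a(d|v) = n! * prod_i p_i^(d_i) / d_i!, for counts d and category probabilities p_i."""
    n = sum(counts)
    coefficient = factorial(n) // prod(factorial(c) for c in counts)
    return coefficient * prod(p ** c for p, c in zip(probs, counts))

# d = (0, 0, 0, 0, 0, 2): two sixes from n = 2 rolls of an unbiased die, p_i = 1/6.
d = (0, 0, 0, 0, 0, 2)
p = (1 / 6,) * 6
print(multinomial_pmf(d, p), 1 / 36)
```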
