What is Bayesian deep learning?
1 : To achieve integrated intelligence that involves both perception and inference, it is naturally desirable to tightly integrate deep learning and Bayesian models within a principled probabilistic framework, which we call Bayesian deep learning.
A more important ability for an AI system is thinking, and BDL can address this problem.
2 : However, in order to build a real AI system, simply being able to see, read, and hear is far from enough. It should, above all, possess the ability of thinking.
3 : It is the thinking part that defines a doctor. Specifically, the ability of thinking here could involve causal inference, logical deduction, and dealing with uncertainty, which is apparently beyond the capability of conventional deep learning methods. Fortunately, another family of models, probabilistic graphical models (PGMs), excels at causal inference and dealing with uncertainty. The problem is that PGMs are not as good as deep learning models at perception tasks. To address the problem, it is, therefore, a natural choice to tightly integrate deep learning and PGMs within a principled probabilistic framework, which we call Bayesian deep learning (BDL) in this paper.
What does Bayesian deep learning do?
4 : With the tight and principled integration in Bayesian deep learning, the perception task and inference task are regarded as a whole and can benefit from each other. In the example above, being able to see the medical image could help with the doctor’s diagnosis and inference. On the other hand, diagnosis and inference can in return help with understanding the medical image. Suppose the doctor is not sure about what a dark spot in a medical image is; if she is able to infer the etiology of the symptoms and disease, this can help her better decide whether the dark spot is a tumor or not.
Other applications
5 : Besides recommender systems, the need for Bayesian deep learning may also arise when we are dealing with control of non-linear dynamical systems with raw images as input. Consider controlling a complex dynamical system according to the live video stream received from a camera. This problem can be transformed into iteratively performing two tasks: perception from raw images and control based on dynamic models. The perception task can be taken care of using multiple layers of simple nonlinear transformation (deep learning), while the control task usually needs more sophisticated models like hidden Markov models and Kalman filters. The feedback loop is then completed by the fact that actions chosen by the control model can affect the received video stream in return. To enable an effective iterative process between the perception task and the control task, we need two-way information exchange between them. The perception component would be the basis on which the control component estimates its states, and the control component, with a dynamic model built in, would be able to predict the future trajectory (images). In such cases, Bayesian deep learning is a suitable choice.
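To make the perception–control loop above concrete, here is a minimal 1-D Kalman filter sketch: the "perception" component supplies noisy position observations, and the dynamic model fuses each observation with its own prediction. All constants, the trajectory, and the noise levels are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def kalman_1d(observations, a=1.0, q=0.01, r=0.25, x0=0.0, p0=1.0):
    """Filter a sequence of noisy scalar observations with a linear dynamic model."""
    x, p = x0, p0
    estimates = []
    for z in observations:
        # Predict: propagate the state through the dynamic model x_t = a * x_{t-1}
        x_pred = a * x
        p_pred = a * p * a + q          # predicted uncertainty grows by process noise q
        # Update: weight observation vs. prediction by the Kalman gain
        k = p_pred / (p_pred + r)       # r is the perception (observation) noise variance
        x = x_pred + k * (z - x_pred)
        p = (1.0 - k) * p_pred          # posterior uncertainty shrinks after the update
        estimates.append(x)
    return np.array(estimates)

rng = np.random.default_rng(0)
true_pos = np.linspace(0.0, 1.0, 50)                  # hypothetical true trajectory
noisy_obs = true_pos + rng.normal(0.0, 0.5, size=50)  # what raw perception reports
filtered = kalman_1d(noisy_obs)
```

The filtered trajectory tracks the true positions far more closely than the raw observations, which is exactly the second-order information exchange the excerpt describes: the dynamic model corrects perception using its own uncertainty estimates.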
The benefits of BDL
6 : Apart from the major advantage that BDL provides a principled way of unifying deep learning and PGMs, another benefit comes from the implicit regularization built into BDL. By imposing a prior on the hidden units, on the parameters defining a neural network, or on the model parameters specifying the causal inference, BDL can to some degree avoid overfitting, especially when we do not have sufficient data.
What does BDL consist of?
7 : Usually, a BDL model consists of two components: a perception component that is a Bayesian formulation of a certain type of neural network, and a task-specific component that describes the relationship among different hidden or observed variables using a PGM. Regularization is crucial for both. Neural networks usually have large numbers of free parameters that need to be regularized properly. Regularization techniques like weight decay and dropout have been shown to be effective in improving the performance of neural networks, and both have Bayesian interpretations. As for the task-specific component, expert knowledge or prior information, as a kind of regularization, can be incorporated into the model through the imposed prior to guide the model when data are scarce.
Yet another advantage of using BDL for complex tasks (tasks that need both perception and inference) is that it provides a principled Bayesian approach to handling parameter uncertainty. When BDL is applied to complex tasks, there are three kinds of parameter uncertainty that need to be taken into account:
- Uncertainty on the neural network parameters.
- Uncertainty on the task-specific parameters.
- Uncertainty of exchanging information between the perception component and the task-specific component.
By representing the unknown parameters using distributions instead of point estimates, BDL offers a promising framework for handling these three kinds of uncertainty in a unified way. It is worth noting that the third kind of uncertainty can only be handled under a unified framework like BDL: if we train the perception component and the task-specific component separately, we are in effect assuming no uncertainty when exchanging information between the two components.
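The contrast between a point estimate and a distribution over a parameter can be shown with a toy conjugate model (Gaussian data with known noise variance, Gaussian prior on the mean). The numbers are illustrative; the point is that sampling the parameter propagates its uncertainty into the predictions, which a single point estimate cannot do.

```python
import numpy as np

rng = np.random.default_rng(2)
s2, tau2 = 1.0, 10.0                            # known noise variance, prior variance
data = rng.normal(3.0, np.sqrt(s2), size=20)    # made-up observations

# Conjugate Gaussian posterior over the unknown mean theta
n = len(data)
post_var = 1.0 / (n / s2 + 1.0 / tau2)
post_mean = post_var * (data.sum() / s2)

# Point estimate: a single number, with no uncertainty attached
point = post_mean

# Distribution: draw parameter samples, then predictive samples.
# Each predictive draw mixes noise (s2) with parameter uncertainty (post_var).
theta = rng.normal(post_mean, np.sqrt(post_var), size=10_000)
pred = rng.normal(theta, np.sqrt(s2))
```

The empirical predictive variance comes out close to `s2 + post_var`: the extra `post_var` term is exactly the parameter uncertainty that a point estimate would silently drop.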
The bottlenecks of BDL
Of course, there are challenges when applying BDL to real-world tasks. (1) First, it is nontrivial to design an efficient Bayesian formulation of neural networks with reasonable time complexity. This line of work was pioneered by [24], [37], [40], but it was not widely adopted due to its lack of scalability. Fortunately, some recent advances in this direction [1], [7], [19], [22], [32] seem to shed light on the practical adoption of Bayesian neural networks. (2) The second challenge is to ensure efficient and effective information exchange between the perception component and the task-specific component. Ideally both the first-order and second-order information (e.g., the mean and the variance) should be able to flow back and forth between the two components. A natural way is to represent the perception component as a PGM and seamlessly connect it to the task-specific PGM, as done in [15], [59], [60].
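One common ingredient of the scalable Bayesian neural network formulations referenced above is the reparameterization trick: each weight gets a variational mean and standard deviation, and a forward pass samples w = mu + sigma * eps. The sketch below shows only this sampling mechanism for a single layer (no training loop, no KL term); the layer sizes, initializations, and class name are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def softplus(x):
    # Maps an unconstrained parameter to a positive standard deviation
    return np.log1p(np.exp(x))

class BayesLinear:
    """One linear layer with a factorized Gaussian over its weights (sketch)."""
    def __init__(self, d_in, d_out):
        self.mu = rng.normal(0.0, 0.1, size=(d_in, d_out))   # variational means
        self.rho = np.full((d_in, d_out), -3.0)              # pre-softplus stds

    def forward(self, x):
        # Reparameterization trick: w = mu + softplus(rho) * eps, eps ~ N(0, 1)
        eps = rng.normal(size=self.mu.shape)
        w = self.mu + softplus(self.rho) * eps
        return x @ w

layer = BayesLinear(4, 2)
x = np.ones((1, 4))
# Repeated forward passes sample different weights, so the outputs form a
# distribution whose spread reflects the layer's weight uncertainty.
outs = np.stack([layer.forward(x) for _ in range(1000)])
```

Because the sampling is written as a deterministic function of (mu, rho) plus independent noise, gradients can flow through mu and rho during training; that is what makes this family of methods scalable with ordinary backpropagation.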