Machine Learning Study Notes: PRML Chapter 1.6 : Information Theory

Chapter 1.6 : Information Theory

PRML, Oxford University Deep Learning Course, Machine Learning, Pattern Recognition
Christopher M. Bishop, PRML, Chapter 1 Introduction

1. Information h(x)

Given a random variable x, we ask how much information is received when we observe a specific value of this variable.

  • The amount of information can be viewed as the “degree of surprise” on learning the value of x.

    • information h(x):
      $$h(x) = -\log_2 p(x) \tag{1.92}$$
      where the negative sign ensures that information is positive or zero. (A numeric check in Python follows this list.)
    • the units of h(x):
      • using logarithms to the base of 2: the units of h(x) are bits (‘binary digits’).
      • using logarithms to the base of e, i.e., natural logarithms: the units of h(x) are nats.
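To make the units concrete, here is a minimal Python sketch (the probability value p is just an illustrative number, not from the text) that evaluates h(x) in both bits and nats:

```python
import math

p = 0.25  # illustrative probability p(x) of the observed value

h_bits = -math.log2(p)  # information in bits (base-2 logarithm)
h_nats = -math.log(p)   # information in nats (natural logarithm)

print(f"h(x) = {h_bits:.3f} bits = {h_nats:.3f} nats")
# h(x) = 2.000 bits = 1.386 nats; note that nats = bits * ln 2
```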
2. Entropy H(x): average amount of information

2.1 Entropy H(x)

Firstly, we interpret the concept of entropy in terms of the average amount of information needed to specify the state of a random variable.

Now suppose that a sender wishes to transmit the value of a random variable to a receiver. The average amount of information that they transmit in the process is obtained by taking the expectation of (1.92) with respect to the distribution p(x), and is given
- for a discrete random variable by the (discrete) entropy

  $$H[x] = -\sum_x p(x) \log_2 p(x) \tag{1.93}$$

- or for a continuous random variable by the differential/continuous entropy

  $$H[x] = -\int p(x) \ln p(x)\, \mathrm{d}x \tag{1.104}$$

• Note that $\lim_{p \to 0} p \ln p = 0$, and so we shall take $p(x) \ln p(x) = 0$ whenever we encounter a value of x such that $p(x) = 0$.
• A nonuniform distribution has a smaller entropy than a uniform one (see the numeric check after this list).
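As a quick numerical illustration of that last point, here is a minimal Python sketch (the two four-state distributions are made-up examples) that evaluates the discrete entropy (1.93) for a uniform and a nonuniform distribution:

```python
import math

def entropy_bits(p):
    """Discrete entropy H[x] = -sum_x p(x) log2 p(x); zero-probability states contribute 0."""
    return -sum(px * math.log2(px) for px in p if px > 0)

uniform    = [0.25, 0.25, 0.25, 0.25]  # maximally uncertain over four states
nonuniform = [0.70, 0.15, 0.10, 0.05]  # same support, but sharply peaked

print(entropy_bits(uniform))     # 2.0 bits, the maximum possible (log2 4)
print(entropy_bits(nonuniform))  # ~1.319 bits, smaller than the uniform case
```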

2.2 Noiseless coding theorem (Shannon, 1948)

The noiseless coding theorem states that the entropy is a lower bound on the average number of bits needed to transmit the state of a random variable.
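A small worked example (my own, not from the book) makes the bound concrete. For a variable with four states of probabilities (1/2, 1/4, 1/8, 1/8), the entropy is

$$H = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{4}\log_2\tfrac{1}{4} - \tfrac{1}{8}\log_2\tfrac{1}{8} - \tfrac{1}{8}\log_2\tfrac{1}{8} = 1.75 \text{ bits},$$

and the prefix code {0, 10, 110, 111} has average length $\tfrac{1}{2}\cdot 1 + \tfrac{1}{4}\cdot 2 + \tfrac{1}{8}\cdot 3 + \tfrac{1}{8}\cdot 3 = 1.75$ bits, exactly attaining the entropy lower bound; no code can do better on average.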

2.3 Alternative view of entropy H(x)

Secondly, let us introduce the concept of entropy as it first arose in physics, in the context of equilibrium thermodynamics, and was later given a deeper interpretation as a measure of disorder through developments in statistical mechanics.

Consider a set of N identical objects that are to be divided amongst a set of bins, such that there are $n_i$ objects in the ith bin. Consider the number of different ways of allocating the objects to the bins.
- There are N ways to choose the first object, (N − 1) ways to choose the second object, and so on, leading to a total of N! ways to allocate all N objects to the bins.
- However, we don’t wish to distinguish between rearrangements of objects within each bin. In the ith bin there are $n_i!$ ways of reordering the objects, and so the total number of ways of allocating the N objects to the bins is given by

  $$W = \frac{N!}{\prod_i n_i!} \tag{1.94}$$

which is called the multiplicity.
- The entropy is then defined as the logarithm of the multiplicity scaled by an appropriate constant,

  $$H = \frac{1}{N}\ln W = \frac{1}{N}\ln N! - \frac{1}{N}\sum_i \ln n_i! \tag{1.95}$$

- We now consider the limit $N \to \infty$, in which the fractions $n_i/N$ are held fixed, and apply Stirling’s approximation

  $$\ln N! \simeq N\ln N - N \tag{1.96}$$

- which gives (using $\ln n_i! \simeq n_i\ln n_i - n_i$ and $\sum_i n_i = N$)

  $$H = \frac{1}{N}\bigl(N\ln N - N\bigr) - \frac{1}{N}\sum_i \bigl(n_i\ln n_i - n_i\bigr) = -\sum_i \frac{n_i}{N}\ln\frac{n_i}{N}$$

  so that in the limit

  $$H = -\lim_{N\to\infty}\sum_i \left(\frac{n_i}{N}\right)\ln\left(\frac{n_i}{N}\right) = -\sum_i p_i \ln p_i \tag{1.97}$$

  where $p_i = \lim_{N\to\infty}(n_i/N)$ is the probability of an object being assigned to the ith bin. (A numerical check of this limit follows below.)
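To sanity-check this limit numerically, here is a minimal Python sketch (the bin proportions are arbitrary example values) that computes $H = \frac{1}{N}\ln W$ exactly via log-factorials and compares it with $-\sum_i p_i \ln p_i$ as N grows:

```python
import math

def scaled_log_multiplicity(counts):
    """H = (1/N) ln W with ln W = ln N! - sum_i ln n_i!, using lgamma(n + 1) = ln n!."""
    N = sum(counts)
    log_W = math.lgamma(N + 1) - sum(math.lgamma(n + 1) for n in counts)
    return log_W / N

p = [0.5, 0.3, 0.2]                           # fixed bin fractions n_i / N (example values)
target = -sum(pi * math.log(pi) for pi in p)  # -sum_i p_i ln p_i, about 1.0297 nats

for N in (10, 100, 1000, 100000):
    counts = [round(pi * N) for pi in p]        # n_i held at fixed fractions of N
    print(N, scaled_log_multiplicity(counts))   # increases towards the target as N grows

print("limit:", target)
```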