Probability and distributions

Chapter 2 Probability and distributions

2.1 Probability & Probabilities

"Probability" (also called likelihood or plausibility) is quantified using "probabilities": expected relative frequencies, that is, expected counts expressed as a percentage of the total.

  1. Frequencies can be observed or expected. When expected, relative frequencies are called probabilities.
  2. A probability is not an observed frequency but an expected one.
  3. We know probabilities in two ways: through experience, or through knowledge of the underlying mechanism.
  4. The collection of probabilities over all possible states is called a distribution.
  5. When a distribution is flat (all probabilities are similar), the uncertainty about which state will come out is maximal and the information is minimal. (With two possible events, for example, both have the same probability.)
  6. When one probability is greater than the others, the uncertainty about the future outcome is smaller.
  7. The probability distribution reveals both uncertainty and its opposite, information.
  8. Two types of mechanisms dictate the plausibility of events: additive (causes act independently) and multiplicative (each outcome is linked sequentially to the earlier one, e.g., to the previous year's).
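The notes describe uncertainty and information only qualitatively. One common way to put a number on them, which is not part of the original notes and is used here as an assumption, is Shannon entropy, H = sum(p * log2(1/p)): it is largest for a flat distribution and zero when a single state has probability 1. A minimal Python sketch; the helper name `entropy_bits` and the three example distributions are hypothetical:

```python
import math

def entropy_bits(probs):
    """Shannon entropy (in bits) of a discrete probability distribution.

    Higher entropy = more uncertainty about which state will come out,
    i.e. less information carried by the distribution.
    (Hypothetical helper, used only to illustrate the notes.)
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # States with probability 0 are skipped: p * log2(1/p) -> 0 as p -> 0.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

flat = [0.25, 0.25, 0.25, 0.25]     # all states equally probable
peaked = [0.70, 0.10, 0.10, 0.10]   # one state more probable than the others
certain = [1.0, 0.0, 0.0, 0.0]      # only one state is possible

for name, dist in [("flat", flat), ("peaked", peaked), ("certain", certain)]:
    print(f"{name:8s} H = {entropy_bits(dist):.3f} bits")
```

With n equally probable states the entropy reaches its maximum, log2(n) bits (2 bits for the four states above), and a distribution with all its mass in one class gives 0, the "maximum information" case.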

2.2 Randomness & Variability

"Random": the states (occurrences) of an attribute cannot be predicted exactly.

  1. Two causes limit the randomness of attributes: the span or range is finite, and states tend to be more frequent (more plausible) the closer they are to a "central value".
  2. The closer a state is to the central value, the more likely it is; the further away from it, the less likely.
  3. Randomness, which seems to imply vagueness, in fact leads to strict conclusions about what is expectable and what is not.
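Both limits on randomness (a finite span and concentration around a central value) can be seen in a simple, fully known mechanism. The sketch below assumes two fair six-sided dice and treats their sum as the attribute (an additive mechanism, chosen only for illustration); it derives the probabilities from the mechanism rather than from observation:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumed mechanism (illustrative): two fair six-sided dice, attribute = their sum.
faces = range(1, 7)
counts = Counter(a + b for a, b in product(faces, repeat=2))
total = sum(counts.values())  # 36 equally likely combinations

# Expected relative frequencies, i.e. probabilities, derived from the mechanism.
dist = {s: Fraction(c, total) for s, c in sorted(counts.items())}

for s, p in dist.items():
    print(f"sum={s:2d}  p={str(p):>5s}  {'#' * counts[s]}")
```

The span is finite (2 to 12), 7 is the central and most probable value, and probabilities fall off symmetrically toward both limits. A single throw is unpredictable, yet the distribution supports strict statements such as "a sum of 2 has probability 1/36".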

"Variety": the number of states (or "occurrences") an attribute takes.

  1. Variety is not the same as randomness. Attributes may vary but not be random. What characterizes randomness is unpredictability, not variety.
  2. How variability is measured depends on the type of attribute.
  3. Nominal attributes are variable according to the number of states they admit.
  4. For ordinal attributes, variability is measured by sorting and counting. This is a "non-parametric" comparison: percentiles and median values can be used to compare the variability of ordinal attributes.
  5. When a variable is continuous, there are two practical ways to measure variability. The first is the span: the range between a lower and an upper boundary beyond which objects are unlikely to be found. The second is the variability around a central value.
  6. Variability around a central value is measured as the "SSQ", the sum of the squared differences between each observation and the mean: SSQ = sum((xj - mean)^2).
  7. SSQ grows with the number of objects: many objects produce a large SSQ and few objects a small one, even when the spread is the same. It therefore cannot be used directly to compare the variability of groups of different sizes.
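A short Python sketch makes the size dependence of SSQ concrete. The helper name `ssq` and the data are illustrative: repeating the same three values ten times leaves the spread unchanged but multiplies SSQ by ten.

```python
def ssq(values):
    """Sum of squared deviations from the mean: SSQ = sum((xj - mean)^2)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

small = [4.0, 6.0, 8.0]   # illustrative values
large = small * 10        # same values and spread, ten times as many objects

print(ssq(small))  # 8.0
print(ssq(large))  # 80.0
```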

"Distribution": the count of objects in each state.

  1. A distribution is the collection of the frequencies, either observed or expected, for each state. If the frequencies are expected, the distribution is called a "probability distribution"; if they are observed, it is called a "frequency distribution".
  2. For a nominal attribute, each state is usually a class for counting; neither the position of the classes nor the cumulative frequency matters.
  3. For an ordinal attribute, position and cumulative frequency become meaningful.
  4. For a scale attribute, the division into classes is arbitrary, since the possible states are endless.
  5. Fewer than 100 objects do not allow more than 5 classes; some 500 objects allow 9 or 10 classes. Always keep at least 20 objects per class.
  6. A histogram may show absolute frequencies or relative frequencies (percentages).
  7. The mode and median are useful in ordinal distributions; for scale attributes, mean values are generally preferred.
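For an ordinal attribute the order of the classes matters, so cumulative frequencies and the median become meaningful. A minimal sketch, assuming made-up answers on a 1-5 scale (the data are hypothetical, chosen only for illustration), that builds the frequency distribution with absolute, relative and cumulative frequencies:

```python
import statistics
from collections import Counter

# Hypothetical ordinal data: answers on a 1-5 scale (1 = "very poor" ... 5 = "very good").
answers = [3, 4, 2, 5, 4, 3, 4, 1, 4, 3, 5, 2, 4, 3, 4]

counts = Counter(answers)
n = len(answers)

cumulative = 0
print("state  count  relative  cumulative")
for state in sorted(counts):  # class order matters for ordinal attributes
    cumulative += counts[state]
    print(f"{state:5d}  {counts[state]:5d}  {counts[state] / n:8.1%}  {cumulative / n:10.1%}")

print("mode:  ", statistics.mode(answers))
print("median:", statistics.median_low(answers))  # always one of the observed states
```

`statistics.median_low` is used instead of `statistics.median` so that, with an even number of answers, the result is still an observed state rather than the average of two ordinal codes, which would have no meaning on an ordinal scale.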

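2.3 Degrees of freedom

"df": the number of objects minus the number of restrictions on the variability of those objects.

  1. "Variance", or MSQ (mean squares), is SSQ divided by the degrees of freedom; it is an average SSQ: Variance = SSQ / df = sum((xj - mean)^2) / (n - 1). Here df = n - 1 because the mean is computed from the same n objects, which imposes one restriction: the deviations from the mean must sum to zero.
  2. "Standard deviation" is the square root of the variance. Its advantage is that it is expressed in the same units as the measurements.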
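The formulas above can be checked step by step in Python. The data are illustrative; the sketch also confirms that the standard library's sample variance and standard deviation (`statistics.variance`, `statistics.stdev`) divide by n - 1 as well:

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values

n = len(data)
mean = sum(data) / n
deviations = [x - mean for x in data]

# The deviations always sum to zero: once n - 1 of them are known, the last is fixed.
# That single restriction is why df = n - 1.
assert abs(sum(deviations)) < 1e-9

ssq = sum(d ** 2 for d in deviations)
df = n - 1
variance = ssq / df            # MSQ, the "average" SSQ
std_dev = math.sqrt(variance)  # back in the original measurement units

# The standard library's sample variance and standard deviation also use n - 1.
assert math.isclose(variance, statistics.variance(data))
assert math.isclose(std_dev, statistics.stdev(data))

print(f"SSQ={ssq}, df={df}, variance={variance:.4f}, sd={std_dev:.4f}")
```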

2.4 Quantiles

"Quantile": any of a set of values that divide a frequency distribution into groups containing the same fraction of the total number of objects.

  • The first decile is the value encompassing the initial 10% of cases.
  • The first quintile is the value encompassing the initial 20% of cases.
  • The first quartile is the value encompassing the initial 25% of cases.
  • The median is the value encompassing 50% of cases.
  1. Quantiles clearly show which values of an attribute are close to the limits of likelihood.
  2. The 5th decile (or 50th percentile) is the median.
  3. The median and selected quantiles are good substitutes for the mean and the variance (central value and variability) when the frequency distribution of an attribute is far from normal.
  4. A new observation is as likely to fall below the median as above it, so the median is the value about which a priori information is nil.
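The quantiles listed above can be computed with Python's standard `statistics.quantiles`, which returns the n - 1 cut points that divide the data into n groups. The 20 values below are hypothetical and deliberately include one large value (45) so that the distribution is skewed. Note that several quantile definitions exist; `statistics.quantiles` uses its "exclusive" method by default, and other tools may give slightly different cut points.

```python
import statistics

# Hypothetical measurements; the single large value (45) skews the distribution.
data = [12, 15, 11, 19, 22, 14, 13, 30, 16, 18,
        17, 21, 14, 12, 25, 16, 13, 19, 15, 45]

quartiles = statistics.quantiles(data, n=4)   # 3 cut points
quintiles = statistics.quantiles(data, n=5)   # 4 cut points
deciles = statistics.quantiles(data, n=10)    # 9 cut points

print("first decile:  ", deciles[0])
print("first quintile:", quintiles[0])
print("first quartile:", quartiles[0])
print("median:        ", statistics.median(data))
print("mean:          ", statistics.mean(data))  # pulled upward by the value 45

# The 5th decile and the 2nd quartile coincide with the median.
assert deciles[4] == statistics.median(data)
assert quartiles[1] == statistics.median(data)
```

Because of the one extreme value, the mean lands above the median, while the median and the other quantiles still describe where most values lie.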

2.5 Questions

  • An attribute is random when it is impossible to anticipate its states exactly.
  • A flat distribution means maximum uncertainty and minimum information (it shows no information).
  • A distribution with all classes empty except one shows maximum information. (Only one class is possible; the other classes cannot occur.)
  • The opposite of variety is constancy, and constant attributes are useless (a constant brings nothing new).
  • When attributes do not fit any parametric distribution, their distribution itself is used as the measure of variability and span. (When variability must be described in detail rather than as a single parameter, the whole distribution serves as the variability measure. Bank risk, for example, is a distribution. A distribution tells you everything about frequencies because it is the collection of all of them.)
  • What characterizes randomness is unpredictability, not variety.
  • A distribution is a picture of variety.