Outlier Detection with DPM Slides from JSM 2011

(This article was first published on  BioStatMatt » R, and kindly contributed to R-bloggers)     

Here are the 14 slides I used during my talk at the Joint Statistical Meetings 2011: shotwell-jsm-2011.pdf. I'm trying hard to minimize the text in my presentation slides, but this usually requires that I practice more. Hence, you can tell which talks I have practiced thoroughly by the amount of text in the slides :). Below are a few notes to accompany the slides (in numerical order):

  1. This is the title slide. The work presented was with my advisor Elizabeth Slate, and was recently accepted to appear in the journal Bayesian Analysis.
  2. Dirichlet Process Mixture (DPM). This slide presents hierarchical notation for the DPM, and illustrates how (implicit) clustering occurs among draws from DP-distributed distributions.
  3. Product Partition Model (PPM). This is the PPM representation of the DPM in the previous slide. The partition parameter 'z' makes the data partition explicit. I think this model is easier to describe and understand than the DPM representation on the previous slide. Note that PPMs are a much larger class of models than DPMs. The PPM represents a DPM only when the prior distribution over 'z' takes the form of the expression given in the slide.
  4. Outlier Detection Using Partitioning. When we do clustering, we can think of 'small' clusters as outlying, relative to other clusters. The trick is to decide what 'small' means. The '1% of n' rule prescribes that clusters are considered small when they consist of at most 1% of the total number of observations.
  5. Quantifying Evidence to Detect Outliers: Questions. Partition estimation, or clustering, isn't enough to make inferences about outliers. These are some key unanswered questions.
  6. A Criterion for Outlier Detection: Setup. These are some candidate partitions. The first consists of three clusters, where clusters 2 and 3 consist of just one observation apiece. The remaining four candidate partitions are formed by merging one or both of clusters 2 and 3 with cluster 1, or with one another. The key point here is that outlier detection may be cast as a decision between the first candidate (the 'outlier partition') and the remaining four candidate partitions.
  7. A Criterion for Outlier Detection: The Trick. This slide illustrates how, under the decision principle of largest posterior mass (yes, yes, zero-one loss), the fixed-precision DPM imposes a lower bound on the Bayes factor favoring the outlier partition versus any partition formed by merging one or more outlier clusters. The inverse DPM precision parameter is then interpreted as the fold increase in said Bayes factor, required for each detected outlier.
  8. A Criterion for Outlier Detection: How to Fix α. Since the inverse precision parameter forms a lower bound on a Bayes factor, it's natural to consider an established scale of evidence for Bayes factors.
  9. A Criterion for Outlier Detection: Nice Properties. This slide is self-explanatory.
  10. Microarray Time Series in Cell Cycle Synchronized Yeast. The grey lines in this figure represent 297 yeast RNA microarray probes, monitored over a 120-minute time series. These probes were determined by the original authors (Spellman et al., 1998) to be regulated in the yeast cell cycle, because of their periodic expression. Our goal was to identify the outlier probes (if any) in these data. That is, each grey line is a potential outlier. Though I didn't mention this in the talk, the likelihood for these data was a normal linear model, where the time covariate is transformed onto a collection of periodic and non-linear basis functions, in order to capture periodic and non-linear expression.
  11. Microarray Time Series in Cell Cycle Synchronized Yeast. For DPM precision fixed at 1/150, this figure represents the maximum a posteriori (MAP) data partition estimate. Using the '1% of n' rule, any cluster with fewer than four observations is considered outlying. Consider, for example, the collection of partitions that might result from merging the upper rightmost cluster with one of the other clusters. By fixing the precision parameter to 1/150, we have ensured that the Bayes factor favoring the MAP partition estimate versus any such partition is at least 150. Hence, there is 'very strong' evidence that this cluster is outlying.
  12. MAP Estimation for 'z'. We considered several existing methods, and proposed a new method that is free of posterior sampling. More details will be available in the forthcoming Bayesian Analysis article. The R package profdpm implements each method.
  13. Outlier Detection with Finite Mixtures. This slide mentions the comparison between outlier detection with DPMs and finite mixtures in the Fraley and Raftery framework. The DPM method is a bit more conservative than the finite mixture method. Again, more details can be found in the article.
  14. This slide is a list of references used in the presentation.
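
The notes on slides 3, 4, and 7 can be sketched in a few lines of Python. This is only an illustration, not the code behind the talk or the profdpm package. It assumes that the expression on slide 3 is the standard Dirichlet process partition prior, p(z) ∝ α^K ∏ Γ(n_k), where K is the number of clusters and n_k the cluster sizes. The function names and the toy partitions are hypothetical. Under that assumption, merging any two clusters raises the prior mass by a factor of at least 1/α, so the unmerged partition can have the larger posterior mass only if the Bayes factor in its favor exceeds 1/α.

```python
from collections import Counter
from math import exp, lgamma, log


def log_dp_partition_prior(labels, alpha):
    """Log prior mass of a partition under a DP with precision alpha.

    Assumes the standard form alpha^K * prod(Gamma(n_k)) * Gamma(alpha) / Gamma(alpha + n).
    """
    sizes = Counter(labels).values()
    n = len(labels)
    return (len(sizes) * log(alpha)
            + sum(lgamma(s) for s in sizes)
            + lgamma(alpha) - lgamma(alpha + n))


def small_clusters(labels, fraction=0.01):
    """Labels of clusters that hold at most `fraction` of all observations."""
    n = len(labels)
    return sorted(k for k, s in Counter(labels).items() if s <= fraction * n)


def min_bayes_factor(outlier_labels, merged_labels, alpha):
    """Smallest Bayes factor for which the outlier partition has posterior
    mass at least as large as the merged partition (inverse prior odds)."""
    log_prior_odds = (log_dp_partition_prior(outlier_labels, alpha)
                      - log_dp_partition_prior(merged_labels, alpha))
    return exp(-log_prior_odds)


alpha = 1 / 150

# Hypothetical toy partitions of n = 300 observations.
outlier = [0] * 298 + [1, 2]        # one large cluster and two singletons
merge_pair = [0] * 298 + [1, 1]     # the two singletons merged together
merge_into_big = [0] * 299 + [1]    # one singleton absorbed by the large cluster

print(small_clusters(outlier))                            # clusters flagged by the '1% of n' rule
print(min_bayes_factor(outlier, merge_pair, alpha))       # approximately 1 / alpha
print(min_bayes_factor(outlier, merge_into_big, alpha))   # larger than 1 / alpha
```

Merging the two singletons with one another gives the smallest required Bayes factor, 1/α = 150. Absorbing a singleton into the large cluster requires a much larger one, because the prior favors large clusters even more strongly.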