Accepted Papers
- Tracking Climate Change Opinions from Twitter Data. Xiaoran An (Northeastern University), Auroop Ganguly, Yi Fang (Santa Clara University), Steven Scyphers (Northeastern University), Ann Hunter (Northeastern University), Jennifer Dy (Northeastern University).
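- Research question: use a Twitter corpus to compensate for the shortcomings of surveys on public perceptions of climate change, i.e., discover from tweets how people perceive climate change.
- Main methods: text mining, hierarchical sentiment analysis, time series methods
- Data used: original climate-related tweets (retweets are excluded, on the assumption that sentiment cannot be inferred from a retweet)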
- Probabilistic Soft Logic for Social Good. Stephen Bach (University of Maryland), Bert Huang, Lise Getoor (UCSC)
- Gist: PSL is a language that serves as a tool for making probabilistic judgments
- Calling for Better Measurement: Estimating an Individual's Wealth and Well-Being from Mobile Phone Transaction Records. Joshua Blumenstock (University of Washington)
- No paper available
- Harnessing Craigslist Personal Ads to Inform Federal HIV Prevention Funding. Varoon Bashyakarla (Yale University), Claire Kelley, Allen Lin (Harvard University)
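- Research question: assess the effectiveness of HIV prevention in a region by studying the proportion of ads on the site posted by gay users
- Main method: text mining
- Data used: unique ad ID, ad source URL, ad city and state, poster's age, user location, whether the ad includes a photo, title, content, date, update time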
- Data Analytics for Energy Efficiency: Opportunities and Challenges. Shyam Boriah (FirstFuel Software), Badri Raghavan.
- Main argument: the opportunities and challenges that data science brings to energy efficiency, and the work data science can do in this area.
- A Case Study on Fundraising Analytics at Memorial Sloan Kettering Cancer Center. Kathryn Chamberlin (Memorial Sloan Kettering)
- No paper available
- Data Science and the Policy Completion Problem. Sanjay Chawla (University of Sydney), Federico Girosi (University of Western Sydney), Fei Wang (University of Sydney).
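- Main problem: explores what data science can contribute to policy analysis; the shift from training data to test data can approximate policy analysis; in particular, matrix factorization can be a powerful tool for predicting and evaluating the impact of a new policy.
- Assumption: a new policy changes the data-generating model
- Main method:
- Data used: health data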
- An Approach to Analyze Web Privacy Policy Documents. Parvathi Chundi (University of Nebraska); Pranav Subramaniam (Millard North High School, Omaha, NE)
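- Main problem: use text mining to help people understand policy documents
- Main method: combine LDA with complete-linkage clustering
- Document data set: policies at 46 websites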
- Topic Modeling Official Secrecy. Matthew Connelly (Columbia), Thomas Nyberg (Columbia), Daniel Krasner (Kfit), David Allen (Columbia), Ian Langmore (Google)
- Experimentation Standards for Crisis Informatics. Fernando Diaz (Microsoft)
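- Argument: calls for building a standard experimental collection for crisis response (I assume this is something similar to psychological counseling)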
- Real-time Topic Models for Crisis Counseling. Karthik Dinakar (MIT), Allison Chaney (Princeton University), Henry Lieberman (MIT), David Blei (Princeton University).
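- Main problem: apply topic models to online crisis counseling conversations to help counselors better handle the problems of those seeking help
- Main methods: topic models, graphical models
- Data used: online conversations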
- Evaluating Unlabeled Spatio-Temporal Patterns: A Global Ocean Eddy Monitoring Application. James Faghmous (University of Minnesota), Hung Nguyen (University of Minnesota), Snigdhansu Chaterjee (University of Minnesota), Vipin Kumar (University of Minnesota).
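- Main problem: identify objects (ocean eddies) in spatio-temporal fields
- Main method: in the absence of training data, evaluate feature quality with empirical, objective criteria, balancing the false-negative and false-positive error rates
- Data used: data continuously transmitted by satellites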
- Crowdsourcing Land Use Maps via Twitter. Vanessa Frias-Martinez (University of Maryland), Enrique Frias-Martinez (Telefonica Research).
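- Main problem: use geotagged tweets to support urban planning by suggesting land uses.
- Main method: unsupervised learning that clusters geographic areas by similar tweeting activity patterns to identify land use.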
- Smarter Crisis Crowdsourcing. Kayla Jacobs (Technion), Kwang-Sung Jun (University of Wisconsin), Nathan Leiby (Clever), Elena Eneva (Accenture)
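- Main problem: help reviewers more efficiently detect duplicate reports and identify a report's language, category, location, personal information, etc.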
- Follow the $$$: Networks and Flows of Disaster Recovery Funding. Rupinder Paul Khandpur (Virginia Tech), Naren Ramakrishnan (Virginia Tech), James Bohland (Virginia Tech)
- Main problem: trace where disaster recovery funding flows
- Exploring Clinical Care Processes Using Visual and Data Analytics: Challenges and Opportunities. Vikas Kumar (Georgia Tech), Hyunwoo Park, Rahul Basole, Mark Braunstein, Minsuk Kahng, Duen Horng Chau, Daniel Hirsh, Nicoleta Serban, James Bost, Burton Lesnick, Beth Schissel, Acar Tamersoy, Michael Thompson.
- Main problem: defines the difficulties of using data mining to solve medical problems
- Main data: pediatric asthma data
- A Graph-based Approach to Clustering and Matching Providers Across Disparate Healthcare Datasets. Sangkeun Lee (Oak Ridge National Laboratory).
- Main problem: establish relationships and links between entities
- Main method: path-based clustering algorithm
- An interpretable model for stroke prediction using rules and Bayesian analysis. Ben Letham (MIT), Cynthia Rudin (MIT), Tyler McCormick (University of Washington), David Madigan (Columbia University).
- Main problem: build a stroke prediction model that is also interpretable
- Main method: rules + Bayesian analysis
- Synthesis of Nutrition and Price Data Across Supermarket Retailers. Christian Pecaut (Harris School of Public Policy, University of Chicago).
- RFID Based Biometrics Analysis Over Cloud. Faisal Rahman (UTD Data Mining Lab), Mohiuddin Solaimani (University of Texas at Dallas), Latifur Khan (University of Texas at Dallas)
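- Main problem: use Spark and big data technology to model patients' baseline characteristics, and propose an appropriate treatment plan when a patient's biometrics deviate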
- Data-driven Modeling in the Social Sciences - A pragmatic approach for policy-makers. Shyam Ranganathan (Uppsala University), Viktoria Spaiser (Institute for Futures Studies, Sweden), Stamatios Nicolis (Uppsala University), Ranjula Bali Swain (Uppsala University), David Sumpter (Uppsala University).
- Main problem: recommends analyzing data at the macro level, rather than at the individual level, to help policy-makers
- Readmissions Score as a Service(RaaS). Vivek Rao, Kiyana Zolfaghar, David Hazel, Vani Mandava, Senjuti Basu Roy, Ankur Teredesai.
- Main problem: introduces RaaS, a service for computing medical risk, such as the probability of a condition recurring
- Finding Patterns with a Rotten Core: Data Mining for Crime Series with Core Sets. Cynthia Rudin (MIT), Tong Wang (MIT), Daniel Wagner (Cambridge Police Department), Rich Seveiri (Cambridge Police Department).
- Main problem: find patterns in serial crimes
- Main method: graphical model
- Data Science for Public Policy: Of the people, for the people, by the people 2.0 ?. Sreenivas Sukumar (ORNL)
- Main problem: construct the life cycle of public policy (topic evolution and the like)
- Machine learning in the Big Data era: Are we there yet?. Sreenivas Sukumar (ORNL)
- Main problem: big data and machine learning
- Efficient and Tailor-made Anonymization for Relational and Transactional Medical Records. Tsubasa Takahashi (NEC Corporation), Koji Sobataka, Takuya Mori.
- Main problem: efficiently process medical records (at large data volumes)
- Big Data for Positive Social Change: An LMIC Perspective. Linnet Taylor (Institute for Social Science Research, University of Amsterdam), Josh Cowls (Oxford Internet Institute), Ralph Schroeder (Oxford Internet Institute).
- No paper available
- Using Large-scale Open Source Data to Identify Potential Forced Migration. Yifang Wei, Abbie Taylor, Nili Yossinger, Eleanor Swingewood, Dennis Quinn, Susan Martin, Susan McGrath, Jeff Collmann, Sidney Berkowitz, Lisa Singh (Georgetown University).
- Main problem: discover content about forced migration in Iraq from big data
- Hyper Sequence Pattern Mining on ADLs. Xinran Yu (UTSA).
- Main problem: identify individual behavior from data on people's activities of daily living (ADLs); sequence patterns
- Main method: graphical model
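Summary: The theme of this KDD conference is Data Science for Social Good. Rather than deep theoretical research, most papers focus on social applications; healthcare applications, for example, are a very popular topic. The papers I can draw on in my own research, or that are worth reading again, are mainly these four: "Tracking Climate Change Opinions from Twitter Data", "Data Science and the Policy Completion Problem", "An Approach to Analyze Web Privacy Policy Documents", and "Real-time Topic Models for Crisis Counseling".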