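Source: http://www.comap.com/undergraduate/contests/index.html
COMAP, the organizer of the Mathematical Contest in Modeling and the Interdisciplinary Contest in Modeling (MCM/ICM), has released a document titled "MCM Problem C Overview" that explains Problem C of the MCM. Below is a brief summary of its main points, which I hope will help anyone preparing for the 2017 MCM/ICM.
Problem C was added to the MCM in 2016. It is described as a Data Insights problem and focuses on mathematical models associated with data. Compared with earlier MCM problems, models from statistics, pattern classification, and related fields are therefore likely to play a larger role.
Problem C is a real-world problem built around data, so modeling may run into a range of difficulties: a fairly large data set (though not at big-data scale), mixed data types, missing data, and so on. It is not a big data problem, however. Teams do not need specialized computer science knowledge, such as purpose-built data handling algorithms and analysis techniques, and they do not need access to high performance computing platforms.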
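- The data for the problem will be publicly accessible.
- Although this is not a big data problem, the compressed data files may exceed 100 MB, which is larger than the data in past MCM problems. Before choosing Problem C, consider whether your team can handle a data set of this size. As an aside, people often ask what to do when they cannot find data during the contest; yet when last year's Problem C actually supplied data, some said they could not process it.
- Besides the database files, the compressed archive may also contain a data dictionary, a data mapping file, or program code for creating value labels.
- The data files will be provided in multiple formats: SAS, SPSS, STATA, and CSV.
- Software such as Statistica, JMP, SAS, SPSS, Excel, R, or Matlab may be used, but no particular package is required. If you use specialized software or custom code during the contest, make sure you understand, and can clearly communicate, the mathematics and assumptions behind it.
- Only the paper needs to be submitted; the database files do not.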
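To make the points about file size, mixed data types, and missing data concrete, here is a minimal Python sketch of a first pass over a large CSV file. It is only an illustration and is not taken from the COMAP document: the function name `profile_csv`, the list of missing-value markers, and the sample columns are all assumptions, and the real codes should come from the data dictionary shipped with the contest files. The sketch reads one row at a time, so the whole file never has to fit in memory.

```python
import csv
import io
from collections import Counter

# Assumption: these strings mark a missing value. The data dictionary that
# accompanies the real files may define different codes.
MISSING_MARKERS = {"", "NA", "NaN", "NULL", "."}


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile_csv(handle):
    """Stream a CSV file once and summarize each column.

    Returns (row_count, summary), where summary maps each column name to a
    dict holding the number of missing cells and a rough type label.
    """
    reader = csv.DictReader(handle)
    missing = Counter()   # missing cells per column
    present = Counter()   # non-missing cells per column
    numeric = Counter()   # non-missing cells that parse as numbers
    row_count = 0
    for row in reader:  # only one row is held in memory at a time
        row_count += 1
        for column, value in row.items():
            if column is None:
                continue  # surplus fields beyond the header are ignored
            value = (value or "").strip()
            if value in MISSING_MARKERS:
                missing[column] += 1
                continue
            present[column] += 1
            if is_number(value):
                numeric[column] += 1

    summary = {}
    for column in reader.fieldnames or []:
        if present[column] == 0:
            kind = "empty"
        elif numeric[column] == present[column]:
            kind = "numeric"
        elif numeric[column] == 0:
            kind = "text"
        else:
            kind = "mixed"
        summary[column] = {"missing": missing[column], "kind": kind}
    return row_count, summary


if __name__ == "__main__":
    # Tiny made-up sample standing in for a real contest file.
    sample = io.StringIO(
        "id,age,state,income\n"
        "1,34,NY,52000\n"
        "2,,CA,NA\n"
        "3,41,TX,unknown\n"
    )
    rows, summary = profile_csv(sample)
    print(rows)
    for name, info in summary.items():
        print(name, info)
```

For a file on disk, pass an open handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`, where `path` and the encoding are placeholders for whatever the downloaded file actually uses. A summary like this is a quick way to judge, before committing to Problem C, how much cleaning the data will need.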
Readers interested in Problem C can consult the original document.
Appendix: text of the original document:
The 2016 MCM introduces a new modeling challenge – Problem C - that is best described as Data Insights. Problem C is intended to focus on and amplify specific elements of mathematical modeling challenges associated with data. In this sense, techniques stemming from statistics and pattern classification will play a larger role in creating a mathematical model on this problem than in previous contests.
While not a ‘big data’ challenge in the sense of teams needing to develop specialized computer science-based data handling algorithms and analysis techniques or have access to high performance computing platforms, the problem will provide teams with an opportunity to encounter real-world, challenging data that have interesting characteristics. Naturally occurring complicating factors such as data set size (but not big data), blend of data types, breadth of representation in data elements, cross-discipline sources, time series dependencies, censored or missing data, and others could present themselves depending on the specifics of the modeling problem.
MCM Problem C: Data Insights
Teams will be given access to database files that will be made available from a public website.
The database files will be compressed for size but the file size could still be 100mbs or more and teams should take this into consideration prior to choosing Problem C.
Each zipped file may include the database files along with the data dictionary, data mapping file, and program code to create value labels.
The database will be made available in multiple formats SAS, SPSS, STATA and CSV.
Software such as Statistica, JMP, SAS, SPSS, Excel, R, Matlab or other applications may be used to aid in your solution but no one particular piece of software is endorsed or required. If specialized software or custom code is used to support the contest effort, teams should take care to clearly communicate an understanding of the mathematics and assumptions applied via tools and algorithms in the software.
When submitting your final electronic solution you are NOT required to submit back the database file or any data for that matter. The only thing that should be submitted is your electronic (word or PDF) solution.