Content
Crowdsourcing
Outsourcing some tasks to a crowd -> Crowdsourcing
Improve the quality, timeliness and breadth of data
Key questions:
- What computational problems can/should be solved?
- Data augmentation, data processing
- What are the programming paradigms/platforms?
- A programming paradigm is the classification, style or way of programming. It is an approach to solving problems using programming languages.
- How do we guarantee that the solution is accurate, efficient and economical?
- Quality, cost and latency
- How do we motivate participation and leverage the unique expertise and interests of workers?
- How do we leverage the joint efforts of both automated and human computers as workers?
3 central aspects of crowdsourcing
- What
- What tasks can be performed by machines
- Decompose macro tasks into micro tasks
- Who
- Expertise of workers (how to model workers' expertise)
- Manage cultural aspects and language barriers
- How
- How to design and execute tasks
- Aggregate noisy and complex output (defines how intelligent the aggregation techniques should be, e.g. hierarchical, cluster-based aggregation)
Overall process
Process
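- Assign workers in parallel
- Operations & Control: several workers process the task simultaneously on parallel lines
- Cost vs latency: high cost, low latency
- Assign workers sequentially
- Operations & Control: workers process the task one after another
- Cost vs latency: high latency, because each worker has to wait for the previous worker's result; however, it can save cost: if three workers are planned and the first two agree on the result, the third HIT (Human Intelligence Task) does not need to be issued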
- Operations & Control:
- Repetition: repeat the tasks until you are satisfied with the result
- Selection: retrieve tasks using selection mechanisms
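The parallel and sequential strategies above differ mainly in how many HITs are paid for. The following is a minimal sketch, assuming a single question with categorical answers and a plan of three workers; `ask_worker` is a hypothetical stand-in for posting one HIT and collecting its answer, not a real platform API. The sequential version stops as soon as a strict majority of the planned workers agrees, so its latency is the number of HITs issued (one round each), whereas the parallel version always pays for every HIT but finishes in one round.

```python
from collections import Counter
from typing import Callable, Tuple

# Hypothetical callback: given a worker index, post one HIT and return its answer.
AskWorker = Callable[[int], str]


def parallel_labels(ask_worker: AskWorker, n_workers: int = 3) -> Tuple[str, int]:
    """Issue all HITs at once: one round of latency, cost is always n_workers HITs."""
    answers = [ask_worker(i) for i in range(n_workers)]
    label, _ = Counter(answers).most_common(1)[0]
    return label, n_workers


def sequential_labels(ask_worker: AskWorker, max_workers: int = 3) -> Tuple[str, int]:
    """Issue HITs one after another; stop once a strict majority of max_workers agrees."""
    needed = max_workers // 2 + 1  # e.g. 2 out of 3 planned workers
    counts: Counter = Counter()
    for i in range(max_workers):
        counts[ask_worker(i)] += 1
        label, votes = counts.most_common(1)[0]
        if votes >= needed:
            return label, i + 1  # remaining HITs are never issued
    # No strict majority reached (possible with more than two answer options):
    # fall back to the most frequent answer.
    label, _ = counts.most_common(1)[0]
    return label, max_workers


answers = ["cat", "cat", "dog"]
print(sequential_labels(lambda i: answers[i]))  # ('cat', 2)
print(parallel_labels(lambda i: answers[i]))    # ('cat', 3)
```

When the first two workers agree, the sequential strategy saves one HIT; when they disagree, it pays the same as the parallel strategy but needed three rounds instead of one.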
Aggregating output
Challenges
- Outputs are noisy (lack of expertise)
- Humans are not always reliable (cheating)
- Cultural context may bias the answers
Goal
- Automatic procedure to merge HIT results
Assumptions
- There exists a “true” answer
- Redundancy helps
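Under the two assumptions above (a true answer exists and redundancy helps), the simplest automatic procedure for merging HIT results is majority voting per item. The following is a minimal sketch; the `(worker_id, item_id, answer)` triple format and the function names are assumptions for illustration. Majority voting treats every worker as equally reliable; the agreement score only flags workers who often disagree with the consensus, which is one simple signal of low expertise or cheating rather than proof of either.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical record format: (worker_id, item_id, answer).
HitResult = Tuple[str, str, str]


def majority_vote(hit_results: List[HitResult]) -> Dict[str, str]:
    """Merge redundant answers into one consensus answer per item."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for _worker, item, answer in hit_results:
        votes[item][answer] += 1
    # Ties go to the answer that was seen first for that item.
    return {item: counter.most_common(1)[0][0] for item, counter in votes.items()}


def worker_agreement(hit_results: List[HitResult], consensus: Dict[str, str]) -> Dict[str, float]:
    """Fraction of each worker's answers that match the consensus answer."""
    hits: Counter = Counter()
    agree: Counter = Counter()
    for worker, item, answer in hit_results:
        hits[worker] += 1
        agree[worker] += int(answer == consensus[item])
    return {worker: agree[worker] / hits[worker] for worker in hits}


results = [
    ("w1", "q1", "yes"), ("w2", "q1", "yes"), ("w3", "q1", "no"),
    ("w1", "q2", "no"), ("w2", "q2", "no"), ("w3", "q2", "yes"),
]
consensus = majority_vote(results)
print(consensus)                             # {'q1': 'yes', 'q2': 'no'}
print(worker_agreement(results, consensus))  # {'w1': 1.0, 'w2': 1.0, 'w3': 0.0}
```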
Latent Class models