End-to-End Machine Learning Project
Main steps:
- Frame the problem and look at the big picture
- Get the data
- Explore the data to gain insight
- Prepare the data to better expose the underlying data patterns to Machine Learning algorithms
- Explore many different models and short-list the best ones
- Fine-tune your models and combine them into a great solution
- Present your solution
- Launch, monitor, and maintain your system
1. Frame the Problem and Look at the Big Picture
- Define the objective in business terms
- How will your solution be used?
- What are the current solutions/workarounds (if any)?
- How should you frame this problem (supervised/unsupervised, online/offline, etc.)?
- How should performance be measured?
- Is the performance measure aligned with the business objective?
- What would be the minimum performance needed to reach the business objective?
- What are comparable problems? Can you reuse experience or tools?
- Is human expertise available?
- How would you solve the problem manually?
- List the assumptions you (or others) have made so far
- Verify assumptions if possible
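The questions above about measuring performance and the minimum acceptable level can be made concrete with a small sketch. It assumes a regression problem and uses root mean squared error (RMSE) as the measure, which is one common choice and not the only one. The names `rmse` and `meets_minimum`, the threshold, and the sample values are all hypothetical and only illustrate the idea.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one sample")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def meets_minimum(score, max_acceptable_error):
    """True if the error is within the limit set by the business objective."""
    return score <= max_acceptable_error


# Made-up labels and predictions, for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]
score = rmse(y_true, y_pred)

# The threshold is an assumption; in practice it comes from the business objective.
print(f"RMSE = {score:.3f}, acceptable = {meets_minimum(score, 1.5)}")
```

Writing the threshold down as an explicit number early on makes it easier to check later whether the measure is still aligned with the business objective.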
2. Get the Data
Note: automate as much as possible so you can easily get fresh data.
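One way to automate this step is to wrap the download in a small function that can be rerun at any time. The sketch below uses only the Python standard library. The function name `fetch_data`, its parameters, and the caching behavior are assumptions made for illustration, not a prescribed method.

```python
import urllib.request
from pathlib import Path


def fetch_data(url, dest_dir, filename, force=False):
    """Download `url` to `dest_dir/filename` and return the local path.

    The download is skipped if the file already exists, unless force=True,
    so the same call can be used both for the first fetch and for refreshes.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / filename
    if force or not target.exists():
        urllib.request.urlretrieve(url, str(target))
    return target
```

Calling `fetch_data(url, "datasets/raw", "data.csv", force=True)` with a real source URL would refresh the local copy. The returned path can also be used to see how much disk space the file takes, for example with `target.stat().st_size`.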
- List the data you need and how much you need
- Find and document where you can get that data
- Check how much space it will take
- Check legal obligations, and get authorizati