Practical Data Science on AWS 1 - AutoML | Analyze Datasets and Train ML Models using AutoML

Week 1

1 Specialization overview

2 Working with Data

2.1 Data ingestion and exploration

AWS S3

  • Data lakes are often built on top of object storage, such as AWS S3.

  • With object storage, data is stored and managed as objects. Each object consists of the data itself, any relevant metadata (such as when the object was last modified), and a unique identifier.
  • Object storage is particularly helpful for storing and retrieving growing amounts of data of any type, which makes it a good foundation for data lakes.
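
To make the "data + metadata + unique identifier" idea concrete, here is a minimal in-memory sketch. It is a toy illustration only and not the S3 API: the names `StoredObject` and `ToyObjectStore`, the key, and the sample CSV content are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import uuid


@dataclass
class StoredObject:
    data: bytes        # the payload itself
    metadata: dict     # descriptive attributes, e.g. last-modified time
    object_id: str     # unique identifier assigned by the store


class ToyObjectStore:
    """In-memory stand-in for an object store (hypothetical, not S3)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, content_type="application/octet-stream"):
        # Every write produces an object bundling data, metadata and an ID.
        obj = StoredObject(
            data=data,
            metadata={
                "content_type": content_type,
                "size": len(data),
                "last_modified": datetime.now(timezone.utc),
            },
            object_id=uuid.uuid4().hex,
        )
        self._objects[key] = obj
        return obj

    def get(self, key):
        return self._objects[key]


store = ToyObjectStore()
review = store.put(
    "reviews/2021/part-0001.csv",            # hypothetical key
    b"review_id,sentiment\n1,positive\n",    # made-up sample data
    content_type="text/csv",
)
fetched = store.get("reviews/2021/part-0001.csv")
```

Because the store only maps keys to self-describing objects, it accepts any kind of payload without a schema, which is the property that makes object storage a natural base for a data lake.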

AWS Data Wrangler

AWS Glue

AWS Athena

  • Athena is serverless

  • Athena is based on Presto, an open source distributed SQL engine developed for exactly this use case: running interactive queries against data sources of all sizes.

2.2 Data visualization

Week 2

1 Statistical bias

1.1 Statistical bias

1.2 Statistical bias causes

  • Societal bias: Bias introduced by preconceived notions that exist in society. Data generated by humans can be biased because all of us have unconscious bias.
  • Data drift (data shift): Occurs when the distribution of the data the model sees varies significantly from the distribution of the data that was used to initially train the model.
    • Covariate drift: The distribution of the independent variables or features that make up your dataset can change.
    • Prior probability drift: The distribution of your labels or target variables can change.
    • Concept drift: The relationship between the features and the labels can change.
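
The three kinds of drift can be told apart by which distribution is being compared. The sketch below is one possible way to illustrate this on a tiny made-up dataset; the use of total variation distance, the helper names, and the sample rows are all assumptions for illustration and do not come from the course material.

```python
from collections import Counter


def distribution(values):
    """Empirical distribution of a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {k: c / total for k, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def conditional_label_rate(rows, feature_value, label):
    """Estimate P(label | feature == feature_value) from (feature, label) rows."""
    matching = [y for x, y in rows if x == feature_value]
    return sum(1 for y in matching if y == label) / len(matching)


# Made-up (product_category, sentiment) rows: training data vs. live data.
train = [("shoes", "pos"), ("shoes", "pos"), ("shirts", "neg"), ("shirts", "pos")]
live = [("shoes", "neg"), ("shoes", "neg"), ("shoes", "pos"), ("shirts", "neg")]

# Covariate drift: compare the feature distributions.
covariate_shift = total_variation(
    distribution([x for x, _ in train]), distribution([x for x, _ in live])
)

# Prior probability drift: compare the label distributions.
prior_shift = total_variation(
    distribution([y for _, y in train]), distribution([y for _, y in live])
)

# Concept drift: compare the feature-to-label relationship for one feature value.
train_rate = conditional_label_rate(train, "shoes", "pos")
live_rate = conditional_label_rate(live, "shoes", "pos")
```

In practice all three shifts often occur together, as they do in this toy data, so a nonzero value for one comparison does not rule out the others.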

1.3 Measuring statistical bias

1.4 Detecting statistical bias

1.5 Detect statistical bias with Amazon SageMaker Clarify

1.6 Approaches to statistical bias detection

  • SageMaker Data Wrangler:
    • Connects to multiple data sources and lets you explore data in a more visual format.
    • Uses only a subset of your data to detect bias.
  • SageMaker Clarify:
    • Scales to large volumes of data.

1.7 Feature importance: SHAP

Week 3

1 Automated Machine Learning

1.1 Automated Machine Learning (AutoML)

1.2 AutoML Workflow

Ingest & Analyze

1.3 Amazon SageMaker Autopilot

1.4 Running experiments with Amazon SageMaker Autopilot

1.5 Amazon SageMaker Autopilot: evaluating output

1.6 Model Hosting

Week 4

1 Built-in algorithms

1.1 Built-in algorithms

1.2 Use cases and algorithms

1.3 Text analysis

  • One challenge with Word2Vec: out-of-vocabulary (OOV) issues
    • Its vocabulary contains only three million words. The vocabulary is the set of words that the model learned in the training phase. Out-of-vocabulary words are words that were not present in the text dataset the model was initially trained on. If a word is not found in the vocabulary, the model architecture assigns a zero to that word, which effectively discards it.
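
The out-of-vocabulary behaviour described above can be sketched with a plain dictionary lookup. This is a toy illustration under stated assumptions: the three-dimensional vectors, the two-word vocabulary, and the `embed` helper are made up and are not the real Word2Vec model or any library's API.

```python
def embed(tokens, vocabulary, dim):
    """Look up each token; unknown tokens get an all-zero vector."""
    zero = [0.0] * dim
    return [vocabulary.get(token, zero) for token in tokens]


# Hypothetical tiny vocabulary with made-up embedding values.
vocabulary = {
    "great": [0.9, 0.1, 0.3],
    "product": [0.2, 0.7, 0.5],
}

tokens = ["great", "prodct"]  # "prodct" is a typo, so it is out of vocabulary
vectors = embed(tokens, vocabulary, dim=3)

# Tokens that carried no information into the model.
oov_tokens = [t for t in tokens if t not in vocabulary]
```

The misspelled token maps to the same zero vector as any other unknown word, so whatever meaning it carried is lost to the downstream classifier.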

1.4 Train a text classifier

1.5 Deploy the text classifier

HW Answer
