Summaries: Lesson 4

  • New and Exciting Content
  • Why Hugging Face Transformers
    • Will we fine-tune a pretrained NLP model with HF rather than the fastai library in this lecture?
    • Why use Transformers rather than the fastai library?
    • Is Jeremy in the process of integrating Transformers into the fastai library?
    • Does Transformers have the same layered architecture as fastai? Is it high-level enough?
    • Why is it a good thing to use a reasonably high-level library (though not as high-level as fastai)?
  • Understand Fine-tuning
    • Do we have the foundations to understand the details of fine-tuning now?
    • How can we understand a pretrained model in terms of parameter confidence? 03:51
    • Does fine-tuning focus on updating the parameters that are not yet confident?
  • ULMFiT: the first fine-tuned NLP model
    • Where was this model first developed and taught?
    • Who wrote the paper?
    • What’s its impact?
  • ULMFiT step 1: a language model from scratch
    • What is the first language model in step one?
    • What’s the model trying to predict? What’s the dataset?
    • Why is this task so difficult? 06:10
    • How much knowledge does the model have to understand in order to predict?
    • How well can this first model predict in step one?
  • Step 2: fine-tune the first model on IMDB
    • How did Jeremy build the second language model?
    • What did the second model start from? What was the dataset for the second model?
    • What was the second model good at predicting?
  • Step 3: turn a language model into a classifier
  • Labels of language models
    • What are the labels for the datasets of the first two models?
  • Transformer models vs ULMFiT
    • When did Transformers first appear?
    • What are Transformer models built to take advantage of?
    • What are Transformers not trying to predict? (reasons in Part 2)
    • How do Transformers modify their dataset, and what do they predict? 09:41
    • Do ULMFiT and Transformers really differ much in what they predict?
    • How different are the three steps between ULMFiT and Transformers?
  • What a model knows
    • What can lower and higher layers of parameters/weights learn? 11:08
    • What do we do to those layers of weights for transfer learning? 13:20
    • Zeiler and Fergus paper
  • NLP for beginners via a Kaggle competition
    • Using a Kaggle competition to introduce NLP to beginners, isn't that amazing!
    • Why should we take Kaggle competitions more seriously? 15:06
    • What real-world tasks can NLP classification do? 15:57
  • Examine the competition dataset
    • What is inside the competition dataset?
    • How classification-like does the dataset look?
    • What do we predict about ‘anchor’ and ‘target’?
    • What value to predict?
    • Why is it not really a straightforward classification problem?
    • What is the use of ‘context’?
  • Model Strategy
    • How to modify the dataset in order to turn a similarity problem into a classification problem?
    • Should we always try to solve a problem by turning it into a problem we are familiar with?
  • Get notebook ready
    • When and how to use a GPU on Kaggle?
    • Why does Jeremy recommend Paperspace over Kaggle as your workstation?
    • How easy has Jeremy made it to download a Kaggle dataset and work on Paperspace or locally?
    • How to run both Python and bash in the same Jupyter cell? (see the sketch below)
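
A minimal sketch of mixing Python and bash in one Jupyter cell; the competition slug and the zip handling here are assumptions for illustration, not the exact notebook code:

```python
# In Jupyter, a leading `!` runs the rest of the line in the shell, and
# `{expr}` interpolates a Python expression into that shell command.
comp = 'us-patent-phrase-to-phrase-matching'     # competition slug (assumed here)

!kaggle competitions download -c {comp}          # bash line using a Python variable

import zipfile
zipfile.ZipFile(f'{comp}.zip').extractall(comp)  # back to plain Python
```
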
  • Get raw dataset into documents
    • How to check what is inside the dataset folder?
    • Why is it important to read the competition's data description, which is often overlooked?
    • How to read a CSV file with pandas? 24:30
    • What are the four key libraries for data science in Python? 24:46
    • What is the other book besides fastbook recommended by Jeremy? 25:36
    • Why should you read it too?
    • How to access and show the dataset in dataframe? 26:39
    • How to describe the dataset? What does it tell us in general? 27:10
    • What did the number of unique data samples mean to Jeremy at first? 27:57
    • How to create a single string based on the model strategy? 28:26 (see the sketch below)
    • How to refer to a dataframe column when reading and writing column data?
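
A minimal sketch of this step, assuming the competition's train.csv and its anchor/target/context columns; the exact separator strings in the combined document are an assumption:

```python
import pandas as pd

df = pd.read_csv('train.csv')       # path is an assumption
df.describe(include='object')       # count / unique / top / freq for the string columns

# One document per row, following the model strategy above. Reading a column works
# as df.colname or df['colname']; writing a new column uses df['colname'] = ...
df['input'] = 'TEXT1: ' + df.context + '; TEXT2: ' + df.target + '; ANC1: ' + df.anchor
df.input.head()
```
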
  • Tokenization: Intro
    • How to turn strings/documents into numbers for a neural net?
    • Do we split the string into words first?
    • What's the problem with splitting Chinese text into words?
    • What is a vocabulary compared with the split-out words?
    • What do we do with the vocabulary?
    • Why do we want the vocabulary to be concise, not too big?
    • What do people nowadays prefer over whole words as vocabulary entries?
  • Subword tokenization with Transformers
    • How to turn our dataframe into a Hugging Face Dataset?
    • What does HF Dataset look like?
    • What is tokenization? What does it do?
    • Why should we choose a pretrained model before tokenization?
    • Why must we use the model’s vocab instead of making our own?
    • How similar is HF model hub to TIMM? 33:10
    • What is Jeremy's advice on how to use the HF model hub?
    • Are there some models that are generally good for most practical problems? 34:17
    • When did NLP models start to be actually very useful? 34:35
    • Why don't we know much about the models that are potentially good for most things?
    • Why should we choose a small model to start with?
    • How to get the tokens, vocabs and related info of the pretrained model? 36:04
    • How to tokenize a sentence by the model’s style?
    • After a document is split into a list of tokens, do we turn that list into a list of numbers? (Numericalization) 38:30
    • Can you get a sense of subwords vs words from the tokenization examples?
    • How to tokenize all the documents with parallel computing? 38:50
    • Given that the input column holds the document, what's inside the input_ids column? (see the sketch below)
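
A sketch of the tokenization pipeline, assuming the dataframe built above and a small pretrained model such as microsoft/deberta-v3-small (the model choice follows the lecture; treat the variable names as assumptions):

```python
from datasets import Dataset
from transformers import AutoTokenizer

ds = Dataset.from_pandas(df)                     # Hugging Face Dataset from the dataframe
tokz = AutoTokenizer.from_pretrained('microsoft/deberta-v3-small')

tokz.tokenize('A platypus is an ornithorhynchus anatinus.')  # subword pieces, per the model's own vocab

def tok_func(x): return tokz(x['input'])         # tokenize + numericalize in one call
tok_ds = ds.map(tok_func, batched=True)          # batched=True processes many rows at once
tok_ds[0]['input_ids'][:10]                      # input_ids: the tokens as numbers in the vocab
```
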
  • Special treatment to build input?
    • Do we need to follow some special treatment when building a document or an input from the dataset?
    • What about when the document is very long?
  • Start ULMFiT on large documents
    • What is ULMFiT best at?
    • Why can ULMFiT work on large documents quickly and without much GPU memory?
    • How large is large for a document?
  • Some obscure documentation in the Transformers library
  • The most important idea in ML
    • Is it the idea of having separate training, testing, validation datasets?
  • Underfitting vs Overfitting
    • How to create a function that plots a polynomial of a given degree? (see the sketch after this list)
    • What are 1st/2nd/3rd degree polynomial?
    • What does Jeremy think of sklearn? When to use it? 47:37
    • What is underfitting? Why is a too-simple model a problem, i.e. systematically biased? 48:12
    • What is overfitting? What does an overfit look like? 48:58
    • What is the cause of overfitting?
    • It is easy to spot underfitting, but how do we distinguish an overfit from the function we actually want?
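
A minimal sketch of such a plotting function; the lecture builds it with sklearn, but np.polyfit is used here as a stand-in for the fit itself, and the quadratic data is illustrative:

```python
import numpy as np, matplotlib.pyplot as plt

def plot_poly(degree):
    "Fit a polynomial of `degree` to noisy quadratic data and plot both."
    x = np.linspace(-2, 2, 20)
    y = 3*x**2 + 2*x + 1 + np.random.normal(0, 2, len(x))  # true curve plus noise
    coeffs = np.polyfit(x, y, degree)                      # least-squares polynomial fit
    xs = np.linspace(-2, 2, 200)
    plt.scatter(x, y)
    plt.plot(xs, np.polyval(coeffs, xs))
    plt.title(f'degree {degree}')
    plt.show()

plot_poly(1)    # underfits: a line is systematically biased against the curve
plot_poly(10)   # overfits: wiggles through the noise between the data points
```
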
  • Validation: avoid overfitting on training set
    • How to get a validation dataset and use it?
    • Why do you need to be careful when using libraries other than fastai?
  • How and Why to create a good validation set
    • Did you know that simply taking a random 20% of the dataset as a validation set is often not good enough?
    • For example, shouldn't you select the validation set so that your model has to predict the future rather than the past? (see the sketch after this list)
    • Why are Kaggle competitions a great, real-world way to appreciate using a validation set to avoid overfitting?
    • How can a validation set help avoid overfitting? Two Kaggle competitions on real-world problems 54:44
    • Watch out when using cross-validation 56:03
    • Why should you be careful about library-ready tools that select the validation set randomly?
    • Validation post by Rachel
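
A hedged sketch of the time-based alternative to a random split; the 'date' column and the 80/20 ratio are assumptions for illustration:

```python
# Validate on the most recent rows so the model must predict the future, not the past.
df = df.sort_values('date')
cutoff = int(len(df) * 0.8)
train_df, valid_df = df.iloc[:cutoff], df.iloc[cutoff:]
```
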
  • Test set: avoid overfitting on validation set
    • What is a test set for?
    • Why need it when we have a validation set?
    • When or how can you overfit on a validation set? or
    • Why is validation set not enough to overcome model overfitting?
    • Why does Kaggle prepare two test sets? or
    • Why does Kaggle think two test sets are enough to filter out overfitting models?
  • Metrics functions vs Loss functions
    • How do we use the validation set to check the model's performance?
    • Will the Kaggle competition choose the metric for you?
    • Should the metric be our loss function?
    • What kind of function should we use as the loss function? (bumpy vs smooth)
    • So always be aware: the loss your model tries to optimize may not be the same function used to rate your model
    • Why is a single metric often not enough, and how can it cause problems?
  • Metrics: you can't get a feel for them from the math
    • What is Pearson correlation (r) and how to interpret it?
    • Which teaches you how r behaves: its mathematical formula or its performance on datasets?
    • Should we plot 1,000 random data points or the entire million data points?
    • How to get correlation coefficient for every variable to every other variable? 1:06:27
    • How to read the correlation coefficient matrix?
    • How to get a single correlation coefficient between two things? (see the sketch after this list)
    • How to tell how good is a correlation coefficient number? 1:07:45
    • What are the things to spot? (tendency line, variation around the line, outliers)
    • How to create transparency on the plot?
    • How can we tell from another example that r is very sensitive to outliers? 1:09:47
    • How much can removing or messing up a few outliers really affect your r score? or
    • Why do you have to be careful with every row of data when dealing with r?
    • Can we know how good r = 0.34 or r = -0.2 is without a plot?
    • Don’t forget to get the data format right for HF
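
A sketch of these correlation tools on synthetic data; the variable names and distributions are illustrative:

```python
import numpy as np, matplotlib.pyplot as plt

def corr(x, y): return np.corrcoef(x, y)[0][1]   # one coefficient between two arrays

x = np.random.normal(size=1000)
y = x + np.random.normal(scale=1.5, size=1000)
print(corr(x, y))              # the number alone hides the spread and the outliers

plt.scatter(x, y, alpha=0.3)   # alpha adds transparency so dense regions stand out
plt.show()

# np.corrcoef(data) on a matrix (or df.corr() on a dataframe) gives the full
# coefficient matrix of every variable against every other variable.
```
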
  • HF train-validation split
    • How to do the random split with HF? (see the sketch below)
    • Will Jeremy talk about proper split in another notebook?
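
The random split itself is one call on the tokenized dataset; 25% and the seed are illustrative values:

```python
dds = tok_ds.train_test_split(0.25, seed=42)  # DatasetDict with 'train' and 'test' splits
```
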
  • Training a model
    • What to use for training a model in HF?
    • What is batch and batch size?
    • How large should a batch size be?
    • How to find a good learning rate? (details in a future lecture)
    • Where to prepare all the training arguments in HF library?
    • Which task type do we use when picking the model class?
    • How to create the learner/trainer once we have the model? (see the sketch below)
    • How to train?
    • Why is the result on metrics so good right from the first epoch?
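
A hedged sketch of the HF training setup, assuming the tokenizer, split datasets, and model name from earlier; the hyperparameters (batch size, learning rate, epochs) are illustrative, and the Pearson metric function is a plausible reconstruction rather than the exact notebook code:

```python
import numpy as np
from transformers import TrainingArguments, Trainer, AutoModelForSequenceClassification

def corr_d(eval_pred):
    "Report Pearson r between predictions and labels on the validation set."
    preds, labels = eval_pred
    return {'pearson': np.corrcoef(preds.flatten(), labels)[0][1]}

bs, epochs, lr = 128, 4, 8e-5
args = TrainingArguments('outputs', learning_rate=lr, warmup_ratio=0.1,
    lr_scheduler_type='cosine', fp16=True, evaluation_strategy='epoch',
    per_device_train_batch_size=bs, per_device_eval_batch_size=bs*2,
    num_train_epochs=epochs, weight_decay=0.01, report_to='none')

# num_labels=1 gives a single-score (regression-style) head for the similarity value.
model = AutoModelForSequenceClassification.from_pretrained(
    'microsoft/deberta-v3-small', num_labels=1)
trainer = Trainer(model, args, train_dataset=dds['train'], eval_dataset=dds['test'],
                  tokenizer=tokz, compute_metrics=corr_d)
trainer.train()
```
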
  • Dealing with outliers
    • Should we get a second analysis for the outliers rather than simply removing them?
    • What outliers really are in the real world?
    • Don't outliers usually tell us a lot of surprisingly useful information, beyond their narrow statistical sense?
    • What is Jeremy’s advice on outliers?
  • Predict and submit
    • How to do prediction with HF? (see the sketch after this list)
    • Should we always check the prediction output as well as the test set input?
    • What is the common problem with the output? (proper solution may be in the next lecture)
    • What is the easy solution?
    • How to submit your answer to Kaggle?
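
A sketch of the predict-and-submit step, assuming a tokenized test set eval_ds (and its raw dataframe eval_df) built the same way as the training data; clipping is the "easy solution" mentioned above:

```python
import numpy as np, pandas as pd

preds = trainer.predict(eval_ds).predictions.astype(float)
preds = np.clip(preds, 0, 1)   # the head can emit values below 0 or above 1; clip them

sub = pd.DataFrame({'id': eval_df.id, 'score': preds.squeeze()})
sub.to_csv('submission.csv', index=False)   # upload this file to Kaggle
```
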
  • Huge opportunities in research and business
  • Misuses of NLP
    • Can NLP chatbots create 99% of online chat that is almost indistinguishable from real humans?
    • Can GPT-3 create even longer and more sophisticated prose that is even more human-like?
    • How can machine-generated public opinion influence public policies or laws?
  • Issues with num_labels in the HF library