Overview of text processing tasks
- Finding Parts of Text --> Splitting/Tokenization
- Finding Sentences --> Sentence Boundary Disambiguation (SBD)
- Finding People and Things --> Named Entity Recognition (NER)
- Detecting Parts of Speech --> POS Tagging
- Classification (with labels) / Clustering (without labels)
- Extracting Relationships --> Relationship Extraction
- Combined Approaches
Tokenization --> SBD --> NER --> POS Tagging --> Classification/Clustering --> Relationship Extraction
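The first two stages of the pipeline above can be sketched with nothing more than regular expressions. This is a minimal illustration, not a production approach: the function names `split_sentences` and `tokenize` and the sample text are made up for this example, and the rules are deliberately naive.

```python
import re

def split_sentences(text):
    # Naive SBD: break after '.', '!' or '?' when followed by whitespace
    # and an uppercase letter. Real SBD must also handle abbreviations.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
    return [p.strip() for p in parts if p.strip()]

def tokenize(sentence):
    # A token is either a word (optionally with an internal apostrophe,
    # e.g. "Don't") or a single punctuation character.
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[^\sA-Za-z0-9]", sentence)

text = "Tokenization comes first. Then we find sentences! Is tagging next?"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]

for sentence, toks in zip(sentences, tokens):
    print(sentence, "->", toks)
```

The "disambiguation" in SBD matters because a period does not always end a sentence: this sketch would wrongly split "Dr. Smith arrived." after "Dr.", which is the kind of case a trained model or an abbreviation list is meant to handle.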
Understanding NLP models
The basic steps include:
- Identifying the task
- Selecting a model
  - Understanding the problem domain and the required quality of results lets us select an appropriate model
- Building and training the model
  - Training a model is the process of executing an algorithm against a set of data, formulating the model, and then verifying the model
  - A set of labeled samples (a dataset) used for this purpose is called a corpus
- Verifying the model
  - Split the corpus into a training set and a test set
  - Often, only part of a corpus is used for training, while the other part is used for verification
- Using the model
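The train/verify/use steps above can be sketched as follows. Everything here is an assumption made for illustration: the toy corpus is invented, the 75/25 split is arbitrary, and the "model" is just a table of per-word label counts rather than any particular library's classifier.

```python
import random
from collections import Counter, defaultdict

# Hypothetical labeled corpus: (text, label) pairs invented for this sketch.
corpus = [
    ("great movie loved it", "pos"),
    ("wonderful acting great plot", "pos"),
    ("loved the wonderful cast", "pos"),
    ("great fun loved it", "pos"),
    ("terrible movie hated it", "neg"),
    ("boring plot terrible acting", "neg"),
    ("hated the boring cast", "neg"),
    ("terrible waste hated it", "neg"),
]

def split_corpus(samples, train_fraction=0.75, seed=0):
    # Shuffle a copy with a fixed seed, then cut it into training and test parts.
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def train(samples):
    # The "model" records how often each word appears under each label.
    counts = defaultdict(Counter)
    for text, label in samples:
        for word in text.split():
            counts[word][label] += 1
    return counts

def predict(model, text, default="pos"):
    # Each known word votes for the labels it was seen with during training.
    votes = Counter()
    for word in text.split():
        votes.update(model.get(word, {}))
    return votes.most_common(1)[0][0] if votes else default

def accuracy(model, samples):
    # Verification: fraction of held-out samples labeled correctly.
    correct = sum(predict(model, text) == label for text, label in samples)
    return correct / len(samples)

train_set, test_set = split_corpus(corpus)
model = train(train_set)
print(f"accuracy on held-out data: {accuracy(model, test_set):.2f}")
print(predict(model, "great plot"))  # using the model on new text
```

Keeping the test set out of training is the point of the split: accuracy measured on samples the model has already seen would overstate how well it handles new text.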
Preparing Data
This includes data for training purposes and the data that needs to be processed.