Session Format
I’ll teach core concepts in 5-10 minute chunks using group activities, discussion and lecture. We’ll then immediately put the new concepts into practice through coding together as a class. There will also be independent exercises available if students progress quickly during the session or to complete later after the session is over.
OutlineIntroduction to Natural Language Processing (NLP) with Python Representing Text in Vector Form:
- Represent text using term frequency matrices for Machine Learning applications.
- Explore the varieties and available parameters for term frequency matrices and discuss their strengths, weaknesses and use cases.
- Briefly touch on other ways of representing text that are used in Machine Learning
Modeling:
- Use vectorized text for classification (structured learning) when classes are known.
- Use clustering (unstructured learning) to find patterns in a corpus when classes are unknown (unstructured learning).
Evaluation:
- Evaluate results of our classification and clustering models by using accuracy metrics, confusion matrices and visualizations.