Course Syllabus
NLP 243 – Machine Learning for Natural Language Processing
Fall 2020
Course Information
Lecture times: Mon & Wed, 5:20pm – 6:50pm
Virtual Classroom
Instructor Information
Dr. Dilek Hakkani-Tür
email: dhakkani @ ucsc [dot] edu
Office Hours: I’ll stay on the class meeting channel for 30 minutes after each class to answer questions. You can also email me to schedule an appointment at another time.
Zoom Link for classes: https://zoom.us/j/95852116173
Teaching Assistant
Rishi Rajasekaran
Email: rrajasek @ ucsc [dot] edu
Office hours
- Time: Wednesdays, 2-3:30pm
- Zoom Link: https://ucsc.zoom.us/j/96938066796?pwd=dnVXZjVZM25iNlN4TVFKbVk5RS9PQT09
Sections
- You must attend the section weekly. We will take attendance!
- Time: Mondays, 2-4PM
- Zoom Link: https://ucsc.zoom.us/j/96434245012?pwd=cHR3NUZkZlNRU3dpZmRnSVlJc0pJQT09
- All Section Slides are available under Files > Section Slides
- The section recordings are available under YuJa > All Courses > NLP-243-01
- Link to Python self-test: https://colab.research.google.com/drive/1yiYE9LdUjkriAAG7krmecqA-7BJRZwNj?usp=sharing
- Section 2 - Python Basics: https://colab.research.google.com/drive/1LkBmNPi8ZXtmSr-iW2Uypa3sbhNq4gk6?usp=sharing
- Section 3 - Basics of scikit-learn: https://colab.research.google.com/drive/1Ldma3WPhLexR6ttqMaPYO4auUwJ6JDh-?usp=sharing
- Section 4 - PyTorch basics: https://colab.research.google.com/drive/1vgTuMLpkK7DTuKxxfyUZhZhW5nBu1oaD?usp=sharing
- Section 5 - PyTorch Multilayer Perceptron and Convolutional Neural Networks: https://colab.research.google.com/drive/1UCpug78_XvieSJhSp0v4fAE7E6sEBr0y?usp=sharing
- Section 6 - PyTorch RNNs: https://colab.research.google.com/drive/1fVXNmNi_g1o77-oU4QaX4XeRof8myIJj?usp=sharing
- Section 7 - Sequence Tagging using RNNs: https://colab.research.google.com/drive/1hy0-T3oK6-nmZN9AUVyHNIdQv8DLRSba?usp=sharing
Course Description
Introduction to machine learning models and algorithms for Natural Language Processing. The course introduces learning models from the fields of statistical decision theory, artificial intelligence, and deep learning. Topics include standard neural network learning methods such as feed-forward neural networks, recurrent neural networks, and convolutional neural networks, with applications to natural language processing problems such as utterance classification and sequence tagging. Requirements include three programming assignments and a final project.
Textbooks:
Dive into Deep Learning. Aston Zhang, Zack C. Lipton, Mu Li, Alex Smola. http://d2l.ai
Natural Language Processing with PyTorch. Delip Rao and Brian McMahan. https://proquest-safaribooksonline-com.oca.ucsc.edu/9781491978221
Foundations of Statistical Natural Language Processing. Chris Manning and Hinrich Schütze. https://nlp.stanford.edu/fsnlp/
Speech and Language Processing. Daniel Jurafsky and James Martin. https://web.stanford.edu/~jurafsky/slp3/
I will also provide pointers to other reading when needed.
Canvas Link: https://canvas.ucsc.edu/courses/37453
Piazza Link: https://piazza.com/ucsc/fall2020/nlp24301
(Access code: ucsc-nlp-243)
Grading:
- Attendance: 5%
- Homeworks and Final Project: 55%
  - HW1: 8%
  - HW2: 12%
  - HW3: 15%
  - Final Project: 20%
- Midterm: 20%
- Final Exam: 20%
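To make the weighting concrete, here is a minimal sketch of how the listed percentages could combine into one course score. It assumes every component is scored on a 0-100 scale and combined linearly, and it treats the 20% final project (part of the 55% homework group) and the 20% final exam as separate components. The dictionary keys and the `course_grade` function are hypothetical names for illustration; the syllabus does not specify how scores map to letter grades.

```python
# Hypothetical weights mirroring the grading breakdown (assumed linear combination).
WEIGHTS = {
    "attendance": 0.05,
    "hw1": 0.08,
    "hw2": 0.12,
    "hw3": 0.15,
    "final_project": 0.20,
    "midterm": 0.20,
    "final_exam": 0.20,
}


def course_grade(scores: dict[str, float]) -> float:
    """Return the weighted course score, assuming each component is on a 0-100 scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Sanity check: 5% + (8% + 12% + 15% + 20%) + 20% + 20% = 100%.
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
```

For example, scores of 100 (attendance), 90, 80, 85 (HW1 to HW3), 95 (final project), 75 (midterm) and 80 (final exam) would combine to 84.55 under these assumptions.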
Homework Delivery:
We will organize an in-class competition with a leaderboard for each homework (e.g., on Kaggle or CodaLab). Every student should create a CodaLab account to participate. In each CodaLab in-class competition, students are given the training data and labels; they train the requested models and submit their predictions on the test data to CodaLab, which ranks the submissions according to the evaluation metric (e.g., accuracy or F1 score). Students must also submit a report (PDF only) and a zip file with their code through Canvas assignments. Grading is based on the leaderboard ranking, the report, and the code: 25% of the homework grade is based on leaderboard performance, 50% on the report accompanying the homework, and 25% on the code.
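The 25% / 50% / 25% split for each homework is a simple weighted sum. The sketch below assumes all three components are already expressed on a 0-100 scale; how a leaderboard rank is converted into such a number is not specified in this syllabus, so that conversion is left out. The `homework_score` function is a hypothetical name for illustration.

```python
def homework_score(leaderboard: float, report: float, code: float) -> float:
    """Combine the three homework components with the 25% / 50% / 25% split.

    Assumes each component is already a number in [0, 100]; the mapping from
    leaderboard rank to a score is not covered here.
    """
    for value in (leaderboard, report, code):
        if not 0.0 <= value <= 100.0:
            raise ValueError("component scores are assumed to lie in [0, 100]")
    return 0.25 * leaderboard + 0.50 * report + 0.25 * code
```

For instance, `homework_score(80, 90, 100)` gives 20 + 45 + 25 = 90, which shows that the report carries as much weight as the leaderboard and the code combined.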
Schedule
The schedule for readings and homework assignments is shown in the syllabus below.
- THIS SCHEDULE IS SUBJECT TO CHANGE
- Check Canvas for specific due dates and times of all assignments.
SYLLABUS
Week 1:
Oct 5th:
Topics:
- Class Logistics
- What is natural language processing?
- What is machine learning?
- What is deep learning?
Readings:
Oct 7th:
Topics:
- Preliminaries:
  - Linear Algebra
  - PyTorch Basics
  - Probability
    - Basics
    - Conditional Probability and Independence
  - Calculus – Derivatives and Differentiation
Readings:
Week 2:
Oct 12th:
- ML and NLP Basics
- Review of NLP toolkits (NLTK, spaCy, scikit-learn for homework)
- Background on commonly used ML approaches for NLP
- Naïve Bayes
Readings:
Oct 14th:
- Background on commonly used ML approaches for NLP (cont.)
- Decision Trees
- Support Vector Machines
- Getting ready for homeworks: knowledge graphs and querying knowledge graphs
Readings:
Week 3:
Oct 19th:
- Review of Possible Topics for Final Projects
- K-nearest neighbors
- Linear Regression
- Homework 1 assigned
Readings:
Oct 21st:
- Linear Regression (cont.)
- Gradient Descent (and versions)
- Practical Tips
Readings:
Week 4:
Oct 26th:
- Final Project Teaming up event
Oct 28th:
- Homework 1 due date
- Sign up in teams of three for the final project.
- Activation and Loss Functions Using PyTorch
- Multi-layer perceptron
Readings:
Week 5:
Nov 2nd:
- Homework 2 assigned
- Multi-Layer Perceptron (cont.)
- Computation Graphs
- Back-propagation
- Overfitting Revisited
- Weight Decay
- Dropout
- Distributional Similarity
- Words, vectors and co-occurrence matrices
- Word Embeddings
- What unexpected things might we learn with word embeddings?
Readings:
- Continuing Chapter 4 of Dive into Deep Learning
- A good review paper: Yoav Goldberg. A Primer on Neural Network Models for Natural Language Processing
- Chapter 5 of NLP with PyTorch
Other suggested reading:
- Chapter 6 of Speech and Language Processing by D. Jurafsky and J. Martin
Nov 4th: Final Project Proposal Presentations (also due date for proposal write-ups)
Week 6:
Nov 9th: Midterm during class time
Nov 11th: Veterans Day holiday, no class.
Week 7:
Nov 16th:
- GloVe Embeddings
- Playing with word embeddings
- Convolutional Neural Networks
- Text Classification Using Convolutional Neural Networks
- Convolutional Neural Networks (cont.)
- Text Classification with CNNs
- CNNs in PyTorch
Readings:
Nov 18th:
- Homework 2 due date
- Homework 3 assigned
- Language Modeling
- Recurrent Neural Networks
- Sequence Classification Tasks
- Homework 3 introduction
Readings:
Week 8:
Nov 23rd:
- Quick review of RNNs from previous lecture
- Case Study: Natural Language Understanding in Conversational Systems
- Homework 3 discussion
- Implementing RNNs
Nov 25th:
- Implementing RNNs (continuing from previous lecture)
- Long Short-Term Memory (LSTM)
- Implementing LSTMs
- Gated Recurrent Units (GRU)
Readings:
Week 9:
Nov 30th:
- Discussion of midterm grades and HW2 questions
- Encoder-Decoder Architecture
- Sequence-to-sequence (S2S) models
- Beam Search
- Attention
Readings:
- Continuing Chapter 9 of Dive Into Deep Learning
- Chapter 8 of NLP with PyTorch
- Bahdanau et al., Neural Machine Translation by Jointly Learning to Align and Translate. ICLR, 2015.
Dec 2nd:
- Homework 3 due date
- Applications for RNNs and Attention: Task Specific Variations of Network Topologies
- SLU in Dialogue Systems
- Seq2seq Models with Attention
- Representations of Conversation Context
- Scaling to new domains
- Scaling to new languages
- S2S models for Response Generation in Social Dialogue Systems
- Hierarchical RNNs for Conversation Context
- Memory Networks for Knowledge Integration
- Pointer-Generator Networks
- Generating Diverse Responses
Readings:
- Links for papers covered are in the slide deck
Week 10:
Dec 7th: Final project presentations.
Project | Members |
---|---|
Topical Chat Bot | Austin King, Devavrat Joshi, Morgan Eidam |
Emotion Detection | Angela Ramirez, Christopher-Garcia Cordova, Mamon Alsahily |
Visual Question Answering | Raghav Chaudhary, Sam Shamsan, Adam Fidler |
Sentiment Analysis | Tianxiao Zhang, Youyou Zhao, Phill Lee |
Dec 9th: Final project presentations.
Project | Members |
---|---|
Generating Creative Content for Dialogue | Kevin Bowden, Eduardo Zamora, Jeshwanth Bheemanpally |
Fake News Detection | Alex Lue, Nilay Patel, Kaleen Shreshta |
Question and Answering Machine | Zachary Sweet, John Lara, Kit Lao |
Financial News Sentiment Analysis | Cecilia Li, Liren Wu, David Li |
Dec 13th: Final Project reports due.
Finals week:
Dec 14-18: Final exam, date TBD.