Content
What is this course about?
Language models serve as the cornerstone of modern natural language processing (NLP) applications and open up a new paradigm in which a single general-purpose system addresses a range of downstream tasks. As the fields of artificial intelligence (AI), machine learning (ML), and NLP continue to grow, a deep understanding of language models, and of generative AI in general, becomes essential for scientists and engineers alike. This course is designed to give students a comprehensive understanding of language models by walking them through the entire process of how a language model is developed and optimized. Unlike most courses, this one involves a lot of coding: you will implement the concepts as you learn them, which includes developing an entire language model from scratch and optimizing it in various respects.
Prerequisites
- Proficiency in Python
The majority of class assignments will be in Python, so students should be comfortable with basic to intermediate Python code. You will write roughly an order of magnitude more code than in other classes. If you have not started with Python yet, this should be enough to get started with; you will learn more along the way.
- Experience with deep learning
A significant part of the course involves building neural networks at the scale of millions to billions of parameters and learning how to run them quickly and efficiently on GPUs across multiple machines. We expect students to have some understanding of basic neural networks. You may refer to this resource.
- College Calculus, Linear Algebra (e.g. 12th Grade, MA11001 (1st Semester) or equivalent)
You should be comfortable with matrix/vector notation and operations.
- Basic Probability and Statistics (e.g. 12th Grade, MA21001 (3rd Semester) or equivalent)
You should know the basics of probabilities, Gaussian distributions, mean, standard deviation, etc.
Note that this is not a graded class. Its purpose is to introduce students to the mechanics and system-level optimizations of large generative models, topics that are not available as distinct courses on the internet.