After successful completion of the course, students are able to extract structure from natural language data by applying standard methods for text segmentation, word and sequence tagging, and syntactic parsing. They have a high-level overview of the most important rule-based and learning-based approaches to each task and of the standard methods for evaluating them. Students gain a fundamental understanding of artificial neural networks and of methods for training them, with special emphasis on architectures for processing sequential data, allowing them to solve a variety of NLP tasks with deep learning. An overview of information extraction (IE) tasks enables students to approach various problems involving the extraction of structured information from unstructured text, and a survey of common specialized IE tasks acquaints them with some of the most widespread NLP applications.
- Basics of text processing: segmentation, tokenization, decompounding, stemming, lemmatization; regular expressions
- N-gram language modeling, simple classification tasks in NLP
- Part-of-speech tagging, named entity recognition, and shallow parsing with Hidden Markov Models
- Syntactic representations and syntactic parsing
- Basics of natural language semantics
- Neural network basics: feedforward and recurrent neural networks
- Sequence modeling and sequence-to-sequence models
- Neural language modeling: word vectors and contextualized language models
- Information extraction tasks: entity recognition, relation extraction, knowledge base population
- Information extraction applications: summarization, question answering, chatbots
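To make the text-processing basics concrete, here is a minimal sketch of rule-based tokenization and pattern matching with regular expressions; the patterns are illustrative toy rules, not the course's reference implementation:

```python
import re

def tokenize(text):
    # Separate words from punctuation: match runs of word characters,
    # or single characters that are neither word characters nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)

def match_dates(text):
    # Regular expressions also drive simple pattern extraction,
    # e.g. pulling ISO-style dates out of running text.
    return re.findall(r"\d{4}-\d{2}-\d{2}", text)
```

A real tokenizer needs many more rules (abbreviations, clitics, URLs), which is exactly the kind of refinement the segmentation topic covers.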
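The n-gram language modeling topic can be sketched with a count-based bigram model; the toy corpus and Laplace smoothing below are illustrative assumptions, not course materials:

```python
from collections import Counter

def train_bigram(corpus):
    # corpus: list of tokenized sentences; pad each with boundary symbols.
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        unigrams.update(toks[:-1])                 # conditioning contexts
        bigrams.update(zip(toks[:-1], toks[1:]))   # adjacent token pairs
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word, vocab_size):
    # Add-one (Laplace) smoothed estimate of P(word | prev).
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

corpus = [["the", "cat"], ["the", "dog"]]
unigrams, bigrams = train_bigram(corpus)
vocab_size = 5  # <s>, the, cat, dog, </s>
p = bigram_prob(unigrams, bigrams, "the", "cat", vocab_size)  # (1+1)/(2+5)
```

Chaining such conditional probabilities over a sentence gives its model score, the quantity evaluated by perplexity.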
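For the HMM tagging topic, the standard decoder is the Viterbi algorithm; a compact sketch with a made-up two-tag model (all probabilities are invented for illustration):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    # Dynamic program over tag sequences: V[t][s] holds the probability of
    # the best path ending in state s at position t, plus a backpointer.
    V = [{s: (start_p[s] * emit_p[s].get(obs[0], 0.0), None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p][0] * trans_p[p][s])
            V[t][s] = (V[t - 1][prev][0] * trans_p[prev][s]
                       * emit_p[s].get(obs[t], 0.0), prev)
    # Backtrack from the best final state.
    best = max(states, key=lambda s: V[-1][s][0])
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

# Toy noun/verb model; in practice the tables are estimated from a treebank.
states = ["N", "V"]
start_p = {"N": 0.6, "V": 0.4}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"dogs": 0.5, "bark": 0.1}, "V": {"dogs": 0.1, "bark": 0.5}}
tags = viterbi(["dogs", "bark"], states, start_p, trans_p, emit_p)
```

The same decoder applies to named entity recognition and shallow parsing once the tag set encodes chunk or entity boundaries (e.g. BIO labels).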