AL3501 Natural Language Processing Syllabus – Anna University Regulation 2021
COURSE OBJECTIVES:
To learn the fundamentals of natural language processing.
To learn word-level analysis methods.
To explore the syntactic analysis concepts.
To understand the semantics and pragmatics.
To learn to analyze discourse and use lexical resources.
UNIT I INTRODUCTION
Origins and challenges of NLP – Language Modeling: Grammar-based LM, Statistical LM – Regular Expressions, Finite-State Automata – English Morphology, Transducers for lexicon and rules, Tokenization, Detecting and Correcting Spelling Errors, Minimum Edit Distance
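As an illustration of the last Unit I topic, minimum edit distance can be sketched with a small dynamic-programming table. This is a minimal sketch, not a prescribed implementation: the function name `min_edit_distance` and the `sub_cost` parameter are assumptions made for this example. Insertions and deletions cost 1; substitution cost is configurable because some textbooks use 1 (plain Levenshtein) and others use 2.

```python
def min_edit_distance(source, target, sub_cost=1):
    """Return the minimum edit distance between two strings.

    Insertion and deletion each cost 1; substituting one character
    for a different one costs `sub_cost` (an illustrative parameter).
    """
    n, m = len(source), len(target)
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i characters
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                             # deletion
                dist[i][j - 1] + 1,                             # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost), # match or substitution
            )
    return dist[n][m]
```

A spelling corrector could use such a function to rank candidate words by their distance from a misspelled input, for example `min_edit_distance("kitten", "sitting")`.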
UNIT II WORD LEVEL ANALYSIS
Unsmoothed N-grams, Evaluating N-grams, Smoothing, Interpolation and Backoff – Word Classes, Part-of-Speech Tagging, Rule-based, Stochastic and Transformation-based tagging, Issues in PoS tagging – Hidden Markov and Maximum Entropy models.
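The contrast between unsmoothed and smoothed N-grams in Unit II can be sketched with a bigram model. This is a minimal sketch under stated assumptions: whitespace tokenization, lowercasing, `<s>`/`</s>` boundary markers, and add-one (Laplace) smoothing as the chosen smoothing method. The function names `train_bigrams` and `bigram_prob` are hypothetical and exist only for this example.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams, smooth=True):
    """Estimate P(word | prev), optionally with add-one smoothing."""
    vocab_size = len(unigrams)  # number of distinct token types seen
    if smooth:
        # Add-one: every possible bigram gets one extra pseudo-count.
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    if unigrams[prev] == 0:
        return 0.0  # unseen context: the unsmoothed estimate is undefined
    return bigrams[(prev, word)] / unigrams[prev]


# Toy corpus, chosen only for illustration.
corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
unigrams, bigrams = train_bigrams(corpus)
```

With the unsmoothed estimate, a bigram that never occurs in the training text gets probability zero. Add-one smoothing gives it a small non-zero probability at the cost of discounting the bigrams that were seen.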
UNIT III SYNTACTIC ANALYSIS
Context-Free Grammars, Grammar rules for English, Treebanks, Normal Forms for grammar – Dependency Grammar – Syntactic Parsing, Ambiguity, Dynamic Programming parsing – Shallow parsing – Probabilistic CFG, Probabilistic CYK, Probabilistic Lexicalized CFGs – Feature structures, Unification of feature structures.
UNIT IV SEMANTICS AND PRAGMATICS
Requirements for representation, First-Order Logic, Description Logics – Syntax-Driven Semantic analysis, Semantic attachments – Word Senses, Relations between Senses, Thematic Roles, selectional restrictions – Word Sense Disambiguation, WSD using Supervised, Dictionary & Thesaurus, Bootstrapping methods – Word Similarity using Thesaurus and Distributional methods.
UNIT V DISCOURSE ANALYSIS AND LEXICAL RESOURCES
Discourse segmentation, Coherence – Reference Phenomena, Anaphora Resolution using Hobbs and Centering Algorithm – Coreference Resolution – Resources: Porter Stemmer, Lemmatizer, Penn Treebank, Brill’s Tagger, WordNet, PropBank, FrameNet, Brown Corpus, British National Corpus (BNC).
45 PERIODS
PRACTICAL EXERCISES: 30 PERIODS
1. Word Analysis
2. Word Generation
3. Morphology
4. N-Grams
5. N-Grams Smoothing
6. POS Tagging: Hidden Markov Model
7. POS Tagging: Viterbi Decoding
8. Building POS Tagger
9. Chunking
10. Building Chunker
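Exercises 6 and 7 above pair a Hidden Markov Model with Viterbi decoding. The following is a minimal sketch of that idea, assuming a non-empty input sentence and a hand-written toy model. The tag set and every probability in `start_p`, `trans_p`, and `emit_p` are invented for illustration and are not estimated from any corpus. Raw probabilities are multiplied for readability; an implementation for longer sentences would typically sum log-probabilities to avoid underflow.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` and its probability."""
    # best[t][tag] = (probability of the best path ending in `tag` at t, previous tag)
    best = [{}]
    for tag in tags:
        best[0][tag] = (start_p[tag] * emit_p[tag].get(words[0], 0.0), None)
    for t in range(1, len(words)):
        best.append({})
        for tag in tags:
            # Pick the predecessor tag that maximizes the path probability.
            best[t][tag] = max(
                (best[t - 1][p][0] * trans_p[p][tag] * emit_p[tag].get(words[t], 0.0), p)
                for p in tags
            )
    # Backtrace from the best final tag to recover the whole path.
    last = max(tags, key=lambda tag: best[-1][tag][0])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Toy model: all numbers below are made up for this example.
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
emit_p = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"barks": 0.4, "dog": 0.05},
}

path, prob = viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p)
```

In a full POS tagger (exercise 8), the three probability tables would be estimated from a tagged corpus rather than written by hand.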
COURSE OUTCOMES:
At the end of this course, the students will be able to:
CO1: tag a given text with basic language features.
CO2: implement a rule-based system to tackle the morphology/syntax of a language.
CO3: design a tag set to be used for statistical processing for real-time applications.
CO4: compare and contrast the use of different statistical approaches for different types of NLP applications.
CO5: use tools to process natural language and design innovative NLP applications.
TOTAL: 75 PERIODS
TEXT BOOKS:
1. Daniel Jurafsky and James H. Martin, “Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition”, Pearson Publication, 2014.
2. Steven Bird, Ewan Klein and Edward Loper, “Natural Language Processing with Python”, First Edition, O'Reilly Media, 2009.
REFERENCES:
1. Breck Baldwin, “Language Processing with Java and LingPipe Cookbook”, Atlantic Publisher, 2015.
2. Richard M Reese, “Natural Language Processing with Java”, O'Reilly Media, 2015.
3. Nitin Indurkhya and Fred J. Damerau, “Handbook of Natural Language Processing”, Second Edition, Chapman and Hall/CRC Press, 2010.
4. Tanveer Siddiqui, U.S. Tiwary, “Natural Language Processing and Information Retrieval”, Oxford University Press, 2008.
