This course introduces central problems and methods in natural language processing, with a special focus on the challenges presented by low-resource languages in the Pacific. Through their experiences in this course, students will be able to:
• describe the central problems and methods in natural language processing
• apply standard methods and models to existing text datasets
• compare standard methods by their assumptions and applications
• design an application of existing methods to an NZ-specific context
• evaluate the performance of the above application against reasonable baselines
In this course we will examine Natural Language Processing theory and applications, with an emphasis on how NLP algorithms are built, typically (though not exclusively) using statistical machine learning. The theoretical topics we will cover include:
• Encoding natural language as features.
• Estimating features using smoothing, normalization, sampling, and expectation-maximization.
• Classifying text, training, and cross-validation.
• Distributed word representations such as skip-grams and word2vec, and evaluating their stability and similarity.
• Language models: training and evaluation (perplexity), word prediction, and other applications (see the sketch after these lists).
• Sequence models: the problem of transitions, the Viterbi algorithm, and parsing.
Applications of these concepts that we will look at include:
• Corpus similarity measures
• Building dictionaries
• Named-entity recognition
• Part-of-speech tagging
• Language identification
• Topic classification
• Finding lexical clusters
• Phrase completion
• Predicting sentence probabilities
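To make the language-modelling topics above concrete, here is a minimal sketch of a bigram language model with add-one (Laplace) smoothing, evaluated by per-word perplexity. The toy corpus and test sentences are hypothetical illustrations, not course material:

import math
from collections import Counter

# Hypothetical toy corpus (illustration only).
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat saw the dog".split(),
]

# Collect unigram and bigram counts, padding each sentence with
# start/end markers so sentence boundaries are modelled too.
unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    tokens = ["<s>"] + sent + ["</s>"]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

vocab_size = len(unigrams)

def bigram_prob(prev, word):
    # P(word | prev) with add-one (Laplace) smoothing, so unseen
    # bigrams still receive a small non-zero probability.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def perplexity(sentence):
    # Per-word perplexity of a tokenised sentence under the model:
    # exp of the negative average log-probability of its bigrams.
    tokens = ["<s>"] + sentence + ["</s>"]
    log_prob = sum(math.log(bigram_prob(p, w))
                   for p, w in zip(tokens, tokens[1:]))
    return math.exp(-log_prob / (len(tokens) - 1))

print(perplexity("the cat sat on the rug".split()))  # in-domain: lower
print(perplexity("a rug saw a mat".split()))         # unusual: higher

Lower perplexity on held-out text indicates a model that finds the text more predictable; the course covers more robust smoothing and evaluation methods than this add-one baseline.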
(1) COSC262; (2) Approval by the Head of Department of Computer Science and Software Engineering
Ben Adams
Jonathan Dunn
Domestic fee $1,033.00
International Postgraduate fees
* All fees are inclusive of NZ GST or any equivalent overseas tax, and do not include any programme level discount or additional course-related expenses.
For further information see Computer Science and Software Engineering.