Você está na página 1de 25

1

CSC348 - Natural Language


Processing (NLP)
Week-1 Introduction
Semester-# Spring 2018
Prepared by:

Shamaila Shafiq

Hassaan Rafiq
2
Instructor Contact Details

 Name: Hassaan Rafiq /Shamaila Shafiq


 Course : CSC348- Natural Language Processing
 Credit Hours: 3
 Designation : Assistant Professor /Lecturer
 Office Location: CS-Staff Room Cabin
 Email: hassaan.rafiq@lgu.edu.pk/shamailashafiq@lgu.edu.pk
 Visiting Hours: Wednesday- Friday (0800-1000)
 Lab Work: Video Tutorials & Supervision by Instr

Lahore Garrison University


3
Course Objectives

 The main objective of this course is to understand natural language


processing (NLP) and to learn how to apply basic algorithms in this field.
 This course familiarizes the fundamental concepts and techniques of
natural language processing.
 To get the students familiar with the algorithmic description of the main
language levels: morphology, syntax, semantics, and pragmatics, as well
as the resources of natural language data - corpora.
 To learn about basic NLP problems, tasks and methods and to make clear
basic programming tools for NLP.

Lahore Garrison University


4
Course Information

Textbooks
 Daniel Jurafsky and James H. Martin. 2008. Speech and Language
Processing: An Introduction to Natural Language Processing, 43,
Computational Linguistics and Speech Recognition. Second Edition.
Prentice Hall.
 Foundations of Statistical Natural Language Processing, Manning and
Schütze, MIT Press. Cambridge, MA: May 1999

Lahore Garrison University


5
Course Topics

 Basic Text Processing


 Language Modeling
 Text Classification
 Sentiment Analysis
 Named Entity Recognition
 Part-of-Speech(POS) Tagging
 Parsing
 Information Retrieval
 Named Entity Recognition

Lahore Garrison University


6
Course Resources

 Class
 Textbooks
 Slides

 Python
 Python's homepage, tutorials, …: www.python.org

 Natural Language Toolkit in Python with online book: www.nltk.org

Lahore Garrison University


7
Applications of Natural Language
Processing

 Question Answering : IBM’s Watson


 Won Jeopardy on February 6, 2011.

Lahore Garrison University


8
Applications of Natural Language
Processing

 Information Extraction:

Lahore Garrison University


9
Applications of Natural Language
Processing

 Information Extraction & Sentiment Analysis


 Attributes:
 zoom
 affordability
 size and weight
 flash
 ease of use

Lahore Garrison University


10
Applications of Natural Language
Processing

 Size and Weight

Lahore Garrison University


11
Applications of Natural Language
Processing (Machine Translation)

Lahore Garrison University


12
Applications of Natural Language
Processing (Language Technology)

Lahore Garrison University


13
Ambiguity makes NLP hard

“Crash blossoms”
Headlines with two meanings
 Violinist Linked to JAL Crash Blossoms
 Teacher Strikes Idle Kids
 Red Tape Holds Up New Bridges
 Hospitals Are Sued by 7 Foot Doctors
 Juvenile Court to Try Shooting Defendant
 Local High School Dropouts Cut in Half

Lahore Garrison University


14
Why else is natural language
understanding difficult

Lahore Garrison University


15
Natural Language Processing
Motivation

 Google Search Engine


 Intelligently responding to the query: e.g., Where is Badshahi mosque?
 Predicting next word for autocompletion
 Ability to do spelling corrections
 Segmenting words that may be joined without space
 Ranking the search results
 Google translate
 Gmail
 E.g, Understand contents of an email through NLP and alert the user

Lahore Garrison University


16
Natural Language Processing

 Google Search Engine


 Intelligently responding to the query: e.g., Where is Badshahi mosque?
 Predicting next word for autocompletion
 Ability to do spelling corrections
 Segmenting words that may be joined without space
 Ranking the search results
 Google translate
 Gmail
 E.g, Understand contents of an email through NLP and alert the user

Lahore Garrison University


17
Introduction to Natural Language
Processing

Natural language processing is a field at the intersection of


 computer science
 artificial intelligence
 And linguistics.
A vibrant interdisciplinary field with many names corresponding to its many
facets, names like
 speech and language processing, human language technology, natural
language processing, computational linguistics, and speech recognition
and synthesis.

Lahore Garrison University


18
Introduction to Natural Language
Processing

 Goal: for computers to process or “understand” natural language in order


to perform tasks that are useful, e.g.,
 Performing Tasks, like making appointments, buying things
 Language translation
 Question Answering
 Siri, Google Assistant, Facebook M, Cortana …

 Fully understanding and representing the meaning of language (or even


defining it) is a difficult goal.
 Perfect language understanding is AI-complete

Lahore Garrison University


19
Some More Applications of Natural
Language Processing

 Valuable areas of language processing such as:


 Spelling Correction
 Grammar Checking
 Information Retrieval
 Machine Translation

Lahore Garrison University


20
Trends that accelerate NLP research

 Availability of web and social data


 Mobile devices as a source of data
 Increasing availability of datasets in open web
 e.g. Freebase, dbpedia, kaggle etc

Lahore Garrison University


21
Categories of Knowledge of
Language

Engaging in complex language behavior requires various kinds of knowledge


of language:
 Phonetics and Phonology— knowledge about linguistic sounds
 Morphology— knowledge of the meaningful components of words
 Syntax— knowledge of the structural relationships between words
 Semantics—knowledge of meaning
 Pragmatics— knowledge of the relationship of meaning to the goals and
intentions of the speaker.
 Discourse— knowledge about linguistic units larger than a single utterance

Lahore Garrison University


22
Course Prerequisites

 Students should have basic programming skills and general understanding


of Calculus and Statistics.
 Should have concepts of Discrete Structures.

Lahore Garrison University


23
Course Contents

Following contents will be covered throughout the semester in week wise lectures:
Introduction and Overview, Ambiguity and uncertainty in language, Regular Expressions. Chomsky
hierarchy, regular languages and their limitations., Finite-state automata. Practical regular expressions for
finding and counting language phenomena. In class demonstrations of exploring a large corpus with regex
tools, String Edit Distance and Alignment, Key algorithmic tool: dynamic programming, first a simple
example, then its use in optimal alignment of sequences. String edit operations, edit distance, and
examples of use in spelling correction, and machine translation, Context Free Grammars, Constituency,
CFG definition, use and limitations. Chomsky Normal Form. Top-down parsing; bottom-up parsing, and the
problems with each. The desirability of combining evidence from both directions, Information Theory, What
is information? Measuring it in bits. The "noisy channel model.“ The "Shannon game"--motivated by
language! Entropy, cross-entropy, information gain. Its application to some language phenomena,
Language modeling and Naive Bayes, Probabilistic language modeling and its applications. Markov
models. N-grams. Estimating the probability of a word, and smoothing. Generative models of language.
Their application to building an automatically-trained email spam filter, and automatically determining the
language, Part of Speech Tagging and Hidden Markov Models, The concept of parts-of-speech, examples,
usage. The Penn Treebank and Brown Corpus. Probabilistic (weighted) finite state automata. Hidden
Markov models (HMMs), definition and use, Probabilistic Context Free Grammars, Weighted context free
grammars, Maximum Entropy Classifiers, The maximum entropy principle, and its relation to maximum
likelihood. The need in NLP to integrate many pieces of weak evidence. Maximum entropy classifiers and
their application to document classification, sentence segmentation.

Lahore Garrison University


24

Q&A

Lahore Garrison University


25
References

 These lecture notes were taken from following source:

 Daniel Jurafsky and James H. Martin. 2008. Speech and Language


Processing: An Introduction to Natural Language Processing, 43,
Computational Linguistics and Speech Recognition. Second Edition.
Prentice Hall, Chapter 1

 Most slides adapted from Chris Manning and Dan Jurafsky's NLP class on
Coursera

Lahore Garrison University

Você também pode gostar