Você está na página 1de 5

Journal for Research | Volume 03 | Issue 10 | December 2017

ISSN: 2395-7549

Comprehensive Analysis of Natural Language


Processing Technique
Utkarsha Singh Harkiran Kaur
Research Scholar Faculty
Department of Computer Science and Engineering Department of Computer Science and Engineering
Thapar Institute of Engineering and Technology Thapar Institute of Engineering and Technology
(Deemed to be University) Patiala, India (Deemed to be University) Patiala, India

Abstract
Natural Language Processing (NLP) techniques are one of the most used techniques in the field of computer applications. It has
become one of the vast and advanced techniques. Language is the means of communication or interaction among humans and in
present scenario when everything is dependent on machine or everything is computerized, communication between computer and
human has become a necessity. To fulfill this necessity NLP has been emerged as the means of interaction which narrows the
gap between machines (computers) and humans. It was evolved from the study of linguistics which was passed through the
Turing test to check the similarity between data but it was limited to small set of data. Later on various algorithms were
developed along with the concept of AI (Artificial Intelligence) for the successful execution of NLP. In this paper, the main
emphasis is on the different techniques of NLP which have been developed till now, their applications and the comparison of all
those techniques on different parameters.
Keywords: Disclosure Integration, Natural Language Processing, Parts Of Speech, Turing Test, Semantic Analysis,
Syntactic Analysis
_______________________________________________________________________________________________________

I. INTRODUCTION

NLP is an area of application which focuses on how computers can be utilized to understand natural language texts or
speech to perform practical tasks.
It takes some input either in the form of text, spoken language or keyboard input and performs task (translation of
natural language to another language) in order to grasp easily and represent the context of text which is further used for
various applications.
NLP involves following basic steps for its working:
1) Lexical Analysis: Analysis of structure of text (input) and dividing into paragraphs, s entences and words.
2) Syntactic Analysis: Analysis of sentences and arranging words of sentences in such a manner so that it shows relation
between words.
3) Semantic Analysis: Extraction of meaningful text from sentences (output of Syntactic Analysis).
4) Disclosure Integration: Extraction of meaningful sentence in such a way that each and every preceding sentence has
meaning which is connected to next sentence.
5) Pragmatic Analysis: Derives those aspects of language which requires practical knowledge.
All the above mentioned steps are the base of NLP and they are applied in different fields to achieve useful
applications.
In this paper, different works and applications based on NLP is represented.

II. LITERATURE REVIEW

This literature review is based on the different applications of NLP and its different techniques. The literature review
helps in understanding the NLP and also helps to know to what extent work has been done in the field and what would be
the future aspect.
Agung Fatwanto in [1] worked on designing of SRS using the concept of NLP as designing of SRS using NLP is
common, however, his research work was on the dynamic feature of software system. His proposed method offers a
conversion of dynamic requirements specification (written in scenario like format) to formal specification (object
oriented model).
Methods involve in his approach are: translation using AI theory and translation based on grammatical analysis.
Requirements specified in natural language will be parsed using Reed -Kellogg sentence diagramming system [1] which
follows following schemata:

All rights reserved by www.journal4research.org 11


Comprehensive Analysis of Natural Language Processing Technique
(J4R/ Volume 03 / Issue 10 / 003)

Sentence-Subject + Predicate
Sentence-Subject | Verb | Direct Object; or Sentence-Subject |
Copula | Predicate
Requirement-Subject + Verb + Target + [Way]
Fig. 1. Schemata for parsing requirements in natural language [1]

This schemata helps in incorporating dynamic aspect of software system.


M. P. di Buono, P. Ronzino in [2] worked in the field of Archaeology to help in decision making and enhance the data
retrieval process related to problem solving procedure s.
Two steps were developed: creation of ontology through input – archaeological texts (booklets, appendices, excavation
diaries etc.) which helps in determining archaeologists decisions and interpretations and to describe archaeological
processes, which is the conclusion process of an archaeologist in the primary assessment, the decisions made during the
excavation itself, the techniques and approaches involved, and second step is the ratification of the concluding decisions.
Their framework was based on a defined language which is robust, which creates grammars and lexicons. Their
architecture ensures better process for decision making by processing highly complicated sentences.
Ravi Prakash Verma, Md. Rizwan Beg in [3] proposed a way on the concept of knowledge representation graph of
parsed tree which is generated through parser which helps in creating test cases. The proposed method involves
“Preprocessing” (removing unwanted words from the input text, “POS Tagging” ( assignment of parts of speech to each
word to retrieve useful information), “Parsing” (determines syntactic tree structure, “Knowledge Representation”
(representation of data written in natural language in the form of graph) and “Merging Graphs” ( all the graphs are
combined which are generated through each parsed tree into one graph which represents knowledge of requirements in detail
which are expressed in natural languages).
Test cases are automatically generated through boundary value analysis.
Takayuki Suzuki, Kiminori Gemba et al. in [4] analysed communication among customers regarding their experiences related
to different hotels using NLP technique. An automated system is used to collect data from various user review sites and on the
basis of that data co-occurrence words are identified using NLP technique. By co-occurrence words pattern is observed in terms
of most frequently used words which helps hotels in analysing and offering better services to their customers.
Veronika Vincze, Rich´ard Farkas in [5] research work was on “Named-Entity Recognition”, “De-identification” in NLP for
medical as well as social media. They proposed the use of Named-Entity Recognition by using CRF (Conditional Random Field)
approach which is closely related to de-identification and anonymization as most of the categories to be de-identified are some
kind of named entities.
Rodica Potolea, Mihaela Dinsoreanu in [6] proposed an architecture for spam detection which includes following components:
1) Feature Extraction Module: Features are extracted from the comment in form of links, whitespaces, and punctuation marks
to differentiate the spam comment from the legitimate comment.
2) Topic Extraction Module: It extracts contents of post and comment to determine if there is similarity or common topics
between post and comment.
3) Post-Comment Similarity Module: This module checks the similarity between the post and comment as spam comments
mostly contain irrelevant data which has no connection with the post.
Once all the above modules are executed, Decision Tree Classifier is implemented to provide final results.
Abinash Tripathy, Santanu Ku. Rath in [7] developed a methodology for object oriented systems to identify class names and
their relationships from SRS automatically. For analysis of SRS, different tables are constructed on the basis of adjectives,
prepositions, adverbs, articles, pronouns and so on. The words of SRS are tokenized and are searched in tables. If word is
founded in table then it is identified as output, otherwise, it is considered as noun and are stored in NON array including the
words found in verb table. This NON array containing verbs and nouns helps in creating class diagrams (class-association-class).
K. Kamalabalan, T. Uruththirakodeeswaran et al. in [8] presented a sample tool for traceability of software artefacts (SRS, test
cases, designs) which are discovered in SDLC. Their tool is named as SATAnalyzer(Software Artefact Traceability Analyzer).
File Server is used to store artefacts that are discovered during SDLC and are converted into XML files. Standard NLP tool is
used to extract information from requirement specifications. Classes and their sub classes are derived from XML files which
helps in generating traceability links.
Dean J. Jones, Gunjan Mansingh in [9] presented work in progress on an algorithm that accounts for the similarity or
oppositeness of given words. They have made the use of WordNet which is a publically available lexical database which consists
of lakhs of nouns, verbs, adjectives and adverbs which are grouped into some synsets. Word Comparison is done through
Comparison Metric.
Yuki Sawa, Ram Bhakta et al. in [10] proposed an approach to detect suspicious comments on social sites using NLP. Social
Engineering Attack is very common these days. Parse tree is generated by parsing sentence using context free grammar of
English and it is analyzed to find whether the sentence is carrying question (request private questions) or command (to do
unauthorized actions). If question or command is detected, then by Topic Extraction analysis is done to fine potential verb/noun
topic pairs which describes topic of question/command. Then it is compared to topic backlist (prepared on the basis of violation
of security protocols) to detect whether question or command is suspicious or not.

All rights reserved by www.journal4research.org 12


Comprehensive Analysis of Natural Language Processing Technique
(J4R/ Volume 03 / Issue 10 / 003)

1) Staircase Pattern Learning: To find brand and product named varied in main content.
2) Staircase Pattern Matching: Comparison is done to find match.
The proposed algorithm can be used for location based m commerce. Once the customer connects to wireless access point in
shopping mall, algorithm recognizes customer’s interest by extracting product or band name from log files which helps in
increasing sales of products.
Sweta P. Lende, Dr.M.M. Raghuwanshi in [11] concluded that closed domain QA (Question Answering) system gives
accurate answer in comparison to open domain QA system. Proposed method includes following steps:
1) Collection of education acts data sets from different websites.
2) Creation of Corpus(Datasets)
3) Preprocessing through Stop Words Removal and Stemming
4) Storing of extracted keywords into index term dictionary
5) Preprocessing of input query through POS tagging
6) Document is retrieved by doing comparison between indexed dictionary and extracted keywords from preprocessing of
query
7) Ranking of keyword is done by finding out score between query keywords and document retrieval
8) Application of POS tagging on all keywords ranking for Answer Extraction
9) Answer is selected from Answer Extraction for which user is looking for.
The proposed system gives more accurate results.
Akkamahadevi R Hanni, Mayur M Patil et al. in [12] developed an application which helps in providing efficient summary of
product reviews which the buyers have given for that product and also helps other buyers to decide whether to go for that product
or not by reading those reviews. Feature Extraction and opinion mining extraction are the main modules which are used to give
efficient reviews. Classifiers classifies reviews into positive and negative which helps customers in their decision making.
Yilin Yang, Xinhai Huang et al. in [13] studied about the test case prioritization from different projects which results in
effective software testing. Keywords are extracted from training test cases through NLP techniques and keyword dictionary is
constructed. For test case prioritization three strategies are used: Risk Strategy, Diversity Strategy and Hybrid Strategy. On the
basis of these strategies, test cases are prioritized so that defects can be detected in earlier stages.
Shivaprasad T K, Jyothi Shetty in [14] presented the taxonomy in Fig. 1 [14] of various Sentiment Analysis methods using
NLP to understand various reviews of product.

Fig. 2: Sentiment Analysis Process Model [14]

Reviews are taken as data set which are analyzed and classified into sentiments to provide result.
Liu Shufengi, Shen Shaohong et al. in [15] developed a method to recognise Chinese characters in static background. NLP
technique is used for feature extraction to get Chinese characters area and then, K means clustering algorithm is used to
differentiate between character and background area. After identification of Chinese character area, two parameters are generated
namely stroke across and energy density which identify Chinese characters in a very accurate manner.
Georgi M. Kanchev, Pradeep K. Murukannaiah et al. in [16] proposed a tool based assistance named as CANARY to extract
requirements from online forums to enhance interactions between stakeholders. The proposed tool extracts raw data from online
forums, apply annotations to those raw data and runs query which results in social data that helps in interaction between
stakeholders.
Glen Coppersmith, Casey Hilland et al. in [17] introduced some recent advancements in the field of mental health by using
NLP and machine learning. Whitespace information helps in analysing mental illness of a person. Linguistic signals are extracted
from whitespaces which are processed through NLP technique and are compare with the already stored signals which helps in
proper treatment and diagnosis of patient.

All rights reserved by www.journal4research.org 13


Comprehensive Analysis of Natural Language Processing Technique
(J4R/ Volume 03 / Issue 10 / 003)

III. COMPARATIVE ANALYSIS OF NLP TECHNIQUES

A comparative study is done after going through distinctive research papers of NLP’s applications and its techniques. A
correlation table is drawn below on the basis of different parameters such as accuracy, performance, decision making, application
and approaches/techniques.
Table - 1
NLP Techniques with Their Influencing Factors
Parameters Approaches/Techniques
Representation Sentiment
Grammatical Analysis Feature Extraction
Graph Analysis
Spam Detection

Translation of SRS written Detection of characters


in natural language in from background
Creation of Product
Application formal language covering Supports interaction
Test Cases Reviews
dynamic aspect of software between different
system stakeholders by
extracting requirements
from online forums
Less in
Accuracy More More More comparison
to others
Medium
Performance High Medium High

Parser/Parsed
Proposed
Tree K means Clustering
Method/Algorithm/Schemata to Reed Kellog Sentence
POS Tagging Algorithm
carry out the techniques of NLP Diagramming System
Boundary CANARY tool
or Implementation
Value Analysis
Decision Making Yes No Yes Yes

IV. CONCLUSION

This paper is a comparative analysis of different NLP techniques which provides an overview of NLP’s proposed techniques and
methodologies. On the basis of different parameters efficiency of different proposed methodologies are defined. This analysis
helps in understanding which technique is better for what kind of application and what future work can be done or the scope in
the field of NLP. It concludes that NLP has wide range of applications in the field of medical, security, social media, software
development, robotics and so on and can be used with other techniques to provide a platform where interaction between human
and computer becomes easy.

REFERENCES
[1] A. Fatwanto, "Software requirements specification analysis using natural language processing technique," 2013 International Conference on QiR, pp. 105-
110, 2013.
[2] M. P. Di Buono, M. Monteleone, P. Ronzino, V. Vassallo and S. Hermon, "Decision making support systems for the Archaeological domain: A Natural
Language Processing proposal," Proceedings of the DigitalHeritage 2013 - Federating the 19th Int'l VSMM, 10th Eurographics GCH, and 2nd UNESCO
Memory of the World Conferences, Plus Special Sessions fromCAA, Arqueologica 2.0 et al., vol. 2, pp. 397-400, 2013.
[3] R. P. Verma and B. Md. Rizwan, "Generation of Test Cases from Software Requirements Using Natural Language Processing," 2013 6th International
Conference on Emerging Trends in Engineering and Technology, pp. 140-147, 2013.
[4] T. Suzuki, K. Gemba and A. Aoyama, "Hotel Classification Visualization Using Natural Language Processing of User Reviews," pp. 892-895, 2013.
[5] V. Vincze and R. Farkas, "De-identification in Natural Language Processing," no. May, pp. 1300-1303, 2014.
[6] C. Rădulescu, R. Potolea and M. Dinsoreanu, "Identification of Spam Comments using Natural Language Processing Techniques," pp. 29-53, 2014.
[7] A. Tripathy and S. K. Rath, "Application of Natural Language Processing in Object Oriented Software Development," 2014 International Conference on
Recent Trends in Information Technology, pp. 1-7, 2014.
[8] K. Kamalabalan, T. Uruththirakodeeswaran, G. Thiyagalingam, D. B. Wijesinghe, I. Perera, D. Meedeniya and D. Balasubramaniam, "Tool Support for
Traceability of Software Artefacts," MERCon 2015 - Moratuwa Engineering Research Conference, pp. 318-323, 2015.
[9] D. J. Jones and G. Mansingh, "Not Just Dissimilar, but Opposite An Algorithm for Measuring Similarity and Oppositeness between Words," 2016.
[10] Y. Sawa, R. Bhakta, I. G. Harris and C. Hadnagy, "Detection of Social Engineering Attacks Through Natural Language Processing of Conversations," 2016
IEEE Tenth International Conference on Semantic Computing (ICSC), pp. 262-265, 2016.

All rights reserved by www.journal4research.org 14


Comprehensive Analysis of Natural Language Processing Technique
(J4R/ Volume 03 / Issue 10 / 003)

[11] S. P. Lende and M. M. Raghuwanshi, "Question answering system on education acts using NLP techniques," IEEE WCTFTR 2016 - Proceedings of 2016
World Conference on Futuristic Trends in Research and Innovation for Social Welfare, 2016.
[12] A. R. Hanni, M. M. Patil and P. M. Patil, "Summarization of customer reviews for a product on a website using natural language processing," 2016
International Conference on Advances in Computing, Communications and Informatics, ICACCI 2016, pp. 2281-2285, 2016.
[13] Y. Yang, X. Huang, X. Hao, Z. Liu and Z. Chen, "An Industrial Study of Natural Language Processing Based Test Case Prioritization," 2017 IEEE
International Conference on Software Testing, Verification and Validation (ICST), pp. 548-549, 2017.
[14] S. T.K. and J. Shetty, "Sentiment Analysis of Product Reviews:A Review," 2017.
[15] L. Shufeng, S. Shaohong and S. Zhiyuan, "Research on Chinese Characters Recognition in Complex Background Images," pp. 214-217, 2017.
[16] G. M. Kanchev, P. K. Murukannaiah, A. K. Chopra and P. Sawyer, "Canary: An Interactive and Query-Based Approach to Extract Requirements from
Online Forums," pp. 31-32, 2017.
[17] G. Coppersmith, C. Hilland, O. Frieder and R. Leary, "Scalable mental health analysis in the clinical whitespace via natural language processing," 2017
IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), pp. 393-396, 2017.

All rights reserved by www.journal4research.org 15

Você também pode gostar