
Sci.Int(Lahore),26(1),181-184,2014
ISSN 1013-5316; CODEN: SINTE 8

AN ESSENTIAL FRAMEWORK FOR CONCEPT BASED EVOLUTIONARY QURANIC SEARCH ENGINE (CEQSE)
Syed Ali Raza (1*), Muhammad Rehan (1), Amjad Farooq (2), S. M. Ahsan (2), M. Saleem Khan (1)
(1) Department of Computer Science, GC University Lahore
(2) Department of Computer Science, UET Lahore
* Corresponding Author: arjafri104@gmail.com

ABSTRACT: The Holy Quran has shaped the lives of Muslims and is
among the most widely read books. Although the Quran is recited all
over the globe, little attention has been paid to Quranic search.
Currently available models rely on keyword-based searching, which is
not only less efficient but also fails to retrieve Quranic concepts
accurately. This paper addresses the deficiencies of keyword-based
searching and the issues related to semantic search in the Holy
Quran, and proposes a model capable of performing semantic search.
INTRODUCTION:
The Holy Quran is the most sacred scripture for Muslims and a
source of knowledge on a wide range of subjects. It discusses
almost all fields of life and provides the basics for all areas
of knowledge. Today, about one in every five people on earth is
Muslim [1]. Understanding the Holy Quran is therefore highly
significant, both for every Muslim and for scholars interested
in the study of man and society, because the Holy Quran has been
influential not only in molding the destinies of Islamic
societies but also in changing the destiny of mankind as a
whole [2]. Understanding the concepts of the Quran is thus of
paramount importance if one wishes to study this book
comprehensively.
The Holy Quran has its own style of describing concepts, which
is unique in many ways. A concept is generally discussed in
several chapters. For example, the concept of Hell is discussed
in various chapters, and the oneness of the Almighty is
discussed throughout the Holy Quran. A single verse may also
contain more than one theme. For example, verse 40 of chapter
78 contains only seven words yet holds five different concepts:
first, We (Allah) have warned you (humans); second, We (Allah)
have warned of chastisement; third, the chastisement is near at
hand; fourth, man shall see (on Qayamat) what his two hands
have sent before; and fifth, the unbeliever shall say (on
Qayamat) "would that I were dust". A notable point is that the
words Allah and Qayamat are never used in this verse, yet the
context reveals what is being said.
Another unique feature of the Quran's style is that one term is
expressed in many different forms depending on the context. For
example, Muhammad is referred to as Ahmad, Mudhathir, Muzammil,
Mubashir and Nazeer, and Heaven is referred to as The Garden and
Paradise. A term may also be used with different meanings;
disambiguating between them depends on the context in which the
term is used.
Although the Quran is recited all over the globe, little
attention has been paid to searching Quranic concepts digitally.
Currently available models rely on keyword-based searching;
statistical and keyword-based techniques have achieved some
success in data mining and information retrieval systems [3].
Despite this, such systems are less efficient, and keyword-based
searching has several limitations regarding the quality of
search results and the overall usability of systems [3]. There
is therefore a great need for an intelligent tool that helps
readers of the Quran find the most relevant results and so
better understand the underlying concepts. Continued progress in
textual-information management requires a massive amount of
semantic knowledge; such a tool should therefore be capable of a
deep understanding of meanings.
RELATED WORK:
Numerous tools that rely on keyword-based searching are
currently available in digital form [4-8]. These Quranic
software packages and databases provide searching of the Holy
Quran in the form of audio, video or text files. Abu-Shawar and
Atwell [4] developed a chatbot for the Holy Quran in 2004. The
chatbot answers questions from the Quran but has no capability
to understand the input. All it can do is try to find the most
significant words in the question and then perform a simple
keyword-based search to find relevant verses in the Quran. This
is essentially an extension of keyword search: the user can type
a question as a full sentence rather than just some keywords,
but the system still in effect performs keyword searches. The
Search Truth Quran search tool [5] allows users to search the
Quran using several translations at a time, such as Mohsin Khan,
Yousaf Ali, Shakir and Pickthall. This tool also allows phonetic
search. The main drawback of this system is that it does not
restrict itself to exact matches of the word; it also matches
the search word when it is part of any other word in the text.
For instance, a search for the word "ship" retrieves all verses
that contain the words "worship", "friendship", etc.
The Guided Ways Quran search tool [6], like [5], allows users to
search the Quran using several translations at a time, such as
Pickthall, Mohsin Khan, Yousaf Ali, Shakir, Ahmad Ali and
Jhalandhary. The user can choose one or more translations for
the search, and the search can be performed in different
languages. It searches the Quran for an exact match of the input
word. The IslamiCity search tool [7] searches the Holy Quran
using several translations at a time. When a user inputs a word,
the tool returns both identical matches and partial matches
(where the input is part of a word). The USC Quran search tool
[8] searches the Quran using three English translations (Yousaf
Ali, Pickthall and Shakir). It matches exact words only.
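The difference between partial matching and exact-word matching
described above can be illustrated with a minimal Python sketch.
The three sample sentences and the function names are
hypothetical and are not taken from any of the cited tools or
from any real translation; the sketch only shows why a substring
test returns "worship" and "friendship" for the query "ship"
while a word-boundary test does not.

```python
import re

# Hypothetical sample sentences (not quotations from any translation).
VERSES = {
    "v1": "They boarded the ship at dawn.",
    "v2": "Worship none but the Creator.",
    "v3": "Their friendship was sincere.",
}


def substring_search(term, verses):
    """Return ids of verses that contain `term` anywhere, even inside a longer word."""
    needle = term.lower()
    return [vid for vid, text in verses.items() if needle in text.lower()]


def whole_word_search(term, verses):
    """Return ids of verses in which `term` occurs as a separate word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [vid for vid, text in verses.items() if pattern.search(text)]


if __name__ == "__main__":
    print(substring_search("ship", VERSES))   # partial matching
    print(whole_word_search("ship", VERSES))  # exact-word matching
```

Neither variant addresses concepts: both still depend on the
literal query word appearing in the translation.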

These tools are useful for searching keywords across different
translations but are not good enough for searching a concept.
For instance, if the word "paradise" is searched, these tools
return only those verses that contain the word "paradise",
although the Quran uses the words "heaven" and "garden" in the
same sense. Some research has recently been done in this regard,
but concept-based searching techniques have not yet been fully
explored. A new approach to XML semantics, in which a
specification language is used to specify semantic rules, is
presented in [9-10]. The aim of these papers is to apply the XML
semantics approach to verify the reliability of holy books
published in XML format. The work in [10] demonstrates the value
of the XML semantics checker approach for examining the semantic
consistency of the Holy Quran. The checker model successfully
verified that the number of verses in each chapter was correctly
written in the Quran XML document. It also verified that the XML
document contained exactly the same number of chapters as the
Holy Quran itself.
The work presented in [11-12] uses keyword extraction and key
phrase candidates, based on an algorithm, to develop an ontology
for Islamic literature. These studies present a skeletal
methodology for building such ontologies. A computational model
for representing Arabic lexicons using ontologies is presented
in [13]. It is based on the field theory of semantics from the
linguistics domain, and the data that drives the design of the
model was chosen to present the superiority and perfection of
the Arabic language. That paper presents the design and
implementation of the proposed ontological model, along with
some results of its application to the vocabulary of the Holy
Quran. Another model, presented in [14], exploits WordNet
relationships in a relational database model. It was implemented
using Surah Al-Baqarah, the largest chapter of the Holy Quran.
The precision of the prototype implementation is claimed to be
far better than that of simple keyword searching.
A notable semantic-based work is presented in [15]. In that
study, queries are improved in order to retrieve more relevant
documents across language boundaries, using a mechanism for
query translation with semantics that is applied as a semantic
query. The study investigates the semantic approach against the
queries and vice versa. It also investigates the performance of
query-based search in terms of total retrieved and relevant
retrieved documents. The experimental results suggest that the
semantic approach is the most important process in
cross-language information retrieval. They also suggest that the
semantic approach contributes to better performance in
retrieving more relevant and related Quran documents. Another
ontology-based method for searching the Holy Quran is presented
in [16]; it exploits NLP patterns that help reduce the effort
of the knowledge acquisition process. Some limitations of that
work are also mentioned. First, not all competency questions
can be answered using the Quran alone, because the Quran was
revealed in general terms and the details of subjects such as
Salah are described and elaborated in the Hadith. Second, some
verses, especially the mutashabihat, need further elaboration
or discussion by Quranic experts.


METHODOLOGY
This research proposes a theoretical framework architecture for
a Concept based Evolutionary Quranic Search Engine (CEQSE) that
takes user queries as input and searches concepts in the Quran
accordingly. A characteristic of this framework is that search
time and accuracy are not fixed but improve with use. Initially
the search engine may take longer and may return some
irrelevant verses, compared with its searches after it has
gained experience. The framework consists of eight modules, as
shown in figure 1.
Quran Document:
This module acts as an input interface and is used to take a
Quran document as input from the user. It holds all the verses
of the Quran. Although this is a one-time task, it also gives
the user the opportunity to add as many books to the searchable
text as desired. The module passes the text to the next module
for further processing.
Ontology Extractor:
The purpose of this module is to extract ontological knowledge
from factual knowledge. The module takes the XML file and,
following the way a human brain is thought to store semantic
information, performs tagging at the sentence level. It divides
each sentence into three tags: Subject, Object and Predicate.
This ontological knowledge is stored in the Ontological
Knowledgebase repository so that later modules can use it as
conceptual knowledge. The module is also responsible for
providing ontology to the Query Engine according to the user
query.
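One way to picture the Subject, Predicate and Object tags and
the Ontological Knowledgebase is as a store of triples indexed
by concept. The following Python sketch is only an illustration
under that assumption; the class names, the `verse_ref` field
and the hand-written example triples are hypothetical, and the
paper does not specify how the tagging itself is automated.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One tagged sentence: subject, predicate, object, plus its source."""
    subject: str
    predicate: str
    obj: str
    verse_ref: str  # hypothetical identifier of the source sentence


class OntologyKnowledgebase:
    """In-memory stand-in for the Ontological Knowledgebase repository."""

    def __init__(self):
        self._by_concept = defaultdict(set)

    def add(self, triple):
        # Index the triple under both its subject and its object.
        for concept in (triple.subject, triple.obj):
            self._by_concept[concept.lower()].add(triple)

    def lookup(self, concept):
        """Return all triples that mention `concept`, in a stable order."""
        found = self._by_concept.get(concept.lower(), set())
        return sorted(
            found,
            key=lambda t: (t.verse_ref, t.subject, t.predicate, t.obj),
        )


if __name__ == "__main__":
    kb = OntologyKnowledgebase()
    # Hand-tagged examples; an implementation would derive these from the XML file.
    kb.add(Triple("Allah", "warned", "Human", "example-1"))
    kb.add(Triple("Unbeliever", "shall say", "I were dust", "example-2"))
    for triple in kb.lookup("human"):
        print(triple)
```

Indexing by both subject and object lets a query for either
side of a statement reach the same sentence.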
Query Engine:
The Query Engine works as a controller: it receives user
queries and passes them to the subsystems for processing. It
gets the query from the user application and passes it to the
POS Tagger. It is also responsible for retrieving ontological
knowledge from the Ontology Extractor to answer the user's
query. After the concepts have been validated by the Concept
Validation Module, it sends them to the Ontology Extractor for
refinement of the ontological knowledge.
POS Tagger:
The POS Tagger performs part-of-speech tagging and tokenization
of words. It labels each word of a sentence with its suitable
tag, such as verb, adverb or noun.
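As a rough illustration of what the POS Tagger produces, the
following Python sketch tags a query with a tiny hand-written
lexicon. The lexicon, the tag names and the default tag are
assumptions made for this example; a real tagger would be
trained on an annotated corpus and would use context rather
than a lookup table.

```python
import re

# Hypothetical mini-lexicon used only for this illustration.
LEXICON = {
    "who": "PRON",
    "created": "VERB",
    "the": "DET",
    "earth": "NOUN",
    "quickly": "ADV",
}


def pos_tag(sentence, lexicon=LEXICON, default="NOUN"):
    """Tokenize `sentence` and label each token with a part-of-speech tag."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    # Unknown words fall back to `default`, a common baseline choice.
    return [(token, lexicon.get(token, default)) for token in tokens]


if __name__ == "__main__":
    print(pos_tag("Who created the earth?"))
```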
XML Generator:
This module takes the Quran document and converts it into an
XML file for the CEQSE framework. XML is widely used for
transmitting data between all types of applications because it
is well suited to storing and describing information. The XML
file is then passed to the Ontology Extractor module.
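A minimal sketch of such a conversion, using Python's standard
xml.etree.ElementTree module, is given below. The element and
attribute names (`quran`, `chapter`, `verse`, `number`) and the
input format are assumptions chosen for illustration; the paper
does not define the XML schema used by CEQSE.

```python
import xml.etree.ElementTree as ET


def verses_to_xml(verses):
    """Convert (chapter_no, verse_no, text) tuples into an XML string.

    The schema (quran > chapter > verse) is a hypothetical choice.
    """
    root = ET.Element("quran")
    chapters = {}
    for chapter_no, verse_no, text in verses:
        chapter = chapters.get(chapter_no)
        if chapter is None:
            # Create each chapter element once, on first use.
            chapter = ET.SubElement(root, "chapter", number=str(chapter_no))
            chapters[chapter_no] = chapter
        verse = ET.SubElement(chapter, "verse", number=str(verse_no))
        verse.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder strings stand in for the actual verse text.
    sample = [(1, 1, "first verse text"), (1, 2, "second verse text")]
    print(verses_to_xml(sample))
```

Keeping chapter and verse numbers as attributes makes the kind
of consistency check described in [10], such as counting verses
per chapter, straightforward.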
Fig. 1: CEQSE Framework

Morphological Analysis:
The first task of this module is to filter out those words that
occur very frequently, as they have very little discriminating
power for retrieving the relevant concept from the Ontological
Knowledgebase. Because a document or query contains many
morphological variations, the module then extracts the
morphemes that make up each word. As a result, the module
reduces words to their stem or root form. The purpose of this
practice is that words sharing a stem generally have the same
meaning, so making morphological variants uniform improves the
retrieval of conceptual knowledge. The Morphological
Knowledgebase holds this morphological knowledge.
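The two tasks of the Morphological Analysis module, removing
very frequent words and reducing the remaining words to a common
stem, can be sketched in Python as follows. The stop-word list
and the suffix-stripping rule are deliberately naive assumptions
for English translations; they are not the paper's method, and
an implementation for Arabic text would need a proper
morphological analyzer.

```python
import re

# Hypothetical stop-word list and suffix list for English text.
STOP_WORDS = frozenset({"the", "of", "in", "is", "a", "and", "to", "what"})
SUFFIXES = ("ing", "ers", "er", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(query):
    """Drop stop words from `query` and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", query.lower())
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    print(normalize("The rewards of the believers"))
```

With this rule "believer" and "believers" reduce to the same
stem, so a query using either form reaches the same stored
concept.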
Synonyms Identifier:
To generate effective responses to the user's query, this
module identifies the synonyms of the query words. The purpose
of this identification is to reach every possible concept in
the document that relates to the user's query.
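Synonym identification can be viewed as query expansion over
groups of equivalent terms. The Python sketch below uses the
equivalences mentioned earlier in this paper (Paradise, Heaven
and Garden; Muhammad and Ahmad); the data structure and function
names are assumptions for illustration, and a real system might
draw its groups from a resource such as WordNet, as in [14].

```python
# Synonym groups taken from the examples given in this paper.
SYNONYM_GROUPS = [
    frozenset({"paradise", "heaven", "garden"}),
    frozenset({"muhammad", "ahmad"}),
]


def build_index(groups):
    """Map every term to the synonym group that contains it."""
    index = {}
    for group in groups:
        for term in group:
            index[term] = group
    return index


def expand_query(terms, index):
    """Return the query terms together with their synonyms, without duplicates."""
    expanded = []
    for term in terms:
        key = term.lower()
        for candidate in sorted(index.get(key, {key})):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


if __name__ == "__main__":
    index = build_index(SYNONYM_GROUPS)
    print(expand_query(["Paradise", "sea"], index))
```

Expanding "paradise" in this way lets verses that use "garden"
or "heaven" be considered, which keyword matching alone misses.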
Concept Validation Module:
The Query Engine retrieves all concepts that match the user
query and displays them in the user application. At this point
the concepts are validated. If the user selects a relevant
verse, the weights linking that ontology to the searched query
are updated; otherwise the existing weights are retained in the
ontological database. As a result of this validation by the
user, the selected concepts are transferred to the Ontological
Knowledgebase for refinement, which improves the effectiveness
of the results for that query.

Table 1: Comparison of different Quranic Search Engines

Concept          Search Truth  Guided Ways  Islamicity  Corpus Quran  Al-Islam  Actual Result
Allah            0             1            11          0             1         42
Paradise         0             0            15          0             0         14
Garden           3             0            5           0             0         14
Hell             1             1            2           1             1         2
Sea              2             1            2           1             0         4
Punishment       0             0            0           0             0         0
Jinn             6             3            5           5             6         6
Man              6             6            4           5             0         36
Water            1             1            3           1             3         0
Earth            4             4            1           3             4         3
Believer         0             0            2           0             0         0
Sinner/Criminal  0             0            0           0             0         3

ALGORITHM

1. Take Quran document
2. Conversion of Quran document into XML format
3. Pass XML file to ontology extractor
   i.   For each sentence of document
   ii.  Perform tagging of each sentence in subject, object and
        predicate form
   iii. Store Ontological Knowledge into Ontological
        Knowledgebase
4. Take query of user at runtime
   i.   Perform Part of Speech tagging in POS Tagger
   ii.  Perform Morphological Analysis and find frequent items
        in the query
   iii. Perform Synonyms Identification
   iv.  Retrieve Ontological Knowledge from ontology extractor
5. Validate concept knowledge on Application layer
   i.   Transfer validated knowledge to Ontology Extractor for
        knowledge refinement
   ii.  Change retrieval policies according to validated
        concepts
6. Goto step 4
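Steps 5 and 6 of the algorithm form a feedback loop: verses
that users select for a concept should rank higher the next
time that concept is searched. The paper does not give an
update rule, so the Python sketch below assumes a simple one in
which each selection moves the weight of a (concept, verse)
pair a fixed fraction of the way towards 1. The class name,
the learning rate and the verse identifiers are all
hypothetical.

```python
from collections import defaultdict


class ConceptWeights:
    """Keeps one weight per (concept, verse) pair and ranks verses by it."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self._weights = defaultdict(float)  # missing pairs start at 0.0

    def weight(self, concept, verse_ref):
        return self._weights.get((concept, verse_ref), 0.0)

    def record_selection(self, concept, verse_ref):
        """Step 5: the user validated `verse_ref` as relevant to `concept`."""
        key = (concept, verse_ref)
        # Assumed rule: move the weight towards 1 by a fixed fraction.
        self._weights[key] += self.learning_rate * (1.0 - self._weights[key])

    def rank(self, concept, candidate_refs):
        """Step 4 on the next query: order candidates by learned weight."""
        return sorted(
            candidate_refs,
            key=lambda ref: self.weight(concept, ref),
            reverse=True,
        )


if __name__ == "__main__":
    weights = ConceptWeights()
    candidates = ["v1", "v2", "v3"]
    print(weights.rank("paradise", candidates))  # no feedback yet
    weights.record_selection("paradise", "v3")   # user picks v3
    print(weights.rank("paradise", candidates))  # v3 now ranks first
```

Verses that are never selected keep their existing weight,
which matches the behaviour described for the Concept
Validation Module.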

DISCUSSION
The idea behind the proposed model and algorithm is to provide
concept-based Quranic searching. To justify the proposed model
and algorithm, three parameters have been selected: efficiency,
accuracy and unbiased searching of the Quranic text. Many
Quranic software packages and databases are currently available
that perform good searching. Yet these packages only find the
most significant words in the query and then retrieve from the
database those verses that contain such keywords [4-8, 11-12],
whether or not they are required. Critical analysis of these
tools shows that keyword-based searching cannot answer a user's
query properly, because many verses in the Quran do not contain
a given word explicitly yet carry many hidden concepts. For
instance, Sura e Rehman contains many different concepts, and
it is clear from Table 1 that different searching tools give
different results for any particular word.
Other than accuracy, another critical issue is that this cycle
is repeated every time a concept is searched, which leads to
slow and seemingly wrong results. In contrast, the proposed
model extracts concepts from the verse and tags each pronoun
with the particular noun it represents.
There is software [16] that performs ontology-based Quranic
search using NLP for knowledge acquisition, but its limitation
is that it consults the Hadith to answer questions about the
many concepts that are not clearly revealed in the Quran. This
raises a problem: many Hadith books exist and every writer
provides his own interpretation of different Hadith, so the
concepts extracted for a user query may confuse the user by
offering different interpretations for a single search. There
is also software with predefined concepts [14, 15] that answers
every user search in a fixed manner. For example, if it holds
the concept "ALLAH is one", then whenever the user searches for
ALLAH it returns the same verse every time, assuming that all
concepts of the Quran have already been listed. The Quran,
however, is regarded among Muslims as a living book, and
scholars are still doing a great deal of research to understand
Quranic concepts, so there is a clear need to update the
concepts and to bind new concepts to relevant verses. CEQSE
provides a theoretical framework with the ability to refine its
search by evolving and improving its concepts through user
validation. It is therefore able to provide efficient searching
with a better understanding of Quranic verses, delivering a
deep understanding of meanings.
CONCLUSION
This paper proposes a theoretical framework that provides a
comprehensive basis for implementing a semantic,
concept-extraction-based engine for Quranic search. Effective
search results give the user a better understanding of the
underlying concepts of Quranic verses. In future, an
application for Quranic search will be developed using this
framework, together with an extensive knowledgebase holding
vast textual information.
REFERENCES
[1] A. Rippin, The Blackwell Companion to the Quran, p. 18
(2006).
[2] M. Mutahhari, Understanding the Uniqueness of the Quran,
Vol I No. (1984).
[3] E. Hyvönen, S. Saarela, and K. Viljanen, Ontogator:
Combining View- and Ontology-Based Search with Semantic
Browsing. In: Proceedings of XML Finland 2003, Open Standards,
XML, and the Public Sector, Kuopio, October 2003.
[4] A. Shawar and Atwell, An Arabic chatbot giving answers from
the Qur'an. In: Bel (ed.), Proceedings of TALN04: XI
Conference, Vol. 2 (2004).
[5] Search Truth, an online Quran and Hadith search web portal,
retrieved on January 1, 2012, http://www.searchtruth.com/
[6] Guided Ways Technologies, an online Quran and Hadith search
web portal, retrieved on January 1, 2012,
http://www.guidedways.com/index.php
[7] IslamiCity, an online Quran search web portal with
translation in different languages, retrieved on January 2,
2012, http://www.islamicity.com/mosque/quran/
[8] University of Southern California, Centre for Muslim-Jewish
Engagement, Quran search database on web portal, retrieved on
January 2, 2012.
[9] Y. Kotb, K. Gondow, and T. Katayama, The SLXS Specification
Language for Describing Consistency of XML Documents, Proc. of
the Fourth Workshop on Information and Computer Science
(WICS2002), IEEE Comp. Soc., El-Damam, Saudi Arabia,
pp. 289-304, March 2002.
[10] Y. Kotb, K. Gondow, and T. Katayama, The XML Semantics
Checker Model, Proc. of the Third International Conference on
Parallel and Distributed Computing, Applications and
Technologies (PDCAT02), Kanazawa, Japan, pp. 430-438, September
2002.
[11] S. Saad and N. Salim, Build Islamic Ontology based on
Ontology Learning, Postgraduate Annual Research Seminar 2007
(3-4 July 2007).
[12] S. Saad, N. Salim and N. Omar, Keyphrase Extraction for
Islamic Knowledge Ontology, IT symposium, Vol. 2, pp. 1-6
(26-28 Aug 2008).
[13] M. Yahya, H. Khalifa, A. Bahanshal, I. Odah, N. Helwah, An
ontological model for representing semantic lexicons: an
application on time nouns in the Holy Quran, The Arabian
Journal for Science and Engineering, Volume 35, Number 2C,
December 2010.
[14] M. Shoaib, M. Nadeem, H. Ullah, M. Imran, M. Sikandar,
Relational WordNet Model for Semantic Search in Holy Quran,
International Conference on Emerging Technologies, IEEE, 2009.
[15] A. Yunus, R. Zainuddin and N. Abdullah, Semantic Query for
Quran Documents Results, IEEE Conference on Open Systems (ICOS
2010), December 5-7, 2010, Kuala Lumpur, Malaysia.
[16] S. Saad, N. Salim, S. Zainuddin, An Early Stage of
Knowledge Acquisition Based on Quranic Text, International
Conference on Semantic Technology and Information Retrieval,
28-29 June 2011, Putrajaya, Malaysia.
