Presentation On Finding Relation Among Medical Terms

Finding Semantic Relationships Among Associated Medical Terms
Submitted By:
Manisha Singh(111497)
Sneha Bairagi(111717)
Abhinav Rai (511004)
Introduction
With the continuous digitisation of medical knowledge, information extraction
tools are becoming more and more important for practitioners in the medical
domain. In this project we tackle the extraction of semantic relationships
from medical texts.
We focus on extracting Disease-Medicine co-occurrence relationships from the
text of the literature. A large-scale and accurate list of drug-disease
treatment pairs derived from published biomedical literature can be used for
drug repurposing.
PROPOSED SYSTEM
Information extraction is the identification of specific
information in unstructured data sources, such as natural
language text.
The proposed system performs two tasks.
The first task identifies and extracts informative sentences on
disease and treatment topics.
The second task performs a finer-grained classification of
these sentences according to the semantic relations that exist
between diseases and treatments.
Implementation
Steps involved are:

Obtain documents containing medical data from the web.
Perform tokenization.
Perform stemming.
Perform POS tagging.
Perform annotation.
Find disease-treatment pairs using pattern matching.
Tokenization
Tokenization is the process of breaking up the given text into
units called tokens. A token may be a word, a number, or a
punctuation mark.
Tokenization does this by locating word boundaries: the points where
one word ends and the next word begins. Tokenization is also known as
word segmentation.
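As a minimal sketch of the idea, the snippet below splits text into word, number, and punctuation tokens with a regular expression. The pattern and the function name `tokenize` are illustrative assumptions, not the tokenizer used in the project.

```python
import re

# Hypothetical pattern: a run of word characters (words or numbers),
# or a single non-space punctuation character.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into word, number, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("Aspirin, 75 mg, is used in the treatment of fever."))
# ['Aspirin', ',', '75', 'mg', ',', 'is', 'used', 'in', 'the',
#  'treatment', 'of', 'fever', '.']
```

A regular expression of this kind treats every punctuation mark as its own token, which is often too coarse for medical text (for example, hyphenated drug names), so a real system would likely need extra rules.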
Stemming
Stemming is the term used in linguistic morphology
and information retrieval for the process of
reducing inflected words to their word stem, base,
or root form (generally a written word form).
Existing stemming algorithms include:
Truncate(n), Lovins Stemmer, Dawson Stemmer,
Porter's Stemmer.
We are using Porter's stemmer.
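Porter's stemmer applies a long series of ordered suffix rules, so it is not reproduced here. To illustrate what stemming does, the sketch below implements the much simpler Truncate(n) approach from the list above, which keeps only the first n letters of each word. The function name `truncate_stem` and the choice n = 5 are assumptions made for this example.

```python
def truncate_stem(word, n=5):
    """Truncate(n) stemming: keep the first n letters.

    Words of n letters or fewer are returned unchanged.
    """
    return word[:n] if len(word) > n else word


words = ["treatment", "treating", "treated", "cure"]
print([truncate_stem(w.lower()) for w in words])
# ['treat', 'treat', 'treat', 'cure']
```

Truncation maps the three inflected forms to the same stem here, but it can also merge unrelated words that share a prefix, which is one reason a rule-based stemmer such as Porter's is usually preferred.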
POS tagging
Part-of-speech tagging (POS tagging or POST), also
called grammatical tagging or word-category
disambiguation, is the process of marking up a word in a
text (corpus) as corresponding to a particular part of
speech, based on both its definition and its context,
i.e. its relationship with adjacent and related words in a
phrase, sentence, or paragraph.
In short, it is the process of assigning a part of speech to each word in
a sentence.
VITERBI ALGORITHM
Given
a) Start state: s1
b) Alphabet: A = {a1, a2, ..., an}
c) Set of states: S = {s1, s2, ..., sn}
d) Transition probability: P(sj --a--> si), the probability of moving from sj to si while emitting symbol a.
Data structure
1. An N*T array called SEQSCORE that always holds the score of the winning sequence (N = number of states, T = length of the output sequence).
2. Another N*T array called BACKPTR to recover the path.
Steps
1. Initialization
SEQSCORE(1,1) = 1.0
BACKPTR(1,1) = 0
for (i = 2 to N) do
    SEQSCORE(i,1) = 0.0
(expressing the fact that the first state is s1)
2. Iteration
for (t = 2 to T) do
    for (i = 1 to N) do
        SEQSCORE(i,t) = max(j=1,N) [SEQSCORE(j,t-1) * P(sj --a--> si)]
        BACKPTR(i,t) = index j that gives the max above
(a is the output symbol consumed between steps t-1 and t)
3. Sequence identification
C(T) = i that maximizes SEQSCORE(i,T)
for i from (T-1) to 1 do
    C(i) = BACKPTR[C(i+1),(i+1)]
Example
[Figure: state diagram with two states, s1 and s2. Edges carry [symbol, probability] labels: [a1, 0.1], [a1, 0.2], [a1, 0.3] (twice), [a2, 0.2] (twice), [a2, 0.3], [a2, 0.4].]
Tabular representation
        ε      a1     a2     a1      a2
S1      1.0    0.1    0.09   0.012   0.0081
S2      0.0    0.3    0.06   0.027   0.0054
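The sketch below runs the Viterbi steps on the two-state example for the output sequence a1 a2 a1 a2. The slides do not preserve which label belongs to which edge, so the assignment in `TRANS` is an assumption: it is the one that reproduces the winning scores in the tabular representation above. The names `TRANS`, `STATES`, and `viterbi` are hypothetical.

```python
# P(next_state, symbol | current_state), keyed by (current, next).
# ASSUMPTION: edge-to-label assignment inferred from the table values.
TRANS = {
    ("s1", "s1"): {"a1": 0.1, "a2": 0.2},
    ("s1", "s2"): {"a1": 0.3, "a2": 0.4},
    ("s2", "s1"): {"a1": 0.2, "a2": 0.3},
    ("s2", "s2"): {"a1": 0.3, "a2": 0.2},
}
STATES = ["s1", "s2"]


def viterbi(output, states=STATES, trans=TRANS, start="s1"):
    """Return (best state path, SEQSCORE columns) for an output sequence."""
    # Initialization: all probability mass sits on the start state.
    seqscore = [{s: (1.0 if s == start else 0.0) for s in states}]
    backptr = [{s: None for s in states}]
    # Iteration: one new column per output symbol.
    for symbol in output:
        prev = seqscore[-1]
        scores, pointers = {}, {}
        for cur in states:
            # Predecessor j that maximizes SEQSCORE(j, t-1) * P(j --symbol--> cur).
            best = max(states, key=lambda j: prev[j] * trans[(j, cur)][symbol])
            scores[cur] = prev[best] * trans[(best, cur)][symbol]
            pointers[cur] = best
        seqscore.append(scores)
        backptr.append(pointers)
    # Sequence identification: start at the best final state and walk back.
    path = [max(states, key=lambda s: seqscore[-1][s])]
    for t in range(len(output), 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return path, seqscore


path, seqscore = viterbi(["a1", "a2", "a1", "a2"])
for state in STATES:
    print(state, [round(column[state], 4) for column in seqscore])
print(path)
```

Under this assumed edge assignment, the printed rows match the S1 and S2 rows of the table, and the recovered state path is s1, s2, s1, s2, s1.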
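Probability table
[Table: rows S1 and S2; columns ε, a1, a2, a1, a2. Cell values are missing from the source.]
BACKPTR Table
[Table contents are missing from the source.]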
Annotating Corpora and Searching patterns

Sentences are tagged with disease entities from the clean
disease lexicon and drug entities from the drug list.

Patterns such as the following are searched for between the disease and the drug:
- in,
- in the treatment of,
- for,
- in patients with,
- for the treatment of,
- treatment of,
- therapy for,
- therapy in, etc.
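The pattern search can be sketched with a regular expression built from the list above. This example assumes the drug comes first, followed by the pattern and then the disease. The function name `find_pairs`, the helper `_alternation`, and the sample drug and disease names are hypothetical.

```python
import re

# Patterns from the list above, longest first so that "in the treatment of"
# is tried before the bare "in".
PATTERNS = [
    "for the treatment of", "in the treatment of", "in patients with",
    "treatment of", "therapy for", "therapy in", "for", "in",
]


def _alternation(terms):
    """Build a regex alternation that matches any of the given literal terms."""
    return "|".join(re.escape(term) for term in terms)


def find_pairs(sentence, drugs, diseases):
    """Return (drug, pattern, disease) triples found in one sentence.

    ASSUMPTION: the drug precedes the pattern and the disease follows it.
    """
    regex = re.compile(
        rf"\b({_alternation(drugs)})\b\s+({_alternation(PATTERNS)})"
        rf"\s+\b({_alternation(diseases)})\b",
        re.IGNORECASE,
    )
    return [match.groups() for match in regex.finditer(sentence)]


print(find_pairs(
    "Metformin for the treatment of diabetes was well tolerated.",
    drugs=["metformin"],
    diseases=["diabetes"],
))
# [('Metformin', 'for the treatment of', 'diabetes')]
```

Matching on the raw sentence keeps the example short; in the pipeline described here the search would run over annotated sentences, where drug and disease mentions have already been tagged.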
Algorithm
Input: Disease, Rules.
Output: Medicine, Semantic Relationship.
1. For any disease, extract papers from Medline.
2. Tokenize the documents.
3. Remove all stopwords.
4. Perform stemming.
5. Perform POS tagging to separate out the required parts of speech.
6. Convert this corpus into an annotated corpus.
7. From the annotated sentences, extract those having at least one medicine and one disease.
8. Search for patterns between the disease and the medicine.
9. Associate medicines with the disease and rank them based on frequency and superiority.
10. Present the semantic relationships to the user.
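Steps 7 and 9 can be sketched as a filter followed by a frequency count. The example assumes each annotated sentence is a dictionary with `medicines` and `diseases` lists, which is a hypothetical representation. It ranks by frequency only, since the slides do not define the "superiority" criterion.

```python
from collections import Counter


def rank_medicines(annotated_sentences, disease):
    """Rank medicines by the number of sentences mentioning them with the disease."""
    counts = Counter()
    for sentence in annotated_sentences:
        # Step 7: keep sentences with the disease and at least one medicine.
        if disease in sentence["diseases"] and sentence["medicines"]:
            # A set avoids counting a medicine twice within one sentence.
            counts.update(set(sentence["medicines"]))
    # Step 9 (frequency only): most frequent first, ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical annotated sentences for illustration.
sentences = [
    {"medicines": ["metformin"], "diseases": ["diabetes"]},
    {"medicines": ["insulin", "metformin"], "diseases": ["diabetes"]},
    {"medicines": ["aspirin"], "diseases": ["fever"]},
    {"medicines": [], "diseases": ["diabetes"]},
]
print(rank_medicines(sentences, "diabetes"))
# [('metformin', 2), ('insulin', 1)]
```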
HARDWARE REQUIREMENTS
PROCESSOR : PENTIUM IV
RAM : 256 MB
HARD DISK : 40 GB
SOFTWARE REQUIREMENTS
FRONT END : JAVA SWING
OPERATING SYSTEM : WINDOWS XP/7
TOOL : ECLIPSE
Deliverables
Rapid access to information regarding potential immunizations.
Medicines ranked on the basis of their frequency.
Can be used in medicine repurposing.
Can inform doctors about new drugs available for a
disease by processing biomedical literature and clinical trial studies.
Extension Possibility
It can be extended to extract information regarding the cure, symptoms, and
prevention of a disease.
It can help in finding the root cause of a disease and, by taking the
patient's history or condition into account, in recommending a suitable dose.
This would be based on examining the composition of a medicine and checking it
against the patient's report to determine whether it suits the patient.
Thank you
