Introduction
As medical knowledge is increasingly digitised, information extraction
tools are becoming more and more important for medical practitioners.
In this project we tackle the extraction of semantic relationships from
medical texts.
We focus on extracting disease-medicine co-occurrence relationships from
the text of the biomedical literature. A large-scale and accurate list of
drug-disease treatment pairs derived from published biomedical literature
can be used for drug repurposing.
PROPOSED SYSTEM
Information extraction is the identification of specific
Implementation
Tokenization
Stemming
Stemming is the term used in linguistic morphology
and information retrieval for the process of
reducing inflected words to their word stem, base
or root form, generally a written word form.
Existing stemming algorithms are :
stemmer,
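The idea of reducing inflected words to a common stem can be illustrated with a very small suffix-stripping sketch. The suffix list and the name `simple_stem` are assumptions made for this example only; this is not one of the established stemming algorithms, and it will mis-stem many real words.

```python
# Hypothetical suffix list, ordered longest first so that
# "ations" is tried before "s".
SUFFIXES = ("ations", "ation", "ings", "ing", "ed", "es", "s")


def simple_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the first matching suffix, keeping at least min_stem_len letters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= min_stem_len:
            return lowered[: -len(suffix)]
    # No suffix matched (or the stem would be too short): return unchanged.
    return lowered


if __name__ == "__main__":
    for w in ["treated", "treating", "treats", "drug"]:
        print(w, "->", simple_stem(w))
```

With this sketch, "treated", "treating" and "treats" all reduce to "treat", so the later frequency counts treat them as the same word.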
POS tagging
Part-of-speech tagging (POS tagging or POST), also
VITERBI ALGORITHM
Given
a) Start state: s1
b) Alphabet: A = {a1, a2, ..., an}
c) Set of states: S = {s1, s2, ..., sn}
d) Transition probabilities.
Data structure
Steps
1. Initialization
SEQSCORE(1,1) = 1.0
BACKPTR(1,1) = 0
for (i = 2 to N) do
    SEQSCORE(i,1) = 0.0
(expressing the fact that the first state is S1)
2. Iteration
for (t = 2 to T) do
    for (i = 1 to N) do
        SEQSCORE(i,t) = max(j=1,N) [SEQSCORE(j,t-1) * P(Sj -> Si, a)],
            where a is the symbol observed in moving from step t-1 to step t
        BACKPTR(i,t) = index j that gives the max above
3. Sequence identification
C(T) = i that maximizes SEQSCORE(i,T)
for i from (T-1) to 1 do
    C(i) = BACKPTR[C(i+1),(i+1)]
Example
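[Figure: transition diagram over the two states s1 and s2. Each arc carries a label [symbol, probability]; the labels shown are [a1,0.1], [a1,0.2], [a1,0.3], [a1,0.3], [a2,0.2], [a2,0.2], [a2,0.3] and [a2,0.4].]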
Tabular representation
        (start)   A1      A2      A1       A2
S1      1.0       0.1     0.09    0.012    0.0081
S2      0.0       0.3     0.06    0.027    0.0054
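The scores in the tabular representation can be reproduced with a short Viterbi sketch. The assignment of the [symbol, probability] labels to individual arcs in `P` is an assumption: it was chosen because it yields the scores in the table, and the diagram as preserved here does not show it unambiguously. The function and variable names are likewise illustrative.

```python
# Transition probabilities P[(from_state, to_state)][symbol].
# ASSUMPTION: this label-to-arc assignment is inferred so that the scores
# match the tabular representation; it is not confirmed by the source figure.
P = {
    ("s1", "s1"): {"a1": 0.1, "a2": 0.2},
    ("s1", "s2"): {"a1": 0.3, "a2": 0.4},
    ("s2", "s1"): {"a1": 0.2, "a2": 0.3},
    ("s2", "s2"): {"a1": 0.3, "a2": 0.2},
}
STATES = ["s1", "s2"]


def viterbi(observations, states=STATES, trans=P, start="s1"):
    """Return (list of score columns, best state sequence)."""
    # Column 0: all probability mass sits on the start state.
    seqscore = [{s: (1.0 if s == start else 0.0) for s in states}]
    backptr = [{s: None for s in states}]
    for symbol in observations:
        prev = seqscore[-1]
        col, ptr = {}, {}
        for i in states:
            # Best predecessor j for state i when `symbol` is read on arc j -> i.
            best_j = max(states, key=lambda j: prev[j] * trans[(j, i)][symbol])
            col[i] = prev[best_j] * trans[(best_j, i)][symbol]
            ptr[i] = best_j
        seqscore.append(col)
        backptr.append(ptr)
    # Trace back from the highest-scoring final state.
    path = [max(states, key=lambda s: seqscore[-1][s])]
    for t in range(len(observations), 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return seqscore, path


if __name__ == "__main__":
    scores, path = viterbi(["a1", "a2", "a1", "a2"])
    for s in STATES:
        print(s, [round(col[s], 4) for col in scores])
    print("best path:", path)
```

Each column of `scores` corresponds to one column of the table, and `path` is the state sequence recovered by following the back pointers from the best final state.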
Probability table
[Table: rows S1 and S2; columns E, A1, A2, A1, A2. No cell values are given.]
BACKPTR Table
Algorithm
Input: Disease, Rules.
Output: Medicine, Semantic Relationship.
1. For each disease, extract the relevant papers from Medline.
2. Tokenize the document.
3. Remove all stopwords.
4. Perform stemming.
5. Perform POS tagging to separate out the required parts of speech.
6. Convert this corpus into an annotated corpus.
7. From the annotated sentences, extract those containing at least one
medicine and one disease.
8. Search for patterns between the disease and the medicine.
9. Associate the medicines with the disease and rank them by frequency and
superiority.
10. Present the semantic relationships to the user.
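A minimal sketch of steps 2, 3, 7 and 9 is given here: tokenization, stopword removal, sentence-level co-occurrence filtering and frequency ranking. The stopword list, the two lexicons and all function names are hypothetical and exist only for this illustration. Stemming, POS tagging, pattern search and the "superiority" criterion are not modelled.

```python
import re
from collections import Counter

# Hypothetical stopword list and mini-lexicons, for illustration only.
STOPWORDS = {"the", "is", "a", "an", "of", "for", "in", "with", "and", "to", "was"}
MEDICINES = {"chloroquine", "artemisinin", "quinine"}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def rank_medicines(sentences, disease):
    """Count medicines that co-occur with `disease` in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = set(remove_stopwords(tokenize(sentence)))
        # Keep only sentences that mention the disease and at least one medicine.
        if disease in tokens:
            counts.update(tokens & MEDICINES)
    # most_common() returns (medicine, frequency) pairs, highest frequency first.
    return counts.most_common()


if __name__ == "__main__":
    sentences = [
        "Chloroquine is used for the treatment of malaria.",
        "Malaria was treated with artemisinin and chloroquine.",
        "Quinine has a bitter taste.",
    ]
    print(rank_medicines(sentences, "malaria"))
```

In this toy input the third sentence is ignored because it does not mention the disease, so only medicines that actually co-occur with it are ranked.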
HARDWARE REQUIREMENTS
PROCESSOR : PENTIUM IV
RAM : 256 MB
HARD DISK : 40 GB
SOFTWARE REQUIREMENTS
FRONT END : JAVA SWING
OPERATING SYSTEM : WINDOWS XP/7
TOOL : ECLIPSE
Deliverables
Rapid access to information regarding potential immunizations.
Medicines ranked on the basis of their frequency.
Can be used in medicine repurposing.
Can provide knowledge to doctors about new drugs available for
Extension Possibility
It can be extended to extract information regarding the cure, symptoms and
prevention of a disease.
It can help in finding the root cause of a disease and, using the patient's
history or condition, in recommending a suitable dose. This would be based
on examining the composition of a medicine and checking it against the
patient's report to determine whether it suits the patient.
References
Rong Xu and QuanQiu Wang, Large-scale extraction of accurate drug-
Thank you