Memorial of the Health Informatics Course Taught
by Prof. Dr. Mauro Oliveira

Gerson Vieira Albuquerque Neto 1
1 Introduction
This document aims to demonstrate the contribution of the Health Informatics course,
taught by Professor Dr. Mauro Oliveira in semester 2017.2 of the Academic Master's in
Computer Science at IFCE, Fortaleza Campus.
The teaching methodology used in the course was the "Aula às Avessas" ("Inside-Out Class"),
in which the student is encouraged to speak in class as much as the professor. In addition,
weekly activities lead the student to study the subject through alternative learning sources,
which are explained later.
Thus, as the final assignment of the course, this memorial was written to keep a record of
the scientific contribution to IFCE and to the community. Through extra-class activities,
learning in a parallel course on a distance-learning (EAD) platform, participation in academic
studies that produced scientific papers, and the submission of a paper to an international
event and its eventual acceptance, it was possible to get a view of the contributions.
2 Theoretical foundation
At the beginning of the semester, Professor Mauro presented to the class the proposal of the
"Aula às Avessas" methodology, whose main goal is to involve students in the classes
taught and in activities that, during the course, carry the knowledge acquired in the
weekly meetings to the community.
The classes included discussions on course topics and on the essence of a master's degree,
as well as the participation of students who had completed or were completing theirs. All of
these students developed work in Health Informatics.
Besides the classroom lessons, there was a structure of exercises:
• Online Health Informatics course;
• Reading and critical analysis of national and international articles;
• Videos for understanding the technological progress of aspects relevant to the subject;
1Instituto Federal de Educação, Ciência e Tecnologia do Ceará, IFCE

{gerson.vieira@ppgcc.ifce.edu.br}
• Frequent classroom discussions on what it means to pursue a master's degree and to be a
master;
• Participation in master's Qualification and Defense presentations on the topics
discussed in class;
2.1 Exercises
There were 5 exercise lists, with items divided into: videos on health or technological
innovation topics, reading of Professor Mauro's articles, an online course (EAD), reading of
published articles in the health informatics area, analysis of master's qualifications
and defenses, and use of tools for the technologies covered in the course.
2.2 EAD
The Coursera platform offers a wide variety of courses. For this course, the Health
Informatics course (see footnote 2) was used, which has videos and exercises on the subject.
Those who wish can pay a fee and receive an official course certificate.
2.3 Reading of Articles

Reading each article resulted in a summary, written in English, of its most relevant
points. The articles read are listed below:
• 2010: "A Context-Aware Framework for Health Care Governance Decision-Making
System". Second IEEE Workshop on Interdisciplinary
Research on E-health Services and Systems, Montreal, June 14, 2010.
• 2011: "Integrating Mobile Devices In a Brazilian Health Governance Framework". ACEEE
International Conference on Advances of Information &
Communication Technology in Health Care (ICTHC 2011), Dec 15-16, 2011, Jakarta,
Indonesia.
• 2012: "LARIISA 2.0, A Platform for Data Integration of Public Health System in Cloud
Computing Environment". ADVANCE 2012: International
Workshop on ADVANCEs in ICT Infrastructures and Services, 2012, Aracati (CE).
• 2013: "CLARIISA, a Context-aware Framework based on Geolocalization for a Health
Governance System". IEEE HEALTHCOM 2013: 15th
International Conference on e-Health Networking, Application & Services, October 9-12,
2013, Lisbon, Portugal.
• 2014: "Using Bayesian Networks to improve the Decision-Making Process in the Public
Health System". IEEE HEALTHCOM 2014: 16th International
Conference on e-Health Networking, Application & Services, October 9-12, 2014, Natal,
Brazil.
2Interprofessional Healthcare Informatics

{www.coursera.org/learn/health-informatics-professional/}

• 2014: "Towards A Cost-Effective Homecare for A Public Health Management System In
Brazil". IEEE HEALTHCOM 2014: 16th International
Conference on e-Health Networking, Application & Services, October 9-12, 2014, Natal,
Brazil.
• 2017: "Using Linked Data in the Data Integration for Maternal and Infant Death Risk of the
SUS in the GISSA Project". WebMedia, Simpósio Brasileiro de Sistemas Multimídia
e Web, Gramado, RS, Brazil.
• 2017: "Using Predictive Classifiers to Prevent Infant
Mortality in the Brazilian Northeast". IEEE Healthcom, Dalian, China.
2.4 Slides

The subject of Health Informatics has been worked on by other students at different
levels. Presentations from events and defenses were analyzed, in some cases with the
chance to attend the defenses and suggest improvements. The works analyzed are listed
below:
• 2014 – IEEEHealthcom NATAL – Lariisa Homecare
• 2015 Projeto Lariisa – Leonardo Gardini
• 2017 Projeto NextSAUDE - Henrique Mota
• 2017 Plataforma Marcia – Fabio José
2.5 Tools
The exercise lists included hands-on practice with tools relevant to understanding
some of the articles analyzed, such as Protégé, a tool used for
creating ontologies.
2.6 Projects
Participation in ongoing projects with students from other institutions, with the
opportunity to critique and take an active part in the work.
In the case of the author of this memorial, after one of these participations, in which
suggestions were made on how to organize the data and on switching the tool from Weka to
the trial version of Matlab, a full paper was accepted at Advance 2018, in
Santiago, Chile.
2.6.1 Data Mining
Each student was given a topic related to the works being developed and written.
The three topics were: IoT, Data Mining, and Ontology and Linked Data.
After this division, the students began working with the people already
involved in the articles, contributing an analysis of the current content and possible
suggestions for improving the work in progress.

2.6.2 Evaluation of Articles
Still within the project's subject, students were directed to the articles on each topic,
each of which was already in preparation for publication at some event.
The Data Mining article analyzed a Federal Government database that records births and,
in case of death, the entry in the database of deceased newborns ("Nascidos Mortos").
The author's goal was to use this analysis to show the probability of a newborn dying,
given the data collected during pregnancy and at delivery. The contribution made came
from questions about the balance in the number of samples to be classified, about
converting sample attributes from "letters" to numbers that carry real weight in the
calculations, and from classifying the test data with other classifiers developed in Matlab.
2.6.3 Application in health (ADVANCE article)
Article (1) has an application in health because computing the probability of death of a
newborn at birth helps the medical team prioritize care for both mother and baby
relative to other patients.
Over time, this same study, which has been refined with data from these specific
databases, can be extended to other analyses, either by using the same data for a
different purpose or through new databases. The idea of analyzing a database and
determining which attributes matter for computing the probability of an event makes the
project flexible, so that data mining becomes its most relevant point, given that the
initial idea has been applied and approved.
For example, analyzing other databases may yield another kind of probability, such as
that of a child developing some disease at birth, using these or new attributes, since
the mining and analysis of the data showed a relationship to a different
outcome.
3 Contribution
The course made it possible to take an active part in articles in progress. Through
reviews, development, and analysis of existing content, the students enrolled in the course
produced articles that were submitted to events in the area. In my case, these were
articles (1) and (2) (the latter submitted to EATIS 2018; no response had been received yet).
In these articles there was active participation in including different classifiers
and tools for categorizing cases of infant mortality and for better classifying
cases of dengue, Zika and chikungunya. The use of additional tools served to confirm
the effectiveness of the WEKA tool, used in previous studies. The articles
are attached to this memorial for future reference.
4 Conclusion
The experience in this course was very relevant to the master's program. Classes,
assignments, conversations, tight deadlines and acceptances make us grow as professionals
and future masters. The methodology takes students out of their comfort zone so that they
go after developing their skills.
For the community, the students had the opportunity to produce scientific articles in the
health area, which, in the case of the articles from this course, are research relevant to
the use of remote technologies that will be able to save lives in emergencies.
Also, writing this memorial made it possible to piece together the puzzle of everything
that was worked on, and to see the academic contribution of this methodology
and of the course.
References
1 RAMOS, R. et al. Using predictive classifiers to prevent infant mortality in the Brazilian
Northeast. IEEE Healthcom, Dalian, China, 2017.
2 Intelligent solution for classification of diseases transmitted by vector Aedes aegypti.
EATIS, Paris, France, 2018.

ANNEX I
Intelligent Solution for Classification of
Diseases Transmitted by Vector Aedes
Aegypti
Abstract—Several physical or emotional factors can contribute negatively to critical moments in the
health area, impairing the diagnosis of diseases. Therefore, this work proposes an intelligent
solution based on classifiers as an inference mechanism capable of assisting health professionals during
the process of clinical management of diseases transmitted by the Aedes Aegypti mosquito, identifying the
most probable diagnosis based on symptoms and outcome of exams. Thus, two learning models capable
of inferring the probability of a patient being infected with a particular disease were applied, with an
accuracy up to 91.6%. An intelligent API to support decision-making was then built during the clinical
management of dengue and chikungunya. The solution allows several applications to access learning
models. As proof of concept, a mobile application of popular consultation for the identification of dengue
and chikungunya was also developed, still in the prototyping phase.
Index Terms—dengue, chikungunya, health system, data mining, machine learning, classification
I. INTRODUCTION
Artificial Intelligence (AI) is a subarea of Computer Science that studies ways of
reproducing human reasoning [1]. AI-based systems suggest actions and predict events
based on data analysis. In recent years, intelligent computer systems have been proposed,
capable of solving more general problems and learning in an autonomous way, in addition to
interacting with each other and with human beings. In the 1980s, AI and medical researchers
joined efforts to define the field of Artificial Intelligence in Medicine (AIM), which has brought a
breakthrough in computer systems capable of assisting specialists in medical diagnosis [2].
Today, many AIM research efforts have produced innovative applications and solutions, improving
the quality of life of many people and assisting health professionals in complex procedures involving
decision-making [3]. Such systems are capable of inferring new knowledge from a set of
examples. For this purpose, Machine Learning (ML) mechanisms are trained and adjusted to
the context of the problem. This process contains a series of complex steps and actions that
influence the end result in a number of ways.
Endemic diseases, for example, require increased attention because they can
spread easily, such as dengue fever, chikungunya fever, and fever caused by the Zika virus,
which are on the list of compulsory notification diseases of the Brazilian Notifiable Diseases
Information System (SINAN). These diseases have reached several states in the country,
causing epidemics in several regions. The fight against the Aedes Aegypti mosquito has
become the main object of a public health campaign in Brazil, according to the Ministry of
Health. More than 20 million reais were released in 2016 in campaigns and actions to fight the
mosquito [4]. A number of initiatives have been taken to stem its progress, yet it develops rapidly
and, in favorable environments, reproduces easily.
From the open data provided by the web portal of the city of Recife (PE) in Brazil,
thousands of already classified cases of the diseases in this context were extracted.
These cases fed the classification algorithms used in this work, which
were trained to classify new unknown cases. The system developed in this work serves more
than one stage of clinical management; therefore, this training process was carried out in two
stages: one to assist professionals in the stage of suspected disease, analyzing only symptoms
and results of rapid exams; and another focused on the final diagnosis, also taking into account
the results of more specific tests. The developed system has two main components: the
connection module and the inference module. The connection module receives REST requests
(Representational State Transfer) with the attributes made available by the application.
Depending on the case or stage of clinical management, information about symptoms, health
history or clinical exams performed by the patient in question will be received. This information
is treated and sent to the inference module which, based on ML techniques, estimates the
likelihood of a patient having contracted one of the diseases. This module analyzes a set of
diagnosed cases of the diseases in question, training the ML algorithms used in the
classification process.
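The split between the connection module and the inference module can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' API: the attribute names in `SUSPICION_FIELDS`, the function names and the JSON shape are hypothetical, and the classifier is passed in as a plain callable standing in for a trained model.

```python
import json
from typing import Callable, Dict

# Hypothetical attribute names; the paper does not list the fields of its API.
SUSPICION_FIELDS = ("fever", "headache", "arthralgia")


def parse_request(body: str) -> Dict[str, int]:
    """Connection module: decode a JSON body and keep only the known attributes."""
    payload = json.loads(body)
    # Assumption of this sketch: an attribute missing from the request counts as absent (0).
    return {name: int(bool(payload.get(name, 0))) for name in SUSPICION_FIELDS}


def handle(body: str, model: Callable[[Dict[str, int]], Dict[str, float]]) -> str:
    """Hand the cleaned attributes to the inference module and serialize its answer."""
    attributes = parse_request(body)
    probabilities = model(attributes)  # stand-in for the trained classifier
    return json.dumps(probabilities)
```

Keeping the model behind a callable mirrors the idea that several applications can reach the same learning models through one entry point.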
This work aims to develop an intelligent solution, based on Machine Learning, capable of
classifying diseases transmitted by the Aedes Aegypti vector, contributing to the suspicion and
diagnosis of these diseases, supporting the whole process of clinical management of such.
II. RELATED WORK

There are several applications in the literature for each of the approaches and strategies
of data classification. AI has evolved rapidly and its applications in many areas of knowledge
have solved several problems of different levels of complexity.
Oliveira uses genomic data to classify clinical forms of dengue. The work makes an in-depth
study of aspects of genetic polymorphisms rather than treating clinical or laboratory data
[5]. Data were obtained from 105 patients from the LaviTE dengue cohort. The base has 26
cases of Dengue Hemorrhagic Fever, 49 cases of Complicated Classical Dengue and 30 cases
of classical dengue. Data were obtained through the application of mass genotyping techniques
(Illumina). The Multilayer Perceptron (MLP) model, which is based on an Artificial Neural
Network (ANN), classifies cases of severe dengue with an accuracy of 85%.
The work of Teles et al. presents a decision support system (DSS) aimed at the diagnosis and
identification of dengue severity. The system makes use of Bayesian networks to aid in the
diagnosis in cases of uncertainty [6]. The proposed model analyzes user data (symptoms) and
infers the user's risk: low, medium or high. The work is a component of the LARIISA framework,
discussed in [7]. This system has other decision mechanisms in its interface. Although it did
not test other approaches, the work shows good results.
The work presented in [8] carries out a study as proof of concept of the LARIISA project
(Laboratory of Intelligent and Integrated Networks Applied to Health System) applied to the
dengue scenario. The system uses ontologies and context-awareness concepts to identify
areas at risk of disease outbreak. Such information will serve as both support for decision
making of managers and to alert and prevent the population. For this, the system counts on
several actors, among them, the smartphone and Digital TV.
In [9], the researchers propose an intelligent system based on ontologies capable of
determining areas at risk of infection. The proposal collects case notifications in a collaborative
way and analyzes the data through intelligent models. Applying heuristics, the system can
predict a region with high epidemic probability before it occurs. Thus, the system can collaborate
with managers in the decision-making process and help users with alerts of risk areas. The
THING, as it is known, is a module of the LARIISA framework [7], a model for health management.
The system is designed for any type of notification, but is best suited to the context of endemic
diseases. The collaborative strategy brings agility to the systems with the reliability of the data,
which can be provided by any user.
Based on several recent literature studies, this study extends the research published in
[10], which proposed a mobile application capable of classifying cases of dengue and
chikungunya using ML methods. The work analyzed data from diagnosed cases of the diseases
in question to train learning algorithms and predict cases of risk. The previous proposal focused
on the first care, evaluating only the symptoms of the diseases in question and some pre-
existing diseases.
III. METHODOLOGY
Several methodologies in the literature propose to optimize the
process of obtaining new knowledge through Data Mining (DM) or Knowledge Discovery
in Databases (KDD) [11], [12]. New methodologies include the application of algorithms based
on machine learning that analyze databases and generate learning models capable of inference
[13]. The selection of the best algorithms, as well as the treatment of the data for the context
under analysis, is performed by the data scientist. This work follows a simplified
methodology derived from classical data mining methodologies. Figure 1 shows the sequence
of steps of the methodology proposed in this work, which was divided into three phases:
A. Data Collection and Integration

Machine Learning methods represent the knowledge acquired from the set of
experiences observed. Thus, the more representative the sample is of the population
(all examples), the better the model. Therefore, the more real cases of the diseases in question
are available, the better the scope of the algorithm. However, care must be taken not to overfit:
when there are many similar examples, the algorithms are fitted to only those cases (the sample)
and present poor results when classifying different cases. It is therefore ideal that the examples
be as generic (different from each other) as possible, giving a more complete view to the
learning algorithms [1].
Fig. 1. Stages of the Methodology Adopted in this Work.
Obtaining real data for analysis is always a challenge for DM, especially health data. Through
searches of hospital charts and public databases, two databases were obtained with
cases of dengue and chikungunya from 2015 and 2016. They were then integrated,
resulting in a single database containing 33 attributes. In total, there were 20,138 recorded
cases that included cases of dengue fever, chikungunya and other unclassified, inconclusive or
discarded cases. Data were obtained from the open data portal of the City of Recife / PE - Brazil,
available at [14]. 20,137 cases were extracted, of which 10,513 were dengue and 1,274 chikungunya.
4,713 samples were removed for incomplete data. Some cases, classified as inconclusive,
were removed for lack of information. Cases that are neither dengue nor chikungunya have
been labeled as others, which means that they can be any other disease.
B. Attribute Selection and Data Filtering

During the training phase, the influence of the attributes on the classification of the data
was analyzed. Some attributes were found to influence the results and others not,
so a selection of the most relevant attributes was performed. Many
incomplete fields can also disrupt learning, as the value of the attribute is unknown.
Consequently, it was necessary to filter out the incomplete cases in the base, leaving 1,133 cases
of dengue, 1,273 cases of chikungunya and 1,624 cases that could represent any other disease,
including Zika.
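The filtering of incomplete cases can be pictured with a short sketch. It assumes each case is a dictionary and that a missing value appears as `None` or an empty string; both the record layout and the field names are hypothetical, since the paper does not describe how the data were stored.

```python
from typing import Dict, List, Sequence

# Assumption: these two markers are how a missing value shows up in a record.
MISSING = (None, "")


def is_complete(case: Dict[str, object], required: Sequence[str]) -> bool:
    """True when every required attribute is present and filled in."""
    return all(case.get(name) not in MISSING for name in required)


def filter_complete(cases: List[Dict[str, object]], required: Sequence[str]) -> List[Dict[str, object]]:
    """Keep only the cases in which all required attributes are filled in."""
    return [case for case in cases if is_complete(case, required)]
```

A value of 0 (symptom absent) is kept, because it is information rather than a gap.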
Table I shows the symptoms present in the collected data. The technical names
have been converted to facilitate understanding.
Table I
MAIN SYMPTOMS PRESENTED BY PATIENTS.
Symptoms
Fever
Nausea
Vomiting
Arthritis
Conjunctivitis
Headache
Back pain
Muscle aches
Intense arthralgia
Pain around the eyes
Red spots on skin
Red dots on the skin
Data on the patient's health history are of vital relevance for the diagnosis of a disease.
This is because, depending on the patient's illnesses, the symptoms may present different
behaviors. In addition, these symptoms may be related to such diseases, which may confuse
the diagnosis. Therefore, some preexisting diseases were also considered as attributes. These
are shown in Table II.
Table II
PRE-EXISTING DISEASES.
Diabetes
Blood diseases
Liver disease
Kidney disease
Hypertension
Disease in the stomach
Autoimmune diseases
The collection of exams is an important step in the clinical management process, and its
results add value to the diagnosis, making it more accurate. Thus, the proposed model also
considers the results of exams. Some more specific tests, such as
serology, for example, are still very expensive or time-consuming and are not always available
in the public health network [15]. Therefore, health professionals first request more immediate
examinations in order to have a faster pre-diagnosis of the problem, discarding some
hypotheses depending on the case. Table III shows these exams.
Table III
EXAMS REQUESTED DURING CLINICAL DISEASE MANAGEMENT.
Tourniquet test
Hemogram (Leucopenia)
Chikungunya serum 1
Chikungunya serum 2
PRNT exam
Serological dengue
ELISA Examination
Virus isolation
PCR Examination
Data balancing can both speed up processing and improve results in some cases. This
is because some attributes may hinder the learning process, confusing the algorithms. Thus,
an automated attribute selection mechanism was applied, which consists of truncating less
relevant attributes to highlight those with greater significance
[Hall2003BenchmarkingMining]. Some algorithms presented better results after the selection of
attributes, such as the Naïve Bayes classifier, while others had their accuracy impaired.
Therefore, this work applied attribute selection only to cases where there was improvement.
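One common way to rank attributes before dropping the least relevant ones is information gain. The sketch below is only an illustration of that idea; the paper does not say which ranking criterion its automated mechanism used, so the choice of information gain and the function names are assumptions.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Reduction in class entropy obtained by splitting on one attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

An attribute whose gain is close to zero says little about the class and is a candidate for removal.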
32 attributes from several stages of the clinical management process were treated,
considering only the cases that contained at least the duly filled symptoms. None of the cases
presented the results of all specific tests. This is because health professionals, from
the outset, work with a diagnosis hypothesis, requesting tests only for one type of suspicion.
This lack of data hampers the learning of the algorithms, which were modeled to treat these
exams simply as not performed.
C. Data Balancing and Normalization

Classification algorithms learn by analyzing a set of experiments. However, if an algorithm
learns more about one particular class than another, it will tend to favor that class. So,
depending on the problem, an unbalanced database is undesirable. In the problem
of disease classification there can be no such bias, as it is equally important to classify both
diseases. The data balancing step is usually performed prior to attribute selection and data
cleansing. But after some observations, it was found that the data removed in the cleaning
process can further unbalance the classes. Therefore, this research chose to postpone this step
to the last stage of preprocessing. The database treated in the first phase (suspicion) has 1,133
cases of dengue, 1,273 cases of chikungunya and 1,624 unidentified cases that were
considered as other diseases. Of these, 208 cases of dengue, 69 cases of chikungunya and 536
cases of other diseases were selected for the diagnostic phase, which includes test
results. Therefore, in order to obtain a better classification, the data were balanced.
Two balancing algorithms were applied for the tests:
• SMOTE: Performs interpolation between close examples of minority classes, creating
synthetic examples for these classes. The technique only reaches one class at a time,
needing to run several times if there is more than one unbalanced class [16].
• Resample: As its name implies, it is a re-sampling technique. The algorithm performs the
balancing through the replication (copy) of some examples, which can influence any of the
classes (majority and minority), depending on the configuration.
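The two strategies can be contrasted with a small sketch. It is a simplified illustration, not the implementation used in the paper: the function names, the Euclidean distance and the fixed seeds are assumptions, and the SMOTE variant below only handles numeric attributes.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def smote(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority neighbours of the base point (squared Euclidean distance)
        neighbours = sorted(
            others, key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p))
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


def resample(examples: Sequence[Point], n_out: int, seed: int = 0) -> List[Point]:
    """Draw n_out examples with replacement, i.e. plain copies of existing cases."""
    rng = random.Random(seed)
    return [rng.choice(examples) for _ in range(n_out)]
```

SMOTE produces new points that lie between existing ones, while resampling only repeats cases already in the base, which is why heavy use of either can encourage overfitting.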
Although they used different strategies, the two balancing methods showed good results.
Care was taken to avoid overfitting by using these algorithms in moderation, limiting the
number of synthetic or duplicated examples. Table IV shows the best balances
achieved by the algorithms. The different techniques significantly influenced the results
of the classification algorithms, affecting them positively or negatively, depending on the case.
Thus, this work opted for the method that presented the best result for each algorithm tested.
We tried to achieve a better balanced boundary between classes with less use of
balancing algorithms. After several tests, the following settings were achieved. For the first
phase, SMOTE was applied twice, with 45% and 30%, reaching
one unbalanced class at a time. In Resample, a class-to-class ratio of 0.9 and 120% power was
applied, reaching both unbalanced classes simultaneously. In the second step, SMOTE was
applied with 600% and 150%, and Resample with 0.9 and 200%.
Table IV
BALANCING RESULTS.
                       Dengue   Chikungunya   Others
Suspicion  SMOTE        1,642      1,654       1,624
           Resample     1,586      1,603       1,645
Diagnosis  SMOTE          536        483         520
           Resample       595        501         529
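The SMOTE percentages quoted for the suspicion phase can be checked with simple arithmetic. The sketch assumes that the percentage is the number of synthetic examples relative to the original class size, truncated to an integer; the text does not define it, so this reading is an assumption, and `smote_size` is a hypothetical helper.

```python
def smote_size(n_original: int, percentage: int) -> int:
    """Class size after adding percentage% synthetic examples (truncated)."""
    # Integer arithmetic avoids floating-point surprises when truncating.
    return n_original + (n_original * percentage) // 100


dengue_suspicion = smote_size(1133, 45)       # 1,133 dengue cases, SMOTE at 45%
chikungunya_suspicion = smote_size(1273, 30)  # 1,273 chikungunya cases, SMOTE at 30%
print(dengue_suspicion, chikungunya_suspicion)
```

Under this reading the two values come out as 1642 and 1654, the figures listed for SMOTE in the suspicion row of Table IV.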
D. Training and Testing

Training consists in submitting a set of experiments to the algorithm to
prepare it for new situations. The more varied the experiences, the more generic the model
will be and the better it will perform in different situations. The testing step submits new
labeled cases to the trained model to compare the classification results with their actual
labels. This procedure can be done by separating the data set into two parts, one dedicated to
training and another to testing. Alternatively, there is the cross-validation test, which consists
of dividing the data into n subsets and selecting one for the test and the rest for training. This
procedure is performed n times, each subset being held out for testing once. This procedure was
developed by [17] and is widely used in validation tests. For this work, 10-fold cross-validation
was used. That is, the dataset was divided into 10 partitions, each used once for testing
while the others were used for training. After the tests, the procedure generates a matrix with
the correct and wrong cases, which is then analyzed to provide relevant information. This
matrix is known as the confusion matrix, which crosses the classified values with the actual values.
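The partitioning behind n-fold cross-validation can be sketched in a few lines. This is a bare illustration: `k_fold_indices` is a hypothetical helper, and it neither shuffles nor stratifies the data, which a real experiment would normally do.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists so that each sample is tested exactly once."""
    # Fold i receives samples i, i + k, i + 2k, ... (no shuffling in this sketch).
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in k_fold_indices(n_samples=20, k=10):
    pass  # train a model on `train`, evaluate it on `test`, accumulate the confusion matrix
```

With k = 10, each round trains on nine partitions and tests on the remaining one, matching the procedure described above.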
Research on applying intelligence to equipment has resulted in
several learning paradigms. Some are based on statistics, others on decision trees, others on
artificial neural networks. The current literature points out several applications of these
techniques in diverse problems. In health, there are some good examples of success, which
make use of the following algorithms:
1) Bayes Network (BN): A probabilistic classifier based on Bayes' theorem.
Bayesian networks, as they are known, create a network of interdependencies between the
probabilities (prior and posterior), treating the attributes hierarchically.
2) Naïve Bayes (NB): A classifier based on Bayes' theorem that calculates the
probability of a particular event occurring given a set of events. Unlike the BN classifier,
it considers all the attributes independent of each other.
3) Random Tree (RT): A decision tree-based classifier that considers k randomly chosen
attributes at each node, without pruning.
4) Random Forest (RF): The RF classifier builds several random trees from different random
samples of the data and attributes and combines their predictions, typically by majority vote.
5) J48: A reimplementation of the C4.5 algorithm [18], which selects the best node
partitioning at each step. The algorithm also prunes subtrees that do not show
information gain.
6) Support Vector Machine (SVM): A non-probabilistic linear binary classifier that, given
a data set with two classes, seeks to separate them linearly.
7) Multilayer Perceptron (MLP): An ANN-based classifier with at least three layers:
input, intermediate, output. Its neurons use non-linear activation functions and are trained
with an algorithm based on backpropagation.
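To make the independence assumption of Naïve Bayes concrete, here is a minimal sketch for categorical attributes. It is not the implementation used in the paper: the function names, the Laplace smoothing and the toy data with classes "A" and "B" are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, attribute index) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts


def predict_nb(model, row, n_values=2):
    """Return the class with the highest posterior, treating attributes as independent."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        log_p = math.log(count / total)  # prior
        for i, value in enumerate(row):
            # Laplace smoothing; n_values is the number of values an attribute can take
            log_p += math.log((value_counts[(label, i)][value] + 1) / (count + n_values))
        scores[label] = log_p
    return max(scores, key=scores.get)


# Toy data: two binary attributes, class "A" mostly has the first attribute set.
rows = [(1, 0), (1, 1), (1, 0), (0, 1), (0, 0), (0, 1)]
labels = ["A", "A", "A", "B", "B", "B"]
model = train_nb(rows, labels)
```

Each attribute contributes its own factor to the score, which is exactly the simplification that distinguishes NB from a Bayesian network.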

E. Interpretation and Comparison

The algorithms can generate several outputs in the training and testing stages. For
example, decision tree algorithms generate a tree in the training phase. This tree can explain
the patterns found in the data or highlight some anomaly. The confusion matrix, generated in
the test phase, also helps explain the behavior of the results. The comparison can check if the
adjustments are improving or worsening the results of the algorithm in relation to the objective.
Several evaluation metrics can also be generated from the results, which support the process
of interpretation and adjustment.
The performance of an algorithm is calculated through evaluation metrics that are based
primarily on the confusion matrix, which relates the actual values of the data to the results
inferred by the algorithms. Table V and Figure 2 present these metrics.
Table V
CONFUSION MATRIX
                       Classified
                   Positives   Negatives
Real  Positives       TP          FN
      Negatives       FP          TN
TP (True Positives): These are the positive cases that were correctly classified as
positive.
TN (True Negatives): These are the negative cases that were correctly
classified as negative.
FP (False Positives): These are the negative cases that were
classified as positive (false alarms).
FN (False Negatives): These are the positive
cases that were incorrectly classified as negative.
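The four cells can be counted directly from the real and classified labels. The sketch below treats one class as "positive" and everything else as negative; `confusion_counts` is a hypothetical helper, and the labels are made-up values for illustration.

```python
from typing import Dict, Hashable, Sequence


def confusion_counts(actual: Sequence[Hashable], predicted: Sequence[Hashable],
                     positive: Hashable) -> Dict[str, int]:
    """Count TP, FN, FP and TN for one class taken as the positive class."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for real, classified in zip(actual, predicted):
        if real == positive:
            counts["TP" if classified == positive else "FN"] += 1
        else:
            counts["FP" if classified == positive else "TN"] += 1
    return counts


# Made-up labels: 1 marks the positive class.
actual = [1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0]
cells = confusion_counts(actual, predicted, positive=1)
```

For a three-class problem such as dengue, chikungunya and others, the same count can be repeated with each class in turn taken as the positive one.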
The evaluation of an algorithm is performed through the analysis of metrics constructed
from the confusion matrix. These metrics show how good an algorithm is in relation to the
problem addressed. Depending on the problem, one metric is prioritized over another,
because each metric measures different characteristics [19]. For example, for a less
critical problem, where it is enough to know how many times the algorithm has been
successful, only the correct cases are considered, disregarding the erroneously classified
cases. In problems considered complex, the results are analyzed in a more panoramic way. The
most known metrics are:
Fig. 2. Graphic representation of the Confusion Matrix

• Precision: The proportion of correct predictions, regardless of what is positive and what is
negative.
(1)
• Cover: The proportion of true positives. The ability of the system to accurately predict the
condition for cases that actually have it.
(2)
• Harmonic Mean: The harmonic mean, also known as the F-measure, is a performance
measure widely used in forecasting tasks. By combining precision with coverage, it avoids
the disadvantages of simple metrics such as error rate, especially in cases of unbalanced
class distributions [20].
(3)
The analysis of the metrics and confusion matrix is a sensitive stage of the process, since
it requires a greater perception of the data scientist. The interpretation of these values leads to
the identification of anomalies that hinder the training of the algorithms, which leads to better
adjustments.
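As an illustration of how such metrics are derived from a confusion matrix, the following minimal sketch computes per-class precision, cover and harmonic mean. It assumes the standard definitions (precision = TP / (TP + FP), cover = TP / (TP + FN), harmonic mean = 2PR / (P + R)), which the text describes only in words, and it assumes that rows hold the real classes and columns the classified ones. The values are those of the RF confusion matrix in Table VIII; the function name is illustrative.

```python
# Rows are the real classes, columns the classified classes (assumed orientation).
LABELS = ["Dengue", "Chikungunya", "Others"]
MATRIX = [
    [1069, 223, 284],
    [278, 1103, 222],
    [250, 218, 1177],
]


def per_class_metrics(matrix, labels):
    """Return {label: (precision, cover, harmonic_mean)} using the standard definitions."""
    metrics = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                 # real class k, classified as another class
        fp = sum(row[k] for row in matrix) - tp  # another class, classified as k
        precision = tp / (tp + fp) if tp + fp else 0.0
        cover = tp / (tp + fn) if tp + fn else 0.0
        total = precision + cover
        harmonic = 2 * precision * cover / total if total else 0.0
        metrics[label] = (precision, cover, harmonic)
    return metrics


if __name__ == "__main__":
    for label, (p, r, f) in per_class_metrics(MATRIX, LABELS).items():
        print(f"{label:12s} precision={p:.3f} cover={r:.3f} harmonic_mean={f:.3f}")
```

In a multi-class problem each class is treated in turn as the positive class, which is why the function returns one triple per disease.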
In order to meet the various levels of the clinical management process, this work
divided the training of the algorithms into two parts: one to serve the first stages
of management, which include symptoms and some simple exams, and another to support the final
diagnosis, including more specific exams. The first part assists decision making during
the early stages of clinical management (suspicion), and the second is focused on the final
diagnosis.
1) Suspicion: During the processing step, tests were performed with various algorithms,
which were subsequently analyzed in order to improve their results. The best results achieved
in the analysis step are shown in Table VI.
The algorithms based on decision trees presented better results for the data set of this
work. All algorithms presented better results using the Resample balancer than using SMOTE.
Table VII shows the results achieved in each class/disease by the RF algorithm.
The harmonic mean of the analysis criteria presented good results for the disease
classification problem. The results do not present much discrepancy between the metrics. It can
be seen in the confusion matrix, shown in Table VIII, that erroneously classified cases are
balanced between diseases.
Table VI
BEST RESULTS OBTAINED IN THE SUSPICION PHASE
Algorithm   Precision   Cover   Harmonic Mean
BN          61.3        61.3    61.2
NB          60.5        59.6    59.2
J48         66.4        66.4    66.4
RT          68.5        68.5    68.4
RF          69.3        69.3    69.3
MLP         65.3        65.0    65.1
kNN         68.4        68.4    68.4
SVM         62.8        60.9    60.4
Table VII
RESULT OF THE RF CLASSIFIER DIVIDED BY METRICS
Class         Precision   Cover   Harmonic Mean
Dengue        66.9        67.4    67.2
Chikungunya   71.0        68.8    69.9
Others        69.9        71.6    70.7
Table VIII
RF CLASSIFIER CONFUSION MATRIX
                            Classified
                     Dengue   Chikungunya   Others
Real   Dengue        1069     223           284
       Chikungunya   278      1103          222
       Others        250      218           1177
The symptoms presented by the diseases are very similar and are confused with each
other. The algorithms classify some cases erroneously, as shown by the matrix. However, the
results are relevant in relation to the system proposal.
2) Diagnosis: In order to support the final diagnosis, more specific examinations were
included in the training of the algorithms. In this stage, only cases analyzed in the laboratory
were used, excluding the cases without any specific examination. Table IX shows the best
results achieved by the algorithms.
Table IX
BEST RESULTS OBTAINED IN THE DIAGNOSTIC PHASE
Algorithm   Precision   Cover   Harmonic Mean   Balancing
BN          87.9        86.3    86.0            SMOTE
NB          68.6        68.2    65.0            SMOTE
J48         88.5        88.5    88.4            SMOTE
RT          90.4        90.5    90.4            Resample
RF          90.8        90.9    90.8            Resample
MLP         91.5        91.6    91.5            Resample
kNN         80.9        77.8    77.6            Resample
SVM         61.6        59.8    58.9            SMOTE
Unlike the suspicion stage, the results of this step are much more satisfactory. The
exams give more certainty to the diagnosis. In this case, some algorithms presented better
results using the SMOTE balancing method, especially the probabilistic ones. Decision tree-based
classifiers continue to perform well. However, the ANN-based algorithm, namely MLP,
outperformed all other algorithms at this stage. Table X displays the test results for each
class/disease of this classifier.
Table X
MLP CLASSIFIER DIVIDED BY METRICS
Class         Precision   Cover   Harmonic Mean
Dengue        87.1        89.2    88.1
Chikungunya   97.7        100.0   98.8
Others        87.1        89.2    88.1

It was noticed that in this stage, where the classes were very unbalanced, the intense
use of balancers ended up affecting the results of the tests. In this case we can see a coverage
of 100% for chikungunya, which was the class with the fewest cases. Therefore, it is
concluded that overfitting occurred.
The ANN was built with 19 neurons and one intermediate layer. Table XI shows the confusion matrix
generated from the evaluation indicators.
Table XI
MLP NEURAL CLASSIFIER CONFUSION MATRIX
                            Classified
                     Dengue   Chikungunya   Others
Real   Dengue        472      2             55
       Chikungunya   0        501           0
       Others        70       10            515
The results of the procedure, achieved through the adopted methodology, can guide the
development of an inference platform capable of classifying the diseases in question.
The difference in approach can be seen from the analysis of the results. The information
gain strategy used in decision tree based methods is very efficient because, even in confusing
situations, it guarantees an effective representation of the learning model.
IV. DENYA SOLUTION

The test results showed that the classifiers are able to accurately classify diseases related
to the Aedes aegypti mosquito. Such predictions could contribute significantly in the context of
health applications that treat these diseases. Thus, as a product of this research work, Denya
(Dengue and Chikungunya Diagnostic Support System) is proposed, which provides
a disease classification service to various applications. The system is an Application
Programming Interface (API) that can be accessed via REST (Representational State Transfer)
requests, allowing access through the internet.
10 A. Architecture
The API was developed using the Java language. The system architecture was divided
into three layers, namely the data, inference and connection modules. There is also the application
layer, which makes use of the API features. Figure 3 shows the system architecture.
11 B. Data Module
The data module is responsible for storing the set of cases of the diseases in question.
These examples are used to train the algorithms whenever the system is booted. The module
has a set of selected and treated examples.

Fig. 3. System Architecture.
12 C. Inference Module
This is the stage where the classification of new cases occurs. This module has two
classification algorithms, namely the RF and MLP classifiers: one focused on the first phases
of the management and the other on more advanced phases, including the results of exams.
This module uses the WEKA API, available in “Data Mining Software in Java”.
13 D. Connection Module
This is the module responsible for receiving and handling external requests. Applications
use the REST (Representational State Transfer) pattern to communicate with the API using the
JSON format. The Spring Framework is used to handle these requests.
14 E. Applications
This layer contains the software applications that use the API to provide decision support services.
These applications send a set of symptoms and tests and, as a result, receive the probability of
each disease.
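The exchange described above can be sketched from the point of view of a client application. The text does not specify the JSON schema, so every field name below ("symptoms", "exams", "probabilities") and the sample response are hypothetical and serve only to illustrate the request/response pattern.

```python
import json


def build_request(symptoms, exams=None):
    """Serialize reported symptoms (and optional exam results) as a JSON string.

    The field names are hypothetical; they are not Denya's actual schema.
    """
    payload = {"symptoms": sorted(symptoms)}
    if exams:
        payload["exams"] = exams
    return json.dumps(payload)


def most_likely(response_text):
    """Return (disease, probability) for the highest probability in a JSON response."""
    probabilities = json.loads(response_text)["probabilities"]
    disease = max(probabilities, key=probabilities.get)
    return disease, probabilities[disease]


if __name__ == "__main__":
    print(build_request({"fever", "joint_pain"}, {"platelets": 140000}))
    # Hand-written response for illustration only, not real system output.
    sample = '{"probabilities": {"dengue": 0.25, "chikungunya": 0.6, "others": 0.15}}'
    print(most_likely(sample))
```

Keeping the response as a map from disease to probability lets each application decide how to present the result, for example by showing only the most likely disease.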
1) Application for Dengue and Chikungunya Diagnostic Aid : As proof of concept, this
work also developed a mobile application aimed at the pre-care of patients with suspected
diseases. The application is able, from a set of symptoms informed by the patient, to infer the
most likely disease. From an objective questionnaire, the user describes their symptoms. After
answering the questionnaire, the interface provides a button for the classification of the
symptoms reported by the user, as shown in Figure 4.

The application was developed on the Android platform, using the
Java programming language. We used the Android Studio IDE, the official environment
for Android development. The application was built using the Android SDK 22 and is available
for devices with versions from 4.0 (Jelly Bean).
Fig. 4. Mobile Application Interface.
V. CONCLUSIONS AND FUTURE WORK
context of health. In evolving scenarios, where different data are handled
in each stage, different approaches behave differently, presenting divergent
results. Algorithms that showed good results in one step may not achieve the same
results in the next phase. Therefore, it is necessary to create hybrid systems that aggregate
different approaches. In this work, the process was divided into only two stages, since the entire
training, testing and adjustment process was performed manually, requiring a lot of time and
effort. However, some papers propose solutions that automate these process steps using
ontologies. These approaches integrate the two techniques in order to achieve better results by
optimizing the steps of DM [21]. As seen, the clinical management processes of the diseases
under study can add several new attributes during their evolution. Thus, it would be interesting
to apply an automated approach to algorithm training, choosing the best algorithm for each set of
attributes, so that the most suitable algorithm is always available for each situation.
In addition to dengue and chikungunya, other diseases transmitted by the vector Aedes
aegypti need to be considered. This is the case of Zika, which has also caused serious damage
to the population. Fever caused by the Zika virus is also associated with increased occurrences
of microcephaly in the country and is considered a matter of national urgency [22]. However, the
lack of public records of this disease made it impossible to include it in the classification
process of this research.
Due to its greater severity, dengue fever is a disease of greater concern, as it has caused
more deaths [23]. Thus, it is vitally important to identify, in addition to the disease, its severity.
Both dengue and chikungunya present different clinical forms, with different severities. A future
version of the platform could treat these clinical forms separately, alerting about the
urgency/emergency of the case.
A decision support system that supports the various phases of clinical management could
not only predict the probable diagnosis of the case but, based on available hospital data,
including availability of materials in laboratories, suggest examinations to achieve a result.
As future work, we intend to integrate Denya into MARCIA, an interoperable system for
clinical management of chikungunya [24]. The system follows the whole process of clinical
management of the disease, adding information from the patient from his first visit to the health
unit until his diagnosis and treatment. From the information of symptoms and exams inserted in
MARCIA, the Denya inference module could calculate the probability of that case being related
to chikungunya. This information, inserted by health professionals, could also be included in the
chikungunya case database, improving the prediction of the Denya platform.
ACKNOWLEDGMENT
This work was supported by FUNCAP (Fundação Cearense de Apoio ao
Desenvolvimento Científico e Tecnológico) under the Program of Research Productivity
Grants, Incentive for Interiorization and Technological Innovation - BPI, FUNCAP Edital No.
09/2015.
BIBLIOGRAPHY
[1] K. Faceli, A. C. Lorena, J. Gama, and A. C. de Carvalho, Inteligência Artificial: Uma Abordagem de Aprendizado
de Máquina. Rio de Janeiro, RJ, Brasil: LTC, 2015.
[2] E. Coiera, Guide to Health Informatics. CRC Press, Mar. 2015, accessed 06-09-2017.
[3] L. C. Lobo, “Inteligência artificial e medicina,” Revista Brasileira de Educação Médica, vol. 41, pp. 185–193, Jun.
2017.
[4] Brasil, “Prevenção e combate: Dengue, chikungunya e zika,” http://combateaedes.saude.gov.br/pt/prevencao-e-combate/,
2016, acesso em 06-09-2017.
[5] T. W. F. Oliveira, “Aplicação de redes neurais artificiais na modelagem de um classificador de formas clínicas
de dengue utilizando dados genômicos,” 2009, Trabalho de Conclusão de Curso. Universidade de
Pernambuco (UPE), Recife, PE, Brasil.
[6] G. Teles, C. Oliveira, R. Braga, L. Andrade, R. Ramos, P. Cunha, and M. Oliveira, “Using bayesian networks to
improve the decision-making process in public health systems,” in 2014 IEEE 16th International Conference on
e-Health Networking, Applications and Services (Healthcom), Oct. 11-15, Natal, RN, Brazil. IEEE, 2014, pp. 565–570.
[7] L. M. Gardini, R. Braga, J. Bringel, C. Oliveira, R. Andrade, H. Martin, L. O. M. Andrade, and M. Oliveira, “Clariisa,
a context-aware framework based on geolocation for a health care governance system,” in 2013 IEEE 15th
International Conference on e-Health Networking, Applications Services (Healthcom), Oct. 9-12, Lisbon, Portugal.
IEEE, 2013, pp. 334–339.
[8] A. M. B. Oliveira, O. Andrade, F. Antunes, C. O. Maura Filho, A. Garcia, and L. Gardini, “Applying ontology and
context awareness concepts on health management system: a dengue crisis study case,” 2013.
[9] P. D. A. Cardoso, “COISA: Conselheiro inteligente de saúde do projeto Lariisa,” 2015.
[10] O. C Braga, O. C Fonseca, M. Moreira, J. Rodrigues, F. R V Silveira, A. M B Oliveira, and A. J V Neto, “A mobile
health solution for diseases control transmitted by Aedes Aegypti mosquito using predictive classifiers,” in I
Workshop de Computação Urbana (CoUrb) do XXXV Simpósio Brasileiro de Redes de Computadores e
Sistemas Distribuídos (SBRC). SBRC, Mai. 15-19, Belém, PA, Brasil 2017, pp. 144–156.
[11] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Advances in knowledge discovery and data
mining. Menlo Park, CA, USA: AAAI press, 1996, vol. 21.
[12] M. J. Berry and G. Linoff, Data Mining Techniques: For Marketing, Sales, and Customer Support. New York, NY,
USA: John Wiley & Sons, Inc., 1997.
[13] R. F. Ramos, C. L. C. Mattos, A. H. S. Júnior, A. R. R. Neto, G. A. Barreto, H. A. Mazzal, and M. O. Mota, Heart
Diseases Prediction Using Data from Health Assurance Systems in Models and Methods for Supporting
Decision-Making in Human Health and Environment Protection. Nova York, NY, USA: Nova Publishers, 2016.

[14] Recife, “Casos de dengue, zika e chikungunya,” http://dados.recife.pe.gov.br/dataset/casos-de-dengue-zika-echikungunya,
2016, acessado em 06-09-2017.
[15] M. R. R. S. Alves and V. M. C. Gadelha, “Onto2ae: Um sistema de auxílio aos pré-diagnósticos de doenças
oriundas do mosquito aedes aegypti.” Instituto Federal do Rio Grande do Norte (IFRN), Pau dos Ferros, RN,
Brasil, 2016, Trabalho de Conclusão de Curso.
[16] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling
technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[17] M. W. Browne, “Cross-validation methods,” Journal of Mathematical Psychology, vol. 44, pp. 108–132, 2000.
[18] J. R. Quinlan, C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers
Inc., 1993.
[19] M. Awad and R. Khanna, Machine Learning. Berkeley, CA: Apress, 2015.
[20] R. Busa-Fekete, B. Szörényi, K. Dembczynski, and E. Hüllermeier, “Online F-measure optimization,” in
Advances in Neural Information Processing Systems (NIPS 2015), Dec. 7-12, Montreal, Canada. MIT Press
Cambridge, 2015, pp. 595–603.
[21] M. Hilario, A. Kalousis, P. Nguyen, and A. Woznica, “A data mining ontology for algorithm selection and
meta-mining,” in Proceedings of the ECML/PKDD09 Workshop on 3rd generation Data Mining (SoKD09), 2009,
pp. 76–87.
[22] K. G. Luz, G. I. V. d. Santos, and R. d. M. Vieira, “Febre pelo vírus Zika,” Epidemiologia e Serviços de Saúde,
vol. 24, no. 4, pp. 785–788, 2015.
[23] Brasil, “Monitoramento dos casos de dengue, febre de chikungunya e febre pelo vírus zika até a semana
epidemiológica 13,”
http://portalsaude.saude.gov.br/images/pdf/2016/abril/26/2016-014— Dengue-SE13-prelo.pdf, 2016, acessado
em 06-09-2017.
[24] F. J. G. de Sousa, “MARCIA, UMA METODOLOGIA PARA O MANEJO
DE REGISTRO CLÍNICO COM USO DE ARQUÉTIPOS PARA INTEROPERABILIDADE ENTRE SISTEMAS DE
SAÚDE,” 2017, dissertação, Curso de Mestrado Profissional Integrado em Computação Aplicada do Instituto
Federal do Ceará (UECE), Fortaleza, Brasil.

ANNEX II
Evaluating Classification Algorithms performance with Matlab for generating alerts of risk of infant death
Gerson Albuquerque1, Cristiano Silva1, Joyce Quintino1, Odorico Andrade2, and Mauro Oliveira1
1 Federal Institute of Education, Science and Technology of Ceará,
Fortaleza, Ceará, Brasil gersonvan@gmail.com
cristianocagece@gmail.com joycequintino11@gmail.com
amauroboliveira@gmail.com
2 Brazilian National Congress - Brasília, Distrito Federal, Brazil
odorico0811@gmail.com
Abstract
GISSA is an intelligent system for health decision making focused on maternal and child
care. The system generates alerts that cover the five health
domains: clinical-epidemiological, normative, administrative, knowledge
management and shared knowledge. The system aims to contribute to the
reduction of child mortality in Brazil. This paper presents studies of an
intelligent module that uses Machine Learning to generate child death risk alerts in
GISSA. These studies focus on the use of different classification algorithms, with a
methodology based on Data Mining, in order to reach a learning model capable of
calculating the probability of a newborn dying. The work brings together the public
databases SIM and SINASC for the training of classification algorithms. During the
methodological process, subsampling was applied to balance the number of instances per class
so that the trained models are evaluated fairly; the experiments were executed with Matlab scripts.
Introduction
The problem of infant mortality mainly affects the so-called underdeveloped countries [21].
According to the United Nations, the overall rate of child mortality has dropped by 53% in 25
years [22], while in Brazil this reduction has been 77% in the last 22 years [23]. This is probably due to
improvements in maternal and child care through programs to support pregnant women, such
as the Stork Network, whose goal is to preserve maternal and child health, especially in the first
years of life [6]. The State of Ceará had a reduction of 11.5% between 2014 and 2015 [14].
However, these rates are still high compared with developed countries. Norway, for example,
presented an infant mortality rate of 2.4% in 2014 [8]. Therefore, more effective strategies are
needed to alleviate this problem.
GISSA (Intelligent Governance in Health Systems) is a framework to support decision
making in health settings. The system is able to generate alerts and administrative reports for
managers and health professionals.
This work presents LAIS, a mechanism based on machine learning capable of predicting
cases of infant mortality to assist managers in decision making. It integrates and analyzes the
SIM (Mortality Information System) and SINASC (Information System for Live Births) public
databases, made available by DATASUS (Department of Information Technology of the
SUS). Thus, the model generated by LAIS is able, from the attributes of the newborn and
those of its mother, to classify and calculate the risk of infant death.
This paper is organized as follows. Section 1 presents LARIISA and GISSA; Section 2
discusses related work; Section 3 describes the intelligent module based on machine learning
using the pattern recognition methodology; and Section 4 presents conclusions and future work.
1 Theoretical Foundation
1.1 LARIISA project
LARIISA is a platform developed in 2009 [20] with the aim of providing governance intelligence
in the five areas of health (clinical and epidemiological, legal, administrative, knowledge
management and shared knowledge), helping users (patients, health workers, nurses, doctors,
administrators, health secretaries, etc.) in decision making. To do so, it is necessary to manage
health-related databases, whether or not they are held in government bases, by crossing them with
information captured in real time [15].
1.2 GISSA
The GISSA project is an instance of the LARIISA platform, with focus on the Stork Network
project of the Brazilian Ministry of Health, supported by FINEP (Funding of Studies and Projects)
and being implemented by Instituto Atlântico. Its purpose is to help decision makers at all levels
of the health cycle (patient, health agent, physician, hospital manager, secretary, etc.), with the
generation of dashboards and alerts related to maternal and child health issues. A GISSA
prototype is operational in Tauá, Ceará, and is being implemented in other municipalities of
the State of Ceará.
GISSA is therefore composed of a set of components that allows the collection,
integration, and visualization of information relevant to the decision-making process [3].
Currently, it has the following alerts: live birth with low weight; delayed vaccination; prenatal;
vaccine campaign; among others. In this context, [11] proposes a mechanism based on
heuristics capable of calculating the probability of death of newborns using information from
different databases for GISSA. However, although it is based on medical knowledge, the work
does not perform tests of efficiency or precision, which prevents the evaluation of this
mechanism.
2 Related Work
Malnutrition is considered to be a major cause of child mortality in underdeveloped countries. In
[17], classification algorithms were used to find patterns related to the nutritional status of
children under five years of age. The study aims to identify which factors affect the nutritional
status of children. A total of 11,654 cases were treated with 16 health and socioeconomic
attributes, collected from an Ethiopian Health Demographic Survey conducted in 2011. The
machine learning algorithms used were the J48 decision tree [25], Naive Bayes [16] and the
rule inducer PART [10]. After several experiments, the PART algorithm was selected because it
presented the best performance, with a precision of 92.6% and a Receiver Operating
Characteristic (ROC) curve area of 97.8%.
In [28], a study on death in children under one year of age was performed using
Data Mining techniques. The SIM and SINASC databases for the municipality
of Rio de Janeiro between 2008 and 2012 were used. The integration was done through the field DN (Birth
Certificate Number), present in both SINASC and SIM. A total of 3,336 individuals who were born and
died were identified. In the research, the following 13 attributes were used: sex of the newborn, Apgar1 (5
parameters that are assessed during the first minute of the child’s life: heart rate, respiration,
muscle tone, irritability and skin color), Apgar5 (5 parameters that are evaluated during the fifth
minute of the child’s life: heart rate, breathing, muscle tone, irritability and skin color), newborn
weight, newborn color, newborn age, basic cause of death, age of mother, number of dead
children, number of live children, number of weeks of gestation, type of pregnancy and type of
delivery. The unsupervised algorithm Apriori [1] was used to investigate birth
characteristics that are associated with death in children under one year of age. At the end of
the work, some of the rules obtained may assist health professionals.
A study of births at the Bega Obstetrics and Gynecology Clinic, Timișoara, Romania, was
presented in [27]. A dataset with 2,325 births and 15 attributes was analyzed; some of
the attributes are: mother’s age, number of pregnancies, number of weeks of gestation,
child’s sex, child’s weight, and type of delivery. The goal of the paper is to predict the child’s Apgar
score at birth, using the tool
WEKA [9] and 10 classification algorithms: Naive Bayes, J48, IBK [2], Random Forest [5], SMO
[24], AdaBoost [12], LogitBoost [13], JRip [7], REPTree and SimpleCart [4]. The LogitBoost
algorithm presented the best results in the experiments. The generated model was used in a Java
application to predict the Apgar score of a new patient.
Figure 1: Smart Alert Generation Methodology
[18] uses Bayesian Networks to support
decision making in uncertain environments. A network was developed to classify hypertensive
disorders, focused on the care of pre-eclampsia. Using the Bayesian Noisy-OR model in a
database, the system analyzes the data layout and classifies them in the network. From the
symptoms presented by the pregnant woman, the system, through statistical data, infers the
severity of the case, helping the doctor specialized in the diagnosis of pre-eclampsia. This
approach proved accurate even with a small amount of data.
[19] makes a detailed comparison between the Naive Bayes and the J48 decision tree
classifiers. The paper analyzes a set of data related to hypertensive disorders to evaluate
pregnancy complications. The performance of the classifiers is studied using a Confusion Matrix
and predictive parameters. The two classifiers present close values. However, the J48
decision tree had a more accurate result.
3 Intelligent Module Based on Machine Learning

To achieve better results in the Data Mining process, the methodology of pattern recognition
developed at the Federal University of Ceará (UFC) in the Centauro Laboratory was used. This
methodology consists of a set of steps performed in the Data Mining process with the objective
of selecting the best algorithms and attributes according to the context studied [26].
Figure 1 shows the steps developed in this work: Data collection; Integration; Evaluation
and Results; and Application. First, data is collected from databases. Then, this data is
integrated by joining the bases. Subsequently, the algorithms are trained,
tested and evaluated according to the appropriate metric, generating a prediction model. Finally,
the generated model is tested on a prototype capable of predicting the risk of death of a
newborn.
3.1 Collection and Preparation of Data

Data were collected from two different public databases: SINASC, which contains information on
live births; and SIM, which contains information on mortality, including cases of infant mortality.
Both databases are available on the DATASUS portal in DBC (DataBase Container) format. The
data refer to the State of Ceará in the years 2013 (SINASC and SIM) and 2014 (SIM). These
data were converted to SQL (Structured Query Language) using TABWIN, software provided
by DATASUS for viewing and manipulating public data.

3.1.1 Integration and Selection of Attributes

By relating the SIM and SINASC bases, it is possible to retrieve information about the
birth of children who were victims of infant mortality. Thus it is possible to distinguish children who
survived up to one year of age from those who did not.
Each live birth has a unique attribute called the Live Birth Declaration Number
(numerodn), always filled in the SINASC base. The SIM base also has the field numerodn,
which is filled only in cases of infant death. This field was essential for the integration of the
bases, since it makes it possible to relate the infant mortality data to the birth data. The integration was
divided into 4 steps:
Step 01: Since children born in 2013 can be victims of infant mortality
in 2014, the SIM2013 and SIM2014 bases were united and filtered for children who died at less than 1 year
of age. The following is a simplified expression (in relational algebra) of this step
(Equation 1):
$$SIM' \leftarrow \sigma_{idade}(SIM_{2013} \cup SIM_{2014}) \tag{1}$$

Step 02: Then, SINASC2013 and SIM' are joined using the numerodn field.
The result contains all infant mortality cases that occurred in 2013 or 2014. The following is a simplified
expression in relational algebra (Equation 2):
$$M \leftarrow \rho_{SN}(SINASC_{2013}) \bowtie_{SN.numerodn = S.numerodn} \rho_{S}(SIM') \tag{2}$$

Step 03: Next, the cases of newborns who did not die were retrieved by
querying SINASC2013 and excluding the infant death cases M. The following
is a simplified expression in relational algebra (Equation 3):
$$V \leftarrow SINASC_{2013} - M \tag{3}$$

Step 04: Finally, the infant death cases M and the surviving cases V are united.
The following is a simplified expression in relational algebra (Equation 4):
$$ALL \leftarrow M \cup V \tag{4}$$
In this stage, labeling was also performed (death YES and non-death NO), as needed in
supervised classification problems.
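The four integration steps can be sketched in a few lines. This is only an illustration over in-memory records, not the SQL actually used in the work: the field numerodn comes from the text, while age_in_days and the function name are hypothetical.

```python
def integrate(sinasc_2013, sim_2013, sim_2014):
    """Label each 2013 live birth 'YES' (infant death) or 'NO' (survived).

    Each record is a dict. 'numerodn' comes from the text; 'age_in_days' is a
    hypothetical field standing in for the age-at-death attribute.
    """
    # Step 01: union of both SIM years, keeping deaths under one year of age.
    sim_infant = [d for d in sim_2013 + sim_2014 if d["age_in_days"] < 365]
    # Step 02: join SINASC with the filtered SIM records on numerodn.
    dead_ids = {d.get("numerodn") for d in sim_infant}
    m = [b for b in sinasc_2013 if b["numerodn"] in dead_ids]
    # Step 03: births with no matching infant-death record.
    v = [b for b in sinasc_2013 if b["numerodn"] not in dead_ids]
    # Step 04: union of both groups, adding the class label.
    return [dict(b, death="YES") for b in m] + [dict(b, death="NO") for b in v]


if __name__ == "__main__":
    births = [{"numerodn": 1, "weight": 3200}, {"numerodn": 2, "weight": 1500}]
    deaths_2013 = [{"numerodn": 2, "age_in_days": 12}]
    deaths_2014 = [{"numerodn": None, "age_in_days": 20000}]
    for row in integrate(births, deaths_2013, deaths_2014):
        print(row)
```

Building the set of numerodn values once keeps the join and the difference linear in the number of births.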
The result of the integration is a dataset with 50 attributes, containing information on the
birth and death (if it occurred) of children born in 2013. The dataset contains 1,182
cases of death and 124,876 cases of children who survived up to one year. Of these attributes,
16 were selected for the analysis step.
The values of these attributes are originally stored as strings. The Matlab scripts used
work with numbers, so it was necessary to convert these strings into numeric values to
make calculations. The weight for each category value was determined by the sequence in which it was
observed in the data analysis. The values were defined from -1 to 9, where:
• -1 is used for “Campo-em-Branco” (free translation: blank field);
• -0.5 is used for “Ignorado” (free translation: ignored);
• 0 is used for “Errado” (free translation: wrong);
• 1 to 9 are assigned to the other categories, with 1 for the lowest risk of death and 9 for the
highest.
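A minimal sketch of this encoding is shown below. The three special codes are taken from the text; the ordering of the regular categories is supplied by the caller, and the example ordering used in the usage lines is hypothetical.

```python
# Special codes taken from the text.
SPECIAL_CODES = {"Campo-em-Branco": -1.0, "Ignorado": -0.5, "Errado": 0.0}


def encode(value, risk_order):
    """Map a category string to its numeric code.

    risk_order lists the regular categories from lowest risk (code 1) to
    highest risk (at most code 9).
    """
    if value in SPECIAL_CODES:
        return SPECIAL_CODES[value]
    if len(risk_order) > 9:
        raise ValueError("at most 9 regular categories are supported")
    return float(risk_order.index(value) + 1)


if __name__ == "__main__":
    # Hypothetical ordering for a gestation-length attribute.
    order = ["37 to 41 weeks", "32 to 36 weeks", "less than 32 weeks"]
    print([encode(v, order) for v in ["Ignorado", "37 to 41 weeks", "less than 32 weeks"]])
```

A value that is neither a special code nor listed in risk_order raises a ValueError, which makes unexpected categories visible instead of silently encoding them.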

Figure 2: Example of a Test procedure using KFold with CrossValidation - Classified by MLP
and KFold = 5
3.2 Analysis and Tests

In order to find a more adequate model for the problem of predicting infant death, experiments were
performed with the algorithms K Nearest Neighbor (KNN), Naive Bayes (NB), Support Vector
Machine (SVM) and Artificial Neural Networks (ANN), using scripts in a trial version of Matlab. Some of
these algorithms are described as follows:
(i) K Nearest Neighbor (KNN): It calculates the similarity between the
record to be analyzed and the records of the dataset in order to estimate the class of the record
that was presented as input. When a new record needs to be classified, it is compared to all
training data records to identify the k closest neighbors according to some selected metric.
One of the most used is the Euclidean distance, which is the
distance between two points measured by the straight line that connects them
(Equation 5).
$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \tag{5}$$
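The KNN procedure described above can be sketched as follows. This is a minimal illustration with majority voting and Euclidean distance, not the Matlab script used in the experiments; the function names and the toy data are illustrative.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train, query, k=3):
    """train is a list of (features, label); return the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy records, invented for illustration only.
    train = [((1, 1), "NO"), ((1, 2), "NO"), ((2, 1), "NO"), ((8, 8), "YES"), ((9, 8), "YES")]
    print(knn_predict(train, (1.5, 1.5), k=3))
```

Because every query is compared with all training records, the cost of one prediction grows linearly with the size of the training set.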
(ii) Artificial Neural Networks: In Equation 6, x_i represents the input signals (data on
the problem), while the synaptic weights are represented by w_i (responsible for weighting the
input signals according to their level of importance), and Σ represents the aggregating function.
Θ is the activation threshold, a constant responsible for allowing or not the passage
of the signal; u represents the activation potential [29].
$$u = \sum_{i=1}^{n} w_i \, x_i + \Theta \tag{6}$$
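For a single neuron, the activation potential is a weighted sum of the inputs plus the threshold term. The short sketch below computes it directly; the function name and the sample values are illustrative.

```python
def activation_potential(inputs, weights, theta):
    """Return u = sum_i(w_i * x_i) + theta for one neuron."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(w * x for w, x in zip(weights, inputs)) + theta


if __name__ == "__main__":
    # Two inputs with illustrative weights and threshold.
    print(activation_potential([1.0, 2.0], [0.5, -1.0], 0.25))
```

In a multilayer network this value is then passed through the activation function of the neuron, and the result feeds the next layer.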
For this classifier, K-Fold cross-validation was adopted (Figure 2), because the
stratification allows tests using different parameters to determine the best learning rate and
number of neurons.
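The splitting behind K-Fold cross-validation can be sketched as below. This is a plain shuffled split, not a stratified one, and it is not the Matlab routine used in the experiments; the function name and the seed parameter are illustrative.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds of nearly equal size
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(10, k=5):
        print(sorted(test), len(train))
```

Each record is used for testing exactly once, so a candidate learning rate or number of neurons can be scored by averaging the metric over the k test folds.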
(iii) Naive Bayes: The Naive Bayes algorithm is based on probability theory and assumes that
attributes influence the class independently. During model creation, the classifier constructs
a table showing how much each category of each attribute contributes to each class. In
Equation 7, C represents the class and $e = \{A_1 = a_1, \ldots, A_n = a_n\}$ are the attribute
values. The tests show that the Naive Bayes classifier is the most suitable for this purpose,
presenting good results with an area under the ROC curve of 92.1%.
$$P(A_1, \ldots, A_n, C) = P(C) \prod_{i=1}^{n} P(A_i \mid C) \qquad (7)$$
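The table of per-class attribute counts and the product in Equation 7 can be illustrated with the following sketch. It estimates probabilities from raw frequencies without smoothing, which is a simplification; the attribute names (birth weight, number of prenatal visits), the toy records and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(records, labels):
    """Count how often each attribute value occurs within each class."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def joint_probability(model, record, label):
    """Estimate P(C) * prod_i P(A_i = a_i | C) from the frequency tables."""
    class_counts, value_counts = model
    probability = class_counts[label] / sum(class_counts.values())
    for i, value in enumerate(record):
        probability *= value_counts[(i, label)][value] / class_counts[label]
    return probability


def classify(model, record):
    """Return the class with the highest joint probability for the record."""
    class_counts, _ = model
    return max(class_counts, key=lambda label: joint_probability(model, record, label))


# Toy data (hypothetical): (birth weight, number of prenatal visits).
records = [("low", "few"), ("low", "few"), ("low", "many"),
           ("normal", "many"), ("normal", "many"), ("normal", "few")]
labels = ["dead", "dead", "dead", "live", "live", "live"]
model = train_naive_bayes(records, labels)
predicted = classify(model, ("low", "few"))
```

In practice a smoothing term is usually added to the counts so that an attribute value never seen with a class does not force the whole product to zero.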
3.3 Evaluation and Results

In the first experiment, the Naive Bayes algorithm obtained the best results, presenting the
highest area under the ROC curve among the algorithms compared.
Table 1 refers to the experiment with balanced data, in which the Spread Subsample
algorithm was used to balance the classes of the data sample by means of random
subsampling. This equalizes the number of individuals in the living and dead classes at
1,182 instances each. Even with the change in the number of instances per class, the Naive
Bayes algorithm continued to have the best result, with the largest area under the ROC
curve.
Algorithm       Precision   Recall   F-Measure   ROC
KNN             0.895       0.830    0.861       0.888
Naive Bayes     0.921       0.809    0.861       0.924
V. Perceptron   0.900       0.838    0.868       0.875
MLPerceptron    0.837       0.821    0.829       0.898
Table 1: Experiment with balanced data
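The balancing by random subsampling described for this experiment can be sketched as follows. This is an illustrative plain-Python version and not the Spread Subsample implementation itself; the function name, the seed and the size of the majority class (5,000) are assumptions, while the 1,182 instances of the minority class come from the text.

```python
import random
from collections import Counter, defaultdict


def random_undersample(records, labels, seed=0):
    """Randomly reduce every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for record, label in zip(records, labels):
        by_class[label].append(record)
    target = min(len(group) for group in by_class.values())
    balanced_records, balanced_labels = [], []
    for label, group in by_class.items():
        # Draw `target` records without replacement from each class.
        for record in rng.sample(group, target):
            balanced_records.append(record)
            balanced_labels.append(label)
    return balanced_records, balanced_labels


# 1,182 "dead" records as in the text; the 5,000 "live" records are hypothetical.
records = list(range(1182 + 5000))
labels = ["dead"] * 1182 + ["live"] * 5000
balanced_records, balanced_labels = random_undersample(records, labels)
class_sizes = Counter(balanced_labels)
```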
Because the Naive Bayes algorithm obtained the largest area under the ROC curve in the
experiment, a more detailed analysis of its results was carried out. Table 2 shows the
Confusion Matrix of the Naive Bayes algorithm. Naive Bayes correctly classified 2,056
children (86.97%), which correspond to the main diagonal of the table below (956 + 1,100).
The remaining 308 children (13.03%) were classified incorrectly (the other diagonal:
82 + 226).
                     Predicted Class
                     Dead     Live
Real Class   Dead    956      226
             Live    82       1100
Table 2: Confusion Matrix - Naive Bayes
Among the 308 children who were misclassified, 82 (3.47%) were false positives and 226
(9.56%) were false negatives. Of the 2,056 children who were correctly classified, 956 (40.44%)
are true positives and 1,100 (46.53%) are true negatives. The 956 true positives are children
who suffered infant death and were classified as such, while the 82 false positives did not
suffer infant death but were classified as having died.
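The figures above can be checked directly from the four cells of Table 2. The short sketch below treats "dead" as the positive class, as the text does; the variable names are chosen for this example.

```python
# Cells of the confusion matrix in Table 2 ("dead" is the positive class).
true_positives = 956    # real dead, predicted dead
false_negatives = 226   # real dead, predicted live
false_positives = 82    # real live, predicted dead
true_negatives = 1100   # real live, predicted live

total = true_positives + false_negatives + false_positives + true_negatives

accuracy = (true_positives + true_negatives) / total
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
f_measure = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.3f} "
      f"recall={recall:.3f} f_measure={f_measure:.3f}")
```

Rounded to three decimals, precision, recall and F-measure come out as 0.921, 0.809 and 0.861, which agrees with the Naive Bayes row of Table 1.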

3.4 Application
As mentioned before, the objective of this paper is to apply classification algorithms to the
analysis of the attributes in order to estimate the chance of a newborn dying. The tools
used make it possible to search for results based on precision after some training.
After this process of analysis and comparison between the algorithms, we chose the most
efficient classification algorithm for the domain studied. The Naive Bayes classifier is the
one that best fits the dataset analyzed, so the model generated by this algorithm was used
to classify the risk of death for new patients.
4 Conclusions and Future Work

The proposal presented in this paper adds value to the GISSA alerts, providing them with an
intelligent mechanism based on classifiers. Thus, in addition to the important warnings it
already produces, GISSA is able to provide the health manager with the probability of death
of a newborn, computed from information about the pregnant woman and the newborn itself.
The decision maker may therefore prioritize more urgent cases and, consequently, mitigate
the serious problem of infant mortality.
As future work, we intend to apply the methodology used in the present work to the
integration of the SINASC and e-SUS databases, and to run tests with other tools such as the
R language and scikit-learn (Python) in order to compare the performance of the tools
themselves. We also expect to combine classification and heuristics with an ontology (under
development by another team) to fit specific classes of problems. These experiments will make
it possible to develop a hybrid mechanism to be added to GISSA.
Acknowledgment
The authors would like to thank PRPI / IFCE and FUNCAP for the support they received via the
Program for Productivity in Research, Incentives for Interiorization and Technological
Innovation.
References
[1] Rakesh Agrawal, Ramakrishnan Srikant, et al. Fast algorithms for mining association rules.
In Proc. 20th int. conf. very large data bases (VLDB), September 12-15, Santiago, Chile,
volume 1215, pages 487–499, 1994.
[2] David W. Aha, Dennis Kibler, and Marc K. Albert. Instance-based learning algorithms.
Machine Learning, 6(1):37–66, 1991.
[3] L. O. M. Andrade, M. Oliveira, and Ronaldo Ramos. Projeto GISSA: META FÍSICA 3 –
atividade 3.1 Definir modelo de inteligência de gestão na saúde.
https://amauroboliveira.files.wordpress.com/2015/11/
2015-nov30-meta-3-ativ-1-moldelointeligc3aanciagestc3a3o-draf-1-0.pdf, 2015. [Online;
accessed 30-September-2016].

[4] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression
Trees. Wadsworth International Group, Belmont, CA, 1984.
[5] Leo Breiman. Random forests. Mach. Learn., 45(1):5–32, October 2001.
[6] Pauline Cristine da Silva Cavalcanti, Garibaldi Dantas Gurgel Junior, Ana Ribeiro de
Vasconcelos, and André Vinicius Pires Guerrero. Um modelo da Rede Cegonha,
12 2013.
[7] William W Cohen. Fast effective rule induction. In Proceedings of the twelfth international
conference on machine learning, July 9—12, Tahoe City, CA, USA, pages 115–123, 1995.
[8] CIA World Factbook. Noruega taxa de mortalidade infantil. http://www.indexmundi.com/pt/
noruega/taxa_de_mortalidade_infantil.html/, last viewed: July 17 2017, 2015.
[9] Eibe Frank, Mark Hall, and Ian Witten. Online appendix for "Data Mining: Practical Machine
Learning Tools and Techniques". Morgan Kaufmann, 5th edition, 2016.
[10] Eibe Frank and Ian H. Witten. Generating accurate rule sets without global optimization. In
Proceedings of the Fifteenth International Conference on Machine Learning (ICML ’98),
pages 144–151, San Francisco, CA, USA, 1998. Morgan Kaufmann Publishers Inc.
[11] Renato Freitas, Cleilton Lima, Oton Braga, Gabriel Lopes, Odorico Monteiro, and Mauro
Oliveira. Using linked data in the integration of data for maternal and infant death risk of the
SUS in the GISSA project. In Proceedings of the 23rd Brazilian Symposium on Multimedia
and the Web (WebMedia ’17), October 17–20, Gramado, RS, Brazil. ACM, 2017.
[12] Yoav Freund, Robert E Schapire, et al. Experiments with a new boosting algorithm. In icml,
volume 96, pages 148–156, 1996.
[13] Jerome Friedman, Trevor Hastie, Robert Tibshirani, et al. Additive logistic regression: a
statistical view of boosting (with discussion and a rejoinder by the authors). The annals of
statistics, 28(2):337–407, 2000.
[14] G1. Taxa de mortalidade infantil no Ceará. http://g1.globo.com/ceara/noticia/2016/08/
ceara-reduz-mortalidade-infantil-materna-e-fetal, last viewed: PREENCHER, 2016.
[15] Leonardo M Gardini, Reinaldo Braga, Jose Bringel, Carina Oliveira, Rossana Andrade,
Hervé Martin, Luiz OM Andrade, and Mauro Oliveira. Clariisa, a context-aware framework
based on geolocation for a health care governance system. In IEEE 15th International
Conference on eHealth Networking, Applications & Services (Healthcom), October 9-12,
Lisbon, Portugal, pages 334–339. IEEE, 2013.
[16] George H John and Pat Langley. Estimating continuous distributions in bayesian classifiers.
In Proceedings of the Eleventh conference on Uncertainty in artificial intelligence, August
18-20, Montreal, QU, Canada, pages 338–345. Morgan Kaufmann Publishers Inc., 1995.
[17] Z Markos, F Doyore, M Yifiru, and J Haidar. Predicting under nutrition status of under-five
children using data mining techniques: The case of 2011 ethiopian demographic and health
survey. J Health Med Inform, 5:152, 2014.
[18] M. W. L. Moreira, J. J. P. C. Rodrigues, A. M. B. Oliveira, R. F. Ramos, and K. Saleem. A
preeclampsia diagnosis approach using bayesian networks. In 2016 IEEE International
Conference on Communications (ICC), pages 1–5, May 2016.

[19] M. W. L. Moreira, J. J. P. C. Rodrigues, A. M. B. Oliveira, K. Saleem, and A. Neto.
Performance evaluation of predictive classifiers for pregnancy care. In 2016 IEEE Global
Communications Conference (GLOBECOM), pages 1–6, Dec 2016.
[20] Mauro Oliveira, Carlos Hairon, Odorico Andrade, Regis Moura, Claude Sicotte, J-L Denis,
Stenio Fernandes, Jerome Gensel, Jose Bringel, and Hervé Martin. A context-aware
framework for health care governance decision-making systems: A model based on the
brazilian digital tv, 2010.
[21] ONU. Onu: Meta global de mortalidade infantil será atingida com atraso de 11 anos.
http://www.bbc.com/portuguese/noticias/2014/09/140916_unicef_meta_mortalidade_infantil_rm, last
viewed: July 22 2017, 2014.
[22] ONU. Onu afirma que taxa de mortalidade infantil no mundo caiu pela metade em 25 anos.
http://www.uai.com.br/app/noticia/saude/2015/09/09/noticias-saude,187094/onu-afirma-que-taxa-de-mortalidade-infantil-no-mundo-caiu-pela-metade,
last viewed: July 17 2017, 2015.
[23] ONU. Taxa de mortalidade infantil no Brasil cai 77% em 22 anos, diz ONU.
https://istoe.com.br/324257_TAXA+DE+MORTALIDADE+INFANTIL+NO+BRASIL+CAI+77+EM+22+ANOS+DIZ+ONU/,
last viewed: June 28 2017, 2015.
[24] John C. Platt. Advances in kernel methods. chapter Fast Training of Support Vector
Machines Using Sequential Minimal Optimization, pages 185–208. MIT Press, Cambridge,
MA, USA, 1999.
[25] J Ross Quinlan. C4.5: programs for machine learning. Elsevier, 2014.
[26] Ronaldo F. Ramos, César L. C. Mattos, Amauri H. Souza Júnior, Ajalmar R. Rocha Neto,
Guilherme A. Barreto, Hélio A. Mazzal, and Márcio O. Mota. Heart diseases prediction
using data from health assurance systems. In Models and Methods for Supporting Decision-
Making in Human Health and Environment Protection. Nova Publishers, New York, NY,
USA, 2016.
[27] Raul Robu and Ştefan Holban. The analysis and classification of birth data. Acta
Polytechnica Hungarica, 12(4), 2015.
[28] Cláudio Jesus Rosa. Aplicação de KDD nos dados dos sistemas SIM e SINASC em
busca de padrões descritivos de óbito infantil no município do Rio de Janeiro, 2015.
[29] IN da Silva, Danilo Hernane Spatti, and Rogério Andrade Flauzino. Redes neurais
artificiais para engenharia e ciências aplicadas. São Paulo: Artliber, pages 33–111, 2010.

ANNEX III
EXERCISE LIST 1
1.a) The videos show new students a small sample of what a Master's degree is. One of
the first requirements is good English, because it is essential for your research and for making
sure that what you intend to study does not already exist. Also, a Master in Computer Science
is expected to know several topics of their own area.
Dedication and sacrifice are essential during the time you spend in the Master's program.
A good article has the following structure:
- Concise title, with no emotion
- Abstract (in the language of the article and in English)
- Introduction
- Theoretical Foundation
- Related Work
- Contribution
- Conclusion
Whatever the subject of your dissertation, what matters is understanding that your
contribution is a reflection, regardless of whether it is software or anything else.
1.b) A Master in Computer Science is someone who learns to have knowledge of different
topics of their own area. He/she knows about several subjects connected to that area and
should be prepared to talk or teach about any of them, even if at a superficial level.
2) The article raises the question of how a little (or a billionaire's) benevolence can make a
difference in people's lives. Most people always act expecting something in return, or live their
lives with the Self as the focus. I see this as normal for human beings. People who stand out
for their benevolence are always rare and, in the cases seen, it does not matter whether they
are rich or poor: they always end up making a difference.
3) In this video Tim Berners-Lee raises the need for integrating data between different
systems. This need leads to the term Linked Data, which brings information together from
large amounts of data. An example is the Hans Rosling video, from 2002, where he shows
several important pieces of information from data collected from UN databases. This data can
lead to important decisions or studies about the behavior of nations over at least the last 35 years.
4) Theory: collection of data, information and knowledge to talk about any subject.
Knowledge is a set of rules to generate information from information.
The way data should be treated has evolved. Over the years, new concepts were added to
the conception of data analysis.
Organizing data takes time, and it is essential that the data be understood in order to
improve knowledge of how you want to see it. Everyone should be brought into the process.
Then you start to see how the work is going, relate the inputs and outcomes to the daily
tasks, and share the information with collaborators so they know where to go.
Module 2 presents some semantic tools that give meaning to data. For this, there is a need
for a unified language, as in Health, where there are standards to unify codes and
knowledge.
5.a) This article proposes ways to bring health care into homes through common
equipment that exists in the houses of most of the population. With the help of sensors,
communication could be facilitated for monitoring, treatment and alerts regarding people's
health.
5.b) This article presents a more specific use of home care. It shows a more technical use
of the collection of people's health data to improve monitoring and decision making.
6) This presentation discusses how decisions about health systems can be optimized, and
how processes in the Information Era also help to decrease costs. The project proposes a
software platform that supports decision-making in public health system scenarios.
Data is everywhere, but data without organization is nothing. Combining the collected data
with knowledge, you are able to generate information that supports decision making. It is
important to have everyone working toward the same objective, each area with its own
responsibility.
The use of ontologies over this data gives different standards semantic interoperability,
which helps professionals understand the big picture of the patient's situation.
The slides also show other approaches using the Lariisa platform. As said before,
communication between all the areas is important.

1) -
2)
3) On TED.com, watch the second video with Tim Berners-Lee on "Big Data"
This video shows how useful raw data can be. Data is everywhere, and the most
important thing is to learn how to turn this data into information. The examples show
important information for different types of situations, even racism. Data is free, and can be
used to improve your personal daily actions, or by interested governments to take objective
actions for the population.
4) Electronic Health Record Components, Evidence-based Practice

In health systems, data is analyzed to provide evidence and enable faster clinical
treatment using a single source of information. In cases of disease outbreaks and disasters,
it can also be essential for the first care of the patient. Another important factor is not only
the data analysis but also having the same data available everywhere, because the health care
team can follow the evolution of the patient's health and the patient can access the
information at any time.
For this to happen globally, standards are needed so that IT people can develop systems
with the system itself as their only concern, while the terminologies come from the entities
responsible for each health category.
Evidence is important to provide studies with specific data and information that come
from scientific literature. Altogether, it is essential that the different professionals get
together and work on tools that consider all four key points below:
1. Evidence and EBP can be interprofessional
2. EBP Resources
3. EHRs are EBP Platforms
4. Semantic tools are essential for the healthcare quality agenda
ii. Quality Improvement / Workflow Analysis / Redesign
5) 2012: “LARIISA 2.0, A Platform for Data Integration of Public Health System
in Cloud Computing Environment”. 2012 – ADVANCE. PUBLICATION TITLE:
International Workshop on ADVANCEs in ICT Infrastructures and Services,
2012, Aracati (Ce).
This article raises questions to be asked before setting up a very large health system. For
this system to work, Linked Data is important given the amount of data present in everyone's
life. This data, in this case health data, must be analyzed to become important information
about the population's health. As a start, "all data having their description defined by a
common vocabulary stated in a domain ontology" is called open data, and this is important for
large-scale integration between systems. The suggested framework is essential for the several
spheres of health system administration to properly manage the decisions to be made. This
will surely avoid the inefficient use of public resources regarding what, when and where to
use them.

2013: “CLARIISA, a Context-aware Framework based on Geolocalization for a Health
Governance System”. (2013 – IEEE HEALTHCOM. PUBLICATION TITLE: 15th
International Conference on eHealth Networking, Application & Services, October 9-12,
2013, Lisboa, Portugal.)
All the work being done with Lariisa in this article can be seen as part of a much larger
context. The article illustrates one of the ways to gather information, process it and make
decisions about the health of the patients (users). It shows how the context returned by the
mobile device is important for taking precautions based on the user's history, location and
other data collected about the environment.
6) a) MARCIA and DENGOSA propose a fight against dengue, but the MARCIA
structure shows that its application can be much broader, serving as an
electronic medical record for any type of medical care. With OpenEHR as a
centralizer, it is possible to define a common language among health systems
that want to be interoperable so that they can provide their data for other types
of queries.
• Search for standards that allow interoperability between systems
• Standardization of electronic medical records
• Use of a platform following defined standards within a server ready for a
technical language that assists in the standardization of health terms.
7)

1. -
2.
3. In this video, Tim Berners-Lee shows his concern about how the web will grow.
He proposes a Magna Carta for the web, to protect the web's wide-open
spaces. It is up to users to fight for the right to access and openness.
4.
5.
6. Lariisa HomeCare suggested the use of a TV/set-top box for home care. The
proposal started from the predicted number of people who would be using
digital TVs, so that home care could serve many different people, and from the
increase in the number of elderly people. With this family-centered stage it
would be possible to provide high-quality healthcare services. It would also
provide information that health managers could use for intelligent
governance.

1. -
2. Uma Escola Pra Valer... em tempos de Google!
a. Teaching is a task that requires constant updating. Even today,
there are teachers teaching as it was done in the 80's. Today we must tune
in to the students and take the class to another level. Information has
changed place, and the teacher is the one who should show the student
all the ways that information can be used.
3. The Internet of Things (IoT) takes up more space in our lives every day, from
the moment you wake up to the time you go to sleep, in several pieces of
equipment used daily. Something so present in our lives could be used to make
people's lives better. Thinking about how to use it, we can automate repeated
tasks, use it to issue warnings and use it to save people.
With the aging of the population, more and more elderly people will need this
care. IoT equipment can help in tasks such as continuous monitoring and
sending an SOS message to someone when a person's vital signs deteriorate,
or in cases of falls, where the elderly person is no longer able to walk.
One example is a fall communicator. Suppose an elderly person lives alone and
does all the household tasks alone. If he/she is wearing a bracelet or any
other type of wearable gadget able to detect the pattern of a fall using
internal sensors, the equipment can send an automatic message to a
registered person who could help.
4. Games are a structured form of play, usually undertaken for enjoyment and
sometimes used as an educational tool (Wikipedia). As educational tools, they
are used to teach something. Simulation is the way a game can recreate
situations through mathematical methods. These games have the purpose of
increasing knowledge through training in several aspects. For that, you must be
specific about what the game will be about, giving it a good definition. Knowing
its goal, we use informatics to construct a simulation that brings reality into a
controlled environment, where the computer puts the nurse in a simulation of
reality. These games must follow a script that measures the knowledge of
the nurse while he/she interacts with the simulation. This allows training
before real life, with the goal of teaching the subject before training on real
people.
Information Ethics deals with how patient data must be treated, with respect
for information ownership and privacy. It examines the morality that comes
from information as a resource, a product, or a target. To understand it better,
one must reflect on customs and traditions. It must be studied to
activate the sense of responsibility regarding the consequences of individual
and collective interactions in the information field, to improve the qualification
for intercultural dialogue based on the recognition of different kinds of
information cultures and values, and to provide basic knowledge about ethical
theories and concepts and about their relevance in everyday information work.
Everyone involved must be trained in ethics.
5. a) not done
b) Data is everywhere. Using this data to provide information is important
for decision making in several areas. This article uses such data to provide
information about infant mortality. The information generated is relevant to
preventing this problem: Machine Learning techniques train different
algorithms on the data and produce probabilities that help doctors decide
which cases should be taken care of first.
6. Lariisa HomeCare suggested the use of a TV/set-top box for home care. The
proposal started from the predicted number of people who would be using
digital TVs, so that home care could serve many different people, and from the
increase in the number of elderly people. With this family-centered stage it
would be possible to provide high-quality healthcare services. It would also
provide information that health managers could use for intelligent
governance.
7. a) Fast Healthcare Interoperability Resources (FHIR) Architecture
A library for use in agile web-based and mobile device development. FHIR's
primary purpose is to address interoperability with well-structured, expressive
data models and simple, efficient data exchange mechanisms. In addition, FHIR
adopts as architectural principles reuse and composability, scalability,
performance, usability, data fidelity and implementability. FHIR is designed for
the web, with its resources based on simple XML or JSON structures and
an HTTP-based RESTful protocol in which each resource has a predictable URL.
b) The primary intention of FHIR is to solve system-to-system (B2B) and
system-to-application (B2C) communications, without making assumptions
about the systems. FHIR is defined in terms of a library of 'resources'. There is
some level of 'clinical modelling' going on, but it is inside the FHIR XML master
formalism. The resources are designed to be instantiated and accessed over
REST APIs and, where possible, open internet standards are used for data
representation.
In the case of openEHR, the primary goal is to solve the patient data challenge:
long-lived, versioned, distributed and computable patient records. openEHR
uses a standard Reference Model, with various layers of models
on top, including archetypes, templates, and terminology subsets. The
openEHR Reference Model is standard for the whole world, and all openEHR
data, no matter where they are, obey it. Both are implementation oriented: both
groups pursue their respective strategies to implement solutions that work, they
just do it differently. One of the differences is at what point (what level of models)
'implementers' become directly involved. In FHIR, it is 'always': FHIR is
designed to be for and by application developers.
openEHR takes a different approach: separation of domain semantics from
the definition of the concrete means of communicating them, which necessarily
means that how and when developers and clinicians are involved is different
from the FHIR approach.
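To make the "predictable URL" and JSON structure mentioned in item 7 a) concrete, the sketch below builds the address of a resource in the form base/type/id and serializes a minimal Patient resource. The server address, the patient data and the helper name resource_url are hypothetical; only the resourceType, id, name and birthDate fields of the FHIR Patient resource are used.

```python
import json


def resource_url(base_url, resource_type, resource_id):
    """Build the address of a FHIR resource as [base]/[type]/[id]."""
    return f"{base_url.rstrip('/')}/{resource_type}/{resource_id}"


# Minimal Patient resource (hypothetical data).
patient = {
    "resourceType": "Patient",
    "id": "example",
    "name": [{"family": "Silva", "given": ["Maria"]}],
    "birthDate": "2017-08-01",
}

# Hypothetical server base address, used only for illustration.
url = resource_url("https://fhir.example.org/base", "Patient", patient["id"])
payload = json.dumps(patient, indent=2)
```

A client would read this resource with an HTTP GET on `url` and update it with an HTTP PUT carrying `payload`, which is what makes the exchange simple for web and mobile developers.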
