PERSONAL INTERESTS
Astronomy
Digital
Cloud Computing Data Center
Data Warehousing Technology
Healthcare Oncology Data Science
See into the present
• Objects: x-rays, gamma-rays, magnetic resonance, ultrasounds, laser light…
• Quantum mechanics: photon, electron, magnetism…
See into the present
• Objects: tumors, molecules, proteins, hormones, enzymes, biomarkers, amino acids…
• Quantum chemistry: biomolecular modeling, chemical energy, spectra analysis…
See into the present
• Objects: tumors, cells, chromosomes, genes, DNA, enzyme, hormone, antibody…
• Quantum biology: DNA mutation, cellular respiration…
ONCOANALYTICS
See into the future
Quantum Physics and Artificial Intelligence
• Objects: data, bits, qubits
• Quantum computer: artificial intelligence, algorithms, cryptography, search, simulation, linear equations, prediction, recommendation, risk…
Infinite Analytics Use Cases
Computer vision
Where are we heading in the fight against cancer?
WHAT ARE CANCER AND ONCOLOGY?
• The primary tumor is where a cancer starts
G3: Poorly differentiated (high grade)
G4: Undifferentiated (high grade)
Cancer is the second leading cause of death in the world
Risk factors:
• Tobacco
• Alcohol
• Nutrition
• Certain classes of drugs
• Genome
• Physical inactivity
• Radio frequencies
• Phytosanitary products
• Pollution
• Radioactivity
• Skin exposure (sun, ultraviolet light)
Source: American Institute for Cancer Research
Diagnostics
Treatments
Supportive care
• Psychiatry
• Psychology
• Sophrology, meditation, mindfulness, wellness
• Dietetics
• Speech therapy
• Kinesitherapy
• Spiritual care
• Rehabilitation
• Social work
• Volunteering
• Personalized medicine
The vision that one day everyone will be offered customized care, with treatments matched to their genomic profile and personal history
• Immunotherapy drugs
Immunotherapy is the fruition of a century-old idea: that a person’s own immune system can be stimulated to fight cancer
• Cell-based therapies
A patient’s own T cells are directly manipulated to attack cancer cells more readily. In this treatment, T cells are collected from the patient’s blood, genetically engineered to recognize certain proteins on cancer cells, and infused back into the patient’s bloodstream
• Epigenetic therapies
Cancer could be treated differently, by transforming cancer cells back to normal rather than destroying them. CRISPR technology can be used to alter DNA sequences and modify gene function. The protein Cas9 (CRISPR-associated protein 9) is an enzyme that acts like a pair of molecular scissors, capable of cutting strands of DNA
• Battling metastases
Metastatic tumor cells have a remarkable tendency to cling to blood vessels, a survival mechanism that might be important for the spread of many types of cancer
ADVANCED ANALYTICS
Figure: value versus maturity of analytics, from lowest to highest
• DESCRIPTIVE ANALYTICS: What happened?
• DIAGNOSTIC ANALYTICS: Why did it happen?
• PREDICTIVE ANALYTICS: What could happen?
• PRESCRIPTIVE ANALYTICS: What should be done?
ARTIFICIAL INTELLIGENCE
Artificial intelligence makes it possible for machines to learn from experience, adjust to new inputs and perform human-like tasks. Computers can be trained to accomplish specific tasks by processing large amounts of data and recognizing patterns in it
Big Data for OncoAnalytics
• Volume: extensive use of precision medicine and a big data explosion in cancer care, especially as genomic and environmental data become more ubiquitous
• Variety: great data variety, combining traditional clinical and administrative data, unstructured data (genomics, imaging, text…), socioeconomic data and social data
• Velocity: rapidly increasing speed at which new data is created by technological advances, and the corresponding need to integrate and analyze that data in near real time
• Veracity: good data quality. The data source is authoritative, with privacy and data protection safeguards. Data are regularly updated, unambiguous, complete, and easy to find, understand and use
• Value: significant advances in using data to improve diagnosis, treatment plans and patient outcomes, better prioritize resources and lower costs
• Gets personalized guidance on treatment decisions by matching each patient’s care against quality standards and data from patients like theirs
• Receives new insight for discovery through access to a massive body of de-identified patient care data to analyze patterns
• Has access to metrics and tools that support high-quality, efficient care and cost control
• Has access to metrics and tools that support high-quality, efficient data and IT cost control
• De-identification: patient identification code numbers are replaced by unique random code numbers, creating a de-identified dataset. It is a reversible process.
• Anonymization: destroys all links between the de-identified dataset and the original dataset. It is a non-reversible process.
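A minimal Python sketch of the two approaches, assuming records are dictionaries keyed by a hypothetical `patient_code` field. `deidentify` swaps each code for a unique random code and keeps the mapping, so `reidentify` can reverse it; `anonymize` performs only the link-destroying step described above, by discarding that mapping. Real anonymization must also handle indirect identifiers (dates, locations, rare diagnoses), which this sketch does not attempt.

```python
import secrets


def deidentify(records, id_field="patient_code"):
    """Replace each original code with a unique random code (reversible via the mapping)."""
    mapping = {}  # original code -> random code
    used = set()
    result = []
    for record in records:
        original = record[id_field]
        if original not in mapping:
            code = secrets.token_hex(8)
            while code in used:  # keep the random codes unique
                code = secrets.token_hex(8)
            used.add(code)
            mapping[original] = code
        # Copy the record so the source dataset is left untouched
        result.append({**record, id_field: mapping[original]})
    return result, mapping


def reidentify(records, mapping, id_field="patient_code"):
    """Reverse de-identification: only possible while the mapping still exists."""
    reverse = {code: original for original, code in mapping.items()}
    return [{**record, id_field: reverse[record[id_field]]} for record in records]


def anonymize(deidentified_records, mapping):
    """Destroy the link to the original dataset by discarding the mapping."""
    mapping.clear()
    return [dict(record) for record in deidentified_records]


if __name__ == "__main__":
    patients = [
        {"patient_code": "A-001", "tumor_site": "lung"},
        {"patient_code": "A-002", "tumor_site": "breast"},
    ]
    deid, link = deidentify(patients)
    assert reidentify(deid, link) == patients  # reversible while the link exists
    anon = anonymize(deid, link)
    print(anon, link)  # the link table is now empty
```

The same patient always receives the same random code within one run, so records can still be joined inside the de-identified dataset.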
Integration Process
Identifiers to remove or replace:
• Names
• All geographic subdivisions (except country)
• All elements of dates (except year)
• Telephone numbers
• Fax numbers
• Email addresses
• Social security numbers
• Medical record numbers
• Health plan beneficiary numbers
• Account numbers
• Certificate/license numbers
• Vehicle identifiers and serial numbers
• Device identifiers and serial numbers
• URLs
• IP addresses
• Biometric identifiers, including fingerprints and voiceprints
• Full-face photographs and any comparable images
• Any other unique identifying number, characteristic, or code
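The removal rules listed above can be applied field by field. A minimal Python sketch, assuming records are dictionaries whose field names (`name`, `zip_code`, `birth_date` and so on) are hypothetical stand-ins for the identifier types above: direct identifiers are dropped, dates are reduced to their year, and country and clinical fields are kept. A fixed field list cannot catch "any other unique identifying number, characteristic, or code", so free text still needs a separate review.

```python
from datetime import date

# Hypothetical field names covering the identifier types listed above
DIRECT_IDENTIFIERS = {
    "name", "street_address", "city", "zip_code", "phone", "fax", "email",
    "ssn", "medical_record_number", "health_plan_number", "account_number",
    "license_number", "vehicle_id", "device_serial", "url", "ip_address",
    "biometric_id", "face_photo",
}
# Hypothetical date fields: only the year is kept
DATE_FIELDS = {"birth_date", "admission_date", "diagnosis_date", "death_date"}


def strip_identifiers(record: dict) -> dict:
    """Return a copy without direct identifiers and with dates reduced to years."""
    clean = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # drop the identifier entirely
        if field in DATE_FIELDS and isinstance(value, date):
            clean[field] = value.year  # keep the year, drop month and day
        else:
            clean[field] = value  # e.g. country and clinical fields are kept
    return clean


if __name__ == "__main__":
    patient = {
        "name": "Jane Doe",
        "city": "Lyon",
        "country": "France",
        "birth_date": date(1961, 3, 14),
        "tumor_site": "breast",
    }
    print(strip_identifiers(patient))
```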
Collecting data from various sources and detecting patterns and outliers with the help of guided advanced analytics and visual data navigation, thus enabling the consolidation of cellular, patient and population data
Figure: data domains (diagnostics, treatments, care support, coordination, clinical trials, expenses, investments, time, others)
Making use of patient data to generate insights, make decisions, increase revenues, enhance care coordination, minimize abuse and fraud, and save on costs
Figure: big data architecture layers (Governance, Analytics, Visualization, Big Data Warehouse, Storage, Big Data Infrastructure, Administration, Big Data Fabric, Servers)
• BIG DATA WAREHOUSE: mainly a technology for sets of both structured and unstructured data, which stands on the big data architecture
• BIG DATA FABRIC: a massively parallel, highly scalable system, centralizing all data at any scale with flexible software and an available architecture for massively parallel data processing on a network of lower-cost commodity hardware
• Descriptive analytics: using data aggregation and data mining to provide insight into the past
• Diagnostic analytics: using techniques such as drill-down, data mining and correlations
• Predictive analytics: using techniques such as statistics, predictive modeling and forecasting
• Prescriptive analytics: using techniques such as graph analysis, simulation, complex event processing, machine learning and neural networks
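The massively parallel processing idea can be sketched with a map/reduce pattern. This is an illustration under stated assumptions: each hypothetical partition stands for the records held by one commodity server, and threads from `concurrent.futures` stand in for the servers. A real big data fabric would run the same map and reduce steps across machines with a framework such as Hadoop MapReduce or Apache Spark.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical partitions: each list plays the role of the data stored on one node
PARTITIONS = [
    ["breast", "lung", "breast"],
    ["colon", "lung"],
    ["breast", "prostate", "lung", "lung"],
]


def map_partition(tumor_sites):
    """Map step: count tumor sites locally, next to the data."""
    return Counter(tumor_sites)


def merge_counts(left, right):
    """Reduce step: combine two partial counts."""
    return left + right


def count_sites(partitions, workers=3):
    """Run the map step on every partition concurrently, then reduce the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_partition, partitions))
    return reduce(merge_counts, partial_counts, Counter())


if __name__ == "__main__":
    print(count_sites(PARTITIONS).most_common())
```

Only the small partial counts travel between nodes, not the raw records, which is what lets this pattern scale on commodity hardware.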
Appliance architecture
Storage patterns
Phases
ETL
Data sources → Extract → Transform → Load → Database
• While data is being extracted, the transformation phase is executed and the data already received are prepared for loading. As soon as some data is ready to be loaded into the big data warehouse, loading kicks off without waiting for the previous phase to complete
ELT
Data sources → Extract → Load → Database (Transform in-database)
• While data is being extracted, the data already received are prepared for loading. As soon as some data is ready to be loaded into the big data warehouse, loading kicks off and the transformation is executed in-database, without waiting for the previous phase to complete
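The overlap between phases can be sketched with Python generators and an in-memory SQLite database. This is a hypothetical example: the source rows, table names and batch size of 2 are illustrative assumptions, and SQLite stands in for the big data warehouse. In the ETL run each row is transformed in Python as soon as it is extracted and every full batch is loaded immediately; in the ELT run raw rows are loaded into a staging table and each batch is then transformed with SQL inside the database. The `log` list records the order of operations, showing that loading starts before extraction has finished.

```python
import sqlite3

# Hypothetical source rows: (patient_code, tumor_site, size_mm as text)
SOURCE_ROWS = [
    ("P001", " Breast ", "12"),
    ("P002", "lung", "30"),
    ("P003", "COLON", "8"),
    ("P004", "breast", "21"),
]


def extract(rows, log):
    """Yield source rows one by one instead of reading everything first."""
    for row in rows:
        log.append(f"extract {row[0]}")
        yield row


def transform(rows, log):
    """ETL only: clean each row in Python as soon as it has been extracted."""
    for code, site, size in rows:
        log.append(f"transform {code}")
        yield code, site.strip().lower(), int(size)


def load(conn, table, rows, log, batch_size=2, after_flush=None):
    """Insert rows in small batches; a batch is written as soon as it is full."""
    batch = []

    def flush():
        log.append(f"load {len(batch)} into {table}")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", batch)
        batch.clear()
        if after_flush is not None:
            after_flush()

    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            flush()
    if batch:
        flush()


def run_etl(conn, log):
    conn.execute("CREATE TABLE tumors (code TEXT, site TEXT, size_mm INTEGER)")
    load(conn, "tumors", transform(extract(SOURCE_ROWS, log), log), log)


def run_elt(conn, log):
    conn.execute("CREATE TABLE raw_tumors (code TEXT, site TEXT, size_mm TEXT)")
    conn.execute("CREATE TABLE tumors (code TEXT, site TEXT, size_mm INTEGER)")

    def transform_in_database():
        # ELT: the database cleans the staged batch, then the staging table is emptied
        log.append("transform in-database")
        conn.execute(
            "INSERT INTO tumors "
            "SELECT code, lower(trim(site)), CAST(size_mm AS INTEGER) FROM raw_tumors"
        )
        conn.execute("DELETE FROM raw_tumors")

    load(conn, "raw_tumors", extract(SOURCE_ROWS, log), log,
         after_flush=transform_in_database)


if __name__ == "__main__":
    for run in (run_etl, run_elt):
        conn, log = sqlite3.connect(":memory:"), []
        run(conn, log)
        print(run.__name__, log)
        print(conn.execute("SELECT * FROM tumors ORDER BY code").fetchall())
```

Both runs produce the same cleaned table; the difference is only where the transformation runs (in the pipeline for ETL, inside the database for ELT).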
Administration
Figure: primary site and disaster recovery site linked by data replication; data backup and data restore to a storage array; monitoring and backup/restore
• Monitoring and scheduling system
• Hardware failure: disk or server crash, rack failure
• Data deletion, data corruption
• Site failure: disaster (fire, water, network, power…)
• Backup and restore management
Governance
• Maintaining a full audit history across all data in a single place
• Tracking, classifying and locating data to comply with governance and compliance rules
• Visualizing the upstream and downstream lineage of data to verify reliability
• Defining and automating complex data lifecycle activities with integrated metadata policies
• Verifying access privileges
• Searching metadata and visualizing lineage
• Encrypting or decrypting data

• Organization: relationship between doctors and researchers, people management and cost control
• Metadata: managing data about other data, generally referred to as content data (catalog, dictionary, taxonomy)
• Data security: data security management maintains data integrity and makes sure that data is neither accessible by unauthorized parties nor susceptible to corruption
• Data quality: the degree to which a set of data characteristics (completeness, validity, accuracy, consistency, availability and timeliness) fulfills requirements
• Master data management: processes, policies, standards and tools that consistently define and manage critical data to provide a single point of reference
• Data life cycle management: managing information throughout its lifecycle, from requirements through retirement, including data archiving and lineage
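Several of the data quality characteristics listed above can be measured automatically. A minimal Python sketch, assuming records are dictionaries with hypothetical fields (`patient_code`, `tumor_site`, `birth_date`, `diagnosis_date`, `updated`) and a hypothetical reference list of valid tumor sites. It scores completeness, validity, consistency and timeliness as the share of records passing each check; accuracy and availability need external references (source documents, uptime monitoring) and are left out.

```python
from datetime import date

REQUIRED_FIELDS = ("patient_code", "tumor_site", "diagnosis_date")
VALID_SITES = {"breast", "lung", "colon", "prostate"}  # hypothetical reference list


def _is_complete(record):
    # Every required field is present and not empty
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def _is_valid(record):
    # The tumor site belongs to the reference list
    return record.get("tumor_site") in VALID_SITES


def _is_consistent(record):
    # A patient cannot be diagnosed before being born (checked only when both dates exist)
    birth, diagnosis = record.get("birth_date"), record.get("diagnosis_date")
    return birth is None or diagnosis is None or birth <= diagnosis


def _is_timely(record, today, max_age_days):
    # The record was updated recently enough
    updated = record.get("updated")
    return updated is not None and (today - updated).days <= max_age_days


def quality_report(records, today, max_age_days=365):
    """Return, for each characteristic, the share of records that pass the check."""
    if not records:
        return {}
    checks = {
        "completeness": _is_complete,
        "validity": _is_valid,
        "consistency": _is_consistent,
        "timeliness": lambda r: _is_timely(r, today, max_age_days),
    }
    return {
        name: sum(1 for r in records if check(r)) / len(records)
        for name, check in checks.items()
    }


if __name__ == "__main__":
    sample = [
        {"patient_code": "X1", "tumor_site": "breast", "birth_date": date(1960, 1, 1),
         "diagnosis_date": date(2018, 1, 10), "updated": date(2018, 6, 1)},
        {"patient_code": "X2", "tumor_site": "lugn", "diagnosis_date": date(2018, 2, 2)},
    ]
    print(quality_report(sample, today=date(2018, 12, 31)))
```

Tracking these shares over time gives the governance team a simple indicator of whether the data still "fulfills requirements".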
• New tools and techniques are required to process all this information efficiently as more data sources emerge
• Just as there is no single cure-all for cancer, there is no single tool for data analytics
• Supercomputing power is required to rapidly process huge volumes of structured and unstructured data
• BIG DATA FABRIC: Big Data Fabric, Q2 2018
• BIG DATA ANALYTICS: Big Data Predictive Analytics and Machine Learning Solutions, Q3 2018
• BIG DATA WAREHOUSE: Big Data Warehouse, Q2 2017
• BIG DATA MANAGEMENT: Data Governance Stewardship and Discovery Providers, Q2 2017
Source: Forrester
• Runs Hadoop and delivers quick setup, higher performance and automation
• Helps overcome these issues by optimizing the infrastructure with automation, balanced system resources and integrated testing
• Runs the Hadoop framework
• Optionally uses Apache Spark and Storm
• Manages services
BIG DATA CLOUD PLATFORM
Big data fabric, big data warehouse, big data analytics and big data management integrated in a single OncoAnalytics Cloud Platform
Forrester Wave: Global Public Cloud Platforms for Enterprise Developers, Q3 2016
IBM, Flatiron, Varian
Technical Architecture
Figure: architecture layers (Governance, Analytics, Visualization, Big Data Warehouse, Storage, Big Data Infrastructure, Administration, Big Data Fabric, Servers) with Extraction, Transformation, Loading
• Data Integration software
• Data Analytics software
• Data Management software
• Software support
• Servers
• Storage
• InfiniBand & Ethernet Network
• Hardware support
MANAGEMENT
Figure: deployment architecture (labels: Administrator, Researcher, Doctor; Web Browser; public data, de-identified public data; primary site and secondary site; 40Gb InfiniBand, 10Gb Ethernet and 1 to 10Gb Ethernet links; Internet)
• Gain big data insight – understand the concepts of big data, its domains, needs, opportunities, market and social behavior
• Discuss with key people – doctors, researchers, managers…
• Identify new skills and competencies – data scientists, architects…
• Identify alliances – service providers, software and hardware vendors
• Build a case – there are few public proof points or metrics to leverage, so much of the case must be created from scratch. Focus on a single problem and only a handful of metrics
• Use internal data first – Electronic Medical Records already exist in the care center. Integrate external data as a second step
• Evangelize big data in financial and social terms – make an evangelization deck explaining how the care center will benefit from big data and the financial and social opportunities it creates. The objective is for clinicians to embrace it and include it in their plans. Make it friendly
• Identify a sponsor – this is the challenge with big data technology: look for someone dynamic who understands the stakes and believes that technology can drive competitive advantage
• Capture metrics and use them to tell a story – identify only a few metrics to measure and build a story around them; people will remember the story long after they forget the numbers in the case
• Emphasize the big data opportunity – some people can’t see big data, and it is hard to get passionate about abstract concepts. Visualize the problem and the opportunity: demonstrate a big data project and show what new results it will produce. A picture is worth a thousand words
IT Manager
Managing
• Planning
• Team organization and coordination
• Meeting with doctors, researchers, managers and IT people