Predictive Analytics
by Daniel D. Gutierrez
www.inside-bigdata.com | 508-259-8570 | Kevin@insideBigData.com
Predictive analytics for sales forecasting provides targeted, relevant predictive analytics to a broad
spectrum of business users to improve decision making. Image courtesy of TIBCO Spotfire.
Unsupervised learning is used to draw inferences from datasets consisting of input data without labeled responses. The most common unsupervised learning method is cluster analysis, which is used for exploratory data analysis to find hidden patterns or groupings in data.
• Clustering – Using unsupervised techniques like clustering, we can seek to understand the relationships between the variables or between the observations by determining whether observations fall into relatively distinct groups. For example, in a customer segmentation analysis we might observe multiple variables: gender, age, zip code, income, etc. Our belief may be that the customers fall into different groups, like frequent shoppers and infrequent shoppers. A classification analysis would be possible if the customer shopping history were available, but this is not the case in unsupervised learning: we don’t have response variables telling us whether a customer is a frequent shopper or not. Instead, we can attempt to cluster the customers on the basis of the variables in order to identify distinct customer groups.
Unsupervised statistical learning methods include k-means clustering, hierarchical clustering, principal component analysis, etc.
Clustering shows the relationships between the variables or observations by determining whether they fall into
relatively distinct groups. Image courtesy of TIBCO Spotfire.
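As a rough illustration of the customer segmentation idea, the sketch below runs a plain k-means clustering on a handful of made-up customers described by age and income. The data values, the function name kmeans, and the choice of two clusters are assumptions for illustration only; they do not come from the guide or from any particular product.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length numeric tuples."""
    # Deterministic start (an assumption of this sketch): first k points are the centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it was
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return labels, centroids

# Hypothetical customers: (age in years, annual income in thousands).
customers = [(23, 28), (45, 95), (25, 31), (48, 102), (22, 25), (51, 88)]

labels, centroids = kmeans(customers, k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> group", label)
```

No response variable is supplied anywhere: the groups emerge only from the observed variables. In practice the variables would usually be standardized first so that one with a large numeric range, such as income, does not dominate the distance calculation.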
Data discovery software offers a rich, interactive analytic interface for EDA including accessing and manipulating data,
and composing analyses. Image courtesy of TIBCO Spotfire.
Predictive Modeling
Using predictive analytics involves understanding and preparing the data, defining the predictive model, and following the predictive process. Predictive models can assume many shapes and sizes, depending on their complexity and the application for which they are designed. The first step is to understand what questions you are trying to answer for your organization. The level of detail and complexity of your questions will increase as you become more comfortable with the analytic process.
The most important steps in the predictive analytics process are as follows:
• Define the project outcomes and deliverables, state the scope of the effort, establish business objectives, and identify the data sets to be used.
• Undertake data collection and data understanding.
• Perform data munging – the process of inspecting, cleaning, and transforming the data.
• Utilize exploratory data analysis (EDA) – use graphical techniques with the objective of discovering useful information and arriving at conclusions. Apply statistics to validate assumptions and test hypotheses using standard statistical techniques.
• Apply modeling principles to provide the ability to automatically create accurate predictive models about the future.
• Evaluate the model to verify the robustness of the chosen model and make mid-course corrections. Test models on existing data and apply predictions to new data.
• Select a deployment option to open up the analytical results to everyday decision making and to get results by automating the decisions based on the modeling.
Each of the above steps can be considered iterative and may be revisited as needed. It should be noted that the data munging step is often very time-consuming, depending on the cleanliness of the incoming data, and can take up to 70% of the overall project timeline.
The Predictive Analytics Process
• Define the problem to be solved: State goals and business objectives, and identify data sets.
• Data collection: Perform data collection and data understanding.
• Data munging: Cleanse and transform data in preparation for analytics.
• Exploratory data analysis: Use plots to discover useful insights. Apply statistics.
• Data modeling: Create accurate predictive models about the future.
• Evaluate model: Verify robustness of model and make adjustments.
• Deployment: Deploy model in production environment.
Characteristics of the data can often help you determine what predictive modeling techniques might best meet the data analyst’s needs. Here are a number of points to consider when determining which technique to use based on your data and the problem you wish to solve.
• When the data is grouped by observations, tools such as cluster analysis, association rules, and k-nearest neighbors usually provide the best results.
• Use classification to separate the data into classes based on the response variable – both binary classes like True or False, as well as multi-class situations.
• Use simple, multiple, and polynomial regression when attempting to predict a numeric value rather than assign a class.
• In poor quality or limited data situations, A/B testing is appropriate. A/B tests are statistical experiments that help you decide whether a change is actually making a significant impact on your product.
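To make the A/B testing point more concrete, here is a minimal sketch of one common way to judge whether a change had a significant impact: a two-sided two-proportion z-test on conversion counts. The guide does not prescribe a specific test, so the choice of test, the function name two_proportion_z_test, the sample counts, and the 0.05 threshold are all assumptions for illustration.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: variant A is the current page, variant B is the change.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)

# 0.05 is a conventional significance level, assumed here for illustration.
significant = p_value < 0.05
print(f"z = {z:.2f}, p = {p_value:.4f}, significant at 5%: {significant}")
```

The normal approximation behind this test is reasonable only when each variant has a fairly large number of conversions and non-conversions; with very small samples an exact test would be a safer choice.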
Production Deployment
The final step in the predictive analytics project timeline is to determine how best to deploy the solution to a production environment. Of primary concern is using open source R on larger data sets where performance is important. The open source R engine was not built for enterprise usage. Deploying open source R can be problematic for the following reasons:
• Poor memory management – R does not reclaim memory well, so memory use can grow quickly, leading to out-of-memory crashes, as well as non-linear performance due to increased garbage collection requests and increased swapping.
• Risk of deploying open source with GPL license – software vendors are forbidden to embed or redistribute open source R as a part of any commercial closed-source software.
In order to avoid these issues, analysts will often opt to convert their working R solution to a different programming environment like C++ or Python. This path, however, is far from optimal since it requires recoding and significant retesting.
Best practice would be to use a commercial, enterprise-grade R solution, like TIBCO Software’s Enterprise Runtime for R (TERR), to resolve the above limitations and to yield a robust production environment. Because many corporations already have legacy predictive models in house, it is also recommended that you ensure your analytics platform supports TERR, open source R, S+, MATLAB and SAS models, in order to take advantage of an ecosystem of predictive analytics.
Conclusion
In this Guide we have reviewed how predictive analytics helps your organization predict with confidence what will happen next so that you can make smarter decisions and improve business outcomes. It is important to adopt a predictive analytics solution that meets the specific needs of different users and skill sets, from beginners, to experienced analysts, to data scientists.
With predictive analytics software you can:
• Transform data into predictive insights to guide front-line decisions and interactions.
• Predict what customers want and will do next to increase profitability and retention.
• Maximize the productivity of your people and processes.
• Increase the value of your data assets.
• Detect and avoid security threats and fraud before they affect your organization.
• Perform statistical analysis including regression analysis, classification, and cluster analysis.
• Measure the social media impact of your products, services and marketing campaigns.
About TIBCO Spotfire
TIBCO Spotfire® is the analytics solution from infrastructure and business intelligence giant, TIBCO Software. From interactive dashboards and data discovery to predictive and real-time analytics, Spotfire’s intuitive software provides an astonishingly fast and flexible environment for visualizing and analyzing your data. As your analytics needs increase, our enterprise-class capabilities can be seamlessly layered on, helping you to be first to insight and first to action.
TIBCO Spotfire has a long, rich history in predictive analytics. With Spotfire you can develop your own proprietary models and leverage your investments in R, S+, SAS, MATLAB, and in-database analytics of Big Data sources, such as Teradata Aster. Spotfire also offers a commercial-grade R environment, TERR (TIBCO Enterprise Runtime for R), which was built from the ground up to extend the reach of R to the enterprise, making R faster, more scalable, and able to handle memory much more efficiently than the open source R engine.
TIBCO regularly contributes to the R community, including feedback to the R Core team, and offers broad compatibility with R functions and a growing number of CRAN packages, currently 1800+. The company regularly tests TERR with a wide variety of R packages, and continues to extend TERR to greater R coverage. TERR can be used in RStudio, the popular R IDE, and also integrates fully with TIBCO Spotfire, as well as TIBCO Complex Event Processing products, such as TIBCO Streambase.
Learn more about TIBCO Spotfire and TERR at spotfire.com