The State of Data Analytics and
Visualization Adoption
A Survey of Usage, Access Methods,
Projects, and Skills
Matthew D. Sarrel
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. The State of Data
Analytics and Visualization Adoption, the cover image, and related trade dress are
trademarks of O’Reilly Media, Inc.
While the publisher and the author have used good faith efforts to ensure that the
information and instructions contained in this work are accurate, the publisher and
the author disclaim all responsibility for errors or omissions, including without limi‐
tation responsibility for damages resulting from the use of or reliance on this work.
Use of the information and instructions contained in this work is at your own risk. If
any code samples or other technology this work contains or describes is subject to
open source licenses or the intellectual property rights of others, it is your responsi‐
bility to ensure that your use thereof complies with such licenses and/or rights.
978-1-491-99942-4
[LSI]
The State of Data Analytics and
Visualization Adoption
Introduction
Regardless of industry or company size, businesses are increasingly
relying on data analytics and visualization to build a competitive
advantage. Organizations are racing to gather, store, and analyze
data from many different sources in many different formats. In the
race toward success, businesses are transforming themselves to
make data-driven decisions, and the associated technology is evolving
as rapidly as the businesses themselves, if not more so.
The fast-evolving data analytics and visualization technology land‐
scape means that businesses and individuals are scrambling to make
the best technology choices. Businesses need to know that they’re
choosing the right languages, products, architectures, and data sources.
Individuals need to know that they’re learning the right skills to
land the right jobs. Those who choose poorly risk being left behind,
failing to take advantage of the timely insights that well-conceived
data analytics and visualization programs provide.
For this reason, in the spring of 2017 Zoomdata commissioned
O’Reilly Media to field a survey to assess the state of data analytics
and visualization adoption. In all, 875 survey respondents identified their
industry, job role, company size, reasons for using analytics, tech‐
nologies used in analytics programs, the perceived value of analytics
programs, and more.
Results indicate the following:
More than 50% of respondents indicated that they use analytics for
customer insights/customer 360, followed by business process opti‐
mization (43%; Figure 1-4). It’s important to note that these areas
directly support line-of-business activities. This supports the idea
that businesses are building data analytics and visualization pro‐
grams in order to make data-driven decisions and create competi‐
tive advantage.
This holds true across our top six industries (Figure 1-8). Digging
deeper, business analysts are the second most common target in
Spark and Kafka are far and away the most common technologies
used for analyzing streaming data (Figure 1-13). This holds true for
respondents across all industries as well as respondents in our top six
industries. Spark and Kafka account for over 65% of streaming data
analysis in our survey. Technology/software (37%) is the leading
industry for Kafka, followed by financial services (33%). Spark is
popular across all industries, led by retail (40%), healthcare/medical
technology (37%), and technology/software (35%). Confluence is
most widely adopted in government (30%), which is also where
StreamSets (11%) is most common.
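Per-industry percentages such as those above are shares of each industry's respondents who named a given technology. The sketch below shows one way such shares could be tallied. It assumes that respondents could name more than one technology; the rows, the `responses` list, and the `adoption_by_industry` helper are all hypothetical and are not the survey's data or method.

```python
from collections import Counter, defaultdict

# Hypothetical multi-select survey rows: (industry, technologies named).
# These rows are invented for illustration and are NOT the survey's data.
responses = [
    ("retail", {"Spark", "Kafka"}),
    ("retail", {"Spark"}),
    ("government", {"Kafka"}),
    ("government", set()),
]


def adoption_by_industry(rows):
    """Return {industry: {technology: percent of that industry's respondents}}."""
    totals = Counter()            # respondents per industry
    picks = defaultdict(Counter)  # mentions of each technology per industry
    for industry, techs in rows:
        totals[industry] += 1
        picks[industry].update(techs)
    return {
        industry: {
            tech: round(100 * count / totals[industry])
            for tech, count in counts.items()
        }
        for industry, counts in picks.items()
    }


shares = adoption_by_industry(responses)
```

Under that multi-select assumption, the shares within one industry can sum to more than 100%, because each respondent is counted once for every technology named.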
Digging deeper into our top six industries, we see that a high value is
placed on veracity across the board, although technology/software
and manufacturing don’t hold veracity in as high regard as do retail,
financial services, government, and healthcare/medical technology
(Figure 1-16). Variety is most valued by retail, financial services,
and technology/software (Figure 1-17). Interestingly, volume
(Figure 1-18) shares a similar value across our top six industries.
Velocity, the least valued characteristic of big data in our overall
responses, does have value for technology/software and manufacturing
(Figure 1-19).
Summary
The survey results show that to offer business value, analytics and
visualization programs are typically aimed at supplying business
users and business analysts with the information they require. This
information is most often delivered embedded in an application or through
a standalone BI application, and users engage with it via dashboards. The value
placed on veracity tells us that this information must be accurate
and unbiased.
Relational databases are the most popular primary data source for
organizations. Beyond that, analytic databases and Hadoop are the
most common sources of big data. This is consistent with our respondents
prioritizing Python, SQL, and relational database skills for analytics
workers. The emphasis on relational databases and SQL
indicates that our survey respondents still place tremendous value in
typical business data and not as much in unstructured and stream‐
ing data. However, those working with streaming data rely heavily
on Kafka and Spark.
Manufacturing, financial services, and technology/software are fur‐
thest along the adoption curve for big data analytics and visualiza‐
tion technology, with companies in these verticals reporting that
they are in the “multiple projects” and/or “development” phases.
These three industries are followed by healthcare/medical technology
and retail, while government brings up the rear, with over half of
respondents indicating that they either have no big data analytics
projects in progress or are currently defining requirements.