The State of Data Analytics and
Visualization Adoption
A Survey of Usage, Access Methods,
Projects, and Skills

Matthew D. Sarrel

The State of Data Analytics and Visualization Adoption
by Matthew D. Sarrel
Copyright © 2017 O’Reilly Media, Inc. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA
95472.
O’Reilly books may be purchased for educational, business, or sales promotional use.
Online editions are also available for most titles (http://oreilly.com/safari). For more
information, contact our corporate/institutional sales department: 800-998-9938 or
corporate@oreilly.com.

Editor: Nicole Tache
Production Editor: Kristen Brown
Copyeditor: Octal Publishing, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Ellie Volckhausen

September 2017: First Edition

Revision History for the First Edition


2017-09-18: First Release

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. The State of Data
Analytics and Visualization Adoption, the cover image, and related trade dress are
trademarks of O’Reilly Media, Inc.
While the publisher and the author have used good faith efforts to ensure that the
information and instructions contained in this work are accurate, the publisher and
the author disclaim all responsibility for errors or omissions, including without
limitation responsibility for damages resulting from the use of or reliance on this work.
Use of the information and instructions contained in this work is at your own risk. If
any code samples or other technology this work contains or describes is subject to
open source licenses or the intellectual property rights of others, it is your
responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-99942-4
[LSI]
Table of Contents

The State of Data Analytics and Visualization Adoption
Introduction
Data Analytics and Visualization Usage: The Big Picture
Key Areas of Analytics by Industry
Usage and Access of Analytics by Industry
Working with the Data: Joining, Sourcing, Streaming
Requisite Skills for Analytics by Industry
The Value of Big Data Today
Summary

The State of Data Analytics and
Visualization Adoption

Introduction
Regardless of industry or company size, businesses are increasingly
relying on data analytics and visualization to build a competitive
advantage. Organizations are racing to gather, store, and analyze
data from many different sources in many different formats. In the
race toward success, businesses are transforming themselves to
make data-driven decisions, and the associated technology is evolving
at least as rapidly as the businesses themselves.
The fast-evolving data analytics and visualization technology landscape
means that businesses and individuals are scrambling to make
the best technology choices. Businesses need to know that they’re
choosing the right languages, products, architectures, and data sources.
Individuals need to know that they’re learning the right skills to
land the right jobs. Those who choose poorly risk being left behind,
missing the timely insights that well-conceived data analytics and
visualization programs provide.
For this reason, in the spring of 2017 Zoomdata commissioned
O’Reilly Media to field a survey to assess the state of data analytics
and visualization adoption. A total of 875 survey respondents identified
their industry, job role, company size, reasons for using analytics,
technologies used in analytics programs, the perceived value of analytics
programs, and more.

Results indicate the following:

• Big data analytics and visualization programs are most mature
in manufacturing, financial services, and technology/software
companies.
• Projects are typically built for business users and business analysts,
who commonly rely on visual dashboards to gain the
insights that they require to optimize business processes and
better understand customers.
• Relational databases are the most common data source
(although analytic databases and Hadoop are the most common
sources of big data).
• Companies are hungry for Python, SQL, and relational database
skills.
• Kafka and Spark are emerging as the streaming data technologies
of choice.
• Customer 360/customer insights is the most common use case.
• After veracity (data quality), variety and then volume are the
most valued characteristics of big data across all industries.
Our goal with this report is to highlight the results of this survey so
that they might inform your career or organization as you embrace
new technologies for data collection, storage, analysis, and
visualization.

Data Analytics and Visualization Usage: The Big Picture
The 875 respondents who participated in this survey represent a
variety of industries (Figure 1-1). More than 40% reported working
in technology/software. This is followed by just over 10% in financial
services, almost 8% in healthcare/medical technology, and
roughly 5% each in manufacturing, government, retail, and education/
academia.

Figure 1-1. Industries represented in the survey

As shown in Figure 1-2, respondents primarily identified themselves
as engineers/developers (18%), data scientists (17%), data analysts/
business analysts (15%), or architects (13%), and they work at
companies of various sizes. It is interesting to note that managers
and CxOs are actively engaged with these topics: they account for 14%
of respondents, compared to 8% for IT professionals.

Figure 1-2. Job roles represented in the survey

Surprisingly, small businesses with fewer than 50 employees make up
a large share of respondents (26%). It’s refreshing to see small
businesses lead the charge toward the new technologies and business
processes related to data analytics and visualization (Figure 1-3).

Figure 1-3. Organizational size (by number of employees) represented
in the survey

More than 50% of respondents indicated that they use analytics for
customer insights/customer 360, followed by business process
optimization (43%; Figure 1-4). It’s important to note that these areas
directly support line-of-business activities, which is consistent with
the idea that businesses are building data analytics and visualization
programs in order to make data-driven decisions and create competitive
advantage.

Figure 1-4. Key areas using analytics within organizations

Key Areas of Analytics by Industry


Although aggregate survey results are interesting, when you drill
down into specific industries, you begin to see some important
trends. This also allows you to understand the state of data analytics
and visualization use in your industry and provides guidance for
developing programs that help build competitive advantage.
Picking up where we left off with the aggregate data, let’s take a
look at the key areas of analytics use by industry (Figure 1-5).
Customer insights/customer 360 is an area of focus for more than 50%
of respondents in the technology/software, financial services, and
retail industries, and surprisingly for more than 30% of respondents
in education/academia. The potential business impact of understanding
customers should not be underestimated. Understanding customer needs
is likely to lead to happy customers, and happy customers are likely
to lead to greater revenue.

Figure 1-5. Areas of analytics by industry

The exception is healthcare/medical technology, where healthcare
data analysis is far and away the most common key area of data
analytics and visualization use. This doesn’t come as much of a
surprise, though, because this is an industry-specific use case. If
you’re not analyzing healthcare data, you’re probably not much of a
healthcare/medical technology company. Healthcare data analysis is
followed by other important business-related analyses such as
customer insights/customer 360 and business process optimization.
Business process optimization is another important use of data
analytics and visualization, and occupies a top-three spot in every
industry as reported by survey respondents. It is the top use of data
analytics and visualization in manufacturing and government.
Optimizing business processes typically results in decreased operating
costs and can also lead to greater customer satisfaction, so this is a
strategic way to build competitive advantage across many industries.
Similarly, the retail and manufacturing industries place an
emphasis on supply chain analytics and visualization initiatives.
Uncovering supply chain problems in a timely manner gives retail
and manufacturing businesses an opportunity to find alternate sources.
An optimal supply chain is certainly a competitive advantage
for these businesses.
Fraud detection/cyber security intelligence is an important use of
data analytics and visualization in financial services and government.
Fraud detection is critical to any financial services business: where
there’s money, there’s likely to be attempted fraud. Detecting and
eliminating fraud builds trust with customers while decreasing
operating costs. Cyber security intelligence is a focus of numerous
government agencies, and preventing fraud is critical to elections and
efficient ongoing operations.
Looking at the question “At what stage are big data analytics
project(s) in your organization?” by industry helps us to understand
how the rate of adoption varies by industry. In our top six industries
(financial services, government, healthcare/medical technology,
manufacturing, retail, and technology/software) we see that adoption
runs the gamut from “we don’t have big data analytics projects”
(18%) to “multiple projects” (22%). Manufacturing leads the “multiple
projects” category, with 28%, whereas government lags in this
category, with 7%.
Let’s examine the stage of data analytics projects by specific industry
(Figure 1-6). The leading response in manufacturing is “multiple
projects” at 28%, followed by “in development” at 22%. We see a similar
case in the financial services industry, with about 25% of
respondents reporting “multiple projects” and about 21% reporting “in
development” projects. In technology/software, 21% of respondents
indicate that they are involved in multiple projects and
in-development projects. In healthcare/medical technology the picture
is a little muddled in that 25% of respondents are engaged in
multiple projects, whereas 26% report that they aren’t engaged in
any big data analytics projects. Retail is in a similar position, with
23% reporting no projects and 25% reporting multiple projects. In
government, “we don’t have big data analytics projects” leads at 33%,
followed by “defining requirements” at 27%.

Figure 1-6. Stage of data analytics projects by industry

Usage and Access of Analytics by Industry


Looking at the target user for big data analytics and visualization
projects (Figure 1-7), we see that in aggregate our survey respondents
are developing for business users. This means that the analytics
and visualization software must be easy to use and intuitive. Business
users can’t afford to spend all day focused on the mechanics of
analytics. For analytics to provide competitive advantage, business
users must be able to quickly and easily convert data into insights
and take action.

Figure 1-7. Target users of big data analytics project(s)

This holds true across our top six industries (Figure 1-8). Digging
deeper, business analysts are the second most common target users in
government (tied with customers), manufacturing, retail, and
technology/software. Data scientists are the second most common target
users in financial services and healthcare/medical technology.

Figure 1-8. Target users of big data analytics project(s) by industry

We asked survey participants, “Where would big data analytics be
available to users?” The responses are split roughly evenly
between embedded in an application or business process and standalone
business intelligence (BI) applications (Figure 1-9). Financial
services (57%) and technology/software (54%) show a slight preference
for embedded, whereas retail (58%) shows a slight preference
for standalone BI applications.

Figure 1-9. Method for accessing big data analytics

We asked survey participants to identify how users would interact
with data analytics: dashboards, embedded in applications, or
operational reports (Figure 1-10). Across our top six industries,
respondents showed a strong preference for dashboards. The second
most common way for users to interact with big data analytics was
operational reports, except in financial services, where it was
embedded in applications.

Figure 1-10. User interaction with data analytics

Working with the Data: Joining, Sourcing, Streaming
We asked survey participants how they join data from multiple
sources in order to analyze it (Figure 1-11). In our top six industries,
data warehouse/datamart was the predominant response. This was
especially true in retail (56%). Virtual federation/mashup (blending
data on the fly without moving it into a warehouse) is most widely
used in healthcare/medical technology (24%), technology/software
(21%), and government (21%).
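To make the idea of virtual federation concrete, the following is a minimal Python sketch rather than anything drawn from the survey. It assumes two hypothetical sources, a small relational table (an in-memory SQLite database) and a flat-file export, and blends them at query time instead of first loading both into a warehouse. The table, column names, and values are all invented for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a relational database (in-memory SQLite for this sketch).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, segment TEXT)")
db.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "retail"), (2, "enterprise"), (3, "retail")],
)

# Hypothetical source 2: a flat-file export from another system.
ORDERS_CSV = """customer_id,amount
1,20.0
2,150.0
1,5.5
3,42.0
"""


def revenue_by_segment(connection, orders_file):
    """Blend both sources at query time; nothing is copied into a warehouse."""
    # Pull only the lookup columns needed for the join from the database.
    segment_of = dict(
        connection.execute("SELECT customer_id, segment FROM customers")
    )
    totals = {}
    # Stream the flat file and join each row against the lookup on the fly.
    for row in csv.DictReader(orders_file):
        segment = segment_of.get(int(row["customer_id"]), "unknown")
        totals[segment] = totals.get(segment, 0.0) + float(row["amount"])
    return totals


result = revenue_by_segment(db, io.StringIO(ORDERS_CSV))
print(result)
```

A data warehouse approach would instead copy both sources into one store on a schedule and query the combined tables there. The trade-off is roughly freshness and lower data movement (federation) versus predictable query performance (warehouse).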

Figure 1-11. Joining data methodology

We asked survey participants to identify their main data sources
(Figure 1-12). Not surprisingly, relational database is the leading
response in our top six industries, topping out at 39% in healthcare/
medical technology. The leading nonrelational and big data stores,
in ranked order, are analytic database, Hadoop, NoSQL database, cloud
data store, in-memory database, and search database. Financial
services (24%) and government (25%) make the heaviest use of analytic
databases, whereas retail (11%) and technology/software (10%)
make the heaviest use of cloud data stores. Hadoop usage hovers
around 15%, except in government, where it drops to 9%. In-memory
databases are used primarily by manufacturing (10%) and
government (9%). Manufacturing (12%) is also the heaviest user of
search databases.

Figure 1-12. Main data sources for analytics

Spark and Kafka are far and away the most common technologies
used for analyzing streaming data (Figure 1-13). This holds true for
respondents across all industries as well as respondents in our top six
industries. Together, Spark and Kafka account for over 65% of streaming
data analysis in our survey. Technology/software (37%) is the leading
industry for Kafka, followed by financial services (33%). Spark,
popular across all industries, is led by retail (40%), healthcare/medical
technology (37%), and technology/software (35%). Confluent is
most widely adopted in government (30%), which is also where
StreamSets (11%) is most common.

Figure 1-13. Technologies for analyzing streaming data

Requisite Skills for Analytics by Industry


Turning to the analytics-related skills that industries are staffing
for, based on the technologies that they are planning to adopt, we see
that overall the skills in greatest demand are Python, SQL, and
relational databases, followed by Hadoop and Java (Figure 1-14). This
holds true in our top six industries as well, with government leading
the demand for Python (19%) and relational database (17%) skills,
and healthcare/medical technology leading the demand for SQL
(18%) skills.

Figure 1-14. Required analytics-related skills by industry

The Value of Big Data Today


We asked survey participants to rank the value of four characteristics
of big data: veracity, velocity, variety, and volume. This gives
insight into the overall use and business impact of big data analytics
and visualization programs. Volume refers to the amount of data
that is gathered and analyzed. Variety refers to the many different
sources and types of data, structured and unstructured, that
are gathered and analyzed. Velocity refers to the pace at which data is
gathered and analyzed. And last, but certainly not least, veracity
refers to how closely the data approximates the “truth” and how free it
is of biases, abnormalities, and inaccuracies. Successful big data
analytics programs must consider the combination of volume, variety,
velocity, and veracity in order to provide business insight. Anything less
will fail to provide the competitive advantage the company sought
when launching its big data analytics and visualization initiative.
Looking at the combined data across all industries, the most valued
characteristic of data is veracity (Figure 1-15). This isn’t terribly
surprising, given that without veracity, there wouldn’t be much value in
big data projects at all. It doesn’t matter how powerful your analytics
programs are if you’re feeding them biased and inaccurate data.
Next in importance is variety. This indicates that analytics and
visualization solutions must be able to combine multiple sources and
types of data, structured and unstructured, to provide the insights
that businesses need. Next in importance is volume. Finally, velocity
has the least value to survey respondents, indicating that they
continue to place tremendous value on typical business data and not as
much on unstructured and streaming data. This is consistent with
the relative lack of adoption of streaming data analysis as reported
in other questions.
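The report does not state how individual rankings were combined into an overall ordering. One common approach, shown here purely as an illustrative sketch, is to average the rank each characteristic receives (1 being most valuable) and sort by that mean. The responses below are hypothetical and are not survey data.

```python
from statistics import mean

CHARACTERISTICS = ("veracity", "velocity", "variety", "volume")

# Hypothetical responses (NOT survey data): each respondent assigns
# ranks 1 (most valuable) through 4 (least valuable).
responses = [
    {"veracity": 1, "variety": 2, "volume": 3, "velocity": 4},
    {"veracity": 1, "volume": 2, "variety": 3, "velocity": 4},
    {"variety": 1, "veracity": 2, "velocity": 3, "volume": 4},
]


def mean_ranks(responses):
    """Average the rank given to each characteristic across respondents."""
    return {name: mean(r[name] for r in responses) for name in CHARACTERISTICS}


def order_by_value(responses):
    """Return characteristics from most to least valued (lowest mean rank first)."""
    ranks = mean_ranks(responses)
    return sorted(ranks, key=ranks.get)


print(mean_ranks(responses))
print(order_by_value(responses))
```

Mean rank is only one possible summary; counting how often each characteristic is ranked first is another, and the two can disagree when opinions are polarized.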

Figure 1-15. Value of data characteristics (1 being most valuable, 4
being least valuable)

Digging deeper into our top six industries, we see that a high value is
placed on veracity across the board, although technology/software
and manufacturing don’t hold veracity in as high regard as do retail,
financial services, government, and healthcare/medical technology
(Figure 1-16). Variety is most valued by retail, financial services,
and technology/software (Figure 1-17). Interestingly, volume
(Figure 1-18) shares a similar value across our top six industries.
Velocity, the least valuable characteristic of big data in our overall
responses, does have value for technology/software and manufacturing
(Figure 1-19).

Figure 1-16. Importance of veracity

Figure 1-17. Importance of variety

Figure 1-18. Importance of volume

Figure 1-19. Importance of velocity

Summary
The survey results show that to offer business value, analytics and
visualization programs are typically aimed at supplying business
users and business analysts with the information they require. This
information is most often delivered embedded in an application or
through a standalone BI application, and users engage with it via
dashboards. The value placed on veracity tells us that this
information must be accurate and unbiased.
Relational databases are the most popular main data sources for
organizations. Beyond that, analytic databases and Hadoop are the
most common sources of big data. This is consistent with our
respondents prioritizing Python, SQL, and relational database skills for
analytics workers. The emphasis on relational databases and SQL
indicates that our survey respondents still place tremendous value on
typical business data and not as much on unstructured and streaming
data. However, those working with streaming data rely heavily
on Kafka and Spark.
Manufacturing, financial services, and technology/software are
furthest along the adoption curve for big data analytics and
visualization technology, with companies in these verticals reporting that
they are in the “multiple projects” and/or “development” phases.
These three industries are followed by healthcare/medical technology
and retail, while government brings up the rear, with over half of
respondents indicating that they either have no big data analytics
projects in progress or are currently defining requirements.

About the Author
Matthew D. Sarrel is the founder of Sarrel Group, a technical and
content marketing consulting practice and product test lab. Matt has
over 30 years of experience in technology analysis, implementation,
testing, and marketing with a focus on security, networking, and big
data. He has worked for some of the largest and smallest tech
companies in the world. Matt has written for numerous publications
such as PCMag, eWeek, InfoWorld, GigaOm, CIO, eSecurityPlanet,
Allbusiness.com, and Backayard Magazine. Matt is passionate about
cooking with fire and competes on the KCBS BBQ circuit.
