
December 5, 2016

Dresner Advisory Services, LLC

2016 Edition

Big Data Analytics Market Study

Wisdom of Crowds Series

Licensed to Pentaho
2016 Big Data Analytics Market Study

Disclaimer:

This report should be used for informational purposes only. Vendor and product selections should be made based on
multiple information sources, face-to-face meetings, customer reference checking, product demonstrations and
proof-of-concept applications.

The information contained in all Wisdom of Crowds Market Study Reports reflects the opinions expressed in the
online responses of individuals who chose to respond to our online questionnaire and does not represent a scientific
sampling of any kind. Dresner Advisory Services, LLC shall not be liable for the content of reports, study results, or for
any damages incurred or alleged to be incurred by any of the companies included in the reports as a result of its
content.

Reproduction and distribution of this publication in any form without prior written permission is forbidden.

http://www.dresneradvisory.com Copyright 2016 Dresner Advisory Services, LLC


Definition

Big Data Analytics Defined


We define big data analytics as systems that enable end-user access to and analysis of data
contained and managed within the Hadoop ecosystem.


Introduction
This year we celebrate the ninth anniversary of Dresner Advisory Services! We offer our
thanks to all of you for your continued support and ongoing encouragement.

Since our founding in 2007, we have worked hard to set the bar high, challenging
ourselves to innovate and lead the market, offering ever greater value with each
successive year.

Our first market report in 2010 set the stage for where we are today. Since that time, we
have expanded our agenda and added new research topics every year. For
2016, we are on track to release 15 major reports, including our recent flagship BI
report, in its seventh year of publication!

In addition to our ongoing coverage of key topics such as embedded BI, big data
analytics and advanced and predictive analytics, we have added new topics including
Collective Insights™ (blending collaboration and governance) and systems integrators.

For this, our second Big Data Analytics Market Study, we continue to focus upon the
combination of analytical solutions within the Hadoop ecosystem, adding some new
criteria and exploring changing market dynamics and user perceptions and plans.

We hope you enjoy this report!

Best,

Howard Dresner
Chief Research Officer
Dresner Advisory Services


Contents
Definition ... 3
  Big Data Analytics Defined ... 3
Introduction ... 4
Benefits of the Study ... 7
  A Consumer Guide ... 7
  A Supplier Tool ... 7
About Howard Dresner and Dresner Advisory Services ... 8
About Jim Ericson ... 9
Survey Method and Data Collection ... 10
  Data Quality ... 10
Executive Summary ... 12
Study Demographics ... 13
  Geography ... 13
  Functions ... 14
  Vertical Industries ... 15
  Organization Size ... 16
Analysis and Trends: Big Data Analytics ... 18
  Importance of Big Data ... 18
  Big Data Adoption ... 19
  Future Adoption of Big Data ... 25
  Big Data Use Cases ... 31
  Big Data Infrastructure ... 37
  Big Data Data Access ... 43
  Big Data Search ... 49
  Big Data Analytics / Machine-Learning Technologies ... 55
  Big Data Distributions ... 61
Industry and Vendor Analysis ... 68
  Big Data Analytics Vendor Ratings ... 79
Glossary ... 80
Other Dresner Advisory Services Research Reports ... 84
Appendix: Big Data Analytics Study Survey Instrument ... 85


Benefits of the Study


The DAS Big Data Analytics Market Study provides a wealth of information and
analysis, offering value to both consumers and producers of related technology and
services.

A Consumer Guide
Consumers use the DAS Big Data Analytics Market Study, an objective source of industry
research, to understand how their peers are leveraging and investing in big data
analytics and related technologies.

Using our unique vendor performance measurement system, users glean key insights
into software supplier performance, enabling:

- Comparisons of current vendor performance to industry norms

- Identification and selection of new vendors

A Supplier Tool
Vendor licensees use the DAS Big Data Analytics Market Study in several important
ways:

External Awareness

- Build awareness for the big data analytics market and supplier brand, citing
  DAS Big Data Analytics Market Study trends and vendor performance
- Create lead and demand generation for supplier offerings through association
  with DAS Big Data Analytics Market Study brand, findings, webinars, etc.

Internal Planning

- Refine internal product plans and align with market priorities and realities as
  identified in DAS Big Data Analytics Market Study
- Better understand customer priorities, concerns, and issues
- Identify competitive pressures and opportunities


About Howard Dresner and Dresner Advisory Services


The DAS Big Data Analytics Market Study was conceived, designed, and executed by
Dresner Advisory Services, LLC, an independent advisory firm, and Howard Dresner, its
president, founder and chief research officer.

Howard Dresner is one of the foremost thought leaders in business intelligence and
performance management, having coined the term "Business Intelligence" in 1989. He
has published two books on the subject, The Performance Management Revolution:
Business Results through Insight and Action (John Wiley & Sons, Nov. 2007) and
Profiles in Performance: Business Intelligence Journeys and the Roadmap for Change
(John Wiley & Sons, Nov. 2009). He lectures at forums around the world and is often
cited by the business and trade press.

Prior to Dresner Advisory Services, Howard served as chief strategy officer at
Hyperion Solutions and was a research fellow at Gartner, where he led its business
intelligence research practice for 13 years.

Howard has conducted and directed numerous in-depth primary research studies over
the past two decades and is an expert in analyzing these markets.

Through the Wisdom of Crowds Business Intelligence market research reports, we
engage with a global community to redefine how research is created and shared. Other
research reports include:

- Wisdom of Crowds Flagship Business Intelligence Market Study
- Advanced and Predictive Analytics
- Collective Insights™
- Internet of Things and Business Intelligence
- Small and Mid-Sized Enterprise Business Intelligence
- Systems Integrators

Howard conducts a weekly Twitter tweetchat on Fridays at 1:00 p.m. ET. During these
live events the #BIWisdom tribe discusses a wide range of business intelligence
topics.

You can find more information about Dresner Advisory Services at www.dresneradvisory.com.

About Jim Ericson


Jim Ericson is a research director with Dresner Advisory Services.

Jim has served as a consultant and journalist who studies end-user management
practices and industry trends in the data and information management fields.

From 2004 to 2013 he was the editorial director at Information Management magazine
(formerly DM Review), where he created architectures for user and industry coverage
for hundreds of contributors across the breadth of the data and information
management industry.

As lead writer, he interviewed and profiled more than 100 CIOs, CTOs, and program
directors in a 2010-2012 program called "25 Top Information Managers." His related
feature articles earned ASBPE national bronze and multiple Mid-Atlantic region gold
and silver awards for Technical Article and for Case History feature writing.

A panelist, interviewer, blogger, community liaison, conference co-chair, and speaker in
the data-management community, he also sponsored and co-hosted a weekly podcast
in continuous production for more than five years.

Jim's earlier background as senior morning news producer at NBC/Mutual Radio
Networks and as managing editor of MSNBC's first Washington, D.C. online news
bureau cemented his understanding of fact-finding, topical reporting, and serving broad
audiences.


Survey Method and Data Collection


As with all of our Wisdom of Crowds Business Intelligence Market Studies, we
constructed a survey instrument to collect data and used social media and
crowdsourcing techniques to recruit participants.

We include our own research community of nearly 4,000 organizations as well as
crowdsourcing and vendors' customer communities.

Data Quality
We carefully scrutinized and verified all respondent entries to ensure that only qualified
participants were included in the study.


Executive Summary
- Over two years of big data analytics study, we see a significant increase in
  uptake and a large drop in holdouts with no big data plans. High tech and
  telecom are industry leaders (p. 20-24).
- Current adoption and future plans for the use of big data analytics have reached
  a level of significance we did not see last year. Forty-one percent of
  organizations are already using Hadoop-related big data. Even more say they
  may use big data in the future (p. 19).
- Among organizations that have not yet adopted big data, 14 percent will adopt in
  the current calendar year, a figure that grows to 47 percent in 2017.
  BICC respondents are likely future adopters (p. 25-30).
- Among technologies and initiatives considered strategic to business intelligence,
  big data analytics is ranked 20th out of 30 topical areas under study, still well
  behind core BI practices (p. 18). Overall, vendors are still highly positive on big
  data though sentiment is leveling off (p. 68).
- The top big data use cases in 2016 are data warehouse optimization, followed by
  customer/social analysis (p. 31-36).
- The top big data infrastructure choice among users is Spark, followed by
  Map/Reduce, YARN, Oozie, Tez, Mesos, and Atlas. Over time, Spark is gaining
  status as a category leader (p. 40-42). Industry support is strongest for
  Map/Reduce, but Spark is closing in quickly (p. 69-70).
- Spark SQL is the most-cited big data access structure, followed closely by Hive
  and HDFS (p. 43-48). Industry support is strongest for Hive and HDFS; Spark
  support remains lower than user expectations (p. 71-72).
- Amid lukewarm interest in big data search technologies, Elasticsearch
  resonated most strongly, followed by Apache Solr and Cloudera Search (p. 49-54).
  Industry support is strongest for Apache Solr, and support for Cloudera fell
  noticeably (p. 73-74).
- Spark MLlib is the most-preferred big data machine learning technology,
  important to more than 60 percent of respondents. All machine learning
  technologies gather interest but are still at the fringe (p. 55-60). Industry support
  for big data analytics / machine learning is strongest for Spark MLlib, followed by
  Mahout (p. 75-76).
- Cloudera is the most popular big data distribution among users, followed by
  Hortonworks, Amazon, and MapR (p. 61-66). We see significant existing
  industry support and future plans for big data (Hadoop) distributions (p. 77-78).


Study Demographics
Our 2016 Big Data Analytics Market Study is based on a cross-section of data that
spans geographies, functions, organization size, and vertical industries. We believe
that, unlike other industry research, this supports a more representative sample and
a better indicator of true market dynamics. We constructed cross-tab analyses using
these demographics to identify and illustrate important industry trends.

Geography
North America, which includes the U.S., Canada, and Puerto Rico, represents 57
percent of respondents (fig. 1). EMEA accounts for the next largest group (32 percent),
followed by Asia Pacific and Latin America.

Geographies Represented

North America                     57%
Europe, Middle East and Africa    32%
Asia Pacific                       8%
Latin America                      3%

Figure 1 - Geographies represented

Functions
IT (28 percent) and the business intelligence competency center (21 percent) are the
two largest groups represented in our big data analytics sample (fig. 2).

Examining trends and behavior by function helps us compare and contrast plans and
priorities in different areas of organizations.

Functions Represented

Information Technology (IT)                28%
Business intelligence competency center    21%
Executive management                       12%
Research and development (R&D)             11%
Sales and Marketing                        10%
Finance                                     8%
Other                                      12%

Figure 2 - Functions represented

Vertical Industries
Technology (14 percent), financial services (10 percent), and consulting (9 percent) are
the most represented industries in our study, followed by healthcare, education, and
telecommunications (fig. 3). We include responses from consultants, who often have
greater interaction with initiatives and deeper industry knowledge than many customer
counterparts. This also yields insight into the partner ecosystem for BI vendors.

[Chart: Vertical Industries Represented. Bars show the percent of respondents in each vertical industry, on a 0% to 20% axis.]

Figure 3 - Vertical industries represented

Organization Size
Respondents to our big data analytics study reflect a mix of organizational sizes and
structures (fig. 4). Small organizations of 1-100 employees represent 26 percent of the
sample. Mid-sized organizations (101-1,000 employees) account for 27 percent, and the
remaining 47 percent are large organizations with more than 1,000 employees.

Organization Sizes Represented

1 - 100           26%
101 - 1000        27%
1001 - 5000       27%
More than 5000    20%

Figure 4 - Organization sizes represented

Analysis and Trends: Big Data Analytics

Importance of Big Data


Among technologies and initiatives considered strategic to business intelligence, big
data analytics is ranked 20th out of 30 topical areas we currently study (fig. 5). This
finding reflects interest similar to last year's inaugural Big Data Analytics Market Study
(in which big data ranked 18th of 25 topics under study at the time). We understand that
big data interest can and does vary widely from organization to organization and will be
critical to some and irrelevant to others. While we see increasing momentum, big data
analytics still distantly trails the status and penetration of mainstream business
intelligence practices such as reporting, dashboards, and end-user self-service.

Technologies and Initiatives Strategic to Business Intelligence

Topics shown, in chart order:
- Reporting
- Dashboards
- End-user "self-service"
- Advanced visualization
- Data discovery
- Data warehousing
- Data mining, advanced algorithms, predictive
- Integration with operational processes
- Data storytelling
- Enterprise planning/budgeting
- Mobile device support
- Embedded BI (contained within an application,
- Governance
- Collaborative support for group-based analysis
- End-user data preparation and blending
- Search-based interface
- Software-as-a-Service and cloud computing
- In-memory analysis
- Ability to write to transactional applications
- Location intelligence/analytics
- Big data (e.g., Hadoop)
- Pre-packaged vertical/functional analytical
- Text analytics
- Streaming data analysis
- Open source software
- Social media analysis (Social BI)
- Cognitive BI (e.g., Artificial Intelligence-based BI)
- Complex event processing (CEP)
- Internet of Things (IoT)
- Edge computing

Rating scale: Critical, Very important, Important, Somewhat important, Not important

Figure 5 - Technologies and initiatives strategic to business intelligence

Big Data Adoption


Current adoption and future plans for the use of big data analytics have reached a level
of significance we did not see last year. Forty-one percent of organizations say they are
already using big data analytics (fig. 6), which we define as "systems that enable
end-user access to and analysis of data contained and managed within the Hadoop
ecosystem." Even more respondents (46 percent) say they may use big data in the
future. Just 14 percent have no plans for future use of big data analytics.

Adoption of Big Data

Yes. We use big data today                    41%
We may use big data in the future             46%
No. We have no plans to use big data at all   14%

Figure 6 - Adoption of big data

Over the two years of our comprehensive big data analytics study, we see a significant
increase in uptake and a large drop in holdouts with no plans (fig. 7). Forty-one percent
of respondents report current big data use, a greater than two-fold increase over 2015.
At the same time, the number of respondents with no plans fell by a factor of greater
than two, from 36 percent to 14 percent. The percentage of ambivalent users was
consistent year over year at 45 percent or a bit more. We can anecdotally chalk these
findings up to an emerging mix of practical/achievable projects, service enablement, and
greater understanding of big data uses.
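
The year-over-year comparison above can be checked with a few lines of arithmetic. The sketch below is illustrative only. The helper name `fold_change` is hypothetical, and the 2015 current-use share is not stated in the text, so it is inferred here under the assumption that the three 2015 answers sum to 100 percent and that the undecided share was 45 percent.

```python
def fold_change(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return after / before


# Shares of respondents, in percent, as quoted in the text.
no_plans_2015, no_plans_2016 = 36.0, 14.0
current_use_2016 = 41.0

# Assumption (not stated in the report): the 2015 answers sum to 100
# and the undecided share was 45 percent.
undecided_2015 = 45.0
implied_current_use_2015 = 100.0 - no_plans_2015 - undecided_2015

# Holdouts shrank, so compare the larger 2015 value to the smaller 2016 value.
drop_in_holdouts = fold_change(no_plans_2016, no_plans_2015)
growth_in_use = fold_change(implied_current_use_2015, current_use_2016)

print(f"Holdouts fell by a factor of {drop_in_holdouts:.2f}")
print(f"Implied 2015 current use: {implied_current_use_2015:.0f}%")
print(f"Current use grew about {growth_in_use:.2f}x")
```

Under these assumptions both ratios come out above two, which is consistent with the "greater than two-fold" wording; a slightly larger undecided share in 2015 would make the growth in current use larger still.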

[Chart: Adoption of Big Data 2015 to 2016. Paired bars for 2015 and 2016, on a 0% to 50% axis, for three answers: "Yes. We use big data today"; "We may use big data in the future"; "No. We have no plans to use big data at all".]

Figure 7 - Adoption of big data 2015 to 2016

In our 2016 sample, EMEA leads slightly in current adoption (43 percent) compared to
North America (40 percent) and is well ahead of Asia Pacific (33 percent) (fig. 8). Asia
Pacific also reports the most organizations with "no plans to use big data at all" (27
percent). Both EMEA and North America report 46 percent undecided ("we may use big
data...") respondents.

[Chart: Adoption of Big Data by Geography. Stacked bars, 0% to 100%, for North America; Europe, Middle East and Africa; and Asia Pacific. Legend: "Yes. We use big data today"; "We may use big data in the future"; "No. We have no plans to use big data at all".]

Figure 8 - Adoption of big data by geography

Perennial first-mover high-tech organizations lead 2016 big data adoption with 59
percent reporting current use (fig. 9). Telecommunications, with possibly the greatest
data transaction volume issues of any industry, is the next most likely industry to
currently use big data analytics (50 percent). Financial services, another high data
transaction industry, reports 45 percent current use. Less likely to be current users,
consulting industry respondents are nonetheless prepared to embrace big data as
needed.

[Chart: Adoption of Big Data by Vertical Industry. Stacked bars, 0% to 100%, one per vertical industry. Legend: "Yes. We use big data today"; "We may use big data in the future"; "No. We have no plans to use big data at all".]

Figure 9 - Adoption of big data by vertical industry

In 2016, the BICC supplanted R&D as the most likely current departmental user of big
data (fig. 10). This finding supports the notion that big data is moving from an
experimental to a practical pursuit in organizations. As is often the case, executive
management is a likely-to-sure proponent of evolutionary technologies such as big data.
We are uncertain as to why finance is also a strong player in big data unless interest
there is aimed at cost savings. IT predictably lags in current adoption
and is most likely to have a vested interest in supporting legacy and traditional
technology investments.

[Chart: Adoption of Big Data by Function. Stacked bars, 0% to 100%, for Information Technology (IT); Business intelligence competency center; Executive management; Research and development (R&D); Sales & Marketing; and Finance. Legend: "Yes. We use big data today"; "We may use big data in the future"; "No. We have no plans to use big data at all".]

Figure 10 - Adoption of big data by function

Current adoption of big data is strongest (61 percent) within very large businesses and
institutions that have more than 5,000 employees (fig. 11). Small organizations with one
to 100 employees have the lowest rate of current adoption (29 percent). After very large
organizations, however, small and mid-size (101-1,000 employees) are most open to
possible future use. We would expect that small organizations are most likely cloud
users of big data services while large organizations will likely deploy onsite.

[Chart: Adoption of Big Data by Organization Size. Stacked bars, 0% to 100%, for organizations of 1 - 100, 101 - 1000, 1001 - 5000, and more than 5000 employees. Legend: "Yes. We use big data today"; "We may use big data in the future"; "No. We have no plans to use big data at all".]

Figure 11 - Adoption of big data by organization size

Future Adoption of Big Data


Among organizations that have not yet adopted big data but have future plans, 14
percent say they will adopt in the current calendar year (fig. 12). This horizon grows
rapidly in 2017 when 47 percent plan to adopt. Unlike 2015 (see following fig. 13), only
a minority of non-users of big data are postponing plans beyond 2017. Though
we often find big data plans compartmentalized to projects or departments, future
adoption will also hinge on current investment budgets for more "conventional"
technologies.

Future Adoption of Big Data

Will adopt in 2016        14%
Will adopt in 2017        47%
Will adopt beyond 2017    40%

Figure 12 - Future adoption of big data

Compared to our inaugural 2015 study, year-over-year future adoption plans for big
data represent a sea change of respondent behavior (fig. 13). Current year adoption
plans are more than three times greater in 2016 (14 percent) compared to last year (4
percent). Next-year adoption in our current study (47 percent) shows remarkable growth
from 2015's 27 percent plans. Significantly fewer respondents are delaying plans
beyond next year, plainly indicating they are allocating money, resources, and time to
big data solutions and their use.
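
The shift described above can be reproduced from the percentages quoted in the text. This is a rough sketch rather than part of the study's method. The 2015 "beyond next year" share is not given in the text, so it is derived under the assumption that the three 2015 answers sum to 100 percent; the 2016 value of 40 percent is taken from fig. 12. The dictionary names are hypothetical.

```python
# Adoption plans among non-users, in percent of respondents.
plans_2016 = {"this year": 14, "next year": 47, "beyond next year": 40}
plans_2015 = {"this year": 4, "next year": 27}

# Assumption (not stated in the report): the 2015 answers sum to 100,
# so the remainder is the share deferring beyond next year.
plans_2015["beyond next year"] = 100 - sum(plans_2015.values())

# Ratio of the 2016 share to the 2015 share for each planning horizon.
ratios = {
    horizon: plans_2016[horizon] / plans_2015[horizon]
    for horizon in plans_2016
}

for horizon, ratio in ratios.items():
    print(
        f"{horizon}: {plans_2015[horizon]}% -> {plans_2016[horizon]}% "
        f"({ratio:.2f}x)"
    )
```

On these figures, current-year plans are 3.5 times the 2015 level, and the implied share deferring beyond next year falls from roughly 69 percent to 40 percent.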

[Chart: Future Adoption of Big Data 2015 to 2016. Paired bars for 2015 and 2016, on a 0% to 80% axis, for three answers: "Will adopt this year"; "Will adopt next year"; "Will adopt beyond next year".]

Figure 13 - Future adoption of big data 2015 to 2016

Regionally, among those who have not already adopted big data, North American and
Asia-Pacific respondents are more motivated to increase use compared to those in
EMEA (fig. 14). Asia Pacific has the greatest number of both 2016 (17 percent) and
2017 (50 percent) adopters; EMEA has the most respondents (48 percent) with plans
deferred beyond 2017.

[Chart: Future Adoption of Big Data by Geography. Stacked bars, 0% to 100%, for North America; Europe, Middle East and Africa; and Asia Pacific. Legend: "Will adopt in 2016"; "Will adopt in 2017"; "Will adopt beyond 2017".]

Figure 14 - Future adoption of big data by geography

Among organizations not yet using big data, vertical adoption in 2016 is highest (about
20 percent) in education, technology, and telecommunications (fig. 15). Plans for 2017
adoption are by far highest in financial services (75 percent), followed by consulting and
healthcare. (While future plans for telecommunications and technology appear relatively
low, recall that these sectors are also the greatest current users of big data technologies
(fig. 9, p. 22)).

[Chart: Future Adoption of Big Data by Vertical Industry. Stacked bars, 0% to 100%, one per vertical industry. Legend: "Will adopt in 2016"; "Will adopt in 2017"; "Will adopt beyond 2017".]

Figure 15 - Future adoption of big data by vertical industry

Among non-users of big data, the BICC has by far the highest (30 percent) current-year
adoption plans (fig. 16). Accelerating BICC use is generally a reflection of delivery as
well as incipient demand for business technologies, another indication that big data
analytics is "crossing the chasm" of use cases and enterprise adoption. Sales and
marketing and IT (low in current usage, fig. 10, p. 23) are the next most likely to be
current-year adopters of big data analytics, perhaps by fiat of executive management
(whose next-year interest is correspondingly highest).

[Chart: Future Adoption of Big Data by Function. Stacked bars, 0% to 100%, for Business intelligence competency center; Executive management; Information Technology (IT); Finance; Sales & Marketing; and Research and development (R&D). Legend: "Will adopt in 2016"; "Will adopt in 2017"; "Will adopt beyond 2017".]

Figure 16 - Future adoption of big data by function

As with current users of big data analytics (fig. 11, p. 24), 2016 first-adoption plans are
highest at very large organizations with more than 5,000 employees (fig. 17). More than
60 percent of very large organizations will take up the use of big data in 2016, more
than twice the rate at small organizations (29 percent). That said, we continue to believe
cloud-based offerings will be a strong driver of big data going forward for organizations
of any size. Possibly in that vein, 2017 adoption plans are highest at small organizations
(58 percent), followed by mid-sized organizations (50 percent).

[Chart: Future Adoption of Big Data by Organization Size. Stacked bars, 0% to 100%, for organizations of 1 - 100, 101 - 1000, 1001 - 5000, and more than 5000 employees. Legend: "Will adopt in 2016"; "Will adopt in 2017"; "Will adopt beyond 2017".]

Figure 17 - Future adoption of big data by organization size

Big Data Use Cases


The top big data use case in 2016 is data warehouse optimization, which is considered
critical or very important to 65 percent of respondents (fig. 18). As data warehouse
deployments are mostly confined to large institutions, this reinforces our view that big
data is predominantly a large-organization pursuit meant to lower cost and complexity.
That said, customer / social analysis is the next most likely use case and is, at
minimum, "very important" to a majority of respondents.

[Chart: Big Data Use Cases. Stacked bars, 0% to 100%, for Data warehouse optimization; Customer/social analysis; Clickstream analytics; Fraud detection; and Internet of Things. Rating scale: Critical, Very important, Important, Somewhat important, Not important.]

Figure 18 - Big data use cases

Year over year, the top big data use cases, data warehouse optimization and customer /
social analysis, retain (and extend) their top rankings (fig. 19). The Internet of Things,
the third-most popular use case in 2015, lost momentum in 2016, possibly due to
settling hype and uneven prospects for average organizations. Clickstream analytics
and fraud detection gained the most influence year over year.

Figure 19 - Big data use cases 2015 to 2016
(Scale: 0 to 4. Use cases: Data warehouse optimization, Customer/social analysis, Clickstream analytics, Fraud detection, Internet of Things. Series: 2015, 2016.)


By region, Asia Pacific and North America are the most likely to prioritize data
warehouse optimization (fig. 20). (All use cases, particularly fraud detection and
clickstream analytics, are, in fact, more highly prioritized in Asia Pacific than in other
regions.) Compared to North America, EMEA nonetheless has more interest in
customer / social analysis and the Internet of Things.

Figure 20 - Big data use cases by geography
(Scale: 1 to 5. Regions: North America; Europe, Middle East and Africa; Asia Pacific. Series: Data warehouse optimization, Customer/social analysis, Clickstream analytics, Fraud detection, Internet of Things.)


When parsed by vertical industry, all industries rank data warehousing as a top or
second priority. Our 2016 sample shows somewhat surprising standout interest in data
warehouse optimization among healthcare respondents (fig. 21). Elsewhere, financial
services predictably reports the highest interest in fraud detection (and clickstream
analysis). Consulting leads technology in interest in customer / social analysis. Interest in
the Internet of Things is highest in education.

Figure 21 - Big data use cases by vertical industry
(Scale: 1 to 5. Industries: Technology, Financial services, Consulting, Healthcare, Education. Series: Data warehouse optimization, Customer/social analysis, Clickstream analytics, Fraud detection, Internet of Things.)


All functions in our 2016 sample rank data warehouse optimization as their highest big
data use case priority (fig. 22). IT has the most standout interest in data warehouse
optimization, which is not surprising given traditional ownership boundaries. BICC and
executive management report the highest interest in customer / social analysis, perhaps
with an opportunistic viewpoint. BICC and sales/marketing are most interested in
clickstream analytics. Finance respondents show below-average interest in all big data
use cases.

Figure 22 - Big data use cases by function
(Scale: 1 to 5. Functions: Information Technology (IT), Business intelligence competency center, Executive management, Research and development (R&D), Sales & Marketing, Finance. Series: Data warehouse optimization, Customer/social analysis, Clickstream analytics, Fraud detection, Internet of Things.)


Very large organizations (more than 5,000 employees), as expected, have the greatest
proportional interest in data warehouse optimization (fig. 23). Generally, we would expect
large organizations to be more conventional in their approach to big data use cases, with an
eye toward cost efficiency, while smaller peers are more balanced across opportunities. It is
interesting, however, that IoT has not caught fire in organizations of any size and that very
large organizations are the least attuned to customer / social analysis.

Figure 23 - Big data use cases by organization size
(Scale: 1 to 5. Organization sizes: 1 - 100, 101 - 1000, 1001 - 5000, More than 5000. Series: Data warehouse optimization, Customer/social analysis, Clickstream analytics, Fraud detection, Internet of Things.)


Big Data Infrastructure


To gather baseline data on big data infrastructure awareness/adoption, we assembled a
list of relevant frameworks, databases, and other technologies in the Hadoop / open
source orbits of interest. In our 2016 sample, Spark is the preferred mechanism (fig. 24),
followed by Map/Reduce, Yarn, Oozie, Tez, Mesos, and Atlas. Spark and Map/Reduce
notably stand out across multiple grades of importance. All but the top three choices
(Spark, Map/Reduce, Yarn) are "not important" or only "somewhat important" to the
majority of respondents.

Figure 24 - Big data infrastructure
(Scale: 0% to 100%. Technologies: Spark, Map/Reduce, Yarn, Oozie, Tez, Mesos, Atlas, Knox Gateway, Alluxio (formerly Tachyon). Ratings: Critical, Very important, Important, Somewhat important, Not important.)


Across two years of study, Spark has surpassed Map/Reduce as the preferred big data
infrastructure (fig. 25). Preferences for Spark and associated applications/frameworks
extend across all measures in this report even though Map/Reduce is well penetrated in
early-stage use. All infrastructure choices gained favor in 2016 over 2015; the biggest
gainer besides Spark and Map/Reduce was Yarn. (2016 is the first year we polled
respondents on interest in Atlas and Knox Gateway.)

Figure 25 - Big data infrastructure 2015 to 2016
(Scale: 0.0 to 4.0. Series: 2015, 2016.)


By region, Asia-Pacific respondents indicated the highest interest in all big data
infrastructures polled in 2016 and prioritize Yarn over Map/Reduce (fig. 26), perhaps
indicating late-arriving interest and newer editions of Hadoop. Among regional
preferences, EMEA had the second-highest interest in Spark and Map/Reduce, ahead
of North America. Interest in Yarn is equal in North America and EMEA. EMEA has
slightly higher interest in Oozie and somewhat less interest in Tez and Mesos compared
to North America.

Figure 26 - Big data infrastructure by geography
(Scale: 1 to 5. Regions: North America; Europe, Middle East and Africa; Asia Pacific. Series: Spark, Map/Reduce, Yarn, Oozie, Tez, Mesos.)


Big data infrastructure preferences vary by vertical industry (fig. 27). While technology
industry respondents are most singularly interested in Spark, other verticals share
similar affinity for Map/Reduce, and consulting actually grades Map/Reduce higher
than Spark. This latter finding may reflect consulting serving existing demand and
investments in Map/Reduce. Technology, healthcare, and consulting have the most
interest in Yarn; healthcare and consulting are also the most likely to engage with
Oozie.

Figure 27 - Big data infrastructure by vertical industry
(Scale: 1 to 5. Industries: Technology, Financial services, Consulting, Healthcare, Education. Series: Spark, Map/Reduce, Yarn, Oozie, Tez, Mesos.)


Big data infrastructure preferences vary interestingly by function (fig. 28). The BICC
(often contained within IT) is the strongest proponent of Spark especially, followed by
Map/Reduce. As we have seen elsewhere, executive interest often follows (or leads) in
the lines of BICC activity. By comparison, R&D interest is weak and falls sharply after
Spark and Map/Reduce. Central IT is predictably a laggard in embracing big data
compared to other roles but shows some preference for the various options. Perhaps
most interesting is sales and marketing, where Oozie and Tez claim the highest marks
of any department.

Figure 28 - Big data infrastructure by function
(Scale: 1 to 5. Functions: Information Technology (IT), Business intelligence competency center, Executive management, Research and development (R&D), Sales & Marketing, Finance. Series: Spark, Map/Reduce, Yarn, Oozie, Tez, Mesos.)


We see differences in big data infrastructure preferences across organizations of
different size, but none that are striking (fig. 29). Spark and Map/Reduce are easily the
preferred choice in organizations large or small, though Spark appears to have the most
influence in very large organizations. Likewise, Yarn is consistently the third most highly
cited infrastructure choice of all organizations.

Figure 29 - Big data infrastructure by organization size
(Scale: 1 to 5. Organization sizes: 1 - 100, 101 - 1000, 1001 - 5000, More than 5000. Series: Spark, Map/Reduce, Yarn, Oozie, Tez, Mesos.)


Big Data - Data Access


We asked organizations which big data access methods they preferred and which are
most important to them. This includes indirect access to Hadoop and other related
engines. In our 2016 study, Spark SQL is the most cited and is considered, at minimum,
important by close to 80 percent of the sample (fig. 30). Hive and HDFS, perhaps more
familiar to the conventional data warehousing audience, follow closely and elicited even
more "critical" responses than Spark SQL.

Figure 30 - Big data - data access
(Scale: 0% to 100%. Technologies: Spark SQL, Hive/HiveQL, HDFS, HBase, Google BigQuery, Redshift, MongoDB, Impala, Pivotal HAWQ. Ratings: Critical, Very important, Important, Somewhat important, Not important.)


Among big data access technologies studied both last year and this year, all gained
positive sentiment year over year, especially Spark SQL, Hive/HiveQL, and Impala (fig.
31). Trailing technologies, with the exception of Pivotal HAWQ, all reached positive
sentiment of 2.7 to 2.9, in the range of "important."
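
The mean sentiment scores quoted here sit on the report's five-point rating scale. The sketch below shows how such a weighted mean could be computed and mapped back to a label. It assumes the scale runs from 1 ("Not important") to 5 ("Critical"), which is consistent with the text placing scores near 3.0 in the range of "important". The response counts and function names are hypothetical and invented for illustration; they are not study data.

```python
# Assumed mapping of the report's five rating labels to scores 1-5.
SCALE = {
    "Not important": 1,
    "Somewhat important": 2,
    "Important": 3,
    "Very important": 4,
    "Critical": 5,
}


def weighted_mean(counts):
    """Return the mean score for a dict of {rating label: number of responses}."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses")
    return sum(SCALE[label] * n for label, n in counts.items()) / total


def nearest_label(score):
    """Return the rating label whose score is closest to the given mean."""
    return min(SCALE, key=lambda label: abs(SCALE[label] - score))


# Hypothetical response counts, invented for illustration only.
example = {
    "Critical": 10,
    "Very important": 20,
    "Important": 30,
    "Somewhat important": 25,
    "Not important": 15,
}

score = weighted_mean(example)
print(round(score, 2), nearest_label(score))  # 2.85 Important
```

With these made-up counts the mean is 2.85, which falls in the same "important" band as the 2.7 to 2.9 scores described above.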

Figure 31 - Big data - data access 2015 to 2016
(Scale: 0 to 4. Series: 2015, 2016.)


Big data access preferences vary by region (fig. 32). Asia Pacific had the strongest
response to several technologies, specifically HBase, Hive, HDFS, and Spark. Globally,
HBase was less appealing in regions other than Asia Pacific. Cloud-based solutions
(Redshift, Google BigQuery) fared worse but were slightly more appealing in North
America than other regions.

Figure 32 - Big data - data access by geography
(Scale: 1 to 5. Regions: North America; Europe, Middle East and Africa; Asia Pacific. Series: Spark SQL, Hive/HiveQL, HDFS, HBase, Google BigQuery, Redshift.)


By vertical industry, financial services, technology, and consulting are the most aligned
around Spark SQL for data access (fig. 33). HiveQL resonated most strongly in
healthcare, followed by consulting and technology. Healthcare was also the strongest
proponent of HDFS, followed by financial services and technology. Consulting
respondents report an outsized interest in Redshift. Google BigQuery fared best in
consulting and financial services.

Figure 33 - Big data - data access by vertical industry
(Scale: 1 to 5. Industries: Technology, Financial services, Consulting, Healthcare, Education. Series: Spark SQL, Hive/HiveQL, HDFS, HBase, Google BigQuery, Redshift.)


Departmental interest in data access varies by function (fig. 34). The BICC and
executive management are the strongest proponents of Spark SQL. More traditional
HBase, HDFS, and Hive are the most favored in sales and marketing, while the BICC is
most focused on HDFS and Hive along with Spark. Cloud-based offerings (Redshift,
Google BigQuery) are initially most interesting to executive management.

Figure 34 - Big data - data access by function
(Scale: 1 to 5. Technologies: Redshift, Google BigQuery, HBase, HDFS, Hive/HiveQL, Spark SQL. Series: Information Technology (IT), Business intelligence competency center, Executive management, Research and development (R&D), Sales & Marketing.)


Small organizations (most likely to be early adopters) are proportionately most drawn to
Spark as a newer opportunity for big data access (fig. 35). Redshift (and Google
BigQuery in mid-sized organizations) are also popular as an easy and inexpensive entry
point to big data access for smaller organizations. Very large organizations are more
likely invested in big data access via HDFS and Hive followed by Spark SQL.

Figure 35 - Big data - data access by organization size
(Scale: 1 to 5. Organization sizes: 1 - 100, 101 - 1000, 1001 - 5000, More than 5000. Series: Spark SQL, Hive/HiveQL, HDFS, HBase, Google BigQuery, Redshift.)


Big Data Search


We asked respondents to rank interest in big data search facilities, which in Hadoop
include indexing and natural language textual search (fig. 36). In our 2016 sample,
Elasticsearch resonated most strongly followed by Apache Solr and Cloudera Search.
Despite shifting over time (which we will expand on in the following figure) there is no
clear first choice in big data search; all three technologies are, at minimum, "important"
to 65 percent to 74 percent of respondents.

Figure 36 - Big data search
(Scale: 0% to 100%. Technologies: Elasticsearch, Apache Solr, Cloudera Search. Ratings: Critical, Very important, Important, Somewhat important, Not important.)


Across two years of study data, we saw a small reversal of fortunes among big data
search options (fig. 37). While Elasticsearch moved past early open source provider
Apache Solr into first place, Cloudera fell slightly from the top choice to third. While we
consider rising year-over-year sentiment a positive development, we reiterate that there
is currently no clear first choice emerging in big data search.

Figure 37 - Big data search 2015 to 2016
(Scale: 0.0 to 3.5. Technologies: Elasticsearch, Apache Solr, Cloudera Search. Series: 2015, 2016.)


As in other measures, we found sentiment toward big data search options strongest
"across the board" in Asia Pacific (fig. 38). Also as mentioned, year-over-year sentiment
toward big data search increased across all regions, though interest levels remain
middling and unremarkable.

Figure 38 - Big data search by geography
(Scale: 1 to 5. Regions: North America; Europe, Middle East and Africa; Asia Pacific. Series: Elasticsearch, Apache Solr, Cloudera Search.)


We saw some divergence from overall results in big data search preference by industry
(fig. 39). Due to sector-size bias, we found respondents in three verticals (financial
services, healthcare, and consulting) preferred Cloudera Search to both top choice
Elasticsearch and Apache Solr. In contrast, technology, with a larger pool of
respondents, preferred Elasticsearch. In all instances, Apache Solr was the second
choice and was most preferred in healthcare and financial services.

Figure 39 - Big data search by vertical industry
(Scale: 1 to 5. Industries: Technology, Financial services, Consulting, Healthcare, Education. Series: Elasticsearch, Apache Solr, Cloudera Search.)


As a relatively new technology hyped as "innovative," it is not entirely surprising to
find big data search advocacy strongest in executive management (fig. 40). Overall
functional preference was in favor of Elasticsearch (to a striking degree in IT), with the
exception of research and development, which preferred the earlier test bed of Apache
Solr. Overall sentiment ranged at or below a level of 3.0, indicating that big data search
is at best "important" or less and not critical to most audiences.

Figure 40 - Big data search by function
(Scale: 1 to 5. Functions: Information Technology (IT), Business intelligence competency center, Executive management, Research and development (R&D), Sales & Marketing, Finance. Series: Elasticsearch, Apache Solr, Cloudera Search.)


Big data search preferences vary somewhat but not dramatically in organizations of
different size (fig. 41). The largest departure in our 2016 sample is in mid-sized firms of
101 to 1,000 employees, where interest declines noticeably from Elasticsearch to other
options.

Figure 41 - Big data search capabilities by organization size
(Scale: 1 to 5. Organization sizes: 1 - 100, 101 - 1000, 1001 - 5000, More than 5000. Series: Elasticsearch, Apache Solr, Cloudera Search.)


Big Data Analytics / Machine-Learning Technologies


We asked respondents to rank their interest in a variety of big data analytics and
machine-learning technologies (fig. 42). The leader, Spark MLlib (here and throughout
this category), is considered, at minimum, important by more than 60 percent of
respondents and ranks well ahead of all competitors. As we will see in the following
figure, this is a stark improvement over the previous year. Still, Spark MLlib is
considered "critical" by just 15 percent of respondents, reflecting an early-stage market
response to machine learning.

Figure 42 - Big data analytics / machine learning
(Scale: 0% to 100%. Technologies: Spark MLlib, Rhipe (R), Mahout, Oryx, Myrrix. Ratings: Critical, Very important, Important, Somewhat important, Not important.)


Year-over-year interest in big data analytics and machine learning increased across the
board, though it still remains confined to levels of 2.0 or "somewhat important" (fig. 43).
The most popular choice, Spark MLlib, also grew the most from 2015 to 2016. The next
greatest momentum levels were in Rhipe and Mahout.

Figure 43 - Big data analytics / machine learning 2015 to 2016
(Scale: 0.00 to 3.50. Technologies: Spark MLlib, Rhipe (R), Mahout, Oryx, Myrrix. Series: 2015, 2016.)


Asia-Pacific respondents have a stronger response to different machine-learning
capabilities compared to other geographies (fig. 44). EMEA is next most engaged with
machine learning, ahead of levels in North America. Spark MLlib is again the top choice
across all regions. Mean levels of interest are again mostly in the "somewhat important"
to "important" range.

Figure 44 - Big data analytics / machine learning by geography
(Scale: 1 to 5. Regions: North America; Europe, Middle East and Africa; Asia Pacific. Series: Spark MLlib, Rhipe (R), Mahout, Oryx, Myrrix.)


In our 2016 sample, interest in big data machine learning varied by vertical industry but
overall was led by preference for Spark MLlib (fig. 45). Healthcare and technology
showed the greatest interest in MLlib. Healthcare and consulting were most interested in
Rhipe.

Figure 45 - Big data analytics / machine learning by vertical industry
(Scale: 1 to 5. Industries: Technology, Financial services, Consulting, Healthcare, Education. Series: Spark MLlib, Rhipe (R), Mahout, Oryx, Myrrix.)


By function, Spark MLlib is again the standout category leader across organizational
roles. BICC and executive management are again mirrors of the top areas of interest,
followed by R&D and sales and marketing (fig. 46). IT is mostly unengaged with big
data analytics and machine learning, even more so than sales and marketing or finance.

Figure 46 - Big data analytics / machine learning by function
(Scale: 1 to 5. Functions: Information Technology (IT), Business intelligence competency center, Executive management, Research and development (R&D), Sales & Marketing, Finance. Series: Spark MLlib, Rhipe (R), Mahout, Oryx, Myrrix.)


Organizations of all sizes prefer Spark MLlib over all other big data analytics /
machine-learning options (fig. 47). This effect is not correlated to size. In our 2016 sample,
sentiment for MLlib is strongest in organizations with 1,001 to 5,000 employees. We see
that preference for Spark MLlib is higher at large organizations, while small peers have a
proportionately greater interest in R-based Rhipe.

Figure 47 - Big data analytics / machine learning by organization size
(Scale: 1 to 5. Organization sizes: 1 - 100, 101 - 1000, 1001 - 5000, More than 5000. Series: Spark MLlib, Rhipe (R), Mahout, Oryx, Myrrix.)


Big Data Distributions


We asked respondents to rank big data distributions by order of importance (fig. 48).
In 2016, Cloudera led in measures of "critical" and was the strongest overall performer,
followed by Hortonworks, Amazon, and MapR. Cloudera, Hortonworks, and MapR were
all seen as, at minimum, "important" by 63 percent to 68 percent of respondents.

Figure 48 - Big data distributions
(Scale: 0% to 100%. Distributions: Cloudera, Hortonworks, Amazon, MapR. Ratings: Critical, Very important, Important, Somewhat important, Not important.)


Interest in all big data distributions increased year over year in 2016 (fig. 49). Amazon
fell slightly from a tie for top place to third, behind Cloudera and Hortonworks. Interest
levels for the top three choices were at or near 3.0, indicating average responses near
"important" to respondents.

Figure 49 - Big data distributions 2015 to 2016
(Scale: 0.00 to 3.50. Distributions: Cloudera, Hortonworks, Amazon, MapR. Series: 2015, 2016.)


In 2016, there were differences of interest by geography in the four big data distributions
we sampled (fig. 50). Asia Pacific is again the leader across the board on all distribution
interest. Perhaps most noticeably, EMEA reported the greatest standout interest in
Cloudera compared to other distributions.

Figure 50 - Big data distributions by geography
(Scale: 1 to 5. Regions: North America; Europe, Middle East and Africa; Asia Pacific. Series: Cloudera, Hortonworks, Amazon, MapR.)


By vertical industry, healthcare, consulting, and financial services expressed the
greatest interest in Cloudera (fig. 51). Technology respondents (more heavily weighted
in our study) preferred Amazon. MapR performed strongest in consulting, healthcare,
and education. Hortonworks performed best in consulting, healthcare, and financial
services.

Figure 51 - Big data distributions by vertical industry
(Scale: 1 to 5. Industries: Technology, Financial services, Consulting, Healthcare, Education. Series: Cloudera, Hortonworks, Amazon, MapR.)


Unlike other measures, Cloudera is not a big data distribution category leader by
function due to sample weighting (fig. 52). In our 2016 sample, Hortonworks was a
standout leader among sales and marketing respondents. Amazon performed strongest
among distributions for executive management respondents. BICC respondents
preferred Hortonworks by a lesser margin, and IT interest was led by Cloudera.

Figure 52 - Big data distributions by function
(Scale: 1 to 5. Functions: Information Technology (IT), Business intelligence competency center, Executive management, Research and development (R&D), Sales & Marketing, Finance. Series: Cloudera, Hortonworks, Amazon, MapR.)


Small to very large organizations have varying preferences in big data distributions,
though not to an extreme extent (fig. 53). As we might expect, cloud-based Amazon and
AWS distributions appeal most strongly to small organizations for simple and
inexpensive startup projects that have also demonstrated abilities to scale. Mid-sized
(101-1,000) organizations also most prefer Amazon, though we do see a trend among
larger organizations to bring big data distribution management in house. Cloudera and
Hortonworks are the top picks among large and very large organizations.

Figure 53 - Big data distributions by organization size
(Scale: 1 to 5. Organization sizes: 1 - 100, 101 - 1000, 1001 - 5000, More than 5000. Series: Cloudera, Hortonworks, Amazon, MapR.)



Industry and Vendor Analysis


In 2016 as in 2015, we reached out to the vendor community with questions about their
capabilities and plans for technologies in big data analytics, including its perceived
importance to their strategies. Compared to 2015, industry sentiment appears to be
leveling off (fig. 54). Overall, vendors are still highly positive on big data but are trading
over-the-top enthusiasm for something less than a complete revolution in data
management. We view it as a positive that the proclaimed criticality of a still emergent
set of technologies has been replaced by an optimistic upside of one that is "very
important" at the same time user adoption (or awareness of same) has grown notably
year over year (fig. 7, p. 20).

Figure 54 - Industry importance of big data 2015 to 2016
(Scale: 0% to 70%. Ratings: Critically important, Very important, Somewhat important, Not important. Series: 2015, 2016.)


Among big data infrastructure options in the Hadoop ecosystem, Map/Reduce still has
the highest level of vendor support, which is not surprising given its longevity and
relative maturity (fig. 55). Support for Spark is closing in quickly with the highest
predicted industry support plans for the next 12 months, after which Spark support will
be ubiquitous. After Spark, industry support drops quickly below 50 percent. Future "no
plans" for support range from 30 percent to more than 60 percent.

Figure 55 - Industry support for big data infrastructure
(Scale: 0% to 100%. Series: Today, 12 months, 18 months, 24 months, No plans.)


Year over year, industry plans for supporting Map/Reduce, Spark, Yarn, and Tez have
all gathered momentum (fig. 56). Despite some growth in user sentiment (fig. 25, p. 38),
industry support for Oozie declined. We continue to expect that proprietary vendor
support of open source big data projects will be opportunistic and customer driven.

Figure 56 - Industry support for big data infrastructure 2015 to 2016
(Scale: 0% to 90%. Series: 2015, 2016.)


Existing industry support for access to big data sources is greatest for Hive/HiveQL (87
percent), followed by HDFS (85 percent) (fig. 57). These top choices are in line with top
user preferences for data access, but Spark support is a good bit lower than user
expectations (fig. 30, p. 43). Industry support for Redshift is next highest, somewhat
ahead of user priorities. Google BigQuery currently has much lower industry support but
is the third most cited choice of users.

Figure 57 - Industry support for access to big data sources
(Scale: 0% to 100%. Series: Today, 12 months, 18 months, 24 months, No plans.)


Year-over-year industry support for data access increased for all big data sources
polled with the exception of Redshift (fig. 58). Though Redshift was a lower priority
among users than industry vendors, it gained additional user interest in 2016 (fig. 31, p.
44). The biggest gainer of industry support in 2016, Redshift, gained even more interest
among user respondents year over year.

Figure 58 - Industry support for access to big data sources 2015 to 2016
(Scale: 0% to 100%. Series: 2015, 2016.)


Industry support for big data search did gain momentum in 2016, though support
remains distinctly lukewarm (fig. 59). While 30 percent of vendors indicate support for
Apache Solr, under 20 percent currently support Elasticsearch or Cloudera Search and
more than 40 percent have no plans for future support. The tepid investment in big data
search is in line with current user sentiments, which show little urgency for search (fig.
36, p. 49).

Figure 59 - Industry support for big data search
(Scale: 0% to 100%. Technologies: Apache Solr, Elasticsearch, Cloudera Search. Series: Today, 12 months, 18 months, 24 months, No plans.)


Year-over-year industry support for big data search varied noticeably by product (fig.
60). While support for category leader Apache Solr grew from 25 percent to 31 percent,
Cloudera Search support fell from 26 percent to 14 percent. Support for Elasticsearch
was flat year over year. We cannot be certain whether swings in industry support are
related to existing penetration or other market factors. User interest in all three
big data search products grew year over year, but not with urgency (fig. 37, p.
50).
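
The swings quoted above are easiest to compare as percentage-point changes. The following is a minimal sketch that uses only the two pairs of figures stated in the text; the function and variable names are illustrative and are not part of the study. Elasticsearch is omitted because the text gives no exact values for it.

```python
# Percent of vendors reporting support, as quoted in the text: (2015, 2016).
support = {
    "Apache Solr": (25, 31),
    "Cloudera Search": (26, 14),
}


def point_change(before, after):
    """Return the change in percentage points between two survey years."""
    return after - before


changes = {name: point_change(*years) for name, years in support.items()}

for name, delta in changes.items():
    print(f"{name}: {delta:+d} percentage points")
```

Expressing the change in points rather than as a relative percentage keeps the comparison on the same scale as the chart axis.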

Figure 60 - Industry support for big data search 2015 to 2016
[Chart: vertical axis 0% to 35%; categories: Cloudera Search, Apache Solr, Elasticsearch; series: 2015, 2016]


Industry support for big data analytics / machine learning is strongest for Spark MLlib
followed by Mahout, though we concede these investments are not urgent and reflect
the esoteric uses of machine learning in the current market (fig. 61). While industry
support for MLlib is expected to reach a total of 56 percent in the next 12 months, future
support for all other machine learning methods is tepid and may never reach 50
percent. (Spark MLlib, Rhipe, and Mahout were top user machine-learning choices but
also showed low levels of enthusiasm; fig. 42, p. 55.)

Figure 61 - Industry support for big data analytics / machine learning
[Chart: vertical axis 0% to 100%; categories: Spark MLlib, Mahout, Rhipe (R), Oryx, Myrrix; legend: Today, 12 months, 18 months, 24 months, No plans]


Year-over-year industry support for big data analytics / machine learning was higher for
Spark MLlib, slightly lower for Mahout, and significantly lower for other products,
particularly Rhipe (fig. 62). Again, support investments remain low and, as with current
vendor support shown in fig. 61 above, where investment or interest is developing, it
tends to go to Spark MLlib.

Figure 62 - Industry support for big data analytics / machine learning 2015 to 2016
[Chart: vertical axis 0% to 35%; categories: Spark MLlib, Mahout, Rhipe (R), Oryx, Myrrix; series: 2015, 2016]


Compared to support for big data search, we see significant existing industry support
and future plans for big data (Hadoop) distributions (fig. 63). Current support is
strongest for Hortonworks, followed by Cloudera and MapR. Current support for
Amazon is under 60 percent, but industry respondents expect to see about 90 percent
support for all products within 24 months. These investments support stronger user
sentiments for big data distributions (fig. 48, p. 61) than for search or machine learning.

Figure 63 - Industry support for big data (Hadoop) distributions
[Chart: vertical axis 0% to 100%; categories: Hortonworks, Cloudera, MapR, Amazon; legend: Today, 12 months, 18 months, 24 months, No plans]


Industry support/investments in Hortonworks and MapR big data distributions grew
year over year in 2016, while support for Cloudera and Amazon declined slightly (fig.
49, p. 62). Industry support for Hortonworks is currently greater than 80 percent; in
contrast, Amazon support is below 60 percent.

Figure 64 - Industry support for big data (Hadoop) distributions 2015 to 2016
[Chart: vertical axis 0% to 90%; categories: Hortonworks, Cloudera, MapR, Amazon; series: 2015, 2016]


Big Data Analytics Vendor Ratings


In rating vendors for big data analytics, we examined levels of functionality in five
categories: infrastructure, data access, search, machine learning, and supported
distributions (fig. 65). Criteria were weighted based on user responses/priorities. Top-
rated vendors include Zoomdata (1st), RapidMiner (2nd), Pentaho (3rd), Datameer (4th),
Domo (4th) and Information Builders (5th).
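
The study does not publish its weights or its scoring formula, so the following is only a sketch of how a priority-weighted total could be computed across the five categories. Every weight and score below is a hypothetical placeholder, and `weighted_total` is an illustrative name rather than anything defined in the report.

```python
# The five rating categories named in the text.
CATEGORIES = [
    "infrastructure",
    "data_access",
    "search",
    "machine_learning",
    "distributions",
]


def weighted_total(scores, weights):
    """Weighted average of category scores, normalized by the weight sum."""
    total_weight = sum(weights[c] for c in CATEGORIES)
    return sum(scores[c] * weights[c] for c in CATEGORIES) / total_weight


# Hypothetical user-priority weights and vendor scores (not from the study).
example_weights = {
    "infrastructure": 3,
    "data_access": 3,
    "search": 1,
    "machine_learning": 1,
    "distributions": 2,
}
example_scores = {
    "infrastructure": 8,
    "data_access": 6,
    "search": 2,
    "machine_learning": 4,
    "distributions": 10,
}

example_total = weighted_total(example_scores, example_weights)
print(round(example_total, 2))
```

Normalizing by the sum of the weights keeps the total on the same scale as the individual category scores, so lower-priority categories such as search pull the total less than infrastructure or data access.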

Figure 65 - Big data analytics vendor ratings
[Chart: scale ticks from 0.25 to 32; vendors: Zoomdata, RapidMiner, Pentaho, Datameer, Domo, Information Builders, SAP, Tableau, Oracle, TIBCO, Jinfonet, Microsoft, Birst, Logi Analytics, Looker, MicroStrategy; series: Infrastructure, Data Access, Search, Distributions, Machine Learning, Total Score]


Glossary
Alluxio (formerly Tachyon) is a memory-centric distributed storage system enabling reliable
data sharing at memory-speed across cluster frameworks.
Source: alluxio.org

Atlas is designed to exchange metadata with other tools and processes within and outside of the
Hadoop stack, thereby enabling platform-agnostic governance controls that effectively address
compliance requirements.
Source: Apache Software Foundation

BigQuery is a RESTful web service that enables interactive analysis of massively large datasets
working in conjunction with Google Storage. It is an Infrastructure as a Service (IaaS) service
that may be used complementarily with MapReduce.

Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable
full-text search engine with an HTTP web interface and schema-free JSON documents.
Elasticsearch is developed in Java and is released as open source under the terms of the Apache
License. Elasticsearch is the second most popular enterprise search engine after Apache Solr.*

HAWQ is a parallel SQL query engine that combines the key technological advantages of the
industry-leading Pivotal Analytic Database with the scalability and convenience of Hadoop.
HAWQ reads data from and writes data to HDFS natively. HAWQ delivers industry-leading
performance and linear scalability. It provides users the tools to confidently and successfully
interact with petabyte range data sets. HAWQ provides users with a complete, standards-
compliant SQL interface.
Source: Pivotal

HBase is an open source, non-relational, distributed database modeled after Google's BigTable
and is written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop
project and runs on top of HDFS (Hadoop Distributed File System), providing BigTable-like
capabilities for Hadoop.

The Hadoop distributed file system (HDFS) is a distributed, scalable, and portable file system
written in Java for the Hadoop framework.

The Apache Hive data warehouse software facilitates querying and managing large datasets
residing in distributed storage. Hive provides a mechanism to project structure onto this data and
query the data using a SQL-like language called HiveQL. At the same time this language also
allows traditional map/reduce programmers to plug in their custom mappers and reducers when it
is inconvenient or inefficient to express this logic in HiveQL.
Source: Apache Software Foundation

The Apache Knox Gateway is a REST API Gateway for interacting with Apache Hadoop
clusters. The Knox Gateway provides a single access point for all REST interactions with
Apache Hadoop clusters.
Source: Apache Software Foundation

Impala is an open source, native analytic database for Apache Hadoop. Impala is shipped by
Cloudera, MapR, Oracle, and Amazon.
Source: Cloudera

Mahout is a project of the Apache Software Foundation to produce free implementations of
distributed or otherwise scalable machine learning algorithms focused primarily in the areas of
collaborative filtering, clustering and classification. Many of the implementations use the
Apache Hadoop platform. Mahout also provides Java libraries for common math operations
(focused on linear algebra and statistics) and primitive Java collections.
Source: Apache Software Foundation

MapReduce is a programming model and an associated implementation for processing and
generating large data sets with a parallel, distributed algorithm on a cluster. Conceptually similar
approaches have been very well known since 1995 with the Message Passing Interface standard
having reduce and scatter operations.
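
As an illustration of the programming model only, the sketch below runs the map, shuffle, and reduce steps of a word count in a single Python process. It does not use Hadoop or any Hadoop API; the function names and sample input are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over all documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data analytics", "big data search"])
print(counts)
```

On a real cluster the map and reduce calls would run in parallel on different nodes, with the framework handling the shuffle between them.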

Apache Mesos is an open source cluster manager that was developed at the University of
California, Berkeley. It "provides efficient resource isolation and sharing across distributed
applications, or frameworks". The software enables resource sharing in a fine-grained manner,
improving cluster utilization.

MLlib is Spark's scalable machine-learning library consisting of common learning algorithms
and utilities, including classification, regression, clustering, collaborative filtering,
dimensionality reduction, as well as underlying optimization primitives.
Source: Apache Software Foundation

MongoDB is a cross-platform document-oriented database. Classified as a NoSQL database,
MongoDB eschews the traditional table-based relational database structure in favor of JSON-like
documents with dynamic schemas (MongoDB calls the format BSON), making the integration of
data in certain types of applications easier and faster. Released under a combination of the GNU
Affero General Public License and the Apache License, MongoDB is free and open source
software.


Myrrix offers a complete, real-time, scalable clustering and recommender system. The
solution is built on top of the Apache Mahout machine-learning project.
Source: Cloudera

Oozie is a workflow scheduler system to manage Hadoop jobs. It is a server-based Workflow
Engine specialized in running workflow jobs with actions that run Hadoop MapReduce and Pig
jobs. Oozie is implemented as a Java Web application that runs in a Java servlet container.

Oryx is built on Apache Spark and Apache Kafka, with specialization for real-time large scale
machine learning. It is a framework for building applications but also includes packaged, end-to-
end applications for collaborative filtering, classification, regression, and clustering.
Source: Cloudera

RHIPE integrates the R statistical environment with the Hadoop framework. RHIPE allows R
users to compute on terabyte-sized data sets on a cluster using the MapReduce framework, thus
offering the best of both worlds to users seeking to leverage the strength of R and Hadoop.
People with very large data sets stored in the Hadoop Distributed File System can now easily
process the data on hundreds or even thousands of nodes in parallel, using only the R language.
Source: Revolution Analytics

Cloudera Search is one of Cloudera's near-real-time access products. Cloudera Search enables
non-technical users to search and explore data stored in or ingested into Hadoop and HBase.
Users do not need SQL or programming skills to use Cloudera Search because it provides a
simple, full-text interface for searching.
Source: Cloudera

Solr is an open source enterprise search platform, written in Java, from the Apache Lucene
project. Its major features include full-text search, hit highlighting, faceted search, real-time
indexing, dynamic clustering, database integration, NoSQL features and rich document (e.g.,
Word, PDF) handling. Providing distributed search and index replication, Solr is designed for
scalability and fault tolerance. Solr is the most popular enterprise search engine.

Apache Spark is an open source cluster computing framework originally developed in the
AMPLab at the University of California, Berkeley, and later donated to the Apache Software
Foundation, where it remains today. In contrast to Hadoop's two-stage disk-based MapReduce
paradigm, Spark's multi-stage in-memory primitives provide performance up to 100 times faster
for certain applications. By allowing user programs to load data into a cluster's memory and
query it repeatedly, Spark is well suited to machine-learning algorithms.


Spark SQL is a component on top of Spark Core that introduces a new data abstraction called
DataFrames, which provides support for structured and semi-structured data. Spark SQL
provides a domain-specific language to manipulate DataFrames in Scala, Java, or Python. It also
provides SQL language support with command-line interfaces and ODBC/JDBC server.

Apache Tez is an extensible framework for building high-performance batch and interactive
data-processing applications, coordinated by YARN in Apache Hadoop. Tez improves the
MapReduce paradigm by dramatically improving its speed while maintaining MapReduce's
ability to scale to petabytes of data. Important Hadoop ecosystem projects like Apache Hive and
Apache Pig use Apache Tez, as do a growing number of third-party data-access applications
developed for the broader Hadoop ecosystem.
Source: Apache Software Foundation

YARN is one of the key features in the second-generation Hadoop 2 version of the Apache
Software Foundation's open source distributed processing framework. Originally described by
Apache as a redesigned resource manager, YARN is now characterized as a large-scale,
distributed operating system for big data applications.
* All sources Wikipedia unless otherwise noted


Other Dresner Advisory Services Research Reports

- Wisdom of Crowds Flagship Business Intelligence Market Study
- Advanced and Predictive Analytics
- Business Intelligence Competency Center
- Cloud Computing and Business Intelligence
- Collective Insights™
- End User Data Preparation
- Enterprise Planning
- Internet of Things and Business Intelligence
- Location Intelligence
- Small and Mid-Sized Enterprise Business Intelligence
- Systems Integrators


Appendix: Big Data Analytics Study Survey Instrument

Please provide your contact information below:

Name*: _________________________________________________

Company Name: _________________________________________________

Address 1: _________________________________________________

Address 2: _________________________________________________

City: _________________________________________________

State: _________________________________________________

Zip: _________________________________________________

Country: _________________________________________________

Email Address*: _________________________________________________

Phone Number: _________________________________________________

Major Geography

( ) Asia/Pacific

( ) Europe, Middle East and Africa

( ) Latin America

( ) North America

What is your current title?

_________________________________________________


What function are you a part of?

( ) Business intelligence competency center

( ) Executive management

( ) Finance

( ) Information Technology (IT)

( ) Manufacturing

( ) Marketing

( ) Project/program management office

( ) Sales

( ) Research and development (R&D)

( ) Other - Write In: _________________________________________________

Please select an industry

( ) Advertising

( ) Aerospace

( ) Agriculture

( ) Apparel and accessories

( ) Automotive

( ) Aviation

( ) Biotechnology

( ) Broadcasting

( ) Business services

( ) Chemical

( ) Construction

( ) Consulting


( ) Consumer products

( ) Defense

( ) Distribution & logistics

( ) Education

( ) Energy

( ) Entertainment and leisure

( ) Executive search

( ) Federal government

( ) Financial services

( ) Food, beverage and tobacco

( ) Healthcare

( ) Hospitality

( ) Gaming

( ) Insurance

( ) Legal

( ) Manufacturing

( ) Mining

( ) Motion picture and video

( ) Not for profit

( ) Pharmaceuticals

( ) Publishing

( ) Real estate

( ) Retail and wholesale

( ) Sports

( ) State and local government


( ) Technology

( ) Telecommunications

( ) Transportation

( ) Utilities

( ) Other - Write In: _________________________________________________

How many employees does your company employ worldwide?

( ) 1 - 100

( ) 101 - 1000

( ) 1001 - 5000

( ) More than 5000

Do you use or intend to use big data technology/architecture within your organization?*

( ) Yes. We use big data today

( ) No. We have no plans to use big data at all

( ) We may use big data in the future

What product(s) does your organization use with big data for BI/analytics?

____________________________________________

How satisfied are you with your vendor and product for big data analytics?

( ) Extremely satisfied

( ) Somewhat satisfied

( ) Somewhat unsatisfied

( ) Unsatisfied


What are your plans for Big Data (Hadoop) Analytics in the Future?

( ) Will adopt in 2016

( ) Will adopt in 2017

( ) Will adopt beyond 2017

What use cases are most important for Big Data (Hadoop) in your organization?

                              Critical   Very important   Important   Somewhat important   Not important
Data warehouse optimization     ( )           ( )            ( )             ( )                ( )
Customer/social analysis        ( )           ( )            ( )             ( )                ( )
Internet of things              ( )           ( )            ( )             ( )                ( )
Fraud detection                 ( )           ( )            ( )             ( )                ( )
Clickstream analytics           ( )           ( )            ( )             ( )                ( )


Please indicate the importance of the following Big Data infrastructure components

                              Critical   Very important   Important   Somewhat important   Not important
Alluxio (formerly Tachyon)      ( )           ( )            ( )             ( )                ( )
Mesos                           ( )           ( )            ( )             ( )                ( )
Spark                           ( )           ( )            ( )             ( )                ( )
Map/Reduce                      ( )           ( )            ( )             ( )                ( )
Oozie                           ( )           ( )            ( )             ( )                ( )
Yarn                            ( )           ( )            ( )             ( )                ( )
Tez                             ( )           ( )            ( )             ( )                ( )
Atlas                           ( )           ( )            ( )             ( )                ( )
Knox Gateway                    ( )           ( )            ( )             ( )                ( )


Please indicate the importance of the following Big Data - data access capabilities

                              Critical   Very important   Important   Somewhat important   Not important
Google BigQuery                 ( )           ( )            ( )             ( )                ( )
HBase                           ( )           ( )            ( )             ( )                ( )
HDFS                            ( )           ( )            ( )             ( )                ( )
Hive/HiveQL                     ( )           ( )            ( )             ( )                ( )
Impala                          ( )           ( )            ( )             ( )                ( )
MongoDB                         ( )           ( )            ( )             ( )                ( )
Pivotal HAWQ                    ( )           ( )            ( )             ( )                ( )
Redshift                        ( )           ( )            ( )             ( )                ( )
Spark SQL                       ( )           ( )            ( )             ( )                ( )


Please indicate the importance of the following Big Data search capabilities

                              Critical   Very important   Important   Somewhat important   Not important
Cloudera Search                 ( )           ( )            ( )             ( )                ( )
Apache Solr                     ( )           ( )            ( )             ( )                ( )
Elasticsearch                   ( )           ( )            ( )             ( )                ( )

Please indicate the importance of the following Big Data analytical/machine learning components

                              Critical   Very important   Important   Somewhat important   Not important
Mahout                          ( )           ( )            ( )             ( )                ( )
Rhipe (R)                       ( )           ( )            ( )             ( )                ( )
Oryx                            ( )           ( )            ( )             ( )                ( )
Myrrix                          ( )           ( )            ( )             ( )                ( )
Spark MLib                      ( )           ( )            ( )             ( )                ( )


Please indicate the importance of the following Big Data (Hadoop) distributions

                              Critical   Very important   Important   Somewhat important   Not important
Cloudera                        ( )           ( )            ( )             ( )                ( )
Hortonworks                     ( )           ( )            ( )             ( )                ( )
MAP/R                           ( )           ( )            ( )             ( )                ( )
Amazon                          ( )           ( )            ( )             ( )                ( )
