2016 Edition
Licensed to Pentaho
2016 Big Data Analytics Market Study
Disclaimer:
This report should be used for informational purposes only. Vendor and product selections should be made based on
multiple information sources, face-to-face meetings, customer reference checking, product demonstrations and
proof-of-concept applications.
The information contained in all Wisdom of Crowds Market Study Reports reflects the opinions expressed in the
online responses of individuals who chose to respond to our online questionnaire and does not represent a scientific
sampling of any kind. Dresner Advisory Services, LLC shall not be liable for the content of reports, study results, or for
any damages incurred or alleged to be incurred by any of the companies included in the reports as a result of its
content.
Reproduction and distribution of this publication in any form without prior written permission is forbidden.
Definition
Introduction
This year we celebrate the ninth anniversary of Dresner Advisory Services! We offer our
thanks to all of you for your continued support and ongoing encouragement.
Since our founding in 2007, we have worked hard to set the bar high, challenging
ourselves to innovate and lead the market, and offering ever greater value with each
successive year.
Our first market report in 2010 set the stage for where we are today. Since that time, we
have expanded our agenda and added new research topics every year. For
2016, we are on track to release 15 major reports, including our recent flagship BI
report, now in its seventh year of publication!
In addition to our ongoing coverage of key topics such as embedded BI, big data
analytics and advanced and predictive analytics, we have added new topics including
Collective Insights™ (blending collaboration and governance) and systems integrators.
For this, our second Big Data Analytics Market Study, we continue to focus upon the
combination of analytical solutions within the Hadoop ecosystem, adding some new
criteria and exploring changing market dynamics and user perceptions and plans.
Best,
Howard Dresner
Chief Research Officer
Dresner Advisory Services
Contents
Definition (p. 3)
  Big Data Analytics Defined (p. 3)
Introduction (p. 4)
Benefits of the Study (p. 7)
  A Consumer Guide (p. 7)
  A Supplier Tool (p. 7)
About Howard Dresner and Dresner Advisory Services (p. 8)
About Jim Ericson (p. 9)
Survey Method and Data Collection (p. 10)
  Data Quality (p. 10)
Executive Summary (p. 12)
Study Demographics (p. 13)
  Geography (p. 13)
  Functions (p. 14)
  Vertical Industries (p. 15)
  Organization Size (p. 16)
Analysis and Trends: Big Data Analytics (p. 18)
  Importance of Big Data (p. 18)
  Big Data Adoption (p. 19)
  Future Adoption of Big Data (p. 25)
  Big Data Use Cases (p. 31)
  Big Data Infrastructure (p. 37)
  Big Data Access (p. 43)
  Big Data Search (p. 49)
  Big Data Analytics / Machine-Learning Technologies (p. 55)
  Big Data Distributions (p. 61)
Industry and Vendor Analysis (p. 68)
  Big Data Analytics Vendor Ratings (p. 79)
Glossary (p. 80)
A Consumer Guide
Consumers use the DAS Big Data Analytics Market Study, an objective source of industry
research, to understand how their peers are leveraging and investing in big data
analytics and related technologies.
Using our unique vendor performance measurement system, users glean key insights
into software supplier performance, enabling:
A Supplier Tool
Vendor licensees use the DAS Big Data Analytics Market Study in several important
ways:
External Awareness
- Build awareness for the big data analytics market and supplier brand, citing
  DAS Big Data Analytics Market Study trends and vendor performance
- Create lead and demand generation for supplier offerings through association
  with DAS Big Data Analytics Market Study brand, findings, webinars, etc.
Internal Planning
- Refine internal product plans and align with market priorities and realities as
  identified in DAS Big Data Analytics Market Study
- Better understand customer priorities, concerns, and issues
- Identify competitive pressures and opportunities
Howard Dresner is one of the foremost thought leaders in business intelligence and
performance management, having coined the term "Business Intelligence" in 1989. He
has published two books on the subject, The Performance Management Revolution:
Business Results through Insight and Action (John Wiley & Sons, Nov. 2007) and
Profiles in Performance: Business Intelligence Journeys and the Roadmap for Change
(John Wiley & Sons, Nov. 2009). He lectures at forums around the world and is often
cited by the business and trade press.
Howard has conducted and directed numerous in-depth primary research studies over
the past two decades and is an expert in analyzing these markets.
- Collective Insights™
- Systems Integrators
Howard conducts a weekly Twitter tweetchat on Fridays at 1:00 p.m. ET. During these
live events the #BIWisdom tribe discusses a wide range of business intelligence
topics.
Jim has served as a consultant and journalist who studies end-user management
practices and industry trending in the data and information management fields.
From 2004 to 2013 he was the editorial director at Information Management magazine
(formerly DM Review), where he created architectures for user and
industry coverage for hundreds of contributors across the breadth of
the data and information management industry.
Data Quality
We carefully scrutinize and verify all respondent entries to ensure that only qualified
participants are included in the study.
Executive Summary
- Over two years of big data analytics study, we see a significant increase in
  uptake and a large drop in holdouts with no big data plans. High tech and
  telecom are industry leaders (p. 20-24).
- Current adoption and future plans for the use of big data analytics have reached
  a level of significance we did not see last year. Forty-one percent of
  organizations are already using Hadoop-related big data. Even more say they
  may use big data in the future (p. 19).
- Among organizations that have not yet adopted big data, 14 percent will adopt in
  the current calendar year, while 47 percent plan to adopt in 2017.
  BICC respondents are likely future adopters (p. 25-30).
- Among technologies and initiatives considered strategic to business intelligence,
  big data analytics is ranked 20th out of 30 topical areas under study, still well
  behind core BI practices (p. 18). Overall, vendors are still highly positive on big
  data, though sentiment is leveling off (p. 68).
- The top big data use cases in 2016 are data warehouse optimization, followed by
  customer/social analysis (p. 31-36).
- The top big data infrastructure choice among users is Spark, followed by
  Map/Reduce, Yarn, Oozie, Tez, Mesos, and Atlas. Over time, Spark is gaining
  status as a category leader (p. 40-42). Industry support is strongest for
  Map/Reduce, but Spark is closing in quickly (p. 69-70).
- Spark SQL is the most-cited big data access structure, followed closely by Hive
  and HDFS (p. 43-48). Industry support is strongest for Hive and HDFS; Spark
  support remains lower than user expectations (p. 71-72).
- Amid lukewarm interest in big data search technologies, Elasticsearch
  resonated most strongly, followed by Apache Solr and Cloudera Search (p. 49-54).
  Industry support is strongest for Apache Solr, and support for Cloudera fell
  noticeably (p. 73-74).
- Spark MLlib is the most-preferred big data machine learning technology,
  important to more than 60 percent of respondents. All machine learning
  technologies gather interest but are still at the fringe (p. 55-60). Industry support
  for big data analytics / machine learning is strongest for Spark MLlib, followed by
  Mahout (p. 75-76).
- Cloudera is the most popular big data distribution among users, followed by
  Hortonworks, Amazon, and MAP/R (p. 61-66). We see significant existing
  industry support and future plans for big data (Hadoop) distributions (p. 77-78).
Study Demographics
Our 2016 Big Data Analytics Market Study is based on a cross-section of data that
spans geographies, functions, organization sizes, and vertical industries. We believe
that, unlike other industry research, this supports a more representative sample and a
better indicator of true market dynamics. We constructed cross-tab analyses using
these demographics to identify and illustrate important industry trends.
Geography
North America, which includes the U.S., Canada, and Puerto Rico, represents 57
percent of respondents (fig. 1). EMEA accounts for the next largest group (32 percent),
followed by Asia Pacific and Latin America.
[Figure 1: Geographies Represented. North America 57%; Europe, Middle East and Africa 32%; Asia Pacific 8%; Latin America 3%.]
Functions
IT (28 percent) and the business intelligence competency center (21 percent) are the
two largest groups represented in our big data analytics sample (fig. 2).
Examining trends and behavior by function helps us compare and contrast plans and
priorities in different areas of organizations.
[Figure 2: Functions Represented. Surviving labels: Finance 8%; Other 12%.]
Vertical Industries
Technology (14 percent), financial services (10 percent), and consulting (9 percent) are
the most represented industries in our study, followed by healthcare, education, and
telecommunications (fig. 3). We include responses from consultants, who often have
greater interaction with initiatives and deeper industry knowledge than many customer
counterparts. This also yields insight into the partner ecosystem for BI vendors.
[Figure 3: bar chart of vertical industries represented; category labels not recoverable.]
Organization Size
Respondents to our big data analytics study reflect a mix of organizational sizes and
structures (fig. 4). Small organizations of 1-100 employees represent 26 percent of the
sample. Mid-sized organizations account for 27 percent, and the remaining 47
percent are large organizations with more than 1,000 employees.
[Figure 4: organization sizes represented. Categories: 1 - 100; 101 - 1000; 1001 - 5000; More than 5000 employees.]
Analysis and Trends
[Figure: chart of current big data use. Surviving label: "No. We have no plans to use big data at all," 14%.]
Over the two years of our comprehensive big data analytics study, we see a significant
increase in uptake and a large drop in holdouts with no plans (fig. 7). Forty-one percent
of respondents report current big data use, a greater than two-fold increase over 2015.
At the same time, the number of respondents with no plans fell by a factor of more
than two, from 36 percent to 14 percent. The percentage of ambivalent users was
consistent year over year at 45 percent or a bit more. We can anecdotally attribute these
findings to an emerging mix of practical/achievable projects, service enablement, and
greater understanding of big data uses.
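The year-over-year comparison above is simple ratio arithmetic. As a rough sketch, the drop in holdouts can be checked from the two no-plans percentages quoted in the text. The helper name `fold_change` and the variable names are our own illustrative choices, not part of the study.

```python
def fold_change(before: float, after: float) -> float:
    """Return how many times larger `before` is than `after`."""
    if after == 0:
        raise ValueError("after must be non-zero")
    return before / after


# Share of respondents with no big data plans, in percent, as quoted in the text.
no_plans_2015 = 36.0
no_plans_2016 = 14.0

drop_factor = fold_change(no_plans_2015, no_plans_2016)
point_change = no_plans_2016 - no_plans_2015

print(f"No-plans share fell by a factor of {drop_factor:.2f}")
print(f"Change in percentage points: {point_change:+.0f}")
```

The ratio comes out above two, which matches the wording in the text. The percentage-point change is reported separately because a ratio and a point difference answer different questions.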
[Figure 7: big data adoption, 2015 vs. 2016. Categories: "Yes. We use big data today"; "We may use big data in the future"; "No. We have no plans to use big data at all".]
In our 2016 sample, EMEA leads slightly in current adoption (43 percent) compared to
North America (40 percent) and is well ahead of Asia Pacific (33 percent) (fig. 8). Asia
Pacific also reports the most organizations with "no plans to use big data at all" (27
percent). Both EMEA and North America report 46 percent undecided ("we may use big
data...") respondents.
[Figure 8: big data adoption by region. Categories: North America; Europe, Middle East and Africa; Asia Pacific.]
Perennial first-mover high-tech organizations lead 2016 big data adoption with 59
percent reporting current use (fig. 9). Telecommunications, with possibly the greatest
data transaction volume issues of any industry, is the next most likely industry to
currently use big data analytics (50 percent). Financial services, another high data
transaction industry, reports 45 percent current use. Less likely to be current users,
consulting industry respondents are nonetheless prepared to embrace big data as
needed.
[Figure 9: chart of big data adoption by vertical industry; category labels not recoverable.]
In 2016, the BICC supplanted R&D as the most likely current departmental user of big
data (fig. 10). This finding supports the notion that big data is moving from an
experimental to a practical pursuit in organizations. As is often the case, executive
management is a likely, if not certain, proponent of evolutionary technologies such as big data.
We are uncertain as to why finance is also a strong player in big data, unless interest
there is aimed at cost savings. IT predictably lags in current adoption
and is most likely to have a vested interest in supporting legacy and traditional technology
investments.
Current adoption of big data is strongest (61 percent) within very large businesses and
institutions that have more than 5,000 employees (fig. 11). Small organizations with one
to 100 employees have the lowest rate of current adoption (29 percent). After very large
organizations, however, small and mid-sized (101-1,000 employees) organizations are most
open to possible future use. We would expect small organizations to be the most likely cloud
users of big data services, while large organizations will likely deploy onsite.
[Figure 11: big data adoption by organization size. Categories: 1 - 100; 101 - 1000; 1001 - 5000; More than 5000 employees.]
Compared to our inaugural 2015 study, year-over-year future adoption plans for big
data represent a sea change of respondent behavior (fig. 13). Current year adoption
plans are more than three times greater in 2016 (14 percent) compared to last year (4
percent). Next-year adoption in our current study (47 percent) shows remarkable growth
from 27 percent in 2015. Significantly fewer respondents are delaying plans
beyond next year, plainly indicating they are allocating money, resources, and time to
big data solutions and their use.
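The size of this shift can be sketched from the figures quoted above. The dictionary layout and the names `plans`, `compare`, and `summary` are illustrative assumptions. The percentages are the ones cited in the text for this-year and next-year adoption plans.

```python
from typing import Dict, Tuple

# Adoption plans among organizations not yet using big data, in percent,
# as quoted in the text for the 2015 and 2016 studies.
plans: Dict[str, Dict[str, float]] = {
    "this year": {"2015": 4.0, "2016": 14.0},
    "next year": {"2015": 27.0, "2016": 47.0},
}


def compare(previous: float, current: float) -> Tuple[float, float]:
    """Return (ratio, percentage-point change) between two survey years."""
    return current / previous, current - previous


summary = {
    horizon: compare(values["2015"], values["2016"])
    for horizon, values in plans.items()
}

for horizon, (ratio, points) in summary.items():
    print(f"{horizon}: {ratio:.2f}x, {points:+.0f} points")
```

The this-year ratio works out to 3.5, consistent with "more than three times greater". Next-year plans grow by 20 percentage points.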
[Figure 13: future big data adoption plans, 2015 vs. 2016. Categories: Will adopt this year; Will adopt next year; Will adopt beyond next year.]
Regionally, among those who have not already adopted big data, North American and
Asia-Pacific respondents are more motivated to increase use compared to those in
EMEA (fig. 14). Asia Pacific has the greatest number of both 2016 (17 percent) and
2017 (50 percent) adopters; EMEA has the most respondents (48 percent) with plans
deferred beyond 2017.
[Figure 14: future big data adoption plans by region. Categories: North America; Europe, Middle East and Africa; Asia Pacific.]
Among organizations not yet using big data, vertical adoption in 2016 is highest (about
20 percent) in education, technology, and telecommunications (fig. 15). Plans for 2017
adoption are by far highest in financial services (75 percent), followed by consulting and
healthcare. While future plans for telecommunications and technology appear relatively
low, recall that these sectors are also the greatest current users of big data technologies
(fig. 9, p. 22).
[Figure 15: chart of future big data adoption plans by vertical industry; category labels not recoverable.]
Among non-users of big data, the BICC has by far the highest (30 percent) current-year
adoption plans (fig. 16). Accelerating BICC use is generally a reflection of delivery as
well as incipient demand for business technologies, another indication that big data
analytics is "crossing the chasm" of use cases and enterprise adoption. Sales and
marketing and IT (low in current usage; fig. 10, p. 23) are the next most likely to be
current-year adopters of big data analytics, perhaps by fiat of executive management
(whose next-year interest is correspondingly highest).
[Figure 16: future big data adoption plans by function. Functions: Business intelligence competency center; Executive management; Information Technology (IT); Finance; Sales & Marketing; Research and development (R&D). Series: Will adopt in 2016; Will adopt in 2017; Will adopt beyond 2017.]
As with current users of big data analytics (fig. 11, p. 24), 2016 first-adoption plans are
highest at very large organizations with more than 5,000 employees (fig. 17). More than
60 percent of very large organizations will take up the use of big data in 2016, more
than twice the rate at small organizations (29 percent). That said, we continue to believe
cloud-based offerings will be a strong driver of big data going forward for organizations
of any size. Possibly in that vein, 2017 adoption plans are highest at small organizations
(58 percent), followed by mid-sized organizations (50 percent).
[Figure 17: future big data adoption plans by organization size. Categories: 1 - 100; 101 - 1000; 1001 - 5000; More than 5000 employees.]
[Figure: chart of big data use cases on a 0% to 100% axis. Surviving labels: Clickstream analytics; Fraud detection; Internet of Things.]
Year over year, the top big data use cases, data warehouse optimization and customer /
social analysis, retain (and extend) their top rankings (fig. 19). The Internet of Things,
the third-most popular use case in 2015, lost momentum in 2016, possibly due to
settling hype and uneven prospects for average organizations. Clickstream analytics
and fraud detection gained the most influence year over year.
[Figure 19: big data use cases, 2015 vs. 2016. Categories: Data warehouse optimization; Customer/social analysis; Clickstream analytics; Fraud detection; Internet of Things.]
By region, Asia Pacific and North America are the most likely to prioritize data
warehouse optimization (fig. 20). (All use cases, particularly fraud detection and
clickstream analytics, are, in fact, more highly prioritized in Asia Pacific than in other
regions.) Compared to North America, EMEA nonetheless has more interest in
customer / social analysis and the Internet of Things.
[Figure 20: big data use cases by region, rated on a 1-5 scale. Regions: North America; Europe, Middle East and Africa; Asia Pacific. Series: Data warehouse optimization; Customer/social analysis; Clickstream analytics; Fraud detection; Internet of Things.]
When parsed by vertical industry, all industries rank data warehouse optimization as a top or
second priority. Our 2016 sample shows somewhat surprising standout interest in data
warehouse optimization among healthcare respondents (fig. 21). Elsewhere, financial
services predictably reports the highest interest in fraud detection (and clickstream
analytics). Consulting leads technology in interest in customer / social analysis.
Interest in the Internet of Things is highest in education.
[Figure 21: big data use cases by vertical industry, rated on a 1-5 scale. Surviving industry labels: Technology; Financial services; Healthcare; Education. Surviving series labels: Data warehouse optimization; Customer/social analysis; Fraud detection; Internet of Things.]
All functions in our 2016 sample rank data warehouse optimization as their highest big
data use case priority (fig. 22). IT has the most standout interest in data warehouse
optimization, which is not surprising given traditional ownership boundaries. BICC and
executive management report the highest interest in customer / social analysis, perhaps
with an opportunistic viewpoint. BICC and sales/marketing are most interested in
clickstream analytics. Finance respondents show below-average interest in all big data
use cases.
[Figure 22: big data use cases by function, rated on a 1-5 scale. Surviving function labels: Information Technology (IT); Business intelligence competency center; Executive management; Research and development (R&D); Finance. Surviving series labels: Data warehouse optimization; Customer/social analysis; Clickstream analytics; Fraud detection.]
As expected, very large organizations (>5,000) have the greatest proportional interest in
data warehouse optimization (fig. 23). Generally, we would expect large organizations
to be more conventional in their approach to big data use cases with an eye toward cost
efficiency, while smaller peers are more balanced across opportunities. It is interesting,
however, that IoT has not caught fire in organizations of any size and that very large
organizations are the least attuned to customer / social analysis.
[Figure 23: big data use cases by organization size, rated on a 1-5 scale. Surviving size labels: 1 - 100; 101 - 1000. Surviving series labels: Data warehouse optimization; Customer/social analysis; Clickstream analytics; Internet of Things.]
[Figure: chart of big data infrastructure choices on a 0% to 100% axis. Surviving labels: Map/Reduce; Yarn; Oozie; Tez; Mesos; Atlas; Knox Gateway.]
Across two years of study, Spark has surpassed Map/Reduce as the preferred big data
infrastructure (fig. 25). Preferences for Spark and associated applications/frameworks
extend across all measures in this report even though Map/Reduce is well penetrated in
early-stage use. All infrastructure choices gained favor in 2016 over 2015; the biggest
gainer besides Spark and Map/Reduce was Yarn. (2016 is the first year we polled
respondents on interest in Atlas and Knox Gateway.)
[Figure 25: big data infrastructure preferences, 2015 vs. 2016; category labels not recoverable.]
By region, Asia-Pacific respondents indicated the highest interest in all big data
infrastructures polled in 2016 and prioritize Yarn over Map/Reduce (fig. 26), perhaps
indicating late-arriving interest and newer editions of Hadoop. Among regional
preferences, EMEA had the second-highest interest in Spark and Map/Reduce, ahead
of North America. Interest in Yarn is equal in North America and EMEA. EMEA has
slightly higher interest in Oozie and somewhat less interest in Tez and Mesos compared
to North America.
[Figure 26: big data infrastructure preferences by region, rated on a 1-5 scale. Surviving region labels: North America; Asia Pacific.]
Big data infrastructure preferences vary by vertical industry (fig. 27). While technology
industry respondents are most singularly interested in Spark, other verticals share
similar affinity for Map/Reduce, and consulting actually grades Map/Reduce higher
than Spark. This latter finding may reflect consulting firms serving existing demand for,
and investments in, Map/Reduce. Technology, healthcare, and consulting have the most
interest in Yarn; healthcare and consulting are also the most likely to engage with
Oozie.
[Figure 27: big data infrastructure preferences by vertical industry, rated on a 1-5 scale. Industries: Technology; Financial services; Consulting; Healthcare; Education.]
Big data infrastructure preferences vary interestingly by function (fig. 28). The BICC
(often contained within IT) is the strongest proponent of Spark especially, followed by
Map/Reduce. As we have seen elsewhere, executive interest often follows (or leads)
along the lines of BICC activity. By comparison, R&D interest is weak and falls sharply after
Spark and Map/Reduce. Central IT is predictably a laggard in embracing big data
compared to other roles but shows some preference for the various options. Perhaps
most interesting is sales and marketing, where Oozie and Tez claim the highest marks
of any department.
[Figure 28: big data infrastructure preferences by function, rated on a 1-5 scale. Surviving function labels: Business intelligence competency center; Executive management; Research and development (R&D); Finance.]
[Figure: chart by organization size, rated on a 1-5 scale. Surviving size labels: 1 - 100; 101 - 1000; 1001 - 5000.]
[Figure: chart of big data access technologies on a 0% to 100% axis. Surviving labels: Hive/HiveQL; HDFS; HBase; Google BigQuery; Redshift; MongoDB; Impala; Pivotal HAWQ.]
Among big data access technologies studied both last year and this year, all gained
positive sentiment year over year, especially Spark SQL, Hive/HiveQL, and Impala (fig.
31). Trailing technologies, with the exception of Pivotal HAWQ, all reached positive
sentiment of 2.7 to 2.9, in the range of "important."
[Figure 31: big data access technologies, 2015 vs. 2016; category labels not recoverable.]
Big data access preferences vary by region (fig. 32). Asia Pacific had the strongest
response to several technologies, specifically HBase, Hive, HDFS, and Spark.
HBase was less appealing in regions other than Asia Pacific. Cloud-based solutions
(Redshift, Google BigQuery) fared worse overall but were slightly more appealing in North
America than in other regions.
[Figure 32: big data access preferences by region, rated on a 1-5 scale. Surviving region labels: North America; Asia Pacific.]
By vertical industry, financial services, technology, and consulting are the most aligned
around Spark SQL for data access (fig. 33). HiveQL resonated most strongly in
healthcare, followed by consulting and technology. Healthcare was also the strongest
proponent of HDFS, followed by financial services and technology. Consulting
respondents report an outsized interest in Redshift. Google BigQuery fared best in
consulting and financial services.
[Figure 33: big data access preferences by vertical industry, rated on a 1-5 scale. Industries: Technology; Financial services; Consulting; Healthcare; Education.]
Departmental interest in data access varies by function (fig. 34). The BICC and
executive management are the strongest proponents of Spark SQL. More traditional
HBase, HDFS, and Hive are the most favored in sales and marketing, while the BICC is
most focused on HDFS and Hive along with Spark. Cloud-based offerings (Redshift,
Google BigQuery) are initially most interesting to executive management.
[Figure 34: big data access preferences by function. Technologies: Redshift; Google BigQuery; HBase; HDFS; Hive/HiveQL; Spark SQL. Surviving function labels: Information Technology (IT); Business intelligence competency center; Executive management; Research and development (R&D); Sales & Marketing.]
Small organizations (most likely to be early adopters) are proportionately most drawn to
Spark as a newer opportunity for big data access (fig. 35). Redshift (and, in mid-sized
organizations, Google BigQuery) is also popular as an easy and inexpensive entry
point to big data access for smaller organizations. Very large organizations are more
likely to be invested in big data access via HDFS and Hive, followed by Spark SQL.
[Figure 35: big data access preferences by organization size, rated on a 1-5 scale. Surviving size labels: 1 - 100; 101 - 1000; 1001 - 5000.]
[Figure: chart of big data search technologies on a 0% to 100% axis. Labels: Elasticsearch; Apache Solr; Cloudera Search.]
Across two years of study data, we saw a small reversal of fortunes among big data
search options (fig. 37). While Elasticsearch moved past early open source provider
Apache Solr into first place, Cloudera fell slightly from the top choice to third. While we
consider rising year-over-year sentiment a positive development, we reiterate that there
is currently no clear first choice emerging in big data search.
[Figure 37: big data search technologies, 2015 vs. 2016. Categories: Elasticsearch; Apache Solr; Cloudera Search.]
As in other measures, we found sentiment toward big data search options strongest
"across the board" in Asia Pacific (fig. 38). As mentioned, year-over-year sentiment
toward big data search also increased across all regions, though interest remains
middling rather than remarkable.
[Figure 38: big data search preferences by region, rated on a 1-5 scale. Surviving region labels: North America; Asia Pacific.]
We saw some divergence from overall results in big data search preference by industry
(fig. 39). Due to sector-size bias, we found respondents in three verticals (financial
services, healthcare, and consulting) preferred Cloudera Search to both top choice
Elasticsearch and Apache Solr. In contrast, technology, with a larger pool of
respondents, preferred Elasticsearch. In all instances, Apache Solr was the second
choice and was most preferred in healthcare and financial services.
[Figure 39: big data search preferences by vertical industry, rated on a 1-5 scale. Industries: Technology; Financial services; Consulting; Healthcare; Education.]
[Figure: big data search preferences by function, rated on a 1-5 scale. Surviving function labels: Information Technology (IT); Business intelligence competency center; Executive management; Research and development (R&D); Finance. Series: Elasticsearch; Apache Solr; Cloudera Search.]
Big data search preferences vary somewhat but not dramatically in organizations of
different size (fig. 41). The largest departure in our 2016 sample is in mid-sized firms of
101 to 1,000 employees, where interest declines noticeably from Elasticsearch to other
options.
[Figure 41: big data search preferences by organization size, rated on a 1-5 scale. Surviving size labels: 1 - 100; 101 - 1000; 1001 - 5000.]
[Figure: chart of big data analytics / machine-learning technologies on a 0% to 100% axis. Labels: Spark MLlib; Rhipe (R); Mahout; Oryx; Myrrix.]
Year-over-year interest in big data analytics and machine learning increased across the
board, though it still remains confined to levels of 2.0 or "somewhat important" (fig. 43).
The most popular choice, Spark MLlib, also grew the most from 2015 to 2016. The next
greatest momentum levels were in Rhipe and Mahout.
[Figure 43: big data analytics / machine-learning technologies, 2015 vs. 2016. Categories: Spark MLlib; Rhipe (R); Mahout; Oryx; Myrrix.]
[Figure: chart by region, rated on a 1-5 scale. Surviving region labels: North America; Asia Pacific.]
In our 2016 sample, interest in big data machine learning varied by vertical industry but
overall was led by preference for Spark MLlib (fig. 45). Healthcare and technology
showed the greatest interest in MLlib. Healthcare and consulting were most interested in
Rhipe.
[Figure 45: big data machine-learning preferences by vertical industry, rated on a 1-5 scale. Industries: Technology; Financial services; Consulting; Healthcare; Education.]
By function, Spark MLlib is again the standout category leader across organizational
roles. BICC and executive management again mirror each other as the areas of greatest
interest, followed by R&D and sales and marketing (fig. 46). IT is mostly unengaged with
big data analytics and machine learning, even more so than sales and marketing or finance.
[Figure 46: rows by function (Business intelligence competency center, Executive management, Research and development (R&D), Finance); scale 1 to 5]
Organizations of all sizes prefer Spark MLlib over all other big data analytics /
machine-learning options (fig. 47). This preference holds regardless of size, although
its strength varies. In our 2016 sample, sentiment for MLlib is strongest in organizations
with 1,001 to 5,000 employees. Preference for Spark MLlib is higher at large organizations,
while small peers have a proportionately greater interest in R-based Rhipe.
[Figure 47: rows by organization size (1 - 100, 101 - 1000, 1001 - 5000); scale 1 to 5]
[Figure 48: Cloudera, Hortonworks, Amazon, MapR; axis 0% to 100%]
Interest in all big data distributions increased year over year in 2016 (fig. 49). Amazon
slipped in rank from a tie for first place to third, behind Cloudera and Hortonworks.
Interest levels for the top three choices were at or near 3.0, indicating average
responses near "important."
[Figure 49: Cloudera, Hortonworks, Amazon, MapR; series 2015 and 2016; axis 0.00 to 3.00]
In 2016, interest in the four big data distributions we sampled differed by geography
(fig. 50). Asia Pacific again leads in interest across all distributions. Perhaps most
noticeably, EMEA respondents reported a standout interest in Cloudera compared to other
distributions.
[Figure 50: rows by geography (North America, Asia Pacific); scale 1 to 5]
[Figure: rows by industry (Technology, Financial services, Consulting, Healthcare, Education); scale 1 to 5]
Unlike other measures, Cloudera is not a big data distribution category leader by
function due to sample weighting (fig. 52). In our 2016 sample, Hortonworks was a
standout leader among sales and marketing respondents. Amazon performed strongest
among distributions for executive management respondents. BICC respondents
preferred Hortonworks by a lesser margin, and IT interest was led by Cloudera.
[Figure 52: rows by function (Business intelligence competency center, Executive management, Research and development (R&D), Finance); scale 1 to 5]
Small to very large organizations have varying preferences in big data distributions,
though not to an extreme extent (fig. 53). As we might expect, cloud-based Amazon and
AWS distributions appeal most strongly to small organizations for simple and
inexpensive startup projects that have also demonstrated abilities to scale. Mid-sized
(101-1,000) organizations also most prefer Amazon, though we do see a trend among
larger organizations to bring big data distribution management in house. Cloudera and
Hortonworks are the top picks among large and very large organizations.
[Figure 53: rows by organization size (1 - 100, 101 - 1000, 1001 - 5000); scale 1 to 5]
Industry and Vendor Analysis
[Figure: categories Critically important, Very important, Somewhat important, Not important; series 2015 and 2016; axis 0% to 60%]
Among big data infrastructure options in the Hadoop ecosystem, Map/Reduce still has
the highest level of vendor support, which is not surprising given its longevity and
relative maturity (fig. 55). Support for Spark is closing in quickly, with the highest
planned industry support over the next 12 months, after which Spark support will
be ubiquitous. After Spark, industry support drops quickly below 50 percent. "No plans"
responses for future support range from 30 percent to more than 60 percent.
[Figure 55: series Today, 12 months, 18 months, 24 months, No plans; axis 0% to 90%]
Year over year, industry plans for supporting Map/Reduce, Spark, Yarn, and Tez have
all gathered momentum (fig. 56). Despite some growth in user sentiment (fig. 25, p. 38),
industry support for Oozie declined. We continue to expect that proprietary vendor
support of open source big data projects will be opportunistic and customer driven.
[Figure 56: series 2015 and 2016]
Existing industry support for access to big data sources is greatest for Hive/HiveQL (87
percent), followed by HDFS (85 percent) (fig. 57). These top choices are in line with top
user preferences for data access, but Spark support is considerably lower than user
expectations (fig. 30, p. 43). Industry support for Redshift is next highest, somewhat
ahead of user priorities. Google BigQuery currently has much lower industry support but
is the third most cited choice of users.
[Figure 57: series Today, 12 months, 18 months, 24 months, No plans; axis 0% to 90%]
Year-over-year industry support for data access increased for all big data sources
polled with the exception of Redshift (fig. 58). Though Redshift was a lower priority
among users than industry vendors, it gained additional user interest in 2016 (fig. 31, p.
44). The biggest gainer of industry support in 2016, Redshift, gained even more interest
among user respondents year over year.
Figure 58 - Industry support for access to big data sources 2015 to 2016
Industry support for big data search did gain momentum in 2016, though support
remains distinctly lukewarm (fig. 59). While 30 percent of vendors indicate support for
Apache Solr, under 20 percent currently support Elasticsearch or Cloudera Search and
more than 40 percent have no plans for future support. The tepid investment in big data
search is in line with current user sentiments, which show little urgency for search (fig.
36, p. 49).
[Figure 59: Apache Solr, Elasticsearch, Cloudera Search; series Today, 12 months, 18 months, 24 months, No plans; axis 0% to 90%]
Year-over-year industry support for big data search varied noticeably by product (fig.
60). While support for category leader Apache Solr grew from 25 percent to 31 percent,
Cloudera Search support fell from 26 percent to 14 percent. Support for Elasticsearch
was flat year over year. We cannot be certain whether swings in industry support are
related to existing penetration or other market factors. User interest in all three
big data search products grew year over year, but not with urgency (fig. 37, p. 50).
[Figure 60: Cloudera Search, Apache Solr, Elasticsearch; series 2015 and 2016; axis 0% to 30%]
Industry support for big data analytics / machine learning is strongest for Spark MLlib
followed by Mahout, though we concede these investments are not urgent and reflect
the esoteric uses of machine learning in the current market (fig. 61). While industry
support for MLlib is expected to reach a total of 56 percent in the next 12 months, future
support for all other machine learning methods is tepid and may never reach 50
percent. Spark MLlib, Rhipe, and Mahout were top user machine-learning choices but
also showed low levels of enthusiasm (fig. 42, p. 55).
[Figure 61: Spark MLlib, Mahout, Rhipe (R), Oryx, Myrrix; series Today, 12 months, 18 months, 24 months, No plans; axis 0% to 90%]
Year-over-year industry support for big data analytics / machine learning was higher for
Spark MLlib, slightly lower for Mahout, and significantly lower for other products,
particularly Rhipe (fig. 62). Again, support investments remain low and, as with current
vendor support shown in fig. 61 above, where investment or interest is developing, it
tends to go to Spark MLlib.
Figure 62 - Industry support for big data analytics / machine learning 2015 to 2016
Compared to support for big data search, we see significant existing industry support
and future plans for big data (Hadoop) distributions (fig. 63). Current support is
strongest for Hortonworks, followed by Cloudera and MapR. Current support for
Amazon is under 60 percent, but industry respondents expect to see about 90 percent
support for all products within 24 months. These investments are consistent with user
sentiment, which is stronger for big data distributions (fig. 48, p. 61) than for search
or machine learning.
[Figure 63: Hortonworks, Cloudera, MapR, Amazon; series Today, 12 months, 18 months, 24 months, No plans; axis 0% to 90%]
Figure 64 - Industry support for big data (Hadoop) distributions 2015 to 2016
[Figure: vendor scores; surviving vendor labels Jinfonet, SAP, TIBCO, Tableau, Oracle; columns Infrastructure, Data Access, Search, Distributions, Machine Learning, Total Score]
Glossary
Alluxio (formerly Tachyon) is a memory-centric distributed storage system enabling reliable
data sharing at memory-speed across cluster frameworks.
Source: alluxio.org
Atlas is designed to exchange metadata with other tools and processes within and outside of the
Hadoop stack, thereby enabling platform-agnostic governance controls that effectively address
compliance requirements.
Source: Apache Software Foundation
BigQuery is a RESTful web service that enables interactive analysis of massively large datasets
working in conjunction with Google Storage. It is an Infrastructure as a Service (IaaS) service
that may be used complementarily with MapReduce.
HAWQ is a parallel SQL query engine that combines the key technological advantages of the
industry-leading Pivotal Analytic Database with the scalability and convenience of Hadoop.
HAWQ reads data from and writes data to HDFS natively. HAWQ delivers industry-leading
performance and linear scalability. It provides users the tools to confidently and successfully
interact with petabyte range data sets. HAWQ provides users with a complete, standards-
compliant SQL interface.
Source: Pivotal
HBase is an open source, non-relational, distributed database modeled after Google's BigTable
and is written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop
project and runs on top of HDFS (Hadoop Distributed File System), providing BigTable-like
capabilities for Hadoop.
The Hadoop distributed file system (HDFS) is a distributed, scalable, and portable file system
written in Java for the Hadoop framework.
The Apache Hive data warehouse software facilitates querying and managing large datasets
residing in distributed storage. Hive provides a mechanism to project structure onto this data and
query the data using a SQL-like language called HiveQL. At the same time this language also
allows traditional map/reduce programmers to plug in their custom mappers and reducers when it
is inconvenient or inefficient to express this logic in HiveQL.
The Apache Knox Gateway is a REST API Gateway for interacting with Apache Hadoop
clusters. The Knox Gateway provides a single access point for all REST interactions with
Apache Hadoop clusters.
Source: Apache Software Foundation
Impala is an open source, native analytic database for Apache Hadoop. Impala is shipped by
Cloudera, MapR, Oracle, and Amazon.
Source: Cloudera
Apache Mesos is an open source cluster manager that was developed at the University of
California, Berkeley. It "provides efficient resource isolation and sharing across distributed
applications, or frameworks." The software enables resource sharing in a fine-grained manner,
improving cluster utilization.
Myrrix offers a complete, real-time, scalable clustering and recommender system. The
solution is built on top of the Apache Mahout machine-learning project.
Source: Cloudera
Oryx is built on Apache Spark and Apache Kafka, with specialization for real-time large scale
machine learning. It is a framework for building applications but also includes packaged, end-to-
end applications for collaborative filtering, classification, regression, and clustering.
Source: Cloudera
RHIPE integrates the R statistical environment with the Hadoop framework. RHIPE allows R
users to compute on terabyte-sized data sets on a cluster using the MapReduce framework, thus
offering the best of both worlds to users seeking to leverage the strength of R and Hadoop.
People with very large data sets stored in the Hadoop Distributed File System can now easily
process the data on hundreds or even thousands of nodes in parallel, using only the R language.
Source: Revolution Analytics
Cloudera Search is one of Cloudera's near-real-time access products. Cloudera Search enables
non-technical users to search and explore data stored in or ingested into Hadoop and HBase.
Users do not need SQL or programming skills to use Cloudera Search because it provides a
simple, full-text interface for searching.
Source: Cloudera
Solr is an open source enterprise search platform, written in Java, from the Apache Lucene
project. Its major features include full-text search, hit highlighting, faceted search, real-time
indexing, dynamic clustering, database integration, NoSQL features and rich document (e.g.,
Word, PDF) handling. Providing distributed search and index replication, Solr is designed for
scalability and fault tolerance. Solr is the most popular enterprise search engine.
Apache Spark is an open source cluster computing framework originally developed in the
AMPLab at the University of California, Berkeley, and later donated to the Apache Software
Foundation, where it remains today. In contrast to Hadoop's two-stage disk-based MapReduce
paradigm, Spark's multi-stage in-memory primitives provide performance up to 100 times faster
for certain applications. By allowing user programs to load data into a cluster's memory and
query it repeatedly, Spark is well suited to machine-learning algorithms.
Spark SQL is a component on top of Spark Core that introduces a new data abstraction called
DataFrames, which provides support for structured and semi-structured data. Spark SQL
provides a domain-specific language to manipulate DataFrames in Scala, Java, or Python. It also
provides SQL language support with command-line interfaces and an ODBC/JDBC server.
Apache Tez is an extensible framework for building high-performance batch and interactive
data-processing applications, coordinated by YARN in Apache Hadoop. Tez improves the
MapReduce paradigm by dramatically improving its speed while maintaining MapReduce's
ability to scale to petabytes of data. Important Hadoop ecosystem projects like Apache Hive and
Apache Pig use Apache Tez, as do a growing number of third-party data-access applications
developed for the broader Hadoop ecosystem.
Source: Apache Software Foundation
YARN is one of the key features in the second-generation Hadoop 2 version of the Apache
Software Foundation's open source distributed processing framework. Originally described by
Apache as a redesigned resource manager, YARN is now characterized as a large-scale,
distributed operating system for big data applications.
* All sources Wikipedia unless otherwise noted
Name*: _________________________________________________
Address 1: _________________________________________________
Address 2: _________________________________________________
City: _________________________________________________
State: _________________________________________________
Zip: _________________________________________________
Country: _________________________________________________
Major Geography
( ) Asia/Pacific
( ) Latin America
( ) North America
_________________________________________________
( ) Executive management
( ) Finance
( ) Manufacturing
( ) Marketing
( ) Sales
( ) Advertising
( ) Aerospace
( ) Agriculture
( ) Automotive
( ) Aviation
( ) Biotechnology
( ) Broadcasting
( ) Business services
( ) Chemical
( ) Construction
( ) Consulting
( ) Consumer products
( ) Defense
( ) Education
( ) Energy
( ) Executive search
( ) Federal government
( ) Financial services
( ) Healthcare
( ) Hospitality
( ) Gaming
( ) Insurance
( ) Legal
( ) Manufacturing
( ) Mining
( ) Pharmaceuticals
( ) Publishing
( ) Real estate
( ) Sports
( ) Technology
( ) Telecommunications
( ) Transportation
( ) Utilities
( ) 1 - 100
( ) 101 - 1000
( ) 1001 - 5000
Do you use or intend to use big data technology/architecture within your organization?*
What product(s) does your organization use with big data for BI/analytics?
____________________________________________
How satisfied are you with your vendor and product for big data analytics?
( ) Extremely satisfied
( ) Somewhat satisfied
( ) Somewhat unsatisfied
( ) Unsatisfied
What are your plans for Big Data (Hadoop) Analytics in the Future?
What use cases are most important for Big Data (Hadoop) in your organization?
Data warehouse optimization () () () () ()
Customer/social analysis () () () () ()
Internet of things () () () () ()
Fraud detection () () () () ()
Clickstream analytics () () () () ()
Please indicate the importance of the following Big Data infrastructure components
Alluxio (formerly Tachyon) () () () () ()
Mesos () () () () ()
Spark () () () () ()
Map/Reduce () () () () ()
Oozie () () () () ()
Yarn () () () () ()
Tez () () () () ()
Atlas () () () () ()
Knox Gateway () () () () ()
Please indicate the importance of the following Big Data - data access capabilities
Google BigQuery () () () () ()
HBase () () () () ()
HDFS () () () () ()
Hive/HiveQL () () () () ()
Impala () () () () ()
MongoDB () () () () ()
Pivotal HAWQ () () () () ()
Redshift () () () () ()
Spark SQL () () () () ()
Please indicate the importance of the following Big Data search capabilities
Cloudera Search () () () () ()
Apache Solr () () () () ()
Elasticsearch () () () () ()
Please indicate the importance of the following Big Data analytical/machine learning components
Mahout () () () () ()
Rhipe (R) () () () () ()
Oryx () () () () ()
Myrrix () () () () ()
Spark MLlib () () () () ()
Please indicate the importance of the following Big Data (Hadoop) distributions
Cloudera () () () () ()
Hortonworks () () () () ()
MapR () () () () ()
Amazon () () () () ()