Rexer Analytics

2013 Data Miner Survey


Summary Report

For more information contact


Karl Rexer, PhD
krexer@RexerAnalytics.com
www.RexerAnalytics.com

Outline
Overview & Key Findings
Focus on CRM
Big Data
The Ascendance of R
Challenges in the Use of Analytics
Engagement & Job Satisfaction
Analytic Software
Other Findings
Appendix: Rexer Analytics
2013 Rexer Analytics

Overview & Key Findings


Vendors are
included in this
analysis.

2013 Data Miner Survey: Overview

6th survey since 2007
68 questions
10,000+ invitations emailed, plus promoted by newsgroups, vendors, and bloggers
Respondents: 1,259 data miners from 75 countries
Data collected in first half of 2013

*Data from software vendors is excluded from analyses in this presentation unless otherwise noted.

[Chart: respondents by employment setting (Corporate, Consultants, Academics, Vendors*, NGO / Govt); NGO / Govt accounts for 6%]

Respondents by region:
North America: 41% (USA 37%, Canada 3%)
Europe: 41% (Germany 8%, UK 5%, France 4%, Poland 3%)
Asia Pacific: 11% (India 5%, Australia 3%)
Central & South America: 4% (Brazil 2%)
Middle East & Africa: 3%

Key Findings
FOCUS ON CRM: In the past few years, there has been an increase among data miners in the already
substantial area of customer-focused analytics. Respondents are looking for a better understanding of
customers and seeking to improve the customer experience. This can be seen in their goals, analyses,
big data endeavors, and in the focus of their text mining.

BIG DATA: Many in the field are talking about the phenomenon of Big Data. There are clearly some
areas in which the volume and sources of data have grown. However, it is unclear how much Big Data
has impacted the typical data miner. While data miners believe that the size of their datasets has
increased over the past year, data from previous surveys indicate that the size of datasets has been
fairly consistent over time.

THE ASCENDANCE OF R: The proportion of data miners using R is rapidly growing, and since 2010, R
has been the most-used data mining tool. While R is frequently used along with other tools, an
increasing number of data miners also select R as their primary tool.

CHALLENGES IN THE USE OF ANALYTICS: Data miners continue to report challenges at each level
of the analytic process. Companies often are not using analytics to their fullest and have continuing
issues in the areas of deployment and performance measurement.

ENGAGEMENT & JOB SATISFACTION: The data miners in our survey are highly engaged with the
analytic community: consuming and producing content, entering competitions, and searching for
education and growth within their jobs. All of these activities lead to high job satisfaction, which has been
increasing over time.

ANALYTIC SOFTWARE: Data miners are a diverse group who are looking for different things from their
data mining tools. Ease-of-use and cost are two distinguishing dimensions. Software packages vary in
their strengths and features. STATISTICA, KNIME, SAS JMP and IBM SPSS Modeler all receive high
satisfaction ratings.


Focus on CRM


Analytic Goals Increasingly Focus on CRM


In the past few years, there has been
an increase among data miners in
the already substantial area of
customer-focused analytics.
Respondents are looking for a better
understanding of customers and
seeking to improve the customer
experience. This can be seen in their
goals, analyses, big data endeavors,
and in the focus of their text mining.
Seven of the top ten analysis goals
identified this year were directly
related to customer service or
marketing. Four of these goals
increased in popularity by eight
points or more since our last survey
in 2011.

Analysis goal: 2011, 2013
Improving understanding of customers: 33%, 45%
Retaining customers: 30%, 36%
Improving customer experiences: 22%, 36%
Market research / survey analysis: 29%, 36%
Selling products / services to existing customers: 23%, 33%
Acquiring customers: 23%, 32%
Improving direct marketing programs: 22%, 27%
Sales forecasting: 19%, 27%
Risk management / credit scoring: 22%, 26%
Fraud detection or prevention: 21%, 23%
Price optimization: 14%, 22%
Medical advancement / drug discovery / biotech: 12%, 17%
Manufacturing improvement: 10%, 15%
Investment planning / optimization: 11%, 13%
Website or search optimization: 8%, 12%
Supply chain optimization: 7%, 11%
Software optimization: 7%, 9%
Human resource applications: 4%, 8%
Collections: 6%, 7%
Language understanding: 4%, 7%
Information security: 4%, 5%
Natural resource planning or discovery: 3%, 5%
Criminal or terrorist detection: 4%, 4%
Fundraising: 3%, 3%
Reducing email spam: 2%, 2%

Question: What were the goals of your analyses in the past year? (select all that apply) (Substantial changes noted in red)
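
The "eight points or more" statement can be checked with simple arithmetic. The sketch below is illustrative only: it assumes the chart's 2011 and 2013 value columns pair with the goal labels in the order listed, and it covers just the seven customer-focused goals among the top ten. The names `goals` and `point_changes` are introduced here for the example.

```python
# Share of respondents (percent) selecting each goal: (2011, 2013).
# Assumption: values pair with labels in the order the chart lists them.
goals = {
    "Improving understanding of customers": (33, 45),
    "Retaining customers": (30, 36),
    "Improving customer experiences": (22, 36),
    "Market research / survey analysis": (29, 36),
    "Selling products / services to existing customers": (23, 33),
    "Acquiring customers": (23, 32),
    "Improving direct marketing programs": (22, 27),
}


def point_changes(shares):
    """Return {goal: 2013 share minus 2011 share}, in percentage points."""
    return {goal: y2013 - y2011 for goal, (y2011, y2013) in shares.items()}


changes = point_changes(goals)
# Goals whose popularity rose by eight points or more.
big_movers = sorted(goal for goal, delta in changes.items() if delta >= 8)

for goal, delta in sorted(changes.items(), key=lambda kv: -kv[1]):
    print(f"{delta:+3d} pts  {goal}")
print(len(big_movers), "customer-focused goals rose by 8+ points")
```

Under that pairing, four of the seven customer-focused goals clear the eight-point threshold, which agrees with the slide text.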

CRM / Marketing: #1 Place for Data Miners


CRM / Marketing remains the #1 area to
which data mining is applied.
The roots of data mining in customer
focused analytics are strong. In each of
the 6 Data Miner Surveys, more people
report applying their analytics in the field
of CRM / Marketing than any other field.
In 2013, 36% of data miners indicated
that they are commonly involved in
CRM / Marketing data mining, up slightly
from 2011. The number of data miners
working in the overlapping area of Retail
analytics is also increasing.

Fields in which data mining is typically applied (2013, 2011):
CRM / Marketing: 36%, 33%
Academic: 32%, 31%
Financial: 24%, 27%
Retail: 16%, 12%
Telecommunications: 14%, 13%
Insurance: 14%, 14%
Technology: 14%, 14%
Medical: 13%, 12%
Internet-based: 13%, 11%
Manufacturing: 12%, 10%
Government: 12%, 11%
Pharmaceutical: 10%, 10%

Data miners also report working in Nonprofit (5%), Hospitality / Entertainment / Sports (4%), Military / Security (2%), and Other (10%).

Question: In what fields do you TYPICALLY apply data mining? (Select all that apply)
8

Customer Transactions: #1 Source of Large Data


Customer transactional data often affords the opportunity for a wide range of analytics due to the depth and scope of available data. Among respondents who reported increases in data volume, 60% identified customer transaction data as a source of their large data sets.

Sources of Large Data
Customer transaction data: 60%
Text data: 43%
Time series collected online: 32%
Social media data: 24%
Call center data: 24%
Time series via sensors: 24%
Web log or click stream data: 22%
Geospatial data: 18%
Mobile device data: 14%
Email: 12%
Image or video data: 9%
RFID: 3%
Audio data: 3%

Question: What are the sources of data for your large datasets? (select all that apply)

Customer Service and Text Mining


Text mining adoption has steadily increased
since 2010, to its present state where 38% of
data miners incorporate text mining into their
analyses. Particular growth is seen in the use
of text mining for the purposes of customer
service. This is not surprising given the
opportunity that verbatim customer
comments afford organizations in
understanding the experiences
and needs of their customers.

[Chart: Use of text mining in 2010, 2011, and 2013 (Current Text Miners, Plan to Start, No Plans to Start); 38% are current text miners in 2013]
[Chart: Use of text mining by employment setting (Corporate, Consultants, Academic, NGO / Gov't)]

Data miners working in Government settings are most likely to be actively using text mining.

Question: Which is the best description of your use of text mining?

Text material analyzed or planned (2011, 2013):
Blogs and other social media: 33%, 39%
Customer / market surveys: 38%, 36%
Online forums or review sites: 21%, 28%
News articles: 25%, 27%
Scientific or technical literature: 23%, 25%
Web-site feedback: 22%, 25%
Contact center notes or transcripts: 16%, 25%
E-mail or other correspondence: 27%, 22%
Point of service notes or transcripts: 10%, 20%
Employee surveys: 15%, 18%
Insurance claims or underwriting notes: 15%, 14%
Medical records: 11%, 14%

Question: In your text mining, what text material do you analyze or plan to analyze? (Substantial changes noted in red)

10

Big Data


11

Big Data: Hype or Reality?


There is a lot of talk in the business and technical
press about Big Data. Clearly some businesses
and scientific areas are working with very large
data sets. However, it is unclear how much Big
Data has impacted the typical data miner.

In 2013, the general perception among data miners is that data volumes have increased (72% say they have). However, the datasets they report using are of similar size to what was reported in 2007. Additionally, only 13% report that their company has an active big data program.

2013: Perception of Data Size Increase
Decreased: 2%
Same: 26%
Increased Somewhat: 46%
Increased a Lot: 26%

Question: Has the volume/size of data that you use in your analyses increased in the last two years?

Typical Data Set Size
[Chart: distribution of typical data set size in 2007, 2009, and 2013 across six bands: 1,000 or fewer records; 1,001-10,000 records; 10,001-100,000 records; 100,001-1,000,000 records; 1,000,001 - 100,000,000 records; more than 100,000,000 records]

Question: What size data sets did you typically data mine in the past year?

2013: Your Company's Big Data Plan
Active Big Data Program: 13%
Pilot Program: 13%
Plan to Implement: 10%
Exploring: 32%
No Big Data Plan: 32%

Question: What is your company / organization doing with regards to Big Data?
12

Where is Big Data Coming From?


Subjectively, 72% of data miners feel they are experiencing an increase in data.
They report that their large data is coming from a variety of sources; the most frequently
reported sources are customer transactions and text data.
Sixty-nine percent of these data miners report that their Big Data is from new sources, and
close to half (45%) report new types of data, indicating that the composition of Big Data is
not just an increase in data volume from their standard sources.

Sources of Large Datasets
Customer transaction data: 60%
Text data: 43%
Time series collected online: 32%
Social media data: 24%
Call center data: 24%
Time series via sensors: 24%
Web log or click stream data: 22%
Geospatial data: 18%
Mobile device data: 14%
Email: 12%
Image or video data: 9%
RFID: 3%
Audio data: 3%

Question: What are the sources of data for your large datasets? (select all that apply)

Sources of Increased Data Volume
New sources: 69%
Volume alone: 62%
New types: 45%

Question: Have the increases in data volume/size been due to increases in data of the same type that you've previously worked with, or has the increase in data volume / size been due to the addition of data from new sources or the addition of data of new types?
13

Challenges Presented by Big Data and Their Solutions


Regardless of whether Big Data is a new or more longstanding phenomenon, there are inherent challenges in working with large data sets. Respondents shared their ideas about the challenges presented by big data (in an open-ended survey question). The most frequently identified challenges were time and effort, available computing power, and data management.

Question: What new challenges does the increasing size of your datasets pose to your analyses? (open-end response)

Respondents also shared their solutions to big data challenges. Better software or algorithms, upgrading hardware, and sampling were the most frequently cited solutions.

Question: How are you overcoming these challenges?

[Charts: Problems and Solutions, each by number of respondents. Category labels: Time and Effort Required; Available Computing Power; Data Management; Distributing or Parallel Processing; Data Storage; Model Performance; Pre-Processing and Data Checks; No Problems; Better Software, Algorithms, or Code; Sampling, Partitioning, or Reducing Dataset Size; Upgrading or Replacing Hardware]

14

The Ascendance of R


15

The Popularity of R Software is Skyrocketing


The proportion of data miners using R is rapidly growing, and since 2010, R has been the
most-used data mining tool. While R is frequently used along with other tools, an increasing
number of data miners also select R as their primary tool. Among data miners who say they
are likely to switch their primary package in the coming year, R is frequently identified as the
tool they plan to switch to, more than 2.5 times more often than any other tool.

[Chart: R Usage, 2007 to 2013. In 2013, 70% of data miners report using R and 24% select R as their primary tool]

16

Priorities and Characteristics of R Users


Important Factors in Selecting Software

While data miners overall consider quality and accuracy of model performance, dependability of software, and data manipulation capabilities the most important factors when choosing a data mining tool, those using R as their primary tool identify the ability to write one's own code as their most important priority.

R is primary tool:
#1: Ability to write own code
#2: Quality & accuracy of model performance
#3: Data manipulation capabilities

All data miners:
#1: Quality & accuracy of model performance
#2: Dependability of software
#3: Data manipulation capabilities

The quality of the user interface was rated as significantly less important by primary R users than by
other data miners.

Interestingly, there was no difference in the stated importance of cost of tool between those using R as
their primary package and others. However, primary R users are more satisfied than other tool users
with the cost of their software (see page 33). They are also more satisfied with the variety of available
algorithms and the ability to modify algorithms to fine-tune analyses.

While R is heavily used among data miners working in all settings, in corporate settings, a smaller proportion of data miners report that R is their primary tool.

17

Challenges in the Use of Analytics


18

Only Corporate
respondents are
included in this
analysis.

Use of Analytics is Still Evolving

As in previous years, data miners report challenges at each level of the analytic process.
Companies often are not using analytics to their fullest and have continuing issues in the
areas of deployment and performance measurement. Only 16% of companies always use
analytics to address appropriate questions and 7% rarely or never do. Additionally, corporate
analytic sophistication is only considered high or very high by 38% of respondents.

Use Analytics
Never: 1%
Rarely: 6%
Sometimes: 34%
Usually: 43%
Always: 16%

Question: When there are questions that can be addressed by analytics, how often does your company / organization use analytics to address them?

Sophistication
Very Low: 3%
Low: 14%
Moderate: 40%
High: 24%
Very High: 14%

Question: In general, with what degree of sophistication does your company / organization approach analytic problems?

19

Results of Analyses are Often Not Deployed


There is perhaps no greater frustration for data miners than seeing their hard work get
sidelined. While most data miners report that the results of their analyses are being
deployed most or all of the time, a third say that they are only deployed sometimes or rarely;
these data miners also have substantially lower job satisfaction. Those in academic, NGO
and Government settings report even less frequent deployment than those working for a
company or in a consulting capacity.

[Chart: Frequency of Deployment (Never, Rarely, Sometimes, Most of the Time, Always), shown Overall and for Corporate, Consultants, Academic, and NGO / Gov't respondents]

Question: How often are results of your analytics deployed and/or utilized?

20

Time to Analyze and Deploy Varies


Gaps between final models and utilization plague many projects. While six in ten data
miners report that data is available to them for analysis within days of capture, deployment
takes substantially longer, with nearly six in ten respondents estimating weeks to over a
year between analysis and deployment.

[Chart: Time to Data Analysis (Minutes, Hours, Days, Weeks, Months, Year or More)]

Question: What is the typical lag time between when your data is captured / created and when it becomes available to you for inclusion in your analyses?

[Chart: Time to Deployment (Minutes, Hours, Days, Weeks, Months, Year or More, Not Deployed)]

Question: What is the typical lag time between when your analyses are completed and when they are deployed / used?
21

Measuring Performance and Updating Analyses


One of the most puzzling aspects of model deployment is how often organizations fail to follow up to
determine whether models have been effective. Only about half of data miners report that their
organization reliably measures analytic performance (most of the time or always), and 21% report that
their organizations rarely or never measure the performance of their analytic initiatives.
Without reliable performance measurement, it's tough for organizations to know when to update their
models. The majority (64%) of data miners report that their organizations typically update models quarterly
or annually. However, 13% report that their models are updated dynamically: daily or more frequently.

[Chart: Performance Measurement (Never, Rarely, Sometimes, Most of the Time, Always)]

Question: How often does your company / organization measure the performance of analytic projects? (e.g., accuracy of model predictions, ROI, or other success measurements)

[Chart: Model Updating (Daily or More Frequently, Weekly, Monthly, Quarterly, Annually)]

Question: How frequently are models typically updated in your organization?
22

Engagement & Job Satisfaction


23

Engagement with the Analytic Community


The Data Miners in our survey are highly engaged with the analytic community, consuming and producing
content, entering competitions and searching for education and growth within their jobs. All of these
activities lead to high job satisfaction, which has been increasing over time.
Data Miners are most likely to read journals, newsletters, and blogs to stay informed. More than half (56%)
engage in at least one of these three activities at least weekly. They are least likely to conduct webinars or
write blog entries. Additionally, an impressive 73% actively contributed to the knowledge base at least once
in the past year by conducting webinars, writing blog entries, contributing to newsletters or newsgroups,
submitting articles, or presenting at conferences.

[Chart: frequency of participation in the past year (Weekly or More, Monthly, A Few Times, Once) for each activity: Read journals; Read analytic newsletters / newsgroups; Read blogs; Attended webinars; Attended conferences; Presented at conferences; Submitted articles; Contributed newsletters / newsgroups; Wrote blog entries; Conducted webinars]

Academics and Consultants are more likely to contribute to the knowledge base:
Corporate: 68%
Consultants: 74%
Academic: 86%
NGO / Govt: 67%

Question: How often in the past year have you participated in the following activities to stay informed and connect with other data miners?

24

Data Mining Competitions


Many members of the data mining community have either engaged in the knowledge sharing and
creative enterprise of competitions or have plans to. Fifteen percent of respondents have
participated in at least one data mining competition (with the average number of competitions
among those who participate being two).
Kaggle and the KDD Cup are the two competitions with the highest participation. Additionally,
31% of data miners intend to participate in upcoming Kaggle competitions (8% already have, and
plan to again, and 23% have yet to compete, but plan to).

[Chart: competition participation and plans (Have and Plan to Again; Have and Don't Plan to Again; Never Have, but Plan to) by competition, including Kaggle, KDD Cup, Conferences, Heritage Health Prize, CrowdAnalytix, TunedIT, NITRD, and Innocentive]

Question: Which statement best describes your background and plans regarding data mining/analytic competitions? (Have competed and plan to again, Have competed but do not plan to again, Never competed but plan to in the future, Never competed and do not plan to)

25

Vendors are
included in this
analysis.

Job Satisfaction & Demand for Data Miners are High


Data mining has proven to be a fulfilling career for many practitioners. Overall, 36% report being very satisfied with their jobs and very few report dissatisfaction. Satisfaction has also increased since 2011. A notable pocket of greater satisfaction is among data miners working for companies that make data mining software: 53% report being very satisfied.

Data miners are also in demand. The majority report that their companies are doing more projects and increasing the size of their analytic staff.

Overall, individuals reporting the most growth also report higher satisfaction. Data miners working in NGO/Government settings report less growth, and fewer data miners working in these settings report being very satisfied.

[Chart: Job Satisfaction (Very unsatisfied, Unsatisfied, Neutral, Satisfied, Very satisfied) for Corporate, Consultants, Academics, NGO / Govt, and Vendors]

Question: What is your current level of job satisfaction?

[Chart: Number of Projects (Decrease Substantially, Decrease Somewhat, No Change, Increase Somewhat, Increase Substantially) for Corporate, Consultants, and NGO / Gov't]

Question: How will the number of data mining projects your organization conducts this year compare to what has been typical in the past few years?

[Chart: Size of Analytic Staff (Decreased Significantly, Decreased Slightly, Same, Increased Slightly, Increased Significantly) for Corporate, Consultants, and NGO / Gov't]

Question: How has the size of your organization's analytic staff changed over the past year?

26

Vendors are
included in this
analysis.

Ways to Increase Job Satisfaction

Despite the high satisfaction rates, data miners are able to identify several ways their job
satisfaction can be increased (other than being paid more). The number one way:
greater appreciation by management or clients and greater autonomy while working on
analytic projects. Interesting projects, educational opportunities, and expansion of
analytics are also cited by a number of respondents as ways to enhance job satisfaction.

Number of Respondents
Greater appreciation or autonomy: 165
Interesting or challenging projects: 63
More training or educational opportunities: 57
Wider/Greater use of analytics: 52
Better tools or resources: 47
More analysts: 45

Question: Other than being paid more, what one thing would increase your satisfaction with your job?

27

Analytic Software


28

Tool Selection

Primary Analytic Tool

Data miners are a diverse group who are looking for


different things from their data mining tools. They report
using multiple tools to meet their analytic needs, and
even the most popular tool is identified as their primary
tool by just 24% of data miners. Over the years, R and
Rapid Miner have shown substantial increases.
Cluster analysis* reveals that, in their tool-selection
preferences, data miners fall into 5 groups. The primary
dimensions that distinguish them are price sensitivity and
code-writing / interface / ease-of-use preferences.
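
[Diagram: the five groups by share of data miners (A 15%, B 18%, C 21%, D 11%, E 35%), arranged on two axes: cost is important vs. cost is not important, and ability to write one's own code is important vs. ease-of-use & interface quality are important. One region is labeled "Everything is important."]
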

*Cluster analysis was conducted on data miners' ratings of the importance of 22 tool selection factors.
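
For readers unfamiliar with this kind of segmentation, the sketch below shows one way respondents could be grouped from their importance ratings. It is an illustration under stated assumptions, not the survey's method: the report does not name the clustering algorithm, so plain k-means is swapped in here; the data is randomly generated rather than survey data; and only three hypothetical factors stand in for the 22 that were rated. The names `kmeans`, `fake_respondents`, and `FACTORS` are invented for the example.

```python
import random

# Hypothetical stand-ins for three of the 22 rated factors.
FACTORS = ["cost", "ease_of_use", "write_own_code"]


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means; returns (centroids, labels) for equal-length vectors."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each respondent to the nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def fake_respondents(n_per_group=40, seed=1):
    """Generate invented 1-5 importance ratings around three made-up profiles."""
    rng = random.Random(seed)
    profiles = [(5, 4, 1), (4, 2, 5), (2, 5, 1)]
    data = []
    for profile in profiles:
        for _ in range(n_per_group):
            data.append([min(5, max(1, base + rng.choice([-1, 0, 0, 1])))
                         for base in profile])
    return data


ratings = fake_respondents()
centroids, labels = kmeans(ratings, k=3)
for c, centre in enumerate(centroids):
    share = labels.count(c) / len(labels)
    summary = ", ".join(f"{name}={value:.1f}" for name, value in zip(FACTORS, centre))
    print(f"group {c}: {share:.0%} of respondents, mean ratings {summary}")
```

Each group's mean ratings play the role of the "importance of cost" and "importance of ability to write one's own code" descriptions used to characterize the survey's groups; the group shares correspond to the percentages attached to each group.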

29

Tool Selection Groups


More information about the 5 groups of data miners identified on the previous page:

Each row lists one entry per group (five columns, one per group).

Importance of cost: Very high | High | Moderate | Low / Moderate | Very high
Importance of ease-of-use: High | Low / Moderate | Moderate | High | Very high
Importance of user interface quality: High | Low | High | Very high | Very high
Importance of ability to write one's own code: Low | Very high | High | Low | High

Primary tools:
Column 1: Rapid Miner (26%), IBM Modeler (12%), KNIME (11%)
Column 2: R (56%), SAS (10%)
Column 3: R (26%), SAS (19%)
Column 4: STATISTICA (31%), IBM Modeler (20%), Rapid Miner (12%)
Column 5: R (19%), STATISTICA (16%), KNIME (10%), Rapid Miner (10%)

Tool use:
Column 1: R (62%), Rapid Miner (50%), IBM Statistics (40%), IBM Modeler (36%), Weka (33%)
Column 2: R (90%), Weka (37%), SAS (33%), Matlab (31%)
Column 3: R (73%), SAS (43%), IBM Statistics (35%), Matlab (32%), SQL Server (32%), SAS-EM (32%)
Column 4: R (51%), IBM Statistics (38%), STATISTICA (37%), IBM Modeler (32%)
Column 5: R (73%), IBM Statistics (35%), Rapid Miner (34%), Weka (32%), SQL Server (30%), SAS (30%)

Working with Big Data: --- | --- | --- | Less Likely | More Likely
Experience (years): Many new data miners | Few new data miners | --- | Many new data miners | Many experienced data miners

30

Tool Use Varies by Employment Setting


R, IBM SPSS Statistics, Rapid Miner, and SAS are the software tools used by the most data miners. The
average data miner reports using 5 tools, but conducts 76% of their work in their primary tool. R, STATISTICA,
Rapid Miner, and SAS are the primary data mining tools chosen most often. 64% of data miners also report
writing their own code; the most common language is SQL (43%), followed by Java (26%) and Python (24%).
The graphs below summarize the patterns of primary tool selection and overall tool usage, which vary by the
setting in which data miners work; e.g., academics are heavier users of Weka and Matlab.

[Chart: tool usage and primary / secondary tool selection, in five panels (Overall, Corporate, Consultants, Academics, NGO / Govt), for the tools listed below]

R
IBM SPSS Statistics
Rapid Miner
SAS
Weka
Matlab
Microsoft SQL Server
IBM SPSS Modeler
SAS Enterprise Miner
KNIME
STATISTICA
Mathematica
Minitab
SAS JMP
IBM Cognos
Oracle Advanced Analytics
C45 / C50 / See5
Orange
SAP
Salford Systems
TIBCO S+ / Spotfire Miner
KXEN

What data mining / analytic tools did you use in the past year? (rate each as never, occasionally, or frequently)

What one data mining / analytic software package did you use most frequently in the past year?

If you regularly used multiple data mining packages in the past year, please identify the package that you used second most.

31

Tool Satisfaction
Most data miners are happy with their analytic software. STATISTICA and KNIME have particularly high
satisfaction ratings (they also had the highest ratings in the 2011 survey). SAS JMP, IBM SPSS Modeler,
Rapid Miner and R also have high ratings. While people are more satisfied with their primary tools, the
patterns of primary and secondary tool satisfaction are generally similar. However, people choosing IBM
SPSS Statistics as their secondary tool give it high ratings, while people using SAS Enterprise Miner and
IBM SPSS Modeler as their secondary tools give these tools lower ratings.
Most people also report that they will continue using their primary tools. The highest continuation rate is
among people choosing KNIME as their primary tool: 85% report that they are extremely likely to
continue using it as their primary tool for the next 3 years. R and STATISTICA users also report especially
high continuation plans. Across all tools, when people say they are likely to switch primary tools, many are
choosing R (see page 16).

[Chart: Satisfaction with Primary & Secondary Tools (Extremely Dissatisfied, Dissatisfied, Neutral, Satisfied, Extremely Satisfied) for STATISTICA, KNIME, SAS JMP, IBM SPSS Modeler, Rapid Miner, R, IBM SPSS Statistics, Oracle Advanced Analytics, KXEN, Weka, SAS, Matlab, SAS Enterprise Miner, Minitab, and Microsoft SQL Server]

Satisfaction question: Please rate your overall satisfaction with [insert name of previously identified software package].

32

Tool Satisfaction: Details


Overall, data miners express the most satisfaction with the quality and accuracy of their tools' model
performance and with the variety of algorithms their tools make available to them. Data miners are least
satisfied with their tools' help functions, their graphical visualization of models, and their ability to handle
large data sets. STATISTICA received strong ratings across many dimensions.
Overall

IBM SPSS IBM SPSS


Statistics
Modeler

KNIME

Rapid
Miner

SAS

SAS
Enterprise STATISTICA
Miner
4.48
4.62
4.23
4.59
3.74
4.52
4.16
4.51
4.10
4.44
4.10
4.59
4.27
4.58
4.17
4.50
3.77
4.41

Weka

Quality and accuracy of model performance


Variety of available algorithms
Data manipulation capabilities
Dependability/Stability of software
Ability to automate repetitive tasks
Quality of output / Ease of interpretation
Ease of use
Good metrics of model quality
Data quality assessment & data preparation capabilities
Ability to easily incorporate data at different levels of
granularity (e.g. transaction data and customer data)

4.28
4.27
4.19
4.19
4.18
4.11
4.11
4.08
4.05

3.96
3.66
3.91
4.02
3.79
3.87
4.10
3.72
3.72

4.15
4.05
4.36
3.96
3.76
3.89
4.67
3.89
4.27

4.30
4.36
4.54
4.27
4.42
4.17
4.58
3.91
4.37

4.39
4.74
4.24
4.24
4.35
4.10
3.59
4.19
4.02

4.25
4.55
4.07
4.07
4.18
4.18
4.39
4.17
4.00

4.20
3.91
4.50
4.28
4.26
3.84
3.77
4.01
4.26

4.03

3.87

4.25

4.21

3.94

4.04

4.14

4.10

4.30

3.59

Cost of software

4.03

3.02

2.89

4.85

4.93

4.86

2.33

2.70

3.91

4.89

Ability to modify algorithm options to fine-tune analyses

4.01

3.26

3.63

3.80

4.35

4.10

3.91

3.94

4.28

4.18

Good variable discovery, profiling and selection


Quality of user interface
Ease of model deployment (scoring to other data sets)
Speed
Enables mining within one's database
Ability to handle very large data sets
Strong graphical visualization of models

4.00
3.97
3.97
3.95
3.92
3.84
3.83

3.64
4.02
3.42
3.62
3.59
3.65
2.94

4.16
4.47
4.01
4.01
4.18
4.19
3.60

4.07
4.54
4.21
4.00
4.08
3.90
3.90

4.03
3.49
3.87
3.69
3.92
3.27
4.14

4.06
4.37
4.19
3.95
3.83
3.59
4.01

3.78
3.66
3.92
3.93
3.78
4.35
3.09

4.23
4.10
4.00
3.97
3.93
4.30
3.77

4.42
4.53
4.43
4.54
4.26
4.56
4.58

3.77
3.54
3.75
3.70
3.69
3.18
3.38

Useful help menu, demos and tutorials

3.82

3.87

3.82

4.05

3.86

3.54

3.67

3.90

4.23

3.50

Mean satisfaction rating on 1-5 scale

Higher Satisfaction

4.16
4.46
3.48
4.03
3.76
3.82
4.03
4.06
3.47

Lower Satisfaction

Question: Rate how satisfied you are with the performance of your primary data mining package (identified earlier) on each of these factors.

33

Other Findings


34

Many Names for Analytic Professionals

Vendors are
included in this
analysis.

A variety of labels are used to describe analytic professionals. The most common
descriptors chosen by survey respondents are Data Scientist, Researcher, Data Analyst,
and Business Analyst.

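[Pie chart: primary self-description of respondents. Data Scientist is the largest slice (17%); the remaining slices, each between 3% and 15%, are Researcher, Data Analyst, Business Analyst, Statistician, Data Miner, Predictive Modeler, Engineer, Computer Scientist, Software Developer, and Other.]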

Question: Which of the following do you primarily consider yourself to be?



Algorithms
Regression, decision trees, and cluster analysis continue to form a triad of core algorithms for
most data miners. This has been consistent since the first Data Miner Survey in 2007.

The average respondent reports typically using 12 algorithms. People with more years of
experience use more algorithms, and consultants use more algorithms (13) than people
working in other settings (11).

The number of algorithms used varies by the labels people use to describe themselves, with Data Miners (14) and Data Scientists (14) using the most, and Software Developers (9) and Programmers (8) the fewest.

Frequency of use by algorithm (Most of the time / Often / Sometimes / Rarely); a dash marks a bar segment with no printed value:
Regression: 31% / 38% / 15% / 6%
Decision trees: 22% / 34% / 18% / 9%
Cluster analysis: 15% / 35% / 26% / 11%
Time series: 13% / 22% / 22% / 18%
Text mining: 9% / 16% / 20% / 19%
Ensemble models: 9% / 14% / 18% / 17%
Factor analysis: 8% / 17% / 22% / 19%
Neural nets: 8% / 15% / 23% / 19%
Random forests: 8% / 13% / 16% / 16%
Association rules: 6% / 16% / 24% / 17%
Bayesian: 6% / 15% / 23% / 19%
Support vector machines (SVM): 6% / 14% / 18% / 17%
Anomaly detection: 6% / 14% / 20% / 16%
Proprietary algorithms: 6% / 10% / 15% / 15%
Rule induction: 4% / 10% / 18% / 18%
Social network analysis: 4% / 10% / 14% / 18%
Uplift modeling: 4% / 10% / 13% / 16%
Survival analysis: - / 8% / 14% / 20%
Link analysis: - / 8% / 13% / 16%
Genetic algorithms: - / 7% / 14% / 19%
MARS: - / 4% / 9% / 15%

Question: What algorithms / analytic methods do you TYPICALLY use? (Select all that apply)

Computing Environments
There have been notable increases over the past four years in the use of servers
(local or mainframe) and cloud computing for data mining. Meanwhile, local
processing (on a desktop or laptop) has remained fairly constant.

Windows is the most common operating system for analytics.


Operating System
Windows: 89%
Linux: 37%
Unix: 15%
Mac OS: 14%

Question: What are the operating systems in which data mining/analytics occurs at your company/organization? (Check all that apply)

Computing Environment
[Line chart: percentage of respondents using local processing, server processing, and cloud computing, 2010 to 2013.]

Question: What are the computing environments/platforms on which data mining/analytics occurs at your company/organization? (Check all that apply)

Appendix: Rexer Analytics


Rexer Analytics Overview


Company Summary
Small privately held consulting firm
Founded in 2002
Focus: Analytic and CRM Consulting (applied statistics & data mining)

Senior Staff
Karl Rexer, PhD
Paul Gearan
Heather Allen, PhD

Key Partners
IBM (SPSS)
Oracle
Bernett Research
Vlamis Software

Example Projects
Customer attrition analysis & prediction
Student retention analysis & prediction
Analytic CRM strategy
Fraud detection
Models to predict loan default
Customer segmentation
Sales forecasting
Market basket analysis
Product allocation optimization
CRM metric design & measurement
Predictive models for customer acquisition and cross-sell campaign targeting
Survey research (to understand customer needs & customer decision making)

Rexer Analytics Clients

[Timeline chart: clients by year, 2002 through 2013 YTD.]

Clients shown include: AboutFace, Accudata, ADT Security, AS Watson, ath Power, BBIQ, Bridgewater State College, Cogent Consulting, Coverall, CR Bard, CVS Pharmacy, Davol, Deutsche Bank, DLA Piper, DocSite, DomainsBot, Faze1 Solar, Fiserv, Fleet Bank, Forbes Consulting, Fourth Millennium Technologies, GFR Media, Google, Guidewire, HBO, Hewlett Packard, Hult International Business School, IDG World Expo, Intellidyn, ITT Flow Control, Jet Advisors, Leader Networks, Lincoln Peak, Loan Depot, McGraw-Hill Construction, Meredith Corporation, MIT Epidemiology Group, Mundial, New Balance, Nexus Direct, NSCA, Objective Management, One Day University, Oracle, Overture, Palladium, Parc Management, Performance Programs, PricewaterhouseCoopers, Quest Analytics, Raytheon, Redbox, Rezolve, Sage Telecom, Salford Systems, Shasta Partners, SNCR, Stethographics, Tyco Integrated Security, Verizon, West Corporation, ZaPOP, and between 2 and 13 retail banks in the years shown.

Additional clients were served. Some wish to remain anonymous, and others were served indirectly through partners.

Authors of the six Data Miner Surveys (2007-2013):
Heather Allen, PhD; Paul Gearan; & Karl Rexer, PhD

For more information contact:
Karl Rexer, PhD
krexer@RexerAnalytics.com
617-233-8185
Rexer Analytics
30 Vine Street
Winchester, MA 01890
USA
www.RexerAnalytics.com
