
Predicting Churn

A SAS White Paper


Table of Contents
Introduction
The Price of Churn
    The Raw Costs
    The Challenge for Management
Information Delivery in the Telecommunications Industry
    The Challenge for IT Management
    “Think Strategic, Start Focused”
Churn Prediction
    Business Goals
    Data Definition
    The Analysis Process
    Deployment and Review
    Warehousing
    Outlook
Case Study
    Process Flow
    Input Data
    Modelling Techniques
    Assessment
Summary
References and Further Reading



Introduction
Customer acquisition and retention is a concern for all industries, but it is particularly acute
in the strongly competitive and now broadly liberalized telecommunications industry. For the
marketing departments of newcomer companies, the major short to medium-term issue is
likely to be attracting new customers. However, for the incumbent operators and the more
mature market entrants, retaining profitable customers is the number one business pain.

The information delivery systems of many telecommunication companies are reaching a
maturity level that allows them to make the step from simple query and reporting of past
cancellations towards the semi-automatic creation of predictive models. Companies can
assure a more constant flow of revenue and higher profit margins through targeted activity
such as the fine-tuning of services and promotional messages.

This white paper introduces some of the terms and concepts involved in the process of
accurately predicting which customers are likely to deliver high value and at the same time are
likely to exhibit a high propensity to change suppliers.

Readers who require a more detailed treatment of churn prediction are advised to
contact SAS Institute and inquire about SAS Institute’s Best Practice papers.

The Price of Churn


The Raw Costs
Churn is the common denominator in the world’s liberalized telecommunications industry. It
now costs European and US telcos close to US $4 billion each year, and the global cost of
customer defection may well approach a staggering US $10 billion.1 Annual churn rates of 25
to 30 percent are the norm, and carriers at the upper end of this spectrum will get no return
on investment on new subscribers. Why? Because it typically takes three years to pay back
the cost (approximately US $400 in the United States and US $700 in Europe) of replacing
each lost customer with a new one (customer acquisition).

1 Telecommunications Online, February 1999
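
A rough calculation shows why the upper end of the churn range leaves so little room for a return. The sketch below is illustrative only: it assumes, as a simplification not stated in the source, that the same fraction of subscribers churns every year, so that the expected customer lifetime is one divided by the annual churn rate. The function name is hypothetical.

```python
def expected_lifetime_years(annual_churn_rate: float) -> float:
    """Mean customer lifetime in years, assuming a constant annual churn rate."""
    if not 0 < annual_churn_rate <= 1:
        raise ValueError("annual_churn_rate must be in (0, 1]")
    return 1.0 / annual_churn_rate


PAYBACK_YEARS = 3.0  # payback period quoted in the text

for rate in (0.25, 0.30):
    lifetime = expected_lifetime_years(rate)
    margin = lifetime - PAYBACK_YEARS
    print(f"churn {rate:.0%}: expected lifetime {lifetime:.1f} years, "
          f"{margin:.1f} years beyond payback")
```

Under this assumption a 30 percent churn rate gives an expected lifetime of roughly 3.3 years, only marginally longer than the three-year payback period.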

In the European and Asian markets in particular, the number of new market entrants is
adding to the churn phenomenon. In Europe, 30 new telcos entered the market in 1998,
seeking the 15 percent market share that analysts say they will need to survive. The growth
in the number of subscribers has eased this situation in the past, but as market growth slows
and average revenue per user declines, we are likely to see an increase in predatory activity.

The Challenge for Management


The problem confronting telcos’ management is that it is very difficult to determine which
subscribers left the company and why. It is therefore even more difficult to predict which
customers are likely to leave the company, and more difficult still to devise cost-effective incentives
that will persuade likely “churners” to stay.

Churn is such a massive problem that it affects other aspects of customer relationship
management, such as customer acquisition. A manager must ask himself, “Am I recruiting the
right people or are they likely to churn before I have made a return on my investment?” “How
is churn affecting the lifetime value of my customer base?” and “Can we get a complete view
of our customer information, so that we can profile likely churners?” The answer to these
questions depends largely on having the right information delivery solutions in place.

Information Delivery in the Telecommunications Industry


The Challenge for IT Management
Telcos are among the biggest users of IT systems, yet their IT departments tend to be focused
on meeting day-to-day operational goals. In many cases the technology is not yet in place to
support the complex requests for information from the sales and marketing departments that
must address the issue of churn. IT departments may also lack the expertise to support complex data
mining and analytical/predictive tasks. The volumes of data that are needed to undertake
such tasks are huge and sometimes difficult or impossible to access and consolidate using
conventional operational system tools.

“Think strategic, start focused”


“Think strategic, start focused”2 is a good motto for anybody who wants to build an information
delivery solution. You must always keep the big picture in mind, because information delivery
projects are invariably designed to solve business problems relating to the organization’s
overall strategy. An information delivery project therefore requires the support of senior
management. (Clearly, this is not the case with systems that merely process data and migrate it
from one machine to the next.) On the other hand, it is foolish to try to solve everything in
one go. Such an approach will suck in untold resources, delay return on investment, and put
the project managers under increasing strain. Business executives who must answer to
shareholders while fending off the competition understand this message. Every investment
in data warehousing and data mining needs to show a prompt return on investment.

2 SAS (1996), SAS White Paper, SAS’ Rapid Warehousing Methodology, Cary, NC: SAS

Now for the good news: we’ve done it all before, many times over. Experience tells us that
the only feasible way to build an enterprise information delivery solution is with support from
the top down and action from the bottom up, adding one information delivery application
after the other.

Everybody involved in an information delivery project must have in mind the clear vision of
an integrated information architecture all along the way. As Rob Mattison3 points out, the
main goal to envision is the alignment of the telecommunications value chain, the organizational
structure and the architecture of the IT systems, with the value chain taking the leading
role (see Figure 1).

3 Mattison, Rob, (1997), Data Warehousing and Data Mining for Telecommunications, Norwood, MA: Artech Computer Science Library.

Figure 1. The Value Chain in the Telecommunications Industry. At the start is the creation of a product and
the acquisition of the right to distribute it. A network then has to be built and maintained and the customers
must be properly billed and managed. (From: Mattison, Rob, (1997), Data Warehousing and Data Mining
for Telecommunications, Norwood, MA: Artech Computer Science Library.)

The IT systems in most companies already align with the value chain to some degree. What
can usually be found is that two main systems make up the largest part of the IT
infrastructure: the switching system and the billing system, corresponding to the network side and the
business side of the value chain. The records that the switching system keeps on the call
details are passed to the billing system for summarization and the issuing of bills.

Both these systems are essentially operational systems. However, they have often been
stretched beyond their original purpose to serve as platforms for information delivery. On
the network side, traffic monitoring or even capacity planning functionality may have been
added, while the billing system may have turned into something like a customer management
system. If so, then on both sides of the house possibly hundreds of end users
will run reports directly against what used to be, and is still supposed to function as, an
operational system. This won’t work in the long run.

What is really needed is an information delivery layer, a data warehouse. The warehouse
allows the switching and the billing system to work more efficiently, because they don’t have
to cater for so many end-user requests. Perhaps even more importantly, such a buffer layer
insulates the business users from any changes in the organizational or IT structure of the
switching and billing operations. All the information they need will always be
found in the same place, processed and enhanced in a way appropriate for their specific
purposes. Adding this extra degree of freedom is what data warehousing is all about. It is this
freedom that makes it possible and advisable to build the data warehouse step-by-step from
the bottom up. It would be wasteful to reintroduce unnecessary constraints up-front that limit the
business user's freedom to cater for his needs.

It is rarely possible to anticipate information
needs very long in advance, and this is especially true when data mining comes into play.
Data mining is a process that is meant to generate new knowledge and therefore will in turn
lead to new processes and data requirements. To a certain degree this circle is open-ended.
As a consequence, the relationship between IT and business in the area of
information delivery systems must become one of constant co-operation and adaptation,
and must not be inhibited by a spirit of once-and-for-all system implementation.

Data warehousing and mining can add value at a number of different places along the
telecommunications value chain. Consolidating and analysing customer information in the
marketing department for retention or acquisition purposes, for example profiling profitable
customers, and designing promotional campaigns, has surely been the most prominent application
area so far. Another important value proposition is located in the credit area, where it
is necessary to evaluate a customer’s tendency not to pay the bill. The results of such an
analysis can then be used to limit the customer’s access to certain services or to reject applications
from risky customers in the first place. Closely related to this are fraud detection applications.
Other applications concern the optimization of customer service as well as network
traffic monitoring and capacity planning.

Churn management (also referred to as customer retention strategy) is seen by many telcos
and other companies as today’s most pressing business pain and is therefore often chosen
as the first application area of data mining. Not all companies, however, are yet at the point
where they want to start predicting future cancellations. Some are still denying the
problem and are fully focusing on acquisition. Others are so far only able to generate lists of
churners and to get a feeling for the size of the problem. Yet others are already able to carry
out simple analyses and queries that describe in more detail the cancellations that have
occurred in the past. Over the past year many companies have embarked on data mining
projects that allow them to learn from the past and predict each customer’s likelihood to cancel
in the future.

Churn Prediction
Business Goals
The primary goal of churn analysis is usually to create a list of contracts that are likely to be
cancelled in the near future. The customers holding these contracts are then targeted with
special incentives designed to deter cancellation. At a more sophisticated level, the
telecommunications company will attempt to detect the reasons for an expected cancellation,
because this information may help customize the offer. Detecting causes of churn that lie
within the sphere of influence of the company also enables it to eliminate them in the future.
For example, analysis may reveal that factors as different as inadequate billing procedures
and connection quality are the root causes of churn.

Data Definition
The statistical unit for churn analysis is most often not a customer but a contract. In other
words, propensities of cancellation are calculated on a per contract rather than a per customer
basis. The main reason for this is that many important predictor variables, for example the
length of time since a contract has been signed, or the time left until the end of the obligation
period, are associated with contracts rather than customers. Also, even though a customer
may hold several contracts, usually each of these contracts contributes to revenue.

However, it should be remembered that mailings and other follow-up actions target
customers, not contracts, so there must be some post-analysis processing to summarize the
predictions for customers as individuals.
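
One possible form of this post-analysis step is sketched below. It is a minimal illustration, not the method used in the paper: it assumes that each customer is represented by the highest propensity among his contracts, and the function and field names are hypothetical.

```python
def summarize_by_customer(contract_scores):
    """Collapse per-contract churn propensities to one record per customer.

    contract_scores: iterable of (customer_id, contract_id, propensity).
    Assumption: a customer is summarized by his riskiest contract.
    """
    summary = {}
    for customer_id, contract_id, propensity in contract_scores:
        best = summary.get(customer_id)
        count = 1 if best is None else best["contracts"] + 1
        if best is None or propensity > best["max_propensity"]:
            # First contract seen, or a riskier one than any seen so far.
            summary[customer_id] = {
                "max_propensity": propensity,
                "riskiest_contract": contract_id,
                "contracts": count,
            }
        else:
            best["contracts"] = count
    return summary


scores = [("C1", "K1", 0.2), ("C1", "K2", 0.7), ("C2", "K3", 0.1)]
for customer, record in summarize_by_customer(scores).items():
    print(customer, record)
```

Other summaries, such as a revenue-weighted average across contracts, may suit a particular campaign better; the choice depends on the follow-up action planned.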

It is a good idea to conduct the analysis and build models for market segments that exhibit some broad
commonality. Building a model means finding rules that relate customer attributes (input
variables) to the likelihood of the churn event coded in the target variable. The customer attributes
typically considered in a churn analysis can be broadly categorized into five kinds:

• customer demographics;
• contractual data;
• technical quality data;
• billing and usage data; and
• events-type data.

The most commonly used historic variables are:

• the time a customer spends on air;
• the number of calls; and
• the revenue.

The Analysis Process


The process of analysing the data follows SAS Institute’s SEMMA (Sample, Explore, Modify,
Model, Assess) methodology. An in-depth description of this methodology is available from
SAS Institute.4

4 SAS (1997), SAS White Paper, From Data to Business Advantage: Data Mining, SEMMA Methodology and the SAS System, Cary, NC: SAS


Deployment and Review


Deployment means taking the churn management process out of the limited realm of the
analyst. As a first step it is often necessary to make the information that defines and results
from the analysis flow available to a broader group of business users. These reports can
include a depiction of the process flow itself, the different assessment charts, the decision
tree structure, the coefficients of the logistic regression or the details of the pre-processing.

Before launching the prevention campaign it is generally considered necessary to verify the
accuracy of the model at least once by comparing the predictions with the actual
cancellations of the most recent month.
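
Such a check might look like the following sketch. It assumes, for illustration only, that the campaign would target a fixed top fraction of contracts ranked by propensity; the function name, the 10 percent default and the data layout are hypothetical.

```python
def verify_predictions(propensities, actual_cancelled, top_fraction=0.1):
    """Compare the highest-scored contracts with last month's cancellations.

    propensities: {contract_id: churn propensity from the model}
    actual_cancelled: set of contract ids cancelled in the most recent month
    """
    ranked = sorted(propensities, key=propensities.get, reverse=True)
    cancelled = actual_cancelled & set(ranked)  # ignore unscored contracts
    n_targeted = max(1, round(len(ranked) * top_fraction))
    targeted = set(ranked[:n_targeted])
    hits = len(targeted & cancelled)
    return {
        "targeted": n_targeted,
        "hit_rate": hits / n_targeted,               # churners among targeted
        "captured": hits / len(cancelled) if cancelled else 0.0,
        "base_rate": len(cancelled) / len(ranked),   # churners overall
    }


scores = {f"k{i}": i / 10 for i in range(10)}
print(verify_predictions(scores, {"k9", "k8", "k1"}, top_fraction=0.2))
```

A hit rate well above the base rate suggests that the model still ranks contracts usefully on fresh data; a hit rate close to the base rate is a warning sign.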

Warehousing
There are a number of data management tasks that need to be executed at regular intervals
when a churn management solution has been deployed. Most importantly the usage and
revenue data from the operational billing and switching systems need to be passed to the
warehouse, where they are aggregated and appended to the churn analysis data set.
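
The aggregate-and-append step can be pictured with the sketch below. The record layout (contract_id, minutes, charge) and both function names are hypothetical; a production warehouse would perform this step with its own loading tools.

```python
from collections import defaultdict


def aggregate_month(call_records, month):
    """Summarize call-level records into one row per contract for a month."""
    totals = defaultdict(lambda: {"calls": 0, "minutes": 0.0, "revenue": 0.0})
    for rec in call_records:
        row = totals[rec["contract_id"]]
        row["calls"] += 1
        row["minutes"] += rec["minutes"]
        row["revenue"] += rec["charge"]
    return [{"contract_id": cid, "month": month, **row}
            for cid, row in sorted(totals.items())]


def append_to_analysis_set(analysis_rows, monthly_rows):
    """Append new monthly rows, skipping contract/month pairs already loaded."""
    loaded = {(r["contract_id"], r["month"]) for r in analysis_rows}
    for row in monthly_rows:
        if (row["contract_id"], row["month"]) not in loaded:
            analysis_rows.append(row)
    return analysis_rows


calls = [
    {"contract_id": "K1", "minutes": 3.0, "charge": 0.60},
    {"contract_id": "K1", "minutes": 7.0, "charge": 1.40},
    {"contract_id": "K2", "minutes": 1.0, "charge": 0.20},
]
analysis_set = append_to_analysis_set([], aggregate_month(calls, "1999-01"))
print(analysis_set)
```

The duplicate check makes the load safe to rerun, which matters when the task is scheduled at regular intervals.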

Outlook
It usually is not just a customer’s likelihood to cancel that makes him a good target for a
prevention campaign. The value that a customer will bring to the company in the future (usually
a projection of past revenue) plays a major role in designing a campaign in such a way
that it brings maximum return on investment. A third quantity also needs to be taken into
account: the likelihood that a targeted customer can be deterred from cancelling. All three
quantities (churn likelihood, prevention likelihood and value) need to be combined to decide
on campaign targets.
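
The paper does not say how the three quantities should be combined. One simple possibility, shown here purely as an assumption, is to multiply them into an expected gain per customer and subtract the cost of contacting him. All names and figures in the sketch are hypothetical.

```python
def expected_campaign_gain(churn_likelihood, prevention_likelihood,
                           future_value, contact_cost=0.0):
    """Expected value saved by contacting one customer (assumed product form)."""
    return churn_likelihood * prevention_likelihood * future_value - contact_cost


def rank_targets(customers, contact_cost=0.0):
    """Return customer ids with a positive expected gain, best first."""
    scored = [
        (expected_campaign_gain(c["churn"], c["prevent"], c["value"],
                                contact_cost), c["id"])
        for c in customers
    ]
    return [cid for gain, cid in sorted(scored, reverse=True) if gain > 0]


customers = [
    {"id": "A", "churn": 0.9, "prevent": 0.1, "value": 100},
    {"id": "B", "churn": 0.5, "prevent": 0.5, "value": 400},
    {"id": "C", "churn": 0.2, "prevent": 0.5, "value": 50},
]
print(rank_targets(customers, contact_cost=6))
```

In this example the customer most likely to churn (A) ranks below customer B, whose moderate churn likelihood is combined with a higher value and a better chance of being retained.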

Case Study
Process Flow
A simple process flow will illustrate how a churn analysis can be implemented with the help of
Enterprise Miner™ software. The flow concentrates on the main aspects of the SEMMA methodology.

The flow starts in the upper left corner with the Input Data Source node, where the data are
read in, model roles and measurement levels are assigned, univariate summary statistics are
calculated and univariate distribution charts can be viewed. The data set used here will be
described more closely in the following section.

A subset of customers is selected, namely all customers whose contract started at least six
months ago. The Filter node is used for this purpose.

Some simple transformations of historic variables are then done in the Transform Variables
node. Instead of using all the detail available for the modelling, it was decided to retain only
the values for the last month and to summarize the values of the other months by taking the
average of the monthly totals. Additionally, flags are created that indicate whether the usage was
zero in the last month. A Data Set Attributes node is used afterwards to set the model role
and the measurement levels of the newly created variables.
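
Outside Enterprise Miner, the same transformation of a single historic variable could be sketched as follows. The function and field names are hypothetical, and the input is assumed to be a list of monthly totals ordered from oldest to most recent.

```python
def summarize_history(monthly_totals):
    """Reduce a monthly series to last value, earlier average and a zero flag.

    monthly_totals: monthly totals for one contract, oldest first.
    """
    if not monthly_totals:
        raise ValueError("at least one month of history is required")
    *earlier, last = monthly_totals
    return {
        "last_month": last,
        # Average of all months before the last one; None if there are none.
        "avg_earlier_months": sum(earlier) / len(earlier) if earlier else None,
        "zero_usage_last_month": int(last == 0),
    }


print(summarize_history([120, 80, 100, 0]))
```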

The data is then split in half with the Data Partition node to obtain a training data set and a
validation data set that is later used to choose the best model.
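
For readers who want to reproduce the split outside Enterprise Miner, a minimal sketch follows. It assumes a simple random 50/50 split with a fixed seed for repeatability; the paper does not state which sampling method the Data Partition node was configured to use.

```python
import random


def partition_in_half(rows, seed=12345):
    """Randomly split rows into training and validation halves."""
    shuffled = list(rows)                  # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


training, validation = partition_in_half(range(10))
print(len(training), len(validation))
```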

Figure 2. On the left: The tools palette. On the right: The workspace containing the process flow diagram
(pfd) of a simple churn analysis.

The Replacement node then substitutes missing values with sensible estimates. Three
models are built, two of which are regression models that only use complete observations as
input. The two Regression nodes are therefore connected to the Replacement node.

Two alternative variable selection methods are experimented with, namely backward and
stepwise selection. The third model is a decision tree. The Tree node is not connected to the
Replacement node, since the tree algorithm can handle missing values.
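
The paper does not specify which estimates the Replacement node used. The sketch below assumes the simplest common choice, replacing a missing numeric value with the mean of the observed training values; the function names are hypothetical and missing values are represented by None.

```python
def fit_replacements(training_rows, columns):
    """Compute one replacement value per column from the training data."""
    replacements = {}
    for col in columns:
        observed = [row[col] for row in training_rows if row[col] is not None]
        # Assumption: fall back to 0.0 when a column has no observed values.
        replacements[col] = sum(observed) / len(observed) if observed else 0.0
    return replacements


def replace_missing(rows, replacements):
    """Return copies of rows with missing values filled in."""
    return [
        {**row, **{col: value for col, value in replacements.items()
                   if row.get(col) is None}}
        for row in rows
    ]


training = [
    {"minutes": 10.0, "calls": None},
    {"minutes": None, "calls": 4},
    {"minutes": 30.0, "calls": 6},
]
fill = fit_replacements(training, ["minutes", "calls"])
print(replace_missing(training, fill))
```

Computing the replacement values on the training data only, and then applying them unchanged to the validation data, keeps the validation set independent of the fitting step.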

The quality of the three models is then assessed and compared using the Assessment node.
The Reporter node creates hypertext (HTML) documentation of the analysis and the Score
node creates SAS code that can be used for assigning churn propensities on any system
independently of Enterprise Miner.

Input Data
The input data set should typically be organized to include names, model roles,
measurement levels and labels of the variables. Variables would typically describe the contract and
the respective customer, together with variables containing the usage and revenue in the
relevant time period (totals and more detailed information).

Modelling Techniques
A decision tree finds optimal if-then rules that split the customers, variable by variable, into
ever finer segments in such a way that the terminal segments (“leaves”) contain either a very
high or a very low proportion of churners. Regression models try to find optimal coefficients
for a linear equation that relates the input variables to the likelihood of the churn event.
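
To make the idea of an optimal if-then rule concrete, the sketch below searches one numeric variable for the threshold that best separates churners from non-churners. It uses Gini impurity as the measure of how mixed a segment is; this criterion is an assumption made for illustration, since the paper does not state which criterion the Tree node applies. A full tree repeats this search on each resulting segment.

```python
def gini(churn_flags):
    """Gini impurity of a segment; 0.0 means all churners or all non-churners."""
    if not churn_flags:
        return 0.0
    p = sum(churn_flags) / len(churn_flags)
    return 2 * p * (1 - p)


def best_split(values, churn_flags):
    """Find the threshold for the rule 'value <= threshold' with lowest impurity.

    Returns (threshold, weighted impurity); threshold is None if no split helps.
    """
    pairs = list(zip(values, churn_flags))
    best = (None, gini(churn_flags))
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2  # midpoint between neighbouring values
        left = [flag for value, flag in pairs if value <= threshold]
        right = [flag for value, flag in pairs if value > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


months_since_signing = [1, 2, 3, 10, 11, 12]   # hypothetical input variable
churned = [1, 1, 1, 0, 0, 0]                   # 1 = contract was cancelled
print(best_split(months_since_signing, churned))
```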


Figure 3. Decision Tree Results (tree view). Only the first 3 of 6 levels are shown. Dark shading represents
a high proportion of churners in the corresponding segment.

Assessment
The quality of the models needs to be assessed and compared in order to pick the model
that gives the highest business value.

After modelling, the customers in the validation data set are sorted by the churn propensities
assigned by each model. Equally sized groups (in this case containing two percent of the
customers each) are then built and the proportions of actual churners in each group determined.
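
The grouping described above can be sketched in a few lines. The function name and the synthetic data are hypothetical; the two percent group size follows the text.

```python
def churn_rate_by_group(scored, group_fraction=0.02):
    """Proportion of actual churners per group, highest propensities first.

    scored: list of (propensity, churned) pairs from the validation data,
            where churned is 1 for an actual churner and 0 otherwise.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    group_size = max(1, round(len(ranked) * group_fraction))
    rates = []
    for start in range(0, len(ranked), group_size):
        group = ranked[start:start + group_size]
        rates.append(sum(flag for _, flag in group) / len(group))
    return rates


# Synthetic example: 100 customers, the 10 highest-scored ones churned.
validation = [(i / 100, 1 if i >= 90 else 0) for i in range(100)]
rates = churn_rate_by_group(validation)
print(len(rates), rates[:6])
```

A model with real predictive power shows high churner proportions in the first groups and a rapid fall-off afterwards; comparing these curves across models is one way to pick the model to deploy.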

Summary
Churn is widely recognized as a major threat to telecommunications service providers. If they
can retain the best customers, telcos can increase overall profitability. To do this, they must
first of all identify which customers are likely to churn. What are their characteristics? What
can be done to incentivize them to stay loyal? In this white paper we have set out some of
the requirements of a methodology and software solution that will help telcos to answer
these questions by exploring and mining customer data.

Figure 4. Decision Tree Results Shown in Ring View. Dark shading represents a high proportion of
churners in the segment.


A clear methodological approach and powerful enabling software technology are, however,
only two of the essentials for a successful churn modelling exercise. The traditional virtues of
science, patience and curiosity, will also play their part, since data mining is an
iterative process: answers to one set of questions lead to more interesting and specific
questions.

The many possible routes that a data mining analysis can take require a software solution
that can harness this degree of complexity without limiting the freedom and creativity of the
analyst. SAS Enterprise Miner software solves this dilemma by synthesizing the
world-renowned statistical analysis and reporting system of SAS Institute with an easy-to-use
graphical user interface (GUI) that can be understood and used by business analysts as well
as quantitative experts.

References and Further Reading


Mattison, Rob, (1997), Data Warehousing and Data Mining for Telecommunications,
Norwood, MA: Artech Computer Science Library.

SAS Institute Inc., (1996), SAS Institute White Paper, SAS Institute’s Rapid Warehousing
Methodology, Cary, NC: SAS Institute Inc.

SAS Institute Inc., (1999), SAS Institute White Paper, Finding the Solution to Data Mining —
A map of the features of SAS® Enterprise Miner™ Software, Version 3, Cary, NC:
SAS Institute Inc.

SAS Institute Inc., (1999), SAS Institute Best Practice Paper, Data Mining and the Case for
Sampling: Solving Business Problems Using SAS Enterprise Miner™ Software, Cary, NC:
SAS Institute Inc.

SAS Institute Inc., (1999), SAS Institute Solution Overview, The SAS® Solution for Customer
Relationship Management, Cary, NC: SAS Institute Inc.

SAS Institute Inc., (1999), SAS Institute White Paper, Implementing the Customer
Relationship Management Foundation – Analytical CRM, Cary, NC: SAS Institute Inc.

SAS Institute Inc., (1997), SAS Institute White Paper, From Data to Business Advantage:
Data Mining, SEMMA Methodology and the SAS System, Cary, NC: SAS Institute Inc.

SAS Institute Inc., (1997), SAS Institute White Paper, Business Intelligence Systems and
Data Mining, Cary, NC: SAS Institute Inc.

SAS Institute Inc., (1999), SAS Institute Best Practice Paper, Best Practice in Churn
Prediction, Cary, NC: SAS Institute Inc.
