White Paper
September 2011
By Philip Carter
Sponsored by
Defining Big Data. This is not in the context of the quantity or threshold that actually quantifies Big Data (as this is changing all the time, and will be applied differently depending on the vertical and market segment), but more in terms of a new generation of technologies and architectures designed to economically extract value from very large volumes of a wide variety of data by enabling high-speed capture, discovery and/or analysis.
Hadoop, MapReduce, Key-Value Store? There is a lot of hype around the new technologies being used by the market to deal with the Big Data phenomenon. We will highlight some of these and their relative importance.
The Value of Big Data in Analytics. The bottom line is that it is getting more complicated to process and analyse these large and growing data sets, and this essentially requires a re-assessment of the broader information management strategies of the majority of organisations that have started their business analytics journey.
Why Big Data Analytics is Important (and Different). Many have asked: what is new about this trend? This section highlights the traditional use of business analytics in the old pre-Big Data world versus Big Data analytics in the New World. It also looks at the use cases that IDC expects to be most common across a variety of industries.
The Skill Factor: the Rise of the Data Scientist. With the raft of new technologies and organisational structures that need to be put in place as the Big Data phenomenon becomes a reality, there will be increasing demand for data scientists: next-generation analytical professionals who can extract information from large data sets, present it as content of business value to non-data experts, and understand the new models that need to be put in place.
Mapping out the Big Data Analytics Journey. The Big Data analytics journey will be an iterative one, so it is important to map it out in the context of a broader framework. This section aims to do exactly that, and also provides some recommendations to CIOs as they embark on this exciting journey into the brave new world of Big Data analytics.
top drivers vary significantly by organisation size and industry. Similarly, in February 2011 IDC surveyed 693 European organisations, and 51% of respondents said that BI and analytics are high-priority technologies. In emerging markets such as Asia/Pacific, the focus is very much on capturing the next wave of growth.
More than 1,000 CIOs and line-of-business (LOB) executives interviewed as part of the Asia/Pacific C-Suite Barometer in February 2011 rated business analytics as the number one technology area that would enable their organisations to gain a competitive edge in the year ahead.
Chart: Top 5 technology areas (business intelligence/analytics; network; social media/online channel; collaboration, including video and mobility; cloud computing/services), scale in %
With more businesses in Asia investing in IT to ride the hyper-growth wave in emerging markets, they are harnessing analytics-led solutions to gain better customer insights, manage risk and financial metrics more effectively and, at the same time, strive for unique market differentiation.
Historically, organisations have made significant investments in applications with the objective of automating business processes and capturing data to improve operational efficiency. Many of these projects are still ongoing, but it is becoming increasingly clear to senior management that they (and their business managers) have not been able to get the right information (mainly due to poorly integrated systems and questionable data quality) to the right stakeholders at the right time (due to performance and scalability issues) to support the critical decisions needed to drive the necessary business impact. Where IT cannot deliver this, lines of business are procuring and deploying their own solutions in a new wave of shadow IT investments focused on business analytics, forcing CIOs to re-examine these issues with a specific focus on better IT-business alignment. All of this is happening even without the Big Data dynamic in the picture; add it, and the result is a perfect storm that puts Big Data analytics at centre stage.
Figure 2: IDC Business Analytics Taxonomy. Layers shown include Performance Management & Analytic Applications (e.g. Financial Performance & Strategy Management: budgeting, planning, consolidation, profitability, strategy management) and the Data Warehouse Management Platform (data warehouse management; data warehouse generation: data extraction, transformation and loading, and data quality).
Source: IDC, 2011
Chart: data growth over time, including semi-structured data (e.g. weblogs, social media feeds) and unstructured data (video, rich media etc.)
The Volume. One aspect of volume lies in the structured data realm. Some of this is held in transactional data stores and is linked to the ever-present electronic trail that individuals and businesses create in the wake of rapidly increasing online activity; sensor (machine-to-machine) data contributes to this area too. The other aspect is in existing data warehouses or data marts, which have grown over time to petabyte scale.
The Variety. Another aspect of the Big Data phenomenon is the need to analyse semi-structured and unstructured data. Text, video and other forms of media require completely different architectures and technologies to perform the required analysis. For example, many marketing departments are looking at ways to do sentiment and brand analysis based on what is being posted on Facebook, Twitter and YouTube. This dynamic becomes more complex in Asia, with local social media sites such as RenRen in China and Nate in Korea.
The Velocity. There will also be demand to analyse this data more frequently: for example, taking into account all transactions rather than a sample to obtain a more complete, real-time view of the risk on a trade.
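The Variety discussion above mentions sentiment and brand analysis of social media posts. As a rough illustration only, the Python sketch below scores English-language posts with a tiny hand-written word lexicon and averages the score over posts that mention a brand. The lexicons, the brand name AcmePhone and the function names are hypothetical; production sentiment analysis would use far larger lexicons or trained models, and sites such as RenRen or Nate would need language-specific tokenisation that this sketch does not attempt.

```python
import re

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"love", "great", "excellent", "happy", "recommend"}
NEGATIVE = {"hate", "terrible", "broken", "angry", "refund"}


def tokenize(text: str) -> list[str]:
    """Lower-case a post and split it into simple word tokens (English only)."""
    return re.findall(r"[a-z']+", text.lower())


def score_post(text: str) -> int:
    """Count positive words minus negative words in a single post."""
    tokens = tokenize(text)
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


def brand_sentiment(posts: list[str], brand: str) -> float:
    """Average post score over posts mentioning a single-word brand; 0.0 if none do."""
    brand_token = brand.lower()
    scores = [score_post(p) for p in posts if brand_token in tokenize(p)]
    return sum(scores) / len(scores) if scores else 0.0


posts = [
    "I love the new AcmePhone, great camera",  # scores +2
    "AcmePhone battery is terrible",           # scores -1
    "Nice weather today",                      # ignored: no brand mention
]
print(brand_sentiment(posts, "AcmePhone"))  # prints 0.5
```

Counting words keeps the example transparent, but it ignores negation ("not great") and sarcasm, which is one reason real deployments tend to rely on statistical models instead.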
In summary, Big Data refers to data sets whose volume, variety, velocity and complexity put them beyond the ability of current databases and architectures to store and manage. IDC intentionally does not define Big Data as anything larger than a certain threshold (e.g. terabytes), mainly because such a threshold would be a moving target depending on the sector, and because it will inevitably grow over time. More important is the value that organisations can derive from this phenomenon, and the resulting need to rethink their information strategies to extract that value.
Figure: Big Data technologies, including distributed systems, Hadoop and HBase
Although some of these terms are used throughout this white paper, the focus is not to examine them in much detail because, as one IT executive recently remarked, knowing the technology is one thing, but applying it in the right environment is something entirely different. The new technology needs to be tied back to business requirements as much as possible, not examined for its own sake. That said, most IT executives are not aware of the technologies and trends developing in this area, and where they are aware of them, their strategy is to put a couple of people in their enterprise architecture team to experiment with the new technologies (e.g. in-memory, Hadoop, MapReduce, key-value stores) being used to deal with the Big Data phenomenon.
Big Data Analytics: The Old World vs. The New Era
Many have asked: what is new about this trend? This section highlights the traditional use of business analytics in the old pre-Big Data world versus Big Data analytics in the Brave New World. It also looks at the use cases that IDC expects to be most common across a variety of industries.
The majority of IT organisations have progressed in their infrastructure architectures over time: from predominantly mainframe-based environments in the 1980s, to client/server in the 1990s and the Web at the turn of the century, to what is now popularly known as private cloud. This supposed state of nirvana consists of a consolidated, virtualised set of infrastructure resources (server, storage and network) that business users can self-provision in an automated fashion, complete with SLAs whose security, performance, availability and cost profiles are transparent to all in the form of a service catalog. Very few organisations, if any, have achieved this state of infrastructure nirvana, and most are still battling a spaghetti-like tangle of compute resources in their datacenter. Now the external force of Big Data, mentioned earlier, is forcing CIOs to re-architect their infrastructure, particularly in terms of how analytics capabilities are deployed enterprise-wide. Below is an overview of the changes that IDC sees happening in the infrastructure world that are increasingly affecting Big Data analytics:
Table 2: Old World vs. New Era (Big Data Infrastructure)
Dimension | Old World | New Era
Tenancy | Infrastructure silos | Pooled resources
Architecture | Performance tuned | Linear scalability (linked to distributed parallel processing and in-memory storage)
Delivery Model | On premise | Hybrid (with cloud bursting capabilities) and widespread use of the appliance
Based on IDC's research in this space, here are three suggestions for CIOs in dealing with these issues:
Cloud Bursting. The private cloud journey will line up well with the enterprise-wide analytical requirements highlighted earlier, but CIOs need to ensure that workload assessments are conducted rigorously and that risk is mitigated where possible. Critical to this approach will be evaluating the cloud bursting capabilities of external vendors (i.e. infrastructure as a service), particularly as organisations start to use more real-time analytics environments, to ensure that infrastructure usage maps closely to demand and that there are no performance or availability issues.
Analytical Appliance. In terms of delivery models, IDC has seen significant performance benefits from analytical appliances for customers dealing with the impact of Big Data. In addition, because the software is optimised for and pre-integrated with the appliance, deployment timeframes are typically shorter. In a recent global survey of CIOs, 10% of respondents indicated that they will be looking at analytical appliances as a delivery model in 2011. IDC also believes that demand for reference architectures will rise as CIOs look to integrate these appliances with existing data warehousing environments. In line with this increased adoption of the appliance, IDC believes that IT departments will allocate less budget to technical skills (i.e. installation, configuration and management) and more to the high-end analytical skills needed to drive the necessary business impact across multiple functions.
Enterprise Architecture. Enterprise analytics needs an enterprise architecture that scales effectively with growth, and the rise of Big Data analytics makes this issue more urgent. Organisations need to create a high-performance analytical environment that uses in-database analytics, parallel processing and in-memory storage to deal with the increased volume, velocity and variety of data. For unstructured data in particular, more attention needs to be paid to Hadoop, an open-source software framework from the Apache Software Foundation that allows for the distributed processing of large data sets across clusters of computers. However, there will be an ongoing tension between global standards and local requirements, and the use of Hadoop is a good example of this. Another is the ability to process mixed workloads (e.g. analytical and operational) in the same infrastructure environment, such as the appliance mentioned earlier. CIOs need to consider how they can deliver value by solving specific business problems while remaining cognisant of global architecture standards and specifications. While certain global governance models will not allow some of these technologies to be used in production, business expectations will force IT departments to re-assess how the enterprise architecture agenda is applied at a local level.
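Hadoop's processing layer follows the MapReduce programming model: a map function emits key/value pairs from each input split, the framework groups (shuffles) those pairs by key, and a reduce function combines the values for each key. The sketch below runs the three phases sequentially in a single Python process to show the data flow only. It does not use Hadoop's API (Hadoop's native MapReduce API is Java), and the function names are illustrative; on a real cluster the map and reduce calls would run in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(split: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every emitted value by its key (the framework's job in Hadoop)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def map_reduce(splits: list[str]) -> dict[str, int]:
    """Run map, shuffle and reduce one after another in a single process."""
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = map_reduce(["big data big value", "data velocity"])
print(counts)  # {'big': 2, 'data': 2, 'value': 1, 'velocity': 1}
```

Because each map call depends only on its own split, and each reduce call only on one key's values, the work can be spread across a cluster; this independence is what lets the model scale to very large data sets.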
The bottom line is that it is getting more complicated to process and analyse these large, complex and growing data sets, and this essentially requires a re-assessment of the broader information management strategy of the majority of organisations that have started their business analytics journey. But the potential impact is enormous: optimising the price of every item in a global retail chain, or detecting fraud in real time, gives a sense of the type of problems that Big Data analytics can be used to solve.
Table 3: Old World vs. New Era (Big Data Analytics)
Dimension | Old World | New Era
Data Sets | Predefined | All-encompassing and iterative
Data Velocity | Batch | Proactive and dynamic (real-time where appropriate)
Data Analysis | Predominantly historic | Predictive, forecasting & optimisation
However, despite the clear potential of such analytics, it is important to understand that it will not necessarily be relevant or applicable to every use case. IDC believes that these use cases are best mapped across two of the Big Data dimensions, velocity and variety, as outlined below:
Figure: Big Data analytics use cases mapped by data velocity and data variety (semi-structured to unstructured). Use cases shown: credit and market risk in banks; fraud detection (credit card) and financial crimes (AML) in banks, including social network analysis; event-based marketing in financial services and telecoms; markdown optimisation in retail; claims and tax fraud in the public sector; social media sentiment analysis; disease analysis on electronic health records; video surveillance/analysis; text mining.
A better sense of the potential high-value impact of Big Data analytics can be gained by exploring these use cases in more detail:
Real-time Fraud Detection in Banks. This involves the ability to detect, prevent and manage fraud across a bank's multiple products, lines of business and channels. It requires the ability to capture the history of the different types of entities involved in transactions (e.g. card, account, customer, terminal ID or IP address), improving the accuracy with which customer behaviours that fall outside the norm are detected during point-of-sale (POS) transactions. This information can be used by multiple predictive models for fraud detection and credit risk assessment.
Markdown Optimisation in Retail. The ability for retailers to optimise prices for a wide range of products in real time, based on demand forecasting scenarios (that include the impact of promotions, seasonality and important calendar events), has a major impact on margins. These capabilities can also be augmented by social media sentiment analysis to gauge customer demand for certain products closer to real time.
Disease Analysis on Electronic Health Records. As healthcare services evolve, analysts can get hold of a patient's entire medical history in electronic format, which presents a major opportunity for Big Data analytics. For example, in the case of a disease such as diabetes, the ability to correlate patient medical history with dietary data (potentially from market basket analysis in retail) and optimised exercise schedules will give medical practitioners insights they could previously only dream of.
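The fraud detection use case above rests on keeping a history per entity (card, account, customer, terminal ID, IP address) and spotting behaviour outside the norm. The following is a minimal sketch of that idea, assuming transaction amounts are the only feature: it keeps a running mean and standard deviation per (entity type, entity ID) pair using Welford's online algorithm, and flags an amount more than three standard deviations from that entity's mean. The threshold, minimum history and class names are hypothetical, and a simple z-score rule stands in for the predictive models the text refers to.

```python
import math
from dataclasses import dataclass


@dataclass
class EntityHistory:
    """Running count, mean and sum of squared deviations (Welford's algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, amount: float) -> None:
        self.n += 1
        delta = amount - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (amount - self.mean)

    def std(self) -> float:
        """Sample standard deviation; 0.0 until there are two observations."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


class OutlierScorer:
    """Flags amounts far from an entity's own history (a simple z-score rule)."""

    def __init__(self, threshold: float = 3.0, min_history: int = 5) -> None:
        self.threshold = threshold      # hypothetical cut-off in standard deviations
        self.min_history = min_history  # do not judge entities with too little history
        self.histories: dict[tuple[str, str], EntityHistory] = {}

    def score(self, entity_type: str, entity_id: str, amount: float) -> bool:
        """Return True if the amount looks unusual for this entity, then record it."""
        history = self.histories.setdefault((entity_type, entity_id), EntityHistory())
        suspicious = False
        if history.n >= self.min_history:
            sd = history.std()
            if sd > 0 and abs(amount - history.mean) / sd > self.threshold:
                suspicious = True
        # This sketch folds every amount into the history, including flagged ones.
        history.update(amount)
        return suspicious


scorer = OutlierScorer()
for amount in [20.0, 25.0, 22.0, 18.0, 24.0, 21.0]:
    scorer.score("card", "card-123", amount)
print(scorer.score("card", "card-123", 500.0))   # True: far outside this card's history
print(scorer.score("ip", "203.0.113.7", 500.0))  # False: no history yet for this IP
```

Updating the statistics in a single pass, without storing every past transaction, suits the velocity requirement: each new POS transaction can be scored as it arrives.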
what we don't know, i.e. there is so much unstructured data that the variables and analytical models are likely to be entirely new. This means there is a need to re-think the way analytical power users approach their work by creating a Sandbox Mentality, where discovery is always the starting point. Generally, a background in data mining and statistics would be a good foundation for this type of analysis.
Moving forward, there will be increasing demand for data scientists: next-generation business analysts with strong statistical skills who can extract information from large data sets and present its value to non-analytical experts, with the unique skill of understanding the new algorithms and analytical models that will have the most significant business impact in the short term. Globally, IDC is seeing a lot of interest in this more analytically inclined skill set. Roles and responsibilities have not yet been defined, but this fits with the earlier point that we don't know what we don't know. It requires out-of-the-box thinking and creativity in the analytics that need to be done on these new data types and structures. For example, many marketing departments are looking at ways to do sentiment and brand analysis based on the (as you would expect, massive) volume of posts on Facebook, Twitter and YouTube, which contributes to the semi-structured and unstructured part of Big Data. This dynamic becomes more complex in Asia, with local social media sites such as RenRen in China and Nate in Korea.
Currently, IT is not the first port of call for the chief marketing officer, since it lacks the skills to understand what needs to be done (and in many cases is still working out what role it should play in the policy or governance of social media use). The make-up of the IT department therefore needs to be re-assessed in terms of technical, business and relationship skills. The maturity model below highlights how IDC sees these skills (both technical and business) mapping out for organisations that have adopted business analytics over time, and how this could evolve in the era of Big Data analytics:
Figure: IDC business analytics maturity model, from the Old World to the New Era
Pilot: little or no expertise in analytics; basic to functional knowledge of BI tools; no substantial financial impact and no ROI models in place; data governance little or none (skunk works); LOB frustrated; CIO engagement hidden.
Departmental Analytics: data warehouse team focused on performance, availability and security; few business analysts and limited usage of advanced analytics; data warehouse implemented, broad usage of BI tools, limited analytical data marts; certain revenue-generating KPIs in place with ROI clearly understood; initial data warehouse model and architecture; LOB visible; CIO engagement limited.
Enterprise Analytics: advanced data modelers and stewards a key part of the IT department; savvy analytical modelers and statisticians utilised; in-database mining and limited usage of parallel processing and analytical appliances; significant revenue impact (measured and monitored on a regular basis); data definitions and models standardised; LOB aligned (including LOB executives); CIO involved.
% of customers (IDC estimates) shown across the stages: 20%, 65%, 10%, 5%.
In terms of capturing and developing the right skills in the era of Big Data analytics, the creation of a Business Analytics Competency Centre that sits across the business and IT departments will be critical. IDC believes that this type of structure not only clarifies the roles and responsibilities of key stakeholders in this transformation, but also drives internal visibility, provides a mechanism for education and bridges the IT/business gap. Marketing and sales in particular need to be represented, since improving decision making among front-office staff will be the primary focus of these projects. In conjunction with the skills dimension, IDC believes that this structure should be involved in the following areas:
- Technology identification/deployment
- Business case creation and ROI justification
- Data governance frameworks with clear policies and guidelines around master data management, data quality and data models
- Ensuring IT/business alignment by involving the critical stakeholders at the right time
- Involving the CIO as the supporter of the necessary transformation from an IT perspective, which will in turn create the necessary business impact
Very few organisations have reached the level of maturity needed to truly harness the potential of Big Data analytics and, practically speaking, ticking all the relevant boxes is a major challenge. But this transformation is necessary for organisations to truly differentiate themselves in the current economic environment, and the CIO (and the IT department) needs to play a critical role in it. The next section highlights some suggestions that IDC believes should be taken into account on this journey.
The CIO Big Data Analytics Checklist
Architect for the Future. Historically, a lot of work in analytics has focused on workarounds for the limited scalability of the underlying hardware. As a result, many IT departments would create materialised views or pre-calculated data structures so that business users could work off these without affecting the performance of the systems processing the underlying data. Clustering, parallel processing and in-memory technologies mean that all of that underlying data can now be used in the analytical environment. However, it is important not to fall into the same trap of blindly adding capacity simply because it is available. Multiple delivery models need to be assessed case by case (e.g. cloud, particularly for bursting capabilities; analytical appliances; and the traditional client/server or three-tier Web architecture), as one size will definitely not fit all.
Create a Sandbox Mentality. One of the key differences between analytics in the traditional batch mode and analytics in the Big Data era is that we are gathering data that we may or may not need. From an analysis perspective, this means we don't know what we don't know, i.e. there is so much unstructured data that the variables and analytical models are likely to be entirely new. Analytical power users therefore need to re-think how they develop their models by adopting more of a Sandbox Mentality, where a discovery process is always the starting point, particularly in drawing linkages between unstructured, semi-structured and structured data. As part of this, new types of skills will need to be brought on board to understand the nuances of social media (more likely to come from Generation Y, also known as Millennials, or Generation Z).
Not Too Much Tinkering. Whenever a new set of cool technologies hits the market, IT departments tend to tinker, which can erode the immediate business benefits. So while a certain amount of experimentation is a good thing (as with the Sandbox Mentality highlighted earlier; Hadoop and MapReduce definitely fit into this category), CIOs need to be careful that not too much time is spent on experimentation at the expense of delivering business value.
Get the Team Right. The first step involves the CIO assessing his/her own IT department to examine relevant skill levels and organisational structures. In some cases, an internal transformation will be needed to get the business to take notice of the change. The right people then need to be empowered to execute the IT analytics strategy, with the relevant processes and governance structures in place to enable them to deliver on business expectations. Part of this requires the CIO to develop a much deeper understanding of the capabilities of the underlying analytics technology, but it also involves working with LOB executives to hire the right analytically minded managers and knowledge workers, who can make the fullest use of those technological capabilities.
Take Analytics to the Enterprise. The majority of IT projects in this space have focused on building a data warehouse combined with a variety of BI tools to surface the underlying information to end users. However, for sophisticated analytics functionality, the lack of IT skills meant that these projects were largely departmental and tactical in nature, leading to a siloed mentality. As a result, assessing something such as risk-adjusted profitability (combining financial, credit scoring and customer data) was impossible. This needs to change, and it requires a different level of IT/business collaboration, with the CIO personally focused on an enterprise-wide approach to deploying analytics to ensure that these projects are successful.
Governance and Enablement. This is where existing investments in data warehousing technologies, if made correctly, will pay dividends. The data models and reference architecture that IT has in place will ensure that data definitions and standards are consistent across business departments. Further work needs to be done in master data management (MDM) to bridge the operational and analytical gap around data governance but, fundamentally, this platform should provide the management and control that IT requires. When it comes to business enablement, IDC sees a new class of projects emerging that combines business analytics with business process management capabilities; more specifically, decision management software components that include tools for rule management, data mining, query and reporting, complex event processing (CEP), collaboration, BPM suites, search and content analysis. IDC believes that IT departments that can complement previous investments in data warehousing and business intelligence with a better understanding of the decision-making process in their organisations, and of the underlying decision management software, will be best placed to manage the dilemma of IT governance versus business enablement.
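Complex event processing, one of the decision management components listed above, typically evaluates rules over streams of events within time windows. The sketch below is a hand-written sliding-window rule, not the API of any CEP product: it fires when a given event type occurs at least a set number of times for the same key within a window. The event name card_declined, the threshold of three events in 60 seconds and the class name are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class WindowRule:
    """Fire when >= `count` events of one type for the same key fall within `window_seconds`."""

    def __init__(self, event_type: str, count: int, window_seconds: float) -> None:
        self.event_type = event_type
        self.count = count
        self.window_seconds = window_seconds
        # Per-key timestamps of recent matching events, oldest first.
        self.recent: dict[str, deque[float]] = defaultdict(deque)

    def on_event(self, key: str, event_type: str, timestamp: float) -> bool:
        """Process one event (timestamps assumed non-decreasing); True if the rule fires."""
        if event_type != self.event_type:
            return False
        times = self.recent[key]
        times.append(timestamp)
        # Drop events that have fallen out of the time window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        return len(times) >= self.count


rule = WindowRule("card_declined", count=3, window_seconds=60)
events = [
    ("card-1", "card_declined", 0.0),
    ("card-1", "card_declined", 15.0),
    ("card-2", "card_declined", 20.0),
    ("card-1", "card_declined", 40.0),
]
for key, event_type, ts in events:
    print(key, rule.on_event(key, event_type, ts))
# card-1 False / card-1 False / card-2 False / card-1 True
```

In a decision management setting, a rule firing like this would typically trigger a downstream action, such as routing the card to a fraud analyst, rather than being an end in itself.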
Conclusion
Despite the varying levels of maturity and adoption of business analytics, businesses are definitely gearing up to use more advanced solutions and offerings in this space. In line with this, organisations need to plan strategically and build a robust roadmap before adopting business analytics. The new generation of business managers is more aware of the benefits of competing on business analytics and will be looking to drive adoption of this technology area more aggressively. Moving forward, IDC believes that a new approach is required to proactively effect the necessary change, with a specific focus on the following areas:
- Elevating the CIO to a role with more transformative impact on the organisation, by playing an integral part in deploying the enterprise analytics strategy and ensuring that these technologies have the expected business impact
- Assessing alternative delivery models (such as the appliance, in-memory and Hadoop for Big Data)
- Capturing higher-level LOB attention and visibility as the next wave of business analytics projects is integrated with complex event processing (CEP) and business activity monitoring (BAM) technologies to drive a new class of projects that IDC defines as decision management
The role of the CIO is becoming much more important in the boardroom, and the CIO plays a key role in purchasing decisions for advanced applications such as business analytics. Moreover, the CIO and the IT department need to leverage a broader set of business analytics capabilities to create a new information management strategy that deals with the emerging Big Data dynamic and delivers improved decision-making capabilities to business stakeholders across the organisation.
#AP14962U
ABOUT THIS PUBLICATION This publication was produced by IDC Go-to-Market Services. IDC Go-to-Market Services makes IDC content available in a wide range of formats for distribution by various companies. A license to distribute IDC content does not imply endorsement of or opinion about the licensee. COPYRIGHT AND RESTRICTIONS Any IDC information or reference to IDC that is to be used in advertising, press releases, or promotional materials requires prior written approval from IDC. For permission requests, contact the GMS information line at 65-6829-7757 or gmsap@idc.com. Translation and/or localization of this document requires an additional license from IDC. For more information on IDC, visit www.idc.com. For more information on IDC GMS, visit www.idc.com/gms. IDC Asia/Pacific, 80 Anson Road, #38-00 Fuji Xerox Towers, Singapore 079970. P. 65.6226.0330 F. 65.6220.6116 www.idc.com. Copyright 2011 IDC. Reproduction is forbidden unless authorized. All rights reserved.