
Cognizant 20-20 Insights
Cognizant Reports

Data Science: How the Marriage of Technology and Math Can Produce Actionable Results

By blending the elegant science of pattern recognition with
the discipline of mathematics, organizations can solve
some of the world's most analytically complex business problems,
such as geospatial analysis.

Executive Summary
Although data science has gone mainstream, many onlookers see it as a
fancy term for plain old statistics. In fact, it's a way for companies
to apply analytics, through a technology platform, to operate and
strategize more intelligently.

Data science is about deriving insights by viewing data through a
mathematical lens. It's not about math or about technology in
themselves. It's about finding patterns in the data that can be
incredibly useful in a wide variety of business situations.

Big data analytics has transcended buzzword status because of its
ability to unlock patterns and insights that reside in a variety of
data formats and structures. But as we know all too well, the sheer
volume of data (not really big anymore, but now truly colossal) makes
it difficult if not impossible to derive understanding from information
without sophisticated computing power and savvy algorithms. Regardless
of the industry, be it financial services, retailing or manufacturing,
or the internal discipline, such as operations, product development or
marketing, data science can be brought to bear on numerous use cases,
and actionable insights can be discovered one by one.

This white paper examines one application of data science, geospatial
analysis. We'll explore how it works from a technical standpoint, in
part because this approach demonstrates well how technology and math
can be combined to solve related business objectives. The application
of data science in a geospatial context will hopefully illuminate how
pattern recognition, when combined with mathematics, can be applied in
various use cases, across industries, in ways that arm leaders with
insights and foresight that make for better-informed business
decisions.

cognizant reports | june 2016

Figure 1: Charting Customer Locations via Geospatial Analysis (Source: Google Maps)

Geospatial Analysis: Getting Well Grounded

At its essence, geospatial analysis is all about determining the
relative position of things on the Earth's surface, and then deriving
understanding from this knowledge. It's easy to see how many different
types of organizations can use geospatial analysis technology. For
example, knowing where particular populations live and work can help
banks tailor products and refine market emphasis. The policing and
security industries can not only determine rates of crime in particular
areas, but also predict future incidents and rates. Marketers can
identify patterns of product needs and usage, and likes and dislikes,
and better manage campaigns and offerings as a result. Dynamic
insurance pricing is becoming more mainstream with an understanding of
areas of risk and safety, health and illness, longevity and mortality.

Customer behavioral analysis, based on geo-location, can inform a
variety of business decisions. Geospatial analysis, just one example of
data science in action, is an essential ingredient in understanding
customers, including their needs and wants and their patterns of
product consumption.

Over the last few decades, numerous technologies have emerged to help
make geospatial analysis possible. These include mobile phone
triangulation (enabled by cell towers), a technology prominent before
the emergence of global positioning systems (GPS). GPS, of course, was
made possible by the use of satellites, and can locate an object or
person (via a GPS receiver or smartphone) to within as little as three
meters on the Earth's surface. There are also automatic identification
systems (AIS), used by ships at sea, as well as tracking via radio
frequency identification (RFID) and Wi-Fi. As these technologies
launched and became mainstream, they also became less expensive,
allowing the identification of geographic locations of objects (spatial
data) to become pervasive in our daily lives.

Figure 1 demonstrates, via a pair of maps, how customer position can be
represented by geospatial data, derived simply from the devices people
carry around with them. With the use of wearable technologies,
smartphones and tablets, and the ubiquity of Wi-Fi and cellular
connectivity, data gathered by geospatial targeting is growing at an
exponential rate.

Solving the Challenge of Hugeness


Geospatial analysis is immensely promising, but as with everything
there are challenges. The practical use of geospatial data is typically
hampered by the sparse nature of data collection strategies, the sheer
volume of the data and various technical issues associated with the use
of the data. The enormous volume of data can easily overwhelm analysis;
in fact, the huge growth of data volume has become the number one
challenge for geospatial analysis.

In response, data scientists are using modern mathematical techniques
to improve analytical performance, which can be sluggish when immense
data sets are accessed and processed. This is particularly important
when doing geospatial analysis because any piece of ground (city,
county, state, country, continent) contains almost countless points of
information.

One such technique is to split the data into smaller pieces, so it
becomes easier to design and apply algorithms to the various
components. This process is known as tessellation: it cuts up the
location data points into smaller regions (in this context, these
smaller regions are called tiles) so that an algorithm can be
practically applied to smaller chunks of data. This enables analysts to
treat each smaller group with a single algorithm, and compare each one
with others more easily.

Taking a Closer Look

The idea of tessellation used in geospatial analysis is intriguing, not
just because it's fascinating science, but also because of the
practical problem it solves: how to gain understanding from immense
sets of data. Seeing this requires a deep dive into the process,
exploring the math and technology behind the analysis. The process can
get very practical, very fast.

As noted, tessellation takes a geographic area to be analyzed (which
can be envisioned as a flat plane) and divides it into geometric shapes
(tiles), with no overlaps or gaps, to ensure nothing is missed. Data
points mapped to the same tile are considered one cluster, which forms
the basic unit for algorithms or visualization techniques.
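
To make the idea of tiles concrete, here is a minimal sketch in Python of a uniform tessellation. It assumes square tiles measured in degrees of latitude and longitude; the function names (`tile_of`, `cluster_by_tile`), the tile size and the sample coordinates are illustrative assumptions, not part of any product discussed in this paper.

```python
import math
from collections import defaultdict


def tile_of(lat, lon, tile_deg=1.0):
    """Return the (row, col) of the square tile that contains a point.

    Tiles are tile_deg x tile_deg degrees and anchored at (-90, -180),
    so each point maps to exactly one tile: no gaps, no overlaps.
    """
    row = math.floor((lat + 90.0) / tile_deg)
    col = math.floor((lon + 180.0) / tile_deg)
    return row, col


def cluster_by_tile(points, tile_deg=1.0):
    """Group (name, lat, lon) points into one cluster per tile."""
    clusters = defaultdict(list)
    for name, lat, lon in points:
        clusters[tile_of(lat, lon, tile_deg)].append(name)
    return dict(clusters)


# Illustrative points (approximate coordinates, assumed for the example).
points = [
    ("a", 37.33, -121.89),  # near San Jose
    ("b", 37.77, -121.10),  # same one-degree tile as "a"
    ("c", 41.26, -95.93),   # near Omaha
    ("d", 30.27, -97.74),   # near Austin
]
clusters = cluster_by_tile(points)
for tile, names in sorted(clusters.items()):
    print(tile, names)
```

Each cluster can then be handed to the same algorithm independently. A production system would more likely use tiles of variable size so that dense cities and sparse regions hold comparable numbers of points.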

Tessellations, in fact, were used by Sumerians 6,000 years ago to
create wall decorations formed by patterns of clay tiles. In the 17th
century, German mathematician Johannes Kepler made an early documented
study of tessellations, and in the 19th century a mathematical study of
the entire process was undertaken.1 Geospatial analysis today uses
tessellation techniques not only to split a large area into several
smaller ones, in the traditional manner, but also to split it into
smaller areas that have similar shapes but variable sizes. Thus,
analysts can see and assess the incredibly dense data points of a city,
along with the fewer data points existing within a much larger section
of the U.S. (the Great Plains, for example).

But while tessellation can allow for more manageable chunks of
geography to be analyzed, it's only via the marriage of math and
technology that it can be done efficiently and quickly. Many of today's
systems that can handle geospatial data, specifically massively
parallel processing (MPP) databases that are partitioned across
multiple servers or nodes, have tessellation features built in, to
better analyze spatial data. In addition, these systems apply spatially
aware indexes to filter the data.

To visualize this, consider one such example from database services
provider Teradata. The company uses spatially aware indexes and joining
techniques to filter data input sets, and spatial objects (minimum
bounding rectangles, or MBRs, in particular) to refine the filtered
result. Teradata then provides a set of functions that map these
objects into a two-dimensional grid. This is the tessellation.

A spatial index should be based on patterns and demographics. Consider
our Teradata example, where two spatially aware methods are used:
tessellate and geodetic buffering functions.

Tessellate: This is primarily intended to be used for join processing.
This process takes two or more pieces of information (say, all the
company's SKUs, plus particular customers' product needs and desires)
and joins them together to meet an actionable objective: an appropriate
sales pitch, for example, or a customer satisfaction report. Even with
the technological advances of recent years, the join remains a
time-consuming operation in relational query processing.
Tessellate_Index is used for index maintenance, while Tessellate_Search
is used to query the index information that was built using
Tessellate_Index. Say you are trying to find customer locations based
on latitude and longitude, derived from their mobile devices, and you
have about 2 million customers spread across some 175 countries.
Tessellate can deliver a performance boost of 99.9% compared with
traditional queries that do not use this technique.
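
The split between index maintenance and index search can be sketched in plain Python. This is not Teradata's API: `build_index` and `search` are hypothetical stand-ins for the roles the text assigns to Tessellate_Index and Tessellate_Search, and the one-degree tile size and customer rows are assumptions chosen for illustration.

```python
import math
from collections import defaultdict

TILE_DEG = 1.0  # assumed tile size, in degrees


def tile_of(lat, lon):
    """Map a point to the (row, col) of its grid tile."""
    return (math.floor((lat + 90.0) / TILE_DEG),
            math.floor((lon + 180.0) / TILE_DEG))


def build_index(customers):
    """Index maintenance: record which customers fall in each tile."""
    index = defaultdict(list)
    for cust_id, lat, lon in customers:
        index[tile_of(lat, lon)].append((cust_id, lat, lon))
    return index


def search(index, lat_min, lat_max, lon_min, lon_max):
    """Index search: read only the tiles that overlap the query box.

    Returns (matching customer ids, number of rows examined).
    Boxes that cross the antimeridian are not handled in this sketch.
    """
    row_lo, col_lo = tile_of(lat_min, lon_min)
    row_hi, col_hi = tile_of(lat_max, lon_max)
    matches, examined = [], 0
    for row in range(row_lo, row_hi + 1):
        for col in range(col_lo, col_hi + 1):
            for cust_id, lat, lon in index.get((row, col), ()):
                examined += 1
                if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
                    matches.append(cust_id)
    return matches, examined


# Hypothetical customer rows: (id, latitude, longitude).
customers = [
    (1, 37.34, -121.89),
    (2, 37.90, -121.20),
    (3, 41.26, -95.93),
    (4, 30.27, -97.74),
    (5, 51.51, -0.13),
]
index = build_index(customers)
matches, examined = search(index, 37.2, 37.5, -122.0, -121.7)
print(matches, examined, len(customers))
```

In this toy data set the query reads two of the five rows instead of all of them. The saving grows as the data set spreads over more tiles, which is the effect the text describes, although the exact gain depends on the data and the system.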

Geodetic buffering functions: Say you are looking to identify customers
close to a particular store, something that is particularly useful when
applying targeted marketing via personalized coupons, sales
information, special deals, etc. Geodetic buffering can, for example,
find the boundary for each store. Tessellate_Index and
Tessellate_Search functions can then be used to identify and restrict
the customers within the boundaries, and run an analysis on only those
customers.

Figure 2: Geospatial Analysis of Retail Customers: An Illustrative View (a grid of eight cells, Cell 1 through Cell 8, with 1,000 m circles drawn around stores; legend: Store, Customer)

Consider geospatial analysis in a retailing situation. Figure 2
represents stores as yellow stars and customers as small circles. A
1,000-meter circle is drawn around each store, depicting the customers
that are within a comfortable walking or driving range of that retail
location. Distance calculations from a particular store can take into
consideration customers within the same cell as that store, as well as
customers in other cells, to determine how to convey the availability
of special deals, interesting menu items, products that are on sale,
coupons, etc. Now consider how this can be expanded to larger
geographical areas: say, to the entire continental U.S., with many
thousands of stores and tens of millions of customers. A query for a
store in San Jose will take into consideration only customers in that
area, and not the customers in Omaha or Austin. This has the potential
to dramatically improve retail offers and opportunities that are
tailored to particular geographic locales.
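
The 1,000-meter circle around a store can be approximated with a short Python sketch. It assumes a spherical Earth and uses the haversine formula; the helper names, the store location and the customer coordinates are hypothetical. A cheap rectangle test stands in for the index filter, and the exact distance is computed only for the customers that survive it.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def buffer_box(lat, lon, radius_m):
    """Lat/lon rectangle enclosing a circle of radius_m around a point.

    Assumes the point is well away from the poles and the antimeridian.
    """
    d = radius_m / EARTH_RADIUS_M  # angular radius in radians
    dlat = math.degrees(d)
    dlon = math.degrees(math.asin(math.sin(d) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def customers_near(store, customers, radius_m=1000.0):
    """Return ids of customers within radius_m of the store."""
    s_lat, s_lon = store
    lat_min, lat_max, lon_min, lon_max = buffer_box(s_lat, s_lon, radius_m)
    nearby = []
    for cust_id, lat, lon in customers:
        # Filter step: discard anything outside the bounding rectangle.
        if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
            continue
        # Refine step: exact distance only for the remaining candidates.
        if haversine_m(s_lat, s_lon, lat, lon) <= radius_m:
            nearby.append(cust_id)
    return nearby


store = (37.3350, -121.8900)  # hypothetical store location
customers = [
    (1, 37.3390, -121.8900),  # roughly 445 m north
    (2, 37.3350, -121.8800),  # roughly 885 m east
    (3, 37.3450, -121.8900),  # just over 1,100 m north
    (4, 37.3420, -121.8820),  # inside the rectangle, outside the circle
    (5, 41.2600, -95.9300),   # a different city altogether
]
nearby = customers_near(store, customers)
print(nearby)
```

Customer 4 shows why both steps are needed: the rectangle admits it, but the distance check rejects it.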

Remember that queries using spatially aware indexes are much faster
than queries that do not use them. This speed is an essential attribute
that enables rapid response and quick alterations of marketing and
product-supply decisions.

Geospatial Analysis: A Cross-Industry Perspective

In the marriage of technology and math, there's no need to restrict
geospatial analysis to retailing. Consider the financial services
company that wants to make sure its customers' credit card accounts are
secure. If analysis shows that a particular credit card was used in
London in the morning, and again in Beijing in the afternoon, red flags
may certainly be raised as to the possibility of a lost or stolen card.
That is, unless the system knows that the customer is a frequent
traveler who has indeed visited and worked in those locations in the
past. The same can be done with, say, a mobile wallet that is tracked
and pinpointed via smartphone transactions.

Suppose an insurance company is able to map an approaching hurricane
and its potential landfall, which threatens the company's customers in
a particular city or state. Here, appropriate decisions can be made
about potential insurance payouts based on storm location, direction,
at-risk customers, etc.

What follows are real-world examples of geospatial analysis that we
have built and delivered to clients in various industries:

Technology: For a global company developing geospatial information
systems (GIS) products, we developed a Web-based GIS for sales
territory management. This map-based solution helped the teams planning
sales efforts to identify prospective customers as well as the
competition also vying for those customers. As a result, a new campaign
was enhanced and sales improved.

Retail: A large European retailer faced the challenge of optimal
new-store setup; in particular, it sought a fact-based approach to
better understand where its best customers might reside. We developed a
geospatial analytics process using the Huff Model,2 to determine sales
potential per location based on such factors as customer disposable
income, distance from the store, overall population and other
variables.

Telecom: A large European telecom provider needed a solution to better
plan its cell-tower networks. We developed and deployed a solution
using products from Environmental Systems Research Institute (Esri),
deploying its GIS software and geodatabase management applications to
make sense of future cell tower locations.

Utilities: A large oil and gas content-management company needed to
estimate the volume and value of asset deployment in various locations.
We created business intelligence models that better informed product
development.

Nongovernmental organizations: A global child welfare NGO needed to
manage its funds better, to improve its programs, projects and entire
communities, and to monitor change in the process. Here, we developed
and deployed both Web-based and mobile GIS technology to produce the
desired outcomes.
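
The credit card scenario described in this section (a card used in London in the morning and in Beijing in the afternoon) can be expressed as a simple travel-speed check. The Python sketch below is one possible formulation, not a description of any deployed system: the 900 km/h threshold, the field names, the transaction times and the `known_places` exception for frequent travelers are all assumptions made for illustration.

```python
import math
from datetime import datetime, timezone

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_kmh(prev_tx, curr_tx):
    """Speed a cardholder would need to make both transactions in person."""
    hours = abs((curr_tx["time"] - prev_tx["time"]).total_seconds()) / 3600.0
    km = haversine_km(prev_tx["lat"], prev_tx["lon"],
                      curr_tx["lat"], curr_tx["lon"])
    if hours == 0:
        return math.inf if km > 0 else 0.0
    return km / hours


def should_flag(prev_tx, curr_tx, known_places=frozenset(), max_kmh=900.0):
    """Flag the pair unless both places are already known for this customer."""
    if prev_tx["place"] in known_places and curr_tx["place"] in known_places:
        return False
    return implied_speed_kmh(prev_tx, curr_tx) > max_kmh


# Hypothetical transactions; times are assumed to be in UTC.
london = {"place": "London", "lat": 51.5074, "lon": -0.1278,
          "time": datetime(2016, 6, 1, 9, 0, tzinfo=timezone.utc)}
beijing = {"place": "Beijing", "lat": 39.9042, "lon": 116.4074,
           "time": datetime(2016, 6, 1, 15, 0, tzinfo=timezone.utc)}

flag_new_customer = should_flag(london, beijing)
flag_frequent_traveler = should_flag(
    london, beijing, known_places=frozenset({"London", "Beijing"}))
print(flag_new_customer, flag_frequent_traveler)
```

A real fraud system would weigh many more signals; the point here is only that a distance calculation plus a timestamp comparison is enough to surface the red flag described above.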

Looking Forward: The Road Ahead


These examples are powerful illustrations of data science's potential
impact across industries. The marriage between technology and
mathematics is delivering actionable insights, with analyses deployed
to address specific, practical use cases that companies face every day.

It's not so much about math and technology per se, but about the
application of mathematical features within a technological world to
solve business challenges. With an increasing understanding of the
importance of data science, and the importance of data scientists,
business is headed in the right direction.

For starters, an organization should understand and embrace the
following to benefit from its application of data science:

Intent: Organization heads and C-level executives must clearly define
the data science component of organizational culture.

A roadmap and strategy, applied top-down in an organization, help
gather the management support needed to bring about the change.

Adaptability: When applying new technology, culture is one of the most
important things to consider. This does not mean throwing away the
existing technology stack, but organizations must understand the
importance of blending old with new. Weave architecture helps
organizations grow successfully with the coexistence of traditional and
big data technologies. Each by itself is not sufficient to address all
the needs of an organization. A perfect blend of both is what defines
the road to success.

Embrace new organizational structures: Not to understate the importance
of the CTO, but the chief data officer (CDO) has an important role to
play here. With a mix of technologies and culture, an organization
needs to protect internal and external information. The CDO does
exactly that, and also helps promote the culture of fact-based
decisions, with facts backed by data.

Agility: Gone are the days when a one-size-fits-all solution that never
changed for decades was a success. Organizations need to be flexible to
adapt to new changes, processes, technologies and data usage policies.

Footnotes
1 Fyodorov's work marked the unofficial beginning of the mathematical study of tessellations. Other
prominent contributors include Shubnikov and Belov (1964), and Heinrich Heesch and Otto Kienzle (1963),
https://en.wikipedia.org/wiki/Tessellation.

2 The Huff Model is a spatial interaction model that calculates gravity-based probabilities of consumers at
each origin location patronizing each store in the store dataset. From these probabilities, sales potential
can be calculated for each origin location based on disposable income, population or other variables.
The probability values at each origin location can optionally be used to generate probability surfaces and
market areas for each store in the study area, http://arcscripts.esri.com/details.asp?dbid=15999.
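
As a rough illustration of the gravity-based probabilities described in the Huff Model footnote above, the following Python sketch computes, for a single origin location, the probability of patronizing each store and the resulting sales potential. It uses the commonly cited form in which a store's pull is its attractiveness raised to a power `alpha`, divided by distance raised to a power `beta`. The exponents, store sizes, distances, population and spending figures are hypothetical, and this is not the implementation used in the engagement described earlier.

```python
import math


def huff_probabilities(attractiveness, distances, alpha=1.0, beta=2.0):
    """Probability that a consumer at one origin patronizes each store.

    attractiveness: {store: positive measure such as floor area}
    distances:      {store: positive distance from the origin}
    """
    pull = {
        store: (attractiveness[store] ** alpha) / (distances[store] ** beta)
        for store in attractiveness
    }
    total = sum(pull.values())
    return {store: value / total for store, value in pull.items()}


def sales_potential(probabilities, population, spend_per_person):
    """Expected spend captured by each store from this origin."""
    return {
        store: p * population * spend_per_person
        for store, p in probabilities.items()
    }


# Hypothetical inputs for one origin location.
attractiveness = {"Store A": 2000.0, "Store B": 1000.0}  # e.g. floor area
distances = {"Store A": 2.0, "Store B": 1.0}             # e.g. kilometers

probs = huff_probabilities(attractiveness, distances)
potential = sales_potential(probs, population=3000, spend_per_person=100.0)
print(probs)
print(potential)
```

With these inputs the smaller but closer store attracts about two-thirds of the origin's demand, which shows how distance can outweigh size when the distance exponent is 2.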

About the Author


Udit Garg is a Senior Architect within Cognizant's Analytics & Information Management business
unit. As an expert on database technologies and a database architect, Udit has produced customized
solutions based on corporate needs and roadmaps. He is a Teradata Certified Master and has over
10 years of experience in database technologies and architecture focused on roadmap definition, as well
as solution and information blueprints. Udit has defined a personalization blueprint for digitalization
exercises, created a system architecture for an enterprise warehouse and led a team that developed a
performance optimization/capacity management solution in a recent engagement. He can be reached at
Udit.Garg2@cognizant.com.

About Cognizant
Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business
process services, dedicated to helping the world's leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that
embodies the future of work. With over 100 development and delivery centers worldwide and approximately 233,000 employees as of March 31, 2016, Cognizant is a member of the NASDAQ-100, the S&P
500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and fastest
growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.

World Headquarters
500 Frank W. Burr Blvd.
Teaneck, NJ 07666 USA
Phone: +1 201 801 0233
Fax: +1 201 801 0243
Toll Free: +1 888 937 3277
Email: inquiry@cognizant.com

European Headquarters
1 Kingdom Street
Paddington Central
London W2 6BD
Phone: +44 (0) 20 7297 7600
Fax: +44 (0) 20 7121 0102
Email: infouk@cognizant.com

India Operations Headquarters
#5/535, Old Mahabalipuram Road
Okkiyam Pettai, Thoraipakkam
Chennai, 600 096 India
Phone: +91 (0) 44 4209 6000
Fax: +91 (0) 44 4209 6060
Email: inquiryindia@cognizant.com

Copyright 2016, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is
subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.

Codex 1889
