NEC's platform system delivery know-how and Vupico's system integration and business insight
Executive Summary
Section 1: Introduction
Every organization has its own operational challenges, but most share common business
drivers such as improving operational efficiency, customer retention and satisfaction, and
product quality to gain a competitive advantage. Additional challenges include simplifying
complex data management processes, reducing cost, consolidating platforms and placing data
intelligently for better analytics.
Organizations need platforms and tools that can bridge the gap between business-critical data
and the huge volume of data coming from new sources. SAP® Vora™ has emerged as one of the
technologies that provide a distributed computing solution for businesses; it leverages the
Apache Spark framework to provide enriched interactive analytics on the Hadoop platform.
SAP Vora is an in-memory query engine that allows organizations to use SQL to analyze large
volumes of data from enterprise applications, data warehouses, the Hadoop data lake and
real-time streaming data from IoT devices.
This whitepaper describes the integration of NEC's Big Data Platform, called "Data Platform for
Hadoop" (hereinafter DPH), with the NEC SAP HANA® appliance and analytics from Vupico to
address the challenge of scoring customer credit loans in real time. For this use case, Vupico has
designed and developed an end-to-end solution that implements a data pipeline to shorten the
time between loan request submission and validation from days to minutes, which helps financial
institutions decide the creditworthiness of a customer in real time. Some of the important topics
covered in this whitepaper are:
In today's world, almost 80% of the data generated and stored by enterprises is unstructured, and
it remains unanalyzed due to the lack of the right platform, tools and resources that can quickly
identify potential value in such data. To become competitive, it is important for any organization
to link business data derived from traditional systems with the huge volume of data from new
sources and get real-time insights that result in better business outcomes. This requires a new
approach that combines and correlates structured data with unstructured data obtained from new
devices, social media or sensors in a cost-effective and timely manner.
Enterprises using SAP products, like ERP and CRM, have been trying to identify ways to lower
the total cost of ownership, and this pursuit can partly be addressed by deploying SAP HANA as
a transactional and analytical system to store and process data. However, the growth and
importance of unstructured data for delivering in-depth business intelligence has limited the
relevance of SAP HANA, because the high cost of data storage and data management makes an
SAP HANA system very expensive when the volume of data increases significantly.
With the evolution of modern data architectures and frameworks, organizations have been looking
for open systems that can run on commodity hardware and scale flexibly as demand grows.
This created the need for a flexible and modular infrastructure that provides clients
with a cost-effective platform with easy expansion capabilities.
As a result, the industry has witnessed a growth in demand for Hadoop/Spark-based platforms
that allow distributed processing of large data sets across clusters of computers in real time,
and such platforms have been adopted by many enterprises as analytics platforms for big data.
Hadoop is designed to scale from a single server to thousands of machines, each offering local
computation and storage. It is capable of storing and processing petabytes of data in any format
and helps organizations ingest, store, process and visualize data using a common platform.
To address the above challenges of storing and analyzing data in a cost-effective manner, NEC,
with its partner Vupico, has introduced a use-case-based Reference Architecture that combines
SAP HANA and Hadoop with SAP Vora and integrates it seamlessly into the existing enterprise
data and big data environment. NEC's large-scale distributed processing platform named "Data Platform for
In the integrated stack, DPH is used to lower the cost of the data storage system and to offload
expensive ETL processes from SAP HANA. This leads to an increase in profitability, as it frees
up capacity on the SAP HANA system that can instead be used for higher-value analytical
workloads. Vupico, with its experience of building and implementing business intelligence, data
processing pipelines and advanced analytics using the Hadoop and SAP ecosystems, has developed
an interactive analytics and data tiering process that uses SAP Vora for effective scoring in
loan processing.
2.1.1. Turnkey appliance with NEC's SAP HANA certified Express5800/A2040d server:
The appliance is designed using the NEC Express 5800/A2040d scalable enterprise server. The NEC
Express 5800/A2040d is a scale-up server designed with a massive resource pool to support
compute-intensive and memory-hungry applications in mission-critical and virtualized
environments, supporting up to 4 processors with 96 cores (192 threads), 4TB of memory and
16 PCIe 3.1 slots.
2.1.4. Enhanced reliability, availability, and serviceability (RAS) for SAP HANA delivered
through NEC-Red Hat Enterprise System Collaboration:
Before the advent of in-memory systems, NEC worked collaboratively with Red Hat on the
development of enterprise systems that delivered dynamic processing and memory functionality.
This collaboration resulted in the ability to remove faulty components from operation and
reallocate system resources without a system outage through standardized system calls to Red Hat
Enterprise Linux. The NEC Express 5800/A2040d offers the RAS features required to support
business-critical workloads for enterprise computing and avert SAP HANA downtime.
The NEC SAP HANA offering is available not only through SAP certified appliances but also through
Tailored Datacenter Integration (TDI), which gives SAP HANA customers a wider choice by letting
them leverage their existing hardware components, provided these are SAP HANA certified, for
their SAP HANA environment.
For a list of certified appliances from NEC for SAP HANA, refer to online documentation at:
http://www.nec.com/en/global/prod/hana/model/appliance.html?
VUPICO's innovative services are centered on bringing modern architecture and the latest technology
together while integrating Big Data, IoT, SAP HANA, Hadoop and Predictive Analytics into an
information platform. It provides consulting services that help customers solve their business
problems through data analytics. Based on its extensive experience in providing data-driven
solutions to various industries and verticals, Vupico has expertise in implementing an end-to-end
dataflow that ingests data from multiple data sources and combines the best of Hadoop and SAP
solutions.
NEC and Vupico have together designed a proof of concept that integrates NEC DPH and the SAP
HANA appliance with Vupico's analytics use case of credit scoring. Vupico has developed
the business use case and has also implemented a data pipeline that shortens the time between
loan request submissions and their subsequent validation from days to minutes. For better and
more effective decisions, Vupico has developed additional functionality such as:
With the increase in the volume of data to be processed and in its variety, which spans
conventional structured data and the recently recognized potential of unstructured data,
the business use case of integrating SAP HANA with a Hadoop-based platform has created a strong
buzz. Libraries such as Spark process the unstructured data in Hadoop and store it as structured
data in SAP HANA using Hive adapters.
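The step of turning unstructured data into structured rows can be illustrated with a minimal sketch. It uses plain Python as a stand-in for a Spark job, and the log format, the field names and the functions `parse_line` and `structure_logs` are assumptions made for this illustration; they are not taken from the proof of concept.

```python
import re
from typing import Optional

# Hypothetical clickstream log format, assumed for illustration only.
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\S+) user=(?P<user_id>\d+) "
    r"action=(?P<action>\w+) page=(?P<page>\S+)"
)


def parse_line(line: str) -> Optional[dict]:
    """Turn one raw log line into a structured row, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    row["user_id"] = int(row["user_id"])  # typed column instead of raw text
    return row


def structure_logs(lines):
    """Parse every line and keep only the rows that matched the pattern."""
    parsed = (parse_line(line) for line in lines)
    return [row for row in parsed if row is not None]


raw_lines = [
    "2017-03-01T10:15:00 user=42 action=view page=/loans/home",
    "corrupted entry without fields",
    "2017-03-01T10:16:30 user=42 action=submit page=/loans/apply",
]
rows = structure_logs(raw_lines)
```

In a real deployment the same parsing logic would run distributed across the cluster, and the resulting rows would be written to a table rather than kept in a Python list.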
With the use of commodity hardware, DPH helps in reducing the data storage cost. This helps in
reducing the overall solution cost as cold data sets from SAP HANA can be archived on DPH,
thus providing the required scalability at a lower cost.
Some of the key benefits that organizations derive from implementing this integrated solution
are:
• By combining social media data and logs with the CRM data available
in SAP HANA, companies can generate customized promotional offers for
customers on the basis of analysis performed on a combination of CRM and
clickstream data
• Preventive maintenance for equipment placed at remote locations, by
combining the sensor data (unstructured) received from the equipment with
the procurement date and the maintenance schedule data (structured)
• Offloading data and expensive processes from SAP HANA to the integrated
platform so as to overcome processing bottlenecks and offer increased capacity,
speed and flexibility
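The first benefit above, combining CRM data with clickstream data to target offers, can be sketched as a simple join. This is an illustrative sketch only: the customer records, the product categories and the functions `top_interest_by_customer` and `build_offers` are hypothetical, and plain Python dictionaries stand in for the SAP HANA and Hadoop tables.

```python
from collections import Counter

# Hypothetical CRM records, standing in for structured data held in SAP HANA.
crm = {
    101: {"name": "Customer A", "segment": "premium"},
    102: {"name": "Customer B", "segment": "standard"},
    103: {"name": "Customer C", "segment": "standard"},
}

# Hypothetical clickstream events (customer_id, product category viewed),
# standing in for log data held on the Hadoop side.
clicks = [
    (101, "home-loans"),
    (101, "home-loans"),
    (101, "car-loans"),
    (102, "savings"),
]


def top_interest_by_customer(events):
    """Return the most frequently viewed category for each customer."""
    counts = {}
    for customer_id, category in events:
        counts.setdefault(customer_id, Counter())[category] += 1
    return {cid: c.most_common(1)[0][0] for cid, c in counts.items()}


def build_offers(crm_records, events):
    """Join CRM profiles with click interests; skip customers with no clicks."""
    interests = top_interest_by_customer(events)
    offers = []
    for customer_id, profile in crm_records.items():
        category = interests.get(customer_id)
        if category is None:
            continue
        offers.append(
            {
                "customer_id": customer_id,
                "segment": profile["segment"],
                "offer_category": category,
            }
        )
    return offers


offers = build_offers(crm, clicks)
```

Customer 103 has a CRM record but no click events, so no offer is generated for that customer in this sketch.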
Extending SAP HANA with DPH gives end users and data scientists the opportunity to
consume the required information, whether it resides in SAP HANA or in the DPH system,
transparently from the same user interface and without compromising on performance.
While the combined solution offers a plethora of features, many of its uses are simple and have
compelling results. Data warehouse (DWH) optimization is one of the many benefits of the combined
solution, with an easily quantifiable and immediate return on investment.
• Unprecedented Scalability:
NEC DPH allows customers to start small and scale the analytics platform as demand grows,
by adding one node at a time.
• Lower TCO:
Consolidating data from multiple clusters and costly data warehouse systems onto a cost-effective
data platform enables organizations to distribute the workload effectively and reduce the total
cost of ownership.
5.1.1. Business Use Case: Reduce lost opportunities through faster and more accurate evaluation of credit score
Business opportunities were lost due to a paper-based credit examination process that could
take up to several days for financial organizations offering loan services. NEC, in collaboration
with Vupico, streamlined this process by integrating the NEC SAP HANA appliance with the Hadoop
platform and SAP Vora, and implemented a flow based on a machine learning model that performs
a risk assessment of an applicant's capability to repay, giving a 98% accurate score within
minutes. Based on multiple dimensions, such as the applicant's past and present financial situation,
employment status, assets owned, the amount requested and the purpose of the loan, the model
calculates a credit score on a scale between 225 and 900, where a lower score indicates a high-risk
borrower and a higher score a low-risk borrower.
The high-level solution implemented as part of this POC helps in addressing the challenge of
analytics by ingesting data from multiple sources into Hadoop, while SAP Vora bridges the gap
between operational and high-value data in SAP HANA and all structured and unstructured data in
Hadoop. Using SAP Vora along with Spark simplified data access between Hadoop and SAP HANA,
with only recent data residing in SAP HANA for in-memory processing.
Instead of making a hard decision on whether an applicant is granted a loan or not, Vupico
decided to label applicants as high, moderate or low risk based on the predictions and generated a
FICO-score-like indicator on a scale of 225-900. This enables manual assessment of borderline
cases and the processing of loan applications that would have been rejected by traditional
criteria-based models.
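The idea of a 225-900 indicator with three risk labels can be sketched as follows. The source does not describe how the model output is mapped to the scale or where the band boundaries lie, so the linear mapping, the thresholds (500 and 700) and the names `to_score` and `risk_band` are all assumptions made for this illustration.

```python
SCORE_MIN = 225
SCORE_MAX = 900

# Assumed band boundaries; the whitepaper does not state the real thresholds.
MODERATE_THRESHOLD = 500
LOW_RISK_THRESHOLD = 700


def to_score(repay_probability: float) -> int:
    """Map a predicted repayment probability in [0, 1] linearly onto 225-900."""
    if not 0.0 <= repay_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return round(SCORE_MIN + repay_probability * (SCORE_MAX - SCORE_MIN))


def risk_band(score: int) -> str:
    """Label a score as high, moderate or low risk (a lower score means higher risk)."""
    if score < MODERATE_THRESHOLD:
        return "high"
    if score < LOW_RISK_THRESHOLD:
        return "moderate"
    return "low"


def assess(repay_probability: float) -> dict:
    """Return both the indicator and its label for one applicant."""
    score = to_score(repay_probability)
    return {"score": score, "risk": risk_band(score)}
```

Under these assumptions, an applicant labeled "moderate" would be routed to manual assessment instead of being rejected outright, which is the behavior the text describes for borderline cases.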
After analyzing the data volume and potential throughput requirements, Vupico and NEC decided
to present an architecture with an upstream integration that would ingest and process multiple
Downstream, data visualization and dashboarding were implemented in Tableau for
self-exploration and analysis of the data by the customer. This was possible because of the
in-memory capabilities of both SAP HANA and SAP Vora, which were fed with the scored loan
applications, giving the customer complete control over its operation.
To offload data from SAP HANA and save storage cost on the SAP HANA system, a process was
put in place to retain only the latest 3 years of data in SAP HANA, while the rest of the historical
data was transferred to SAP Vora residing on the Hadoop cluster.
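The retention rule can be sketched as a date-based split. This is a minimal illustration, assuming each record carries an `application_date` field and that the cutoff is inclusive; the field name, the boundary handling and the functions `retention_cutoff` and `split_by_retention` are hypothetical and are not taken from the actual process.

```python
from datetime import date


def retention_cutoff(today: date, years: int = 3) -> date:
    """Return the earliest date that still counts as recent data."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        return today.replace(year=today.year - years, day=28)


def split_by_retention(records, today: date, years: int = 3):
    """Split records into (hot, cold) by application date.

    Hot records stay in the in-memory tier; cold records are moved to the
    Hadoop-side tier. Records dated exactly on the cutoff are treated as hot.
    """
    cutoff = retention_cutoff(today, years)
    hot = [r for r in records if r["application_date"] >= cutoff]
    cold = [r for r in records if r["application_date"] < cutoff]
    return hot, cold


records = [
    {"loan_id": 1, "application_date": date(2016, 1, 15)},
    {"loan_id": 2, "application_date": date(2014, 6, 30)},
    {"loan_id": 3, "application_date": date(2012, 11, 2)},
]
hot, cold = split_by_retention(records, today=date(2017, 6, 30))
```

With a reference date of 30 June 2017, loans 1 and 2 remain in the hot tier and loan 3 is moved to the cold tier.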
Dashboards were built in Tableau, and a calculation view was used in SAP HANA to
combine the data stored locally in SAP HANA with the data in SAP Vora, letting users query
not only the last 3 years but the whole dataset within an acceptable processing time.
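The effect of such a combined view can be sketched with a `UNION ALL` over a recent table and a historical table. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in; it is not how an SAP HANA calculation view is defined, and the table names, columns and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- loans_hot stands in for recent data held in SAP HANA,
    -- loans_cold for historical data exposed through SAP Vora.
    CREATE TABLE loans_hot (loan_id INTEGER, application_year INTEGER, score INTEGER);
    CREATE TABLE loans_cold (loan_id INTEGER, application_year INTEGER, score INTEGER);

    INSERT INTO loans_hot VALUES (3, 2016, 710), (4, 2017, 455);
    INSERT INTO loans_cold VALUES (1, 2011, 640), (2, 2013, 820);

    -- One logical view over both tiers, so queries need not know where a row lives.
    CREATE VIEW loans_all AS
        SELECT loan_id, application_year, score, 'hot' AS tier FROM loans_hot
        UNION ALL
        SELECT loan_id, application_year, score, 'cold' AS tier FROM loans_cold;
    """
)

rows = conn.execute(
    "SELECT loan_id, tier FROM loans_all ORDER BY loan_id"
).fetchall()
average_score = conn.execute("SELECT AVG(score) FROM loans_all").fetchone()[0]
```

A dashboard query against `loans_all` sees all four loans, and an aggregate such as the average score is computed over both tiers at once.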
Tables in SAP Vora need to load their data from HDFS, but if the data is loaded from an ORC file
and that file is later changed or updated, the update will not be reflected in SAP Vora. In such
While HDFS provides the scalable, fault-tolerant, cost-efficient storage for your big data lake,
YARN provides the centralized architecture that enables you to process multiple workloads
simultaneously. YARN provides the resource management and pluggable architecture for enabling
a wide variety of data access methods.
Additionally, SAP HANA has capabilities to support data tiering to manage data storage
cost and processing at the database storage layer. This helps to extend the platform to
intelligently distribute data and its processing to a low-cost scalable platform by moving warm
and cold data out of memory to an alternative disk-based solution such as Hadoop.
SAP Vora is an extended Spark execution framework that provides SQL-like capabilities and
produces accelerated results by loading and processing Hadoop data and tables in memory. SAP
Vora provides a simple graphical interface to model data and build star schemas, which helps in
boosting SQL performance. Additionally, it can help in building hierarchies and drill-downs
on Hadoop data, which is otherwise very difficult to achieve.
SAP Vora bridges the gap between SAP HANA and Hadoop and enables customers to run several
key business use cases on an integrated platform at lower cost.
6.4. Tableau
Tableau is an interactive data visualization tool that enables users to create interactive and
apt visualizations in the form of dashboards and worksheets to gain business insights. It allows
users to easily create customized dashboards that provide insight into a
broad spectrum of information.
The characteristics of Tableau are as follows:
NEC Corporation
www.nec.com
VUPICO LLC
JAPAN
DEUX TOURS EAST 45F E4502-3-13-1 Harumi, Chuo-ku, Tokyo 104-0053
SINGAPORE
31, St Thomas Walk 0403, St Thomas Suites Singapore 238141
AUSTRALIA
607/17 Grattan Close Glebe NSW, 2037
INDIA
305 Adiya Trade Centre Ameerpet Hyderabad 500081
http://www.vupico.com/ info@vupico.com