Syncsort DMX-h for Hadoop ETL (Brochure)

SOLUTION SHEET
Syncsort DMX-h for Hadoop ETL
Unleash Hadoop's potential with a smarter approach to integrating and processing Big Data
Apache Hadoop is gaining traction as a general-purpose framework for collecting, processing, and storing Big Data, and ETL is emerging as the key use case for those Hadoop implementations. Unfortunately, as organizations ramp up their Hadoop initiatives, they often face barriers that undermine Hadoop's potential. Syncsort offers a unique approach to Hadoop that lowers the barriers to wider adoption and helps organizations unleash the full potential of Hadoop, making it a more robust environment for the enterprise. With Syncsort, organizations can use existing ETL skills to accelerate their Hadoop initiatives and process more data in less time, with less money and fewer resources.
Big Data Is Breaking Existing Architectures

As organizations try to make sense of the ever-expanding data avalanche, they are hitting the architectural limits of their data processing environments. As a result, they are increasingly adopting the Hadoop MapReduce framework as a means to scale the collection and processing of data while reducing costs. Unfortunately, most traditional ETL tools only generate complex code to be executed in Hadoop. Without any real integration, Hadoop can turn into a heavy burden, forcing organizations to:
Acquire expensive, hard-to-find MapReduce skills.
Understand and manually maintain thousands of lines of code, even for simple ETL flows.
Continually add hardware to scale and maintain service level agreements.
Syncsort delivers a smarter approach to Hadoop ETL, enabling organizations to:
Connect to virtually any data source, including mainframe and MPP databases.
Move data into and out of Hadoop up to 6x faster without the need for manual scripts.
Develop MapReduce ETL processes without writing code.
Seamlessly accelerate Hadoop performance and scalability for sort and ETL operations.
Smart Connectivity for Faster Data Loads and Extracts
DMX-h writes data directly to HDFS using native Hadoop interfaces. DMX-h can partition the data and parallelize the loading process to load multiple streams into HDFS simultaneously, reducing loading times by up to 6x compared with the Hadoop put command.
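The partition-and-parallelize idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the DMX-h implementation: records are comma-delimited text lines, each record is assigned to a partition by hashing its first field, and local files in a directory stand in for HDFS files. The function names (partition_records, write_partition, parallel_load) are hypothetical.

```python
import os
import tempfile
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition_records(records, num_partitions, key_index=0, sep=","):
    """Split delimited records into num_partitions lists by hashing one field."""
    partitions = [[] for _ in range(num_partitions)]
    for line in records:
        key = line.split(sep)[key_index]
        # crc32 is stable across runs, unlike Python's per-process salted str hash.
        partitions[zlib.crc32(key.encode("utf-8")) % num_partitions].append(line)
    return partitions


def write_partition(target_dir, index, lines):
    """Write one partition to its own file (a local stand-in for one HDFS stream)."""
    path = os.path.join(target_dir, f"part-{index:05d}")
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
    return path, len(lines)


def parallel_load(records, target_dir, num_streams=4):
    """Write all partitions concurrently and return {file_path: record_count}."""
    partitions = partition_records(records, num_streams)
    with ThreadPoolExecutor(max_workers=num_streams) as pool:
        futures = [pool.submit(write_partition, target_dir, i, part)
                   for i, part in enumerate(partitions)]
        return dict(future.result() for future in futures)


if __name__ == "__main__":
    rows = [f"cust{i % 10},{i}" for i in range(1000)]
    with tempfile.TemporaryDirectory() as out_dir:
        counts = parallel_load(rows, out_dir, num_streams=4)
        print(sum(counts.values()))  # 1000
```

Threads are used here because the work is dominated by I/O; hashing on a key keeps all records for a given key in the same output file, which can simplify later key-based processing.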
Shortening the time it takes to get data into HDFS can
be critical for many companies, such as those that
must load billions of records each day. Reducing load
times can also be critical for organizations that plan to
increase the amount (and types) of data they will need to
load into Hadoop as their application or business grows.
In addition to providing fast and efficient loading, DMX-h is commonly used to pre-process data before loading, which avoids the complexity and inefficiencies that can arise from loading raw source data directly into HDFS. Integrating and structuring the data with DMX-h before loading it into HDFS reduces load times, makes downstream MapReduce tasks easier to develop and faster and more efficient to execute, and reduces storage requirements on the cluster.
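A minimal sketch of this kind of pre-processing, assuming a pipe-delimited raw layout (id|name|amount|notes) and a downstream job that only needs id and amount. The layout, field names, and rejection rules are hypothetical and not DMX-h behavior; the point is that malformed rows are rejected and unused fields are dropped before anything is written to the cluster.

```python
import csv
import io
from decimal import Decimal, InvalidOperation

RAW_FIELDS = ("id", "name", "amount", "notes")  # assumed raw source layout
KEEP_FIELDS = ("id", "amount")                  # assumed fields downstream jobs need


def structure(raw_lines):
    """Return (clean_rows, rejected_lines).

    Clean rows hold typed values for KEEP_FIELDS only. Lines with the wrong
    field count, a non-integer id, or an unparseable amount are rejected.
    """
    clean, rejected = [], []
    for line in raw_lines:
        parts = line.rstrip("\n").split("|")
        if len(parts) != len(RAW_FIELDS):
            rejected.append(line)
            continue
        record = dict(zip(RAW_FIELDS, (p.strip() for p in parts)))
        try:
            row = {"id": int(record["id"]), "amount": Decimal(record["amount"])}
        except (ValueError, InvalidOperation):
            rejected.append(line)
            continue
        clean.append(row)
    return clean, rejected


def to_csv(rows):
    """Serialize structured rows as CSV text, ready to be loaded."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=KEEP_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows, bad = structure(["1|Ann|12.50|vip", "2|Bob|oops|", "3|Cy|7|", "bad line"])
    print(to_csv(rows))
    print("rejected:", bad)
```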
SYNCSORT DMX-h FOR HADOOP ETL
Figure: Same Familiar Tool. Five Core Transformations. All The Possibilities.
One tool to connect Hadoop to all sources and targets, including mainframe sources
Develop MapReduce ETL processes without writing code
Leverage existing ETL skills
Development accelerators for CDC and other common data flows
Five smart transformations (Sort, Join, Aggregate, Copy, Merge): patented algorithms, no code generation, no compiling, executed within MapReduce
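CDC and the Sort and Merge transformations fit together naturally: one common way to capture changes between two full snapshots of a table is to sort both by primary key and merge them in a single pass. This is a generic sketch of that technique, assuming unique keys and rows stored as tuples whose first element is the key; it is not how Syncsort's CDC accelerator is implemented.

```python
def cdc(old_rows, new_rows, key=lambda row: row[0]):
    """Yield ("insert" | "delete" | "update", row) by merging two sorted snapshots."""
    # Sort step: order both snapshots by the primary key.
    old_sorted = sorted(old_rows, key=key)
    new_sorted = sorted(new_rows, key=key)
    i = j = 0
    # Merge step: walk both sorted lists once, comparing keys.
    while i < len(old_sorted) and j < len(new_sorted):
        old, new = old_sorted[i], new_sorted[j]
        if key(old) < key(new):
            yield ("delete", old)  # key exists only in the old snapshot
            i += 1
        elif key(old) > key(new):
            yield ("insert", new)  # key exists only in the new snapshot
            j += 1
        else:
            if old != new:
                yield ("update", new)  # same key, changed contents
            i += 1
            j += 1
    # Any leftovers exist in only one of the snapshots.
    for old in old_sorted[i:]:
        yield ("delete", old)
    for new in new_sorted[j:]:
        yield ("insert", new)


if __name__ == "__main__":
    before = [(1, "a"), (2, "b"), (3, "c")]
    after = [(3, "C"), (1, "a"), (4, "d")]
    for change in cdc(before, after):
        print(change)
```

Because both inputs are sorted, the comparison needs only one sequential pass over each snapshot, which is what makes a fast sort the core of this approach.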
DMX-h can also extract data from Hadoop to other data stores, making Hadoop a key processing step in a workflow that includes other database technologies; for example, using Hadoop to process data before loading it into an analytic data warehouse or appliance.
Smart Development. No Coding. No Scripting.
DMX-h provides a simple, powerful, and seamless environment for developing MapReduce tasks for Hadoop. It makes the development and maintenance of ETL tasks for MapReduce, and of applications that move data into and out of HDFS, faster, easier, and less error-prone.
The solution enables people with a much broader range of skills, not just developers, to create ETL tasks that execute within the MapReduce framework, replacing complex Java programming or Pig scripting with a powerful, easy-to-use graphical development environment. It also simplifies the development of applications that load data into HDFS, or that extract data from HDFS and load it into other systems.
DMX-h makes it faster and easier to develop, maintain, and re-use applications that execute on Hadoop via:
Optional coding: coding is supported, but not required.
Comprehensive built-in transformations that can be combined and reused to create virtually any data flow.
Native mainframe data access and conversion capabilities.
Heterogeneous DBMS access on the cluster for loading Hadoop, for loading warehouses from Hadoop without the need for a temporary landing area, and for sourcing data for lookups and other transformations.
A graphical development environment.
Built-in metadata capabilities, which enable greater transparency into impact analysis, data lineage, and execution flow.
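The combine-and-reuse model and the lineage metadata described above can be sketched as a small data-flow object. This is a hypothetical design for illustration, not the DMX-h API: each step is a named function from a list of records to a list of records, steps can be defined once and reused across flows, and every run records per-step row counts as simple lineage metadata.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Transform = Callable[[List[Record]], List[Record]]


@dataclass
class DataFlow:
    """An ordered list of named, reusable steps plus lineage entries for the last run."""
    steps: List[Tuple[str, Transform]]
    lineage: List[Dict[str, Any]] = field(default_factory=list)

    def run(self, records: List[Record]) -> List[Record]:
        self.lineage.clear()
        for name, transform in self.steps:
            rows_in = len(records)
            records = transform(records)
            # Record which step ran and how many rows went in and came out.
            self.lineage.append({"step": name, "rows_in": rows_in, "rows_out": len(records)})
        return records


def filter_step(predicate: Callable[[Record], bool]) -> Transform:
    """Reusable step: keep only records that satisfy predicate."""
    return lambda rows: [r for r in rows if predicate(r)]


def aggregate_step(group_key: str, value_key: str) -> Transform:
    """Reusable step: sum value_key per group_key, output ordered by group."""
    def aggregate(rows: List[Record]) -> List[Record]:
        totals: Dict[Any, Any] = {}
        for r in rows:
            totals[r[group_key]] = totals.get(r[group_key], 0) + r[value_key]
        return [{group_key: k, value_key: v} for k, v in sorted(totals.items())]
    return aggregate


if __name__ == "__main__":
    drop_negative = filter_step(lambda r: r["amt"] >= 0)  # defined once, reusable in any flow
    sales = [{"region": "east", "amt": 5}, {"region": "west", "amt": -1},
             {"region": "east", "amt": 2}, {"region": "west", "amt": 4}]
    flow = DataFlow([("drop_negative", drop_negative),
                     ("total_by_region", aggregate_step("region", "amt"))])
    print(flow.run(sales))
    print(flow.lineage)
```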
Hadoop Integration... for Real

DMX-h can seamlessly replace the native sort within MapReduce processing, providing performance benefits for MapReduce tasks written in any language, including Java and Pig. This simple change can increase the performance of sort steps in Hadoop by 2 to 3x, with no programming changes or tuning required for new or existing MapReduce tasks. As a result, 2 to 3x more data can be processed in the same amount of time on the same cluster.
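Replacing the native sort without touching user code works because the sort is a pluggable component that sits between the map and reduce phases. The in-memory Python model below illustrates that idea: run_job calls whatever sorter it is given, so the word-count mapper and reducer stay unchanged when the sorter is swapped. All names are hypothetical; this is neither DMX-h nor Hadoop code, and chunked_merge_sorter is just a second, generic sort strategy, not Syncsort's algorithm.

```python
import heapq
from itertools import groupby
from operator import itemgetter

by_key = itemgetter(0)


def native_sorter(pairs):
    """Stand-in for the framework's built-in sort of intermediate (key, value) pairs."""
    return sorted(pairs, key=by_key)


def chunked_merge_sorter(pairs, chunk_size=2):
    """Alternative plug-in: sort fixed-size chunks, then k-way merge them."""
    chunks = [sorted(pairs[i:i + chunk_size], key=by_key)
              for i in range(0, len(pairs), chunk_size)]
    return list(heapq.merge(*chunks, key=by_key))


def run_job(records, mapper, reducer, sorter=native_sorter):
    """Map all records, sort intermediate pairs with the plugged-in sorter, reduce per key."""
    intermediate = [pair for record in records for pair in mapper(record)]
    ordered = sorter(intermediate)
    return [reducer(key, [value for _, value in group])
            for key, group in groupby(ordered, key=by_key)]


# User code: a word count that does not know which sorter is plugged in.
def word_mapper(line):
    return [(word, 1) for word in line.split()]


def sum_reducer(word, counts):
    return (word, sum(counts))


if __name__ == "__main__":
    lines = ["b a c", "a b", "c a"]
    print(run_job(lines, word_mapper, sum_reducer))
    print(run_job(lines, word_mapper, sum_reducer, sorter=chunked_merge_sorter))
```

Both calls produce the same result; only the sorter argument changes, which mirrors the "no programming changes" property described above.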
DMX-h has a very small footprint, so it can easily be deployed on every node, complementing Hadoop's horizontal scalability by maximizing the performance of each node within the Hadoop cluster. Once deployed, DMX-h automatically optimizes resource utilization (e.g., CPU, memory, and I/O) on each node to deliver the highest levels of performance, scalability, and throughput, with no manual tuning needed.
The superior runtime performance of DMX-h is a result
of thousands of deployments, leveraging hundreds
of production-proven optimizations, with important
innovations in four areas:
A library of high-performance algorithms for all key, set-related data transformations.
Direct I/O access for the fastest data transfers.
High-performance compression to minimize I/O and intermediate work file sizes.
A dynamic ETL optimizer to ensure maximum performance at runtime, with minimum resource utilization.
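Two of these ideas, set-oriented sort algorithms and compression of intermediate work files, can be illustrated with a classic external merge sort that spills sorted runs to compressed temporary files and then merges them. This is a generic sketch using Python's gzip and heapq modules, not a description of DMX-h internals; the run size (max_in_memory) and the use of gzip are assumptions chosen for illustration, and records are assumed to be single-line strings.

```python
import gzip
import heapq
import os
import tempfile


def _spill(sorted_lines, tmp_dir):
    """Write one sorted run to a gzip-compressed temp file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run.gz", dir=tmp_dir)
    os.close(fd)
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for line in sorted_lines:
            f.write(line + "\n")
    return path


def _read_run(path):
    """Stream a compressed run back one line at a time."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


def external_sort(lines, max_in_memory=1000):
    """Yield lines in sorted order, holding at most max_in_memory lines per run."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        runs, batch = [], []
        for line in lines:
            batch.append(line)
            if len(batch) >= max_in_memory:
                runs.append(_spill(sorted(batch), tmp_dir))
                batch = []
        if batch:
            runs.append(_spill(sorted(batch), tmp_dir))
        # k-way merge over all compressed runs; temp files are removed afterward.
        yield from heapq.merge(*(_read_run(path) for path in runs))


if __name__ == "__main__":
    data = [str((i * 7919) % 10007) for i in range(2500)]
    result = list(external_sort(data, max_in_memory=300))
    print(result[:5])
```

Compressing runs trades CPU time for less spill I/O and smaller work files; whether that pays off depends on how compressible the data is and how fast the storage is.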
Benefits of DMX-h for Hadoop ETL

DMX-h delivers measurable strategic, financial, and
operational benefits to organizations across a wide range of
industries.
Faster Time to Insight. Organizations make better
decisions faster based on more accurate insights,
by processing more data in less time, with the same
resources.
Lower OPEX and CAPEX Costs. Organizations reduce
capital and operational expenses by eliminating the
need for additional compute nodes on the cluster, due
to more efficient hardware utilization.
More Jobs, Same Cluster. DMX-h provides better
performance and scalability for processing data in Hadoop,
enabling organizations to process up to 3x more data in
less time using the same, or fewer, resources. Better
performance means jobs finish sooner, freeing the cluster
to handle more jobs within the same processing windows,
avoiding incremental capital expense.
Figure (Hadoop Data Nodes diagram): Hadoop Integration for Real (No Code Generation. No Compiling. No Tuning.)
Runs natively within MapReduce
Small footprint installs on every node
Open source contributions extend capabilities of MapReduce
Pluggable sort
Expanded use cases (e.g., no-sort option)
Vertical scalability
Design flexibility (Map Map Reduce Reduce)
Cost-effective Scalability. DMX-h enables Hadoop clusters to scale more efficiently and cost-effectively because data processing and loading performance does not degrade as data volumes grow.
Improved Developer Productivity. Organizations simplify
developing, maintaining, and re-using ETL tasks on
Hadoop. DMX-h makes it easier to integrate and load
data from heterogeneous data sources, perform ETL
tasks within the MapReduce framework, and extract data
from Hadoop to other data stores.
Reduced Dependence on Expensive, Specialized New
Hires. Organizations minimize or eliminate the need to
hire new, specialized staff with expensive programming
skills (e.g., Java, Pig, Sqoop), lowering staffing and
training costs, and speeding time to value. Existing
staff can get more done with powerful, easy-to-use
tools, which increase productivity and lower application
development, maintenance, and re-use costs.
Exceptional Performance SLAs. Organizations are able to meet or exceed performance service level agreements (SLAs) more easily and cost-effectively, eliminating the risk of encountering performance SLA penalties.
Increased Transparency. Built-in metadata capabilities
enable greater transparency into impact analysis, data
lineage, and execution flow, which facilitates re-use,
data governance and regulatory compliance.
Fast and Painless Installation and Configuration.
Installation and configuration of DMX-h is fast and
simple, minimizing the time it takes to go from a
standing start to full productivity.
For more information, visit www.syncsort.com/hadoop.
Figure: Unleash Hadoop's Potential. Callouts: Minutes: Easy Setup and Administration; Single Install; Light Footprint; Smart, Self-tuning Engine; Logging, scheduling; The Faster Sort Technology; 2x Faster ETL; Faster MapReduce Jobs. Charts: TeraSort Benchmark (Native Sort vs. Syncsort) and ETL Aggregations (Pig and Java vs. Syncsort), each plotting Elapsed Time (min) against File Size (GB).
About Syncsort
Syncsort provides data-intensive organizations across the big data continuum with a
smarter way to collect and process the ever-expanding data avalanche. With thousands of
deployments across all major platforms, including mainframe, Syncsort helps customers
around the world to overcome the architectural limits of today's ETL and Hadoop
environments, empowering their organizations to drive better business outcomes in less
time, with fewer resources and lower TCO. For more information visit www.syncsort.com.
© 2013 Syncsort Incorporated. All rights reserved. All company and product names used herein may be the trademarks
of their respective companies. DMX-SC-001-0213US
50 Tice Boulevard, Woodcliff Lake, NJ 07677

201.930.8200 | www.syncsort.com