Best practices
Adarsh Pannu
IBM Analytics Platform
DRAFT: This is work in progress. Please send comments to adarshrp@us.ibm.com
Standalone: Bundled with Spark, doesn't play well with other applications, fine for PoCs
Each mode has a similar logical architecture, although the physical details differ in terms of which
processes and threads are launched, and where.
Driver runs the main() function of the application. It can run outside the cluster (client mode) or
inside the cluster (cluster mode).
SparkContext is the main entry point for Spark functionality. It represents the
connection to a Spark cluster.
Executor is a JVM that runs tasks and keeps data in memory or disk storage across
them. Each application has its own executors spread across a cluster.
[Figure: Inside an executor: tasks running on RDD partitions (P1, P2, P3), cached RDD partitions from another RDD, and internal threads for shuffle, transport, GC, and other system work.]
[Figure: Standalone deployment: a Master process plus Worker processes on Machine 1 and Machine 2, each Worker launching Executors for Client 1 and Client 2.]
          Per Worker              Per Application     Per Executor
CPU       SPARK_WORKER_CORES      spark.cores.max     n/a
Memory    SPARK_WORKER_MEMORY     n/a                 spark.executor.memory
Standalone mode uses a FIFO scheduler. As applications launch, it will try to balance resource
consumption across the cluster. Strangely, cores are specified per application, yet memory is per
executor!
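As a rough illustration (the application name and values below are assumptions, not taken from this deck), these two knobs can be set on a SparkConf before the SparkContext is created:

    // Hedged sketch: standalone-mode resource settings (illustrative values only).
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("StandaloneDemo")              // hypothetical application name
      .set("spark.cores.max", "8")               // total cores granted to this application
      .set("spark.executor.memory", "4g")        // heap size of each executor
    val sc = new SparkContext(conf)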
[Figure: Spark on YARN: the Resource Manager runs on Machine 0; Node Managers on Machine 1 and Machine 2 launch Containers hosting the Spark Application Master and Executors. All boxes are JVMs; inter-process communication not shown.]
Spark Configuration
Spark has scores of configuration options:
For many options, defaults generally work alright
However, there are some critical knobs that should be carefully tuned
Several settings are cluster-manager specific. When running Spark on YARN, you must examine:
YARN-specific settings: scheduler type and queues
Spark-specific settings for YARN: # of executors, per-executor memory and cores, and more (see the sketch after this list)
Other general techniques will improve your applications on any cluster manager. For example:
Java object serialization schemes (Kryo vs Java)
Proper partitioning and parallelism levels
On-disk data formats (Parquet vs AVRO vs JSON vs ...)
And many more ... (to be covered elsewhere)
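For illustration only (the application name and numbers are assumptions, not recommendations), the YARN sizing and serialization knobs above might be set like this; the same values can instead be passed to spark-submit via --num-executors, --executor-memory and --executor-cores:

    // Hedged sketch: Spark-on-YARN sizing plus Kryo serialization (illustrative values).
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("YarnTuningDemo")                                            // hypothetical name
      .set("spark.executor.instances", "10")                                   // # of executors
      .set("spark.executor.memory", "4g")                                      // per-executor memory
      .set("spark.executor.cores", "2")                                        // per-executor cores
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")   // Kryo instead of Java serialization
    val sc = new SparkContext(conf)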
          Per Node Manager                         Per Executor (need to specify these)
CPU       yarn.nodemanager.resource.cpu-vcores     --executor-cores OR spark.executor.cores
Memory    yarn.nodemanager.resource.memory-mb      --executor-memory OR spark.executor.memory
Spark internally adds an overhead to spark.executor.memory to account for off-heap JVM usage:
overhead = MAX(384 MB, 10% of spark.executor.memory)
// As of Spark 1.4
YARN further adjusts the requested container size:
1. Ensures memory is a multiple of yarn.scheduler.minimum-allocation-mb. Despite its name, this
isn't merely a lower bound. CAUTION: Setting yarn.scheduler.minimum-allocation-mb too
high can over-allocate memory because of rounding up (see the worked example below).
2. Ensures the request size is bounded by yarn.scheduler.maximum-allocation-mb
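A worked example may help; the numbers are assumed for illustration (4 GB executors, 1 GB YARN minimum allocation), not taken from this deck:

    // Hedged sketch: sizing one container request (plain arithmetic, not a Spark API).
    val executorMemoryMb = 4096                                       // spark.executor.memory = 4g
    val overheadMb = math.max(384, (0.10 * executorMemoryMb).toInt)   // = 409 MB
    val requestedMb = executorMemoryMb + overheadMb                   // = 4505 MB
    val minAllocMb  = 1024                                            // yarn.scheduler.minimum-allocation-mb
    // YARN rounds the request up to the next multiple of the minimum allocation.
    val containerMb = math.ceil(requestedMb.toDouble / minAllocMb).toInt * minAllocMb   // = 5120 MB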
Executor heap usage:
              Setting                          Default      Used for
Cache         spark.storage.memoryFraction     0.6 (60%)    Cached RDDs; relevant if .cache() or .persist() is called.
Shuffle       spark.shuffle.memoryFraction     0.2 (20%)    Shuffles; increase this for shuffle-intensive applications wherein spills happen often.
App objects   (remaining heap)                 n/a          Everything else the application allocates.
Guideline: Stick with defaults, and check execution statistics to tweak settings.
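As a sketch only (the larger shuffle fraction below is an assumed example, not a recommendation), a shuffle-heavy Spark 1.x job might rebalance the two fractions like this:

    // Hedged sketch: shifting heap from cache to shuffle for a spill-prone job (Spark 1.x fractions).
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("ShuffleHeavyDemo")                   // hypothetical name
      .set("spark.shuffle.memoryFraction", "0.4")       // grow shuffle space if spills are frequent
      .set("spark.storage.memoryFraction", "0.4")       // shrink cache space to compensate
    val sc = new SparkContext(conf)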
Spark tries to execute tasks on nodes such that there will be minimal data movement (data locality)
! Loss of data locality = suboptimal performance
These tasks are run on executors, which are (usually) launched when a SparkContext is spawned,
and well before Spark knows what data will be touched.
Your application can tell Spark which nodes hold the data (preferred locations). Using a simple
API, you can supply this information when instantiating a SparkContext.
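A heavily hedged sketch: this assumes the Spark 1.x SparkContext constructor overload that accepts preferred node location data (a Map[String, Set[SplitInfo]]); the host names are hypothetical and the empty split sets are placeholders, so check your Spark release's API before relying on it:

    // Hedged sketch: hinting preferred locations at SparkContext creation (Spark 1.x style).
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.scheduler.SplitInfo

    val conf = new SparkConf().setAppName("LocalityHintDemo")      // hypothetical name
    // Hypothetical hosts mapped to the input splits they hold (placeholders shown empty).
    val preferredLocations: Map[String, Set[SplitInfo]] = Map(
      "datanode1.example.com" -> Set.empty[SplitInfo],
      "datanode2.example.com" -> Set.empty[SplitInfo])
    val sc = new SparkContext(conf, preferredLocations)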
Prior to Release 1.3, Spark acquired all executors at application startup and held onto them for the
lifetime of an application.
Starting Release 1.3, Spark supports dynamic allocation of executors. This allows applications to
launch executors when more tasks are queued up, and release resources when the application is
idle.
Ideally suited for interactive applications that may see user down-time.
Major caveat: Spark may release executors holding cached RDDs! Ouch! So if your application uses
rdd.cache() or rdd.persist() to materialize expensive computations, you may not want to use dynamic
allocation for that application.
On the other hand, you could consider caching expensive computations in HDFS.
Setting                                                     Default               Description
spark.dynamicAllocation.enabled                             false                 Whether dynamic allocation is used at all.
spark.dynamicAllocation.minExecutors                        0                     Lower bound on the number of executors.
spark.dynamicAllocation.maxExecutors                        <Infinity>            Upper bound on the number of executors.
spark.dynamicAllocation.executorIdleTimeout                 600 secs (10 mins)    How long an executor may sit idle before it is released.
spark.dynamicAllocation.schedulerBacklogTimeout             5 secs                How long tasks must be backlogged before additional executors are requested.
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout    (same as above)       Backlog timeout applied to subsequent executor requests.
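As a sketch under assumed values (the application name, bounds, and timeout below are illustrative), the settings listed above might be enabled like this; note that the external shuffle service also has to be enabled so shuffle files outlive released executors:

    // Hedged sketch: enabling dynamic allocation on Spark 1.3+ (illustrative values).
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("InteractiveDemo")                                 // hypothetical name
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "2")
      .set("spark.dynamicAllocation.maxExecutors", "50")
      .set("spark.dynamicAllocation.executorIdleTimeout", "600")     // seconds
      .set("spark.shuffle.service.enabled", "true")                  // required when dynamic allocation is on
    val sc = new SparkContext(conf)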