
Hadoop. Resource management.

Alexey Filanovskiy
Cloudera certified developer

Copyright 2014 Oracle and/or its affiliates. All rights reserved. |

YARN and MRv2. General architecture.


YARN and MRv2. Problems of MRv1

One coordinator for all MR jobs (the JobTracker):
- The cluster scales only up to about 3,000 nodes
- We want HA for the JobTracker
- Inefficient use of the cluster's hardware resources (separate map and reduce slots)
- A desire to federate different components into one cluster (not only MR, but also Impala, for example)


YARN. Main idea #1.

Make the JobTracker scalable.
Split it into two components:
- Resource Manager: handles cluster resources (CPU, RAM). One per cluster.
- Application Master: coordinates a dedicated MR job. One per MR job.


YARN. Main idea #2.

Move from the slot-based approach to resource management to a physical-resource approach (memory, CPU, disk).
Determine the amount of resources that can be used by each process on each node (for example, Impala can use 4 cores and 16 GB RAM, MapReduce 12 cores and 32 GB RAM).
Each map or reduce task gets a dedicated amount of RAM and cores, and a weight for I/O operations.
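In practice these per-node limits are set in the NodeManager configuration. A minimal sketch of a yarn-site.xml fragment, with illustrative values matching the MapReduce figures above (12 cores, 32 GB RAM):

```xml
<!-- Illustrative yarn-site.xml fragment; the values are examples only. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>32768</value> <!-- RAM available to YARN containers on this node -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>12</value>    <!-- CPU cores available to YARN containers on this node -->
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>  <!-- RAM requested by each map task container -->
</property>
```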

YARN.

YARN: Yet Another Resource Negotiator


YARN. Running jobs. Advanced


Scheduler


YARN. Running jobs. Advanced

Let's zoom in.


YARN. Running jobs. Advanced

The scheduler determines the queue of MR jobs and their starting order.


Scheduler

Schedulers that are available for CDH:
- FIFO Scheduler
- Fair Scheduler
- Capacity Scheduler


FIFO Scheduler


FIFO (first in, first out) Scheduler

A single queue. First come, first served!


DEMO


FIFO (first in, first out) Scheduler. Demo

- 15 MR applications were started one by one.
The behavior of the FIFO scheduler can clearly be observed on the graph below.
- The first 6 applications occupied all available containers (MR slots).
The other applications go to the pending pool.


FIFO (first in, first out) Scheduler. Demo

- 15 MR applications were started one by one.
The behavior of the FIFO scheduler can clearly be observed on the graph below.
- The first 6 applications occupied all available containers (MR slots).
The other applications go to the pending pool (9 in total).
- When some jobs finished, the 9 jobs from the pending pool shared
the available resources between themselves.


FIFO (first in, first out) Scheduler. Demo

- 15 MR applications were started one by one.
The behavior of the FIFO scheduler can clearly be observed on the graph below.
- The first 6 applications occupied all available containers (MR slots).
The other applications go to the pending pool.
- When some jobs finished, the 9 jobs from the pending pool shared
the available resources between themselves.
- After this we started another 5 MR applications.
They go to the pending pool, but released resources
go to the applications that already started. The new ones are still pending.


FIFO (first in, first out) Scheduler. Demo.

The whole picture:
- The first 6 applications occupied all available containers (MR slots).
The other applications go to the pending pool.
- When some jobs finished, the 9 jobs from the pending pool shared
the available resources between themselves.
- After this we started another 5 MR applications.
They go to the pending pool, but released resources
go to the applications that already started. The new ones are still pending.
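The demo above can be sketched as a toy FIFO model. The container capacity of 12 and the per-job demand of 2 containers are assumptions chosen so that exactly 6 jobs fit, as in the demo:

```python
from collections import deque

def fifo_schedule(capacity, demands):
    """Toy FIFO scheduler: assign containers to jobs strictly in submission
    order; once one job cannot fit, everything behind it stays pending."""
    free = capacity
    running, pending = [], deque()
    for job, demand in enumerate(demands):
        if not pending and demand <= free:
            free -= demand          # the job gets its containers and runs
            running.append(job)
        else:
            pending.append(job)     # strict FIFO: no overtaking allowed
    return running, list(pending)

# 15 jobs submitted one by one, each asking for 2 of the 12 containers:
running, pending = fifo_schedule(12, [2] * 15)
# the first 6 jobs run; the remaining 9 wait in the pending pool
```

As containers free up, a real FIFO scheduler hands them to the job at the head of the queue, which is why the 5 late arrivals in the demo stay pending behind the earlier 9.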


Fair Scheduler


Fair Scheduler

Everything should be fair!


Fair Scheduler. Main concepts

- All hardware resources are shared by all applications based on a config file (some policies).
- The piece of HW resources dedicated to each job is determined by its queue.
- Each application is placed in some queue.
- If a queue is not specified explicitly, the application is put in the default queue
(the parameter yarn.scheduler.fair.allow-undeclared-pools should be set to false).
- When there is a single job running, that job uses the entire cluster.
- When other jobs are submitted, task slots that free up are assigned to the new jobs, so that each job gets roughly the same amount of CPU time.
- The Fair Scheduler arose out of Facebook's need to share its data warehouse between multiple users.
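Queues and weights like these are declared in the Fair Scheduler allocation file (fair-scheduler.xml). A minimal sketch; the queue names and weights are illustrative, chosen to match the demo that follows:

```xml
<!-- Illustrative fair-scheduler.xml allocation file. -->
<allocations>
  <queue name="root">
    <queue name="hdfs">
      <weight>2.0</weight>     <!-- gets twice the share of the others -->
    </queue>
    <queue name="someuser">
      <weight>1.0</weight>
    </queue>
  </queue>
  <queueMaxAppsDefault>50</queueMaxAppsDefault>
</allocations>
```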

Better to see once than hear 100 times (C)

http://stackoverflow.com/questions/13842241/can-we-use-both-fair-scheduler-and-capacity-scheduler-in-the-sam

Fair Scheduler. Ways to specify queues

- Based on the user that submits the job. For example, to use the hdfs queue you can log on as the hdfs Linux user:
... 23(hadoop),1001(oinstall),1003(hdfs)
- By setting mapred.job.queue.name while running an MR job (as the yarn user):
hadoop jar hadoop-mapreduce-examples-2.3.0-cdh5.0.0.jar ... -Dmapred.job.queue.name=root.hdfs 1000000000 /tmp/test2
- From Hive (for example). This HQL will use the root.hdfs queue.


DEMO


Fair Scheduler. Main concepts

- We have 4 queues: root, hdfs, someuser, default.
- ... and hdfs have equal weight.
- Allocation is based only on weight.


Fair Scheduler. Main concepts

- We run one MR job in the root.someuser pool.
- It takes all CPU resources, because it is the single job in the cluster.


Fair Scheduler. Main concepts

- We start another MR job, in the root.root pool.
- Resources are divided into two equal parts.


Fair Scheduler. Main concepts

- We start a third MR job, in the root.hdfs pool.
- According to the config file, the hdfs pool takes half of the resources; root and someuser take a quarter each.
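The arithmetic behind these shares is proportional division by weight. A minimal sketch; the pool names and the total of 100 resource units are assumptions for illustration:

```python
def weighted_shares(weights, total):
    """Split `total` resource units among pools in proportion to their
    Fair Scheduler weights (a simplified model of steady-state fair sharing)."""
    total_weight = sum(weights.values())
    return {pool: total * w / total_weight for pool, w in weights.items()}

# hdfs has weight 2, root and someuser weight 1 each:
shares = weighted_shares({"hdfs": 2, "root": 1, "someuser": 1}, 100)
# hdfs gets half (50), root and someuser a quarter (25) each
```

Raising the hdfs weight to 3 shifts its share to 3/5 of the cluster, which is the rebalancing shown on the next slides.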


Fair Scheduler. Main concepts

- We changed the weight of the root.hdfs pool (increased it to 3).


Fair Scheduler. Main concepts

- We changed the weight of the root.hdfs pool (increased it to 3).
- Resources are rebalanced automatically!


Fair Scheduler. Main concepts

- We limited the number of CPUs for the someuser pool (11 cores maximum).
- The other pools take the released resources.


Fair Scheduler. Main concepts

- We removed the limit on the number of CPUs for the someuser pool.
- It takes back the released resources.


CapacityScheduler


CapacityScheduler. Main concepts

- The CapacityScheduler is designed to allow sharing a large cluster while giving each organization a minimum capacity guarantee.
- The central idea is that the available resources in the Hadoop Map-Reduce cluster are partitioned among the organizations who collectively fund the cluster based on computing needs.
- The Capacity Scheduler from Yahoo offers similar functionality to the Fair Scheduler but takes a somewhat different philosophy.
- In the Capacity Scheduler, you define a number of named queues. Each queue has a configurable number of map and reduce slots. The scheduler gives each queue its capacity when it contains jobs, and shares any unused capacity between the queues.
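As a sketch, named queues with guaranteed capacities could look like this in capacity-scheduler.xml; the queue names prod/dev and the 70/30 split are hypothetical:

```xml
<!-- Illustrative capacity-scheduler.xml fragment: two named queues that
     split the guaranteed capacity 70/30. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>prod,dev</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.prod.capacity</name>
  <value>70</value>  <!-- percent of cluster guaranteed to prod -->
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.capacity</name>
  <value>30</value>  <!-- percent of cluster guaranteed to dev -->
</property>
```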

http://stackoverflow.com/questions/13842241/can-we-use-both-fair-scheduler-and-capacity-scheduler-in-the-sam

