
Operations and Big Data: Hadoop, Hive and Scribe

Zheng Shao, 12/7/2011, Velocity China 2011

Agenda
1. Operations: Challenges and Opportunities
2. Big Data Overview
3. Operations with Big Data
4. Big Data Details: Hadoop, Hive, Scribe
5. Conclusion

Operations: Challenges and Opportunities

Operations

The operations cycle: Measure and Instrument → Collect → Model and Analyze → Understand → Improve → Monitor → back to Measure


Challenges

Huge amount of data
- Sampling may not be good enough

Distributed environment
- Log collection is hard
- Hardware failures are normal
- Distributed failures are hard to understand

Example 1: Cache miss and performance

[Diagram: Web → Memcache → MySQL]

- The Memcache layer had a bug that cut the cache hit rate in half
- The MySQL layer got hit hard, and MySQL performance degraded
- Web performance degraded in turn

Example 2: Map-Reduce Retries

[Diagram: Map Task → Attempt 1, Attempt 2, Attempt 3, Attempt 4]

- Attempt 1 hit a transient distributed file system issue and failed
- Attempt 2 hit a real hardware issue and failed
- Attempt 3 hit a transient application-logic issue and failed
- Attempt 4, by chance, succeeded
- The whole process slowed down

Example 3: RPC Hierarchy

[Diagram: RPC 0 → RPC 1, RPC 2, RPC 3; RPC 1 → RPC 1A, RPC 1B; RPC 3 → RPC 3A, RPC 3B]

- RPC 3A failed
- The whole RPC 0 failed because of that
- The blame fell on the owner of service 0, because that is where the failure shows up in the logs, even though the root cause was in RPC 3A

Example 4: Inconsistent results in RPC

[Diagram: RPC 0 → RPC 1, RPC 2]

- RPC 0 got results from both RPC 1 and RPC 2
- Both RPC 1 and RPC 2 succeeded
- But RPC 0 detected that the two results were inconsistent, and failed
- We may not have logged any trace information for RPC 1 and RPC 2 to continue debugging

Opportunities

Big Data Technologies
- Distributed logging systems
- Distributed storage systems
- Distributed computing systems

Deeper Analysis
- Data mining and outlier detection
- Time-series analysis

[Diagram: Logging systems feed the Collect step; Storage and Computing systems support the Model step]

Big Data Overview
An example from Facebook

Big Data

What is Big Data?
- Volume is big enough that it is hard to manage with traditional technologies
- Value is big enough that it should not be sampled or dropped

Where is Big Data used?
- Product analysis
- User behavior analysis
- Business intelligence

Why use Big Data for Operations?
- Reuse existing infrastructure

Overall Architecture

[Diagram: PHP and C++/Java applications log via the Scribe Client (with the Nectar library) under Scribe Policy into Scribe-HDFS at ~9 GB/sec. Near-realtime path: PTail → Puma → HBase (~3 GB/sec). Batch path: copy/load into Central HDFS → Hive (~6 GB/sec).]

Operations with Big Data

logview

Features
- PHP Fatal StackTrace
- Group StackTraces by similarity, order by counts
- Integrated with SVN/Task/Oncall tools
- Low-pri: Scribe can drop logview data

[Diagram: PHP → Scribe Client → Scribe Mid-tier → Log View, served over HTTP]
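
A minimal sketch of the grouping idea (an assumed shape, not Facebook's implementation): reduce each stack trace to a signature of function names, then count and rank the groups.

    # Sketch only: group PHP fatal stack traces by a normalized signature.
    import re
    from collections import Counter

    def signature(stack_trace: str) -> str:
        frames = []
        for line in stack_trace.splitlines():
            # Keep only function names; drop file paths and line numbers so
            # traces that differ only in location fall into the same group.
            m = re.search(r'(\w+(?:::\w+)?)\(', line)
            if m:
                frames.append(m.group(1))
        return ' <- '.join(frames[:10])  # top frames are usually enough

    def top_groups(traces, n=20):
        counts = Counter(signature(t) for t in traces)
        return counts.most_common(n)     # [(signature, count), ...]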

logmonitor

Rules
- Regular-expression based: ".*Missing Block.*"
- Each rule has levels: WARN, ERROR, etc.
- Dynamic rules: modified in Rules Storage and propagated to clients

Apply rules
- The Logmonitor Client tails logs (PTail / local log) and reports <RuleName, Count, Examples> to the Stats Server

[Diagram: Rules Storage → Logmonitor Clients (PTail / Local Log) → Stats Server → Top Rules Web]
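
A minimal sketch of rule application (an assumed shape, not the real tool): compile the regex rules, scan a tailed log stream, and accumulate <RuleName, Count, Examples>. The rule names and patterns here are made up.

    # Sketch only: apply regex rules to a stream of log lines.
    import re
    import sys
    from collections import defaultdict

    RULES = {
        'missing_block': (re.compile(r'.*Missing Block.*'), 'ERROR'),
        'slow_rpc':      (re.compile(r'.*RPC took \d{4,} ms.*'), 'WARN'),
    }

    counts = defaultdict(int)
    examples = defaultdict(list)

    for line in sys.stdin:              # e.g. piped from a ptail of a category
        for name, (pattern, level) in RULES.items():
            if pattern.match(line):
                counts[name] += 1
                if len(examples[name]) < 3:   # keep a few samples per rule
                    examples[name].append(line.rstrip())

    for name, count in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(name, count, examples[name])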

Self Monitoring

Goal
- Set KPIs for SOA
- Isolate issues in distributed systems
- Make it easy for service owners to monitor

Approach
- Log4J integration with Scribe
- JMX/Thrift/Fb303 counters
- Client-side logging + Server-side logging

[Diagram: Service logs flow through Scribe to PTail and Hive; the Service Owner queries JMX/Thrift/Fb303 counters directly]

Global Debugging with PTail

Logging instruction
- Logging levels
- Logging destination (log name)
- Additional fields: Request ID

[Diagram: Service 1 passes RPC + logging instructions to Service 2 and Service 3; all services emit log data to Scribe, which PTail can follow]
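
A minimal sketch of the idea with hypothetical names (this is not Facebook's API): thread the logging instruction, including level, destination category, and Request ID, through each RPC, so every service logs to one Scribe category that PTail can then follow for a single request.

    # Sketch only: propagate a logging instruction along RPC calls.
    import json, logging

    class LoggingInstruction:
        def __init__(self, request_id, level=logging.DEBUG, category='global_debug'):
            self.request_id = request_id
            self.level = level
            self.category = category

    def log(instr, service, message):
        # In production this would be sent to Scribe under instr.category;
        # here we print it in a grep/PTail-friendly form.
        print(json.dumps({'category': instr.category,
                          'request_id': instr.request_id,
                          'service': service,
                          'message': message}))

    def service_2(instr, payload):
        log(instr, 'service2', 'handling ' + payload)

    def service_1(instr, payload):
        log(instr, 'service1', 'received ' + payload)
        service_2(instr, payload)   # the instruction rides along with the RPC

    service_1(LoggingInstruction('req-42'), 'GET /profile')
    # Later: tail the global_debug category and grep for req-42
    # to see the whole request across services.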

Hive Pipelines

Daily and historical data analysis
- What is the trend of a metric?
- When did this bug first happen?

Examples
SELECT percentile(latency, array(0.5, 0.75, 0.9, 0.99)) FROM latency_log;

SELECT request_id, GROUP_CONCAT(log_line) AS total_log
FROM trace
GROUP BY request_id
HAVING total_log LIKE "%FATAL%";

Big Data Details
Hadoop, Hive, Scribe

Key Requirements

Ease of use
- Smooth learning curve
- Easy integration

Data
- Structured/unstructured data
- Schema evolution

Latency
- Real-time data
- Historical data

Reliability
- Low data loss
- Consistent computation

Scalable
- Spiky traffic and QoS
- Raw data / Drill-down support

Overall Architecture

[Diagram, repeated from earlier: PHP and C++/Java applications log via the Scribe Client (with the Nectar library) under Scribe Policy into Scribe-HDFS at ~9 GB/sec. Near-realtime path: PTail → Puma → HBase (~3 GB/sec). Batch path: copy/load into Central HDFS → Hive (~6 GB/sec).]

Distributed Logging System - Scribe

https://github.com/facebook/scribe

Distributed Logging System - Scribe

- Scribe Policy: traffic/schema management (exchanges metadata with clients and servers)
- Application + Nectar Library: easy integration and schema evolution
- Thrift RPC carries log data as <category, message> pairs
- Scribe Service: log collection; log data is buffered on local disk and forwarded via Thrift RPC
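
The <category, message> interface is small enough to show concretely. A minimal sketch of logging to Scribe from Python, assuming the Thrift-generated bindings from scribe.thrift that ship with the open-source release, and a scribed listening locally on the default port 1463:

    # Sketch only: send one log entry to a local scribed.
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from scribe import scribe   # generated from scribe.thrift

    socket = TSocket.TSocket('127.0.0.1', 1463)
    transport = TTransport.TFramedTransport(socket)
    protocol = TBinaryProtocol.TBinaryProtocol(transport,
                                               strictRead=False,
                                               strictWrite=False)
    client = scribe.Client(protocol)

    transport.open()
    entry = scribe.LogEntry(category='operations_log',
                            message='cache_hit_rate=0.47 host=web042')
    result = client.Log([entry])   # ResultCode.OK or ResultCode.TRY_LATER
    transport.close()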

Scribe Improvements

Network efficiency
- Per-RPC compression (using quicklz)

Operation interface
- Category-based blacklisting and sampling

Adaptive logging
- Use BufferStore and NullStore to drop messages as needed

QoS
- Use separate hardware for now
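
To make the BufferStore/NullStore point concrete, here is a sketch of a scribe store configuration in the format used by the open-source release; the category names, host, and values are made up. The buffer store spools to local disk when the mid-tier is unreachable, and pointing a category at a null store drops its messages outright.

    <store>
      category=operations_log
      type=buffer
      retry_interval=30

      <primary>
        type=network
        remote_host=scribe-midtier.example.com
        remote_port=1463
      </primary>

      <secondary>
        type=file
        fs_type=std
        file_path=/tmp/scribe
        base_filename=operations_log
      </secondary>
    </store>

    <store>
      category=chatty_debug_log
      type=null
    </store>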

Distributed Storage Systems - Scribe-HDFS

Architecture
[Diagram: Scribe Clients → (~9 GB/sec) → Calligraphus Mid-tier → Calligraphus Writers → HDFS DataNodes, coordinated via Zookeeper]

Features
- Scalability
- No single point of failure (except the NameNode)
- Not open-sourced yet

Distributed Storage Systems - HDFS

Architecture
- NameNode: namespace, block locations
- DataNodes: data blocks, replicated 3 times
[Diagram: HDFS Client → NameNode + DataNodes]

Features
- 3000-node clusters, PBs of space
- Highly reliable
- No random writes

https://github.com/facebook/hadoop-20

HDFS Improvements

Efficiency
- Random read keep-alive: HDFS-941
- Faster checksum: HDFS-2080
- Use fadvise: HADOOP-7714

Credits:
http://www.cloudera.com/resource/hadoop-world-2011-presentation-slides-hadoop-and-performance

Distributed Storage Systems - HBase

Architecture
- Data model: <row, col-family, col, value>
- Write-Ahead Log
- Records are sorted in memory and in files
[Diagram: HBase Client → Master + Region Servers]

Features
- 100-node clusters
- Random read/write
- Great write performance

http://svn.apache.org/viewvc/hbase/branches/0.89-fb/
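
For a feel of the <row, col-family, col, value> model, here is a minimal sketch using the third-party happybase Python client (Thrift-based; not part of the stack described in this deck). The table name, column family, host, and values are invented.

    # Sketch only: per-service error counters keyed by row = service name.
    import happybase

    conn = happybase.Connection('hbase-thrift.example.com')
    table = conn.table('error_counts')

    # Columns live under the 'count:' column family.
    table.put(b'service0', {b'count:fatal': b'17', b'count:warn': b'342'})

    row = table.row(b'service0')
    print(row[b'count:fatal'])   # b'17'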

Distributed Computing Systems - MR

Architecture
[Diagram: MR Client → JobTracker → TaskTrackers]

Features
- Push computation to data
- Reliable: automatic retry
- Not easy to use
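
To make the programming model concrete, here is a minimal sketch using Hadoop Streaming, with Python standing in for the native Java API: count FATAL log lines per service. The input format (service<TAB>message) is invented.

    # mapper.py: emit "service\t1" for every FATAL line
    import sys

    def mapper():
        for line in sys.stdin:
            service, _, message = line.rstrip('\n').partition('\t')
            if 'FATAL' in message:
                print(service + '\t1')

    # reducer.py: streaming sorts mapper output by key; sum each run of keys
    def reducer():
        current, total = None, 0
        for line in sys.stdin:
            key, _, value = line.rstrip('\n').partition('\t')
            if key != current:
                if current is not None:
                    print(current + '\t' + str(total))
                current, total = key, 0
            total += int(value)
        if current is not None:
            print(current + '\t' + str(total))

    # Run with: hadoop jar hadoop-streaming.jar -mapper mapper.py \
    #   -reducer reducer.py -input /logs/... -output /tmp/fatal_counts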

MR Improvements

Efficiency
- Faster compareBytes: HADOOP-7761
- MR sort cache locality: MAPREDUCE-3235
- Shuffle: MAPREDUCE-64, MAPREDUCE-318

Credits:
http://www.cloudera.com/resource/hadoop-world-2011-presentation-slides-hadoop-and-performance

Distributed Computing Systems - Hive

Architecture
- MetaStore, Compiler, Execution
[Diagram: Hive cmd line → Compiler (backed by the MetaStore) → MR Client → Map-Reduce TaskTrackers]

Features
- SQL → Map-Reduce
- Select, Group By, Join
- UDF, UDAF, UDTF, Script
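
As a toy illustration of the SQL → Map-Reduce idea (not Hive's actual compiled plan), a GROUP BY count is just map, shuffle/sort, reduce:

    # Sketch only: SELECT service, count(*) FROM rows GROUP BY service
    from itertools import groupby
    from operator import itemgetter

    rows = [('web', 'FATAL'), ('db', 'WARN'), ('web', 'FATAL')]

    mapped = [(service, 1) for service, _level in rows]       # map phase
    shuffled = groupby(sorted(mapped, key=itemgetter(0)),     # shuffle/sort
                       key=itemgetter(0))
    for service, group in shuffled:                           # reduce phase
        print(service, sum(v for _k, v in group))             # db 1, web 2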

Useful Features in Hive

Complex column types
- Array, Struct, Map, Union
- CREATE TABLE t (a struct<c1:map<string,string>, c2:array<string>>);

UDFs
- UDF, UDAF, UDTF

Efficient Joins
- Bucketed Map Join: HIVE-917

Distributed Computing Systems - Puma

Architecture
[Diagram: HDFS → PTail → Puma → HBase]

Features
- StreamSQL: Select, Group By, Join, UDF, UDAF
- Reliable: PTail + Puma, no data loss or duplicates
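
A toy sketch of the streaming-aggregation idea (not the real Puma): consume a tailed stream, aggregate per time window, and checkpoint the read position so a restart neither loses nor double-counts input. The input format (epoch_seconds<TAB>key) is invented.

    # Sketch only: per-minute counts per key over a tailed stream.
    import sys
    from collections import defaultdict

    counts = defaultdict(int)   # (minute, key) -> count
    checkpoint = 0              # lines consumed so far; persisted in practice

    for lineno, line in enumerate(sys.stdin, start=1):
        if lineno <= checkpoint:   # replayed after a restart: skip, don't recount
            continue
        ts, _, key = line.rstrip('\n').partition('\t')
        minute = int(ts) // 60
        counts[(minute, key)] += 1
        checkpoint = lineno        # in reality, persisted atomically with counts

    for (minute, key), n in sorted(counts.items()):
        print(minute, key, n)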

Conclusion
Big Data can help operations

Big Data can help Operations

5 steps to make it effective:
1. Make Big Data easy to use
2. Log more data and keep more samples whenever needed
3. Build debugging infrastructure on top of Big Data
4. Do both real-time and historical analysis
5. Continue to improve Big Data

(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0
