Agenda
1. Operations: Challenges and Opportunities
2. Big Data Overview
3. Operations with Big Data
4. Big Data Details: Hadoop, Hive, Scribe
5. Conclusion
Operations
Collect → Understand → Improve → Monitor
Challenges
- Huge amount of data (sampling)
- Distributed environment (logs)
Example: distributed hardware
[Diagram: Web tier → Memcache → MySQL]
The MySQL layer got hit hard and the performance of MySQL degraded; web performance degraded in turn.
Example: RPC 0 fans out to RPC 1 and RPC 2. Both RPC 1 and RPC 2 succeeded, but RPC 0 detects that their results are inconsistent and fails. We may not have logged any trace information for RPC 1 and RPC 2 to continue debugging.
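The failure mode above can be sketched in a few lines. This is an illustrative toy, not real RPC code: two sub-calls both "succeed" but disagree, the parent call fails on the inconsistency, and without per-RPC trace logs there is nothing left to debug.

```python
# Toy reproduction of the slide's scenario. All names and values here
# are made up for illustration.
def rpc_1():
    return {"user_count": 100}   # succeeds

def rpc_2():
    return {"user_count": 103}   # succeeds, but disagrees with RPC 1

def rpc_0():
    r1, r2 = rpc_1(), rpc_2()
    if r1 != r2:
        # All RPC 0 can report is the inconsistency itself; the sub-RPCs
        # logged no trace information we could use to investigate.
        raise RuntimeError("inconsistent results: %r vs %r" % (r1, r2))
    return r1

try:
    rpc_0()
except RuntimeError as e:
    print("RPC 0 failed:", e)
```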
Opportunities
- Big Data technologies are distributed, matching the distributed environment
- Deeper analysis of the data, e.g. time-series
What is Big Data?
- Volume is big enough that it is hard to manage with traditional technologies
- Value is big enough that it should not be sampled or dropped
Overall Architecture
[Diagram: PHP → (~3 GB/sec) near-realtime processing; Copy/Load into Central HDFS → Hive; logview]
Features
[Diagram: PHP → Scribe mid-tier → Log View over HTTP; integrated, low-priority, grouped]
logmonitor
Rules: regular-expression rules with levels (WARN, ERROR, etc.). Rules live in a rules store, can be modified dynamically, and are propagated to Logmonitor clients, which apply them to PTail / local logs and report <RuleName, Count, Examples> to a stats server.
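The rule-matching loop above can be sketched as a small Python client. The rule names, patterns, and record shape are illustrative assumptions, not the real logmonitor implementation; it only shows how regex rules over a log stream yield the <RuleName, Count, Examples> tuples sent to a stats server.

```python
import re
from collections import Counter, defaultdict

# Hypothetical rules: (name, level, compiled pattern).
RULES = [
    ("OOM",       "ERROR", re.compile(r"OutOfMemory")),
    ("SlowQuery", "WARN",  re.compile(r"query took \d{4,} ms")),
]

def apply_rules(lines, max_examples=2):
    """Apply every rule to every line; keep a count and a few examples."""
    counts, examples = Counter(), defaultdict(list)
    for line in lines:
        for name, level, pattern in RULES:
            if pattern.search(line):
                counts[name] += 1
                if len(examples[name]) < max_examples:
                    examples[name].append(line)
    # What a client would report to the stats server.
    return [(name, counts[name], examples[name]) for name in counts]

report = apply_rules([
    "java.lang.OutOfMemoryError: heap",
    "query took 5000 ms on shard 3",
    "query took 12 ms",
])
```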
Self Monitoring
Goal: set up logs, isolate issues, and make it easy for service owners to monitor. Services expose JMX / Thrift / Fb303 counters for query.
Approach
- Client-side Log4J appender with JMX/Thrift/Fb303 counters
- Service owner configures levels, destination (log name), and fields such as Request ID
[Diagram: services emit log data → Scribe → PTail]
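The idea above — every record carries a level, a destination category, and a request ID so downstream tools can join all records for one request — can be sketched as follows. The function name, record layout, and in-memory sink are assumptions for illustration; the real path goes through a Log4J appender and Scribe.

```python
import json

def log(category, level, request_id, message, sink):
    """Append a structured record to a per-category sink.

    In production this would hand the record to a Scribe client for
    `category`; here a plain dict stands in for that transport.
    """
    record = {
        "level": level,
        "request_id": request_id,  # lets PTail/Hive join a whole request
        "message": message,
    }
    sink.setdefault(category, []).append(json.dumps(record))

sink = {}
log("www_error", "ERROR", "req-42", "profile render failed", sink)
log("www_error", "WARN",  "req-42", "retrying memcache get", sink)
```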
Hive Pipelines
Daily and historical data analysis: what happened, and when.
Examples
SELECT request_id, GROUP_CONCAT(log_line) AS total_log
FROM trace
GROUP BY request_id
HAVING total_log LIKE '%FATAL%';
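For readers less familiar with SQL, the query above re-expressed as plain Python: group log lines by request_id and keep only the requests whose concatenated log contains FATAL. The sample rows are made up; the table and column names follow the slide.

```python
from collections import defaultdict

# Stand-in for the `trace` table: (request_id, log_line) rows.
trace = [
    ("req-1", "INFO starting"),
    ("req-1", "FATAL null pointer"),
    ("req-2", "INFO ok"),
]

# GROUP BY request_id
grouped = defaultdict(list)
for request_id, log_line in trace:
    grouped[request_id].append(log_line)

# GROUP_CONCAT(...) + HAVING total_log LIKE '%FATAL%'
fatal_requests = {
    rid: " ".join(lines)
    for rid, lines in grouped.items()
    if "FATAL" in " ".join(lines)
}
```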
Key Requirements
- Ease of use: easy learning curve; smooth data integration; structured/unstructured data; schema evolution
- Latency: low latency for both real-time and historical data
- Reliability: consistent
- Scalable: handles spiky raw data
Scribe
https://github.com/facebook/scribe
Meta Data
Scribe Improvements
- Network efficiency (per-RPC)
- Operation interface (category-based)
- Adaptive logging
- QoS
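"Adaptive logging" is listed without detail; one plausible shape of the idea is sketched below (this is an illustrative assumption, not the actual Scribe mechanism): keep every message while a category is under a per-interval budget, then fall back to 1-in-N sampling.

```python
class AdaptiveLogger:
    """Toy adaptive logger: full fidelity under a budget, sampled above it."""

    def __init__(self, budget, sample_rate):
        self.budget = budget            # full-fidelity messages per interval
        self.sample_rate = sample_rate  # keep 1 in N beyond the budget
        self.seen = 0
        self.kept = []

    def log(self, msg):
        self.seen += 1
        if self.seen <= self.budget or self.seen % self.sample_rate == 0:
            self.kept.append(msg)

logger = AdaptiveLogger(budget=3, sample_rate=10)
for i in range(100):
    logger.log("msg-%d" % i)
# First 3 messages kept in full, then roughly 1 in 10 of the rest.
```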
[Diagram: categories C1 and C2 written across DataNodes]
Features
- Scalability: 9 GB/sec
[Diagram: Scribe Clients → Calligraphus Mid-tier → Calligraphus Writers → HDFS, coordinated by Zookeeper]
[HDFS diagram: Name Node holds the namespace and block locations; data blocks are replicated 3 times; PBs of space; HDFS Client]
Features
- 3000-node clusters, highly scalable
- No random writes
https://github.com/facebook/hadoop-20
HDFS Improvements
- Efficiency: faster random reads
- Use checksum improvements (HDFS-2080)
- Use fadvise (HADOOP-7714)
Credits: http://www.cloudera.com/resource/hadoop-world-2011-presentation-slides-hadoop-and-performance
HBase
[Diagram: Region Servers; write-ahead records]
Features
- 100-node clusters
- Random read/write
- Great write performance
http://svn.apache.org/viewvc/hbase/branches/0.89-fb/
MapReduce
[Diagram: MR Client → JobTracker; tasks pushed to TaskTrackers]
Features
- Reliable
- Not easy to use
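To make "reliable, but not easy to use" concrete, here is a toy in-memory word count in the MapReduce style (a sketch of the programming model only, with no distribution or fault tolerance): user code supplies map and reduce functions, and the framework handles the shuffle.

```python
from collections import defaultdict

def map_fn(line):
    """Map: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield (word, 1)

def reduce_fn(word, counts):
    """Reduce: sum the counts for one word."""
    return (word, sum(counts))

def run_mapreduce(lines):
    shuffle = defaultdict(list)
    for line in lines:                  # map phase
        for key, value in map_fn(line):
            shuffle[key].append(value)  # shuffle: group values by key
    # reduce phase
    return dict(reduce_fn(k, v) for k, v in shuffle.items())

counts = run_mapreduce(["big data big logs", "big wins"])
```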
MR Improvements
- Efficiency: faster MR
- compareBytes (HADOOP-7761)
- Shuffle improvements
Credits: http://www.cloudera.com/resource/hadoop-world-2011-presentation-slides-hadoop-and-performance
Hive Features
- SQL: SELECT, CREATE, UDFs
- UDF, UDAF, UDTF
- Efficient joins (bucketed)
- Storage on HDFS and HBase
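Why bucketed joins are efficient can be shown with a small sketch (bucket count and data are made up for illustration): if both tables are hash-partitioned on the join key into the same number of buckets, each bucket pair can be joined independently and in parallel, never comparing rows across buckets.

```python
NUM_BUCKETS = 4

def bucket(rows, key_index):
    """Hash-partition rows into NUM_BUCKETS buckets by the join key."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for row in rows:
        buckets[hash(row[key_index]) % NUM_BUCKETS].append(row)
    return buckets

def bucketed_join(left, right):
    result = []
    # Matching keys always land in the same bucket on both sides, so we
    # only ever join bucket i of `left` with bucket i of `right`.
    for lb, rb in zip(bucket(left, 0), bucket(right, 0)):
        index = {}
        for row in rb:
            index.setdefault(row[0], []).append(row)
        for lrow in lb:
            for rrow in index.get(lrow[0], []):
                result.append(lrow + rrow[1:])
    return result

joined = bucketed_join(
    [("req-1", "web01"), ("req-2", "web02")],
    [("req-1", "FATAL"), ("req-3", "INFO")],
)
```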
Puma Features
- StreamSQL: UDFs
- PTail + Puma
- Reliable
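A PTail + Puma style pipeline boils down to consuming a tail of log events and maintaining per-window aggregates, roughly what a StreamSQL GROUP BY would produce. The window size and event shape below are assumptions for illustration, not Puma's actual interface.

```python
from collections import Counter

WINDOW = 60  # window size in seconds (illustrative)

def aggregate(events):
    """Count events per (window, domain), like a streaming GROUP BY."""
    windows = {}
    for ts, domain in events:
        w = ts - ts % WINDOW  # start of the window this event falls in
        windows.setdefault(w, Counter())[domain] += 1
    return windows

windows = aggregate([
    (0, "a.com"), (30, "b.com"), (59, "a.com"), (61, "a.com"),
])
```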
Conclusion
- Collect more data and keep more samples whenever needed
- Build debugging infrastructure on top of Big Data
- Both real-time and historical analysis
- Continue to improve Big Data
(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved.