Apache Flume: Distributed Log Collection for Hadoop

Ebook219 pages1 hour

Apache Flume: Distributed Log Collection for Hadoop

Name: Apache Flume: Distributed Log Collection for Hadoop
Author: Steve Hoffman
ISBN: 9781782167921

By Steve Hoffman

Rating: 0 out of 5 stars

()

Read preview

About this ebook

A starter guide that covers Apache Flume in detail.Apache Flume: Distributed Log Collection for Hadoop is intended for people who are responsible for moving datasets into Hadoop in a timely and reliable manner like software engineers, database administrators, and data warehouse administrators

Skip carousel

LanguageEnglish

PublisherPackt Publishing

Release dateJul 16, 2013

ISBN9781782167921

Author

Steve Hoffman

Related to Apache Flume

Related ebooks

Skip carousel

Apache Solr High Performance
Ebook
Apache Solr High Performance
bySurendra Mohan
Rating: 0 out of 5 stars
0 ratings
HBase Essentials
Ebook
HBase Essentials
byNishant Garg
Rating: 0 out of 5 stars
0 ratings
INSTANT Premium Drupal Themes
Ebook
INSTANT Premium Drupal Themes
byPankaj Sharma
Rating: 0 out of 5 stars
0 ratings
Optimizing Hadoop for MapReduce
Ebook
Optimizing Hadoop for MapReduce
byKhaled Tannir
Rating: 0 out of 5 stars
0 ratings
Getting Started with hapi.js
Ebook
Getting Started with hapi.js
byJohn Brett
Rating: 5 out of 5 stars
5/5
Hadoop Essentials
Ebook
Hadoop Essentials
byShiva Achari
Rating: 5 out of 5 stars
5/5
Instant PostgreSQL Backup and Restore How-to
Ebook
Instant PostgreSQL Backup and Restore How-to
byShaun Thomas
Rating: 0 out of 5 stars
0 ratings
Apache Oozie Essentials
Ebook
Apache Oozie Essentials
bySingh Jagat Jasjit
Rating: 0 out of 5 stars
0 ratings
Monitoring Hadoop
Ebook
Monitoring Hadoop
byGurmukh Singh
Rating: 0 out of 5 stars
0 ratings
Apache Solr PHP Integration
Ebook
Apache Solr PHP Integration
byJayant Kumar
Rating: 0 out of 5 stars
0 ratings
Apache Hive Essentials
Ebook
Apache Hive Essentials
byDayong Du
Rating: 0 out of 5 stars
0 ratings
Instant Apache Stanbol
Ebook
Instant Apache Stanbol
byReto Bachmann-Gmur
Rating: 0 out of 5 stars
0 ratings
Splunk Developer's Guide
Ebook
Splunk Developer's Guide
byKyle Smith
Rating: 0 out of 5 stars
0 ratings
Getting Started with Ghost
Ebook
Getting Started with Ghost
byKezz Bracey
Rating: 0 out of 5 stars
0 ratings
Distributed Computing with Python
Ebook
Distributed Computing with Python
byFrancesco Pierfederici
Rating: 0 out of 5 stars
0 ratings
Instant MapReduce Patterns – Hadoop Essentials How-to
Ebook
Instant MapReduce Patterns – Hadoop Essentials How-to
bySrinath Perera
Rating: 0 out of 5 stars
0 ratings
Learning Puppet for Windows Server
Ebook
Learning Puppet for Windows Server
byFuat Ulugay
Rating: 0 out of 5 stars
0 ratings
Mastering Scala Machine Learning
Ebook
Mastering Scala Machine Learning
byAlex Kozlov
Rating: 0 out of 5 stars
0 ratings
Implementing Cloud Design Patterns for AWS
Ebook
Implementing Cloud Design Patterns for AWS
byMarcus Young
Rating: 0 out of 5 stars
0 ratings
Cassandra Design Patterns - Second Edition
Ebook
Cassandra Design Patterns - Second Edition
byThottuvaikkatumana Rajanarayanan
Rating: 0 out of 5 stars
0 ratings
Instant Redis Optimization How-to
Ebook
Instant Redis Optimization How-to
byArun Chinnachamy
Rating: 0 out of 5 stars
0 ratings
Learning Apache Thrift
Ebook
Learning Apache Thrift
byRakowski Krzysztof
Rating: 0 out of 5 stars
0 ratings
Hands-On Machine Learning Recommender Systems with Apache Spark
Ebook
Hands-On Machine Learning Recommender Systems with Apache Spark
byErnesto Lee
Rating: 0 out of 5 stars
0 ratings
Drush for Developers - Second Edition
Ebook
Drush for Developers - Second Edition
byJuampy Novillo Requena
Rating: 0 out of 5 stars
0 ratings
Mastering Puppet - Second Edition
Ebook
Mastering Puppet - Second Edition
byUphill Thomas
Rating: 0 out of 5 stars
0 ratings
Learning Shell Scripting with Zsh
Ebook
Learning Shell Scripting with Zsh
byGastón Festari
Rating: 0 out of 5 stars
0 ratings
Building a Web Application with PHP and MariaDB: A Reference Guide
Ebook
Building a Web Application with PHP and MariaDB: A Reference Guide
bySai Srinivas Sriparasa
Rating: 0 out of 5 stars
0 ratings
Getting Started with Gulp – Second Edition
Ebook
Getting Started with Gulp – Second Edition
byTravis Maynard
Rating: 0 out of 5 stars
0 ratings
PolyBase Revealed: Data Virtualization with SQL Server, Hadoop, Apache Spark, and Beyond
Ebook
PolyBase Revealed: Data Virtualization with SQL Server, Hadoop, Apache Spark, and Beyond
byKevin Feasel
Rating: 0 out of 5 stars
0 ratings
Hadoop Cluster Deployment
Ebook
Hadoop Cluster Deployment
byDanil Zburivsky
Rating: 0 out of 5 stars
0 ratings

Computers For You

Skip carousel

Data Science from Scratch: The #1 Data Science Guide for Everything A Data Scientist Needs to Know: Python, Linear Algebra, Statistics, Coding, Applications, Neural Networks, and Decision Trees
Ebook
Data Science from Scratch: The #1 Data Science Guide for Everything A Data Scientist Needs to Know: Python, Linear Algebra, Statistics, Coding, Applications, Neural Networks, and Decision Trees
bySteven Cooper
Rating: 4 out of 5 stars
4/5
Python Machine Learning By Example
Ebook
Python Machine Learning By Example
byYuxi (Hayden) Liu
Rating: 4 out of 5 stars
4/5
Machine Learning for Beginners: An Introduction for Beginners, Why Machine Learning Matters Today and How Machine Learning Networks, Algorithms, Concepts and Neural Networks Really Work
Ebook
Machine Learning for Beginners: An Introduction for Beginners, Why Machine Learning Matters Today and How Machine Learning Networks, Algorithms, Concepts and Neural Networks Really Work
bySteven Cooper
Rating: 4 out of 5 stars
4/5
Mastering ChatGPT: 21 Prompts Templates for Effortless Writing
Ebook
Mastering ChatGPT: 21 Prompts Templates for Effortless Writing
byCea West
Rating: 5 out of 5 stars
5/5
Creating Online Courses with ChatGPT | A Step-by-Step Guide with Prompt Templates
Ebook
Creating Online Courses with ChatGPT | A Step-by-Step Guide with Prompt Templates
byCea West
Rating: 4 out of 5 stars
4/5
Computer Science: A Concise Introduction
Ebook
Computer Science: A Concise Introduction
byIan Sinclair
Rating: 4 out of 5 stars
4/5
Excel Essentials: A Step-by-Step Guide with Pictures for Absolute Beginners to Master the Basics and Start Using Excel with Confidence
Ebook
Excel Essentials: A Step-by-Step Guide with Pictures for Absolute Beginners to Master the Basics and Start Using Excel with Confidence
byNigel Tillery
Rating: 0 out of 5 stars
0 ratings
The ChatGPT Millionaire Handbook: Make Money Online With the Power of AI Technology
Ebook
The ChatGPT Millionaire Handbook: Make Money Online With the Power of AI Technology
byTJ Books
Rating: 0 out of 5 stars
0 ratings
AI Crash Course: A fun and hands-on introduction to machine learning, reinforcement learning, deep learning, and artificial intelligence with Python
Ebook
AI Crash Course: A fun and hands-on introduction to machine learning, reinforcement learning, deep learning, and artificial intelligence with Python
byHadelin de Ponteves
Rating: 0 out of 5 stars
0 ratings
How to Create Cpn Numbers the Right way: A Step by Step Guide to Creating cpn Numbers Legally
Ebook
How to Create Cpn Numbers the Right way: A Step by Step Guide to Creating cpn Numbers Legally
byAlex Parkinson
Rating: 4 out of 5 stars
4/5
Practical Lock Picking: A Physical Penetration Tester's Training Guide
Ebook
Practical Lock Picking: A Physical Penetration Tester's Training Guide
byDeviant Ollam
Rating: 5 out of 5 stars
5/5
The Insider's Guide to Technical Writing
Ebook
The Insider's Guide to Technical Writing
byKrista Van Laan
Rating: 0 out of 5 stars
0 ratings
Procreate for Beginners: Introduction to Procreate for Drawing and Illustrating on the iPad
Ebook
Procreate for Beginners: Introduction to Procreate for Drawing and Illustrating on the iPad
byAaron Smith
Rating: 0 out of 5 stars
0 ratings
SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL
Ebook
SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL
byWalter Shields
Rating: 4 out of 5 stars
4/5
Grokking Algorithms: An illustrated guide for programmers and other curious people
Ebook
Grokking Algorithms: An illustrated guide for programmers and other curious people
byAditya Bhargava
Rating: 4 out of 5 stars
4/5
Elon Musk
Ebook
Elon Musk
byWalter Isaacson
Rating: 4 out of 5 stars
4/5
Deep Search: How to Explore the Internet More Effectively
Ebook
Deep Search: How to Explore the Internet More Effectively
byAlan Pearce
Rating: 5 out of 5 stars
5/5
YouTube: How to Build and Optimize Your First YouTube Channel, Marketing, SEO, Tips and Strategies for YouTube Channel Success
Ebook
YouTube: How to Build and Optimize Your First YouTube Channel, Marketing, SEO, Tips and Strategies for YouTube Channel Success
byTommy Swindali
Rating: 4 out of 5 stars
4/5
CompTIA Security+ Practice Questions
Ebook
CompTIA Security+ Practice Questions
byIP Specialist
Rating: 2 out of 5 stars
2/5
The Mega Box: The Ultimate Guide to the Best Free Resources on the Internet
Ebook
The Mega Box: The Ultimate Guide to the Best Free Resources on the Internet
byChris Mason
Rating: 4 out of 5 stars
4/5
Master Builder Roblox: The Essential Guide
Ebook
Master Builder Roblox: The Essential Guide
byTriumph Books
Rating: 4 out of 5 stars
4/5
CompTIA IT Fundamentals (ITF+) Study Guide: Exam FC0-U61
Ebook
CompTIA IT Fundamentals (ITF+) Study Guide: Exam FC0-U61
byQuentin Docter
Rating: 0 out of 5 stars
0 ratings
The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
Ebook
The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
byRalph Kimball
Rating: 0 out of 5 stars
0 ratings
The Professional Voiceover Handbook: Voiceover training, #1
Ebook
The Professional Voiceover Handbook: Voiceover training, #1
byPeter Baker
Rating: 5 out of 5 stars
5/5
ChatGPT Ultimate User Guide - How to Make Money Online Faster and More Precise Using AI Technology
Ebook
ChatGPT Ultimate User Guide - How to Make Money Online Faster and More Precise Using AI Technology
byMaximus Wilson
Rating: 0 out of 5 stars
0 ratings
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are
Ebook
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are
bySeth Stephens-Davidowitz
Rating: 4 out of 5 stars
4/5
Ultimate Guide to Mastering Command Blocks!: Minecraft Keys to Unlocking Secret Commands
Ebook
Ultimate Guide to Mastering Command Blocks!: Minecraft Keys to Unlocking Secret Commands
byTriumph Books
Rating: 5 out of 5 stars
5/5
Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics
Ebook
Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics
byGary Smith
Rating: 4 out of 5 stars
4/5
101 Awesome Builds: Minecraft® Secrets from the World's Greatest Crafters
Ebook
101 Awesome Builds: Minecraft® Secrets from the World's Greatest Crafters
byTriumph Books
Rating: 4 out of 5 stars
4/5
Python for Beginners. A Smarter Way to Learn Python in 5 Days and Remember it Longer. With Easy Step by Step Guidance and Hands on Examples. (Python Crash Course-Programming for Beginners)
Ebook
Python for Beginners. A Smarter Way to Learn Python in 5 Days and Remember it Longer. With Easy Step by Step Guidance and Hands on Examples. (Python Crash Course-Programming for Beginners)
byArthur T. Brooks
Rating: 0 out of 5 stars
0 ratings

Related podcast episodes

Skip carousel

ThursdAI Aug 3 - OpenAI, Qwen 7B beats LLaMa, Orca is replicated, and more AI news
Podcast episode
ThursdAI Aug 3 - OpenAI, Qwen 7B beats LLaMa, Orca is replicated, and more AI news
byThursdAI - The top AI news from the past week
0 ratings
0% found this document useful
68: Ember 2 & The Ember Community: Summary Ember community leaders Audrey Listochkin (@listochkin) & Robert Jackson (@rwjblue) talk with us about the long awaited Ember 2 release and the Ember community across the globe. The future of Ember is larger than this 2.x release and...
Podcast episode
68: Ember 2 & The Ember Community: Summary Ember community leaders Audrey Listochkin (@listochkin) & Robert Jackson (@rwjblue) talk with us about the long awaited Ember 2 release and the Ember community across the globe. The future of Ember is larger than this 2.x release and...
byThe Web Platform Podcast
0 ratings
0% found this document useful
Rust in Production Ep 1 - InfluxData's Paul Dix: Paul Dix, CTO of InfluxDB, talks about the open-source time series database's development, the decision to use Go and Rust, challenges of managing high data volumes, performance improvements, future plans, and the value of hands-on learning.
Podcast episode
Rust in Production Ep 1 - InfluxData's Paul Dix: Paul Dix, CTO of InfluxDB, talks about the open-source time series database's development, the decision to use Go and Rust, challenges of managing high data volumes, performance improvements, future plans, and the value of hands-on learning.
byRust in Production
0 ratings
0% found this document useful
Big Data In The Browser: So why would anyone want to put alot of data into a browser? Well, for a lot of the same reasons that edge computing and distributed computing have become so popular. You get the data a lot closer to the user and you don’t have to pay for the compute...
Podcast episode
Big Data In The Browser: So why would anyone want to put alot of data into a browser? Well, for a lot of the same reasons that edge computing and distributed computing have become so popular. You get the data a lot closer to the user and you don’t have to pay for the compute...
byThe MapScaping Podcast - GIS, Geospatial, Remote Sensing, earth observation and digital geography
0 ratings
0% found this document useful
344: Grains of Salt: Shell text processing, data rebalancing on ZFS mirrors, Add Security Headers with OpenBSD relayd, ZFS filesystem hierarchy in ZFS pools, speeding up ZSH, How Unix pipes work, grow ZFS pools over time, the real reason ifconfig on Linux is deprecated, clear your terminal in style, and more.
Podcast episode
344: Grains of Salt: Shell text processing, data rebalancing on ZFS mirrors, Add Security Headers with OpenBSD relayd, ZFS filesystem hierarchy in ZFS pools, speeding up ZSH, How Unix pipes work, grow ZFS pools over time, the real reason ifconfig on Linux is deprecated, clear your terminal in style, and more.
byBSD Now
0 ratings
0% found this document useful
Doing DevRel on Easy Mode with Matty Stratton: Corey’s good friend Matt “Matty” Stratton, now a Staff Developer Advocate at Pulumi, is back for another round of “Screaming!” Now, with a job title that sits at the top of a “very strange career trajectory.” With beginnings at Chef, to IMB, and now Pulum
Podcast episode
Doing DevRel on Easy Mode with Matty Stratton: Corey’s good friend Matt “Matty” Stratton, now a Staff Developer Advocate at Pulumi, is back for another round of “Screaming!” Now, with a job title that sits at the top of a “very strange career trajectory.” With beginnings at Chef, to IMB, and now Pulum
byScreaming in the Cloud
0 ratings
0% found this document useful
Eliminating Garbage In/Garbage Out for Analytics and ML // Roy Hasson & Santona Tuli // MLOps Podcast #166
Podcast episode
Eliminating Garbage In/Garbage Out for Analytics and ML // Roy Hasson & Santona Tuli // MLOps Podcast #166
byMLOps.community
0 ratings
0% found this document useful
Episode 103. Let's share data cross-language with Apache Arrow! (among other things): We have a great time talking to Matt Topol from Voltron Data on one of his Apache Software Foundation projects called Apache Arrow. It's both a spec and implementation of a columnar data format that is not only efficient, but cross-language...
Podcast episode
Episode 103. Let's share data cross-language with Apache Arrow! (among other things): We have a great time talking to Matt Topol from Voltron Data on one of his Apache Software Foundation projects called Apache Arrow. It's both a spec and implementation of a columnar data format that is not only efficient, but cross-language...
byJava Pub House
0 ratings
0% found this document useful
69: Testing Front End Code: Summary Oren Rubin (@Shexman) goes through why it’s important to not only test the back-end code of our applications but also to test our Front End code, the integration points, and the full user experience. Oren also goes through...
Podcast episode
69: Testing Front End Code: Summary Oren Rubin (@Shexman) goes through why it’s important to not only test the back-end code of our applications but also to test our Front End code, the integration points, and the full user experience. Oren also goes through...
byThe Web Platform Podcast
0 ratings
0% found this document useful
Serverless Data APIs
Podcast episode
Serverless Data APIs
byThe Cloudcast
0 ratings
0% found this document useful
DOP 231: Automating API Development With Hasura: #231: We never thought that exposing our databases to the public internet was a good thing. However, when we started creating middleware API services that sat in front of those databases, we probably ended up doing almost the exact same type work that...
Podcast episode
DOP 231: Automating API Development With Hasura: #231: We never thought that exposing our databases to the public internet was a good thing. However, when we started creating middleware API services that sat in front of those databases, we probably ended up doing almost the exact same type work that...
byDevOps Paradox
0 ratings
0% found this document useful
Shouldn't Data Connections Be Easier? (with Ashley Jeffs)
Podcast episode
Shouldn't Data Connections Be Easier? (with Ashley Jeffs)
byDeveloper Voices
0 ratings
0% found this document useful
Episode 233: Demystifying Product Funnels with Rachel Obstler: What do traditional product funnels have to offer? Our guest today is Rachel Obstler, VP of product at Heap. You'll learn about the shortcomings of product funnels, how granular product data can unlock opportunities, common data science challenges, and more.
Podcast episode
Episode 233: Demystifying Product Funnels with Rachel Obstler: What do traditional product funnels have to offer? Our guest today is Rachel Obstler, VP of product at Heap. You'll learn about the shortcomings of product funnels, how granular product data can unlock opportunities, common data science challenges, and more.
byUI Breakfast: UI/UX Design and Product Strategy
0 ratings
0% found this document useful
Move Your Database To The Data And Speed Up Your Analytics With DuckDB: An interview with Hannes Mühleisen about the DuckDB engine for in-process OLAP queries that lets you use the power of SQL and the flexibility of programming languages side by side.
Podcast episode
Move Your Database To The Data And Speed Up Your Analytics With DuckDB: An interview with Hannes Mühleisen about the DuckDB engine for in-process OLAP queries that lets you use the power of SQL and the flexibility of programming languages side by side.
byData Engineering Podcast
0 ratings
0% found this document useful
Distributing Geospatial Data: Distributing Geospatial Data - Every wondered why you might what to do this? Or maybe you understand the why but are unsure about the how? Perhaps you have heard people talk about partitioning data or sharding data, you might have heard some of thes...
Podcast episode
Distributing Geospatial Data: Distributing Geospatial Data - Every wondered why you might what to do this? Or maybe you understand the why but are unsure about the how? Perhaps you have heard people talk about partitioning data or sharding data, you might have heard some of thes...
byThe MapScaping Podcast - GIS, Geospatial, Remote Sensing, earth observation and digital geography
0 ratings
0% found this document useful
70: Web Components at Microsoft: Summary Daniel Buchner (@csuwildcat), former Mozillian & Program Manager at Microsoft takes us through the plans for Web Components at Microsoft. Daniel is the creator of the Web Components free open source library, X-Tag which Microsoft is now...
Podcast episode
70: Web Components at Microsoft: Summary Daniel Buchner (@csuwildcat), former Mozillian & Program Manager at Microsoft takes us through the plans for Web Components at Microsoft. Daniel is the creator of the Web Components free open source library, X-Tag which Microsoft is now...
byThe Web Platform Podcast
0 ratings
0% found this document useful
MLOps Meetup #25 // Python and Dask: Scaling the DataFrame // Dan Gerlanc - Founder of Enplus Advisors
Podcast episode
MLOps Meetup #25 // Python and Dask: Scaling the DataFrame // Dan Gerlanc - Founder of Enplus Advisors
byMLOps.community
0 ratings
0% found this document useful
ThursdAI Aug 10 - Deepfakes get real, OSS Embeddings heating up, Wizard 70B tops tops the charts and more!
Podcast episode
ThursdAI Aug 10 - Deepfakes get real, OSS Embeddings heating up, Wizard 70B tops tops the charts and more!
byThursdAI - The top AI news from the past week
0 ratings
0% found this document useful
#08 - Tech stack: Metabase, Superset, Redash, Grafana
Podcast episode
#08 - Tech stack: Metabase, Superset, Redash, Grafana
byTOPP - The Open Podcast Podcast
0 ratings
0% found this document useful
Should managers of developers ever make technical decisions?: We talk about new content types on Stack Overflow, when its appropriate for managers to make technical decisions for their ICs, and Paul's experience getting a new tool online with Google's Cloud Run.
Podcast episode
Should managers of developers ever make technical decisions?: We talk about new content types on Stack Overflow, when its appropriate for managers to make technical decisions for their ICs, and Paul's experience getting a new tool online with Google's Cloud Run.
byThe Stack Overflow Podcast
0 ratings
0% found this document useful
Postgres Performance at Any Scale with Lukas Fittl: Welcome to another episode of the DevOps Toolchain podcast! In this episode, we have a special guest, Lukas Fittl, an experienced engineer and entrepreneur known for his work in optimizing Postgres performance. Lukas will share his insights and tips...
Podcast episode
Postgres Performance at Any Scale with Lukas Fittl: Welcome to another episode of the DevOps Toolchain podcast! In this episode, we have a special guest, Lukas Fittl, an experienced engineer and entrepreneur known for his work in optimizing Postgres performance. Lukas will share his insights and tips...
byTestGuild Devops Toolchain Podcast
0 ratings
0% found this document useful
Is Flink the answer to the ETL problem? (with Robert Metzger)
Podcast episode
Is Flink the answer to the ETL problem? (with Robert Metzger)
byDeveloper Voices
0 ratings
0% found this document useful
? ThursdAI Apr 4 - Weave, CMD R+, SWE-Agent, Everyone supports Tool Use + JAMBA deep dive with AI21
Podcast episode
? ThursdAI Apr 4 - Weave, CMD R+, SWE-Agent, Everyone supports Tool Use + JAMBA deep dive with AI21
byThursdAI - The top AI news from the past week
0 ratings
0% found this document useful
Composable Data Analytics
Podcast episode
Composable Data Analytics
byThe Cloudcast
0 ratings
0% found this document useful
2235: Data-Driven Success: Insights from Kazuki Ohta, CEO of Treasure Data: Join us as we sit down with Kazuki Ohta, CEO, and founder of Treasure Data, a leading customer data platform. Kaz has dedicated his career to helping businesses effectively leverage data to power exceptional customer experiences in marketing, service,...
Podcast episode
2235: Data-Driven Success: Insights from Kazuki Ohta, CEO of Treasure Data: Join us as we sit down with Kazuki Ohta, CEO, and founder of Treasure Data, a leading customer data platform. Kaz has dedicated his career to helping businesses effectively leverage data to power exceptional customer experiences in marketing, service,...
byThe Tech Talks Daily Podcast
0 ratings
0% found this document useful
Episode 101. Allright, let's talk about Kafka: Whew! So we took a big break over summer (like Bob said, we were just swamped with work.. oof), but we are BACK! and like always we are ready to explore even deeper Java topics for the professional developer. This time we set our sights in Apache...
Podcast episode
Episode 101. Allright, let's talk about Kafka: Whew! So we took a big break over summer (like Bob said, we were just swamped with work.. oof), but we are BACK! and like always we are ready to explore even deeper Java topics for the professional developer. This time we set our sights in Apache...
byJava Pub House
0 ratings
0% found this document useful
#07 - Tech stack: Rust, TypeScript, Edge Worker, and Cloudflare
Podcast episode
#07 - Tech stack: Rust, TypeScript, Edge Worker, and Cloudflare
byTOPP - The Open Podcast Podcast
0 ratings
0% found this document useful
67: Keeping Fluent with Web Technology: Summary How do you keep up with the vast amounts of web technology released daily? It can be a losing battle for some and a opportunity for others. One person in our community that comes to mind is Peter Cooper (@peterc) from Cooper Press. Join us...
Podcast episode
67: Keeping Fluent with Web Technology: Summary How do you keep up with the vast amounts of web technology released daily? It can be a losing battle for some and a opportunity for others. One person in our community that comes to mind is Peter Cooper (@peterc) from Cooper Press. Join us...
byThe Web Platform Podcast
0 ratings
0% found this document useful
BDTP. Producing Technical Content with Karl Hughes: Today we have another episode of Better Done Than Perfect. Listen in as we talk with Karl Hughes, founder of Draft.dev. You'll learn how to scale content for different goals, the editorial process he follows at his agency, tips for content promotion, and more.
Podcast episode
BDTP. Producing Technical Content with Karl Hughes: Today we have another episode of Better Done Than Perfect. Listen in as we talk with Karl Hughes, founder of Draft.dev. You'll learn how to scale content for different goals, the editorial process he follows at his agency, tips for content promotion, and more.
byUI Breakfast: UI/UX Design and Product Strategy
0 ratings
0% found this document useful
Couchbase and the Evolving World of Databases with Perry Krug
Podcast episode
Couchbase and the Evolving World of Databases with Perry Krug
byScreaming in the Cloud
0 ratings
0% found this document useful

Skip carousel

Set Up A Production- Ready Web Server
APC
Article
Set Up A Production- Ready Web Server
Nov 4, 2019
8 min read
How To Build The Linux Format Server
Linux Format
Article
How To Build The Linux Format Server
Oct 19, 2021
10 min read
Build A Pi-powered Network Storage Device
Linux Format
Article
Build A Pi-powered Network Storage Device
Dec 14, 2021
10 min read
Set Up A Production-ready Web Server
Linux Format
Article
Set Up A Production-ready Web Server
Sep 24, 2019
8 min read
HotPicks
Linux Format
Article
HotPicks
Nov 15, 2022
12 min read
HotPicks
Linux Format
Article
HotPicks
Jun 27, 2023
12 min read
FLASK Web Frameworks
Linux Format
Article
FLASK Web Frameworks
Jun 4, 2019
The main focus of Python has always been to get you cracking on with your coding – the language was never made for web programming. However, this has just made it more interesting to extend the language for the web, or to create an interface to web-b
9 min read
Answers
Linux Format
Article
Answers
Apr 7, 2020
7 min read
Package Delivery
Linux Format
Article
Package Delivery
Dec 15, 2020
9 min read
Answers
Linux Format
Article
Answers
Jul 2, 2019
Q How do I generate a very large file, say around 980GB, with the dd command? I’m trying to test a quota for an XFS file system under RHEL7. I use the following command as the user oracle, which has the user quota set: and it outputs Where is my
7 min read
Read RSS Feeds Direct From The Terminal
APC
Article
Read RSS Feeds Direct From The Terminal
Nov 1, 2021
3 min read
It’s A Virtual Serverworld
Linux Format
Article
It’s A Virtual Serverworld
Sep 21, 2021
9 min read
Read RSS Feeds Direct From The Terminal
Linux Format
Article
Read RSS Feeds Direct From The Terminal
Aug 24, 2021
3 min read
Perfect Backup: Perfect? No, But Darn Close
PCWorld
Article
Perfect Backup: Perfect? No, But Darn Close
Jan 11, 2023
3 min read
Succeed With Seedboxes
Maximum PC
Article
Succeed With Seedboxes
Feb 27, 2024
14 min read
Stream Audiobooks From Your Server
Linux Format
Article
Stream Audiobooks From Your Server
Oct 17, 2023
One often overlooked component of a media server is audiobooks. In the past, we’ve recommended Booksonic, but in 2021 a new arrival quickly swept all before it to provide a serious alternative to Audible. Audiobookshelf (www. audiobookshelf.org) is y
4 min read
More Online Backups
MacLife
Article
More Online Backups
May 11, 2017
4 min read
Get Packing!
Linux Format
Article
Get Packing!
May 3, 2022
8 min read
Get Packing!
Linux Format
Article
Get Packing!
May 3, 2022
8 min read
HotPicks
Linux Format
Article
HotPicks
Jun 29, 2021
12 min read
Build A Self-hosted Fediverse Server
Linux Format
Article
Build A Self-hosted Fediverse Server
Jan 11, 2022
7 min read
Build Your Own URL Shortening Service
Linux Format
Article
Build Your Own URL Shortening Service
May 4, 2021
7 min read
Adding Software
Linux Format
Article
Adding Software
Jun 28, 2022
1 min read
Adding Software
Linux Format
Article
Adding Software
Jun 28, 2022
1 min read
Run A Ghost Blog On Your Server
Linux Format
Article
Run A Ghost Blog On Your Server
Nov 16, 2021
David Rutland is a tinkerer and a dilettante. He buys domains on a whim and runs them from a Raspberry Pi behind the couch.. The internet is a bleak place in the third decade of the 21st century. Away from the walled-garden shouting matches and ego-s
10 min read
Family History Software: An Introduction
Family Tree UK
Article
Family History Software: An Introduction
Feb 11, 2020
5 min read
How Apple File System Works With Older Macs, Encryption, External Drives, And Other Questions
MacWorld
Article
How Apple File System Works With Older Macs, Encryption, External Drives, And Other Questions
Nov 29, 2017
With the release of macOS High Sierra and its upgrade for SSD-based startup volumes to Apple File System (APFS), Macworld readers had many questions about how this new filesystem—more efficient and reliable for SSDs—will interact with older Macs, har
5 min read
Flask Resources
Linux Format
Article
Flask Resources
Jun 4, 2019
The designers of Flask decided early on that the framework itself would not have all the functions embedded. This philosophy is, of course, an extension of all open source work. When you start using Flask, you may be disappointed by this. Don’t be: t
1 min read
Contributing For Non - Coders
Linux Format
Article
Contributing For Non - Coders
Jan 10, 2023
9 min read
Twenty Years Of WordPress Websites!
Linux Format
Article
Twenty Years Of WordPress Websites!
Oct 17, 2023
11 min read

Related categories

Skip carousel

Reviews for Apache Flume

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Apache Flume - Steve Hoffman

Apache Flume: Distributed Log Collection for Hadoop

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Errata

Piracy

Questions

1. Overview and Architecture

Flume 0.9

Flume 1.X (Flume-NG)

The problem with HDFS and streaming data/logs

Sources, channels, and sinks

Flume events

Interceptors, channel selectors, and sink processors

Tiered data collection (multiple flows and/or agents)

Summary

2. Flume Quick Start

Downloading Flume

Flume in Hadoop distributions

Flume configuration file overview

Starting up with Hello World

Summary

3. Channels

Memory channel

File channel

Summary

4. Sinks and Sink Processors

HDFS sink

Path and filename

File rotation

Compression codecs

Event serializers

Text output

Text with headers

Apache Avro

File type

Sequence file

Data stream

Compressed stream

Timeouts and workers

Sink groups

Load balancing

Failover

Summary

5. Sources and Channel Selectors

The problem with using tail

The exec source

The spooling directory source

Syslog sources

The syslog UDP source

The syslog TCP source

The multiport syslog TCP source

Channel selectors

Replicating

Multiplexing

Summary

6. Interceptors, ETL, and Routing

Interceptors

Timestamp

Host

Static

Regular expression filtering

Regular expression extractor

Custom interceptors

Tiering data flows

Avro Source/Sink

Command-line Avro

Log4J Appender

The Load Balancing Log4J Appender

Routing

Summary

7. Monitoring Flume

Monitoring the agent process

Monit

Nagios

Monitoring performance metrics

Ganglia

The internal HTTP server

Custom monitoring hooks

Summary

8. There Is No Spoon – The Realities of Real-time Distributed Data Collection

Transport time versus log time

Time zones are evil

Capacity planning

Considerations for multiple data centers

Compliance and data expiry

Summary

Index

Apache Flume: Distributed Log Collection for Hadoop

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: July 2013

Production Reference: 1090713

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78216-791-4

www.packtpub.com

Cover Image by Abhishek Pandey (<abhishek.pandey1210@gmail.com>)

Credits

Author

Steve Hoffman

Reviewers

Subash D'Souza

Stefan Will

Acquisition Editor

Kunal Parikh

Commissioning Editor

Sharvari Tawde

Technical Editors

Jalasha D'costa

Mausam Kothari

Project Coordinator

Sherin Padayatty

Proofreader

Aaron Nash

Indexer

Monica Ajmera Mehta

Graphics

Valentina D'silva

Abhinash Sahu

Production Coordinator

Kirtee Shingan

Cover Work

Kirtee Shingan

About the Author

Steve Hoffman has 30 years of software development experience and holds a B.S. in computer engineering from the University of Illinois Urbana-Champaign and a M.S. in computer science from the DePaul University. He is currently a Principal Engineer at Orbitz Worldwide.

More information on Steve can be found at http://bit.ly/bacoboy or on Twitter @bacoboy.

This is Steve's first book.

I'd like to dedicate this book to my loving wife Tracy. Her dedication to perusing what you love is unmatched and it inspires me to follow her excellent lead in all things.

I'd also like to thank Packt Publishing for the opportunity to write this book and my reviewers and editors for their hard work in making it a reality.

Finally, I want to wish a fond farewell to my brother Richard who passed away recently. No book has enough pages to describe in detail just how much we will all miss him. Good travels brother.

About the Reviewers

Subash D'Souza is a professional software developer with strong expertise in crunching big data using Hadoop/HBase with Hive/Pig. He has worked with Perl/PHP/Python, primarily for coding and MySQL/Oracle as the backend, for several years prior to moving into Hadoop fulltime. He has worked on scaling for load, code development, and optimization for speed. He also has experience optimizing SQL queries for database interactions. His specialties include Hadoop, HBase, Hive, Pig, Sqoop, Flume, Oozie, Scaling, Web Data Mining, PHP, Perl, Python, Oracle, SQL Server, and MySQL Replication/Clustering.

I would like to thank my wife, Theresa for her kind words of support and encouragement.

Stefan Will is a computer scientist with a degree in machine learning and pattern recognition from the University of Bonn. For over a decade has worked for several startup companies in Silicon Valley and Raleigh, North Carolina, in the area of search and analytics. Presently, he leads the development of the search backend and the Hadoop-based product analytics platform at Zendesk, the customer service software provider.

www.PacktPub.com

Support files, eBooks, discount offers and more

You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

http://PacktLib.PacktPub.com

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.

Why Subscribe?

Fully searchable across every book published by Packt

Copy and paste, print and bookmark content

On demand and accessible via web browser

Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

Preface

Hadoop is a great open source tool for sifting tons of unstructured data into something manageable, so that your business can gain better insight into your customers, needs. It is cheap (can be mostly free), scales horizontally as long as you have space and power in your data center, and can handle problems your traditional data warehouse would be crushed under. That said, a little known secret is that your Hadoop cluster requires you to feed it with data; otherwise, you just have a very expensive heat generator. You will quickly find, once you get past the playing around phase with Hadoop, that you will need a tool to automatically feed data into your cluster. In the past, you had to come up with a solution for this problem, but no more! Flume started as a project out of Cloudera when their integration engineers had to keep writing tools over and over again for their customers to import data automatically. Today the project lives with the Apache Foundation, is under active development, and boasts users who have been using it in their production environments for years.

In this book I hope to get you up and running quickly with an architectural overview of Flume and a quick start guide. After that we’ll deep-dive into the details on many of the more useful Flume components, including the very important File Channel for

Enjoying the preview?

Page 1 of 1

Apache Flume: Distributed Log Collection for Hadoop

About this ebook

Steve Hoffman

Read more from Steve Hoffman

Related authors

Related to Apache Flume

Related ebooks

Computers For You

Related podcast episodes

Related articles

Related categories

Reviews for Apache Flume

What did you think?

Book preview

Apache Flume - Steve Hoffman

Table of Contents

Apache Flume: Distributed Log Collection for Hadoop

Apache Flume: Distributed Log Collection for Hadoop

Credits

About the Author

About the Reviewers

Support files, eBooks, discount offers and more

Why Subscribe?

Preface