Anika Technologies

Your Transforming Partner

Big Data and NoSQL


Training Contents

Course Contents


Day 1
NoSQL: What It Is and Why You Need It
NoSQL: Getting Initial Hands-on Experience
Interfacing and Interacting with NoSQL
Understanding the Storage Architecture
Performing CRUD Operations
Querying NoSQL Stores
Modifying Data Stores and Managing Evolution
Indexing and Ordering Data Sets

Day 2
Managing Transactions and Data Integrity
Hadoop Introduction
MapReduce
Understanding Hadoop I/O
Distributing Data with HDFS
Map-Reduce Internals
Advanced MapReduce

Introduction to Big Data

NOSQL: WHAT IT IS AND WHY YOU NEED IT

Definition and Introduction
Context and a Bit of History
Big Data
Scalability
Sorted Ordered Column-Oriented Stores
Key/Value Stores
Document Databases
Graph Databases
Summary

NOSQL: GETTING INITIAL HANDS-ON EXPERIENCE

First Impressions: Examining Two Simple Examples
A Simple Set of Persistent Preferences Data
Storing Car Make and Model Data
Working with Language Bindings
MongoDB's Drivers
A First Look at Thrift
Summary

INTERFACING AND INTERACTING WITH NOSQL

If No SQL, Then What?
Storing and Accessing Data
Storing Data In and Accessing Data from MongoDB
Querying MongoDB
Storing Data In and Accessing Data from Redis
Querying Redis
Storing Data In and Accessing Data from HBase
Querying HBase
Storing Data In and Accessing Data from Apache Cassandra
Querying Apache Cassandra
Language Bindings for NoSQL Data Stores
Being Agnostic with Thrift
Language Bindings for Java
Language Bindings for Python
Language Bindings for Ruby
Language Bindings for PHP
Summary

Anika Technologies
Your Transforming Partner
Mobile: +91 7719882295/ 9730463630
Email: sales@anikatechnologies.com
Website: www.anikatechnologies.com

UNDERSTANDING THE STORAGE ARCHITECTURE

Working with Column-Oriented Databases
Using Tables and Columns in Relational Databases
Contrasting Column Databases with RDBMS
Column Databases as Nested Maps of Key/Value Pairs
Laying out the Webtable
HBase Distributed Storage Architecture
Document Store Internals
Storing Data in Memory-Mapped Files
Guidelines for Using Collections and Indexes in MongoDB
MongoDB Reliability and Durability
Horizontal Scaling
Understanding Key/Value Stores in Memcached and Redis
Under the Hood of Memcached
Redis Internals
Eventually Consistent Non-relational Databases
Consistent Hashing
Object Versioning
Gossip-Based Membership and Hinted Handoff
Summary

PERFORMING CRUD OPERATIONS

Creating Records
Creating Records in a Document-Centric Database
Using the Create Operation in Column-Oriented Databases
Using the Create Operation in Key/Value Maps
Accessing Data
Accessing Documents from MongoDB
Accessing Data from HBase
Querying Redis
Updating and Deleting Data
Updating and Modifying Data in MongoDB, HBase, and Redis
Limited Atomicity and Transactional Integrity
Summary

QUERYING NOSQL STORES

Similarities Between SQL and MongoDB Query Features
Loading the MovieLens Data
MapReduce in MongoDB
Accessing Data from Column-Oriented Databases Like HBase
The Historical Daily Market Data
Querying Redis Data Stores
Summary

MODIFYING DATA STORES AND MANAGING EVOLUTION

Changing Document Databases
Schema-less Flexibility
Exporting and Importing Data from and into MongoDB
Schema Evolution in Column-Oriented Databases
HBase Data Import and Export
Data Evolution in Key/Value Stores
Summary

INDEXING AND ORDERING DATA SETS

Essential Concepts Behind a Database Index
Indexing and Ordering in MongoDB
Creating and Using Indexes in MongoDB
Compound and Embedded Keys
Creating Unique and Sparse Indexes
Keyword-based Search and MultiKeys
Indexing and Ordering in CouchDB
The B-tree Index in CouchDB
Indexing in Apache Cassandra
Summary

MANAGING TRANSACTIONS AND DATA INTEGRITY

RDBMS and ACID
Isolation Levels and Isolation Strategies
Distributed ACID Systems
Consistency
Availability
Partition Tolerance
Upholding CAP
Compromising on Availability
Compromising on Partition Tolerance
Compromising on Consistency
Consistency Implementations in a Few NoSQL Products
Distributed Consistency in MongoDB
Eventual Consistency in CouchDB
Consistency in Membase
Summary

Hadoop Introduction

Hadoop performance and data scale facts
Hadoop in the context of other data stores
What about NoSQL?
Move computation, not data
Hadoop, an inside view: MapReduce and HDFS

Comparison with Other Systems

RDBMS
Grid Computing
Volunteer Computing

A Brief History of Hadoop

Apache Hadoop and the Hadoop Ecosystem

The Apache Hadoop Project
The Hadoop Ecosystem
Hadoop Releases

MapReduce

Analyzing the Data with Hadoop
Map and Reduce
Java MapReduce
Scaling Out
Data Flow
Combiner Functions
Running a Distributed MapReduce Job
Constructing the basic template of a MapReduce program
Counting things
Adapting for Hadoop's API changes
Improving performance with combiners
Hadoop Pipes

Hadoop Streaming

Ruby
Python

Streaming in Hadoop

Streaming with Unix commands
Streaming with scripts
Streaming with key/value pairs
Streaming with the Aggregate package

Understanding Hadoop I/O


File-Based Data Structures

SequenceFile
MapFile

Serialization

The Writable Interface
Writable Classes
Implementing a Custom Writable
Serialization Frameworks
Avro

Compression

Codecs
Compression and Input Splits
Using Compression in MapReduce

Data Integrity

Data Integrity in HDFS
LocalFileSystem
ChecksumFileSystem

Distributing Data with HDFS

The Design of HDFS

The Command-Line Interface

Basic Filesystem Operations

Hadoop Filesystems

Interfaces

The Java Interface

Reading Data from a Hadoop URL
Reading Data Using the FileSystem API
Writing Data
Directories
Querying the Filesystem
Deleting Data

Data Flow

Anatomy of a File Read
Anatomy of a File Write
Coherency Model

Parallel Copying with distcp

Keeping an HDFS Cluster Balanced

Hadoop Archives

Using Hadoop Archives
Limitations

Map-Reduce Internals

Anatomy of a MapReduce Job Run

Classic MapReduce (MapReduce 1)
YARN (MapReduce 2)

Failures

Failures in Classic MapReduce
Failures in YARN

Job Scheduling

The Fair Scheduler
The Capacity Scheduler

Shuffle and Sort

The Map Side
The Reduce Side
Configuration Tuning

Task Execution

The Task Execution Environment
Speculative Execution
Output Committers
Task JVM Reuse
Skipping Bad Records

Advanced MapReduce

Chaining MapReduce jobs

Chaining MapReduce jobs in a sequence
Chaining MapReduce jobs with complex dependency
Chaining preprocessing and postprocessing steps

Joining data from different sources

Reduce-side joining
Replicated joins using DistributedCache
Semijoin: reduce-side join with map-side filtering

Creating a Bloom filter

What does a Bloom filter do?
Implementing a Bloom filter
Bloom filter in Hadoop version 0.20+

