
EDUREKA

Apache Hadoop Installation and Cluster Setup on AWS EC2 (Ubuntu) Part 1
A guide to install and set up a Multi-Node Apache Hadoop Cluster on AWS EC2
edureka!
9/20/2013

A guide to set up a Multi-Node Apache Hadoop Cluster on AWS EC2 (using free tier eligible servers)

Table of Contents
Introduction
1. Setting up the Cluster Infrastructure on AWS EC2
1.1 Creating an AWS Free Account
1.1.1 Sign up and register on AWS
1.1.2 Use your correct contact number
1.1.3 Choose a Plan for your usage
1.2 Log in to AWS
1.3. Creating Cluster member servers
1.3.1 Choose a free tier eligible instance
1.3.2 Create a key pair
1.3.3 Configure Security Group and Firewall settings
1.3.4 Review the pre-launch
1.3.5 Launch the servers
1.4 Set up client access to AWS servers
1.4.1 Generate the Public/Private KeyPair
1.4.2 Import keypair and save public/private keys
1.4.3 Access the AWS EC2 servers
1.4.4 Set up WinSCP access to AWS EC2 servers

2013 Brain4ce Education Solutions Pvt. Ltd Page 1


Introduction
This setup and configuration document is a guide to setting up a Multi-Node Apache Hadoop cluster on
Amazon Web Services (AWS) Elastic Compute Cloud (EC2) using free tier eligible Ubuntu (t1.micro)
servers. If you are new to both AWS and Hadoop, this guide comes in handy for quickly setting up a
Multi-Node Apache Hadoop cluster on AWS EC2.

Note
AWS also provides a hosted solution for Hadoop, named Amazon Elastic
MapReduce (EMR), but as of now only Pig and Hive are available on it, and
it comes at a cost.

The guide describes the whole process in two parts:

Part 1: Setting up the Cluster Infrastructure on AWS EC2


This section is a step-by-step guide to setting up an AWS account and launching the AWS EC2 free tier
eligible Ubuntu servers. These servers will be used to set up a four-node Apache Hadoop cluster on
the AWS EC2 cloud infrastructure.

Part 2: Installing Apache Hadoop and Setting up the Cluster


This section is a step-by-step guide to installing the prerequisites for Hadoop and configuring
the cluster on the EC2 servers. It explains the primary Hadoop configuration files,
password-less SSH access, configuring the master and slaves, and service start/stop in detail.

Note
The configuration described here is intended for learning purposes only.

1. Setting up the Cluster Infrastructure on AWS EC2


This section describes the steps to create a free account and launch Ubuntu servers on AWS EC2 for
Apache Hadoop Installation and Cluster Setup.

1.1 Creating an AWS Free Account


The first step is to create a free trial account in AWS. You can review the limit on free services at
http://aws.amazon.com/free/

1.1.1 Sign up and register on AWS


You can sign up on AWS using your email id and credit card.

Even though the AWS EC2 free tier eligible instances are available at no additional cost, you
need to provide credit card details during account creation.

As explained in the following image, your credit card will be billed if your monthly usage goes
beyond the free tier, for example by using any additional AWS resource or service such as Elastic
Block Store (EBS).

Figure 1-1 Specify your credit card details


1.1.2 Use your correct contact number
Please ensure that you provide a correct contact number, as AWS verifies your identity through a
phone call to that number.

Figure 1-2 Verify the details

1.1.3 Choose a Plan for your usage


Choose the basic plan for trial usage. This plan is good enough to create the cluster and to play around with it.

Figure 1-3 Choose a plan


1.2 Log in to AWS
Log in to your AWS account and access the AWS Management Console.

Figure 1-4 AWS Management Console

Choose EC2 and open the EC2 Dashboard to create the cluster member servers.

Figure 1-5 EC2 Dashboard

1.3. Creating Cluster member servers


Click on Launch Instance and choose Classic Wizard to create, configure and launch your Cluster
Servers.

1.3.1 Choose a free tier eligible instance


Choose an instance configuration. All the options marked with an orange star are free tier eligible
(if used with a micro instance).

Figure 1-6 Quick Launch

Choose Ubuntu 12.04.2 LTS. Remember to change the number of instances to 4. This will create four
Ubuntu instances simultaneously.

Figure 1-7 Choose instance details

Ensure that you choose the free tier for the setup. Keep the defaults but change the root volume size to
5 or 6 GiB so that the total disk usage (4 x 5 = 20 GiB, or 4 x 6 = 24 GiB) stays below the free tier
limit of 30 GiB per month.
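
The storage arithmetic above can be checked with a short shell sketch. The function name total_gib and the variable names are hypothetical, the 30 GiB figure is the free tier limit quoted in this guide, and the 8 GiB case is included only as a contrasting value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total root volume storage for the cluster, in GiB.
total_gib() {
  local instances=$1 volume_gib=$2
  echo $(( instances * volume_gib ))
}

FREE_TIER_LIMIT_GIB=30   # limit quoted in this guide

# Compare a few candidate root volume sizes for a four-server cluster.
for size in 5 6 8; do
  total=$(total_gib 4 "$size")
  if [ "$total" -le "$FREE_TIER_LIMIT_GIB" ]; then
    echo "4 x ${size} GiB = ${total} GiB: within the free tier limit"
  else
    echo "4 x ${size} GiB = ${total} GiB: exceeds the free tier limit"
  fi
done
```

With four servers, any root volume size up to 7 GiB keeps the total at or under the limit.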

Figure 1-8 Instance details

Choose a name and add any other tags for billing or operations purposes.

Figure 1-9 Choose name and tags


1.3.2 Create a key pair
This is the most important part of creating and launching the AWS instances. AWS provides
private/public key based access to the servers. You can choose a previously created key pair or
create a new one. Here we create and download a fresh key pair. Keep the key pair file (.pem)
safe on your PC, as it will be needed to access the servers.

Figure 1-10 Create and download a key pair


1.3.3 Configure Security Group and Firewall settings
You need to choose a security group to control access to the services on the servers. You can create a
new group or use an existing one.

Create a group with the default options and add All TCP, All ICMP and SSH (22) under the inbound
rules. This will allow ping, SSH, and other similar commands among the servers and from any other
machine on the internet.

These protocols and ports are also required to enable communication among the cluster servers. As this
is a test setup, we allow access from everywhere for TCP, ICMP and SSH and do not go into the
details of individual server ports and security.
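
For readers who prefer the command line, the same inbound rules could be sketched with the AWS CLI. This is not part of the console workflow this guide follows. It assumes the AWS CLI is installed and configured and that the account can address security groups by name. The group name hadoop-test-cluster and the run helper are hypothetical. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the AWS CLI calls instead of executing them.
# Set DRY_RUN=0 only if the AWS CLI is installed and configured.
DRY_RUN=${DRY_RUN:-1}
GROUP=hadoop-test-cluster   # hypothetical group name

# Print the command in dry-run mode, otherwise execute it.
# (The printed form drops the quotes around multi-word arguments.)
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run aws ec2 create-security-group --group-name "$GROUP" --description "Hadoop test cluster"

# SSH (22), all TCP ports, and all ICMP, open to any source address.
run aws ec2 authorize-security-group-ingress --group-name "$GROUP" --protocol tcp --port 22 --cidr 0.0.0.0/0
run aws ec2 authorize-security-group-ingress --group-name "$GROUP" --protocol tcp --port 0-65535 --cidr 0.0.0.0/0
run aws ec2 authorize-security-group-ingress --group-name "$GROUP" --protocol icmp --port -1 --cidr 0.0.0.0/0
```

The SSH rule is already covered by the All TCP rule. It is listed separately only to mirror the three entries added in the console.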

Figure 1-11 Configure firewall


1.3.4 Review the pre-launch
Review all the settings before you proceed with the server creation.

Figure 1-12 Review the server creation


1.3.5 Launch the servers
Launch the servers and review the Instance page for newly launched servers.

Figure 1-13 Instance review at EC2 Dashboard

Rename the servers according to their roles in the cluster.

Figure 1-14 Rename the servers as per their roles


Here is the final list of instances:

Figure 1-15 Server details

Make a note of the public URL of servers such as ec2-54-212-38-184.us-west-2.compute.amazonaws.com.
These URLs will be used to access the servers from your PC and to monitor the HDFS health from your browser.
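
The public name shown above embeds the server's public IP address and its region. A minimal sketch for splitting such a name is given below. The function name parse_ec2_name is hypothetical, and the ec2-A-B-C-D.region.compute.amazonaws.com pattern is assumed from the example in this guide.

```shell
#!/usr/bin/env bash
# Split an EC2 public name of the assumed form
#   ec2-A-B-C-D.<region>.compute.amazonaws.com
# into its public IP address and region.
parse_ec2_name() {
  local name=$1
  local host=${name%%.*}      # ec2-54-212-38-184
  local rest=${name#*.}       # us-west-2.compute.amazonaws.com
  local region=${rest%%.*}    # us-west-2
  local ip=${host#ec2-}       # 54-212-38-184
  ip=${ip//-/.}               # 54.212.38.184
  echo "$ip $region"
}

parse_ec2_name ec2-54-212-38-184.us-west-2.compute.amazonaws.com
# prints: 54.212.38.184 us-west-2
```

Because the name is derived from the public IP address, it is worth re-checking the noted names if a server's public address ever changes.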

Figure 1-16 Server details


1.4 Set up client access to AWS servers
You need to set up password-less SSH access among the servers to set up the cluster, especially from
the Master server to the Slave servers, so that the Master server can remotely start the DataNode and
TaskTracker services on the Slave servers.

1.4.1 Generate the Public/Private KeyPair


Download PuTTY to access the AWS EC2 servers. Also download PuTTYgen to generate the
public/private key pair from the .pem file created in Step 1.3.2 Create a key pair.

1.4.2 Import keypair and save public/private keys


Open PuTTYgen and import the .pem file downloaded to your PC in Step 1.3.2 Create a key pair.

Figure 1-17 Import the key pair

You can set a passphrase to protect your private key, or leave the passphrase fields blank to use the
private key without one. The passphrase protects against unauthorized access to the servers by
someone using your machine and your private key. With a passphrase-protected private key, every
connection to an AWS EC2 server requires the user to enter the passphrase first.

Figure 1-18 Create public/private keys

1.4.3 Access the AWS EC2 servers


Access the servers using the private key created in Step 1.4.2 Import keypair and save public/private
keys, and note down their hostnames and IP addresses using the ifconfig command.
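
This guide uses PuTTY on Windows. On a Linux or Mac client, the downloaded .pem file can be used directly with OpenSSH. The sketch below only prints the login command. The key file name hadoopec2cluster.pem and the helper ssh_command are hypothetical. The login user ubuntu is the default on official Ubuntu images and is an assumption about the image chosen here.

```shell
#!/usr/bin/env bash
# Print the OpenSSH command for logging in to a server with the .pem key.
ssh_command() {
  local key=$1 host=$2
  echo "ssh -i ${key} ubuntu@${host}"
}

KEY=hadoopec2cluster.pem    # hypothetical key file name
HOST=ec2-54-212-38-184.us-west-2.compute.amazonaws.com

# OpenSSH rejects private keys that other users can read, so restrict
# the file first with:  chmod 400 "$KEY"
ssh_command "$KEY" "$HOST"
```

Once logged in, ifconfig reports the IP address and hostname reports the current hostname, as described above.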

Figure 1-19 Add the private key to PuTTY

You may receive the following error if you have not appropriately configured your security group in
Step 1.3.3.

Figure 1-20 Add the private key to PuTTY

Note the IP Address and update the /etc/hosts file with hostname and IP address.

Figure 1-21 Host IP address

Change the hostname to the public URL of the AWS EC2 server using the following command:
$ sudo hostname ec2-54-214-206-65.us-west-2.compute.amazonaws.com

Figure 1-22 Change hostname

Edit /etc/hosts with Public ID of your AWS EC2 server:

$ sudo vi /etc/hosts

Figure 1-23 Hostname change
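
The /etc/hosts edit can also be scripted instead of done by hand in vi. The sketch below works on a scratch file so that it is safe to run anywhere. The function name add_host_entry and the address 10.0.0.11 are hypothetical. On a real server the target would be /etc/hosts, edited with sudo, using the IP address noted from ifconfig.

```shell
#!/usr/bin/env bash
# Append "IP<TAB>hostname" to a hosts file unless the hostname is already listed.
add_host_entry() {
  local file=$1 ip=$2 name=$3
  # Look for an exact match on any name field (field 2 onwards).
  if awk -v n="$name" '{ for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                       END { exit !found }' "$file"; then
    echo "already present: $name"
  else
    printf '%s\t%s\n' "$ip" "$name" >> "$file"
    echo "added: $ip $name"
  fi
}

# Demo on a scratch copy rather than the real /etc/hosts.
HOSTS_FILE=$(mktemp)
printf '127.0.0.1\tlocalhost\n' > "$HOSTS_FILE"

NAME=ec2-54-214-206-65.us-west-2.compute.amazonaws.com
add_host_entry "$HOSTS_FILE" 10.0.0.11 "$NAME"   # hypothetical IP address
add_host_entry "$HOSTS_FILE" 10.0.0.11 "$NAME"   # second call changes nothing
cat "$HOSTS_FILE"
```

Checking for an existing entry first keeps the file clean when the steps are repeated on each server.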

Also, repeat all the steps in this section (1.4.3) on all the other three cluster servers to
enable public access to these AWS EC2 servers.



1.4.4 Set up WinSCP access to AWS EC2 servers
Use the private key created in Step 1.4.2 Import keypair and save public/private keys to access the
servers from your desktop with WinSCP, for any file transfer between the servers and your PC.

Figure 1-24 Set up WinSCP

Copy the .pem file and other keys to the Master server using WinSCP.
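
On a Linux or Mac client, the same copy could be done with scp instead of WinSCP. The sketch below only prints the command. The key file name, the helper scp_command, and the ubuntu login user are assumptions, as in any OpenSSH-based alternative to the PuTTY tools used in this guide.

```shell
#!/usr/bin/env bash
# Print the scp command that uploads a local file to the login user's
# home directory on a server, authenticating with the .pem key.
scp_command() {
  local key=$1 file=$2 host=$3
  echo "scp -i ${key} ${file} ubuntu@${host}:~/"
}

KEY=hadoopec2cluster.pem    # hypothetical key file name
MASTER=ec2-54-212-38-184.us-west-2.compute.amazonaws.com

# Upload the key file itself to the Master server.
scp_command "$KEY" "$KEY" "$MASTER"
```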

You are ready with the infrastructure to create your first Apache Hadoop Cluster.

Please review Part 2 of this guide to create the Apache Hadoop cluster.
