Apache Hadoop Installation and Cluster Setup on AWS EC2 (Ubuntu), Part 1
A guide to installing and setting up a multi-node Apache Hadoop cluster on AWS EC2
edureka!
9/20/2013
A guide to setting up a multi-node Apache Hadoop cluster on AWS EC2 (using free tier eligible servers)
Table of Contents
Introduction
1. Setting up the Cluster Infrastructure on AWS EC2
1.1 Creating an AWS Free Account
1.1.1 Sign up and register on AWS
1.1.2 Use your correct contact number
1.1.3 Choose a plan for your usage
1.2 Log in to AWS
1.3 Creating cluster member servers
1.3.1 Choose a free tier eligible instance
1.3.2 Create a key pair
1.3.3 Configure Security Group and firewall settings
1.3.4 Review the pre-launch
1.3.5 Launch the servers
1.4 Set up client access to AWS servers
1.4.1 Generate the public/private key pair
1.4.2 Import the key pair and save the public/private keys
1.4.3 Access the AWS EC2 servers
1.4.4 Set up WinSCP access to AWS EC2 servers
Note
AWS also provides a hosted solution for Hadoop, named Amazon Elastic
MapReduce (EMR), but as of this writing only Pig and Hive are available on
it, and it comes at a cost.
Note
The configuration described here is intended for learning purposes only.
Even though the AWS EC2 free tier eligible instances are available at no additional cost, you
must provide a credit card when you create the account.
Your credit card will be billed if your monthly usage goes beyond the free tier, for example by
using additional AWS resources or services such as Elastic Block Store (EBS).
Choose EC2 and open the EC2 Dashboard to create the cluster member servers.
Choose Ubuntu 12.04.2 LTS. Remember to change the number of instances to 4. This creates
four Ubuntu instances simultaneously.
Ensure that you choose the free tier for the setup. Keep the defaults, but change the root volume to 5 or
6 GiB so that the total disk usage (4 × 5 = 20 GiB, or 4 × 6 = 24 GiB) stays below the free tier limit of 30 GiB per month.
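The storage arithmetic above can be checked with a short shell sketch. The variable names are illustrative, and the 30 GiB limit is simply the figure quoted in this guide.

```shell
# Check that the cluster's total root-volume size stays within the free tier.
# All variable names are illustrative; the limit is the one quoted in this guide.
instances=4          # number of EC2 instances in the cluster
volume_gib=5         # root volume size per instance, in GiB
free_tier_gib=30     # free tier storage limit stated in this guide

total_gib=$((instances * volume_gib))
if [ "$total_gib" -le "$free_tier_gib" ]; then
    result="within"
else
    result="over"
fi
echo "Total: ${total_gib} GiB (${result} the ${free_tier_gib} GiB free tier limit)"
```

Setting volume_gib to 6 gives 24 GiB, which is still within the limit; 8 GiB per instance would give 32 GiB and exceed it.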
Choose a name and add any other tags you need for billing or operations purposes.
Create a security group with the default options and add All TCP, All ICMP, and SSH (22) under the
inbound rules. This allows ping, SSH, and similar commands among the servers and from any other
machine on the internet.
These protocols and ports are also required to enable communication among the cluster servers. As this
is a test setup, we allow access from anywhere for TCP, ICMP, and SSH and do not restrict
individual server ports or tighten security.
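For readers who prefer the command line to the console, the same inbound rules might be expressed with the AWS CLI roughly as follows. This is only a sketch under assumptions that are not part of this guide: the AWS CLI is installed and configured, the account uses a default network where `--group-name` is accepted, and `hadoop-cluster` is a hypothetical group name. By default the script only prints the commands.

```shell
# Sketch: the inbound rules described above, expressed as AWS CLI calls.
# Assumptions (not from this guide): the AWS CLI is installed and configured,
# and "hadoop-cluster" is a hypothetical security group name.
GROUP="hadoop-cluster"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

open_cluster_ports() {
    run aws ec2 create-security-group --group-name "$GROUP" \
        --description "hadoop-test-cluster"
    # All TCP ports, from anywhere (test setup only).
    run aws ec2 authorize-security-group-ingress --group-name "$GROUP" \
        --protocol tcp --port 0-65535 --cidr 0.0.0.0/0
    # All ICMP types, so that ping works.
    run aws ec2 authorize-security-group-ingress --group-name "$GROUP" \
        --protocol icmp --port -1 --cidr 0.0.0.0/0
    # SSH (22); already covered by the all-TCP rule, listed to mirror the console steps.
    run aws ec2 authorize-security-group-ingress --group-name "$GROUP" \
        --protocol tcp --port 22 --cidr 0.0.0.0/0
}

open_cluster_ports
```

Opening every port to 0.0.0.0/0 matches the test setup described here; anything beyond a learning cluster would normally restrict the source range.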
You can set a passphrase to protect your private key, or leave the passphrase fields blank to use the
private key without one. The passphrase prevents someone who gains access to your machine and your
private key from using them to reach the servers. With a passphrase-protected private key, you must
enter the passphrase every time you use the key to access an AWS EC2 server.
You may receive the following error if you have not configured your security group appropriately in Step
1.3.3.
Note the IP address and update the /etc/hosts file with the hostname and IP address.
Change the hostname to the public DNS name of the AWS EC2 server using the following commands:
$ sudo hostname ec2-54-214-206-65.us-west-2.compute.amazonaws.com
$ sudo vi /etc/hosts
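If you would rather not edit the file by hand, the hosts entry could be appended with a small script along these lines. This is a sketch, not part of the original procedure: the function name add_host_entry and the IP address 10.0.0.11 are hypothetical placeholders, and the script writes to a temporary file by default so that it can be tried safely.

```shell
# Sketch: add an "IP hostname" line to a hosts file without opening an editor.
# HOSTS_FILE defaults to a temporary file so the sketch can be tried safely;
# set it to /etc/hosts and run the script as root on the real server.
# The IP address below is a hypothetical placeholder, not a value from this guide.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"
NODE_IP="10.0.0.11"
NODE_NAME="ec2-54-214-206-65.us-west-2.compute.amazonaws.com"

add_host_entry() {
    # $1 = IP address, $2 = hostname, $3 = hosts file
    touch "$3"
    # Append only if the hostname is not already listed, so reruns are harmless.
    if ! grep -qF -- "$2" "$3"; then
        printf '%s\t%s\n' "$1" "$2" >> "$3"
    fi
}

add_host_entry "$NODE_IP" "$NODE_NAME" "$HOSTS_FILE"
add_host_entry "$NODE_IP" "$NODE_NAME" "$HOSTS_FILE"   # second call changes nothing
cat "$HOSTS_FILE"
```

Replace the placeholder with the IP address you noted for each server before pointing the script at the real /etc/hosts.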
Also, repeat all the steps in this section (1.4.3) on the other three cluster servers to
enable public access to those AWS EC2 servers.
Copy the .pem file and the other keys to the Master server using WinSCP.
The infrastructure is now ready for creating your first Apache Hadoop cluster.
Please see Part 2 of this guide to create the Apache Hadoop cluster.