ZephyrRapier
explore big data, data analysis and machine learning

Step by Step of installing Apache Spark on Apache Hadoop


Posted on July 1, 2015 by cyrobin

Hi guys,

This time, I am going to install Apache Spark on our existing Apache Hadoop 2.7.0.

Env versions

OS: Ubuntu 15.04

Scala: 2.11.7

Spark: spark-1.4.0-bin-hadoop2.6.tgz

1. Install Scala (refer to this)


-------------------------------

sudo apt-get remove scala-library scala


sudo wget http://www.scala-lang.org/files/archive/scala-2.11.7.deb
sudo dpkg -i scala-2.11.7.deb
sudo apt-get update
sudo apt-get install scala
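
To double-check that Scala works before moving on, you can optionally start the Scala REPL with the scala command and evaluate a couple of lines:

scala> util.Properties.versionNumberString    // should print the installed version, e.g. 2.11.7
scala> (1 to 10).sum                          // should return 55

If both evaluate without errors, type :quit to leave the REPL and continue.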

2. Install Spark
------------------

Download the prebuilt package and extract it:

wget http://apache.mirrors.ionfish.org/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz
tar -zxvf spark-1.4.0-bin-hadoop2.6.tgz

(A newer build is also available: wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.0-bin-hadoop2.4.tgz. Or copy the tar file from PD and extract it on the desktop, e.g. under /home/ambreesh/Desktop.)

Rename the extracted folder to spark and move it to /usr/local:

mv spark-1.4.0-bin-hadoop2.6 spark
sudo mv spark /usr/local/spark

3. Get the Hadoop version

hadoop version

It should show 2.7.0.
4. Add SPARK_HOME

sudo nano ~/.bashrc

Add this line at the end of the file:

export SPARK_HOME=/usr/local/spark

Then reload it:

source ~/.bashrc
5. Spark version

Since spark-1.4.0-bin-hadoop2.6.tgz is a prebuilt package for Hadoop 2.6.0 and later, it also works with Hadoop 2.7.0. Thus we don't need to rebuild Spark with sbt or Maven, which is fairly involved. If you download the source code from the Apache Spark site and build it with

build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0 -DskipTests clean package

you are likely to run into plenty of build tool dependency problems. So don't bother building Spark from source.
6. Let's verify our installation

cd $SPARK_HOME

7. Launch the Spark shell (refer to this)

./bin/spark-shell

If you see the Spark banner and the scala> prompt, the Spark shell is running.
8. Test the Spark shell

scala> sc.parallelize(1 to 100).count()

This should return 100 (e.g. res0: Long = 100).

Background info:

sc: the Spark context, the main entry point for Spark functionality. A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.
parallelize: distributes a local Scala collection to form an RDD.
count: returns the number of elements in the dataset.
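
Since the note above mentions accumulators and broadcast variables, here is a small optional sketch (the variable names are just examples) you can paste into the same spark-shell session to see them in action:

scala> val nums = sc.parallelize(1 to 100)                      // RDD built from a local collection
scala> val sumAcc = sc.accumulator(0)                           // workers add to it, the driver reads it
scala> nums.foreach(x => sumAcc += x)
scala> sumAcc.value                                             // should return 5050
scala> val lookup = sc.broadcast(Map(1 -> "one", 2 -> "two"))   // read-only value shared with every worker
scala> nums.filter(x => lookup.value.contains(x)).collect()     // should return Array(1, 2)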
When you are done, exit the shell:

scala> exit
9. Let's try another typical example

bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[*] lib/spark-examples* 10

The last argument, 10, is passed to the main method of the application; here it is the number of slices used for calculating Pi.
When the job finishes, the output should include a line like: Pi is roughly 3.14...
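
For reference, SparkPi estimates Pi with a Monte Carlo method: it throws n random points into a square and counts how many land inside the unit circle. A simplified sketch of the idea (not the exact bundled source) that you could also run in the spark-shell, with slices playing the role of the 10 above:

scala> import scala.math.random
scala> val slices = 10
scala> val n = 100000 * slices
scala> val count = sc.parallelize(1 to n, slices).map { _ =>
     |   val x = random * 2 - 1                     // random point in the 2x2 square centred at the origin
     |   val y = random * 2 - 1
     |   if (x * x + y * y < 1) 1 else 0            // 1 if the point falls inside the unit circle
     | }.reduce(_ + _)
scala> println("Pi is roughly " + 4.0 * count / n)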
Congratulations! We have finished the Spark installation, and next we can start using this powerful tool to perform data analysis and many other fun things.

2 comments

AMK, October 17, 2015

Do we have to repeat this process on all the nodes, including the namenode and datanodes?

cyrobin, November 24, 2015

Most of the process you have to repeat.