
Matrix Multiplication using Hadoop Map-Reduce

Step 1: Install Hadoop in Stand-Alone Mode

Step 2: Matrix Multiplication Using MapReduce Programming

1.1 Installing Java

Check the existing Java version by running:

java -version

1.2 Create the Hadoop home directory

We will use hadoop-3.1.2.tar.gz here.

Extract the Hadoop archive using the following command:

tar -xzvf hadoop-3.1.2.tar.gz

Move the extracted directory to /usr/local/hadoop:

sudo mv hadoop-3.1.2 /usr/local/hadoop


1.3 Configuring Hadoop's JAVA_HOME

Hadoop requires that you set the path to Java, either as an environment variable or
in the Hadoop configuration file.

The path to Java, /usr/bin/java, is a symlink to /etc/alternatives/java, which is in
turn a symlink to the default Java binary. We will use readlink with the -f flag to
follow every symlink in every part of the path, recursively. Then we'll use sed to
trim bin/java from the output to give us the correct value for JAVA_HOME.

To find the default Java path, run:

readlink -f /usr/bin/java | sed "s:bin/java::"

Output :

/usr/lib/jvm/java-11-openjdk-amd64/

Use readlink to Set the Value Dynamically

Open the Hadoop environment file:

sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Add this line to set JAVA_HOME:

export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")


1.4 Running Hadoop

Now we should be able to run Hadoop:

/usr/local/hadoop/bin/hadoop

If Hadoop prints its usage help, we've successfully configured it to run in stand-alone
mode. We'll ensure that it is functioning properly by running the example MapReduce
program it ships with. To do so, create a directory called input in our home
directory and copy Hadoop's configuration files into it to use those files as our
data.

mkdir ~/input

cp /usr/local/hadoop/etc/hadoop/*.xml ~/input

Next, we can run the hadoop-mapreduce-examples program, a Java archive that bundles
several example jobs. We'll invoke its grep program, followed by the input directory,
input, and the output directory, grep_example. The MapReduce grep program counts the
matches of a literal word or regular expression. Finally, we'll supply a regular
expression to find occurrences of the word principal within or at the end of a
declarative sentence. The expression is case-sensitive, so we wouldn't find the word if
it were capitalized at the beginning of a sentence:

/usr/local/hadoop/bin/hadoop jar \
  /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar grep \
  ~/input ~/grep_example 'principal[.]*'

When the task completes, it provides a summary of what has been processed and any errors
it has encountered, but this doesn't contain the actual results.

Results are stored in the output directory and can be checked by running cat on the
files in that directory:

cat ~/grep_example/*
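
To make the behaviour of the regular expression 'principal[.]*' concrete, the following is a minimal sketch using java.util.regex. It only illustrates what the pattern matches and how matches can be counted per distinct string; it is not Hadoop's grep implementation, and the class name PrincipalRegexSketch and the sample sentence are assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: shows what the pattern matches, without running Hadoop.
class PrincipalRegexSketch {

    // Counts how often each distinct matched string occurs in the text.
    static Map<String, Integer> countMatches(String regex, String text) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            counts.merge(matcher.group(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample text; "Principal" with a capital P is not matched.
        String text = "Principal names vary. Set the principal. The principal is required.";
        System.out.println(countMatches("principal[.]*", text));
        // prints {principal=1, principal.=1}
    }
}
```

The pattern matches the word with or without trailing periods, and the capitalized occurrence is skipped because matching is case-sensitive.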

Step 2: Matrix Multiplication Using MapReduce Programming

2.1 In mathematics, matrix multiplication or the matrix product is a binary operation that
produces a matrix from two matrices. The definition is motivated by linear equations and linear
transformations on vectors, which have numerous applications in applied mathematics, physics, and
engineering. In more detail, if A is an n × m matrix and B is an m × p matrix, their matrix product
AB is an n × p matrix, in which the m entries across a row of A are multiplied with the m entries
down a column of B and summed to produce an entry of AB. When two linear transformations are
represented by matrices, then the matrix product represents the composition of the two
transformations.
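
As a point of reference for the MapReduce version, the definition above can be written as a plain triple loop on a single machine. This is only a sketch: the class name NaiveMatrixProduct and the sample matrices are assumptions made for illustration and are not part of the tutorial's files.

```java
import java.util.Arrays;

// Illustrative only: direct implementation of the definition, no Hadoop involved.
class NaiveMatrixProduct {

    // Multiplies an n x m matrix a by an m x p matrix b and returns the n x p product.
    static long[][] multiply(long[][] a, long[][] b) {
        int n = a.length;
        int m = b.length;        // inner dimension: columns of a must equal rows of b
        int p = b[0].length;
        for (long[] row : a) {
            if (row.length != m) {
                throw new IllegalArgumentException("columns of A must equal rows of B");
            }
        }
        long[][] c = new long[n][p];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < p; k++) {
                long sum = 0;
                for (int j = 0; j < m; j++) {
                    sum += a[i][j] * b[j][k];   // j-th entry of row i times j-th entry of column k
                }
                c[i][k] = sum;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        long[][] a = {{1, 2, 3}, {4, 5, 6}};        // 2 x 3
        long[][] b = {{7, 8}, {9, 10}, {11, 12}};   // 3 x 2
        System.out.println(Arrays.deepToString(multiply(a, b)));
        // prints [[58, 64], [139, 154]]
    }
}
```

A small result like this is handy for checking the output of the MapReduce job on the same input.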

Algorithm for the Map function:
a. For each element m_ij of M, produce (key, value) pairs ((i,k), (M, j, m_ij))
   for k = 1, 2, 3, ... up to the number of columns of N.
b. For each element n_jk of N, produce (key, value) pairs ((i,k), (N, j, n_jk))
   for i = 1, 2, 3, ... up to the number of rows of M.
c. Return the set of (key, value) pairs, so that each key (i,k) has a list with the
   values (M, j, m_ij) and (N, j, n_jk) for all possible values of j.

Algorithm for the Reduce function:
For each key (i,k):
   sort the values that begin with M by j into list_M
   sort the values that begin with N by j into list_N
   multiply m_ij and n_jk for the j-th value of each list
   sum the products and return ((i,k), Σ_j m_ij × n_jk), where j runs from 1 to the
   number of columns of M
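
The map and reduce steps above can be sketched in plain Java, without Hadoop, to show how the (i,k) keys collect the values needed for one output entry. This is a hedged, in-memory simulation under stated assumptions: the class MapReduceMatrixSketch, the Tagged helper and the "i,k" string keys are hypothetical names, indices are 0-based rather than 1-based, and values are paired by j using hash maps instead of two sorted lists. It is not the tutorial's Map.java or Reduce.java.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: simulates the map, shuffle and reduce phases in memory.
class MapReduceMatrixSketch {

    // One mapper output value: (matrix name, j, element value).
    static final class Tagged {
        final char matrix;
        final int j;
        final long value;

        Tagged(char matrix, int j, long value) {
            this.matrix = matrix;
            this.j = j;
            this.value = value;
        }
    }

    // Map phase plus grouping by key: emits ((i,k), (M,j,m_ij)) and ((i,k), (N,j,n_jk)).
    static Map<String, List<Tagged>> mapPhase(long[][] m, long[][] n) {
        int rowsM = m.length;
        int colsN = n[0].length;
        Map<String, List<Tagged>> grouped = new TreeMap<>();
        for (int i = 0; i < rowsM; i++) {
            for (int j = 0; j < m[i].length; j++) {
                for (int k = 0; k < colsN; k++) {   // m_ij is needed for every column k of N
                    emit(grouped, i, k, new Tagged('M', j, m[i][j]));
                }
            }
        }
        for (int j = 0; j < n.length; j++) {
            for (int k = 0; k < colsN; k++) {
                for (int i = 0; i < rowsM; i++) {   // n_jk is needed for every row i of M
                    emit(grouped, i, k, new Tagged('N', j, n[j][k]));
                }
            }
        }
        return grouped;
    }

    static void emit(Map<String, List<Tagged>> grouped, int i, int k, Tagged value) {
        grouped.computeIfAbsent(i + "," + k, key -> new ArrayList<>()).add(value);
    }

    // Reduce phase for one key (i,k): pair the values by j and sum m_ij * n_jk.
    static long reduce(List<Tagged> values) {
        Map<Integer, Long> fromM = new HashMap<>();
        Map<Integer, Long> fromN = new HashMap<>();
        for (Tagged t : values) {
            (t.matrix == 'M' ? fromM : fromN).put(t.j, t.value);
        }
        long sum = 0;
        for (Map.Entry<Integer, Long> e : fromM.entrySet()) {
            sum += e.getValue() * fromN.getOrDefault(e.getKey(), 0L);
        }
        return sum;
    }

    // Runs both phases and returns the product entries keyed by "i,k".
    static Map<String, Long> multiply(long[][] m, long[][] n) {
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, List<Tagged>> e : mapPhase(m, n).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        long[][] m = {{1, 2}, {3, 4}};
        long[][] n = {{5, 6}, {7, 8}};
        System.out.println(multiply(m, n));
        // prints {0,0=19, 0,1=22, 1,0=43, 1,1=50}
    }
}
```

Each key receives one M value and one N value per j, so in the 2 × 2 example every key holds four values before the reduce step.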

2.2 Download the Hadoop jar files from these links.

Download the Hadoop Common jar file:

wget https://goo.gl/G4MyHp -O hadoop-common-3.1.2.jar

Download the Hadoop MapReduce jar file:

wget https://goo.gl/KT8yfB -O hadoop-mapreduce-client-core-3.1.2.jar

2.3 Creating the Mapper file for Matrix Multiplication
Refer to Map.java.

2.4 Creating the Reducer file for Matrix Multiplication
Refer to Reduce.java.

2.5 Creating the MatrixMultiply.java file
Refer to MatrixMultiply.java.

2.6 Compiling the program into a folder named operation/

javac -cp hadoop-common-3.1.2.jar:hadoop-mapreduce-client-core-3.1.2.jar:operation/:. -d operation/ Map.java

javac -cp hadoop-common-3.1.2.jar:hadoop-mapreduce-client-core-3.1.2.jar:operation/:. -d operation/ Reduce.java

javac -cp hadoop-common-3.1.2.jar:hadoop-mapreduce-client-core-3.1.2.jar:operation/:. -d operation/ MatrixMultiply.java

2.7 Listing the contents of the directory after compilation

ls -R operation/

2.8 Creating the Jar file for Matrix Multiplication

jar -cvf MatrixMultiply.jar -C operation/ .



2.9 Uploading the M and N files, which contain the matrix data, to HDFS
Refer to file 'M'.
Refer to file 'N'.

hadoop fs -mkdir Matrix/


hadoop fs -copyFromLocal M Matrix/
hadoop fs -copyFromLocal N Matrix/

2.10 Executing the jar file with the hadoop command, which reads the input from HDFS and
stores the output in HDFS

hadoop jar MatrixMultiply.jar MatrixMultiply Matrix result

NOTE: Running this command prints the output of the mapper and the reducer.


2.11 Getting the output from part-r-00000, which was generated by the hadoop command

hadoop fs -cat result/part-r-00000
