2-1 What to do
In this in-lab we will compute some simple statistics and answer some questions about the
airline data using a modified version of the wordcount MapReduce job. Download the sample
code and data files from the D2L tab Week 3 lecture 06. Save your data to user/biadmin/lab3
on the sandbox. Modify and extend the supplied sample code to achieve the following:
2-1-1 Write a MapReduce job (in R) to find out how many unique carriers, origins, and
destinations there are in the given data (air.csv). View the carrier list using the R function View.
2-1-2 Write a MapReduce job (in R) to find the maximum departure delay per month and per
carrier (all carriers) and plot the results. The results should look like the following
two figures.
2-1-3 Write ONE MapReduce job (in R) to find the max, min, and average arrival delay
per carrier and draw a line plot of these statistics.
2-1-4 Write your own interpretation and comments on the results and graphs for 2-1-2 and
2-1-3.
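As an illustration of the single-job requirement in 2-1-3, the sketch below emulates the map, shuffle and reduce steps in base R on a few made-up rows. It is not the supplied sample code: the column names UniqueCarrier and ArrDelay are assumptions about the air.csv header, and map_fn and reduce_fn are hypothetical names. In the actual job, functions of this shape would be passed to whatever MapReduce interface the sample code uses.

```r
# Toy rows standing in for air.csv; the column names UniqueCarrier and
# ArrDelay are assumptions about the file's header.
air <- data.frame(
  UniqueCarrier = c("AA", "AA", "DL", "DL", "DL"),
  ArrDelay      = c(10, -5, 30, NA, 0),
  stringsAsFactors = FALSE
)

# Map: emit one (carrier, delay) pair per flight, skipping missing delays.
map_fn <- function(rows) {
  ok <- !is.na(rows$ArrDelay)
  list(key = rows$UniqueCarrier[ok], val = rows$ArrDelay[ok])
}

# Reduce: collapse all delays of one carrier into three statistics at once.
reduce_fn <- function(carrier, delays) {
  data.frame(carrier = carrier,
             max = max(delays), min = min(delays), avg = mean(delays),
             stringsAsFactors = FALSE)
}

# Shuffle: group the emitted values by key, then reduce each group.
pairs  <- map_fn(air)
groups <- split(pairs$val, pairs$key)
stats  <- do.call(rbind, Map(reduce_fn, names(groups), groups))
rownames(stats) <- NULL
print(stats)
```

Because the reducer returns all three statistics per key, one job is enough. The resulting table has one row per carrier, so a line plot could be drawn from it with, for example, matplot(stats[, c("max", "min", "avg")], type = "l").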
2-2 What to hand in
Write your code in a single file named lastname_wordcount_inlab3.R, with comments on the
code (some marks will be awarded for commenting on the code), and upload it to D2L before
the due date.
3- Post-Lab: Understand the travel pattern
In this post-lab we will count the frequency of every origin-to-destination airport pair in the
air.csv file (you can download this file from the D2L tab Week 3 lecture 06).
3-1 What to do
Write a MapReduce job (in R) to find the following:
3-1-1 Top 10 Airports by total volume of flights for all destinations
Rank every airport by the volume of flights that have this airport as their
destination (e.g. destination=JFK). Find the number of flights that satisfy this
condition for each airport, then rank all airports according to the calculated
flight volumes. Present the results in a table with the top 10 airports and the
calculated number of flights.
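The counting step here has the same shape as wordcount, with the destination airport playing the role of the word. The following base R sketch emulates it on made-up rows; the column name Dest is an assumption about the air.csv header, and the ranking is done after the reduce step, outside the job.

```r
# Toy stand-in for air.csv; the column name Dest is an assumption.
air <- data.frame(Dest = c("JFK", "LAX", "JFK", "ORD", "JFK", "LAX"),
                  stringsAsFactors = FALSE)

# Map: emit (destination, 1) for every flight, as wordcount does per word.
keys <- air$Dest
vals <- rep(1L, length(keys))

# Shuffle + reduce: sum the ones emitted for each destination.
counts <- vapply(split(vals, keys), sum, integer(1))

# Rank after the job: sort by volume and keep at most 10 airports.
ranked <- sort(counts, decreasing = TRUE)
top10  <- data.frame(airport = names(ranked), flights = as.vector(ranked),
                     stringsAsFactors = FALSE)
top10  <- head(top10, 10)
print(top10)
```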
3-1-2 Busy Routes per year per month
Some cities are more attractive than others, and thus many people visit them more
frequently. Plot the monthly flight volume per airport and highlight the busiest
month per airport. The results should be tabulated and a bar plot should be
provided. Calculate this only for the top 10 destination airports (one table and
chart per input).
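One possible way to get monthly volumes from a wordcount-style job is a composite key made of the airport and the month. The base R sketch below emulates that on made-up rows; the column names Dest and Month are assumptions about the air.csv header, and a Year column could be added to the key in the same way if the data spans several years.

```r
# Toy stand-in for air.csv; the column names Dest and Month are assumptions.
air <- data.frame(
  Dest  = c("JFK", "JFK", "JFK", "LAX", "LAX", "LAX"),
  Month = c(1, 1, 2, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Map: emit ("airport|month", 1) for every flight.
keys <- paste(air$Dest, air$Month, sep = "|")
vals <- rep(1L, length(keys))

# Shuffle + reduce: sum the ones for each airport/month key.
counts <- vapply(split(vals, keys), sum, integer(1))

# Unpack the composite keys into a table of monthly volumes.
parts   <- do.call(rbind, strsplit(names(counts), "|", fixed = TRUE))
monthly <- data.frame(airport = parts[, 1],
                      month   = as.integer(parts[, 2]),
                      flights = as.vector(counts),
                      stringsAsFactors = FALSE)

# Busiest month per airport: the row with the largest count in each group.
busiest <- do.call(rbind, lapply(split(monthly, monthly$airport),
                                 function(d) d[which.max(d$flights), ]))
rownames(busiest) <- NULL
print(monthly)
print(busiest)
```

For one airport, the rows of monthly can be passed to barplot, for example barplot(d$flights, names.arg = d$month) where d is that airport's subset, with the busiest month given a different colour through the col argument.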
3-1-3 Create directed graph
A directed graph is a set of items (called vertices or nodes) that are
connected by edges, where every edge is directed from one vertex/node to
another. When drawing a directed graph, the edges are typically drawn as
arrows indicating the direction. We can use a directed graph to visualize the
flow of flights and the possible paths between cities/airports. In this
exercise, draw a directed graph for JFK airport. The vertices/nodes represent
the airports and the link direction represents the outgoing or the ingoing
flight direction. Each node should contain the name of the airport, the number
of flights to this airport, and the distance between JFK and this airport. The
results should look like the following graph.
[Figure: sample directed outgoing graph for JFK airport and 3 different destinations, with the
number of flights and the distance between JFK and the other 3 airports. Origin node: JFK;
example destination node label: "LA, 100,2500".]
The nodes should be sorted such that the top node represents the highest
frequency of flights between JFK and the airport in this node. Following these
requirements, develop the directed graphs using R and MapReduce jobs as
described below:
3-1-3-1 Create an outgoing directed graph for JFK as origin and other airports as
destination
3-1-3-2 Create an ingoing directed graph for JFK as destination and other airports as
origins
3-1-3-3 Combine both graphs into one big graph joined at the common node, JFK airport
3-1-3-4 Write your own interpretation and comments on the results and graphs
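The data behind both graphs can be thought of as an edge list: one row per airport connected to JFK, carrying the flight count and the distance. The base R sketch below emulates the grouping step on made-up rows. The column names Origin, Dest and Distance are assumptions about the air.csv header, edges_for is a hypothetical helper, and the distances are illustrative values only.

```r
# Toy stand-in for air.csv; column names and distances are assumptions.
air <- data.frame(
  Origin   = c("JFK", "JFK", "JFK", "LAX", "ORD", "LAX"),
  Dest     = c("LAX", "LAX", "ORD", "JFK", "JFK", "ORD"),
  Distance = c(2475, 2475, 740, 2475, 740, 1744),
  stringsAsFactors = FALSE
)

# Hypothetical helper: edge list for flights leaving ("out") or entering
# ("in") the hub airport, sorted by flight count, highest first.
edges_for <- function(rows, hub, direction = c("out", "in")) {
  direction <- match.arg(direction)
  outgoing  <- direction == "out"
  rows  <- rows[if (outgoing) rows$Origin == hub else rows$Dest == hub, ]
  other <- if (outgoing) rows$Dest else rows$Origin
  groups   <- split(rows$Distance, other)                  # shuffle by airport
  flights  <- vapply(groups, length, integer(1))           # reduce: count
  distance <- vapply(groups, function(d) d[1], numeric(1)) # reduce: distance
  hubs  <- rep(hub, length(groups))
  edges <- data.frame(
    from     = if (outgoing) hubs else names(groups),
    to       = if (outgoing) names(groups) else hubs,
    flights  = as.vector(flights),
    distance = as.vector(distance),
    label    = paste(names(groups), flights, distance, sep = ", "),
    stringsAsFactors = FALSE
  )
  edges <- edges[order(-edges$flights), ]
  rownames(edges) <- NULL
  edges
}

out_edges <- edges_for(air, "JFK", "out")   # JFK as origin
in_edges  <- edges_for(air, "JFK", "in")    # JFK as destination
all_edges <- rbind(out_edges, in_edges)     # both directions, joined at JFK
print(all_edges)
```

The label column follows the "name, flights, distance" form of the sample node. If the igraph package is available on the sandbox, an edge list like all_edges could be turned into a directed graph with graph_from_data_frame(all_edges, directed = TRUE) and then plotted; any other graph-drawing approach would work from the same table.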
3-2 What to hand in
Write your code in a single file named lastname_wordcount_postlab3.R, with comments
on the code (some marks will be awarded for commenting on the code), and upload it
to D2L before the due date.