
Weather Forecasting Using Big Data

Sandeep Kathula
Indiana University Bloomington
F16-IG-3010
skathula@indiana.edu

Rohit Surve
Indiana University Bloomington
F16-IG-3019
rohsurve@indiana.edu

ABSTRACT

Weather forecasting is the application of science and technology to predict the state of the atmosphere for a given location. It is one of the most challenging scientific problems in the world, and it has long been one of the most interesting and fascinating domains of study. Scientists have tried to forecast the weather using a large set of methods, some more accurate than others. For weather prediction, different types of sensors are deployed in many places, and they collect massive amounts of data that must be processed to forecast the weather. Even the radars and satellites that help in weather prediction produce enormous amounts of data. Sensor readings such as temperature and humidity, together with radar and satellite data, are used to predict rainfall and other conditions. As the number of sensors increases, the volume of data they produce becomes huge, and the rate of production (velocity) is very high. Sensor data is therefore a kind of big data, and big data cannot be processed with traditional approaches, because general-purpose computers cannot handle data of this size. We therefore propose to process such data by leveraging MapReduce with Hadoop [3]. Hadoop is an open-source framework developed by Apache for processing massive amounts of data, and the MapReduce programming model processes huge datasets using parallel and distributed techniques. Processing sensor data with MapReduce in the Hadoop framework removes the scalability bottleneck [4], and processing can become many times faster when the data is spread across a cluster of nodes in a network. The main aim of our project is to build code that processes high-volume, high-velocity sensor data using MapReduce on Hadoop, and to analyze that data with the statistical programming language R. This will help us identify hidden patterns in this large dataset and turn the retrieved information into usable knowledge for classifying and predicting climate conditions.

1. MOTIVATION

Changing climatic conditions around the world are a major concern for many countries, and through our project we aim to analyze this change quantitatively by predicting it. Weather prediction is also useful for making informed decisions. For example, forecasts are used to decide when to schedule important outdoor activities, meetings, and so on. Predicting future weather requires analyzing weather patterns over many past years, which in turn requires a huge amount of weather data covering the last 50 to 100 years. This data cannot be processed by traditional methods and requires large-scale data processing frameworks such as Hadoop.

2. PROBLEM STATEMENT

Humans have tried to predict the weather informally for thousands of years, and formally since the 19th century. Weather forecasting is generally done by collecting large amounts of data about the current and past state of the atmosphere at a given place, and then using scientific processes to project how it will change. In ancient times, weather forecasts were usually based on observed patterns of events. For example, people believed that if the sunset on a particular day was red, the following day would have fair weather. Such experience accumulated over generations to produce weather myths. However, not all of these predictions proved reliable, and the majority of them have been found to fail rigorous statistical testing. In the 20th century, advances in atmospheric physics laid the foundation of modern numerical weather prediction. The first computerized weather forecast was performed by a team led by the mathematician John von Neumann, which published the paper "Numerical Integration of the Barotropic Vorticity Equation" in 1950. Practical use of numerical weather prediction began in 1955, spurred by the development of programmable electronic computers.

To predict the weather accurately, we must first understand the processes in the atmosphere that produce the present weather at the place we are trying to forecast. This is done by observing quantities such as temperature, air pressure, humidity, cloud cover, wind direction and speed, precipitation, monsoon, etc. The more of the factors that affect our weather we take into account, and the more completely we observe them across the earth's surface, the better our picture of the processes producing the current weather. By observing how these factors change over time and comparing the changing patterns with historical patterns, we can forecast future weather conditions. If we can understand how the atmosphere changes over time in response to various factors, i.e., differences in warming across the earth's surface
from solar radiation, radiational cooling at night, and warming of the atmosphere due to latent heat released during condensation, and if we can write mathematical equations to express these changes, then the forecaster gains a useful tool. That tool is computer models that express how the atmosphere is changing and how it will look at some future time. The output from these models can help forecasters prepare their forecasts. To observe changes in factors such as temperature when predicting future weather conditions, we need huge amounts of past and present sensor data, and this data must be processed efficiently to make future predictions. Data of this size cannot be processed with traditional methods on ordinary computers.

3. OUR SOLUTION

Sensors deployed in many locations can be used to measure weather parameters. Weather forecasting departments have begun collecting and analyzing massive amounts of data such as temperature. As the number of sensors increases, the volume of the data will grow and its velocity will be high, so a scalable analytics tool is needed to process it. The traditional approach to processing this data is very slow. We propose leveraging MapReduce with Hadoop to process this massive amount of data. Hadoop is an open-source framework suitable for large-scale data processing, and the MapReduce programming model processes large datasets in a parallel, distributed manner. Processing the sensor data with MapReduce in the Hadoop framework removes the scalability bottleneck, and processing speed can increase rapidly when the work is spread across a multi-node distributed cluster. This project aims to build a data analytics engine that forecasts the weather from high-velocity, high-volume sensor temperature data using MapReduce on Hadoop. We will experiment with several prediction approaches and compare the accuracy of each of the following methods:

1. Mean weather calculation using MapReduce
2. Linear regression algorithm using MapReduce
3. Sliding window for prediction

1. Mean weather calculation using MapReduce: We would predict the weather by using MapReduce to calculate the mean temperature from a dataset covering around 100 years, and then predict future temperatures from these mean values. The input weather dataset contains values of temperature, time, place, etc. In the proposed method, the Map process creates a series of key-value pairs. The key is a Plain Old Java Object (POJO) made up of data fields such as place and date, and the value is the temperature. These key-value pairs are then shuffled into lists by key. The shuffled lists are fed to the Reduce tasks, which reduce the dataset volume by averaging the values for each key. The Reduce output is then a simple list of averaged key-value pairs.

2. Linear regression algorithm using MapReduce: Using a simple linear regression algorithm, we plan to create a training set for each station and day. The x values represent years (past years up to 2016) for the same day at a particular station, and the y values are the corresponding temperature readings at that weather station. From these values we will build a best-fitting line, or regression line.

The idea is to fit the hypothesis function below to the training values of x (years) and y (temperatures). The fitted function gives a straight line on the x-y plot:

$$y = h_\theta(x) = \theta_0 + \theta_1 x$$

Our goal is to find the values of $\theta_0$ and $\theta_1$ that give the best-fitting line through the plot. These values are found by minimizing the cost function

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

where $m$ is the number of training examples. This equation gives half the mean of the squared differences between each point in the graph and the line, i.e., the squared vertical distance between them. The lower the value, the more accurate the hypothesis. We will iterate over a range of values of $\theta_0$ and $\theta_1$ until we find the hypothesis with the lowest possible cost. This hypothesis is the best-fitting line, also called the regression equation.

Once the regression equation is available, we will substitute for x a value representing a year (past or future). The fitted line then gives the corresponding y value, which is the predicted temperature for the particular day and station that the line represents. The same procedure is applied to all 365 days of the year, giving 365 different lines, each able to predict the temperature of the day it represents. In this project we will predict temperatures for the five upcoming years (2016-2020). We will use the MapReduce model for the computation to achieve parallel execution and fast processing of the input data.

3. Sliding window for prediction [2]: To predict future weather conditions, we must use the variation in conditions over past years. The probability that the weather on the day under consideration will match the same day in the previous year is very low. The probability that it will match some day within the adjacent fortnight of the previous year, however, is very high. Therefore, within that fortnight of the previous year, we select a sliding window one week long. Each week-long position of the sliding window is compared with the current year's week under consideration, and the best-matching window is used to predict the weather conditions. Month-wise results will be computed for three years to check the accuracy.

We will implement the sliding window algorithm using the approaches suggested in [2], the paper on weather forecasting with the sliding window algorithm. The implementation will be done in R.

4. SOURCE OF OUR DATA

The National Climatic Data Center (NCDC) [1] provides weather datasets. The Daily Global Weather Measurements 1929-2009 (NCDC, GSOD) dataset is one of the biggest datasets available for weather forecasting, with a total size of around 20 GB. The United States National Climatic Data Center (NCDC), previously known as the National Weather Records Center (NWRC), in Asheville, North Carolina, is the world's largest active archive of weather data. The Center has more than 150 years of data on hand, with 224 gigabytes of new information added each day. NCDC archives 99 percent of all NOAA data, including over 320 million paper records, 2.5 million microfiche records, and over 1.2 petabytes of digital data residing in a mass storage environment. NCDC has satellite weather images dating back to 1960.

The proposed system uses the NCDC temperature dataset GHCN (Global Historical Climatology Network)-Daily, an integrated database of daily climate summaries from land surface stations across the globe.

5. REFERENCES
[1] National Centers for Environmental Information.
[2] P. Kapoor and S. S. Bedi. Weather forecasting using sliding window algorithm. 2013:5, 2013.
[3] P. A. Riyaz and S. M. Varghese. Leveraging MapReduce with Hadoop for weather data analytics. 17, 2015.
[4] T. White. Hadoop: The Definitive Guide. O'Reilly Media Inc., 4th edition.
