
Demand Forecasting using Data Analytics Techniques

Submitted in partial fulfilment of the requirements for the degree of

Bachelor of Technology
In
Mechanical Engineering
&
Mechanical Engineering with spec. in Energy
Engineering

by
PRANAV KALEVAR
15BEM0014
NIKHIL ANANTH NAYAK
15BEM0067
VARUN RAMESH
15BME0045

Under the guidance of


Dr. Rajyalakshmi G.
SMEC
VIT, Vellore.

April 2019
ACKNOWLEDGEMENTS

We would like to express our gratitude to our mentor, Dr. Rajyalakshmi G., and to the Heads of the Departments of Mechanical Engineering and of Thermal and Energy Engineering, who gave us the opportunity to take up and execute this project, and for their constant guidance throughout its duration, which helped us achieve the necessary results on time.

Secondly, we would like to acknowledge the knowledge, support and motivation provided by our families and friends; this project could not have been completed without their assistance at a number of key points.

Lastly, we would like to thank our Chancellor, Dr. G. Viswanathan, and Vellore Institute of Technology for providing us with a plethora of resources and knowledge over the last four years, and especially throughout the duration of this project.

Varun Ramesh
Nikhil Ananth Nayak
Pranav Kalevar

i
Executive Summary

The project entitled “Demand Forecasting Using Data Analytics Techniques” utilises historic sales data from a prominent furniture company to predict future fluctuations in sales and to provide forecast values over three- and four-year horizons using Python.

The code first cleans and pre-processes the company's furniture and office supplies sales data, pulling order-by-order sales records from the database and segregating them by category and date for efficient use.

Subsequently, non-linear analysis techniques such as ARIMA (Auto Regressive Integrated Moving Average), along with a specialised Python forecasting module named FBProphet, were employed to predict sales values for pre-specified periods. The resultant data was then plotted on graphs to depict the sales levels, and the models could also be used to predict the sales for each month of the subsequent years with a higher degree of accuracy than traditional forecasting techniques.

ii
CONTENTS Page No.
Acknowledgement i

Executive Summary ii

Table of Contents iii

List of Figures v

Abbreviations vi

1 INTRODUCTION 1

1.1 Objective 1

1.2 Background 1

1.3 Literature Survey 2

1.4 Gaps in Literature 8

2 PROJECT DESCRIPTION AND GOALS 10

2.1 Basic Description of the Project 10

2.2 Objective and Scope 10

3 TECHNICAL SPECIFICATIONS 11

3.1 Tools and Technologies Used 11

3.2 Technical Challenges 12

4 DESIGN APPROACH AND DETAILS 13

4.1 Techniques Used 13


4.2 Constraints, Alternatives and Trade-offs 15

5 SCHEDULE, TASKS AND MILESTONES 16

5.1 Timeline and Deliverables 16

iii
5.2 Gantt Chart 18

6 PROJECT DEMONSTRATION 19

7 RESULTS AND DISCUSSION 38

8 SUMMARY 41

REFERENCES 42


iv
List of Figures

Figure No. Title Page No.


4.1 Time Series Decomposition 13
5.1 Gantt Chart 18
7.1 ARIMA Forecast vs Actual Sales Level 38
7.2 4 Year ARIMA Forecast 38
7.3 Furniture vs Office Supplies Sales Forecast 39
7.4 Furniture vs Office Supplies Straight Line Trend 39
7.5 Diagnostics Plot 40

v
List of Abbreviations

AIC Akaike Information Criterion

AR Models Auto Regressive Models

ARIMA Auto Regressive Integrated Moving Average

ARMA Auto Regressive Moving Average

SARIMA Seasonal ARIMA

vi
1. INTRODUCTION

1.1. OBJECTIVE

The prime objective of the project was to devise and apply an efficient and accurate forecasting technique, implemented in Python code, to predict future fluctuations in demand. This would better equip the company to maintain the requisite inventory levels to deal with those fluctuations, ultimately leading to a significant rise in efficiency and to cost savings through decreased wastage of inventory and other resources.

1.2 BACKGROUND

Supply chain analysis is essentially a device used to measure the growth opportunities of a particular industry in a given location and time. Fundamentally, supply chain operations deal with everything from raw materials to the final product. The major processes involved are planning, information, sourcing, transportation, inventory management, production, warehousing and distribution.

Big data has huge potential in the field of SCM. The three main types of analytics are descriptive, predictive and prescriptive, each of which can influence supply chain processes. Descriptive analytics answers the question of what happened (in the past), and predictive analytics answers what is likely to happen. Prescriptive analytics combines the two: it goes beyond predicting future outcomes by also suggesting actions to benefit from the predictions and showing the implications of each decision option.

Inventory management is one of the major stages of the supply chain; it deals with controlling and overseeing inventory ordering, inventory storage and the amount of product made for sale. Determining the correct amount of stock to hold is required for purposes such as profit maximisation and can be done using inventory forecasting. Demand forecasting, in turn, helps us maintain the requisite inventory levels and thereby manage the inventory, and the supply chain, efficiently.

1
1.3 LITERATURE SURVEY

In 2017, Shivani Aggarwal noted in her paper titled “Issues in supply chain planning of Fruits and Vegetables in Agri-Food” that the planning of fruit and vegetable supply chains is quite poor, with bad collaboration between partners, large post-harvest losses due to lack of proper infrastructure and cold chain facilities, multiple intermediaries causing fragmented supply chains, improper monitoring, poorer produce quality, poor knowledge among farmers and farmers' hesitation to enter contractual agreements with retail chains, all of which cause lower incomes. These are some of the major issues highlighted. Ways to enhance supply chain planning would be to facilitate vertical coordination among farmers through cooperatives or associations, increase contract farming, keep farmers better informed and transport produce on time. To minimise waste and improve the effectiveness of agri-chains, demand forecasting should be done. Effective inventory management must incorporate segmentation of fresh produce, distribution channels, supply channels and customers, and maintain price stability while keeping quality consistent.

Andrew C. Yao and John G. Carlson, studying real-time data communication in inventory management, observed that the requirement for cost-effective, real-time information systems has always been paramount for manufacturing and distribution systems. Demands for quality products with shorter lead times at the best prices have increased, and indirect labour and equipment costs take up a significant share of value added. From punch cards to barcoding, distribution systems have evolved over the years, and the next upgrade, RFDC (Radio Frequency Data Communications), is discussed.

The benefits of RFDC hardware and software are better visibility, paperless transactions, and the timely, quality information necessary for inventory control, customer satisfaction and profitable operations. The distribution and inventory process is described briefly. Compared to previous forms of data communication this is revolutionary, as data is transmitted quickly and cost-effectively to all member computers on the network, and the host computer can oversee all inventory operations after each event is logged. The devices can even be hand-held. There is also a quicker response and, as a result, better labour relations.

2
This process led to better material handling, less time spent on barcode-related activities, lower product costs and more products manufactured per hour. It also leads to much higher inventory accuracy and has the advantage of being environmentally friendly, as paper is completely eliminated.

In “A review and analysis of supply chain operations reference (SCOR) model” by Samuel H. Huan, Sunil K. Sheoran and Ge Wang, it is established that rapid growth in Internet awareness among the customer base demands a strong change-management strategy. The first issue to be addressed is market analysis; the second is integration synchronisation, so that firms develop organisational and technological capabilities together; and the third is network modelling tools to support management decisions, as they explain the dynamics of each firm's supply chain.
The most effective process is the analytical hierarchy process (AHP), essentially a decision-making tool that helps describe operations by decomposing a complex problem into a multi-level hierarchical structure of alternatives, objectives and sub-criteria.

An important paper utilised for this project was “Big Data Analytics-enabled Supply Chain Transformation: A Literature Review” by Mondher Feki, Imed Boughzala and Samuel Fosso Wamba, which stated that for supply chain planning companies use predictive tools such as time series, causal forecasting or data mining methods. For sourcing, companies use a prescriptive approach, applying the analytical hierarchy process at the strategic level to evaluate and select key suppliers, and even use game theory to define auction rules and prescribe contracts. In the make process, companies use a prescriptive approach to determine plant capacity, combined with a predictive approach.
Combining analytics techniques enables process optimisation, shop floor management and manufacturing logistics. Data mining and visualisation tools allow information relevant to decision makers to be generated. Using SCM initiatives such as TQM, JIT and SPC together with big data analytics makes it possible to monitor and control data quality in a supply chain.

Darya Plinere and Arkady Borisov, in “Inventory Management Improvement”, state that inventory management is essential to all companies: good inventory management can decrease a company's costs and help it function better, and timely reaction to changes in demand yields better results.

3
However, parameters such as demand forecasts, safety stock and reorder points, which are also crucial for inventory control, were not calculated.

“Supply Chain Management of linkage of agricultural Technology Management” by Kamni Paia Biam and Utpal Barman stated that Mahagrapes flourished as a company because the once crowded grape market is now segregated accordingly and the whole industry is much more varied. Better infrastructure has come into play, and many more players trying to enter this market are making it more efficient than before.
Because of the increased competition there has also been saturation in the agricultural market, which makes people strive very hard to earn a good income from it.

K. Venkata Subbaiah, K. Narayana Rao and K. Nookesh Babu, in “Supply Chain Management in a Dairy Industry”, elaborated that the processing unit downstream of the chiller segregates the milk into different products: almost 46% of the total milk is consumed as liquid milk, followed by 27% for ghee, 6.5% for butter and over 7% for curd. In this way many products are obtained from one commodity and sold at different prices.
The paper considered the different shelf lives of each commodity and the ways to preserve them, but did not consider the waste generated from these in the past.

In the paper titled “The impact of increasing demand visibility on production and inventory control efficiency” by Johanna Smaros, Juha-Matti Lehtonen, Patrik Appelqvist and Jan Holmstrom, the authors build on the visibility models presented by Lee et al. (2000) by focusing on VMI and non-VMI customers. To keep the study realistic, a fast-moving consumer goods manufacturer was considered. The simulation model included one VMI and two non-VMI distributors; however, there was no VMI between the distributor and the retail outlets. Using certain variables, the impact of the distributors' VMI adoption rate on the manufacturer's production efficiency was examined.
The value of this sort of data sharing has been established in numerous studies, but most of the research has concentrated on the ideal case of the manufacturer having access to data from every downstream party. Most companies fail to benefit from VMI because they implement only the sales and execution part of it, while the demand information is often neglected.

4
In “Supply Chain Co-ordination Models – A Literature Review”, authored by Burra Karuna Kumar, Dega Nagaraju and S. Narayanan, the main aim was to present the literature and research available on supply chain co-ordination models; the paper also attempts to explore and touch upon various co-ordination models suggested by other researchers. It considered 142 articles in the SCM domain published from 2000 to 2016 and classified them into two categories: two-level models and three-level models.
The paper concisely summarises a plethora of information from those 142 research papers for both two-echelon and three-echelon models.

“Supply Chain Management – Theory, Practices and Challenges” by John Storey, Caroline Emberson, Janet Godsell and Alan Harrison aims to critically assess current developments in the theory and practice of supply chain management and to identify barriers, possibilities and key trends. The paper involves a three-year extensive study of six supply chains encompassing a total of 72 companies in Europe. The six supply chains belonged to focal firms that were all blue-chip companies operating on an international scale. The paper deals with a number of influencing factors in any supply chain, identifying trends, inhibitors and enablers along with widely utilised theories and practices.
It also identifies that there are substantial gaps between the theory and the practices followed in SCM, but fails to address how to solve the problem; the research also fails to identify who actually controls SCM in a firm.

The paper titled “Scope of Supply Chain Management in Fruits and Vegetables in India” by Rais M and Sheoran A deals with the use of supply chain management for perishables such as fruits and vegetables in India. The research covers the scenario, problems, needs and improvement measures of fruit and vegetable supply chain management in the country.
The paper acknowledges that there is a huge gap between per capita demand and supply despite high production rates, owing to enormous wastage during post-harvest storage and handling, and it identifies key factors and problems associated with the production and delivery of perishable goods such as fruits and vegetables in India.
However, it does not provide sufficient data and models to offer an efficient solution for the transport of fruits and vegetables in India.
In “Sales Forecasting using Neural Networks” by Thiesing, F. and Vornberger, O., the paper shows that neural networks have huge potential in the field of predictive

5
analytics, specifically demand forecasting. It presents a function devised to calculate demand using neural networks and compares this method with the naïve approach and with statistical methods, finding the neural network approach to be the best predictor, followed by the statistical methods and finally the naïve approach.
Neural networks coupled with machine learning adapt to the situation and could soon make other methods obsolete. The approach even considers holidays and other assumptions, resulting in desirable outcomes, and proves to be more complex but better than statistical methods.

Matthew A. Waller and Stanley E. Fawcett, in “Data Science, Predictive Analytics, and Big Data: A Revolution That Will Transform Supply Chain Design and Management”, explore the significance of data science, predictive analytics and big data in supply chain management, urging readers to explore the potential and to contribute further research. They describe what each term means, give a brief description of analytics for a newcomer to the field and present everything in an understandable and concise manner.
Rather than offering actual proof of its transformation of SCM, the paper talks about its future potential and urges readers to explore the possibilities.

In “Probabilistic Demand Forecasting at Scale” by Joos-Hendrik Bose, Valentin Flunkert, Jan Gasthaus, Tim Januschowski, Dustin Lange, David Salinas, Sebastian Schelter, Matthias Seeger and Yuyang Wang, the authors investigate the best machine learning system for probabilistic demand forecasting. Consistency is key to a forecasting system. Many challenges were encountered, such as highly unpredictable sales data: new data can appear out of nowhere, yet an efficient ML system needs a previous trend. Scale also matters, since the procedures for small and large scale are completely different. The different components of the ML system are data integration, forecasting, evaluation, output and an analysis/research component. A single code base was formed to analyse the demand forecasts, and a focus on trial and error is needed for the best outcomes.
The paper proposes an approach to building an ML pipeline in which simple baseline methods that produce forecasts are established first, and accuracy is then improved based on the successes and failures at each step. This method is considered best for systems that must be easily maintained.

6
Maurizio Rossi and Davide Brunelli, in “Electricity demand forecasting of single residential units”, describe electricity demand forecasting for future prediction. Demand-side management is a key concept for smart metering, and a modern embedded MCU was used to provide these advanced features. Remaining accurate and on par with the forecast is extremely challenging. Two types of data were compared, national consumption and household electricity consumption, and household consumption was found to be much more irregular. Finally, an exponential smoothing method was used to address this.
The coefficient of the exponential smoothing method needed to be tuned and modified with respect to changes in the data. Comparing the different data sets shows that the trend for electricity differs in each field, and a large error was obtained, indicating the unpredictability of users' electricity consumption.

“Short-Term Load Forecasting Methods: An Evaluation Based on European Data” by J. W. Taylor and P. E. McSharry uses electricity consumption datasets from ten European countries. The paper argues that, for short-term forecasting, multivariate modelling is not considered practical. Artificial neural networks are very prominent in the forecasting literature, as ANNs can capture non-linear and non-parametric features. The paper presents the electricity demand data, the methods included and post-sample comparison results. Using ARIMA, unusual observations in the data were smoothed out by taking averages of observations.
Ten intraday electricity demand time series were modelled using the new periodic AR approach. The best performing method was Holt-Winters exponential smoothing, followed by the PCA method.

7
1.4 GAPS IN RESEARCH

a] There is a lack of empirical research on the significance of vegetables and fruits in the agri-food supply chain.

b] Although the advantages of data analytics techniques for big data are highlighted and suggested, exactly how to improve operational performance using SCM initiatives like TQM and JIT production has not been explained.

c] It is an old system which used older versions of computers and technology as there have
been many advancements since.

d] The rank reversal problem in AHP is encountered under certain conditions like on addition
of exact copies of alternatives.

e] Parameters like demand forecasts, safety stock and reorder points have not been calculated.

f] Because of increased competition there also has been saturation in most industries.

g] The main characteristic of the simulation model studied was that the manufacturer gained
access to distributor sell-through data and thus could remove the impact of the distributors’
order batching on the demand information used for production and inventory control.

h] Sufficient research and data were not available for three-echelon models to a certain extent, especially in terms of trade credit and quantity discounts.

i] The paper identifies that there are substantial gaps between the theory and the practices followed in SCM, but fails to address how to solve the problem. The research also fails to identify who actually controls SCM in a firm.

j] The paper doesn’t provide sufficient data and models to provide an efficient solution for
the transport problem of fruits and vegetables in India.

8
k] The prototype program written in the paper works for a small subset of the supermarket's items; if it were integrated for all the store's items it would become an immensely complex program that cannot be easily understood.

l] The paper signifies that there is undoubtedly a lot of potential for data science in SCM, but it does not offer enough proof; more than actually giving evidence, it urges readers to do so. It is more a belief than a proof.

m] A tremendous amount of knowledge about the software is needed. A single mistake in the choice of strategy or methodology can be very costly, so utmost care must be taken. Apache Spark, the tool used there, needs to become much more user friendly and requires more planning while using it. Spark also needs optimisation of data frames, which support different features for vectors, matrices and the relational experience.

n] The smart meters available now must be improved and upgraded, and the data provided should be a starting point for new ideas.

o] The Holt-Winters method seems to be the most accurate, but it has only been shown to be accurate for short-term forecasts.

9
2. PROJECT DESCRIPTION AND GOALS

2.1 BASIC DESCRIPTION OF THE PROJECT

In this project, we primarily aim to build and execute Python code that collects sales data and builds an accurate forecast of demand levels for the upcoming months and years for the furniture company.
We researched numerous forecasting techniques that utilise time series data to determine which method would best serve the purpose of the project, and concluded that an AR-family model would be required to accurately predict the company's upcoming demand.

2.2 OBJECTIVE AND SCOPE

a] To find out how to optimise inventory management for higher accuracy of inventory control and higher profitability in the industry.
b] To clean, process and sort the data obtained so that it can be used effectively.
c] To use data analytics methods to forecast stocking requirements for effective inventory management on the data collected by us.
d] To use methods such as Moving Average, Exponential Smoothing, AR, ARMA and ARIMA to accurately forecast demand and the requisite inventory levels in the future.

10
3. TECHNICAL SPECIFICATIONS

3.1 Tools and Technologies Used

We utilised a number of tools, techniques and software modules to assist with designing, modelling and optimising an effective and appropriate forecast for both short-term and long-term periods. Open-source dataset repositories such as GitHub, Kaggle and UCI were used for data mining. The entire process of data analytics and forecasting was carried out in Python.

Python has a number of packages which were used to implement techniques for
processing the data efficiently, analysing it and then building a forecast, some of these
packages/modules are:

a] Pandas
Pandas is a software library written for Python which is extensively used for data
manipulation and analysis. In particular, it operates using data structures, algorithms
and operations for manipulating numerical tables and time series which is critical for
our project. Some salient features of pandas which were utilised repeatedly throughout
the project are date range generation, frequency conversion, moving window statistics,
date shifting/lagging, data set merging and joining.

b] NumPy
NumPy is another library in Python which is used for adding support for large, multi-
dimensional arrays and matrices along with a huge collection of high-level math
functions to operate on and edit these arrays.
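
As a small illustrative sketch (synthetic values, not project data), the array handling described above looks like this:

import numpy as np

# Each row holds a hypothetical (furniture, office supplies) monthly sales pair
sales = np.array([[480.19, 285.36],
                  [367.93, 63.04],
                  [857.29, 391.18]])

print(sales.shape)         # (3, 2): a two-dimensional array
print(sales.mean(axis=0))  # column-wise means
print(np.sqrt(((sales[:, 0] - sales[:, 1]) ** 2).mean()))  # vectorised root-mean-square difference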

c] FBProphet
FBProphet is used for forecasting time series data based on an additive model in which non-linear trends are fit with yearly, weekly and daily seasonality plus holiday effects. It works best with time series that have strong, well-defined seasonal effects in the historical data, and therefore this module works well with our dataset. Prophet is also robust to missing data and shifts in the trend, and handles outliers well.
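
A hedged sketch of the Prophet usage pattern relied on in this project is shown below; the column names 'ds' and 'y' are required by the library, and the data here is a synthetic placeholder rather than the store dataset.

import pandas as pd
from fbprophet import Prophet

history = pd.DataFrame({
    'ds': pd.date_range('2014-01-01', periods=48, freq='MS'),
    'y': [300.0 + 50 * (i % 12) for i in range(48)],  # placeholder monthly sales values
})

model = Prophet(interval_width=0.95)   # 95% uncertainty interval, as used later in the code
model.fit(history)

future = model.make_future_dataframe(periods=36, freq='MS')  # extend three years ahead
forecast = model.predict(future)
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())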

11
3.2 Technical Challenges Faced

One of the first technical challenges associated with the project was the general
unavailability of ideal datasets with respect to the project and insufficiency of most data
sets we obtained online in terms of size and quality. The data collection process
therefore required a surplus of time and effort for obtaining the perfect dataset with
respect to the problem statement and initial direction of the project. We also faced a
tough time finding research papers with a high degree of relevance to the project during
the literature survey phase of research.

Another problem we faced continuously through the initial phase of the project was identifying and defining the problem properly, as there was no clear definition of the problem statement; for context, the title of the project was changed thrice before arriving at the final topic, “Demand Forecasting using Data Analytics Techniques”.

A majority of the datasets we reviewed online from sources such as Kaggle and GitHub
were classified as “bad data” and had a number of issues, namely, incomplete data,
inaccurate data, poor data entry and duplicated data to name a few.

The finalised data set also had to be cleaned and processed extensively before we could
conduct an analysis and it still contained a large number of residuals.

The code also utilised FBProphet, a recently released forecasting module from Facebook, which helps produce highly accurate demand forecasts by allowing the code to account for holidays specified by the user, so that surges or falls in sales can be better predicted. As the module had been introduced very recently, a number of bugs had to be fixed on implementation, and there was a lack of online resources to help us deal with these errors in the code.

12
4. DESIGN APPROACH AND DETAILS

4.1 Techniques and Methods Used

a] Time Series Analysis

Time series analysis consists of methods for organising time series data so that meaningful analysis can be performed. Time series forecasting predicts future values of a variable from its own past values, following the principle of "let the data speak for itself".
Time series methods are practical for non-stationary data such as the retail sales used in this context, and several such methods were chosen to forecast retail sales.

Fig 4.1 Decomposition of the observed time series into trend, seasonality and residuals
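
The decomposition shown in Fig 4.1 can be reproduced with statsmodels; the sketch below assumes a monthly sales series indexed by date (the values here are placeholders, not the store data).

import pandas as pd
import statsmodels.api as sm

# Placeholder monthly series with a repeating yearly pattern
y = pd.Series([480, 368, 857, 567, 432, 695, 601, 457, 992, 769, 980, 1532] * 4,
              index=pd.date_range('2014-01-01', periods=48, freq='MS'), dtype=float)

decomposition = sm.tsa.seasonal_decompose(y, model='additive')
print(decomposition.trend.dropna().head())   # long-term level
print(decomposition.seasonal.head())         # repeating yearly component
print(decomposition.resid.dropna().head())   # leftover noise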

b] Data Pre-Processing and Cleaning

The data is first extracted from the Excel file and needs to be cleaned for better analysis. This is done by removing unwanted values and arranging the sales by date. Since there is a lot of data to organise, this can get confusing, so to simplify it the average sales value for each month is taken, with the start date of the month used to identify it.
To understand how furniture sales behave, we plot the trend of the sales in each year. From the graph we can conclude that sales are very low

13
at the beginning of the year and become relatively higher towards the end of the year. To get a better picture we look at trend, seasonality and noise using time series decomposition.
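
A hedged sketch of these cleaning steps follows; the file name and column names mirror the project dataset, but the path is an assumption.

import pandas as pd

raw = pd.read_excel('superstore.xls')                # hypothetical local copy of the dataset
furn = raw.loc[raw['Category'] == 'Furniture']       # keep only furniture orders

# Keep only the columns needed for the forecast and order the rows chronologically
furn = furn[['Order Date', 'Sales']].sort_values('Order Date')

# One value per order date: total sales on that day
daily = furn.groupby('Order Date')['Sales'].sum()

# Average daily sales per month, indexed by the first day of each month
monthly = daily.resample('MS').mean()
print(monthly.head())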

c] ARIMA

ARIMA stands for Auto Regressive Integrated Moving Average and is usually denoted ARIMA(p, d, q), where p is the order of the autoregressive part, d the degree of differencing and q the order of the moving average part; together these capture the trend and noise in the data. A grid search is used to find the set of parameters that gives the best performance for the model: the combination yielding the lowest AIC is chosen, since the AIC quantifies both the goodness of fit and the simplicity of the model, so the output with the lower AIC is the better option. These results are then plotted in a graph.
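
The grid search described above can be sketched as follows; the series y is a synthetic placeholder standing in for the monthly furniture sales, and only the values 0 and 1 are tried for each parameter, as in the project code.

import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
y = pd.Series(rng.uniform(300, 1200, 48),
              index=pd.date_range('2014-01-01', periods=48, freq='MS'))

p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(P, D, Q, 12) for P, D, Q in itertools.product(p, d, q)]

best_aic, best_order, best_seasonal = float('inf'), None, None
for order in pdq:
    for seasonal_order in seasonal_pdq:
        try:
            model = sm.tsa.statespace.SARIMAX(y, order=order, seasonal_order=seasonal_order,
                                              enforce_stationarity=False, enforce_invertibility=False)
            results = model.fit(disp=False)
            if results.aic < best_aic:   # keep the combination with the lowest AIC
                best_aic, best_order, best_seasonal = results.aic, order, seasonal_order
        except Exception:
            continue

print('Best model: ARIMA{}x{} with AIC {:.2f}'.format(best_order, best_seasonal, best_aic))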

d] Validating Forecasts

The plot of sales in each year is shown again, and the forecast error for the 2017 sales is then examined. When the forecast error is calculated, it shows that there is not much error and the forecast coincides with the original sales values. This is then validated by calculating the root mean squared error of the forecasts.
The mean squared error (MSE) is the average of the squared differences between the estimated values and the actual values; the smaller the MSE, the better the fit.
The root mean squared error (RMSE) of 151.64 is small compared with the average sales range of roughly 400 to 1200, which indicates a good model.
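
The error metrics used for validation reduce to a few lines; the values below are placeholders standing in for the observed 2017 sales and the ARIMA predictions.

import numpy as np
import pandas as pd

y_truth = pd.Series([741.0, 715.4, 1188.4, 798.2])        # placeholder observed monthly sales
y_forecasted = pd.Series([700.5, 680.1, 1100.0, 850.3])   # placeholder forecast values

mse = ((y_forecasted - y_truth) ** 2).mean()   # mean of squared forecast errors
rmse = np.sqrt(mse)                            # same units as the sales values
print('MSE = {:.2f}, RMSE = {:.2f}'.format(mse, rmse))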

e] Time series of Furniture

The observed furniture sales data is displayed up to 2017, and the forecast extends to 2022; accuracy naturally decreases further into the future. To examine the data more closely, we bring in an additional category and compare furniture sales against office supplies sales.

14
Putting the furniture and office supplies sales data into one plot, we find that office supplies generated many more orders than furniture over time. When plotted against each other, furniture and office supplies sales follow a similar trend, although average monthly furniture sales are higher because furniture is priced higher than office supplies. We can also see when office supplies sales first surpassed furniture sales.
Prophet was then installed to display patterns on different time scales such as yearly, weekly and daily; it can even account for holidays and apply its own changepoints. With this we can compare furniture and office supplies sales, find the point where office supplies sales surpass furniture sales, and examine the yearly and monthly improvement.
We can conclude that the worst month for furniture is April and for office supplies is February, while the best month for furniture is December and for office supplies is October.
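
The crossover check described above amounts to finding the first month in the merged table where office supplies sales exceed furniture sales; the sketch below uses placeholder values and an equivalent pandas lookup rather than the exact indexing used in the project notebook.

import pandas as pd

sto = pd.DataFrame({
    'Order Date': pd.date_range('2014-01-01', periods=6, freq='MS'),
    'furniture_sales': [480.2, 367.9, 857.3, 567.5, 432.0, 695.1],   # placeholder monthly averages
    'office_sales': [285.4, 63.0, 391.2, 464.8, 324.3, 715.9],
})

# Earliest month in which office supplies outsold furniture
crossover = sto.loc[sto['office_sales'] > sto['furniture_sales'], 'Order Date'].min()
print('Office supplies first outsold furniture in', crossover.date())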

4.2 Constraints, Alternatives and Trade-offs

As mentioned earlier, the main constraint associated with the entire project was the lack
of clean and relevant data for constructing a proper forecast from scratch.

A number of assumptions also had to be made before constructing the forecast such as:
a] There are absolutely no time anomalies in the dataset.
b] The model parameters remain constant throughout time.
c] There are no seasonal dummies in the provided dataset.
d] There are no level shifts.
e] The entire error process is homoscedastic (or constant) throughout time.
f] There are no known/suspected predictor variables.

15
5. SCHEDULE, TASKS AND MILESTONES

5.1 Timeline and Deliverables

The project was split into five main phases, each with a permissible maximum duration of one month:

a] Zeroth Phase

The first major task of the project was to come up with a suitable title combining data analytics and supply chain management, as well as devising a problem statement and objective to define the initial direction of the project. This process took a total of two weeks, and an extra week was set aside to allow for any revisions to the topic.
After finalising the topic, we began the literature survey phase, in which we collected a number of research papers; 16 were selected for the survey from the collected pool based on their relevance to the topic.

b] First Phase

The first phase dealt with the construction of a preliminary design and methodology which the project would abide by. The major milestones of this phase were a rudimentary hypothesis of the desired or expected outcomes along with a well-defined framework. This phase spanned a total of four weeks.

c] Second Phase

The second phase involved data collection and pre-processing; the data was collected from online data repositories, namely GitHub, Data World and Kaggle. Before a dataset was chosen, it was vetted thoroughly for inconsistencies, as bad data could lead to incorrect conclusions. The selected dataset was then cleaned and pre-processed to convert the raw data into sets which could be analysed with relative ease and efficiency. This phase was executed over a period of four weeks.

16
d] Third Phase

The third phase of the project dealt with the analysis of the pre-processed dataset and the construction of the forecast using ARIMA and FBProphet. The tangible milestones of this phase were the construction of the required forecasts and the acquisition of numerical values for future sales from those forecasts. This phase also spanned four weeks.

e] Fourth Phase

The fourth and final phase dealt with the condensation of the conclusions obtained from the forecasts, correlograms and plots, along with the preparation and documentation of the final report and poster. This phase also lasted four weeks.

17
5.2 Gantt Chart

Fig 5.1 Gantt Chart for project timeline

18
6. PROJECT DEMONSTRATION

The code was executed for the given problem statement using Python and is shown below:

In [59]:

import warnings
import itertools
import xlrd
import numpy as np
import matplotlib.pyplot as plt
warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')
import pandas as pd
import statsmodels.api as sm
import matplotlib

matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'red'

In [60]:

z = pd.read_excel(r"C:\Users\Pranav Kalevar\furniture superstore\superstore.xls")
furn = z.loc[z['Category'] == 'Furniture']

In [61]:

furn['Order Date'].min()

Out[61]:

Timestamp('2014-01-06 00:00:00')

In [62]:

furn['Order Date'].max()

Out[62]:

Timestamp('2017-12-30 00:00:00')

In [63]:

cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID', 'Customer Name',
        'Segment', 'Country', 'City', 'State', 'Postal Code', 'Region', 'Product ID',
        'Category', 'Sub-Category', 'Product Name', 'Quantity', 'Discount', 'Profit']
furn.drop(cols, axis=1, inplace=True)
furn = furn.sort_values('Order Date')

In [64]:

furn.isnull().sum()

Out[64]:

Order Date 0
Sales 0

19

In [65]:

furn = furn.groupby('Order Date')['Sales'].sum().reset_index()


furn.head()
Out[65]:

Order Date Sales

0 2014-01-06 2573.820

1 2014-01-07 76.728

2 2014-01-10 51.940

3 2014-01-11 9.940

4 2014-01-13 879.939

In [66]:

furn = furn.set_index('Order Date')


furn.index
Out[66]:

DatetimeIndex(['2014-01-06', '2014-01-07', '2014-01-10', '2014-01-11',


'2014-01-13', '2014-01-14', '2014-01-16', '2014-01-19',
'2014-01-20', '2014-01-21',
...
'2017-12-18', '2017-12-19', '2017-12-21', '2017-12-22',
'2017-12-23', '2017-12-24', '2017-12-25', '2017-12-28',
'2017-12-29', '2017-12-30'],
dtype='datetime64[ns]', name='Order Date', length=889, freq=None)

In [67]:

y = furn['Sales'].resample('MS').mean()

In [68]:

y.plot(figsize=(16, 4))
plt.show()

20

In [74]:

from pylab import rcParams


rcParams['figure.figsize'] = 22,8

decomposition = sm.tsa.seasonal_decompose(y, model='additive')


fig = decomposition.plot()
plt.show()

In [75]:

p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p, d, q))]

print('Examples of parameter combinations for Seasonal ARIMA...')


print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[1]))
print('SARIMAX: {} x {}'.format(pdq[1], seasonal_pdq[2]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[3]))
print('SARIMAX: {} x {}'.format(pdq[2], seasonal_pdq[4]))

Examples of parameter combinations for Seasonal ARIMA...


SARIMAX: (0, 0, 1) x (0, 0, 1, 12)
SARIMAX: (0, 0, 1) x (0, 1, 0, 12)
SARIMAX: (0, 1, 0) x (0, 1, 1, 12)
SARIMAX: (0, 1, 0) x (1, 0, 0, 12)

21

In [76]:

for param in pdq:
    for param_seasonal in seasonal_pdq:
        try:
            mod = sm.tsa.statespace.SARIMAX(y,
                                            order=param,
                                            seasonal_order=param_seasonal,
                                            enforce_stationarity=False,
                                            enforce_invertibility=False)
            results = mod.fit()
            print('ARIMA{}x{}12 - AIC:{}'.format(param, param_seasonal, results.aic))
        except:
            continue

22

ARIMA(0, 0, 0)x(0, 0, 0, 12)12 - AIC:769.0817523205916
ARIMA(0, 0, 0)x(0, 0, 1, 12)12 - AIC:1446.5593227130305
ARIMA(0, 0, 0)x(0, 1, 0, 12)12 - AIC:477.7170130920218
ARIMA(0, 0, 0)x(1, 0, 0, 12)12 - AIC:497.23144334183365
ARIMA(0, 0, 0)x(1, 0, 1, 12)12 - AIC:1172.208674145885
ARIMA(0, 0, 0)x(1, 1, 0, 12)12 - AIC:318.0047199116341
ARIMA(0, 0, 1)x(0, 0, 0, 12)12 - AIC:720.9252270758095
ARIMA(0, 0, 1)x(0, 0, 1, 12)12 - AIC:2900.357535652858
ARIMA(0, 0, 1)x(0, 1, 0, 12)12 - AIC:466.56074298091255
ARIMA(0, 0, 1)x(1, 0, 0, 12)12 - AIC:499.574045803366
ARIMA(0, 0, 1)x(1, 0, 1, 12)12 - AIC:2513.1394870316744
ARIMA(0, 0, 1)x(1, 1, 0, 12)12 - AIC:319.98848769468657
ARIMA(0, 1, 0)x(0, 0, 0, 12)12 - AIC:677.894766843944
ARIMA(0, 1, 0)x(0, 0, 1, 12)12 - AIC:1250.2320272227237
ARIMA(0, 1, 0)x(0, 1, 0, 12)12 - AIC:486.63785672282035
ARIMA(0, 1, 0)x(1, 0, 0, 12)12 - AIC:497.78896630044073
ARIMA(0, 1, 0)x(1, 0, 1, 12)12 - AIC:1550.2003231687213
ARIMA(0, 1, 0)x(1, 1, 0, 12)12 - AIC:319.7714068109211
ARIMA(0, 1, 1)x(0, 0, 0, 12)12 - AIC:649.9056176816999
ARIMA(0, 1, 1)x(0, 0, 1, 12)12 - AIC:2683.886393076119
ARIMA(0, 1, 1)x(0, 1, 0, 12)12 - AIC:458.8705548482932
ARIMA(0, 1, 1)x(1, 0, 0, 12)12 - AIC:486.18329774427826
ARIMA(0, 1, 1)x(1, 0, 1, 12)12 - AIC:3144.981130223559
ARIMA(0, 1, 1)x(1, 1, 0, 12)12 - AIC:310.75743684172994
ARIMA(1, 0, 0)x(0, 0, 0, 12)12 - AIC:692.1645522067712
ARIMA(1, 0, 0)x(0, 0, 1, 12)12 - AIC:1343.1777877543473
ARIMA(1, 0, 0)x(0, 1, 0, 12)12 - AIC:479.46321478521355
ARIMA(1, 0, 0)x(1, 0, 0, 12)12 - AIC:480.92593679352177
ARIMA(1, 0, 0)x(1, 0, 1, 12)12 - AIC:1243.8088413604426
ARIMA(1, 0, 0)x(1, 1, 0, 12)12 - AIC:304.4664675084554
ARIMA(1, 0, 1)x(0, 0, 0, 12)12 - AIC:665.779444218685
ARIMA(1, 0, 1)x(0, 0, 1, 12)12 - AIC:82073.66352065578
ARIMA(1, 0, 1)x(0, 1, 0, 12)12 - AIC:468.3685195814987
ARIMA(1, 0, 1)x(1, 0, 0, 12)12 - AIC:482.5763323876739
ARIMA(1, 0, 1)x(1, 0, 1, 12)12 - AIC:nan
ARIMA(1, 0, 1)x(1, 1, 0, 12)12 - AIC:306.0156002122138
ARIMA(1, 1, 0)x(0, 0, 0, 12)12 - AIC:671.2513547541902
ARIMA(1, 1, 0)x(0, 0, 1, 12)12 - AIC:1205.945960251849
ARIMA(1, 1, 0)x(0, 1, 0, 12)12 - AIC:479.2003422281134
ARIMA(1, 1, 0)x(1, 0, 0, 12)12 - AIC:475.34036587848493
ARIMA(1, 1, 0)x(1, 0, 1, 12)12 - AIC:1269.52639945458
ARIMA(1, 1, 0)x(1, 1, 0, 12)12 - AIC:300.6270901345443
ARIMA(1, 1, 1)x(0, 0, 0, 12)12 - AIC:649.0318019835024
ARIMA(1, 1, 1)x(0, 0, 1, 12)12 - AIC:101786.44160210912
ARIMA(1, 1, 1)x(0, 1, 0, 12)12 - AIC:460.4762687610111
ARIMA(1, 1, 1)x(1, 0, 0, 12)12 - AIC:469.52503546608614
ARIMA(1, 1, 1)x(1, 0, 1, 12)12 - AIC:2651.570039388935
ARIMA(1, 1, 1)x(1, 1, 0, 12)12 - AIC:297.7875439553055

24

In [77]:

mod = sm.tsa.statespace.SARIMAX(y,
order=(1, 1, 1),
seasonal_order=(1, 1, 0, 12),
enforce_stationarity=False,
enforce_invertibility=False)

results = mod.fit()

print(results.summary().tables[1])

==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.0146      0.342      0.043      0.966      -0.655       0.684
ma.L1         -1.0000      0.360     -2.781      0.005      -1.705      -0.295
ar.S.L12      -0.0253      0.042     -0.609      0.543      -0.107       0.056
sigma2      2.958e+04   1.22e-05   2.43e+09      0.000    2.96e+04    2.96e+04
==============================================================================

In [81]:

results.plot_diagnostics(figsize=(20,12))
plt.show()

25

In [83]:

pred = results.get_prediction(start=pd.to_datetime('2017-01-01'), dynamic=False)


pred_ci = pred.conf_int()

ax = y['2014':].plot(label='observed')
pred.predicted_mean.plot(ax=ax, label='Forecast', alpha=.7, figsize=(16, 9))

ax.fill_between(pred_ci.index,
pred_ci.iloc[:, 0],
pred_ci.iloc[:, 1], color='k', alpha=.2)

ax.set_xlabel('Date')
ax.set_ylabel('Furniture Sales')
plt.legend()

plt.show()

In [84]:

y_forecasted = pred.predicted_mean
y_truth = y['2017-01-01':]

# Compute the mean square error


MSE = ((y_forecasted - y_truth) ** 2).mean()
print('The Mean Squared Error of the forecasts is {}'.format(round(MSE, 2)))

The Mean Squared Error of the forecasts is 22993.58

In [85]:

print('The Root Mean Squared Error of our forecasts is {}'.format(round(np.sqrt(MSE), 2)))

The Root Mean Squared Error of our forecasts is 151.64

In [86]:

furn = z.loc[z['Category'] == 'Furniture']
office = z.loc[z['Category'] == 'Office Supplies']
26

In [87]:

furn.shape, office.shape
Out[87]:

((2121, 21), (6026, 21))

In [89]:

cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID', 'Customer Name',
        'Segment', 'Country', 'City', 'State', 'Postal Code', 'Region', 'Product ID',
        'Category', 'Sub-Category', 'Product Name', 'Quantity', 'Discount', 'Profit']
furn.drop(cols, axis=1, inplace=True)
office.drop(cols, axis=1, inplace=True)

furn = furn.sort_values('Order Date')


office = office.sort_values('Order Date')

furn = furn.groupby('Order Date')['Sales'].sum().reset_index()


office = office.groupby('Order Date')['Sales'].sum().reset_index()

In [90]:

furn.head()

Out[90]:

Order Date Sales

0 2014-01-06 2573.820

1 2014-01-07 76.728

2 2014-01-10 51.940

3 2014-01-11 9.940

4 2014-01-13 879.939

In [91]:

office.head()

Out[91]:

Order Date Sales

0 2014-01-03 16.448

1 2014-01-04 288.060

2 2014-01-05 19.536

3 2014-01-06 685.340

4 2014-01-07 10.430

27

In [92]:

furn = furn.set_index('Order Date')


office = office.set_index('Order Date')

y_furn = furn['Sales'].resample('MS').mean()
y_office = office['Sales'].resample('MS').mean()

furn = pd.DataFrame({'Order Date':y_furn.index, 'Sales':y_furn.values})


office = pd.DataFrame({'Order Date': y_office.index, 'Sales': y_office.values})

sto = furn.merge(office, how='inner', on='Order Date')


sto.rename(columns={'Sales_x': 'furniture_sales', 'Sales_y': 'office_sales'}, inplace=True)
sto.head()

Out[92]:

Order Date furniture_sales office_sales

0 2014-01-01 480.194231 285.357647

1 2014-02-01 367.931600 63.042588

2 2014-03-01 857.291529 391.176318

3 2014-04-01 567.488357 464.794750

4 2014-05-01 432.049188 324.346545

In [96]:

plt.figure(figsize=(25,15))
plt.plot(sto['Order Date'], sto['furniture_sales'], 'b-', label = 'furniture')
plt.plot(sto['Order Date'], sto['office_sales'], 'r-', label = 'office supplies')
plt.xlabel('Date'); plt.ylabel('Sales'); plt.title('Sales of Furniture and Office
Suppl
ies')
plt.legend();

28

In [97]:

first_date = sto.ix[np.min(list(np.where(sto['office_sales'] > sto['furniture_sales'])[0])), 'Order Date']

print("Office supplies first time produced higher sales than furniture is {}.".format(first_date.date()))

Office supplies first time produced higher sales than furniture is 2014-07-01.

In [98]:

from fbprophet import Prophet

furn = furn.rename(columns={'Order Date': 'ds', 'Sales': 'y'})


furn_model = Prophet(interval_width=0.95)
furn_model.fit(furn)

office = office.rename(columns={'Order Date': 'ds', 'Sales': 'y'})


office_model = Prophet(interval_width=0.95)
office_model.fit(office)

INFO:fbprophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
INFO:fbprophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Out[98]:

<fbprophet.forecaster.Prophet at 0x21286530748>

In [99]:

furn_forecast = furn_model.make_future_dataframe(periods=36, freq='MS')
furn_forecast = furn_model.predict(furn_forecast)

office_forecast = office_model.make_future_dataframe(periods=36, freq='MS')
office_forecast = office_model.predict(office_forecast)

29

In [103]:

plt.figure(figsize=(25, 8))
furn_model.plot(furn_forecast, xlabel = 'Date', ylabel = 'Sales')
plt.title('Furniture Sales from Store');

<Figure size 1800x576 with 0 Axes>

30

In [104]:

plt.figure(figsize=(25, 8))
office_model.plot(office_forecast, xlabel = 'Date', ylabel = 'Sales')
plt.title('Office Supplies Sales from Store');

<Figure size 1800x576 with 0 Axes>

31

In [105]:

furn_names = ['furniture_%s' % column for column in furn_forecast.columns]
office_names = ['office_%s' % column for column in office_forecast.columns]

merge_furn_forecast = furn_forecast.copy()
merge_office_forecast = office_forecast.copy()

merge_furn_forecast.columns = furn_names
merge_office_forecast.columns = office_names

forecast = pd.merge(merge_furn_forecast, merge_office_forecast, how='inner', left_on='furniture_ds', right_on='office_ds')

forecast = forecast.rename(columns={'furniture_ds': 'Date'}).drop('office_ds', axis=1)


forecast.head()
Out[105]:

        Date  furniture_trend  furniture_yhat_lower  furniture_yhat_upper  furniture_trend_lower  ...
0 2014-01-01       731.079361            287.461477            827.871871             731.079361  ...
1 2014-02-01       733.206972            154.182560            680.315170             733.206972  ...
2 2014-03-01       735.128684            394.507086            939.259213             735.128684  ...
3 2014-04-01       737.256294            324.851957            873.457140             737.256294  ...
4 2014-05-01       739.315271            266.204612            822.200659             739.315271  ...

5 rows × 31 columns

32

In [106]:

plt.figure(figsize=(15, 8))
plt.plot(forecast['Date'], forecast['furniture_trend'], 'b-')
plt.plot(forecast['Date'], forecast['office_trend'], 'r-')
plt.legend(); plt.xlabel('Date'); plt.ylabel('Sales')
plt.title('Furniture vs. Office Supplies Sales Trend');

33

In [107]:

plt.figure(figsize=(15, 8))
plt.plot(forecast['Date'], forecast['furniture_yhat'], 'b-')
plt.plot(forecast['Date'], forecast['office_yhat'], 'r-')
plt.legend(); plt.xlabel('Date'); plt.ylabel('Sales')
plt.title('Furniture vs. Office Supplies Estimate');

34

In [108]:

furn_model.plot_components(furn_forecast);

35

In [109]:

office_model.plot_components(office_forecast);

36

In [111]:

pred_uc = results.get_forecast(steps=50)
pred_ci = pred_uc.conf_int()

ax = y.plot(label='observed', figsize=(12, 7))


pred_uc.predicted_mean.plot(ax=ax, label='Forecast')
ax.fill_between(pred_ci.index,
pred_ci.iloc[:, 0],
pred_ci.iloc[:, 1], color='k', alpha=.25)
ax.set_xlabel('Date')
ax.set_ylabel('Furniture Sales')

plt.legend()
plt.show()

37
7. RESULTS AND DISCUSSION

The demand forecasting techniques were applied in order to predict future sales. The ARIMA method resulted in a highly accurate forecast when compared with the existing data.

Fig. 7.1 Furniture Sales over a period of 4 years 2014-2017 compared with the ARIMA forecast for 2017 with
confidence levels

The ARIMA results were used to predict sales for the next four years so that inventory could be planned accordingly: less inventory for lower demand, or more inventory for greater demand, to reduce losses incurred and to maximise profitability.

Fig. 7.2 ARIMA forecast of furniture sales for predictions from 2018-2022

38
We then implemented another forecasting method, using FBProphet in Python, to analyse and compare the furniture sales and office supplies sales from the same company.

Fig 7.3 Furniture sales vs. Office supplies sales Predicted forecast

We observe from the graph that office supplies sales actually surpass furniture sales in the latter half of 2018 due to higher demand. These observations can be very useful to the store in its inventory planning and management.

Fig. 7.4 Furniture sales vs. Office supplies sales Straight line trend

39
Fig. 7.5 Diagnostics Plot to assess the relevance and accuracy of ARIMA in the forecast

We conclude that the best sales month for furniture is December and for office supplies is November, while the worst sales month for furniture is April and for office supplies is February. The sales of office supplies surpass those of furniture in the latter half of 2018. The forecasts from both ARIMA and FBProphet yield accurate results, with ARIMA being more accurate when compared with the existing data. If inventory data had been available, inventory prediction would also have been possible.

40
8. SUMMARY

The project entitled “Demand Forecasting Using Data Analytics Techniques” utilises historic sales data from a prominent furniture company to predict future fluctuations in sales and to provide forecast values over three- and four-year horizons using Python.

The code first cleans and pre-processes the company's furniture and office supplies sales data, pulling order-by-order sales records from the database and segregating them by category and date for efficient use.

Subsequently, non-linear analysis techniques such as ARIMA (Auto Regressive Integrated Moving Average), along with a specialised Python forecasting module named FBProphet, were employed to predict sales values for pre-specified periods. The resultant data was then plotted on graphs to depict the sales levels, and the models could also be used to predict the sales for each month of the subsequent years with a higher degree of accuracy than traditional forecasting techniques.

41
REFERENCES

[1] Shivani Agarwal. 2017. Issues in supply chain planning of Fruits and Vegetables in Agri-food
supply chain: A review of certain aspects, IMS Business School Presents Doctoral Colloquium

[2] Andrew C. Yao, John G. Carlson. 1999. The impact of real-time data communication
on inventory management. Int. J. Production Economics 59 (1999) 213-219

[3] Samuel H. Huan, Sunil K. Sheoran, Ge Wang. 2004. A review and analysis of supply chain
operations reference (SCOR) model. Supply Chain Management: An International Journal, Vol.
9 Iss: 1 pp. 23 – 29

[4] Mondher Feki, Imed Boughzala, Samuel Fosso Wamba. 2016. Big Data Analytics-enabled
Supply Chain Transformation: A Literature Review. 49th Hawaii International Conference on
System Sciences.

[5] Burbidge, J.L. (1994), “The use of period batch control (PBC) in the implosive industries”,
Production Planning & Control, Vol. 5 No. 1, pp. 97-102.

[6] Cachon, G. and Fisher, M. (1997), “Campbell Soup’s continuous replenishment program:
evaluation and enhanced inventory decision rules”, Production and Operations Management,
Vol. 6 No. 3, pp. 266-76.

[7] Chen, F., Drezner, Z., Ryan, J.K. and Simchi-Levi, D. (2000), “Quantifying the bullwhip
effect in a simple supply chain: the impact of forecasting, lead-times and information”,
Management Science, Vol. 46 No. 3, pp. 436-43.

[8] Burra Karuna Kumar, Dega Nagaraju and S. Narayanan. 2018. Supply Chain Co-ordination
Models: A Literature Review, Indian Journal of Science and Technology

42
[9] John Storey, Caroline Emberson, Janet Godsell and Alan Harrison, 2006. Supply Chain
Management – Theory, Practices and Challenges, International Journal of Operations and
Production Management

[10] Rais M and Sheoran A. 2015. Scope of Supply Chain Management in Fruits
and Vegetables in India, Journal of Food Processing and Technology

[11] K. Chakraborty, K. Mehotra, C. Mohan and S. Ranka. Forecasting the behaviour of multivariate time series using neural networks.

[12] Chen, H., Chiang, R., and Storey, V. 2012. “Business Intelligence and Analytics: From Big
Data to Big Impact.” MIS Quarterly 36(4):1165–88.

[13] A. Alexandrov, R. Bergmann, S. Ewen, J.-C. Freytag, F. Hueske, A. Heise, O. Kao, M. Leich, U. Leser, V. Markl, et al. The Stratosphere platform for big data analytics. VLDB Journal, 23(6):939–964, 2014.

[14] S. Caron and G. Kesidis, “Incentive-Based Energy Consumption Scheduling Algorithms for the Smart Grid,” in 2010 First IEEE International Conference on Smart Grid Communications, Oct. 2010, pp. 391–396.

[15] P. E. McSharry, S. Bouwman, and G. Bloemhof, “Probabilistic forecasts of the magnitude and timing of peak electricity demand,” IEEE Transactions on Power Systems, vol. 20, pp. 1166–1172, 2005.

[16] Ediger, V. Ş., & Akar, S. (2007). ARIMA forecasting of primary energy demand by fuel in
Turkey. Energy policy, 35(3), 1701-1708.

[17] Wang, Y., Wang, J., Zhao, G., & Dong, Y. (2012). Application of residual modification
approach in seasonal ARIMA for electricity demand forecasting: A case study of China. Energy
Policy, 48, 284-294.

43
[18] Bougadis, J., Adamowski, K., & Diduch, R. (2005). Short‐term municipal water demand
forecasting. Hydrological Processes: An International Journal, 19(1), 137-148.

[19] Ferreira, K. J., Lee, B. H. A., & Simchi-Levi, D. (2015). Analytics for an online retailer:
Demand forecasting and price optimization. Manufacturing & Service Operations Management,
18(1), 69-88.

[20] Souza, G. C. (2014). Supply chain analytics. Business Horizons, 57(5), 595-605.

44