Assignment-1:
Assignment-2:
Technical Details
First, the data will be collected and stored in HDFS. Next, for data enrichment, we will load this
data into a NoSQL database such as HBase. After enriching the data, we need to validate it, after
which any of the Hadoop technologies, such as MapReduce, Hive, or Pig, can be used for data
analysis. The results will finally be exported to both the RDBMS and the NoSQL database. This
job will be automated using schedulers, allowing us to extract the outcomes on a daily basis.
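The enrich-then-validate steps above can be sketched in plain Python. This is a minimal, illustrative sketch only: the field names and the category lookup table are invented for this example, and in the actual project these steps would run in MapReduce, Hive, or Pig over data held in HDFS/HBase.

```python
# Illustrative sketch of the enrich -> validate steps described above.
# Field names and the category lookup are hypothetical examples.

CATEGORY_LOOKUP = {"P100": "Electronics", "P200": "Clothing"}  # enrichment table

def enrich(record):
    """Add a derived 'category' column, as a data-enrichment step would."""
    enriched = dict(record)
    enriched["category"] = CATEGORY_LOOKUP.get(record["product_id"], "Unknown")
    return enriched

def validate(record):
    """Basic validation: required fields present and a non-negative amount."""
    return bool(record.get("user_id")) and record.get("amount", -1) >= 0

raw = [
    {"user_id": "u1", "product_id": "P100", "amount": 250.0},
    {"user_id": "",   "product_id": "P200", "amount": 99.0},   # fails validation
    {"user_id": "u2", "product_id": "P999", "amount": 10.0},
]

clean = [enrich(r) for r in raw if validate(r)]
for row in clean:
    print(row["user_id"], row["category"], row["amount"])
```

Only records that pass validation reach the enrichment step, mirroring the validate-before-analyze ordering described above.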
Feasibility Study
This project helps define the purchasing patterns of users on the company's website. It also
helps identify fraudulent users and determine the net worth of cancelled products across the
city, among other things. With the outcomes of this analysis, the company can derive solutions
for improving product stocking in its warehouses, eradicating fraud, and much more.
Infrastructure Required
The project work will be carried out in a virtual environment, wherein the Hadoop cluster, HBase,
and other required tools will be installed on a single machine using Oracle VirtualBox/VMware.
RAM: Min 4 GB
OS: Windows/Linux/Mac
Processor: Dual core processor or above
Software Required
Data Ingestion: the process of bringing raw data into the Hadoop storage unit (HDFS).
Data Encryption: encryption of highly sensitive data. Data migration from the RDBMS to Hadoop
will also be password protected.
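As a sketch of the data-protection idea, the snippet below tokenizes a sensitive column before rows leave the RDBMS, using only the Python standard library. Everything here is an assumption for illustration: the field names are invented, HMAC-SHA-256 is used only to irreversibly mask values (not to encrypt them), and a real deployment would rely on a vetted encryption library plus HDFS transparent encryption zones on the Hadoop side.

```python
import hashlib
import hmac

# Hypothetical sketch: masking a sensitive column before migrating rows
# from an RDBMS into Hadoop. HMAC-SHA-256 produces a stable, irreversible
# token; it is NOT encryption. Real at-rest protection would use a vetted
# crypto library and HDFS encryption zones.

SECRET_KEY = b"change-me"  # placeholder; load from a secrets store in practice

def mask_sensitive(value: str) -> str:
    """Return a stable, irreversible 16-hex-digit token for a sensitive field."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

row = {"account_no": "1234567890", "balance": 5000}
safe_row = {**row, "account_no": mask_sensitive(row["account_no"])}
print(safe_row["account_no"])
```

Because the token is stable for a given input, joins and group-bys on the masked column still work downstream in Hive or MapReduce.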
Feasibility Study
As real banks do not share details of their customers, we have created a dummy data set with
proper columns and other essential details. This will let us analyze banking data using Hadoop
and come up with multiple insights.
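A dummy data set like the one described can be generated with a few lines of standard-library Python. The column names below (`customer_id`, `age`, `balance`, `loan`) are illustrative assumptions, not the assignment's actual schema; the CSV output could then be loaded into HDFS or imported with Sqoop.

```python
import csv
import io
import random

# Minimal sketch of generating a dummy banking data set. The columns are
# illustrative assumptions, not the assignment's prescribed schema.
random.seed(42)  # reproducible sample

COLUMNS = ["customer_id", "age", "balance", "loan"]

def make_rows(n):
    """Yield n synthetic customer records with plausible value ranges."""
    for i in range(n):
        yield {
            "customer_id": f"C{i:04d}",
            "age": random.randint(18, 80),
            "balance": round(random.uniform(0, 100_000), 2),
            "loan": random.choice(["yes", "no"]),
        }

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(make_rows(5))
print(buf.getvalue())
```

Seeding the generator keeps the sample reproducible, which makes it easier to verify the downstream Hadoop analysis against known inputs.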
Infrastructure Required
The project work will be completed in a virtual environment, wherein the Hadoop cluster, HBase,
and other required tools will be installed on a single machine using Oracle VirtualBox/VMware.
RAM: Min 4 GB
OS: Windows/Linux/Mac
Processor: Dual core processor or above
Software Required
Apache Hadoop, Apache Hive, Apache HBase, Apache Sqoop, and MySQL
All the above-mentioned tools are open source and require no prior permission to download and
install.
Project Three: Music Data Analysis
A leading music company, MyRadio, is planning to analyze the large amounts of data it receives
from its mobile app and website. MyRadio wants to track the behavior of its users, classify
them, calculate the royalties associated with songs, and make appropriate business strategies.
As the data is very large, we will use the open-source Apache Hadoop framework, a NoSQL
database called HBase, and a few other tools for the analysis.
To achieve these objectives, we can subdivide the project into the following phases:
Data ingestion
Understanding the data
Data validation
Data enrichment
Post-data enrichment steps
Data analysis
Optimizations
Post analysis
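The data-analysis phase listed above can be sketched as a simple aggregation over raw play events: count plays per song, then apply a royalty rate. The event fields and the flat per-play rate here are assumptions for illustration; in the actual project this aggregation would run in Hive or MapReduce over data stored in HDFS/HBase.

```python
from collections import Counter

# Illustrative sketch of the "data analysis" phase: per-song play counts
# and royalties from raw play events. The event fields and the flat
# per-play rate are hypothetical examples.

ROYALTY_PER_PLAY = 0.004  # hypothetical flat rate in dollars

events = [
    {"user_id": "u1", "song_id": "s1"},
    {"user_id": "u2", "song_id": "s1"},
    {"user_id": "u1", "song_id": "s2"},
]

plays = Counter(e["song_id"] for e in events)
royalties = {song: round(n * ROYALTY_PER_PLAY, 6) for song, n in plays.items()}
print(royalties)
```

In HiveQL this would reduce to a `GROUP BY song_id` with a `COUNT(*)` multiplied by the rate; the Python version just makes the shape of the computation explicit.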