BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE ENGINEERING
ARYA COLLEGE OF ENGINEERING & RESEARCH CENTRE
SP-40, RIICO INDUSTRIAL AREA, KUKAS, JAIPUR, RAJASTHAN
CERTIFICATE
This is to certify that the practical training seminar report on “BIG DATA
HADOOP”, carried out at “CDAC” during the training period, submitted by
“Divyanshu Bansal (15EAYCS033)” in partial fulfillment for the award of the degree
of Bachelor of Technology in Computer Science, has been found satisfactory and is
approved for submission.
During my seminar period, all the staff members of the department helped me
with their skills. I hereby express my sincere thanks to the seminar coordinator,
Er. Sudhanshu Vashisth, for his valuable guidance and kind cooperation,
without which this seminar would not have been possible.
PREFACE
2.1.7. Industrialization
2.1.8. Manufacturing
2.1.9. Logistics and Supply Chain
2.2.1. History
2.3. Conclusion
3.2. Nano HDFS Site
3.3.7. Installation
3.4. Conclusion
Chapter 5. References
CHAPTER 1
INTRODUCTION TO BIG DATA
Big data is a collection of large datasets that cannot be processed using
traditional computing techniques. Big data is not merely data; it has become a
complete subject, which involves various tools, techniques and frameworks.
Big Data has been described by some Data Management pundits (with a bit
of a snicker) as “huge, overwhelming, and uncontrollable amounts of information.”
In 1663, John Graunt dealt with “overwhelming amounts of information” as well,
while he studied the bubonic plague, which was then ravaging Europe. Graunt
used statistics and is credited with being the first person to use statistical data
analysis. In the early 1800s, the field of statistics expanded to include collecting and
analyzing data.
The evolution of Big Data includes a number of preliminary steps for its
foundation, and while looking back to 1663 isn’t necessary for the growth of data
volumes today, the point remains that “Big Data” is a relative term depending on who
is discussing it. Big Data to Amazon or Google is very different than Big Data to a
medium-sized insurance organization, but no less “Big” in the minds of those
contending with it.
1.2 FOUNDATIONS OF BIG DATA
Data became a problem for the U.S. Census Bureau in 1880. They estimated it would
take eight years to handle and process the data collected during the 1880 census, and
predicted the data from the 1890 census would take more than 10 years to process.
Fortunately, in 1881, a young man working for the bureau, named Herman Hollerith,
created the Hollerith Tabulating Machine. His invention was based on the punch cards
designed for controlling the patterns woven by mechanical looms. His tabulating
machine reduced ten years of labor into three months of labor.
In 1927, Fritz Pfleumer, an Austrian-German engineer, developed a means of
storing information magnetically on tape. Pfleumer had devised a method for adhering
metal stripes to cigarette papers (to keep smokers’ lips from being stained by the
rolling papers available at the time), and decided he could use this technique to create
a magnetic strip, which could then be used to replace wire recording technology. After
experimenting with a variety of materials, he settled on a very thin paper, striped with
iron oxide powder and coated with lacquer, for his patent in 1928.
During World War II (more specifically 1943), the British, desperate to crack
Nazi codes, invented a machine that scanned for patterns in messages intercepted
from the Germans. The machine was called Colossus, and it scanned 5,000 characters a
second, reducing the workload from weeks to merely hours. Colossus was the first
data processor. Two years later, in 1945, John Von Neumann published a paper on the
Electronic Discrete Variable Automatic Computer (EDVAC), the first “documented”
discussion on program storage, and laid the foundation of computer architecture
today.
It is said these combined events prompted the “formal” creation of the United
States’ NSA (National Security Agency), by President Truman, in 1952. Staff at the
NSA was assigned the task of decrypting messages intercepted during the Cold War.
Computers of this time had evolved to the point where they could collect and process
data, operating independently and automatically.
1.3 THE INTERNET EFFECTS AND PERSONAL COMPUTERS
ARPANET began on Oct 29, 1969, when a message was sent from UCLA’s host
computer to Stanford’s host computer. It received funding from the Advanced
Research Projects Agency (ARPA), a subdivision of the Department of Defense.
Generally speaking, the public was not aware of ARPANET. In 1973, it connected
with a transatlantic satellite, linking it to the Norwegian Seismic Array. However, by
1989, the infrastructure of ARPANET had started to age. The system wasn’t as
efficient or as fast as newer networks. Organizations using ARPANET started moving
to other networks, such as NSFNET, to improve basic efficiency and speed. In 1990,
the ARPANET project was shut down, due to a combination of age and obsolescence.
The creation of ARPANET led directly to the Internet.
In 1965, the U.S. government built the first data center, with the intention of storing
millions of fingerprint sets and tax returns. Each record was transferred to magnetic
tape and was to be stored in a central location. Conspiracy theorists
expressed their fears, and the project was closed. However, in spite of its closure, this
initiative is generally considered the first effort at large scale data storage.
In 1989, a British Computer Scientist named Tim Berners-Lee came up with the
concept of the World Wide Web. The Web is a place/information-space where web
resources are recognized using URLs, interlinked by hypertext links, and is accessible
via the Internet. His system also allowed for the transfer of audio, video, and pictures.
His goal was to share information on the Internet using a hypertext system. By the fall
of 1990, Tim Berners-Lee, working for CERN, had written three basic IT commands
that are the foundation of today’s web:
1 HTML: HyperText Markup Language. The formatting language of the web.
2 URL: Uniform Resource Locator. A unique “address” used to identify each
resource on the web. It is also called a URI (Uniform Resource Identifier).
3 HTTP: Hypertext Transfer Protocol. Used for retrieving linked resources from
all across the web.
In 1993, CERN announced the World Wide Web would be free for everyone to
develop and use. The free part was a key factor in the effect the Web would have on
the people of the world. (It’s the companies providing the “internet connection” that
charge us a fee).
Volume
Volume refers to the incredible amounts of data generated each second from social
media, cell phones etc. The vast amounts of data have become so large in fact that we
can no longer store and analyze it using traditional database technology. We now
use distributed systems, where parts of the data are stored in different locations and
brought together by software.
Value
When we talk about value, we’re referring to the worth of the data being extracted.
Having endless amounts of data is one thing, but unless it can be turned into value it
is useless. While there is a clear link between data and insights, this does not always
mean there is value in Big Data. The most important part of embarking on a big data
initiative is to understand the costs and benefits of collecting and analyzing the data to
ensure that ultimately the data that is reaped can be monetized.
Variety
Variety is defined as the different types of data we can now use. Data today looks
very different from data of the past. We no longer have only structured data (name,
phone number, address, financials, etc.) that fits neatly into a data table. Much of
today’s data is unstructured.
Veracity
Last, but certainly not least there is veracity. Veracity is the quality or trustworthiness
of the data. Just how accurate is all this data?
Economies of scale can exist on either the supply or the demand side. On the
supply side, economies of scale exist where the average costs per unit of output
decrease with the increase in the scale or magnitude of the output being produced by a
firm. In telecommunications services, for example, it does not cost much, if anything
(i.e. assuming that the network does not require expansion), for the service provider to
connect one more customer to the existing network. On the demand side, economies
of scale are often referred to as a “network effect” or “positive externality,” whereby
the addition of one more customer to the network increases the aggregate social value
of the network beyond the private value gained by the additional customer. In
telecommunications markets, network effects commonly serve to preserve the market
position of the incumbent network provider, and often give it a “first mover”
advantage when markets are opened to competition. Importantly, in countries in
which there is low population density or low demand for a particular service, only a
limited number of networks may be able to be sustained, thus adding to the network
effect of the dominant service provider. Interestingly, economies of scale have not
been as prevalent in some developing countries where, lacking fixed-line networks,
providers have found it more advantageous to build wireless (i.e. mobile) networks.
The ability to sustain multiple networks, however, usually depends, in part, on both
the technology chosen and the population of the particular geographic area.
Supply-side economies of scope exist when it is cheaper for one firm to
produce (i.e. through joint production) and sell two or more products together than
for a number of individual firms to produce each good separately. In
telecommunications, for example, once a network is in place, local calling can be
inexpensively combined on a network (i.e. “bundled”) with other products and
services, such as optional local features, long distance calling, internet services,
television, and so on. When consumers value the range of services provided by a
single telecommunications carrier, it is known as a demand-side economy of scope.
As various telecommunications technologies converge (e.g. voice and data
technologies), economies of scope are becoming more prevalent. Interestingly, this
process of convergence is also bringing about increased competition. With the
introduction of digitalisation, whereby all network traffic (i.e. whether voice, data, or
video) takes the same digital form, the distinction between voice and data has eroded,
allowing services formerly classified as “data” to compete in the provision of “voice”
services. Accordingly, formerly different networks (e.g. cable television, wireless, and
broadband) may have the potential to compete, and in some cases already are
competing, against the traditional public switched telephone network (“PSTN”).
Importantly, if legal or regulatory barriers shield economies of scale and
scope from competitive forces, market failure may result. Market failure occurs when
resources are misallocated or allocated inefficiently (i.e. this includes misallocation in
both the static and dynamic sense), resulting in lost value, wasted resources, or some
other non-optimal outcome. Market failures generally lead to higher prices than
would be charged under competitive conditions. This, in turn, leads to restricted
output (i.e. unless the regulated monopolist can perfectly discriminate among its
customers), and ultimately a loss to consumer welfare. Since regulated monopolists
are generally immune from competitive pressures, they do not have the signals or
incentives to minimize costs, undertake efficient business practices, or engage in
innovative technological change. Furthermore, regulators have often proven
ineffective in replicating such signals and incentives. Given both the prospects for and
the benefits of competition in the telecommunications industry, it is important to
avoid regulatory measures that protect incumbent operators from market forces.
Apache Hadoop is not only a storage system but a platform for data storage
as well as processing. It is scalable (we can add more nodes on the fly) and fault
tolerant (even if a node goes down, its data can be processed by another node).
The following characteristics of Hadoop make it a unique platform:
1. Flexibility to store and mine any type of data whether it is structured, semi-
structured or unstructured. It is not bounded by a single schema.
2. Excels at processing data of complex nature. Its scale-out architecture divides
workloads across many nodes. Another added advantage is that its flexible
file-system eliminates ETL bottlenecks.
3. Scales economically: as discussed, it can be deployed on commodity hardware.
Apart from this, its open-source nature guards against vendor lock-in.
In most other countries, the copper access network of the former
incumbents remains the only network over which end-user access can be provided at
reasonable cost. Accordingly, competition continues to be mostly “intramodal” in
those countries where new entry is conditional on access to the incumbent network.
In the case of mobile wireless services, although the fixed costs of serving a
particular customer are not as significant as in a fully wired network, there are
substantial fixed costs associated with the roll out of a network with adequate
geographic coverage. Furthermore, the potential for competition in wireless services
varies with both demand and population density. That is, while demand and density
are sufficiently high in large cities to sustain many competing networks of base
stations, the potential for competing networks is lower in low demand/low density
areas. In addition, mobile infrastructure competition is limited, in some countries, by
constraints on the amount of available spectrum (i.e. the frequency bands which are
dedicated to specific mobile services, such as GSM or 3G). In some countries,
spectrum scarcity originates from artificial allocation and licensing constraints and not
from binding physical limitations. Assigning more frequencies, making better use of
frequencies, and allowing frequency trading help to minimize such constraints.
Figure 1.1 Big Data
1. Black Box Data: It is a component of helicopters, airplanes, jets, etc. It
captures the voices of the flight crew, recordings of microphones and earphones,
and the performance information of the aircraft.
2. Social Media Data: Social media such as Facebook and Twitter hold
information and the views posted by millions of people across the globe.
3. Stock Exchange Data: The stock exchange data holds information about the
‘buy’ and ‘sell’ decisions made by customers on shares of different companies.
4. Power Grid Data: The power grid data holds information about the power
consumed by a particular node with respect to a base station.
5. Transport Data: Transport data includes model, capacity, distance and
availability of a vehicle.
6. Search Engine Data: Search engines retrieve lots of data from different
databases.
1.7.2 Hadoop Architecture
Hadoop framework includes following four modules:
Hadoop Common: These are Java libraries and utilities required by other
Hadoop modules. These libraries provide filesystem- and OS-level
abstractions and contain the necessary Java files and scripts required to start
Hadoop.
Hadoop YARN: This is a framework for job scheduling and cluster resource
management.
Hadoop Distributed File System (HDFS™): A distributed file system that
provides high-throughput access to application data.
Hadoop MapReduce: This is a YARN-based system for parallel processing of
large data sets.
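As a quick illustration of the MapReduce module, the word-count example that ships
with Hadoop can be run from the shell. A minimal sketch, assuming the bundled
examples jar of this Hadoop version and an existing /user/input directory in HDFS:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /user/input /user/output
YARN schedules the map and reduce tasks across the cluster, and the results are
written to /user/output in HDFS.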
1.7.3 The Internet
The Internet’s development continued with the spin-off of MILNET to support DOD
operations and new mission-focused computer networks developed by the Department
of Energy, NASA, and NSF, together with broader networks such as USENET and
BITNET. In 1986, NSF launched its NSFNet program, an effort to link (among other
things) a number of U.S. academic supercomputing centers. NSFNet was envisioned
as a general high-speed network that could link many other academic or research
networks (including the ARPANET) using the research results and operational
experience obtained from ARPANET. The NSFNet proved very successful. Its
upgrade to handle growing demand and other subsequent changes opened the door for
the participation of the private sector in the network.
The Internet was highly successful in meeting the original vision of enabling
computers to communicate across diverse networks and in the face of heterogeneous
underlying communications technologies. Its success, measured in terms of
commercial investment, wide use, and large installed base, is also widely understood
to have made innovation in the Internet much harder over time. (Innovation in the
Internet’s architecture proper should be distinguished from innovative uses of the
network, which have flourished as a direct consequence of the Internet’s flexible,
general-purpose design.) CSTB’s Looking Over the Fence at Networks: A Neighbor’s
View of Networking Research characterized this problem in terms of three types of
potential ossification: intellectual, infrastructure, and system.
1.7.4 Big Data Technologies
Big data technologies are important in providing more accurate analysis, which may
lead to more concrete decision-making resulting in greater operational efficiencies,
cost reductions, and reduced risks for the business.
To harness the power of big data, you would require an infrastructure that can manage
and process huge volumes of structured and unstructured data in real time, and can
protect data privacy and security.
There are various technologies in the market from different vendors including
Amazon, IBM, Microsoft, etc., to handle big data. While looking into the technologies
that handle big data, we examine the following two classes of technology:
1.7.5 Operational Big Data
This includes systems like MongoDB that provide operational capabilities for real-
time, interactive workloads where data is primarily captured and stored.
NoSQL Big Data systems are designed to take advantage of new cloud computing
architectures that have emerged over the past decade to allow massive computations
to be run inexpensively and efficiently. This makes operational big data workloads
much easier to manage, cheaper, and faster to implement.
Some NoSQL systems can provide insights into patterns and trends based on
real-time data with minimal coding and without the need for data scientists and
additional infrastructure.
1.7.6 Analytical Big Data
This includes systems like Massively Parallel Processing (MPP) database systems and
MapReduce that provide analytical capabilities for retrospective and complex analysis
that may touch most or all of the data.
MapReduce provides a new method of analyzing data that is complementary to the
capabilities provided by SQL, and a system based on MapReduce can be scaled up
from single servers to thousands of machines.
1.8 CONCLUSION
We have entered an era of Big Data. Through better analysis of the large volumes of
data that are becoming available, there is the potential for making faster advances in
many scientific disciplines and improving the profitability and success of many
enterprises. However, many technical challenges described in this report must be
addressed before this potential can be realized fully. The challenges include not just
the obvious issues of scale, but also heterogeneity, lack of structure, error-handling,
privacy, timeliness, provenance, and visualization, at all stages of the analysis pipeline
from data acquisition to result interpretation. These technical challenges are common
across a large variety of application domains, and therefore not cost-effective to
address in the context of one domain alone.
CHAPTER 2
INTRODUCTION TO THE ORGANIZATION
2.1.1 INTRODUCTION
Established in 1994, Centum Electronics Ltd is one of the leading Indian companies in
electronics product distribution and services with an ISO TS 16949:2009 Quality
Certification.
Centum was founded in 1994 in Bangalore, India. Since then, Centum has
rapidly grown into a diversified electronics company with operations in North
America, EMEA and Asia. The company offers a broad range of products and
services across different industry segments. It has continuously invested in
strengthening its design & product development capabilities while developing deep
domain knowledge in the segments it operates in. Centum has also established truly
world-class manufacturing facilities with cutting edge infrastructure as well as a
global supply chain capable of delivering products with high quality and reliability.
The company is primarily engaged in designing and manufacturing advanced
electronics systems, subsystems, and components. These products cater to various
segments such as communications, military, space, automotive and industrial
electronics.
Centum Group has a global design strength of over 630 design engineers
specialized in Electronic Hardware, Embedded Software, FPGA, analog, RF, Power
and mechanical domains. They work together in multidisciplinary teams to realize
customized products for mission critical applications in high technology segments.
For the last 25 years the company has been helping its customers to turn their ideas
into products. A key contributor to Centum Group’s growth has been the strong
relationships forged with international customers and partners. This customer-focused
approach coupled with Centum’s culture hinged on the core-values of Technology-
Teamwork-Trust has resulted in a track-record of high quality products & services and
excellent execution ability.
Furthermore, the company’s global locations in Europe, North America and
India enable it to work closely with international customers while providing
competitive development costs by managing the onshore/offshore mix. The company
also provides the flexibility for customers to choose between world-class production
facilities in India’s high-tech capital, Bangalore, or at another supplier of their choice.
The company markets its products to majors like Nortel, Lucent, Nokia, BI
Technologies, Marconi, LG, Samsung, ITI, HTL, Punjab Communication, Crompton
Greaves, L&T, ABB and Deltron, to name a few. The company has received approval
from the Regional Centre for Military Airworthiness for use of its products in military
and aerospace applications.
The company is known as the largest design, manufacturing and exporting
company for frequency–controlled products in India. These products are used in
various applications such as aerospace, military, telecom, etc. The company is engaged
in designing and manufacturing Crystals and Oscillators. Crystals are used in
Stratum applications. The company manufactures high-end products such as DC
converters for the defense and space segments. Centum Electronics has received an order
worth Rs. 4.3 crores to supply sub-systems to a defense sector unit.
The Company designs and exports electronic products in the analog, digital,
mixed signal, radio frequency and microelectronics domains. It operates through the
Electronic System Design and Manufacturing (ESDM) segment. It is primarily
involved in the manufacture of Advanced Microelectronics Modules, Frequency
Control Products, Printed Circuit Board Assembly (PCBA) and Resistor Networks
catering to the communications, military, aerospace and industrial electronics markets.
It also offers a range of manufacturing and test solutions, including box builds, system
integration, PCBA and electromechanical assemblies. Its services include design
services, electronics manufacturing solutions and mechanical solutions. Its defense
products include Laser Receivers, Electronic Delay Units, Missile Interface Units,
Onboard Computers and Microelectronics Products. Its space products include Sun
Sensors, Earth Sensors and Reaction Wheel Controllers.
2.1.2 CHAIRMAN OF CENTUM ELECTRONIC LTD.
VISION
To create value by contributing to the success of our customers, by being your
innovation partner for design & manufacturing solutions in high technology areas.
MISSION
To keep people at the centre of everything we do.
To uphold a transparent work culture and ethical business practices in everything we
do.
To foster a culture of innovation, free thinking and empowerment, in everything we
do.
To fuel the passion of excellence in everything we do.
OUR VALUES
1 CUSTOMER RELATIONSHIP– Customer relationship is the heart of every
business, and more so at Centum. Customers are not just one of the stakeholders
of our business but our reason to do business; maintaining delightful customer
relationships is our forte.
2 TEAMWORK– At Centum, teamwork is the coming together of a group of highly
motivated people who are committed to achieving organizational goals and willing,
at the same time, to be held accountable for their actions and results.
3 OPENNESS & TRUST– The first and foremost aspects of openness are trust and
fairness. Centum values its employees and believes in a work environment
encompassing Openness & Trust in all of its communications and actions.
4 INTEGRITY– At Centum, Integrity is the foundation of our reputation. We
follow the highest ethical and moral standards, and display honesty in all our
actions, methods and measures.
5 EXCELLENCE– At Centum, we strive for Excellence in all that we do however
big or small the task may be, and are never content with being the second best.
6 SOCIAL RESPONSIBILITY– As a responsible corporate citizen Centum
endeavors to have a positive impact on the greater society that we serve. Social
responsibility is intertwined in our self-belief and work ethics.
BUILD TO SPECIFICATION
2.1.4 WHY CENTUM ELECTRONICS LTD?
World-class manufacturing services hinge on state-of-the-art facilities and a truly
competent professional team. Centum’s manufacturing facilities are located in
Bangalore, with a total of 350,000 sq. ft of production area. Over the years, we have
developed and invested in many new processes and capabilities to meet the most
complex and critical requirements of our customers.
tests. A dedicated team is capable of developing fully Automated Test Equipment
(ATE) based on modular & general purpose equipment for testing the complete
functionality of a fully assembled product.
2.1.6 MICROELECTRONICS
Centum is India’s largest Microelectronics Design and Manufacturing Company with
more than 15 years of credible deliveries to Telecom, Industrial, Defense and Space
markets. Manufacturing is carried out in state-of-the-art facilities in Class-10K clean
rooms, with critical processes done in Class-100 LFTs.
DESIGN CAPABILITIES:
very important and pivotal role in the achievement of a sustainable product fulfillment
strategy.
DESIGN FOR EXCELLENCE (DFX): We provide customers with the option to
validate their designs to address issues that may arise in later stages of the product
lifecycle. Design for Manufacturability/Assembly (DFM/A) and Design for
Testability (DFT) are the key components of the analysis, which provide the feedback
to the design teams to make the appropriate changes. We offer this service for
products designed internally as well as for customer designed products. The latest
DFx methodologies are applied to achieve lower production costs without
compromising on compliance with the appropriate quality standards.
The NPI process also focuses on developing SCM strategies that will support product
launch and production ramp while clearly defining and minimizing liabilities.
Material and supplier selection is optimized for cost, availability, service support,
lifecycle, RoHS compliance and other parameters.
2.1.8 MANUFACTURING
MOLDING:
1 We have hydraulic presses to cater to high-accuracy sheet metal components,
capable of processing sheet thicknesses from 0.05 mm to 8 mm.
2 Complex bending and profiled Progressive tools, compound dies, multi stage
dies, bending, forming, drawing die components can be processed.
3 We process all kinds of steel, CRCA, copper and copper alloys, aluminum
and aluminum alloys, Alu-Zinc, Kovar, Invar and SS.
4 We also support various surface finish operations like plating, anodizing,
tinning, chrome finish, nickel finish and other finishes.
ASSEMBLY:
5 Switch gear assemblies, MCCB assemblies, MCB assemblies, OESA & SFU
assemblies.
6 Bus-bars, surge arrestors, core arrestors, High voltage bracket assemblies for
electrical industries.
2.1.9 LOGISTICS AND SUPPLY CHAIN
Centum offers its customers value through innovative supply chain solutions,
global sourcing support and local procurement, careful inventory management,
and advanced component management.
Centum creates value by increasing speed to market for all NPI (New Product
Introduction) through customized solutions, thus enhancing cost savings through faster
production ramp-up from its low-cost manufacturing locations. Centum supply chain
solutions are built to cater to high-mix, high-technology, medium- to low-volume
requirements.
STRATEGIC SOURCING & SUPPLY CHAIN ARCHITECT – SOURCING
STRATEGY
Centum uses advanced sourcing processes and has direct access to manufacturers
and Tier 1 distributors, which offers its customers:
1 Optimized commodity based supply chain solutions via strategic global supply
base.
2 Global Price Management, Procurement, and Risk Mitigation.
3 Supply chain Performance Management.
4 Continuous Focus on Cost Management through commodity approaches and
supply chain solutions and strategies based on customer dynamics and industry
served.
5 Use of IT tools/infrastructure for RFQ management with global suppliers.
LOCALIZATION & VERTICAL INTEGRATION OFFERING
Centum provides technical and commercial resources in domestic market to source
and qualify custom mechanical components and assemblies to meet unique supply
chain customer requirements. A comprehensive offering includes:
Customized supply chain solutions are based on five primary supply chain models
that advanced planning supports, with the support of BaaN ERP solutions.
2.1.10 AFTER SALES SERVICES
Centum’s customers are global OEMs who work in highly regulated markets and
environments. Often these products are safety-critical and have long lifecycles. We
offer a wide range of aftermarket services to enable our customers to deliver
exceptional customer satisfaction.
2.2.1 HISTORY
The company provides a wide range of products, services and solutions designed to
cater to a large market, ranging from health care systems, data warehousing,
multimedia and multilingual technologies, and networking solutions to technical
consultancy, training and eGovernance solutions.
PRODUCTS
SERVICES
1 E-Hastakshar
2 E-pramaan
3 Mobile seva
4 Cyber Forensic Analysis, Training and laboratory development
5 E-Raktkosh
6 E-Aushadhi
7 Vulnerability Assessment and Penetration Testing (VAPT)
8 Data centre services
The primary activity in all centers of C-DAC is research and development in specific
areas of information and communication technology and electronics (ICTE). Across
all these centers, we span a wide range of topics in ICTE. Broadly, we can divide the
R&D activities into two broad classes: the enabling technologies and application
verticals. The research activities are usually driven by specific application areas, and
hence mostly applied in nature.
Based on the vision charted by the parent ministry (MCIT), international trends,
Indian requirements, etc., C-DAC identifies significant thrust areas for focus across
the various centers. At present, most of the R&D activities in the various centres fall
into these thrust areas.
As mentioned earlier, most of the R&D work has a driving practical application of
importance. Most of the work is, therefore, actually deployed and in use by the
organizations concerned.
2.2.4 COMPANY’S VISION AND MISSION
VISION
To emerge as the premier R&D institution for the design, development and
deployment of world class electronic and IT solutions for economic and human
advancement.
MISSION
C-DAC's Mission Statement has evolved after deep thought and in consultation with
the members of C-DAC. The Mission Statement, as defined below, reflects the fabric
and character of C-DAC and is integral to the fulfillment of C-DAC's Vision.
CORE VALUES
2.2.5 HR PHILOSOPHY AND POLICY
C-DAC's HR philosophy holds the employee, its 'Member' (of the C-DAC family), as
being at the center stage of the organization.
C-DAC greatly values the contribution of its employees and keeps its human
resource issues under constant review, drawing inputs both from internal climate
surveys and from external environmental considerations.
C-DAC believes in keeping its members financially comfortable and has accordingly
designed its compensation package, which is balanced and is in keeping with the
industry standards. At the induction level (fresh graduates), our compensation
package is amongst the best in the industry, comprising the basic salary, Dearness
allowance, City Compensatory allowance and other benefits like Contributory
Provident Fund, liberalized Leave Travel Concession, House Rent Allowance or
Leased Accommodation, Conveyance Reimbursements, Reimbursement of
books/journals/newspapers, Children Education allowance, Credit Card fees and long
term loans. Our medical scheme for employees is comprehensive and among the
best, covering the employee and his/her dependent family members.
2.2.7 QUALITY POLICY
2.3 CONCLUSION
The technology currently in use at Prasar Bharati has improved significantly. At this stage there
has been an advancement in signal reception quality, as systems have changed from analog to
digital with the advancement in different audio and video compression techniques. For
Doordarshan, DTH (Direct to Home) satellite services have become more user friendly, and the
evolution of SDTV into HDTV has made it a popular product among the people of India. It is
also accessible from remote areas, with more channels and better reception. In AIR too, there have
been a lot of advancements, such as the transmission of more value-added services like RDS,
SCA, etc. These value-added services have added a different flavour to radio listening. Also,
Prasar Bharati, i.e. Doordarshan, is going to broadcast the Commonwealth Games, to be held in
New Delhi, in HDTV. Slowly but steadily, the AIR and Doordarshan family of Prasar Bharati is
growing day by day and working on the next generation of broadcasting techniques in India.
CHAPTER 3
COMPANY PROJECT
Fig. 3.1: HDFS architecture
3.1.2 NameNode
The namenode is the commodity hardware that contains the GNU/Linux operating
system and the namenode software. It is a software that can be run on commodity
hardware. The system having the namenode acts as the master server and it does the
following tasks:
1 It manages the file system namespace.
2 It regulates clients’ access to files.
3 It also executes file system operations such as renaming, closing, and opening
files and directories.
Datanode
The datanode is a commodity hardware having the GNU/Linux operating system and
datanode software. For every node (commodity hardware/system) in a cluster, there
will be a datanode. These nodes manage the data storage of their system.
1 Datanodes perform read-write operations on the file systems, as per client request.
2 They also perform operations such as block creation, deletion, and replication
according to the instructions of the namenode.
Block
Generally the user data is stored in the files of HDFS. The file in a file system will be
divided into one or more segments and/or stored in individual data nodes. These file
segments are called blocks. In other words, the minimum amount of data that
HDFS can read or write is called a block. The default block size is 64 MB (128 MB in
Hadoop 2.x), but it can be increased as per need by changing the HDFS configuration.
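For example, the configured block size can be checked from the shell, and overridden
per file at upload time; a small sketch (the value 134217728, i.e. 128 MB, and the
file and directory names are illustrative):
hdfs getconf -confKey dfs.blocksize
hadoop fs -D dfs.blocksize=134217728 -put file.txt /user/input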
then click on network icon ----> vpn connection ----> configure vpn ----> wired ----> connect automatically ----> select eth0 ----> edit ----> give the ip address 4.4.4.100 and subnet mask 255.0.0.0 ----> apply ----> close
ifconfig
------Now open PuTTY, log in as root and do your work
useradd hadoop
passwd hadoop
userdel -r 'username'
rpm -i jdk-7u65-linux-i586_2.rpm
java -version
----if java -version shows the wrong version, check the installed JDK versions and repeat the install---
rpm -i jdk-7u65-linux-i586_2.rpm
java -version
******Configure /etc/hosts
-----add the lines
nano /etc/hosts
4.4.4.100 master1
4.4.4.101 slave1
4.4.4.102 slave2
4.4.4.10 windows
save it
cat /etc/hosts
3.1.6 Change the host name and IP address:
then click on network icon ----> vpn connection ----> configure vpn ----> wired ----> connect automatically ----> delete eth0 ----> select eth1 ----> edit ----> give the ip address 4.4.4.2 and subnet mask 255.0.0.0 ----> apply ----> close
init 6
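-----the passwordless-SSH key exchange itself is not shown in the transcript; a
typical sketch of the missing steps, run as the hadoop user on master1:
ssh-keygen -t rsa
ssh-copy-id hadoop@slave1
ssh-copy-id hadoop@slave2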
ssh slave1 ----from the master we can log in directly to the slave without a password
exit
ls
tar -xzf hadoop-2.4.1.tar.gz ----use to extract the file
mv hadoop-2.4.1 hadoop ----use to rename the directory
cd
nano .bash_profile
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export JAVA_HOME=/usr/java/jdk1.7.0_65
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
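save the file, then reload the profile so the new variables take effect in the current shell:
source .bash_profile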
hadoop version
cd $HADOOP_HOME/etc/hadoop
3.1.10 Nano Core site:
nano core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master1:9000</value>
</property>
</configuration>
nano hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>
save the file
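-----the namenode and datanode directories configured above must exist before
formatting HDFS; a typical preparation step:
mkdir -p /home/hadoop/hadoopdata/hdfs/namenode
mkdir -p /home/hadoop/hadoopdata/hdfs/datanode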
nano mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
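save the file. The following block configures YARN and belongs in yarn-site.xml
(the command itself is missing from the transcript):
nano yarn-site.xml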
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
3.2.2.1 Deployment of Hadoop
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_65
nano slaves
delete localhost
master1
slave1
slave2
cd
ssh slave1
cd $HADOOP_HOME/etc/hadoop
nano slaves
exit
----now go to slave2
ssh slave2
cd $HADOOP_HOME/etc/hadoop
nano slaves
exit
----show the running daemons
slaves.sh /usr/java/jdk1.8.0_171-i586/bin/jps
hadoop fs -ls /
**we use the put command to send a file from the local file system (ext4) to HDFS**
----download the file from the web UI
Starting HDFS
Initially you have to format the configured HDFS file system, open namenode (HDFS
server), and execute the following command.
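$ hadoop namenode -format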
After formatting the HDFS, start the distributed file system. The following command
will start the namenode as well as the data nodes as cluster.
$ start-dfs.sh
After loading the information in the server, we can find the list of files in a directory,
status of a file, using ‘ls’. Given below is the syntax of ls that you can pass to a
directory or a filename as an argument.
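$ $HADOOP_HOME/bin/hadoop fs -ls <args>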
Assume we have data in a file called file.txt in the local system which ought to be
saved in the HDFS file system. Follow the steps given below to insert the required file
in the Hadoop file system.
Step 1
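You have to create an input directory.
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/input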
Step 2
Transfer and store a data file from the local system to the Hadoop file system using
the put command.
$ $HADOOP_HOME/bin/hadoop fs -put /home/file.txt /user/input
Step 3
You can verify the file using the ls command.
$ $HADOOP_HOME/bin/hadoop fs -ls /user/input
Assume we have a file in HDFS called outfile. Given below is a simple demonstration
for retrieving the required file from the Hadoop file system.
Step 1
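Initially, view the data from HDFS using the cat command.
$ $HADOOP_HOME/bin/hadoop fs -cat /user/output/outfile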
Step 2
Get the file from HDFS to the local file system using get command.
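$ $HADOOP_HOME/bin/hadoop fs -get /user/output/outfile /home/hadoop/ ----the destination path is illustrative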
You can shut down the HDFS by using the following command.
$ stop-dfs.sh
3.3 HIVE
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It
resides on top of Hadoop to summarize Big Data and makes querying and analyzing
easy. Hive is not:
1 A relational database
2 A design for OnLine Transaction Processing (OLTP)
3 A language for real-time queries and row-level updates
Features of Hive:
1 It stores schema in a database and processed data into HDFS.
2 It is designed for OLAP.
3 It provides an SQL-type language for querying, called HiveQL or HQL.
step 1
nano /home/hadoop/hadoop/etc/hadoop/core-site.xml
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
-----The above lines are to be added to the existing core-site.xml on master1 and
slave1, or on all masters and slaves
step 2
hiveserver2 &
jobs
step 3
beeline
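----the connect string is not shown at this point; a typical sketch, matching the
JDBC URL that appears in the prompt below:
!connect jdbc:hive2://1.1.1.2:10000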
export HIVE_HOME=/sda3/hive
export PATH=$PATH:$HIVE_HOME/bin
http://documentation.altiscale.com/hdfs-trash-and-skiptrash
http://getindata.com/blog/tutorials/creating-hdfs-snapshots-and-recovering-a-deleted-file/
export ANT_LIB=/sda3/ant/lib
hiveserver2 &
beeline
0: jdbc:hive2://1.1.1.2:10000> show tables;
nano a1.txt
1001,debolina,4567
1002,arpita,1234
1003,Jonny,7890
1004,Raja,4567
1005,Sumit,4567
1006,Sumit,6789
1001,debolina,4567
1002,arpita,1234
1003,Jonny,7890
1004,Raja,4567
1005,Sumit,4567
1006,Sumit,6789
1001,debolina,4567
1002,arpita,1234
1003,Jonny,7890
hive
****When we drop an internal table, both the data and the structure are deleted.
show tables;
****When we drop an external table, the data remains but the structure is deleted.
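----for contrast, an external table only registers a schema over existing files; a
minimal sketch (table name, columns and path are illustrative):
create external table emp_ext (id int, name string, pin int)
row format delimited fields terminated by ','
location '/user/hive/external/emp_ext';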
*****By default the database name is default and the path is /user/hive/warehouse
show databases;
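----the creation of the project database is not shown in the transcript; it would have been:
create database project;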
describe database project;
use project;
nano h1.hql
hive -f h1.hql
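----the contents of h1.hql are not shown in the report; a plausible minimal script
(column names are illustrative):
use project;
create table emp2 (id int, name string, pin int) row format delimited fields terminated by ',';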
LOAD DATA LOCAL INPATH '/home/biadmin/book1.csv' OVERWRITE INTO TABLE emp2;
----Distinct clause
hadoop fs -put emp.txt /user/hive/warehouse/emp2/emp2.txt
What is bucketing?
Tables or partitions can be divided further into buckets. The division is performed
based on the hash of a particular column that we select in the table. Buckets use some
form of hashing algorithm at the back end to read each record and place it into a bucket.
states
westbengal,south24parganas,700021
rajasthan,churu,302102
bihar,sasaram,848421
set hive.exec.dynamic.partition.mode=nonstrict;
now to insert---
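----the partitioned-table commands themselves are missing; a sketch, assuming the
data above was loaded into a table named states with columns (state, district, pin):
create table all_states (district string, pin int) partitioned by (state string);
insert overwrite table all_states partition(state) select district, pin, state from states;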
hadoop fs -ls /user/hive/warehouse/all_states
create table empl (first_name string, job_id int, department string, salary
string, country string) row format delimited fields terminated by ',';
nano emplyee.txt
ravi,123,cs,50000,india
vicky,13,manager,52000,india
hardik,233,cs,34000,india
dev,132,cs,64000,india
varun,133,cs,23000,india
save
//Here we are loading data into sample bucket from employees table
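----a sketch of the missing bucketing commands (the bucket count and bucketing
column are illustrative; the schema matches empl above):
set hive.enforce.bucketing=true;
create table sample (first_name string, job_id int, department string, salary string, country string)
clustered by (job_id) into 4 buckets
row format delimited fields terminated by ',';
insert overwrite table sample select * from empl;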
3.3.6 HBASE
HBASE INTRODUCTION
HBase is a distributed, column-oriented database built on top of the Hadoop file
system. It provides random, real-time read/write access to data stored in HDFS.
3.3.7 INSTALLATION
start-all.sh
----copy hbase-0.98.12-hadoop1-bin.tar.gz to master1 using WinSCP
tar -xzf hbase-0.98.12-hadoop1-bin.tar.gz
mv hbase-0.98.12-hadoop1 hbase
now we have to set the java path in the HBase configuration
cd hbase/conf
nano hbase-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_65
save the file
cd
mkdir /home/hadoop/hbase_data
mkdir /home/hadoop/zookeeper
cd hbase/conf
nano hbase-site.xml
//Here you have to set the path where you want HBase to store its files.
<property>
<name>hbase.rootdir</name>
<value>file:/home/hadoop/hbase_data</value>
</property>
//Here you have to set the path where you want HBase to store its built
in zookeeper files.
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
cd /home/hadoop/hbase/bin
./start-hbase.sh
after it starts
./hbase shell
slaves.sh /usr/java/jdk1.7.0_65/bin/jps
****You will see a new process as HMaster
Run all commands from the hbase shell
list
scan 'emp'
delete 'emp', 'row1', 'en:alias', 1470371756760
scan 'emp'
describe 'emp'
----create a table
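create 'mobile', 'price' ----the original create command is not shown; the column family name is illustrative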
list
describe 'mobile'
----to drop a table, it must first be disabled
disable 'mobile'
drop 'mobile'
list
version
whoami
----To create a table with multi columns
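create 'stud', 'sn', 'dept' ----the original command is not shown; 'sn' appears in the delete command below, 'dept' is illustrative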
describe 'stud'
scan 'stud'
describe 'stud'
count 'emp'
delete 'stud','r2','sn'
get 'emp', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
----To display all the rows for the column member first_name
----To display all the rows for the column member first_name, last_name
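----a sketch of the missing scan commands (the column family name 'personal' is an assumption):
scan 'emp', {COLUMNS => 'personal:first_name'}
scan 'emp', {COLUMNS => ['personal:first_name', 'personal:last_name']}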
ZOOKEEPER FRAMEWORK
ZooKeeper is a distributed co-ordination service to manage a large set of hosts. Co-
ordinating and managing a service in a distributed environment is a complicated
process. ZooKeeper solves this issue with its simple architecture and API.
ZooKeeper allows developers to focus on core application logic without worrying
about the distributed nature of the application.
The ZooKeeper framework was originally built at “Yahoo!” for accessing their
applications in an easy and robust manner. Later, Apache ZooKeeper became a
standard for organized service used by Hadoop, HBase, and other distributed
frameworks. For example, Apache HBase uses ZooKeeper to track the status of
distributed data. This section explains the basics of ZooKeeper and how it can be
installed and deployed in a distributed environment.
Cluster management - Joining / leaving of a node in a cluster and node status at real
time.
Locking and synchronization service - Locking the data while modifying it. This
mechanism helps you in automatic fail recovery while connecting other distributed
applications like Apache HBase.
Highly reliable data registry - Availability of data even when one or a few nodes
are down.
Distributed applications offer a lot of benefits, but they pose a few complex and
hard-to-crack challenges as well. The ZooKeeper framework provides a complete
mechanism to overcome these challenges. Race conditions and deadlocks are
handled using a fail-safe synchronization approach. Another main drawback is
inconsistency of data, which ZooKeeper resolves with atomicity.
Benefits of ZooKeeper
1 Simple distributed coordination process
2 Synchronization
3 Ordered messages
4 Serialization
5 Reliability
6 Atomicity
3.4 CONCLUSION
Big data with predictive analytics, high performance computing systems, machine
learning, and other strategies have been used in the past and will continue to be used
heavily in computational physics. By using these big data-related systems, engineers
and scientists are able to more easily design cars, airplanes, and other vehicles. They
have also been used to accurately predict daily weather as well as natural disasters.
Big data analytics has affected the field of computational physics almost since
computational physics was created. Computational physics with big data will continue
to improve the quality of everyday life even though there are challenges to overcome.
With the advent of Hadoop 2.0, the new release of Hadoop with Yet Another Resource
Negotiator (YARN), the beyond-MapReduce (MR) thinking has solidified. As is
explained in this chapter, Hadoop YARN separates the resource scheduling from the
MR paradigm. It should be noted that in Hadoop 1.0, the first-generation Hadoop,
resource scheduling was tied to the MR paradigm, implying that the only processing
that could be carried out on Hadoop Distributed File System (HDFS) data was the MR
type or its orchestrations.
CHAPTER-4
CONCLUSION
Embedded systems are found in every field of both engineering and science. To meet
the demands of these applications, the designer faces a lot of challenges in terms of
processor selection, IDE selection, and different I/O components.
These systems not only provide a mechanism for selection but also allow
designers to compare different components based on their applicability. This system
also allows designers to study the various existing embedded systems; their
characteristics and design issues, and their applications. Thus, this system acts as a
pre-design tool for embedded system designers, where planning of design and
development strategies can be done easily and efficiently.
Embedded Systems are used in a wide range of applications, and it is the task
of a designer to select a suitable processor from the vast list of processors, ranging
from 4-bit to 64-bit with various architectures. Embedded system performance is
mostly dependent on the type of processor being used. Each processor is characterized
by a set of parameters, and there are almost infinite alternatives available to a
designer; selecting the right or suitable processor is a multidimensional search
problem. Our system is efficient because we have provided the weights and percentage
of accuracy to the designer, to specify the requirements and application characteristics,
which are considered in the selection. These can be altered as per the specific needs of
the project. It has user friendly GUI, through which the designer can alter the
specifications, and specify the new requirements for selection of these components for
a given application.
Today's embedded systems developers play a vital role in selecting the right
tool for development, because there are a large number of IDEs available in
the market, with features ranging from simple tool chains to complex tool
chains. In this work we have presented the common tool chain used for embedded
systems development, and the selection criteria and evaluation criteria of IDEs.
Finally, we have presented the performance metrics of the selected IDEs with four
different applications. These results were achieved with commercially available
IDEs that are widely available in the market.
Over the years, Arduino has gone on to become a huge success and a
common name among students. With Google deploying it, people’s imagination has
gone to a much higher level than before. A developer at the annual Google I/O
conference said, “when Arduino and Android come together, this really proves
‘INFINITY EXISTS’ in the future”. I think a study of Arduino and practical
experiments on Arduino should be added to UG engineering courses, to help
students leverage their talents and imagination.
72
CHAPTER-5
REFERENCES
• http://en.wikipedia.org/wiki/Arduino (Wikipedia)