BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE ENGINEERING
ARYA COLLEGE OF ENGINEERING & RESEARCH CENTRE
SP-40, RIICO INDUSTRIAL AREA, KUKAS, JAIPUR, RAJASTHAN
CERTIFICATE
This is to certify that the practical training seminar report on “BIG DATA
HADOOP”, carried out at “CDAC” during the training period, submitted by
“Divyanshu Bansal (15EAYCS033)” in partial fulfillment for the award of the degree
of Bachelor of Technology in Computer Science, has been found satisfactory and is
approved for submission.
During my seminar period, all the staff members of the department helped me
with their skills. I hereby express my sincere thanks to the seminar coordinator,
Er. Sudhanshu Vashisth, for his valuable guidance and kind cooperation,
without which this seminar would not have been possible.
PREFACE
2.1.7. Industrialization
2.1.8. Manufacturing
2.1.9. Logistics and Supply Chain
2.2.1. History
2.3. Conclusion
3.2. Nano HDFS Site
3.3.7. Installation
3.4. Conclusion
Chapter 5. References
CHAPTER 1
INTRODUCTION TO BIG DATA
Big data is a collection of large datasets that cannot be processed using
traditional computing techniques. Big data is not merely data; it has become a
complete subject, which involves various tools, techniques and frameworks.
Big Data has been described by some Data Management pundits (with a bit
of a snicker) as “huge, overwhelming, and uncontrollable amounts of information.”
In 1663, John Graunt dealt with “overwhelming amounts of information” as well,
while he studied the bubonic plague, which was then ravaging Europe. Graunt
used statistics and is credited with being the first person to use statistical data
analysis. In the early 1800s, the field of statistics expanded to include collecting and
analyzing data.
The evolution of Big Data includes a number of preliminary steps for its
foundation, and while looking back to 1663 isn’t necessary for the growth of data
volumes today, the point remains that “Big Data” is a relative term depending on who
is discussing it. Big Data to Amazon or Google is very different than Big Data to a
medium-sized insurance organization, but no less “Big” in the minds of those
contending with it.
1.2 FOUNDATIONS OF BIG DATA
Data became a problem for the U.S. Census Bureau in 1880. They estimated it would
take eight years to handle and process the data collected during the 1880 census, and
predicted the data from the 1890 census would take more than 10 years to process.
Fortunately, in 1881, a young man working for the bureau, named Herman Hollerith,
created the Hollerith Tabulating Machine. His invention was based on the punch cards
designed for controlling the patterns woven by mechanical looms. His tabulating
machine reduced ten years of labor into three months of labor.
In 1927, Fritz Pfleumer, an Austrian-German engineer, developed a means of
storing information magnetically on tape. Pfleumer had devised a method for adhering
metal stripes to cigarette papers (to keep smokers’ lips from being stained by the
rolling papers available at the time), and decided he could use this technique to create
a magnetic strip, which could then be used to replace wire recording technology. After
experimenting with a variety of materials, he settled on a very thin paper, striped with
iron oxide powder and coated with lacquer, for his patent in 1928.
During World War II (more specifically 1943), the British, desperate to crack
Nazi codes, invented a machine that scanned for patterns in messages intercepted
from the Germans. The machine was called Colossus, and it scanned 5,000 characters a
second, reducing the workload from weeks to merely hours. Colossus was the first
data processor. Two years later, in 1945, John Von Neumann published a paper on the
Electronic Discrete Variable Automatic Computer (EDVAC), the first “documented”
discussion on program storage, and laid the foundation of computer architecture
today.
It is said these combined events prompted the “formal” creation of the United
States’ NSA (National Security Agency), by President Truman, in 1952. Staff at the
NSA was assigned the task of decrypting messages intercepted during the Cold War.
Computers of this time had evolved to the point where they could collect and process
data, operating independently and automatically.
1.3 THE INTERNET EFFECTS AND PERSONAL COMPUTERS
ARPANET began on Oct 29, 1969, when a message was sent from UCLA’s host
computer to Stanford’s host computer. It received funding from the Advanced
Research Projects Agency (ARPA), a subdivision of the Department of Defense.
Generally speaking, the public was not aware of ARPANET. In 1973, it connected
with a transatlantic satellite, linking it to the Norwegian Seismic Array. However, by
1989, the infrastructure of ARPANET had started to age. The system wasn’t as
efficient or as fast as newer networks. Organizations using ARPANET started moving
to other networks, such as NSFNET, to improve basic efficiency and speed. In 1990,
the ARPANET project was shut down, due to a combination of age and obsolescence.
The creation of ARPANET led directly to the Internet.
In 1965, the U.S. government built the first data center, with the intention of storing
millions of fingerprint sets and tax returns. Each record was transferred to magnetic
tape and was to be stored in a central location. Conspiracy theorists
expressed their fears, and the project was closed. However, in spite of its closure, this
initiative is generally considered the first effort at large scale data storage.
In 1989, a British Computer Scientist named Tim Berners-Lee came up with the
concept of the World Wide Web. The Web is a place/information-space where web
resources are recognized using URLs, interlinked by hypertext links, and is accessible
via the Internet. His system also allowed for the transfer of audio, video, and pictures.
His goal was to share information on the Internet using a hypertext system. By the fall
of 1990, Tim Berners-Lee, working for CERN, had written three basic IT commands
that are the foundation of today’s web:
1 HTML: HyperText Markup Language. The formatting language of the web.
2 URL: Uniform Resource Locator. A unique “address” used to identify each
resource on the web. It is also called a URI (Uniform Resource Identifier).
3 HTTP: Hypertext Transfer Protocol. Used for retrieving linked resources from
all across the web.
In 1993, CERN announced the World Wide Web would be free for everyone to
develop and use. The free part was a key factor in the effect the Web would have on
the people of the world. (It’s the companies providing the “internet connection” that
charge us a fee).
Volume
Volume refers to the incredible amounts of data generated each second from social
media, cell phones etc. The vast amounts of data have become so large in fact that we
can no longer store and analyze it using traditional database technology. We now
use distributed systems, where parts of the data are stored in different locations and
brought together by software.
Value
When we talk about value, we’re referring to the worth of the data being extracted.
Having endless amounts of data is one thing, but unless it can be turned into value it
is useless. While there is a clear link between data and insights, this does not always
mean there is value in Big Data. The most important part of embarking on a big data
initiative is to understand the costs and benefits of collecting and analyzing the data to
ensure that ultimately the data that is reaped can be monetized.
Variety
Variety is defined as the different types of data we can now use. Data today looks
very different from data of the past. We no longer have only structured data (name,
phone number, address, financials, etc.) that fits neatly into a data table. Much of
today’s data is unstructured.
Veracity
Last, but certainly not least there is veracity. Veracity is the quality or trustworthiness
of the data. Just how accurate is all this data?
Economies of scale can exist on either the supply or the demand side. On the
supply side, economies of scale exist where the average costs per unit of output
decrease with the increase in the scale or magnitude of the output being produced by a
firm. In telecommunications services, for example, it does not cost much, if anything
(i.e. assuming that the network does not require expansion), for the service provider to
connect one more customer to the existing network. On the demand side, economies
of scale are often referred to as a “network effect” or “positive externality,” whereby
the addition of one more customer to the network increases the aggregate social value
of the network beyond the private value gained by the additional customer. In
telecommunications markets, network effects commonly serve to preserve the market
position of the incumbent network provider, and often give it a “first mover”
advantage when markets are opened to competition. Importantly, in countries in
which there is low population density or low demand for a particular service, only a
limited number of networks may be able to be sustained, thus adding to the network
effect of the dominant service provider. Interestingly, economies of scale have not
been as prevalent in some developing countries where, lacking fixed-line networks,
providers have found it more advantageous to build wireless (i.e. mobile) networks.
The ability to sustain multiple networks, however, usually depends, in part, on both
the technology chosen and the population of the particular geographic area.
Supply-side economies of scope exist when it is cheaper for one firm to
produce (i.e. through joint production) and sell two or more products together than
for a number of individual firms to produce each good separately. In
telecommunications, for example, once a network is in place, local calling can be
inexpensively combined on a network (i.e. “bundled”) with other products and
services, such as optional local features, long distance calling, internet services,
television, and so on. When consumers value the range of services provided by a
single telecommunications carrier, it is known as a demand-side economy of scope.
As various telecommunications technologies converge (e.g. voice and data
technologies), economies of scope are becoming more prevalent. Interestingly, this
process of convergence is also bringing about increased competition. With the
introduction of digitalisation, whereby all network traffic (i.e. whether voice, data, or
video) takes the same digital form, the distinction between voice and data has eroded,
allowing services formerly classified as “data” to compete in the provision of “voice”
services. Accordingly, formerly different networks (e.g. cable television, wireless, and
broadband) may have the potential to compete, and in some cases already are
competing, against the traditional public switched telephone network (“PSTN”).
Importantly, if legal or regulatory barriers shield economies of scale and
scope from competitive forces, market failure may result. Market failure occurs when
resources are misallocated or allocated inefficiently (i.e. this includes misallocation in
both the static and dynamic sense), resulting in lost value, wasted resources, or some
other non-optimal outcome. Market failures generally lead to higher prices than
would be charged under competitive conditions. This, in turn, leads to restricted
output (i.e. unless the regulated monopolist can perfectly discriminate among its
customers), and ultimately a loss to consumer welfare. Since regulated monopolists
are generally immune from competitive pressures, they do not have the signals or
incentives to minimize costs, undertake efficient business practices, or engage in
innovative technological change. Furthermore, regulators have often proven
ineffective in replicating such signals and incentives. Given both the prospects for and
the benefits of competition in the telecommunications industry, it is important to
avoid regulatory measures that protect incumbent operators from market forces.
Apache Hadoop is not only a storage system but a platform for data storage
as well as processing. It is scalable (we can add more nodes on the fly) and fault
tolerant (even if a node goes down, its data can be processed by another node).
The following characteristics of Hadoop make it a unique platform:
1. Flexibility to store and mine any type of data whether it is structured, semi-
structured or unstructured. It is not bounded by a single schema.
2. Excels at processing data of complex nature. Its scale-out architecture divides
workloads across many nodes. Another added advantage is that its flexible
file-system eliminates ETL bottlenecks.
3. Scales economically: as discussed, it can be deployed on commodity hardware.
Apart from this, its open-source nature guards against vendor lock-in.
In most other countries, the copper access network of the former
incumbents remains the only network over which end-user access can be provided at
reasonable cost. Accordingly, competition continues to be mostly “intramodal” in
those countries where new entry is conditional on access to the incumbent network.
In the case of mobile wireless services, although the fixed costs of serving a
particular customer are not as significant as in a fully wired network, there are
substantial fixed costs associated with the roll out of a network with adequate
geographic coverage. Furthermore, the potential for competition in wireless services
varies with both demand and population density. That is, while demand and density
are sufficiently high in large cities to sustain many competing networks of base
stations, the potential for competing networks is lower in low demand/low density
areas. In addition, mobile infrastructure competition is limited, in some countries, by
constraints on the amount of available spectrum (i.e. the frequency bands which are
dedicated to specific mobile services, such as GSM or 3G). In some countries,
spectrum scarcity originates from artificial allocation and licensing constraints and not
from binding physical limitations. Assigning more frequencies, making better use of
frequencies, and allowing frequency trading help to minimize such constraints.
Figure 1.1 Big Data
1. Black Box Data: It is a component of helicopters, airplanes, jets, etc. It
captures the voices of the flight crew, recordings of microphones and earphones,
and the performance information of the aircraft.
2. Social Media Data: Social media such as Facebook and Twitter hold
information and the views posted by millions of people across the globe.
3. Stock Exchange Data: The stock exchange data holds information about the
‘buy’ and ‘sell’ decisions made by customers on shares of different companies.
4. Power Grid Data: The power grid data holds information about the power
consumed by a particular node with respect to a base station.
5. Transport Data: Transport data includes model, capacity, distance and
availability of a vehicle.
6. Search Engine Data: Search engines retrieve lots of data from different
databases.
1.7.2 Hadoop Architecture
Hadoop framework includes following four modules:
Hadoop Common: These are Java libraries and utilities required by other
Hadoop modules. These libraries provide filesystem- and OS-level
abstractions and contain the necessary Java files and scripts required to start
Hadoop.
Hadoop YARN: This is a framework for job scheduling and cluster resource
management.
Hadoop Distributed File System (HDFS™): A distributed file system that
provides high-throughput access to application data.
Hadoop MapReduce: This is a YARN-based system for parallel processing of
large data sets.
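As a quick illustration of the MapReduce module, the word-count example that ships
with Hadoop can be run from the shell. A minimal sketch, assuming the bundled
examples jar of this Hadoop version and an existing /user/input directory in HDFS:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /user/input /user/output
YARN schedules the map and reduce tasks across the cluster, and the results are
written to /user/output in HDFS.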
1.7.3 The Internet
The Internet’s development continued with the spin-off of MILNET to support DOD
operations and new mission-focused computer networks developed by the Department
of Energy, NASA, and NSF, together with broader networks such as USENET and
BITNET. In 1986, NSF launched its NSFNet program, an effort to link (among other
things) a number of U.S. academic supercomputing centers. NSFNet was envisioned
as a general high-speed network that could link many other academic or research
networks (including the ARPANET) using the research results and operational
experience obtained from ARPANET. The NSFNet proved very successful. Its
upgrade to handle growing demand and other subsequent changes opened the door for
the participation of the private sector in the network.
The Internet was highly successful in meeting the original vision of enabling
computers to communicate across diverse networks and in the face of heterogeneous
underlying communications technologies. Its success, measured in terms of
commercial investment, wide use, and large installed base, is also widely understood
to have made innovation in the Internet much harder over time. (Innovation in the
Internet’s architecture proper should be distinguished from innovative uses of the
network, which have flourished as a direct consequence of the Internet’s flexible,
general-purpose design.) CSTB’s Looking Over the Fence at Networks: A Neighbor’s
View of Networking Research characterized this problem in terms of three types of
potential ossification: intellectual, infrastructure, and system.
1.7.4 Big Data Technologies
Big data technologies are important in providing more accurate analysis, which may
lead to more concrete decision-making resulting in greater operational efficiencies,
cost reductions, and reduced risks for the business.
To harness the power of big data, you would require an infrastructure that can manage
and process huge volumes of structured and unstructured data in real time, and can
protect data privacy and security.
There are various technologies in the market from different vendors including
Amazon, IBM, Microsoft, etc., to handle big data. While looking into the technologies
that handle big data, we examine the following two classes of technology:
1.7.5 Operational Big Data
This includes systems like MongoDB that provide operational capabilities for real-
time, interactive workloads where data is primarily captured and stored.
NoSQL Big Data systems are designed to take advantage of new cloud computing
architectures that have emerged over the past decade to allow massive computations
to be run inexpensively and efficiently. This makes operational big data workloads
much easier to manage, cheaper, and faster to implement.
Some NoSQL systems can provide insights into patterns and trends based on
real-time data with minimal coding and without the need for data scientists and
additional infrastructure.
1.7.6 Analytical Big Data
This includes systems like Massively Parallel Processing (MPP) database systems and
MapReduce that provide analytical capabilities for retrospective and complex analysis
that may touch most or all of the data.
MapReduce provides a new method of analyzing data that is complementary to the
capabilities provided by SQL, and a system based on MapReduce can be scaled up
from single servers to thousands of machines.
1.8 CONCLUSION
We have entered an era of Big Data. Through better analysis of the large volumes of
data that are becoming available, there is the potential for making faster advances in
many scientific disciplines and improving the profitability and success of many
enterprises. However, many technical challenges described in this report must be
addressed before this potential can be realized fully. The challenges include not just
the obvious issues of scale, but also heterogeneity, lack of structure, error-handling,
privacy, timeliness, provenance, and visualization, at all stages of the analysis pipeline
from data acquisition to result interpretation. These technical challenges are common
across a large variety of application domains, and therefore not cost-effective to
address in the context of one domain alone.
CHAPTER 2
INTRODUCTION TO THE ORGANIZATION
2.1.1 INTRODUCTION
Established in 1994, Centum Electronics Ltd is one of the leading Indian companies in
electronics product distribution and services with an ISO TS 16949:2009 Quality
Certification.
Centum was founded in 1994 in Bangalore, India. Since then, Centum has
rapidly grown into a diversified electronics company with operations in North
America, EMEA and Asia. The company offers a broad range of products and
services across different industry segments. It has continuously invested in
strengthening its design & product development capabilities while developing deep
domain knowledge in the segments it operates in. Centum has also established truly
world-class manufacturing facilities with cutting edge infrastructure as well as a
global supply chain capable of delivering products with high quality and reliability.
The company is primarily engaged in designing and manufacturing advanced
electronics systems, subsystems, and components. These products cater to various
segments such as communications, military, space, automotive and industrial
electronics.
Centum Group has a global design strength of over 630 design engineers
specialized in Electronic Hardware, Embedded Software, FPGA, analog, RF, Power
and mechanical domains. They work together in multidisciplinary teams to realize
customized products for mission critical applications in high technology segments.
For the last 25 years the company has been helping its customers to turn their ideas
into products. A key contributor to Centum Group’s growth has been the strong
relationships forged with international customers and partners. This customer-focused
approach coupled with Centum’s culture hinged on the core-values of Technology-
Teamwork-Trust has resulted in a track-record of high quality products & services and
excellent execution ability.
Furthermore, the company’s global locations in Europe, North America and
India enable it to work closely with international customers while providing
competitive development costs by managing the onshore/offshore mix. The company
also provides the flexibility for customers to choose between world-class production
facilities in India’s high-tech capital, Bangalore, or at another supplier of their choice.
The company markets its products to majors like Nortel, Lucent, Nokia, BI
Technologies, Marconi, LG, Samsung, ITI, HTL, Punjab Communication, Crompton
Greaves, L&T, ABB and Deltron, to name a few. The company has received approval
from the Regional Centre for Military Airworthiness for use of its products in military
and aerospace applications.
The company is known as the largest design, manufacturing and exporting
company for frequency–controlled products in India. These products are used in
various applications such as aerospace, military, telecom, etc. The company is engaged
in designing and manufacturing Crystals and Oscillators. Crystals are used in
Stratum applications. The company manufactures high-end products such as DC
converters for the defense and space segments. Centum Electronics has received an order
worth Rs. 4.3 crores to supply sub-systems to a defense sector unit.
The Company designs and exports electronic products in the analog, digital,
mixed signal, radio frequency and microelectronics domains. It operates through the
Electronic System Design and Manufacturing (ESDM) segment. It is primarily
involved in the manufacture of Advanced Microelectronics Modules, Frequency
Control Products, Printed Circuit Board Assembly (PCBA) and Resistor Networks
catering to the communications, military, aerospace and industrial electronics markets.
It also offers a range of manufacturing and test solutions, including box builds, system
integration, PCBA and electromechanical assemblies. Its services include design
services, electronics manufacturing solutions and mechanical solutions. Its defense
products include Laser Receivers, Electronic Delay Units, Missile Interface Units,
Onboard Computers and Microelectronics Products. Its space products include Sun
Sensors, Earth Sensors and Reaction Wheel Controllers.
2.1.2 CHAIRMAN OF CENTUM ELECTRONIC LTD.
VISION
To create value by contributing to the success of our customers, by being your
innovation partner for design & manufacturing solutions in high technology areas.
MISSION
To keep people at the centre of everything we do.
To uphold a transparent work culture and ethical business practices in everything we
do.
To foster a culture of innovation, free thinking and empowerment, in everything we
do.
To fuel the passion of excellence in everything we do.
OUR VALUES
1 CUSTOMER RELATIONSHIP– Customer relationship is the heart of every
business, and more so at Centum. Customers are not just one of the stakeholders
of our business but our reason to do business; maintaining delightful customer
relationships is our forte.
2 TEAMWORK– At Centum, teamwork is the coming together of a group of highly
motivated people who are committed to achieving organizational goals and willing,
at the same time, to be held accountable for their actions and results.
3 OPENNESS & TRUST– The first and foremost aspects of openness are trust and
fairness. Centum values its employees and believes in a work environment
encompassing Openness & Trust in all of its communications and actions.
4 INTEGRITY– At Centum, Integrity is the foundation of our reputation. We
follow the highest ethical and moral standards, and display honesty in all our
actions, methods and measures.
5 EXCELLENCE– At Centum, we strive for Excellence in all that we do however
big or small the task may be, and are never content with being the second best.
6 SOCIAL RESPONSIBILITY– As a responsible corporate citizen Centum
endeavors to have a positive impact on the greater society that we serve. Social
responsibility is intertwined in our self-belief and work ethics.
BUILD TO SPECIFICATION
2.1.4 WHY CENTUM ELECTRONICS LTD?
World-class manufacturing services hinge on state-of-the-art facilities and a truly
competent professional team. Centum’s manufacturing facilities are located in
Bangalore, with a total of 350,000 sq. ft of production area. Over the years, we have
developed and invested in many new processes and capabilities to meet the most
complex and critical requirements of our customers.
tests. A dedicated team is capable of developing fully Automated Test Equipment
(ATE) based on modular & general purpose equipment for testing the complete
functionality of a fully assembled product.
2.1.6 MICROELECTRONICS
Centum is India’s largest Microelectronics Design and Manufacturing Company with
more than 15 years of credible deliveries to Telecom, Industrial, Defense and Space
markets. Manufacturing is carried out in state-of-the-art facilities in Class-10K clean
rooms, with critical processes done in Class-100 LFTs.
DESIGN CAPABILITIES:
very important and pivotal role in the achievement of a sustainable product fulfillment
strategy.
DESIGN FOR EXCELLENCE (DFX): We provide customers with the option to
validate their designs to address issues that may arise in later stages of the product
lifecycle. Design for Manufacturability/Assembly (DFM/A) and Design for
Testability (DFT) are the key components of the analysis, which provide the feedback
to the design teams to make the appropriate changes. We offer this service for
products designed internally as well as for customer designed products. The latest
DFx methodologies are applied to achieve lower production costs without
compromising on compliance with the appropriate quality standards.
The NPI process also focuses on developing SCM strategies that will support product
launch and production ramp while clearly defining and minimizing liabilities.
Material and supplier selection is optimized for cost, availability, service support,
lifecycle, RoHS compliance and other parameters.
2.1.8 MANUFACTURING
MOLDING:
1 We have hydraulic presses to cater to high-accuracy sheet metal components,
capable of processing sheet thicknesses from 0.05 mm to 8 mm.
2 Complex bending and profiled Progressive tools, compound dies, multi stage
dies, bending, forming, drawing die components can be processed.
3 We process all kinds of steel, CRCA, copper and copper alloys, aluminum
and aluminum alloys, Alu-Zinc, Kovar, Invar and SS.
4 We also support various surface finish operations like plating, anodizing,
tinning, chrome finish, nickel finish and other finishes.
ASSEMBLY:
5 Switch gear assemblies, MCCB assemblies, MCB assemblies, OESA & SFU
assemblies.
6 Bus-bars, surge arrestors, core arrestors, High voltage bracket assemblies for
electrical industries.
2.1.9 LOGISTICS AND SUPPLY CHAIN
Centum offers its customers value through innovative supply chain solutions,
global sourcing support and local procurement, careful inventory management,
and advanced component management.
Centum creates value by increasing speed to market for all NPI (New Product
Introduction) through customized solutions, thus enhancing cost savings through faster
production ramp-up from its low-cost manufacturing locations. Centum supply chain
solutions are built to cater to high-mix, high-technology, medium- to low-volume
requirements.
STRATEGIC SOURCING & SUPPLY CHAIN ARCHITECT – SOURCING
STRATEGY
Centum uses advanced sourcing processes and has direct access to manufacturers
and Tier 1 distributors, which offers its customers:
1 Optimized commodity based supply chain solutions via strategic global supply
base.
2 Global Price Management, Procurement, and Risk Mitigation.
3 Supply chain Performance Management.
4 Continuous Focus on Cost Management through commodity approaches and
supply chain solutions and strategies based on customer dynamics and industry
served.
5 Use of IT tools/infrastructure for RFQ management with global suppliers.
LOCALIZATION & VERTICAL INTEGRATION OFFERING
Centum provides technical and commercial resources in domestic market to source
and qualify custom mechanical components and assemblies to meet unique supply
chain customer requirements. A comprehensive offering includes:
Customized supply chain solutions are based on five primary supply chain models
that advanced planning supports, with the support of BaaN ERP solutions.
2.1.10 AFTER SALES SERVICES
Centum’s customers are global OEMs who work in highly regulated markets and
environments. Often these products are safety-critical and have long lifecycles. We
offer a wide range of aftermarket services to enable our customers to deliver
exceptional customer satisfaction.
2.2.1 HISTORY
The company provides a wide range of products, services and solutions designed to
cater to a large market, ranging from health care systems, data warehousing,
multimedia and multilingual technologies, and networking solutions to technical
consultancy, training and eGovernance solutions.
PRODUCTS
SERVICES
1 E-Hastakshar
2 E-pramaan
3 Mobile seva
4 Cyber Forensic Analysis, Training and laboratory development
5 E-Raktkosh
6 E-Aushadhi
7 Vulnerability Assessment and Penetration Testing (VAPT)
8 Data centre services
The primary activity in all centers of C-DAC is research and development in specific
areas of information and communication technology and electronics (ICTE). Across
all these centers, we span a wide range of topics in ICTE. Broadly, we can divide the
R&D activities into two broad classes: the enabling technologies and application
verticals. The research activities are usually driven by specific application areas, and
hence mostly applied in nature.
Based on the vision charted by the parent ministry (MCIT), international trends,
Indian requirements, etc., C-DAC identifies significant thrust areas for focus across
the various centers. At present, most of the R&D activities in the various centres fall
into these thrust areas.
As mentioned earlier, most of the R&D work has a driving practical application of
importance. Most of the work is, therefore, actually deployed and in use by the
organizations concerned.
2.2.4 COMPANY’S VISION AND MISSION
VISION
To emerge as the premier R&D institution for the design, development and
deployment of world class electronic and IT solutions for economic and human
advancement.
MISSION
C-DAC's Mission Statement has evolved after deep thought and in consultation with
the members of C-DAC. The Mission Statement, as defined below, reflects the fabric
and character of C-DAC and is integral to the fulfillment of C-DAC's Vision.
CORE VALUES
2.2.5 HR PHILOSOPHY AND POLICY
C-DAC's HR philosophy holds the employee, its 'Member' (of the C-DAC family), as
being at the center stage of the organization.
C-DAC greatly values the contribution of its employees and keeps its human
resource issues under constant review, drawing inputs both from internal climate
surveys and from external environmental considerations.
C-DAC believes in keeping its members financially comfortable and has accordingly
designed its compensation package, which is balanced and is in keeping with the
industry standards. At the induction level (fresh graduates), our compensation
package is amongst the best in the industry, comprising the basic salary, Dearness
allowance, City Compensatory allowance and other benefits like Contributory
Provident Fund, liberalized Leave Travel Concession, House Rent Allowance or
Leased Accommodation, Conveyance Reimbursements, Reimbursement of
books/journals/newspapers, Children Education allowance, Credit Card fees and long
term loans. Our medical scheme for employees is comprehensive and among the
best, covering the employee and his/her dependent family members.
2.2.7 QUALITY POLICY
2.3 CONCLUSION
The technology currently in use at Prasar Bharati has improved significantly. At this stage there
has been an advancement in signal reception quality, as systems have changed from analog to
digital with the advancement in different audio and video compression techniques. For
Doordarshan, DTH (Direct to Home) satellite services have become more user friendly, and the
evolution of SDTV into HDTV has made it a popular product among the people of India. It is
also accessible from remote areas, with more channels and better reception. In AIR too, there have
been a lot of advancements, such as the transmission of more value-added services like RDS,
SCA, etc. These value-added services have added a different flavour to radio listening. Also,
Prasar Bharati, i.e. Doordarshan, is going to broadcast the Commonwealth Games, to be held in
New Delhi, in HDTV. Slowly but steadily, the AIR and Doordarshan family of Prasar Bharati is
growing day by day and working on the next generation of broadcasting techniques in India.
CHAPTER 3
COMPANY PROJECT
Fig. 3.1: HDFS architecture
3.1.2 NameNode
The namenode is the commodity hardware that contains the GNU/Linux operating
system and the namenode software. It is a software that can be run on commodity
hardware. The system having the namenode acts as the master server and it does the
following tasks:
1 It manages the file system namespace.
2 It regulates clients’ access to files.
3 It also executes file system operations such as renaming, closing, and opening
files and directories.
Datanode
The datanode is a commodity hardware having the GNU/Linux operating system and
datanode software. For every node (commodity hardware/system) in a cluster, there
will be a datanode. These nodes manage the data storage of their system.
1 Datanodes perform read-write operations on the file systems, as per client request.
2 They also perform operations such as block creation, deletion, and replication
according to the instructions of the namenode.
Block
Generally the user data is stored in the files of HDFS. The file in a file system will be
divided into one or more segments and/or stored in individual data nodes. These file
segments are called blocks. In other words, the minimum amount of data that
HDFS can read or write is called a block. The default block size is 64 MB (128 MB in
Hadoop 2.x), but it can be increased as per need by changing the HDFS configuration.
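For example, the configured block size can be checked from the shell, and overridden
per file at upload time; a small sketch (the value 134217728, i.e. 128 MB, and the
file and directory names are illustrative):
hdfs getconf -confKey dfs.blocksize
hadoop fs -D dfs.blocksize=134217728 -put file.txt /user/input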
then click on network icon ----> vpn connection ----> configure vpn ----> wired ----> connect automatically ----> select eth0 ----> edit ----> give the ip address 4.4.4.100 and subnet mask 255.0.0.0 ----> apply ----> close
ifconfig
------Now open PuTTY, log in as root and do your work
useradd hadoop
passwd hadoop
userdel -r 'username'
rpm -i jdk-7u65-linux-i586_2.rpm
java -version
----if java -version shows the wrong version, check the installed JDK versions and repeat the install---
rpm -i jdk-7u65-linux-i586_2.rpm
java -version
******Configure /etc/hosts
-----add the lines
nano /etc/hosts
4.4.4.100 master1
4.4.4.101 slave1
4.4.4.102 slave2
4.4.4.10 windows
save it
cat /etc/hosts
3.1.6 Change the host name and IP address:
then click on network icon ----> vpn connection ----> configure vpn ----> wired ----> connect automatically ----> delete eth0 ----> select eth1 ----> edit ----> give the ip address 4.4.4.2 and subnet mask 255.0.0.0 ----> apply ----> close
init 6
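-----the passwordless-SSH key exchange itself is not shown in the transcript; a
typical sketch of the missing steps, run as the hadoop user on master1:
ssh-keygen -t rsa
ssh-copy-id hadoop@slave1
ssh-copy-id hadoop@slave2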
ssh slave1 ----from the master we can log in directly to the slave without a password
exit
ls
tar -xzf hadoop-2.4.1.tar.gz ----use to extract the file
mv hadoop-2.4.1 hadoop ----use to rename the directory
cd
nano .bash_profile
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export JAVA_HOME=/usr/java/jdk1.7.0_65
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
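save the file, then reload the profile so the new variables take effect in the current shell:
source .bash_profile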
hadoop version
cd $HADOOP_HOME/etc/hadoop
3.1.10 Nano Core site:
nano core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master1:9000</value>
</property>
</configuration>
nano hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>
save the file
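-----the namenode and datanode directories configured above must exist before
formatting HDFS; a typical preparation step:
mkdir -p /home/hadoop/hadoopdata/hdfs/namenode
mkdir -p /home/hadoop/hadoopdata/hdfs/datanode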
nano mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
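save the file. The following block configures YARN and belongs in yarn-site.xml
(the command itself is missing from the transcript):
nano yarn-site.xml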
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
3.2.2.1 Deployment of Hadoop
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_65
nano slaves
delete localhost
master1
slave1
slave2
cd
ssh slave1
cd $HADOOP_HOME/etc/hadoop
nano slaves
exit
----now go to slave2
ssh slave2
cd $HADOOP_HOME/etc/hadoop
nano slaves
exit
----show the running daemons
slaves.sh /usr/java/jdk1.8.0_171-i586/bin/jps
hadoop fs -ls /
**we use the put command to send a file from the local file system (ext4) to HDFS**
----download the file from the web UI
Starting HDFS
Initially you have to format the configured HDFS file system, open namenode (HDFS
server), and execute the following command.
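$ hadoop namenode -format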
After formatting the HDFS, start the distributed file system. The following command
will start the namenode as well as the data nodes as cluster.
$ start-dfs.sh
After loading the information in the server, we can find the list of files in a directory,
status of a file, using ‘ls’. Given below is the syntax of ls that you can pass to a
directory or a filename as an argument.
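$ $HADOOP_HOME/bin/hadoop fs -ls <args>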
Assume we have data in a file called file.txt in the local system which ought to be
saved in the HDFS file system. Follow the steps given below to insert the required file
in the Hadoop file system.
Step 1
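You have to create an input directory.
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/input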
Step 2
Transfer and store a data file from the local system to the Hadoop file system using
the put command.
$ $HADOOP_HOME/bin/hadoop fs -put /home/file.txt /user/input
Step 3
You can verify the file using the ls command.
$ $HADOOP_HOME/bin/hadoop fs -ls /user/input
Assume we have a file in HDFS called outfile. Given below is a simple demonstration
for retrieving the required file from the Hadoop file system.
Step 1
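Initially, view the data from HDFS using the cat command.
$ $HADOOP_HOME/bin/hadoop fs -cat /user/output/outfile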
Step 2
Get the file from HDFS to the local file system using get command.
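$ $HADOOP_HOME/bin/hadoop fs -get /user/output/outfile /home/hadoop/ ----the destination path is illustrative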
You can shut down the HDFS by using the following command.
$ stop-dfs.sh
3.3 HIVE
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It
resides on top of Hadoop to summarize Big Data and makes querying and analyzing
easy. Hive is not:
1 A relational database
2 A design for OnLine Transaction Processing (OLTP)
3 A language for real-time queries and row-level updates
Features of Hive:
1 It stores schema in a database and processed data into HDFS.
2 It is designed for OLAP.
3 It provides an SQL-type language for querying, called HiveQL or HQL.
step 1
nano /home/hadoop/hadoop/etc/hadoop/core-site.xml
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
-----The above lines are to be added to the existing core-site.xml on master1 and
slave1, or on all masters and slaves
step 2
hiveserver2 &
jobs
step 3
beeline
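----the connect string is not shown at this point; a typical sketch, matching the
JDBC URL that appears in the prompt below:
!connect jdbc:hive2://1.1.1.2:10000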
export HIVE_HOME=/sda3/hive
export PATH=$PATH:$HIVE_HOME/bin
http://documentation.altiscale.com/hdfs-trash-and-skiptrash
http://getindata.com/blog/tutorials/creating-hdfs-snapshots-and-recovering-a-deleted-file/
export ANT_LIB=/sda3/ant/lib
hiveserver2 &
beeline
0: jdbc:hive2://1.1.1.2:10000> show tables;
nano a1.txt
1001,debolina,4567
1002,arpita,1234
1003,Jonny,7890
1004,Raja,4567
1005,Sumit,4567
1006,Sumit,6789
1001,debolina,4567
1002,arpita,1234
1003,Jonny,7890
1004,Raja,4567
1005,Sumit,4567
1006,Sumit,6789
1001,debolina,4567
1002,arpita,1234
1003,Jonny,7890
hive
****When we drop an internal table, both the data and the structure are deleted.
show tables;
****When we drop an external table, the data remains but the structure is deleted.
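----for contrast, an external table only registers a schema over existing files; a
minimal sketch (table name, columns and path are illustrative):
create external table emp_ext (id int, name string, pin int)
row format delimited fields terminated by ','
location '/user/hive/external/emp_ext';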
*****By default the database name is default and the path is /user/hive/warehouse
show databases;
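----the creation of the project database is not shown in the transcript; it would have been:
create database project;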
describe database project;
use project;
nano h1.hql
hive -f h1.hql
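----the contents of h1.hql are not shown in the report; a plausible minimal script
(column names are illustrative):
use project;
create table emp2 (id int, name string, pin int) row format delimited fields terminated by ',';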
LOAD DATA LOCAL INPATH '/home/biadmin/book1.csv' OVERWRITE INTO TABLE emp2;
----Distinct clause
hadoop fs -put emp.txt /user/hive/warehouse/emp2/emp2.txt
What is bucketing?
Tables or partitions can be divided further into buckets. The division is performed
based on the hash of a particular column that we select in the table. Buckets use some
form of hashing algorithm at the back end to read each record and place it into a bucket.
states
westbengal,south24parganas,700021
rajasthan,churu,302102
bihar,sasaram,848421
set hive.exec.dynamic.partition.mode=nonstrict;
now to insert---
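----the partitioned-table commands themselves are missing; a sketch, assuming the
data above was loaded into a table named states with columns (state, district, pin):
create table all_states (district string, pin int) partitioned by (state string);
insert overwrite table all_states partition(state) select district, pin, state from states;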
hadoop fs -ls /user/hive/warehouse/all_states
create table empl (first_name string, job_id int, department string, salary
string, country string) row format delimited fields terminated by ',';
nano emplyee.txt
ravi,123,cs,50000,india
vicky,13,manager,52000,india
hardik,233,cs,34000,india
dev,132,cs,64000,india
varun,133,cs,23000,india
save
//Here we are loading data into sample bucket from employees table
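----a sketch of the missing bucketing commands (the bucket count and bucketing
column are illustrative; the schema matches empl above):
set hive.enforce.bucketing=true;
create table sample (first_name string, job_id int, department string, salary string, country string)
clustered by (job_id) into 4 buckets
row format delimited fields terminated by ',';
insert overwrite table sample select * from empl;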
3.3.6 HBASE
HBASE INTRODUCTION
HBase is a distributed, column-oriented database built on top of the Hadoop file
system. It provides random, real-time read/write access to data stored in HDFS.
3.3.7 INSTALLATION
start-all.sh
----copy hbase-0.98.12-hadoop1-bin.tar.gz to master1 using WinSCP
tar -xzf hbase-0.98.12-hadoop1-bin.tar.gz
mv hbase-0.98.12-hadoop1 hbase
now we have to set the java path in the HBase configuration
cd hbase/conf
nano hbase-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_65
save the file
cd
mkdir /home/hadoop/hbase_data
mkdir /home/hadoop/zookeeper
cd hbase/conf
nano hbase-site.xml
//Here you have to set the path where you want HBase to store its files.
<property>
<name>hbase.rootdir</name>
<value>file:/home/hadoop/hbase_data</value>
</property>
//Here you have to set the path where you want HBase to store its built
in zookeeper files.
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
cd /home/hadoop/hbase/bin
./start-hbase.sh
after it starts
./hbase shell
slaves.sh /usr/java/jdk1.7.0_65/bin/jps
****You will see a new process as HMaster
Run all commands from the hbase shell
list
scan 'emp'
delete 'emp', 'row1', 'en:alias', 1470371756760
scan 'emp'
describe 'emp'
----create a table
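create 'mobile', 'price' ----the original create command is not shown; the column family name is illustrative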
list
describe 'mobile'
----to drop a table, it must first be disabled
disable 'mobile'
drop 'mobile'
list
version
whoami
----To create a table with multi columns
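create 'stud', 'sn', 'dept' ----the original command is not shown; 'sn' appears in the delete command below, 'dept' is illustrative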
describe 'stud'
scan 'stud'
describe 'stud'
count 'emp'
delete 'stud','r2','sn'
get 'emp', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
----To display all the rows for the column member first_name
----To display all the rows for the column member first_name, last_name
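----a sketch of the missing scan commands (the column family name 'personal' is an assumption):
scan 'emp', {COLUMNS => 'personal:first_name'}
scan 'emp', {COLUMNS => ['personal:first_name', 'personal:last_name']}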
ZOOKEEPER FRAMEWORK
ZooKeeper is a distributed co-ordination service to manage a large set of hosts. Co-
ordinating and managing a service in a distributed environment is a complicated
process. ZooKeeper solves this issue with its simple architecture and API.
ZooKeeper allows developers to focus on core application logic without worrying
about the distributed nature of the application.
The ZooKeeper framework was originally built at “Yahoo!” for accessing their
applications in an easy and robust manner. Later, Apache ZooKeeper became a
standard for organized service used by Hadoop, HBase, and other distributed
frameworks. For example, Apache HBase uses ZooKeeper to track the status of
distributed data. This section explains the basics of ZooKeeper and how it can be
installed and deployed in a distributed environment.
Cluster management - Joining / leaving of a node in a cluster and node status at real
time.
Locking and synchronization service - Locking the data while modifying it. This
mechanism helps you in automatic fail recovery while connecting other distributed
applications like Apache HBase.
Highly reliable data registry - Availability of data even when one or a few nodes
are down.
Distributed applications offer a lot of benefits, but they pose a few complex and
hard-to-crack challenges as well. The ZooKeeper framework provides a complete
mechanism to overcome these challenges. Race conditions and deadlocks are
handled using a fail-safe synchronization approach. Another main drawback is
inconsistency of data, which ZooKeeper resolves with atomicity.
Benefits of ZooKeeper
1 Simple distributed coordination process
2 Synchronization
3 Ordered messages
4 Serialization
5 Reliability
6 Atomicity
3.4 CONCLUSION
Big data with predictive analytics, high performance computing systems, machine
learning, and other strategies have been used in the past and will continue to be used
heavily in computational physics. By using these big data-related systems, engineers
and scientists are able to more easily design cars, airplanes, and other vehicles. They
have also been used to accurately predict daily weather as well as natural disasters.
Big data analytics has affected the field of computational physics almost since
computational physics was created. Computational physics with big data will continue
to improve the quality of everyday life even though there are challenges to overcome.
With the advent of Hadoop 2.0, the new release of Hadoop with Yet Another Resource
Negotiator (YARN), the beyond-MapReduce (MR) thinking has solidified. As is
explained in this chapter, Hadoop YARN separates the resource scheduling from the
MR paradigm. It should be noted that in Hadoop 1.0, the first-generation Hadoop,
resource scheduling was tied to the MR paradigm, implying that the only processing
that could be carried out on Hadoop Distributed File System (HDFS) data was the MR
type or its orchestrations.
CHAPTER-4
CONCLUSION
Embedded systems are found in every field of both engineering and science. To meet
the demands of these applications, the designer faces a lot of challenges in terms of
processor selection, IDE selection, and different I/O components.
These systems not only provide a mechanism for selection but also allow
designers to compare different components based on their applicability. This system
also allows designers to study the various existing embedded systems; their
characteristics and design issues, and their applications. Thus, this system acts as a
pre-design tool for embedded system designers, where planning of design and
development strategies can be done easily and efficiently.
Embedded Systems are used in a wide range of applications, and it is the task
of a designer to select a suitable processor from the vast list of processors, ranging
from 4-bit to 64-bit with various architectures. Embedded system performance is
mostly dependent on the type of processor being used. Each processor is characterized
by a set of parameters, and there are almost infinite alternatives available to a
designer; selecting the right or suitable processor is a multidimensional search
problem. Our system is efficient because we have provided the weights and percentage
of accuracy to the designer, to specify the requirements and application characteristics,
which are considered in the selection. These can be altered as per the specific needs of
the project. It has user friendly GUI, through which the designer can alter the
specifications, and specify the new requirements for selection of these components for
a given application.
Today's embedded systems developers play a vital role in selecting the right
tool for development, because there are a large number of IDEs available in
the market, with features ranging from simple tool chains to complex tool
chains. In this work we have presented the common tool chain used for embedded
systems development, and the selection criteria and evaluation criteria of IDEs.
Finally, we have presented the performance metrics of the selected IDEs with four
different applications. These results were achieved with commercially available
IDEs that are widely available in the market.
Over the years, Arduino has gone on to become a huge success and a
common name among students. With Google deploying it, people’s imagination has
gone to a much higher level than before. A developer at the annual Google I/O
conference said, “when Arduino and Android come together, this really proves
‘INFINITY EXISTS’ in the future”. I think a study of Arduino and practical
experiments on Arduino should be added to UG engineering courses, to help
students leverage their talents and imagination.
72
CHAPTER-5
REFERENCES
• http://en.wikipedia.org/wiki/Arduino (Wikipedia)