
CHAPTER 6

MODULES

6.1 MODULE DESCRIPTION

1. Data Skew and its problem

2. Web Crawling using Hadoop

3. Data Analysis by Sampling and Data Skew Mitigation

4. Query Processing using MapReduce and Block Chain

Data Skew and its problem:

The imbalance in the amount of data assigned to each task causes some tasks to take much longer to finish than others and can significantly impact performance. This work presents LIBRA, a lightweight strategy to solve the data skew problem for reduce-side applications in MapReduce. In traditional Hadoop clusters the data is partitioned statically, so that each slave node is allocated an equal share of the data, which can lead to the data skew problem. Because the slave nodes are not all equally capable and the complexity of the data is not uniform, some slave nodes may take much longer to process their assigned tasks than the other slave nodes in the cluster. The master node waits for all map tasks to complete, so in this scenario it is held up by the long-running task on the slow slave node. This time lag should be eliminated so that the master can produce the overall output sooner.
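A minimal sketch (in Java, using the standard Hadoop MapReduce API; the class name and key types are illustrative, not the project's code) of why skew arises with the default partitioning: every record with the same key hashes to the same reduce task, so a single hot key can overload one reducer while others sit idle.

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Mirrors Hadoop's default HashPartitioner behaviour: all records
    // sharing a key land in the same partition, so a "hot" key sends its
    // entire volume to one reduce task and creates the long-running task
    // described above.
    public class HashLikePartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numReduceTasks) {
            // Same formula as the built-in HashPartitioner.
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }
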
Web Crawling using Hadoop:

Web crawling is the fetching of data resources residing on any web server, with or without the knowledge of the resource provider. Generally, the resource servers incur high traffic and load while crawling is in progress, and crawling performed through a single server can be detected by traditional DoS (Denial of Service) detection schemes. We therefore implement an efficient way to crawl resources residing on any server through our Hadoop cluster, in which case the traditional DoS detection methods could fail to detect the activity. All crawled resources are stored for future use. The crawling jobs include PDF web crawling, with results stored in PDF format, and medical question-and-answer web crawling, with results stored in CSV (Comma Separated Values) format.
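A minimal sketch (Java, standard Hadoop MapReduce and java.net classes; the class name and input layout are illustrative assumptions, not the project's exact crawler) of a map-only crawl where each input line holds one URL, so the fetch load is spread across the slave nodes of the cluster.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Each map task reads URLs from its input split, downloads the
    // resource, and emits (url, content); distributing the splits over
    // many slave nodes distributes the request traffic.
    public class CrawlMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String url = line.toString().trim();
            StringBuilder body = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new URL(url).openStream()))) {
                String row;
                while ((row = reader.readLine()) != null) {
                    body.append(row).append('\n');
                }
            }
            context.write(new Text(url), new Text(body.toString()));
        }
    }
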

Data Analysis by Sampling and Data Skew Mitigation:

Since data skew is difficult to solve when the input distribution is unknown, a natural approach is to examine the data before deciding the partition. There are two common ways to do this. One approach is to launch pre-run jobs which examine the data, collect the distribution statistics, and then decide an appropriate partition. The drawback of this approach is that the real job cannot start until those pre-run jobs finish. The Hadoop range partitioner belongs to this category; it can increase the job execution time significantly. The other approach is to integrate the sampling into the normal map process and generate the distribution statistics after all map tasks finish. Since reduce tasks cannot start until the partition decision is made, this approach cannot take advantage of parallel processing between the map and the reduce phases. We take a different approach: the data initially assigned according to the capacity of each machine is dynamically split, based on its complexity and the job execution rate, and reassigned to other nodes. We show that the slave nodes complete their jobs in an optimal manner.
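A minimal sketch (Java; the sample size and class name are illustrative assumptions) of collecting key-distribution statistics inside the normal map process with reservoir sampling, so that a balanced partition can be estimated without launching a separate pre-run job.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    // Keeps a fixed-size uniform sample of the keys seen by one map task.
    // Samples from all map tasks can be merged afterwards to estimate the
    // key distribution and decide a balanced range partition.
    public class KeyReservoirSampler {
        private final int capacity;
        private final List<String> sample = new ArrayList<>();
        private final Random random = new Random();
        private long seen = 0;

        public KeyReservoirSampler(int capacity) {
            this.capacity = capacity;
        }

        public void observe(String key) {
            seen++;
            if (sample.size() < capacity) {
                sample.add(key);
            } else {
                // Replace an existing entry with probability capacity/seen,
                // keeping the sample uniform over all keys observed so far.
                long slot = (long) (random.nextDouble() * seen);
                if (slot < capacity) {
                    sample.set((int) slot, key);
                }
            }
        }

        public List<String> getSample() {
            return sample;
        }
    }
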

Query Processing using MapReduce and Block Chain:

Query processing can be done using the MapReduce framework of the Hadoop system, and the results are rendered to the client page. Multiple reduce tasks run over the map output to produce the reduced object, and the results can be rendered as and when the user needs them. The response time is calculated for querying with MapReduce. We also implement the same querying with our own approach, the Block Chain Algorithm, which stores the map task output locally on the slave machine and uses cache-based rendering of results. Its response time is compared with the MapReduce response time and is found to be on par. This approach can show more effectiveness when used with large data on a small cluster setup.
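A minimal sketch (Java; the class name, cache size, and eviction policy are illustrative assumptions, not the project's exact Block Chain Algorithm) of cache-based result rendering, where repeated queries are answered from locally stored output instead of re-running the MapReduce path.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Serves repeated queries from an in-memory, least-recently-used cache
    // of previously computed results; only cache misses fall through to the
    // slower MapReduce path, which lowers the observed response time.
    public class QueryResultCache {
        private static final int MAX_ENTRIES = 1000;

        private final Map<String, String> cache =
            new LinkedHashMap<String, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    return size() > MAX_ENTRIES; // evict the least-recently-used entry
                }
            };

        public String lookup(String query) {
            return cache.get(query); // null indicates a cache miss
        }

        public void store(String query, String result) {
            cache.put(query, result);
        }
    }
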
