Table of Contents
1 Introduction
2 Problem Statement
3 ER Diagram and Tables
4 Application
5 Snapshot of the application
6 Contribution of individual members of the group in the project
Introduction
Social networking is the grouping of individuals into specific groups, like small rural communities or a neighborhood subdivision. Although social networking is possible in person, especially in workplaces, universities, and high schools, it is most popular online. Unlike most high schools, colleges, or workplaces, the internet is filled with millions of individuals who are looking to meet other people and to share first-hand information and experiences about golfing, gardening, aesthetics and cosmetic surgery, developing friendships or professional alliances, finding employment, business-to-business marketing, and even the end of the Mayan calendar and the Great Shift. The topics and interests are as varied and rich as our society and human history.
Online social networking is commonly done through websites known as social sites. These websites function like an online community of internet users. Depending on the website in question, many of the community members share common interests in hobbies, religion, or politics. Once you are granted access to a social networking website, you can begin to socialize. This may include reading the profile pages of other members and possibly contacting them.
Just as a coin has two sides, social networking also carries dangers, including data theft and viruses, which are on the rise. The most prevalent danger, though, often involves online predators or individuals who claim to be someone they are not. Although danger does exist with networking online, it also exists with networking in the real world.
Our project is an online social networking website that uses a distributed database at the back end. Because social networking sites store data in huge quantities, using a single server to host the database is not practical. Hence, we illustrate how a distributed database is maintained and how data is fetched as required from multiple servers.
This approach applies to any project that requires the maintenance of a large database (such as social networking, employee databases, or citizen databases on government sites).
Background
Hardware / Equipment used: 4 PCs (1 master and 3 slaves, located in the MSCLIS 1st year lab)
Framework / Architecture used: Apache Hadoop
Apache Hadoop is an open-source software framework that supports data-intensive distributed applications. The Hadoop framework transparently provides both reliability and data motion to applications. Hadoop implements a computational paradigm named MapReduce, in which the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both MapReduce and the distributed file system are designed so that node failures are handled automatically by the framework.
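To make the map/reduce idea concrete, the following is a minimal sketch in plain Python rather than Hadoop code. It counts posts per user in a single process. The record layout and the names `map_post`, `reduce_counts`, and `run_job` are assumptions made for illustration and are not part of the project.

```python
from collections import defaultdict

def map_post(record):
    """Map step: emit (user_id, 1) for one post record."""
    user_id, _text = record
    yield user_id, 1

def reduce_counts(user_id, values):
    """Reduce step: sum all counts emitted for one user."""
    return user_id, sum(values)

def run_job(records):
    """Run map, shuffle (group by key), then reduce, all in one process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_post(record):
            groups[key].append(value)
    return dict(reduce_counts(k, v) for k, v in groups.items())

# Hypothetical sample data: (user_id, post_text)
posts = [("u1", "hello"), ("u2", "hi"), ("u1", "again")]
post_counts = run_job(posts)
```

In a real cluster the map calls and the reduce calls would run on different nodes, and a failed fragment would simply be re-executed elsewhere; the sketch only shows how the work is split into independent pieces.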
A Hadoop instance typically has a single NameNode, and a cluster of DataNodes forms the HDFS cluster. The file system uses the TCP/IP layer for communication, and clients use remote procedure calls (RPC) to communicate with it. A small Hadoop cluster includes a single master and multiple worker nodes. The master node runs a JobTracker, TaskTracker, NameNode, and DataNode. A slave or worker node acts as both a DataNode and a TaskTracker, though it is possible to have data-only and compute-only worker nodes.
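The division of labour between the NameNode (metadata) and the DataNodes (block storage) can be sketched with a toy model. This is an illustrative assumption, not HDFS code: the class `ToyNameNode`, the round-robin placement, the replication factor of 2, and the node and file names are all hypothetical, and real HDFS uses a more elaborate placement policy.

```python
import itertools

class ToyNameNode:
    """Toy stand-in for NameNode metadata: which DataNodes hold each block."""

    def __init__(self, datanodes, replication=2):
        if replication > len(datanodes):
            raise ValueError("replication cannot exceed the number of DataNodes")
        self.datanodes = list(datanodes)
        self.replication = replication
        self.block_map = {}  # (filename, block_index) -> list of DataNode names
        self._next = itertools.cycle(range(len(self.datanodes)))

    def add_file(self, filename, num_blocks):
        """Assign each block to `replication` distinct DataNodes, round-robin."""
        for index in range(num_blocks):
            start = next(self._next)
            holders = [self.datanodes[(start + i) % len(self.datanodes)]
                       for i in range(self.replication)]
            self.block_map[(filename, index)] = holders

    def locate(self, filename, index, failed=()):
        """Return the DataNodes that still hold the block, skipping failed ones."""
        return [n for n in self.block_map[(filename, index)] if n not in failed]

# Hypothetical cluster matching the 3-slave setup described above
namenode = ToyNameNode(["slave1", "slave2", "slave3"], replication=2)
namenode.add_file("profiles.dat", num_blocks=3)
```

Because every block has two holders in this sketch, `locate` still returns a live node for each block when any single slave is marked as failed.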
Problem Statement
Data-intensive applications such as social networking sites and government databases come under tremendous load and often fail to satisfy requests (e.g. IRCTC, MySpace). They quickly run out of resources such as storage space, memory, and CPU cycles when requests arrive from users worldwide.
Our Approach during the Project: We wished to illustrate how distributed database frameworks can be used to serve such data-intensive needs. We used the Hadoop framework, which is deployed by the likes of Facebook and Yahoo.
Design of our Project: We already had the Hadoop framework to work with. We designed database schemas for it and then designed a web interface for our SamPra networking website. We configured the master and slaves to serve data requests and brought it all together.
Slave 1 <--> Master <--> Slave 2 <--> Slave 3
Configure Hadoop Master / Slaves
Design Database Schema for SamPra
Populate the Database in the Hadoop Storage
SamPra Website Front-end
Tables
Table Name: Profile

Column Family: ID
Attribute         Type     Size  Description
First_Name        Varchar  20    The First Name of Person
Last_Name         Varchar  20    The Last Name of Person
Email             Varchar  25    Email of Person
Password          Varchar  20    Hashes of Password Selected
Phone_No          Int      10    The Mobile Number
Gender            Char     2     The Gender of Person
Birthday          Date     -     The Person's Birthday
Address           Varchar  40    The Person's Residence
About_Me          Varchar  80    Person in his own words
IDP               Int      6     Unique Id of Person

Column Family: Work
Attribute         Type     Size  Description
Total_Companies   Int      3     Companies Involved
Total_Experience  Int      2     Years of Experience
CName             Varchar  20    Name of Company
Address           Varchar  80    Address of Company
Designation       Varchar  40    Designation of Person
Work_Months       Int      2     No. of Work Months
Column Family: Education
Attribute         Type     Size  Description
School_Name       Varchar  20    Name of School Attended
Percentage_10     Int      2     Percentage in 10th
JCollege_Name     Varchar  20    Name of Junior College Attended
Percentage_12     Int      2
College_Name      Varchar  20
Percentage_Grad   Int      2
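The Profile table groups its attributes into column families (ID, Work, Education). A minimal sketch of how one such row could be stored and read is given below, using nested Python dictionaries as a stand-in for a column-family store. The helpers `put` and `get`, the row key, and all sample values are hypothetical and are not the project's actual code or data.

```python
def put(table, row_key, family, qualifier, value):
    """Store one cell under row_key -> column family -> column."""
    table.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

def get(table, row_key, family, qualifier, default=None):
    """Fetch one cell, or `default` if the row, family, or column is absent."""
    return table.get(row_key, {}).get(family, {}).get(qualifier, default)

profile = {}  # stands in for the Profile table

# Hypothetical row; the key plays the role of the IDP attribute
put(profile, "100001", "ID", "First_Name", "Asha")
put(profile, "100001", "ID", "Last_Name", "Rao")
put(profile, "100001", "Work", "CName", "Example Corp")
put(profile, "100001", "Work", "Work_Months", 18)

company = get(profile, "100001", "Work", "CName")
school = get(profile, "100001", "Education", "School_Name")  # never stored
```

A row does not need a value for every column: the Education family is simply empty for this row, which is one reason a column-family layout suits profiles that users fill in gradually.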
Size  Description
30    List of Friends
3     No. of Requests
3     Requests Pending
30    UID of Blocked People

Size  Description
2     Total Posts made
2     Status of Post
20    Fname of Person
3     No. of Likes on Post

Size  Description
6     Unique Id of Comment
6     Id of Sender
6     Id of Receiver
100   The Actual Comment
-     The Date of Comment
3     The no. of Likes on Comment

Size  Description
3     The total count of Messages
100   The Actual Message
6     The UID of Sender
1     Read or Not
Management of distributed data with different levels of transparency, such as network transparency, fragmentation transparency, and replication transparency.
Increased reliability and availability.
Easier expansion.
Reflects organizational structure: database fragments are located in the departments they relate to.
Protection of valuable data: if there were ever a catastrophic event such as a fire, all of the data would not be in one place but distributed across multiple locations.
Improved performance: data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing load on the databases to be balanced among servers. (A high load on one module of the database won't affect other modules of the database in a distributed database.)
Modularity: systems can be modified, added, and removed from the distributed database without affecting other modules (systems).
Reliable transactions, due to replication of the database.
Distributed query processing can improve performance.
Distributed transaction management.
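Fragmentation and replication transparency can be illustrated with a short sketch: the caller asks for a user's data by id and never needs to know which server holds it or whether one copy is offline. The server names, the choice of two copies, and the functions `placement` and `read_from` are assumptions for illustration; this is not how Hadoop itself places data.

```python
import hashlib

SERVERS = ["slave1", "slave2", "slave3"]  # assumed node names

def placement(user_id, servers=SERVERS, copies=2):
    """Pick `copies` distinct servers for a user's fragment using a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    first = int.from_bytes(digest[:4], "big") % len(servers)
    return [servers[(first + i) % len(servers)] for i in range(copies)]

def read_from(user_id, down=frozenset()):
    """Return the first live replica for a user, hiding location from the caller."""
    for server in placement(user_id):
        if server not in down:
            return server
    raise RuntimeError(f"all replicas for {user_id} are unavailable")

replicas = placement("100001")            # hypothetical user id
primary = read_from("100001")
fallback = read_from("100001", down={replicas[0]})
```

Here `fallback` is the second replica, so a read still succeeds when the first server is down, which is the availability benefit listed above.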
The Login Page (You can Login here or follow link to Register)
The Register Page (where first time users make a new profile)
Role 2: Populated the database tables with initial entries
Role 3: Designed and developed the website front-end using HTML, CSS, and JSP
Member 2: Sameer Patil
Distributed Computing Framework Deployment
Role 1: Configured the Apache Hadoop framework
The distributed computing framework Hadoop was deployed on 1 master and 3 slaves.
The master would have the following services running: HQuorumPeer, HMaster, NameNode, JobTracker
The slaves would have the following services running: RegionServer, DataNode, TaskTracker
Role 2: Coding Modules
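One way to confirm that the listed services are up is to compare them against a process listing taken on each machine. The sketch below assumes the listing comes from the JDK's `jps` tool in its default "pid name" format and that the region server appears there as `HRegionServer`; both are assumptions, and the sample output and the function `missing_services` are hypothetical.

```python
MASTER_SERVICES = {"HQuorumPeer", "HMaster", "NameNode", "JobTracker"}
# Assumed process names as they might appear in a jps listing
SLAVE_SERVICES = {"HRegionServer", "DataNode", "TaskTracker"}

def missing_services(jps_output, expected):
    """Return the expected service names that do not appear in the listing."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2:  # "<pid> <name>"
            running.add(parts[1])
    return sorted(expected - running)

# Hypothetical listing captured on the master
sample = """4321 NameNode
4388 JobTracker
4502 HMaster
4620 Jps
"""
not_running = missing_services(sample, MASTER_SERVICES)
```

With this sample, `not_running` contains only `HQuorumPeer`, which would point to the ZooKeeper quorum process not having started on the master.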