
A SYSTEM TO FILTER UNWANTED MESSAGES FROM OSN USER WALLS

INTRODUCTION: Online Social Networks (OSNs) are today one of the most popular interactive media for communicating, sharing, and disseminating a considerable amount of information about human life. Daily and continuous communication implies the exchange of several types of content, including free text, image, audio, and video data. In OSNs, information filtering can also be used for a different, more sensitive purpose. This is because OSNs allow users to post, or comment on other posts, in particular public/private areas, generally called walls. Information filtering can therefore be used to give users the ability to automatically control the messages written on their own walls by filtering out unwanted messages.

SCOPE OF THE PROJECT: The OSN has three layers: the graphical user interface, the social network application, and the social network manager. The social network manager handles basic functionalities such as profile management and network-based functions. This project focuses on the other two layers and applies some new conditions to them. The application layer contains a short text classifier and a content-based message filter. The short text classifier classifies messages based on their content. The content-based message filter contains the blacklist and the filtering policies. The system first determines the relationship between the user and the message sender, then filters the message and calculates the probabilities using the classifier. The probability result is then sent to the user. The proposed system thus gives users direct control over what kinds of messages are displayed on their walls.

LITERATURE SURVEY:

Title: BoosTexter: A Boosting-based System for Text Categorization
Author: Robert E. Schapire, Yoram Singer Year: 2000 Description: We adopt a different approach in which we use two extensions of AdaBoost that were specifically intended for multiclass, multi-label data. In the first extension, the goal of the learning algorithm is to predict all and only all of the correct labels. Thus, the learned classifier is evaluated in terms of its ability to predict a good approximation of the set of labels associated with a given document. In the second extension, the goal is to design a classifier that ranks the labels so that the correct labels will receive the highest ranks. We next describe BoosTexter, a system which embodies four versions of boosting based on these extensions, and we discuss the implementation issues that arise in multi-label text categorization.

Title: A Comparison of Classifiers and Document Representations for the Routing Problem
Author: H. Schutze, D.A. Hull, J.O. Pedersen Year: 1995 Description: We compare two approaches to document routing: relevance feedback via query expansion and statistical classification with error minimization. We show that advanced classification algorithms perform 10-15% better than relevance feedback on the Tipster document collection. Since learning algorithms based on error minimization and numerical optimization are computationally intensive and prone to overfitting in a high-dimensional feature space, it is necessary to apply some method of dimensionality reduction. We examine two different approaches: latent semantic indexing and feature selection of terms using a χ² test of non-independence.

Title: Content-Based Book Recommending Using Learning for Text Categorization
Author: Raymond J. Mooney, Loriene Roy Year: 1999 Description: Recommender systems improve access to relevant products and information by making personalized suggestions based on previous examples of a user's likes and dislikes. Most existing recommender systems use social filtering methods that base recommendations on other users' preferences. By contrast, content-based methods use information about an item itself to make suggestions. This approach has the advantage of being able to recommend previously unrated items to users with unique interests and to provide explanations for its recommendations. We describe a content-based book recommending system that utilizes information extraction and a machine-learning algorithm for text categorization. Initial experimental results demonstrate that this approach can produce accurate recommendations. These experiments are based on ratings from random samplings of items, and we discuss problems with previous experiments that employ skewed samples of user-selected examples to evaluate performance.

Title: Content-based Filtering in On-line Social Networks
Author: M. Vanetti, E. Binaghi, B. Carminati, M. Carullo and E. Ferrari Year: 2010 Description: This work is the first step of a wider project. The early encouraging results we have obtained on the classification procedure prompt us to continue with other work that will aim to improve the quality of classification. Additionally, we plan to enhance our filtering rule system with a more sophisticated approach to manage those messages caught just for the tolerance and to decide when a user should be inserted into a BL. For instance, the system can automatically take a decision about the messages blocked because of the tolerance, on the basis of some statistical data (e.g., number of blocked messages from the same author, number of times the creator has been inserted in the BL) as well as data on the creator profile (e.g., relationships with the wall owner, age, sex). Further, we plan to test the robustness of our system against different adversary models. The development of a GUI to make BL and filtering rule specification easier is also a direction we plan to investigate.

Title: Inductive Learning Algorithms and Representations for Text Categorization
Author: Susan Dumais, John Platt, David Heckerman, Mehran Sahami Year: 1998 Description: Here, we describe results from experiments using a collection of hand-tagged financial newswire stories from Reuters. We use supervised learning methods to build our classifiers, and evaluate the resulting models on new test cases. The focus of our work has been on comparing the effectiveness of different inductive learning algorithms (Find Similar, Naive Bayes, Bayesian Networks, Decision Trees, and Support Vector Machines) in terms of learning speed, real-time classification speed, and classification accuracy. We also explored alternative document representations (words vs. syntactic phrases, and binary vs. non-binary features), and training set size.

Title: Learning and Revising User Profiles: The Identification of Interesting Web Sites
Author: M.J. Pazzani and D. Billsus Year: 1997 Description: We discuss algorithms for learning and revising user profiles that can determine which World Wide Web sites on a given topic would be interesting to a user. We describe the use of a naive Bayesian classifier for this task, and demonstrate that it can incrementally learn profiles from user feedback on the interestingness of Web sites. Furthermore, the Bayesian classifier may easily be extended to revise user provided profiles. In an experimental evaluation we compare the Bayesian classifier to computationally more intensive alternatives, and show that it performs at least as well as these approaches throughout a range of different domains. In addition, we empirically analyze the effects of providing the classifier with background knowledge in form of user defined profiles and examine the use of lexical knowledge for feature selection. We find that both approaches can substantially increase the prediction accuracy.

Title: Towards the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions
Author: Gediminas Adomavicius and Alexander Tuzhilin Year: 2005 Description: The paper presents an overview of the field of recommender systems and describes the current generation of recommendation methods that are usually classified into the following three main categories: content-based, collaborative, and hybrid recommendation approaches. The paper also describes various limitations of current recommendation methods and discusses possible extensions that can improve recommendation capabilities and make recommender systems applicable to an even broader range of applications. These extensions include, among others, improvement of understanding of users and items, incorporation of the contextual information into the recommendation process, support for multi-criteria ratings, and provision of more flexible and less intrusive types of recommendations.

Title: Text Categorization with Support Vector Machines: Learning with Many Relevant Features
Author: T. Joachims Year: 1998 Description: This paper introduces support vector machines (SVMs) for text categorization. It provides both theoretical and empirical evidence that SVMs are very well suited for text categorization. The theoretical analysis concludes that SVMs acknowledge the particular properties of text. The experimental results show that SVMs consistently achieve good performance on text categorization tasks, outperforming existing methods substantially and significantly. With their ability to generalize well in high-dimensional feature spaces, SVMs eliminate the need for feature selection, making the application of text categorization considerably easier. Another advantage of SVMs over the conventional methods is their robustness.

Modules:
Authentication
Profile Generation
Accept Friends Request
Send Request
Share Photos
Post Comment or Text
Block Unwanted Comments

Module Description and Module Diagram:

Authentication (Login/Registration)

[Module diagram: User, Database, Allow to Access Page]

In this module, the user registers personal details in the database in order to be authenticated. The details are submitted to the admin, who carries out the registration process. After registration is completed, the user receives authentication permission and can log in to the website with a username and password. If the user enters a valid username/password combination, access to the data is granted. If the user enters an invalid username or password, the user is treated as unauthorized and access is denied.

Profile Generation

[Module diagram: User, Create Profile, Database]

In this module, the user creates a profile, and its details are stored in the database. The profile contains the name, contact number, email address, photos, and other information. Logged-in users can see their details and edit them if they wish to change any information.

Accept Friends Request

[Module diagram: User, New Friends, Database, View Friends]

In this module, the user adds new friends and views existing friends and their details. Logged-in users can see their friend list and add friends if they wish.

Send Request

[Module diagram: User, Select Friend, Database, Send Request]

In this module, the user selects a friend and sends a request. Logged-in users can view incoming friend requests and accept them.

Share Photos

[Module diagram: User, Add New Photos, Publish, View Photos, Database]

In this module, the user retrieves photos from the database, adds new photos, and publishes them on their wall.

Post Comments

[Module diagram: Share Photo, Known or Unknown Person, Post Comments]

In this module, the user posts a photo on the public wall, and anyone can post comments on that photo. Unknown persons can also post comments, but nothing is known about the character of a commenter who is outside the user's group.

Block the Unwanted Comments

[Module diagram: Post Comments, Calculate Probability based on content of comments, Display the probability result on user private wall, Database]

In this module, the system calculates the probability of the message contents. The result is displayed on the user's private wall with two options: accept and reject.

Input Output Design:

Authentication (Login/Registration)
Input: The user registers personal details and provides a username and password to log in.
Output: The user is granted access to the data.

Profile
Input: The user enters information such as email ID, contact number, and other details.
Output: The profile is created and the information is stored in the database.

Friends
Input: The user selects a friend to add.
Output: The friend is added to the user's profile.

Send Request
Input: The user selects a friend to send a request to.
Output: The request is sent.

Post Photos
Input: The user posts photos on their wall.
Output: The photos are displayed.

Post Comments
Input: Known or unknown persons send a comment.
Output: The comment is displayed.

Block the Unwanted Comments
Input: The comments are compared with the data in the database and the probability is calculated.
Output: The probability result is displayed on the user's private wall, so the user has direct control over the comment.
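The accept/reject step of the Block the Unwanted Comments module can be illustrated with a minimal Python sketch. The class names (PendingComment, PrivateWall), the method names, and the probability values are hypothetical and chosen only for illustration; the project itself targets ASP.NET and C#, and the probability is assumed to come from the classifier.

```python
from dataclasses import dataclass


@dataclass
class PendingComment:
    author: str
    text: str
    probability: float  # assumed classifier score that the comment is unwanted
    status: str = "pending"


class PrivateWall:
    """Holds scored comments until the wall owner accepts or rejects them."""

    def __init__(self):
        self.pending = []      # comments waiting for the owner's decision
        self.public_wall = []  # comments the owner has accepted

    def submit(self, author, text, probability):
        # Store the comment together with its probability result.
        comment = PendingComment(author, text, probability)
        self.pending.append(comment)
        return comment

    def decide(self, comment, accept):
        # The owner has direct control: accept publishes, reject discards.
        self.pending.remove(comment)
        comment.status = "accepted" if accept else "rejected"
        if accept:
            self.public_wall.append(comment)
        return comment.status


wall = PrivateWall()
friendly = wall.submit("alice", "Nice photo!", 0.05)
unwanted = wall.submit("stranger", "some unwanted text", 0.92)
wall.decide(friendly, accept=True)
wall.decide(unwanted, accept=False)
```

In this sketch the probability only informs the owner; the decision itself is always taken by the owner, which matches the direct-control idea of the module.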

TECHNIQUE USED OR ALGORITHM USED

Machine Learning (ML) Text Categorization Techniques

Step 1: SHORT TEXT CLASSIFIER

i) Text Representation
We consider three types of features: Bag of Words (BoW), Document properties (Dp), and Contextual Features (CF). The first two are endogenous, that is, they are entirely derived from the information contained within the text of the message. Text representation using endogenous knowledge has good general applicability; however, in operational settings it is legitimate to also use exogenous knowledge, i.e., any source of information outside the message body but directly or indirectly related to the message itself.
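As an illustration of the endogenous features, the following Python sketch builds a small feature vector from a message. The vocabulary and the three document properties used here (ratio of all-capital words, number of exclamation marks, number of question marks) are assumptions made for the example; the text above does not list the exact properties used by the system.

```python
import re
from collections import Counter


def tokenize(message):
    # Lowercase the message and keep alphanumeric tokens.
    return re.findall(r"[a-z0-9']+", message.lower())


def bag_of_words(message, vocabulary):
    # One term count per vocabulary entry (BoW).
    counts = Counter(tokenize(message))
    return [counts[term] for term in vocabulary]


def document_properties(message):
    # Hypothetical Dp features: capital-word ratio, '!' count, '?' count.
    words = message.split()
    n_words = max(len(words), 1)
    capital_ratio = sum(1 for w in words if w.isupper() and len(w) > 1) / n_words
    return [capital_ratio, message.count("!"), message.count("?")]


def represent(message, vocabulary):
    # Concatenate BoW and Dp into a single feature vector.
    return bag_of_words(message, vocabulary) + document_properties(message)


vocabulary = ["free", "hate", "party", "vote"]  # illustrative vocabulary
features = represent("VOTE now!! Free party tonight?", vocabulary)
```

Contextual features are left out of the sketch because they depend on information outside the message body.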

ii) Machine Learning-Based Classification
We address short text categorization as a hierarchical two-level classification process. The first-level classifier performs a binary hard categorization that labels messages as Neutral or Nonneutral. The first-level filtering task facilitates the subsequent second-level task, in which a finer-grained classification is performed. The second-level classifier performs a soft partition of Nonneutral messages, assigning a given message a gradual membership to each of the Nonneutral classes.
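The two-level process can be sketched in Python as follows. The two classifiers are passed in as plain functions; the toy stand-ins, the 0.5 threshold, and the class names (taken from the examples of unwanted content mentioned elsewhere in this report) are assumptions for illustration only and are not trained models.

```python
NONNEUTRAL_CLASSES = ["Vulgar", "Politics", "Violence"]  # illustrative class names


def classify_message(scores, first_level, second_level, classes):
    """Hard Neutral/Nonneutral decision first, then soft class memberships."""
    if first_level(scores) == "Neutral":
        return "Neutral", {}
    memberships = second_level(scores)
    return "Nonneutral", dict(zip(classes, memberships))


def toy_first_level(scores):
    # Stand-in for the binary hard classifier (threshold is an assumption).
    return "Nonneutral" if max(scores) >= 0.5 else "Neutral"


def toy_second_level(scores):
    # Stand-in for the soft classifier: one membership in [0, 1] per class.
    return [min(max(s, 0.0), 1.0) for s in scores]


neutral_result = classify_message(
    [0.1, 0.2, 0.0], toy_first_level, toy_second_level, NONNEUTRAL_CLASSES)
nonneutral_result = classify_message(
    [0.1, 0.9, 0.3], toy_first_level, toy_second_level, NONNEUTRAL_CLASSES)
```

Because the second level is a soft partition, the memberships are independent degrees and are not required to sum to one.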

iii) Radial Basis Function Networks (RBFN)
RBFNs have a single hidden layer of processing units with a local, restricted activation domain: a Gaussian function is commonly used, but any other locally tunable function can be used. They were introduced as a neural network evolution of exact interpolation, and have been demonstrated to have the universal approximation property.
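A forward pass through such a network can be sketched in a few lines of Python. The sketch assumes Gaussian hidden units and a linear output layer, which is a common formulation; the centers, widths, weights, and biases below are made-up values, not parameters of the actual system.

```python
import math


def gaussian_rbf(x, center, width):
    # Local activation: 1 at the center, decaying with squared distance.
    squared_distance = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-squared_distance / (2 * width ** 2))


def rbfn_forward(x, centers, widths, weights, biases):
    # Single hidden layer of Gaussian units followed by a linear output layer.
    hidden = [gaussian_rbf(x, c, w) for c, w in zip(centers, widths)]
    return [
        bias + sum(w_j * h_j for w_j, h_j in zip(row, hidden))
        for row, bias in zip(weights, biases)
    ]


centers = [[0.0, 0.0], [1.0, 0.0]]  # one center per hidden unit (illustrative)
widths = [1.0, 1.0]
weights = [[1.0, 0.0], [0.0, 2.0]]  # one row of hidden-to-output weights per output
biases = [0.0, 0.5]
outputs = rbfn_forward([0.0, 0.0], centers, widths, weights, biases)
```

Each hidden unit responds strongly only to inputs near its center, which is the local, restricted activation domain described above.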

Step 2: FILTERING RULES AND BLACKLIST MANAGEMENT

i) Filtering Rules

a) Creator specification
A creator specification states conditions on the type, depth, and trust values of the relationship(s) in which creators must be involved for the specified rules to apply to them. A creator specification creatorSpec implicitly denotes a set of OSN users. It can consist of:
A set of attribute constraints of the form an OP av, where an is a user profile attribute name, and av and OP are, respectively, a profile attribute value and a comparison operator compatible with an's domain.
A set of relationship constraints of the form (m, rt, minDepth, maxTrust), denoting all the OSN users participating with user m in a relationship of type rt, having a depth greater than or equal to minDepth and a trust value less than or equal to maxTrust.

b) Filtering rule
A filtering rule consists of four components:
author is the user who specifies the rule.
creatorSpec is a creator specification, as defined above.
contentSpec is a Boolean expression defined on content constraints of the form (C, ml), where C is a class of the first or second level and ml is the minimum membership level threshold required for class C to make the constraint satisfied.
action ∈ {block, notify} denotes the action to be performed by the system on the messages matching contentSpec and created by users identified by creatorSpec.

ii) Blacklists
A BL rule is a tuple (author, creatorSpec, creatorBehavior, T), where:
author is the OSN user who specifies the rule, i.e., the wall owner.
creatorSpec is a creator specification, as defined above.
creatorBehavior consists of two components, RFBlocked and minBanned.
T denotes the time period for which the users identified by creatorSpec and creatorBehavior have to be banned from the author's wall.
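The following Python sketch shows one way the rule components above could be represented and evaluated. It makes several simplifying assumptions that are not part of the definitions: a relationship is a dictionary with type, depth, and trust keys; contentSpec is reduced to a disjunction (OR) of (C, ml) constraints; the ban period T is expressed in seconds; and the creatorBehavior component is omitted because its internal structure is not detailed here. All class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RelationshipConstraint:
    owner: str        # m: the user the relationship is measured from
    rel_type: str     # rt: relationship type
    min_depth: int    # minDepth
    max_trust: float  # maxTrust

    def matches(self, relationship):
        # relationship is assumed to look like {"type": ..., "depth": ..., "trust": ...}
        return (
            relationship is not None
            and relationship["type"] == self.rel_type
            and relationship["depth"] >= self.min_depth
            and relationship["trust"] <= self.max_trust
        )


@dataclass
class FilteringRule:
    author: str
    creator_spec: RelationshipConstraint
    content_spec: Dict[str, float]  # class C -> minimum membership level ml
    action: str                     # "block" or "notify"

    def apply(self, relationship, memberships):
        # Return the action if creator and content both match, else None.
        if not self.creator_spec.matches(relationship):
            return None
        for cls, min_level in self.content_spec.items():
            if memberships.get(cls, 0.0) >= min_level:
                return self.action
        return None


@dataclass
class BlacklistEntry:
    creator: str
    banned_at: float  # time the ban started, in seconds
    period: float     # T, in seconds (unit is an assumption)

    def is_active(self, now):
        return now < self.banned_at + self.period


rule = FilteringRule(
    author="owner",
    creator_spec=RelationshipConstraint("owner", "friendOf", 2, 0.5),
    content_spec={"Vulgar": 0.8},
    action="block",
)
distant_low_trust = {"type": "friendOf", "depth": 3, "trust": 0.2}
decision = rule.apply(distant_low_trust, {"Vulgar": 0.9})
ban = BlacklistEntry("mallory", banned_at=1000.0, period=3600.0)
```

A full implementation would also need attribute constraints of the form an OP av and arbitrary Boolean combinations in contentSpec, which this sketch leaves out.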

HARDWARE AND SOFTWARE REQUIREMENTS

HARDWARE REQUIREMENTS:
Processor : Pentium Dual Core 2.00 GHz
Hard disk : 40 GB
Mouse : Logitech
RAM : 2 GB (minimum)
Keyboard : 110 keys enhanced

SOFTWARE REQUIREMENTS:
Operating system : Windows 7
IDE : Microsoft Visual Studio .Net 2010
Technology : Asp.Net
Coding Language : C#
Backend : SQL Server 2008

SYSTEM DESIGN
In our project, the user creates a profile whose details are stored in the database. Logged-in users can see their details and edit them if they wish to change any information. Users can add new friends and view their friend list and their friends' details. Users can also post new messages and view their friends' messages. When a logged-in user posts a photo on their wall, known or unknown persons can post comments about that photo. Sometimes these comments are unwanted messages, for example vulgar, political, or violent content. The system therefore calculates a probability based on the comment contents and sends the result to the user's private wall. Based on the result, the user makes the decision.

USE CASE DIAGRAM & EXPLANATION
A use case diagram is a type of behavioral diagram created from a use-case analysis. The purpose of a use case diagram is to present an overview of the functionality provided by the system in terms of actors, their goals, and any dependencies between the use cases.

The diagram below depicts the actors and use cases of the system.

[Use case diagram. Actors: OSN User, OSN Managers, Known Person, Unknown Person. Use cases: Register/Login, Profile, Accept the Request, Send Request, Post Photos and Display, Comments, Calculate the Probability]

CLASS DIAGRAM & EXPLANATION

A class diagram in UML is a type of static structure diagram that describes the structure of a system by showing the system's classes, their attributes, and the relationships between the classes. Private visibility hides information from anything outside the class partition. Public visibility allows all other classes to view the marked information. Protected visibility allows child classes to access information they inherited from a parent class.

OBJECT DIAGRAM & EXPLANATION

An object diagram in the Unified Modeling Language (UML) is a diagram that shows a complete or partial view of the structure of a modeled system at a specific time. An object diagram focuses on some particular set of object instances and attributes, and the links between the instances. A correlated set of object diagrams provides insight into how an arbitrary view of a system is expected to evolve over time. Object diagrams are more concrete than class diagrams, and are often used to provide examples, or act as test cases for the class diagrams. Only those aspects of a model that are of current interest need be shown on an object diagram.

[Object diagram. Objects and attributes: User Login (Name, Password); Profile (Category, Property); Photos (Add photos); Friends (Characters, Book Informations); Message (Post message, Probability)]

STATE DIAGRAM & EXPLANATION
A state diagram is a type of diagram used in computer science and related fields to describe the behavior of systems. State diagrams require that the system described is composed of a finite number of states; sometimes, this is indeed the case, while at other times this is a reasonable abstraction. There are many forms of state diagrams, which differ slightly and have different semantics.

[State diagram. Actors: User, Known/Unknown Person. States: Login, Send Request, Accept the Request, Share the Photo, Post Comments, Block the unwanted comments]

ACTIVITY DIAGRAM & EXPLANATION
Activity diagrams are loosely defined diagrams that show workflows of stepwise activities and actions, with support for choice, iteration, and concurrency. In UML, activity diagrams can be used to describe the business and operational step-by-step workflows of components in a system. UML activity diagrams could potentially model the internal logic of a complex operation. In many ways, UML activity diagrams are the object-oriented equivalent of flow charts and data flow diagrams (DFDs) from structured development.

[Activity diagram. Actors: User, Known/Unknown Persons. Activities: Login, Send Request, Accept the Requests, Share Photos/Messages, Post the comments]

SEQUENCE DIAGRAM & EXPLANATION
A sequence diagram in UML is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a message sequence chart. Sequence diagrams are sometimes called event-trace diagrams, event scenarios, and timing diagrams.

[Sequence diagram. Participants: Login, User, Known/Unknown Person, Public Wall, Firewall. Messages in order: Login; Login; Send request; Accept the requests; Share Photos; Post Comments; Classification based on the content; Calculate the Probability based on the comments content; Direct Control Accept/Reject]

COLLABORATION DIAGRAM & EXPLANATION
A collaboration diagram shows the objects and relationships involved in an interaction, and the sequence of messages exchanged among the objects during the interaction. The collaboration diagram can be a decomposition of a class, class diagram, or part of a class diagram. It can also be the decomposition of a use case, use case diagram, or part of a use case diagram. The collaboration diagram shows messages being sent between classes and objects (instances). A diagram is created for each system operation that relates to the current development cycle (iteration).

[Collaboration diagram. Objects: Login, User, Known/Unknown Person, Public Wall, Firewall. Messages:
1: Login
2: Login
3: Send request
4: Accept the requests
5: Share Photos
6: Post Comments
7: Classification based on the content
8: Calculate the Probability based on the comments content
9: Direct Control Accept/Reject]

COMPONENT DIAGRAM & EXPLANATION

A component diagram in UML shows how a system is divided into components and the dependencies among those components.

[Component diagram. Components: Login, User, Known/Unknown Person, Black List, FireWall, Calculate Probability, Classifier]

DATA FLOW DIAGRAM & EXPLANATION
A data flow diagram (DFD) is a graphical representation of the flow of data through an information system. It differs from a flowchart in that it shows the data flow instead of the control flow of the program. A data flow diagram can also be used for the visualization of data processing. The DFD is designed to show how a system is divided into smaller portions and to highlight the flow of data between those parts.

Level 0:

Level 1:

Level 2:

Level 3:

Level 4:

All Level Diagram:

E-R DIAGRAM & EXPLANATION

In software engineering, an entity-relationship model (ERM) is an abstract and conceptual representation of data. Entity-relationship modeling is a database modeling method, used to produce a type of conceptual schema or semantic data model of a system, often a relational database, and its requirements in a top-down fashion. Diagrams created by this process are called entity-relationship diagrams, ER diagrams, or ERDs.

[E-R diagram. Entities: Users, Known/Unknown Persons, Black List, Filter Wall, Classifier. Relationship: Having Account. Attributes: Username, Password, Email ID, Profile, Address]

SYSTEM ARCHITECTURE & EXPLANATION

The OSN has three layers: the graphical user interface, the social network application, and the social network manager. The social network manager handles basic functionalities such as profile management and network-based functions. This project focuses on the other two layers and applies some new conditions to them. The application layer contains a short text classifier and a content-based message filter. The short text classifier classifies messages based on their content. The content-based message filter contains the blacklist and the filtering policies. The system first determines the relationship between the user and the message sender, then filters the message and calculates the probabilities using the classifier. The probability result is then sent to the user. The proposed system thus gives users direct control over what kinds of messages are displayed on their walls.

[System architecture diagram. Elements: OSN User, Known/Unknown Persons, Private Wall, Filter Wall, Send Friend Request, Accept the Friend Request, Share Files, Post Unwanted Comments, Label the comments based on the content, Analysis Creator Specification, Probability Result]

FUTURE ENHANCEMENT

Future Enhancement Description
We plan to extend this work in several directions. In the proposed system, comments are blocked temporarily based on their content and the creator specification. In future work, we plan to block comments permanently based on the content probability and the creator specification.

Future Enhancement Module Diagram

Permanent Block

[Module diagram: User, Share Photo, Known/Unknown Persons, Comments, Creator Specification, Block Permanently, Database]

In this module, unknown persons post comments on the files shared by the user. We therefore plan to permanently block comments from unknown creators.

ADVANTAGES:
The proposed system gives users control over what kinds of messages are posted on their own walls.
It is secure because the filter wall acts as an administrator.
The core components of the proposed system are the Content-Based Messages Filtering (CBMF) and the Short Text Classifier modules.
A rule layer, consisting of filtering rules (FRs) and blacklists (BLs), is adopted for filtering unwanted messages.
The BL mechanism avoids messages from undesired creators, independent of their contents.

APPLICATIONS

Online Social Network Application: In online social network applications, every account holder can post comments on the files a user shares, such as messages or photos. In this process, some known or unknown persons post unwanted comments, for example political, vulgar, or violent content. With the proposed system, users have direct control over which kinds of messages are posted on their walls. This concept can be implemented in all kinds of online social networks, such as Facebook, Twitter, and Orkut.

CONCLUSION: We have presented a system that gives users direct control to block unwanted messages on their social network walls. The system uses a machine learning soft classifier to label message content as Neutral or Nonneutral, and then applies filtering rules based on the message creators. Moreover, the flexibility of the system in terms of filtering options is enhanced through the management of BLs.

REFERENCE OR BIBLIOGRAPHY
1. M. Chau and H. Chen, "A Machine Learning Approach to Web Page Filtering Using Content and Structure Analysis," Decision Support Systems, vol. 44, no. 2, pp. 482-494, 2008.
2. R.J. Mooney and L. Roy, "Content-Based Book Recommending Using Learning for Text Categorization," Proc. Fifth ACM Conf. Digital Libraries, pp. 195-204, 2000.
3. F. Sebastiani, "Machine Learning in Automated Text Categorization," ACM Computing Surveys, vol. 34, no. 1, pp. 1-47, 2002.
4. M. Vanetti, E. Binaghi, B. Carminati, M. Carullo, and E. Ferrari, "Content-Based Filtering in On-Line Social Networks," Proc. ECML/PKDD Workshop Privacy and Security Issues in Data Mining and Machine Learning (PSDML '10), 2010.
5. N.J. Belkin and W.B. Croft, "Information Filtering and Information Retrieval: Two Sides of the Same Coin?" Comm. ACM, vol. 35, no. 12, pp. 29-38, 1992.
6. P.J. Denning, "Electronic Junk," Comm. ACM, vol. 25, no. 3, pp. 163-165, 1982.
7. P.W. Foltz and S.T. Dumais, "Personalized Information Delivery: An Analysis of Information Filtering Methods," Comm. ACM, vol. 35, no. 12, pp. 51-60, 1992.
8. P.S. Jacobs and L.F. Rau, "Scisor: Extracting Information from On-Line News," Comm. ACM, vol. 33, no. 11, pp. 88-97, 1990.
9. S. Pollock, "A Rule-Based Message Filtering System," ACM Trans. Office Information Systems, vol. 6, no. 3, pp. 232-254, 1988.
10. P.E. Baclace, "Competitive Agents for Information Filtering," Comm. ACM, vol. 35, no. 12, p. 50, 1992.
11. M.J. Pazzani and D. Billsus, "Learning and Revising User Profiles: The Identification of Interesting Web Sites," Machine Learning, vol. 27, no. 3, pp. 313-331, 1997.
