
Report title: Spam Detection

Area of project work: Networking and security

Name of students: Humera Fatima, Hitesh Soni

Name of Project Guide: Mrs. Shalini Gupta

College Name and Address: Atma Ram Sanatan Dharma College, Dhaula Kuan, New Delhi.

ABSTRACT

It is now common practice for e-commerce web sites to let their customers write reviews of products they have purchased. Such reviews are a valuable source of information on these products. Potential customers use them to find the opinions of existing users before deciding to purchase a product. Product manufacturers use them to identify problems with their products and to gather competitive intelligence about their competitors. Unfortunately, the importance of reviews also creates a strong incentive for spam, which contains falsely positive or maliciously negative opinions. In this project, we make an attempt to study review spam and spam detection. Spam detection methods fall into two categories: rule-based and statistical. The former refers to detection performed by looking for spam-like patterns in an email. Since the rules can be shared, they have been popularized quickly. The rules, however, are built manually, so it is hard to keep them up to date with the variation of spam. In this project, we have studied a decentralized, privacy-preserving approach to spam filtering. Our solution exploits robust digests to identify messages that are slight variations of one another, and a structured peer-to-peer architecture between mail servers to collaboratively share knowledge about spam.

Apart from this, an evidence-based content trust model is also described. In this approach, we explore a set of evidences, most of them based on the content of web pages, and use these evidences to decide the trust factor of web pages. Finally, a robust technique for tackling image spam is described, which combines the conventional filters used for text spam detection with the usual image spam filters.

Project Synopsis On

SPAM DETECTION
In partial fulfillment of the requirements for the award of the degree of B.Sc. (H) Computer Science.

Submitted by: Humera Fatima

Roll no.: 2209

ATMA RAM SANATAN DHARMA COLLEGE (DELHI UNIVERSITY) DHAULA KUAN

Project Synopsis On

SPAM DETECTION

In partial fulfillment of the requirements for the award of the degree of B.Sc. (H) Computer Science.

Submitted by: Hitesh Soni

Roll no.: 2210

ATMA RAM SANATAN DHARMA COLLEGE (DELHI UNIVERSITY) DHAULA KUAN

INTRODUCTION

This section covers some basic concepts and questions about spam.

What is Spam?
Spam is unwanted advertising email invading our mailboxes. It promotes things like adult websites, amazing mortgage deals, and get-rich-quick schemes. These days almost everyone with an email account has been spammed, and most people agree it is annoying and time-wasting to deal with. All spam shares two essential characteristics: it is unsolicited, i.e. sent without the recipient's permission, and it promotes products or services for sale. Given this, spam is often referred to in a formal sense as unsolicited commercial email (UCE). In everyday use, spam can also refer to any unwanted email, such as chain letters from known senders or commercial email from retailers we have dealt with previously. Web spam is one of the major obstacles to high-quality information retrieval on the web.

Why do Spammers Spam?


The reason is quite simple: they do it because there is potential money in it. Spammers send out their messages in the millions in the hope that a few people reply. Income comes from actual products sold, or from a percentage commission on products sold. Response rates and profit margins are typically low, but so are the costs. Tools and mailing lists for sending spam are cheap and easy to use. The millions of messages spammers can send in a day can add up to significant revenues with even the most modest response rates. For example, with a $1 per unit profit margin and only a 0.1% response rate, a spammer could make $10,000 by sending 10 million email messages. All a spammer needs to send spam is the following: an email address list, spamming software, an email server, and a financial opportunity.
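The arithmetic behind this example can be checked with a short sketch. The function name and parameters below are illustrative assumptions, not part of the original report, and the model ignores sending costs for simplicity.

```python
def expected_profit(messages_sent: int, response_rate: float,
                    profit_per_sale: float) -> float:
    """Rough profit estimate: messages * response rate * profit per sale.

    Hypothetical helper for illustration; it assumes every response
    results in exactly one sale and that sending costs are negligible.
    """
    return messages_sent * response_rate * profit_per_sale


# Figures taken from the example in the text:
# 10 million messages, 0.1% response rate, $1 profit per unit.
profit = expected_profit(10_000_000, 0.001, 1.00)
print(f"Estimated profit: ${profit:,.0f}")
```

With these figures, 10 million messages at a 0.1% response rate yield 10,000 sales, which at $1 each gives the $10,000 quoted above.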

The Costs of Spam


Spam is a real problem for organizations around the globe. The volume of spam is growing, it costs corporate organizations by lowering productivity and consuming IT resources, and it represents a potential legal liability. The costs of spam for enterprises can be broken down into three components:
Loss of productivity: the costs associated with users dealing with spam messages.
Consumption of IT resources: the costs incurred by IT as spam consumes bandwidth, storage space, and administrators' time.
Help desk burden: the extra costs when annoyed users call the help desk to complain about either the volume of spam or its offensive nature.

Legal Liability
Spam also raises potential legal liability issues when it contains sexual or otherwise questionable content. This type of email is easily forwarded to people inside and outside the organization. Email is a business tool. Anything sent from a corporate email address is effectively written on electronic company letterhead. As a result, any views, quotes, or discussions made via company email can be taken as representative of the company and may be legally binding. The casual use of profanity in business email (as in any other documented communication) has obvious implications for a business's reputation. Such emails have had more concrete repercussions as well. There have been several lawsuits involving sexual harassment in the workplace, based on lewd comments sent by email. In many cases the organization has been held responsible for not controlling its email content so as to avoid exposing employees to offensive material.

Motivation for spamming can range from advertising and self-promotion to disruption and disparagement of competitors. Spamming is economically viable because the barrier for entry into the abused systems is generally low and because it requires virtually no operating costs beyond the management of the automatic spamming software. In addition, it is often difficult to hold spammers accountable for their behavior.

Any system that relies on user-generated content is vulnerable to spam in one form or another. Indeed, many other electronic systems that allow users to store, share, and find online resources have also come under attack from spamming attempts in recent years. Search engines, for instance, suffer increasingly from so-called spamdexing attempts, in which content is created specifically to trick search engines into giving certain pages a higher ranking than they deserve. Spam comments are also becoming an increasingly large problem for websites that allow users to react to content, such as blogs and video and photo sharing websites.

A new trend in email spam is the emergence of image spam. Although current anti-spam technologies are quite successful in filtering text-based spam emails, the new image spams are substantially more difficult to detect, as they employ a variety of image creation and randomization algorithms.

PDF (Portable Document Format) is widely used for document exchange in the business world. As such, it is a trusted format, and many native anti-spam solutions automatically whitelist all messages containing a PDF file. In fact, such is the importance and general acceptance of PDF in the business world that practically all computers in a corporate environment will have a PDF viewer installed. This makes PDF an excellent vector for spam messages.
