College Name and Address: Atma Ram Sanatan Dharma College, Dhaula Kuan, New Delhi.
ABSTRACT
It is now common practice for e-commerce web sites to let customers write reviews of products they have purchased. Such reviews are a valuable source of information about these products. Potential customers use them to find the opinions of existing users before deciding to purchase a product. Product manufacturers use them to identify problems with their products and to gather competitive intelligence about their competitors. Unfortunately, the importance of reviews also creates a strong incentive for spam, which contains falsely positive or maliciously negative opinions. In this project, we attempt to study review spam and spam detection. Spam detection methods fall into two categories: rule-based and statistical. The former detects spam by looking for spam-like patterns in an email. Because the rules can be shared, rule-based methods became popular quickly. The rules, however, are built manually, so it is hard to keep them up to date with the variations of spam. In this project, we have studied a decentralized, privacy-preserving approach to spam filtering. Our solution exploits robust digests to identify messages that are slight variations of one another, and a structured peer-to-peer architecture between mail servers to collaboratively share knowledge about spam.
Apart from this, an evidence-based content trust model is also described. In this approach, we explore a set of evidences, most of them based on the content of web pages, and use these evidences to decide the trust factor of web pages. Finally, a robust technique for tackling image spam is described, which combines the conventional filters used for text spam detection with the usual image spam filters.
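The robust-digest idea mentioned above can be illustrated with a small sketch. It is only an assumption about how such a digest might work, not the scheme used in the project: each message is reduced to a small set of hashes of overlapping word sequences (shingles), and two messages are treated as variations of one another when their hash sets overlap strongly. The function names (shingles, digest, similarity) and the parameters k and size are hypothetical. Because mail servers would exchange only the hashes and never the message text, the sketch also hints at the privacy-preserving aspect.

```python
import hashlib
import re


def shingles(text, k=4):
    """Return the set of k-word shingles of a normalized message."""
    # Normalization (lowercase, alphanumeric tokens only) is an assumption.
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def digest(text, k=4, size=32):
    """Hypothetical robust digest: the `size` smallest shingle hashes."""
    hashes = sorted(
        int(hashlib.sha1(s.encode("utf-8")).hexdigest(), 16)
        for s in shingles(text, k)
    )
    return set(hashes[:size])


def similarity(d1, d2):
    """Jaccard similarity of two digests (0.0 when both are empty)."""
    if not d1 and not d2:
        return 0.0
    return len(d1 & d2) / len(d1 | d2)


if __name__ == "__main__":
    original = "Buy cheap watches now and get rich quick today friends"
    variant = "Buy cheap watches now and get rich quick today folks!!"
    unrelated = "Minutes of the quarterly budget meeting are attached for review"
    print(similarity(digest(original), digest(variant)))
    print(similarity(digest(original), digest(unrelated)))
```

A server could flag an incoming message when its digest is sufficiently similar to one already reported as spam; the threshold for "sufficiently similar" is a tuning choice not specified here.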
Project Synopsis On
SPAM DETECTION
In partial fulfillment of the requirements for the award of the degree of B.Sc. (H) Computer Science.
INTRODUCTION
What is Spam?
Spam is unwanted advertising email invading our mailboxes. It promotes things like adult websites, amazing mortgage deals, and get-rich-quick schemes. These days almost everyone with an email account has been spammed, and most people agree it is annoying and time-wasting to deal with. All spam shares two essential characteristics: it is unsolicited, i.e. sent without the recipient's permission, and it promotes products or services for sale. Given this, spam is often referred to in a formal sense as unsolicited commercial email (UCE). In everyday use, spam can also refer to any unwanted email, such as chain letters from known senders or commercial email from retailers we have dealt with previously. Web spam is one of the major obstacles to high-quality information retrieval on the web.
Legal Liability
Spam also raises potential legal liability issues when it contains sexual or otherwise questionable content. This type of email is easily forwarded to people inside and outside the organization. Email is a business tool. Anything sent from a corporate email address is effectively written on electronic company letterhead. As a result, any views, quotes, or discussions made via company email can be taken as representative of the company and as legally binding. The casual use of profanity in business email (as in any other documented communication) has obvious implications for a business's reputation. Such emails have had more concrete repercussions as well. There have been several lawsuits involving sexual harassment in the workplace, based on lewd comments sent by email. In many cases the organization has been held responsible for not controlling its email content so as to avoid exposing employees to offensive material.
Motivation for spamming can range from advertising and self-promotion to disruption and disparagement of competitors. Spamming is economically viable because the barrier to entry into the abused systems is generally low and because it requires virtually no operating costs beyond the management of the automatic spamming software. In addition, it is often difficult to hold spammers accountable for their behavior. Any system that relies on user-generated content is vulnerable to spam in one form or another. Indeed, many other electronic systems that allow users to store, share, and find online resources have also come under attack from spamming attempts in recent years. Search engines, for instance, suffer increasingly from so-called spamdexing attempts, with content created especially to trick search engines into giving certain pages a higher ranking than they deserve. Spam comments are also becoming an increasingly large problem for websites that allow users to react to content, such as blogs and video and photo sharing websites.
A new trend in email spam is the emergence of image spam. Although current anti-spam technologies are quite successful in filtering text-based spam emails, the new image spams are substantially more difficult to detect, as they employ a variety of image creation and randomization algorithms.
PDF (Portable Document Format) is widely used for document exchange in the business world. As such, it is a trusted format, and many native anti-spam solutions automatically whitelist all messages containing a PDF file. In fact, PDF is so important and so generally accepted in the business world that practically all computers in a corporate environment have a PDF viewer installed. This makes PDF an excellent vector for spam messages.