Você está na página 1de 8

Introduction

A CAPTCHA or Captcha is a type of challenge-response test used in


computing to ensure that the response is not generated by a computer. The
process usually involves one computer (a server) asking a user to complete a
simple test which the computer is able to generate and grade. Because other
computers are unable to solve the CAPTCHA, any user entering a correct
solution is presumed to be human. Thus, it is sometimes described as a
reverse Turing test, because it is administered by a machine and targeted to a
human, in contrast to the standard Turing test that is typically administered
by a human and targeted to a machine. A common type of CAPTCHA
requires that the user type letters or digits from a distorted image that
appears on the screen.
The term "CAPTCHA" (based upon the word capture) was coined in 2000
by Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford (all
of Carnegie Mellon University). It is a contrived acronym for "Completely
Automated Public Turing test to tell Computers and Humans Apart."
Carnegie Mellon University attempted to trademark the term, but the
trademark application was abandoned on 21 April 2008.

CAPTCHA: Telling Humans and Computers Apart Automatically


A CAPTCHA is a program that protects websites against bots by generating
and grading tests that humans can pass but current computer programs
cannot. For example, humans can read distorted text as the one shown
below, but current computer programs can't:

The term CAPTCHA (for Completely Automated Public Turing Test To


Tell Computers and Humans Apart) was coined in 2000 by Luis von Ahn,
Manuel Blum, Nicholas Hopper and John Langford of Carnegie Mellon
University.
A free, secure and accessible CAPTCHA implementation is available from
the reCAPTCHA project. Easy to install plugins and controls are available
for Word Press, Media Wiki, PHP, ASP.NET, Perl, Python, Java, and many
other environments. reCAPTCHA also comes with an audio test to ensure
that blind users can freely navigate your site. reCAPTCHA is our officially
recommended CAPTCHA implementation.
Test Drive a CAPTCHA
ReCAPTCHA. Stop spam and help digitize books at the same time! The
words shown come directly from old books that are being digitized.
SQUIGL-PIX. Our newest CAPTCHA!
ESP-PIX. A CAPTCHA script that's close to our hearts. Instead of typing
letters, you authenticate yourself as a human by recognizing what object is
common in a set of images. This was the first example of a CAPTCHA
based on image recognition.

What is CAPTCHA and how it works?

If you're here on this site, possibly you're tired to death of unceasing


spambot attacks on your site, forum or blog. And yes, this article with a real
working example of CAPTCHA PHP script based on TheCAPTCHA can
help you to solve the problem and protect your site against spambots.
I. Theory
Spambots are ubiquitous. They fill forms on web sites, send SMS via web-
interface, take part in online polls, they just do all those things you expect
from a real human. This situation gave a push to security specialists to create
a number of protection methods against spambots, and spammers, in their
turn, are permanently trying to break the shields.
Human or robot?
To determine whether a site visitor is a human or a robot, he (or it) has to
accomplish a task, easy for human and hard or even impossible for robot.
These tasks are commonly named CAPTCHA, which is an acronym for
“Completely Automated Public Turing test to tell Computers and Humans
Apart”. It is named after Professor Alan Turing, who in 1950 in his article
“Imitation Game” described a test of a machine's capability to demonstrate
intelligence. It proceeds as follows: a human judge engages in a natural
language conversation with two other parties, one a human and the other a
machine; if the judge cannot reliably tell which is which, then the machine is
said to pass the test. It is assumed that both the human and the machine try
to appear human.
CAPTCHA is a form of reverse Turing test. When, for example, logging on
to a web site, the user is presented with a word or number in a distorted
graphic image and asked to enter it. If the value entered does not match what
is expected, then the user is rejected. There are various implementations of
CAPTCHA, the most popular at the moment are:
Image with random letters and digits, sometimes distorted, sometimes with
some noise over the text. A human can recognize distorted text, while a
spambot can't.
Image with some random mathematical expression, which has to be
calculated by a visitor. For example, if expression is 47+39, a visitor has to
put in the form the right answer, which in this case is 86.
An audio file with pronounced random letters or digits, the sound is
sometimes distorted. A visitor has to type the heard text in the form.
A random question or a riddle. For example, a visitor has to tell the first
name of Shakespeare.
A set of images of various objects, for example cats, dogs and birds. A
visitor has to select only objects of the kind that is specified by the test, for
example only cats.
Which way is the best? The last and the next to last are the worst, because in
order to be effective against spambot attacks you have to acquire a
prohibitively huge database of images or questions. The next disadvantage
of these methods is that a visitor, if he is a human, has to be intimate with
the language of the question. An answer to a riddle can be easily added to
spambot's database, but it may be impossible to answer for those who are not
native speakers of the language.
Audio-CAPTCHA is also not the best way, because a visitor has to have
some audio-equipment on his computer (many workers in big corporations
with strict computer policies don't have).
You may think that math CAPTCHA is quite a good idea, because a
spambot has to recognize the digits and operators and then to calculate the
expression. No man, computers calculate much more faster than humans :)
So, what remains? A graphic CAPTCHA with letters and digits. This is the
most easy to implement method, it is quite secure if you follow security
policies, and in this article I'll teach you how to create a strong but simple
PHP script to protect your web site.
Abordage
Creating a graphic CAPTCHA is always a balance between possibilities of
both humans and bots to recognize distorted and noisy text. If text is too
much distorted there will be no submissions of forms on your site at all —
this is not what we want. On the other hand, a weak CAPTCHA with
ineffective distortion and noise or even without any distortion at all is like a
toy shield against a viking axe.
How do spambots recognize symbols? This process can be divided into 2
steps:
Definition of place and borders of each symbol.
Then, when above is done, a spambot tries to recognize each symbol.
If symbols are always on the same places with fixed spaces between them,
your CAPTCHA is weak, so that's why symbols should always have
different coordinates:

The task is getting much more difficult if symbols on your CAPTCHA have
always random position, angle and space. Having done with the definition of
place of each symbol, the next step is to compare color of background with
the color of what is assumed to be a symbol. If these colors are different
(and they must be different to be easily read by human), recognition by
spambots is quite easy, so you have either to:
put some noise under and over symbols or
place each symbol so close to each other, that they will form a one box of
symbols with no spaces between them, and finally distort the text.
The second method is more powerful but much more complicated to
implement at the same time. When there's no idea on where the symbol
starts and where it ends, the task becomes almost impossible to accomplish
for spambots, though still easy for a human being. However, if the noise you
put on your CAPTCHA is of the right kind, the first method can be quite
secure from break, and in this article I will use this very method to show you
how to make your own CAPTCHA script. The same method is used in my
TheCAPTCHA script.
The right noise
There are several types of noise commonly used with CAPTCHA scripts to
embarrass the recognition of symbols by spambots:
Pixel noise, sometimes of random color, looks like an old grainy film or
3200 ISO images of your digital camera.
Lines, sometimes of random color and angle, sometimes they form a kind of
grid.
Rectangles and/or circles, sometimes filled with color.
The first type is the weakest and there is no sense to use it, as it only makes
the symbols difficult to recognize by a human, not by a spambot.
The second and the third are good only when they are random. As you can
see, when creating a CAPTCHA you have to forget anything systematic.
Any well-ordered structure on an image is a hole in your hauberk. If a
creator of a spambot knows that on some particular image there is a regular
grid, lines of which have always the same regular angle, he just excludes the
grid from the image, thus making this grid useless. The same with rectangles
and circles. That's why you need to put this kind of noise under and over the
symbols only on irregular basis. And note, that the color of these objects
must be either the same as the color of the background or the same as the
color of symbols, thus making it harder for spambots to define place and
borders of each symbol.

Applications of CAPTCHAs:
CAPTCHAs have several applications for practical security, including (but
not limited to):
Preventing Comment Spam in Blogs: Most bloggers are familiar with
programs that submit bogus comments, usually for the purpose of raising
search engine ranks of some website (e.g., "buy penny stocks here"). This is
called comment spam. By using a CAPTCHA, only humans can enter
comments on a blog. There is no need to make users sign up before they
enter a comment, and no legitimate comments are ever lost!
Protecting Website Registration:. Several companies (Yahoo!, Microsoft,
etc.) offer free email services. Up until a few years ago, most of these
services suffered from a specific type of attack: "bots" that would sign up for
thousands of email accounts every minute. The solution to this problem was
to use CAPTCHAs to ensure that only humans obtain free accounts. In
general, free services should be protected with a CAPTCHA in order to
prevent abuse by automated script.
Protecting Email Addresses From Scrapers. Spammers crawl the Web in
search of email addresses posted in clear text. CAPTCHAs provide an
effective mechanism to hide your email address from Web scrapers. The
idea is to require users to solve a CAPTCHA before showing your email
address. A free and secure implementation that uses CAPTCHAs to
obfuscate an email address can be found at reCAPTCHA MailHide.
Online Polls: In November 1999, http://www.slashdot.org released an online
poll asking which was the best graduate school in computer science (a
dangerous question to ask over the web!). As is the case with most online
polls, IP addresses of voters were recorded in order to prevent single users
from voting more than once. However, students at Carnegie Mellon found a
way to stuff the ballots using programs that voted for CMU thousands of
times. CMU's score started growing rapidly. The next day, students at MIT
wrote their own program and the poll became a contest between voting
"bots." MIT finished with 21,156 votes, Carnegie Mellon with 21,032 and
every other school with less than 1,000. Can the result of any online poll be
trusted? Not unless the poll ensures that only humans can vote.
Preventing Dictionary Attacks. CAPTCHAs can also be used to prevent
dictionary attacks in password systems. The idea is simple: prevent a
computer from being able to iterate through the entire space of passwords by
requiring it to solve a CAPTCHA after a certain number of unsuccessful
logins. This is better than the classic approach of locking an account after a
sequence of unsuccessful logins, since doing so allows an attacker to lock
accounts at will.

Search Engine Bots: It is sometimes desirable to keep webpages unindexed


to prevent others from finding them easily. There is an html tag to prevent
search engine bots from reading web pages. The tag, however, doesn't
guarantee that bots won't read a web page; it only serves to say "no bots,
please." Search engine bots, since they usually belong to large companies,
respect web pages that don't want to allow them in. However, in order to
truly guarantee that bots won't enter a web site, CAPTCHAs are needed.
Worms and Spam. CAPTCHAs also offer a plausible solution against email
worms and spam: "I will only accept an email if I know there is a human
behind the other computer." A few companies are already marketing this
idea.
Guidelines:
If your website needs protection from abuse, it is recommended that you use
a CAPTCHA. There are many CAPTCHA implementations, some better
than others. The following guidelines are strongly recommended for any
CAPTCHA code:
Accessibility. CAPTCHAs must be accessible. CAPTCHAs based solely on
reading text — or other visual-perception tasks — prevent visually impaired
users from accessing the protected resource. Such CAPTCHAs may make a
site incompatible with Section 508 in the United States. Any implementation
of a CAPTCHA should allow blind users to get around the barrier, for
example, by permitting users to opt for an audio or sound CAPTCHA.
Image Security. CAPTCHA images of text should be distorted randomly
before being presented to the user. Many implementations of CAPTCHAs
use undistorted text, or text with only minor distortions. These
implementations are vulnerable to simple automated attacks.
Script Security. Building a secure CAPTCHA code is not easy. In addition
to making the images unreadable by computers, the system should ensure
that there are no easy ways around it at the script level. Common examples
of insecurities in this respect include: (1) Systems that pass the answer to the
CAPTCHA in plain text as part of the web form. (2) Systems where a
solution to the same CAPTCHA can be used multiple times (this makes the
CAPTCHA vulnerable to so-called "replay attacks"). Most CAPTCHA
scripts found freely on the Web are vulnerable to these types of attacks.
Security Even After Wide-Spread Adoption. There are various
"CAPTCHAs" that would be insecure if a significant number of sites started
using them. An example of such a puzzle is asking text-based questions,
such as a mathematical question ("what is 1+1"). Since a parser could easily
be written that would allow bots to bypass this test, such "CAPTCHAs" rely
on the fact that few sites use them, and thus that a bot author has no
incentive to program their bot to solve that challenge. True CAPTCHAs
should be secure even after a significant number of websites adopt them.
Should I Make My Own CAPTCHA? In general, making your own
CAPTCHA script (e.g., using PHP, Perl or .Net) is a bad idea, as there are
many failure modes. We recommend that you use a well-tested
implementation such as reCAPTCHA.
The "Pornography Attack" is Not a Concern
It is sometimes rumored that spammers are using pornographic sites to solve
CAPTCHAs: the CAPTCHA images are sent to a porn site, and the porn site
users are asked to solve the CAPTCHA before being able to see a
pornographic image. This is not a security concern for CAPTCHAs. While it
might be the case that some spammers use porn sites to attack CAPTCHAs,
the amount of damage this can inflict is tiny (so tiny that we haven't even
noticed a dent!). Whereas it is trivial to write a bot that abuses an
unprotected site millions of times a day, redirecting CAPTCHAs to be
solved by humans viewing pornography would only allow spammers to
abuse systems a few thousand times per day. The economics of this attack
just don't add up: every time a porn site shows a CAPTCHA before a porn
image, they risk losing a customer to another site that doesn't do this.
Advancing Artificial Intelligence:
CAPTCHA tests are based on open problems in artificial intelligence (AI):
decoding images of distorted text, for instance, is well beyond the
capabilities of modern computers. Therefore, CAPTCHAs also offer well-
defined challenges for the AI community, and induce security researchers, as
well as otherwise malicious programmers, to work on advancing the field of
AI. CAPTCHAs are thus a win-win situation: either a CAPTCHA is not
broken and there is a way to differentiate humans from computers, or the
CAPTCHA is broken and an AI problem is solved.

Você também pode gostar