Captchas: Vulnerability To Attacks: International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)

International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com Volume 2, Issue 2, March April 2013 ISSN 2278-6856
CAPTCHAs: VULNERABILITY TO ATTACKS

Prerna Sharma¹, Nidhi Tyagi² and Deepali Singhal³
¹,² UG, Department of Information Technology Engineering
Raj Kumar Goel Institute of Technology for Women, Ghaziabad, Gautam Buddh Technical University, Lucknow, India
³ Senior Lecturer, Department of Computer Science Engineering
Raj Kumar Goel Institute of Technology for Women, Ghaziabad, Gautam Buddh Technical University, Lucknow, India
ABSTRACT: The widespread growth of web-based services has revolutionized the way people communicate and share information, making firm security measures necessary. A significant threat comes from malicious automated programs designed to abuse online facilities, which degrades the quality of service of a system and breaches web security. A CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a challenge-response test that automatically tells humans and computer programs apart, ultimately helping to prevent unauthorized access and abuse. Any user entering a correct solution is presumed to be human; otherwise the user is treated as a bot and denied access. CAPTCHAs are designed to be unreadable by OCR (Optical Character Recognition) software. They provide an additional layer of security and can be based on text, images, audio, or video; however, they are also vulnerable to automated attacks. The vulnerabilities identified during the research were classified into three broad categories: breaching client-side trust, manipulating the server-side implementation, and attacking the CAPTCHA image. In this paper, we look at the most interesting and most common of these vulnerabilities.
Keywords: Bot, CAPTCHAs, OCR (Optical Character Recognition), Spam, Turing Test
1. INTRODUCTION
Security of websites is of paramount concern today. When we sign up for an email service such as Yahoo or Gmail, we first need to pass a challenge-response test that is simple and straightforward for a human being to solve but very difficult for a computer to pass. Such a test is a CAPTCHA, also known as a type of Human Interaction Proof (HIP). Most readers have probably seen CAPTCHA tests on many websites. The most common form of CAPTCHA is an image of several distorted letters. The user who is trying to sign up for the email service has to type the correct series of letters into a form. If the letters entered by the user match those in the distorted image, the user passes the test; otherwise the user is treated as a possible spam bot. Some CAPTCHA schemes incorporate two levels of testing: first identifying the displayed characters, and second interpreting their logical ordering based on embedded numbers. A CAPTCHA can be conveniently implemented on websites and offers the advantages of robustness and low space requirements.
CAPTCHA is an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart. The term "CAPTCHA" was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper (all of Carnegie Mellon University) and John Langford (then of IBM). Its basic purpose is to block form submissions from spam bots, automated scripts that, for example, gather email addresses from publicly available web forms. CAPTCHAs work because it is difficult for computers to extract the text from a distorted image, whereas it is relatively easy for a human to read the text behind the distortions. Therefore, a correct response to a CAPTCHA challenge is assumed to come from a human, and the user is permitted into the website. Why would anyone need a test that can tell humans and computers apart? Because some people try to game the system: they want to exploit weaknesses in the computers running the site. While these individuals probably make up a minority of the people on the Internet, their actions can affect millions of users and websites. For example, a free email service might find itself bombarded by account requests from an automated program, which could be part of a larger attempt to send spam to millions of people. The CAPTCHA test helps identify which users are real human beings and which are computer programs. Spammers constantly try to build algorithms that read the distorted text correctly, so strong CAPTCHAs have to be designed and built to thwart their efforts.
2. THE TURING TEST (CHALLENGE-RESPONSE TEST) AND THE CAPTCHA

CAPTCHA technology has its foundation in an experiment called the Turing Test. Alan Turing, often called the father of modern computing, proposed the test as a way to examine whether machines can think, or at least appear to think, like humans. The classic test is an imitation game in which an interrogator asks two participants a series of questions. One participant is a machine and the other is a human. The interrogator can neither see nor hear the participants and has no way of knowing which is the machine and which is the human. If the interrogator is unable to tell which participant is giving the machine-generated responses, the machine passes the Turing Test. The goal of a CAPTCHA is to create a test that humans can pass easily but machines cannot. It is also important that a CAPTCHA application present a different CAPTCHA to each user. If a visual CAPTCHA presented a static image that was the same for every user, it would not take long for a spammer to spot the form, decipher the letters, and program an application to type in the correct answer automatically. Most, but not all, CAPTCHAs rely on a visual test. Computers lack the sophistication that human beings have in processing visual data: we can analyze an image and pick out patterns more easily than a computer can. The human mind sometimes even perceives patterns that do not exist, a quirk called pareidolia. Not all CAPTCHAs depend on visual patterns; in fact, it is essential to provide an alternative to a visual CAPTCHA. Otherwise, the website administrator runs the risk of disenfranchising any web user who has a visual impairment. One alternative to a visual test is an audible one. An audio CAPTCHA usually presents the user with a series of spoken letters or numbers. It is not unusual for the program to distort the speaker's voice, and it is also common to include background noise in the audio. Another option is a CAPTCHA that asks the reader to read and interpret a short passage of text. Such a contextual CAPTCHA quizzes the reader and tests his or her comprehension skills.
3. TYPES OF CAPTCHAs
3.1 TEXT BASED CAPTCHA
Text-based CAPTCHAs typically rely on sophisticated distortion of text images, which renders them unrecognizable to state-of-the-art pattern recognition programs but still recognizable to humans. A simpler, novel approach is to present the user with questions that, ideally, only a human can answer. Examples of such questions are: What is the last letter in CALIFORNIA? What is ten minus two? If tomorrow is Tuesday, what is today? Such questions are very easy for a human user to answer but considerably harder for a computer program to solve. Like audio CAPTCHAs, they are also accessible to people with visual disabilities.
3.1.1 Gimpy
Gimpy is a text CAPTCHA developed by CMU in collaboration with Yahoo for its Messenger service. Gimpy relies on the human ability to read extremely distorted, sometimes overlapping text, and on the inability of computer programs to do the same. Gimpy chooses ten words at random from a dictionary and displays them in a distorted and overlapping manner. It then asks the user to identify and type a subset of the words in the image. A human user can identify the words correctly, whereas a computer program generally cannot.
Fig 3.1.1 Gimpy CAPTCHA
3.1.2 Ez-Gimpy
Ez-Gimpy is a simpler version of the Gimpy CAPTCHA and was adopted by Yahoo on its sign-up page. It picks a single word, applies distortion to the text, and then asks the user to enter the text correctly.
Fig 3.1.2 Yahoo's Ez-Gimpy CAPTCHA
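The word-selection and grading steps of the Gimpy scheme in Section 3.1.1 can be sketched as follows. This is a minimal illustration, not CMU's implementation: the word list `WORDS`, the assumption that typing three correct words is enough to pass, and the function names are all hypothetical, and rendering and distortion of the image are omitted. Drawing fresh words for every request also addresses the static-image problem noted in Section 2.

```python
import secrets

# Hypothetical dictionary; a real deployment would use a much larger word list.
WORDS = [
    "apple", "river", "stone", "cloud", "tiger", "lemon", "piano", "eagle",
    "grape", "honey", "ocean", "pearl", "queen", "robot", "sugar", "table",
    "under", "violet", "water", "yellow",
]

NUM_WORDS = 10         # Gimpy displays ten words
REQUIRED_CORRECT = 3   # assumed pass threshold for this sketch

_rng = secrets.SystemRandom()  # OS-backed randomness, unpredictable to an attacker


def new_challenge():
    """Pick NUM_WORDS distinct words at random for one challenge."""
    return _rng.sample(WORDS, NUM_WORDS)


def verify(challenge_words, user_input):
    """Pass if the user typed at least REQUIRED_CORRECT distinct words from the challenge."""
    typed = {w.lower() for w in user_input.split()}
    hits = typed & set(challenge_words)
    return len(hits) >= REQUIRED_CORRECT
```

For example, `verify(words, " ".join(words[:3]))` returns `True` for any `words = new_challenge()`, while a single correct word is rejected.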
3.1.3 MSN CAPTCHAs
Microsoft uses a different CAPTCHA for services provided under the MSN umbrella; these are popularly called MSN Passport CAPTCHAs. They use eight characters drawn from upper-case letters and digits. The foreground is dark blue and the background is grey. Warping is used to distort the characters and produce a ripple effect, which makes computer recognition difficult.
Fig 3.1.3 MSN Passport CAPTCHA
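Generating challenge text of the kind described for MSN Passport CAPTCHAs is straightforward. The sketch below is an illustration rather than Microsoft's code: it draws eight characters from upper-case letters and digits with Python's `secrets` module and computes the number of possible strings, 36^8 ≈ 2.8 × 10^12. A keyspace this large makes blind guessing impractical, which is consistent with the attacks in Section 5 targeting the image or implementation flaws instead.

```python
import secrets
import string

# Alphabet assumed from the description: upper-case letters plus digits.
ALPHABET = string.ascii_uppercase + string.digits  # 36 symbols
LENGTH = 8


def msn_style_text(length=LENGTH):
    """Return a random challenge string using a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Number of distinct challenge strings a blind guesser would face.
KEYSPACE = len(ALPHABET) ** LENGTH
```

Here `KEYSPACE` evaluates to 2821109907456.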
3.2 Graphic CAPTCHAs
Graphic CAPTCHAs are challenges that involve pictures or objects sharing some kind of similarity that the user has to guess. They are visual puzzles, similar to Mensa tests. The computer generates the puzzles and grades the answers but is itself unable to solve them.
3.2.1 Bongo
BONGO is named after M.M. Bongard, who published a book on pattern recognition problems in the 1970s [3]. BONGO asks the user to solve a visual pattern recognition problem. It displays two series of blocks, the left and the right. The blocks in the left series differ from those in the right, and the user must find the characteristic that differentiates them. A possible left and right series is shown in Figure 3.2.1.
Fig 3.2.1 Bongo CAPTCHA
After seeing the two series, the user is presented with a set of four single blocks and is asked to determine which group each block belongs to. The user passes the test if s/he correctly assigns each block to its group. Care must be taken that the user is not perplexed by a large number of options.
3.3 Audio CAPTCHAs
An audio CAPTCHA program picks a word or a sequence of numbers at random, renders it into a sound clip, and distorts the clip by adding noise; it then presents the distorted clip to the user and asks the user to type its contents. This CAPTCHA is based on the difference in ability between humans and computers in recognizing spoken language. The idea is that a human can efficiently disregard the distortion and interpret the characters being spoken, whereas software would find the applied distortion difficult and would need effective speech-to-text translation to succeed. This is a crude way to separate humans from computers, and it is not very popular because the user has to understand the language and the accent in which the sound clip is recorded.
Fig 3.3 Audio CAPTCHAs
3.4 reCAPTCHA and book digitization
reCAPTCHA is a free service that helps to digitize books and newspapers. It improves the digitization process by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed in an image and used as a CAPTCHA. This is possible because most OCR programs flag words that they cannot read correctly. Computers are unable to determine what a smudged or warped word should be. This means that all the words that appear as part of the reCAPTCHA site-protection system have already defeated the current generation of text recognition software.
Fig 3.4 reCAPTCHA and book digitization
4. A SECURE CAPTCHA IMPLEMENTATION
The following steps and precautions are recommended for a secure and strong CAPTCHA implementation. Fig 4 and the steps below explain the CAPTCHA generation and verification process.
1. The client requests a CAPTCHA from the server, with or without a valid SESSIONID.
2. If the client does not provide a valid SESSIONID, a new SESSIONID is generated and a corresponding session store is instantiated.
3. The server-side code creates a new CAPTCHA with random text.
4. The CAPTCHA solution is stored in the HTTP session store.
5. The CAPTCHA image is sent to the client. If the client request did not provide a valid SESSIONID, the newly generated SESSIONID from step 2 is also returned.
6. The client sends the CAPTCHA solution along with the SESSIONID for verification.
7. Server-side code retrieves the CAPTCHA solution from the HTTP session and verifies it against the solution provided by the client.
8. The server-side CAPTCHA state is cleared.
9. If verification is successful, the client is sent to the next logical step. If not, the client is forced to request a new CAPTCHA (step 1).
Fig 4. A secure CAPTCHA implementation
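The nine steps of Section 4 can be sketched as a small in-memory service. This is a minimal illustration under stated assumptions: the class and method names (`CaptchaService`, `issue`, `verify`) are hypothetical, a plain dictionary stands in for the HTTP session store, and image rendering is omitted (the text is returned only so it can be rendered or tested). The properties taken from the steps are that the solution stays on the server and that the stored state is cleared after every verification attempt, so each CAPTCHA can be used at most once.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


class CaptchaService:
    """Hypothetical server-side CAPTCHA flow following steps 1-9 of Section 4."""

    def __init__(self, length=6):
        self.length = length
        self.sessions = {}  # SESSIONID -> session store (a dict)

    def issue(self, session_id=None):
        # Steps 1-2: reuse a valid SESSIONID or create a new session store.
        if session_id not in self.sessions:
            session_id = secrets.token_urlsafe(16)
            self.sessions[session_id] = {}
        # Step 3: create random CAPTCHA text on the server.
        text = "".join(secrets.choice(ALPHABET) for _ in range(self.length))
        # Step 4: store the solution in the session, never in the page.
        self.sessions[session_id]["captcha"] = text
        # Step 5: in a real system only the rendered image and SESSIONID go to
        # the client; the text is returned here as a stand-in for rendering.
        return session_id, text

    def verify(self, session_id, answer):
        # Steps 6-7: look up the stored solution for this session.
        store = self.sessions.get(session_id)
        if store is None:
            return False
        # Step 8: clear the CAPTCHA state whether or not the answer is right.
        expected = store.pop("captcha", None)
        if expected is None:
            return False
        # Step 9: constant-time comparison; bytes avoid errors on non-ASCII input.
        return secrets.compare_digest(
            expected.encode(), answer.strip().upper().encode()
        )
```

Because a single value is stored per session and discarded on every attempt, this design is intended to block the in-session brute-forcing, identifier reuse, and accumulation attacks described in Section 5.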
5. VARIOUS VULNERABILITIES IDENTIFIED DURING THE RESEARCH

5.1 Breaching Client-Side Trust
Many developers were observed to depend on the client to perform CAPTCHA validation, generation, and storage. Secure design principles for distributed systems and web applications recommend not trusting the client to perform security checks. Trusting the client allows it to access the CAPTCHA solution directly, bypass the verification process, and generate CAPTCHAs of its own choice. The client-side flaws identified during the analysis of CAPTCHA implementations are discussed below.
5.1.1 Hidden fields and client-side storage
Hidden fields and client-side storage have long been considered insecure means of transferring sensitive information between client and server. Some CAPTCHA implementations were found to depend on hidden fields to relay CAPTCHA solutions between client and server. These implementations depend entirely on the client to provide both the CAPTCHA solution and the user-entered CAPTCHA value. Since the server has no independent access to the original CAPTCHA solution, it cannot perform meaningful validation, and an attacker can supply values of his choice for both. Such an implementation offers no protection and requires minimal effort to break. It was also observed that some implementations rely on JavaScript code and hidden fields to verify the CAPTCHA on the client side, with no validation on the server side.
5.1.2 Chosen CAPTCHA text attack
The chosen CAPTCHA text attack allows the attacker to choose the CAPTCHA value and completely bypass the protection offered. A few websites were observed to delegate CAPTCHA generation to the client while keeping the verification component on the server. One observed real-world implementation worked as follows:
1. On the registration page, JavaScript code generated a random number.
2. This random number was sent to the server along with a SESSIONID to generate a CAPTCHA image.
3. The server generated the CAPTCHA image from the random number received from the client. The random number was also stored in the HTTP session for verification purposes.
4. The CAPTCHA image was retrieved and displayed on the registration page as a challenge for the user.
To exploit this vulnerability, an attacker has to do the following:
1. Obtain a valid SESSIONID.
2. Set a CAPTCHA value of his choice in the HTTP session, using the SESSIONID obtained in the previous step.
3. Make a submission with the attacker-chosen CAPTCHA value to bypass the protection.
Fig 5.2.1 Sample CAPTCHA rainbow table implementation with numeric identifiers.
5.1.3 Arithmetic CAPTCHAs
In this type of CAPTCHA, the user needs to solve an arithmetic problem. When the CAPTCHA data is stored on the client side, minimal effort is required to bypass the implementation: the attacker simply parses the HTML content of the returned page, extracts the arithmetic question, and solves it on the client side. Thus, any CAPTCHA implementation that stores CAPTCHA data on the client side fails to offer significant protection.
Fig 5.1.3 Arithmetic CAPTCHAs
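To illustrate how little effort the attack in Section 5.1.3 requires, the sketch below extracts and solves an arithmetic question embedded in a page. The HTML snippet, the question format ("What is 7 + 5?"), and the function name are hypothetical; a real page would need its own pattern. The point is that when the question is present as plain text, a short regular expression is enough.

```python
import operator
import re

# Hypothetical page fragment returned by a vulnerable form.
SAMPLE_HTML = '<label for="captcha">What is 7 + 5?</label><input name="captcha">'

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
QUESTION = re.compile(r"What is\s+(\d+)\s*([-+*])\s*(\d+)\s*\?", re.IGNORECASE)


def solve_arithmetic_captcha(html):
    """Return the answer to the first arithmetic question found, or None."""
    match = QUESTION.search(html)
    if match is None:
        return None
    left, op, right = match.groups()
    return OPS[op](int(left), int(right))
```

Calling `solve_arithmetic_captcha(SAMPLE_HTML)` returns 12. Questions phrased in words, such as "What is ten minus two?", would need a small word-to-number table but are no harder in principle.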
5.2 Server-Side Attacks

So far we have looked at attacks that target client-side trust. Let us now look at attacks that target flaws in the server-side implementation.
5.2.1 CAPTCHA Rainbow Tables
Randomly generating CAPTCHAs at runtime is one of the important aspects of a secure CAPTCHA design. However, a large number of websites were observed to use a finite set of CAPTCHAs, each recognized by an identifier. These identifiers were either numeric or finite-length character strings, and were generally sent to the client as hidden fields or included in the URL used to retrieve the CAPTCHA. Some websites never changed their CAPTCHA identifiers; others changed them randomly on a periodic basis. Rainbow-table-based attack vectors that target websites using a finite CAPTCHA set are discussed below.
Attacking static CAPTCHA identifiers
For websites that use static CAPTCHA identifiers, a large number of CAPTCHAs can be downloaded and solved locally using optical character recognition (OCR) engines, custom solvers, or manual effort. A rainbow table can then be built that maps each static CAPTCHA identifier to its solution. Whenever the server returns a CAPTCHA identifier for which a pre-solved value is available, the solution can be quickly looked up and submitted to bypass the CAPTCHA restriction. Multiple CAPTCHA requests can also be made until the server returns a CAPTCHA with a known identifier.
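The static-identifier attack amounts to a dictionary lookup. The simulation below is a sketch under stated assumptions: `FiniteCaptchaServer` is a hypothetical stand-in for a site that serves CAPTCHAs from a fixed pool with numeric identifiers, and the attacker's table is filled directly from the pool as a stand-in for OCR or manual solving. If k of the N CAPTCHAs in the pool are pre-solved, each request returns a known identifier with probability k/N, so about N/k requests are needed on average.

```python
import random


class FiniteCaptchaServer:
    """Hypothetical vulnerable site: a fixed pool of CAPTCHAs keyed by static IDs."""

    def __init__(self, pool_size, seed=0):
        rng = random.Random(seed)
        # Each identifier always maps to the same solution (the flaw).
        self.pool = {i: f"{rng.randrange(10**6):06d}" for i in range(pool_size)}
        self.rng = random.Random(seed + 1)

    def new_captcha(self):
        # Returns only the identifier; the image itself is omitted here.
        return self.rng.choice(list(self.pool))

    def check(self, captcha_id, answer):
        return self.pool.get(captcha_id) == answer


def attack(server, table, max_requests=1000):
    """Request CAPTCHAs until one has a pre-solved identifier, then submit it."""
    for attempt in range(1, max_requests + 1):
        captcha_id = server.new_captcha()
        if captcha_id in table:
            return attempt, server.check(captcha_id, table[captcha_id])
    return max_requests, False
```

For example, with `server = FiniteCaptchaServer(pool_size=100)` and `table = {i: server.pool[i] for i in range(25)}`, `attack(server, table)` succeeds after roughly four requests on average. Generating CAPTCHAs at runtime, as in Section 4, removes the fixed pool that makes the table useful.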

5.2.2 The chosen CAPTCHA identifier attack
In certain CAPTCHA implementations, servers return unique CAPTCHA identifiers to the user but do not store the corresponding identifier or CAPTCHA solution in the HTTP session. When an online form submission arrives, the CAPTCHA identifier is extracted from the request body and used to look up the CAPTCHA solution for verification. Attackers can exploit this behavior by solving a single CAPTCHA, recording its unique identifier, and then submitting the recorded identifier and corresponding solution over multiple requests.
Fig 5.2.2 a) A secure CAPTCHA implementation scenario where the CAPTCHA key is stored in an HTTP session.
Fig 5.2.2 b) An insecure CAPTCHA implementation scenario where the CAPTCHA identifier is not stored in an HTTP session.
5.2.3 CAPTCHA accumulation
Certain CAPTCHA implementations accumulate CAPTCHA solutions or identifiers in their HTTP session. That is, for each request for a new CAPTCHA, the previous value is retained and the new CAPTCHA solution or identifier is added to the HTTP session. An attacker can exploit this by manually solving one CAPTCHA for an HTTP session and then reusing that solution or identifier, together with the SESSIONID value, to make a large number of successful submissions.
5.3 Attacking the Image
A strong CAPTCHA image design is the foundation of an effective anti-automation mechanism. Like encryption, a CAPTCHA image design should be subjected to thorough analysis of its effectiveness against automated text extraction. An alarming number of websites rely on home-grown CAPTCHA image designs that offer little protection when subjected to generic image processing techniques and OCR tools.
5.3.1 OCR-assisted CAPTCHA brute-forcing
CAPTCHAs can be brute-forced with the help of OCR software: they can be copied locally and solved offline using multiple OCR engines. Also, if the CAPTCHA implementation is vulnerable to in-session CAPTCHA brute-forcing (that is, it accepts repeated guesses for the same CAPTCHA within one session), the OCR-assisted technique can significantly reduce the number of attempts required to guess the correct solution in a live HTTP session. The following methods perform OCR-assisted CAPTCHA brute-forcing.
1. Each CAPTCHA is subjected to multiple OCR engines, and the results are combined. Fig 5.3.1 shows an example where a CAPTCHA was subjected to two different OCR engines and the results were combined. The example assumes that the CAPTCHA implementation is vulnerable to an in-session CAPTCHA brute-force attack. Here the OCR1 attempt sends rGsyg, causing a failure. The OCR2 attempt sends r6sy9, again causing a failure. Since the two readings differ in only two characters, they can be combined to find the correct solution, r6syg.
Fig 5.3.1 A CAPTCHA solution that combines the results of two different OCR engines
2. After extracting text from the CAPTCHA using an OCR engine, a selective brute force can also be attempted. For example, assume that the OCR engine returns TE5T12. The brute-force attempt begins by changing the first character, T, while retaining the values of the other five characters, then moves on to the second character, and so forth. After this, two characters can be brute-forced in tandem, followed by three, and so on up to the entire length. Like other brute-force techniques, this one is demanding in time and resources. Note that OCR engines are better at solving CAPTCHAs with clear text visibility and may not be beneficial for all CAPTCHA types.
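Both OCR-assisted techniques in Section 5.3.1 reduce to generating an ordered list of candidate answers. The sketch below is an illustration, not code from the original study: `combine_readings` takes several equal-length OCR readings, collects the characters proposed at each position, and yields every combination not already tried; `single_substitutions` implements only the first stage of the selective brute force, changing one character at a time. The alphanumeric alphabet is an assumption. For the rGsyg / r6sy9 example, combining the two readings leaves exactly two untried candidates, rGsy9 and r6syg.

```python
import itertools
import string


def combine_readings(readings):
    """Yield untried candidates built from per-position OCR alternatives.

    Assumes all readings have the same length (as in the rGsyg / r6sy9 example).
    """
    if len({len(r) for r in readings}) != 1:
        raise ValueError("readings must be non-empty and of equal length")
    # For each position, collect the distinct characters the engines proposed,
    # keeping the order in which they first appeared.
    options = [list(dict.fromkeys(chars)) for chars in zip(*readings)]
    tried = set(readings)
    for combo in itertools.product(*options):
        candidate = "".join(combo)
        if candidate not in tried:
            yield candidate


# Assumed alphabet for the selective brute force; adjust to the CAPTCHA's charset.
ALPHABET = string.ascii_letters + string.digits


def single_substitutions(reading, alphabet=ALPHABET):
    """First stage of the selective brute force: vary one position at a time."""
    for i, original in enumerate(reading):
        for ch in alphabet:
            if ch != original:
                yield reading[:i] + ch + reading[i + 1:]
```

For the TE5T12 example with this 62-symbol alphabet, the single-substitution stage yields 6 × 61 = 366 candidates (including TEST12). Varying pairs of positions would add C(6,2) × 61² = 55,815 more, which illustrates why the technique is expensive in time and resources.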

6. CONCLUSION
CAPTCHAs have been one of the important mechanisms for protecting web applications from bots, automated form submissions, email spam, and comment spam. As observed in this paper, an attacker can compromise the security of web-based services in a number of ways, and even a slight oversight can render a CAPTCHA implementation weak or ineffective. Effective protection against automated form submissions requires a strong CAPTCHA ecosystem; a weak implementation provides only a false sense of security.
REFERENCES
[1] Kluever, K.A. (2008). Evaluating the Usability and Security of a Video CAPTCHA. Rochester Institute of Technology.
[2] reCAPTCHA (2010). What is reCAPTCHA? Available at: http://recaptcha.net/learnmore.html [accessed June 29, 2010].
[3] Yan, J., and El Ahmad, A.S. (2008). A low-cost attack on a Microsoft CAPTCHA. 15th ACM Conference on Computer and Communications Security, Alexandria, Virginia.
[4] Bursztein, E., and Bethard, S. (2009). Decaptcha: Breaking 75% of eBay Audio CAPTCHAs. 3rd USENIX Workshop on Offensive Technologies, Montreal, August 2009.
[5] Rice, S., Nagy, G., and Nartker, T. (1999). Optical Character Recognition: An Illustrated Guide to the Frontier. Kluwer Academic Publishers, Boston.
[6] Kalra, G.S. Attacking CAPTCHAs for Fun and Profit. White paper.
[7] Wikipedia (2007). Case Study. Available at: http://en.wikipedia.org/wiki/Case_studies [accessed April 20, 2007].
[8] Wikipedia (2007). Qualitative Research. Available at: http://en.wikipedia.org/wiki/Qualitative_method [accessed April 3, 2007].