Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com Volume 2, Issue 2, March April 2013 ISSN 2278-6856
Raj Kumar Goel Institute of Technology for Women, Ghaziabad, Gautam Buddh Technical University, Lucknow, (India)
1. INTRODUCTION
Website security is a paramount concern today. When we sign up for an email service such as Yahoo or Gmail, we first need to pass a challenge-response test that is simple and straightforward for a human being to solve but very difficult for a computer program to pass. This sort of test is a CAPTCHA, which is a type of Human Interaction Proof (HIP). Most readers have probably seen CAPTCHA tests on many websites. The most common form of CAPTCHA is an image of several distorted letters. The user who is trying to log in to the email service has to type the correct series of letters into a form. If the letters entered by the user match the ones in the distorted image, the user passes the test; otherwise the request is treated as coming from a spam bot. CAPTCHA incorporates two levels of testing: first, identifying the displayed characters, and second, interpreting their logical ordering based on the embedded numbers. It can be conveniently implemented on Web sites and provides the advantages of robustness and low space requirements.
CAPTCHA is an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart. The term "CAPTCHA" was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper (all of Carnegie Mellon University) and John Langford (then of IBM). Its basic purpose is to block form submissions from spam bots, that is, automated scripts that gather email addresses from publicly available web forms. CAPTCHAs are used because it is difficult for computers to extract the text from such a distorted image, whereas it is relatively easy for a human to understand the text hidden behind the distortions. Therefore, a correct response to a CAPTCHA challenge is assumed to come from a human, and the user is permitted into the website.
Why would anyone need to create a test that can tell humans and computers apart? It is because some people try to game the system: they want to exploit weaknesses in the computers running the site. While these individuals probably make up a minority of all the people on the Internet, their actions can affect millions of users and Web sites. For example, a free email service might find itself bombarded by account requests from an automated program. That automated program could be part of a larger attempt to send spam mail to millions of people. The CAPTCHA test helps identify which users are real human beings and which ones are computer programs. Spammers are constantly trying to build algorithms that read the distorted text correctly, so strong CAPTCHAs have to be designed and built to thwart their efforts.
3. TYPES OF CAPTCHAs
3.1 TEXT BASED CAPTCHA
Text based CAPTCHAs typically rely on sophisticated distortion of text images, rendering them unrecognizable to state-of-the-art pattern recognition programs but still recognizable by humans. The simplest yet novel approach is to present the user with questions that only a human user can solve. Examples of such questions are: What is the last letter in CALIFORNIA? What is ten minus two? If tomorrow is Tuesday, what is today? Such questions are very easy for a human user to solve, but it is very difficult to program a computer to solve them. Question-based tests of this kind are also friendly to people with visual disabilities, and audio CAPTCHAs are useful in the same context.
3.1.2 EZ-Gimpy
EZ-Gimpy is a simplified version of the Gimpy CAPTCHA and has been adopted by Yahoo on its signup page. It picks a single word, applies distortion to the text, and then asks the user to enter the text correctly.
3.1.3 MSN CAPTCHAs
Microsoft uses a different CAPTCHA for the services provided under the MSN umbrella. These are popularly called MSN Passport CAPTCHAs. They use eight characters drawn from upper-case letters and digits. The foreground is dark blue and the background is grey. Warping is used to distort the characters and produce a ripple effect, which makes computer recognition very difficult.
XTNM5YR
Fig 3.1.3 MSN Passport CAPTCHA
3.2 Graphic CAPTCHAs
Graphic CAPTCHAs are challenges that involve pictures or objects sharing some sort of similarity that the user has to guess. They are visual puzzles, similar to Mensa tests. The computer generates the puzzles and grades the answers, but is itself unable to solve them.
3.2.1 Bongo
BONGO is named after M.M. Bongard, who published a book on pattern recognition problems in the 1970s [3]. BONGO asks the user to solve a visual pattern recognition problem. It displays two series of blocks, the left and the right. The blocks in the left series differ from those in the right series.
After seeing the two series of blocks, the user is presented with a set of four single blocks and is asked to determine to which series each block belongs. The user passes the test if s/he assigns every block to the correct series. Care must be taken that the user is not perplexed by a large number of options.
3.3 Audio CAPTCHAs
An audio CAPTCHA program picks a word or a sequence of numbers at random, renders it into a sound clip, and distorts the clip by adding noise; it then presents the distorted sound clip to the user and asks the user to enter its contents as text. This CAPTCHA is based on the difference in ability between humans and computers in recognizing spoken language. The idea is that a human can efficiently disregard the distortion and interpret the characters being spoken, while software would find the distortion difficult to handle and would need to be effective at speech-to-text translation in order to succeed. This is a crude way to filter humans, and it is not very popular because the user has to understand the language and the accent in which the sound clip is recorded.
The following steps and cautions are followed for a secure and strong CAPTCHA implementation. The image and description below explain the steps of the CAPTCHA generation and verification process.
1. The client requests a CAPTCHA from the server, with or without a valid SESSIONID.
2. If the client does not provide a valid SESSIONID, a new SESSIONID is generated and the corresponding session store is instantiated.
3. The server-side code creates a new CAPTCHA with random text.
4. The CAPTCHA solution is stored in the HTTP session store.
5. The CAPTCHA image is sent to the client. If the client request did not provide a valid SESSIONID, the SESSIONID newly generated in step 2 is also returned.
6. The client sends the CAPTCHA solution along with the SESSIONID for verification.
7. The server-side code retrieves the CAPTCHA solution from the HTTP session and verifies it against the solution provided by the client.
8. The server-side CAPTCHA state is cleared.
9. If verification is successful, the client is sent to the next logical step. If not, the client is forced to request a new CAPTCHA (step 1).
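The nine steps above can be sketched as a small in-memory model. This is only an illustrative sketch, not the implementation of any particular site: the class name CaptchaServer, its method names, and the use of a plain dictionary as the HTTP session store are all assumptions, and the plain challenge text stands in for the distorted image that a real server would render.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


class CaptchaServer:
    """Toy in-memory stand-in for the server side of steps 1-9 (hypothetical)."""

    def __init__(self, length=6):
        self.length = length
        self.sessions = {}  # SESSIONID -> per-session store (a dict)

    def request_captcha(self, session_id=None):
        # Steps 1-2: reuse a valid SESSIONID or create a new session store.
        if session_id not in self.sessions:
            session_id = secrets.token_hex(16)
            self.sessions[session_id] = {}
        # Step 3: create a new CAPTCHA with random text.
        text = "".join(secrets.choice(ALPHABET) for _ in range(self.length))
        # Step 4: the solution lives only in the server-side session store.
        self.sessions[session_id]["captcha"] = text
        # Step 5: a real server would send a distorted image of `text`;
        # here the plain text stands in for that image.
        return session_id, text

    def verify(self, session_id, answer):
        store = self.sessions.get(session_id, {})
        # Steps 7-8: fetch the stored solution and clear it in one move,
        # so the same solution can never be checked twice.
        expected = store.pop("captcha", None)
        if expected is None:
            return False
        # Step 9: on failure the caller must request a new CAPTCHA.
        return secrets.compare_digest(expected.encode(), answer.encode())
```

In this sketch a correct answer is accepted once, and a second submission of the same answer for the same SESSIONID is rejected because step 8 has already cleared the stored solution.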
3.4 reCAPTCHA and book digitization
reCAPTCHA is a free service that helps to digitize books and newspapers. It improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More precisely, the words that OCR cannot read correctly are placed on an image and used as a CAPTCHA. This is possible because most OCR programs flag the words that they cannot read correctly. Computers will be unable to determine what a smudged or warped word should mean. This means that all the words that appear as part of a reCAPTCHA site
In this type of CAPTCHA the user needs to solve an arithmetic problem. When the CAPTCHA data is stored on the client side, minimal effort is required to bypass the implementation: just parse the HTML content of the returned page, extract the arithmetic question, and solve it on the client side. Thus, any CAPTCHA implementation that stores CAPTCHA data client-side fails to offer any significant protection.
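To make the weakness concrete, the sketch below works on a hard-coded page fragment. The markup in PAGE, the function name, and the restriction to the +, - and * operators are assumptions chosen for illustration; the point is only that a question delivered as plain text in the page can be read and answered by a few lines of code.

```python
import operator
import re

# Hypothetical page fragment: the question is plain text in the markup.
PAGE = '<form><label id="captcha">What is 7 + 5?</label><input name="answer"></form>'

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def solve_arithmetic_captcha(html):
    """Return the answer to the first 'a op b' question found in html, or None."""
    match = re.search(r"(\d+)\s*([+\-*])\s*(\d+)", html)
    if match is None:
        return None
    left, op, right = match.groups()
    return OPS[op](int(left), int(right))


print(solve_arithmetic_captcha(PAGE))  # 12
```

Keeping the question's answer in a server-side session, and sending only a rendered image to the client, removes this shortcut.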
5.2.1 CAPTCHA Rainbow Tables
Randomly generating CAPTCHAs at runtime is one of the important aspects of a secure CAPTCHA design. It was generally observed, however, that a very large number of websites used a finite number of CAPTCHAs, each recognized by an identifier. These identifiers were either numeric or finite-length character strings. They were generally sent to the client as hidden fields or were available as part of the URL used to retrieve the CAPTCHA. Further, some websites never changed the CAPTCHA identifiers; others chose to change the identifiers randomly on a periodic basis. Rainbow table-based attack vectors that target websites using a finite CAPTCHA set are discussed below.
Attacking static CAPTCHA identifiers
For websites that use static CAPTCHA identifiers, a large number of CAPTCHAs can be downloaded and solved locally using optical character recognition (OCR) engines, custom solvers, or manually. A rainbow table can then be created that maps each static CAPTCHA identifier to its solution. Whenever the server returns a CAPTCHA identifier for which a pre-solved value is available, the solution can be quickly looked up and submitted to bypass the CAPTCHA restriction. Multiple CAPTCHA requests can also be made so that a CAPTCHA with a known identifier is returned by the server.
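The lookup described above can be modelled with a toy simulation. Everything here is hypothetical: CAPTCHA_SET stands for a site's finite CAPTCHA pool with static identifiers, serve_captcha and check stand for the server, and the table is an ordinary dictionary of pre-solved identifiers. No real site or protocol is implied.

```python
import random

# Hypothetical server with a finite CAPTCHA set and static identifiers.
CAPTCHA_SET = {101: "XK4P", 102: "M9QT", 103: "B7RW"}


def serve_captcha(rng):
    """Return only the identifier, as a hidden form field or URL would expose it."""
    return rng.choice(sorted(CAPTCHA_SET))


def check(identifier, answer):
    """Server-side check of an answer against the static set."""
    return CAPTCHA_SET.get(identifier) == answer


def answer_from_table(table, rng, max_requests=50):
    """Request CAPTCHAs until one with a pre-solved identifier comes back."""
    for _ in range(max_requests):
        identifier = serve_captcha(rng)
        if identifier in table:
            return identifier, table[identifier]
    return None
```

Even a table that covers only part of the pool is enough here, because the client can keep requesting CAPTCHAs until a known identifier appears. Generating fresh random text for every request, as in the secure flow, leaves nothing stable to index.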
5.2.2 The chosen CAPTCHA identifier attack
In certain CAPTCHA implementations, the server returns the unique CAPTCHA identifier to the user but does not store the identifier or the CAPTCHA solution in the HTTP session. When an online form submission arrives, the CAPTCHA identifier is extracted from the request body and then used to look up the CAPTCHA solution for verification. Attackers can exploit this behavior by solving a single CAPTCHA, recording its unique identifier, and then submitting the recorded identifier and its solution over multiple requests.
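The difference between the two behaviors can be shown with a minimal sketch. The ISSUED table, the form field names captcha_id and answer, and the three function names are assumptions made for illustration; sessions are modelled as plain dictionaries.

```python
# Hypothetical server-side table of issued CAPTCHAs: identifier -> solution.
ISSUED = {"c-001": "r6syg", "c-002": "TE5T12"}


def verify_insecure(form):
    """Flawed: trusts the identifier in the request body and keeps no session state."""
    return ISSUED.get(form["captcha_id"]) == form["answer"]


def issue(session, captcha_id):
    """Bind the served CAPTCHA to the session at the moment it is handed out."""
    session["captcha_id"] = captcha_id


def verify_secure(session, form):
    """Uses the identifier held in the session and clears it after one attempt."""
    captcha_id = session.pop("captcha_id", None)
    return captcha_id is not None and ISSUED.get(captcha_id) == form["answer"]


form = {"captcha_id": "c-001", "answer": "r6syg"}
print([verify_insecure(form) for _ in range(3)])  # [True, True, True]

session = {}
issue(session, "c-001")
print(verify_secure(session, form), verify_secure(session, form))  # True False
```

With verify_insecure, one solved CAPTCHA is accepted on every request. With verify_secure, the identifier chosen by the client is ignored and the session entry is consumed by the first attempt, so the replay fails.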
Fig 5.2.2 a) A secure CAPTCHA implementation scenario where the CAPTCHA key is stored in an HTTP session.
Fig 5.2.2 b) An insecure CAPTCHA implementation scenario where the CAPTCHA identifier is not stored in an HTTP session.
5.2.3 CAPTCHA accumulation
Certain CAPTCHA implementations accumulate CAPTCHA solutions or identifiers in their HTTP session. That is, for each request for a new CAPTCHA, the previous value is retained and the new CAPTCHA solution or identifier is added to the HTTP session alongside it. An attacker can exploit this scenario by manually solving one CAPTCHA for an HTTP session and then reusing that solution or identifier, together with the SESSIONID value, to make a large number of successful submissions.
5.3 Attacking the Image
A strong CAPTCHA image design is the foundation for an effective anti-automation mechanism. Like encryption, the CAPTCHA image design should be subjected to
Fig 5.3.1 A CAPTCHA solution that combines the results of two different OCR engines.
1. Each CAPTCHA is subjected to multiple OCR engines, and the results are combined. The image above shows an example where a CAPTCHA was subjected to two different OCR engines and the results were combined. The image assumes that the CAPTCHA implementation is vulnerable to an in-session CAPTCHA brute-force attack. Here the first OCR attempt sends rGsyg, causing a failure. The second OCR attempt sends r6sy9, again causing a failure. Since the two solutions differ in only two characters, they can be combined to find the correct solution, r6syg.
2. After extracting text from the CAPTCHA using an OCR engine, a selective brute-force can also be attempted. For example, let us assume that the OCR engine returns the result TE5T12. The brute-force attempt begins by changing the first character, T, while retaining the values of the other five characters, and then moves on to the second character, and so forth. After this, two characters can be brute-forced in tandem, followed by three, and so on through the entire length. This technique, like other brute-force techniques, is high on time and resource requirements. It is important to note that OCR engines are better at solving CAPTCHAs with clearly visible text and may not be beneficial for all CAPTCHA types.
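The two candidate-generation ideas above can be sketched with plain string operations. The function names and the 36-character alphabet are assumptions for illustration, and no OCR engine is involved: the readings rGsyg and r6sy9 and the string TE5T12 are taken from the text.

```python
import string
from itertools import product

ALPHABET = string.ascii_uppercase + string.digits  # assumed character set


def combine_ocr_results(*readings):
    """Yield every string built by picking, per position, a character
    proposed by at least one OCR reading (readings must share a length)."""
    if len({len(r) for r in readings}) != 1:
        raise ValueError("OCR readings must have the same length")
    # Per position, keep the distinct proposals in first-seen order.
    choices = [list(dict.fromkeys(column)) for column in zip(*readings)]
    for chars in product(*choices):
        yield "".join(chars)


def single_char_variants(text, alphabet=ALPHABET):
    """Yield every string that differs from text in exactly one position."""
    for i, original in enumerate(text):
        for ch in alphabet:
            if ch != original:
                yield text[:i] + ch + text[i + 1:]


print(list(combine_ocr_results("rGsyg", "r6sy9")))
# ['rGsyg', 'rGsy9', 'r6syg', 'r6sy9']
print(sum(1 for _ in single_char_variants("TE5T12")))  # 210
```

Two readings that disagree in two positions leave only four candidates, one of which is r6syg. Changing a single character of a six-character string over 36 symbols already costs 210 attempts, which illustrates why the technique is resource-hungry and why clearing the CAPTCHA state after each attempt defeats it.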
REFERENCES
Journal Papers:
[1] Kluever, K.A. (2008) Evaluating the Usability and Security of a Video CAPTCHA. Rochester Institute of Technology.
[2] reCAPTCHA (2010) What is reCAPTCHA? Available from: http://recaptcha.net/learnmore.html [Accessed 29 June 2010].
[3] Yan, J., and Salah, A. (2008) A low-cost attack on a Microsoft captcha. Paper presented at the 15th ACM Conference on Computer and Communications Security, Alexandria, Virginia, 2008.
[4] Bursztein, E., and Bethard, S. (2009) Decaptcha: Breaking 75% of eBay Audio CAPTCHAs. 3rd USENIX Workshop on Offensive Technologies, August 2009, Montreal.
[5] Rice, S., Nagy, G., and Nartker, T. (1999) Optical Character Recognition: An Illustrated Guide to the Frontier. Kluwer Academic Publishers, Boston.
Book:
[6] Kalra, Gursev Singh. Attacking CAPTCHAs for Fun and Profit (white paper).
[7] Wikipedia (2007) Case Study [Internet]. US. Available from: <http://en.wikipedia.org/wiki/Case_studies> [Accessed 20 April 2007].
[8] Wikipedia (2007) Qualitative Research [Internet]. Available from: <http://en.wikipedia.org/wiki/Qualitative_method> [Accessed 3 April 2007].