To create a new account on Google's Gmail service, users must complete a CAPTCHA as part of the registration process. ©2008 Mytour
Imagine you're using your computer to purchase tickets for a local concert. Before completing the purchase, you must pass a test. This test is a CAPTCHA, which stands for Completely Automated Public Turing Test to Tell Computers and Humans Apart.
This test is not difficult. In fact, that's the idea. It is designed to be easy for human users, but nearly impossible for automated systems to solve. The purpose of CAPTCHAs is to distinguish legitimate human users from harmful bots, so that real visitors can keep using a site while automated abuse is kept out.
CAPTCHA tests are a form of Human Interactive Proof (HIP), a security measure you’ve likely encountered on many websites. The most familiar type involves typing a series of distorted letters from an image. If the letters you input match those in the image, you’ve passed the test!
Another popular variant is the image CAPTCHA. In this test, you're presented with a series of photographs featuring everyday scenes, such as streets, highways, or parks. You'll need to select only the images that contain specific objects, like bicycles, fire hydrants, or streetlights. Select correctly, and you'll pass!
Image recognition CAPTCHAs tend to be more challenging for bots compared to text-based ones. Distorted or blurry images are even more effective, as they complicate the bot's ability to recognize them. Creating CAPTCHAs that are solvable only by humans is essential.
Why create a test to distinguish between humans and computers? It’s because some people try to exploit system vulnerabilities, aiming to manipulate the technology behind websites. While these individuals represent a small fraction of online users, their actions can significantly impact millions of websites and users.
For instance, a free e-mail service might face an overwhelming number of account sign-up attempts from automated programs. These programs could be part of a wider effort to send spam emails to countless recipients. The CAPTCHA test plays a critical role in distinguishing real users from computer-driven programs.
An intriguing aspect of CAPTCHA tests is that the designers don’t necessarily mind when the tests are bypassed. This is because, for a CAPTCHA to fail, someone must have figured out how to teach a computer to solve it. In essence, every failure in CAPTCHA technology is a step forward in advancing artificial intelligence.
In the following section, let’s take a deeper dive into what exactly a CAPTCHA is.
One of the paradoxes of CAPTCHA technology is that a CAPTCHA system might create a test that even it can't solve without already knowing the correct solution.
CAPTCHAs and the Turing Test
Not all CAPTCHA tests involve typing text. Some versions ask users to trace specific shapes within photographs using a mouse.
CAPTCHA technology is rooted in an experiment called the Turing Test. Alan Turing, often regarded as the father of modern computing, proposed the test to assess whether machines could think or simulate human-like thinking. The classic version involves a game where an interrogator asks questions to two participants: one a machine, the other a human. The interrogator cannot see or hear them, and if the interrogator can’t distinguish the machine from the human based on responses, the machine passes the test.
With CAPTCHA, the goal is to create a challenge that's easy for humans to solve but difficult for machines. It’s also crucial that the CAPTCHA system can present unique tests to each user. If a visual CAPTCHA displayed the same image for everyone, it wouldn’t take long before a spammer cracked the code, wrote a program, and automated the solution.
Most CAPTCHAs, but not all, rely on visual puzzles. Computers don't have the same capacity as humans to process visual information. We can spot patterns in images more easily than a computer can. The human brain is so adept at this that it sometimes perceives patterns where none exist—a phenomenon known as pareidolia. For example, ever spot shapes in the clouds or see faces on the moon? That’s your brain trying to make sense of random data.
However, not all modern CAPTCHAs rely on visual patterns. It’s crucial to provide alternatives to visual tests, to ensure that visually impaired users are not excluded. One such alternative is the audio CAPTCHA. This type typically plays a series of spoken letters or numbers, often distorted or with background noise to make it harder for voice recognition software to understand.
Another alternative is to design a CAPTCHA that requires the user to interpret a short text passage. A contextual CAPTCHA tests the reader's comprehension abilities. While computer programs can extract keywords from text, they struggle to grasp the true meaning behind those words.
In the upcoming section, we'll explore the types of websites that utilize CAPTCHA to verify whether you’re human or not.
Occasionally, a CAPTCHA presents an image or sound that’s so garbled, even humans can't make sense of it. That’s why many CAPTCHA systems offer an option to refresh and try a new one. Hopefully, the second attempt will be clearer than the first.
Who Uses CAPTCHA
When you sign up for a Yahoo! account, the CAPTCHA uses alphanumeric strings instead of ordinary words.
A common use for CAPTCHA is to validate online polls. An old Slashdot poll highlights the problems that arise when pollsters fail to apply proper filters.
In 1999, Slashdot ran a poll asking visitors to choose the graduate school with the best computer science program. Students from Carnegie Mellon and MIT created automated bots to vote multiple times for their respective schools. While these two schools received thousands of votes, other schools only received a few hundred each. This raises the question: if a program can manipulate poll results, how can we trust online surveys? CAPTCHA forms help prevent this abuse of polling systems.
CAPTCHAs are frequently used in registration forms on websites. Free email services like Hotmail, Yahoo! Mail, and Gmail allow users to create email accounts at no cost. While users typically provide personal information during account creation, this data isn’t always verified. CAPTCHAs are used to block spammers from employing bots to generate a large number of spam accounts.
Ticketing services like Ticketmaster also make use of CAPTCHA applications. These applications help prevent ticket scalpers from flooding the system with bulk purchases for popular events. Without these protections, scalpers can use bots to quickly place orders for hundreds or thousands of tickets, leaving legitimate customers empty-handed. Although CAPTCHA doesn't eliminate scalping entirely, it does make large-scale ticket hoarding harder.
Some websites feature message boards or contact forms where visitors can post messages or directly communicate with web administrators. To avoid being overwhelmed by spam, many of these sites employ a CAPTCHA system. While a CAPTCHA won't stop someone intent on posting inappropriate messages or harassing admins, it can effectively prevent bots from automatically submitting posts.
The most widespread CAPTCHA form asks users to enter a word or a combination of letters and numbers that has been distorted. One creative enhancement came from CAPTCHA's creators, who applied the idea to digitizing books. A program called reCAPTCHA uses the responses typed into CAPTCHA fields to help verify the text of scanned documents. Since computers struggle to read text from digital scans, humans must intervene to confirm what the printed material says, making it possible for search engines to index scanned content.
Here’s how it operates: First, the administrator scans a book digitally. Then, the reCAPTCHA system randomly selects two words from the scanned image. The application is able to recognize one word, and when the user correctly enters that word, it assumes the second word entered is accurate as well. This second word is added to a pool of words to be presented to future users. As more people type the word, the system compares each submission with the original answer, eventually verifying the word’s accuracy and adding it to the confirmed pool.
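The two-word scheme described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not reCAPTCHA's actual implementation: the class name `TwoWordChallenge`, the `votes_needed` agreement threshold, and the word identifiers are all hypothetical.

```python
from collections import Counter, defaultdict


class TwoWordChallenge:
    """Toy model of a two-word CAPTCHA (illustrative only)."""

    def __init__(self, votes_needed=3):
        # Hypothetical threshold: how many matching answers confirm a word.
        self.votes_needed = votes_needed
        # unknown word id -> tally of transcriptions typed by passing users
        self.votes = defaultdict(Counter)
        # unknown word id -> transcription accepted as verified
        self.confirmed = {}

    def submit(self, known_answer, typed_known, unknown_id, typed_unknown):
        """Return True if the user passes; record a vote for the unknown word."""
        # Only the word the system already recognizes decides pass or fail.
        if typed_known.strip().lower() != known_answer.lower():
            return False
        # The user passed, so their reading of the second word counts as a vote.
        guess = typed_unknown.strip().lower()
        self.votes[unknown_id][guess] += 1
        if self.votes[unknown_id][guess] >= self.votes_needed:
            self.confirmed[unknown_id] = guess
        return True


if __name__ == "__main__":
    challenge = TwoWordChallenge(votes_needed=2)
    challenge.submit("morning", "morning", "scan-17", "upon")
    challenge.submit("morning", "morning", "scan-17", "upon")
    print(challenge.confirmed)
```

Answers from users who miss the known word are discarded, so a bot that guesses blindly cannot pollute the pool of transcriptions.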
Though it may seem time-consuming, remember that this type of CAPTCHA serves a dual purpose. Not only is it validating the contents of a scanned book, but it is also ensuring that the individuals filling out the web forms are human users. In return, these users gain access to a service they wish to use.
Next, we'll dive into the steps involved in creating a CAPTCHA.
Creating a CAPTCHA
The first step in designing a CAPTCHA is to understand how humans and machines process information differently. Machines follow strict instructions, and when something falls outside the scope of those instructions, the machine cannot adapt.
When creating a CAPTCHA, the designer must take this difference into account. For instance, it's easy to write a program that reads metadata—data that humans cannot see but machines can interpret. If a visual CAPTCHA contains metadata with the solution embedded, the CAPTCHA can be easily cracked.
Likewise, creating a CAPTCHA that doesn't distort characters in some way is a poor choice. An undistorted string of characters is not very secure, as many programs can quickly scan an image and identify simple shapes like letters and numbers.
One method for creating a CAPTCHA is to pre-select the images and answers it will use. This method relies on a database containing all the solutions, which can undermine the reliability and security of the CAPTCHA.
According to experts Kumar Chellapilla and Patrice Simard from Microsoft Research, humans should be able to solve any given CAPTCHA with an 80 percent success rate, while machines should only have a 0.01 percent success rate. However, if a spammer somehow obtained a list of all possible CAPTCHA answers, they could design an application to perform a brute force attack, trying every answer until one works. A single blind guess against a pool of N answers succeeds with probability 1/N, so keeping a bot below the 0.01 percent target (1 in 10,000) requires a pool of more than 10,000 possible variations.
Other CAPTCHA systems generate random combinations of letters and numbers, ensuring that you’ll rarely encounter the same sequence twice. Randomization makes brute force attacks ineffective—the likelihood of a bot guessing the correct combination of random characters is very slim. The longer the string, the harder it becomes for a bot to succeed.
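The arithmetic behind randomization can be checked with a short sketch. The 36-character alphabet (uppercase letters plus digits) and the function names below are assumptions chosen for illustration; real systems pick their own character sets and lengths.

```python
import secrets
import string

# Assumed character set: 26 uppercase letters + 10 digits = 36 symbols.
ALPHABET = string.ascii_uppercase + string.digits


def random_challenge(length=6):
    """Build a random answer string using a cryptographically strong source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def guess_probability(length, alphabet_size=len(ALPHABET)):
    """Chance that one blind guess matches a random string of this length."""
    return 1 / alphabet_size ** length


def min_length_for(target, alphabet_size=len(ALPHABET)):
    """Shortest string length whose blind-guess probability is at most target."""
    length = 1
    while guess_probability(length, alphabet_size) > target:
        length += 1
    return length


if __name__ == "__main__":
    print(random_challenge())
    # 0.01 percent expressed as a probability is 0.0001.
    print(min_length_for(0.0001))
    print(guess_probability(6))
```

Each extra character multiplies the number of possible answers by the alphabet size, which is why longer strings make blind guessing so much less likely to succeed. This only measures guessing; it says nothing about a bot that can actually read the image.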
CAPTCHAs employ various methods to distort words. Some stretch and distort the letters as if viewed through warped glass, while others obscure the text with a grid of lines. Some even alter the colors or place the text against a backdrop of dots. The ultimate aim is the same: to make it incredibly difficult for a computer to read and decode the CAPTCHA.
Designers can also craft puzzles that are simple for humans but challenging for bots. Some CAPTCHAs use pattern recognition or logical reasoning to pose questions, such as asking which shape in a series logically follows next. However, this method has its flaws—some humans may struggle with such tasks, and their success rate might fall below the ideal 80 percent.
Next, we’ll explore how computers can crack CAPTCHAs.
Audible CAPTCHAs function similarly to visual ones. Using a database approach, the CAPTCHA designer records a person or a machine speaking each sequence of characters, matching them with the correct answers. With the randomization method, each character is pre-recorded individually, and the system randomly combines them to generate CAPTCHAs.
Breaking a CAPTCHA
The Gimpy CAPTCHA shows 10 words, but to pass, you only need to type three of them correctly.
The difficulty in cracking a CAPTCHA isn't about deciphering the message itself; humans should achieve at least an 80 percent success rate. The true challenge lies in teaching a computer to process information similarly to human cognition. Often, those who attempt to break CAPTCHAs focus on simplifying the problem rather than making computers smarter.
Imagine you’ve protected a web form with a CAPTCHA that displays English words. The CAPTCHA distorts the font, stretching and bending the letters in random ways. On top of this, a randomly generated background is placed behind the word.
A programmer attempting to crack this CAPTCHA could break it down into steps. They would first need to develop an algorithm — a set of instructions that guides the machine through the necessary tasks. One step could involve converting the image to grayscale, eliminating one level of complexity by removing color.
The algorithm might then instruct the computer to search for patterns in the black-and-white image. The program would compare these patterns to standard letters, looking for matches. If only a few letters are matched, the program could reference a database of English words and try filling the submission field with plausible guesses. This technique is often surprisingly effective. It may not always work, but it’s reliable enough to be tempting for spammers.
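Two of the steps described above, grayscale conversion and dictionary-based guessing, can be sketched as follows. This is a simplified illustration rather than any particular attacker's method: the function names, the `?` placeholder for unread letters, and the tiny word list are assumptions, and the luminance weights are the commonly used ITU-R BT.601 values. The pattern-matching step that recognizes individual letters is left out.

```python
import re


def to_grayscale(pixels):
    """Convert rows of (r, g, b) tuples into rows of 0-255 luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in pixels
    ]


def candidate_words(partial, dictionary):
    """List dictionary words consistent with a partly recognized word.

    `partial` uses '?' for each letter the matcher could not read,
    for example 'c?pt?ha'.
    """
    pattern = re.compile(
        "".join("." if ch == "?" else re.escape(ch) for ch in partial)
    )
    return [word for word in dictionary if pattern.fullmatch(word)]


if __name__ == "__main__":
    image = [[(255, 255, 255), (0, 0, 0), (255, 0, 0)]]
    print(to_grayscale(image))

    words = ["cat", "cot", "dog", "cart"]  # stand-in for an English word list
    print(candidate_words("c?t", words))
```

The fewer candidates the dictionary returns, the better the bot's odds, which is why CAPTCHAs built from real words are easier to attack than random strings.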
How do more advanced CAPTCHAs hold up? The Gimpy CAPTCHA displays 10 distorted English words against a chaotic background. These words are arranged in pairs and overlap. To pass, the user must correctly type three of the words. How dependable is this method?
It turns out that even with more complex CAPTCHAs, the right algorithm can still make it vulnerable. In their paper, Greg Mori and Jitendra Malik discussed their method for cracking the Gimpy CAPTCHA. One key advantage they had was that Gimpy uses real words, not random strings of letters. With this knowledge, they created an algorithm that analyzed the beginning and end of each word, using Gimpy’s 500-word dictionary to improve accuracy.
Mori and Malik conducted a series of tests with their algorithm and discovered it could correctly identify words in a Gimpy CAPTCHA 33% of the time. While far from perfect, this result is still noteworthy. For spammers, having one-third of attempts succeed is enough when bots are breaking CAPTCHAs hundreds of times a minute.
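Gimpy's pass rule, and the reason a one-in-three hit rate is enough for a spammer, can be sketched as follows. The function names and the figure of 300 attempts per minute are assumptions for illustration; the 33 percent rate is the one quoted above, treated here as the chance that a single attempt passes.

```python
def passes_gimpy(displayed_words, typed_words, required=3):
    """Pass if at least `required` distinct typed words appear in the image."""
    shown = {word.lower() for word in displayed_words}
    typed = {word.strip().lower() for word in typed_words}
    return len(shown & typed) >= required


def expected_passes(attempts_per_minute, success_rate=0.33):
    """Average number of successful bot submissions per minute."""
    return attempts_per_minute * success_rate


if __name__ == "__main__":
    shown = ["apple", "river", "stone", "cloud", "train",
             "house", "light", "bread", "chair", "plant"]
    print(passes_gimpy(shown, ["apple", "river", "stone"]))
    print(passes_gimpy(shown, ["apple", "river", "zebra"]))
    # Assumed attack rate of 300 attempts per minute.
    print(expected_passes(300))
```

A failed attempt costs a bot almost nothing, so even a modest success rate yields a steady stream of accepted submissions.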
You might expect the creators of CAPTCHA to be frustrated by their work being cracked by hackers, but surprisingly, they’re not. Want to know why? Keep reading to find out.
CAPTCHA and Artificial Intelligence
Hackers have found ways to train computers to recognize text in EZ-Gimpy CAPTCHAs.
Luis von Ahn, a professor from Carnegie Mellon University, is a co-creator of CAPTCHA. In a 2006 lecture, he discussed how CAPTCHA intersects with artificial intelligence (AI). Since CAPTCHA acts as a barrier to spammers and hackers, these individuals have invested time and resources into breaking it. Their success indicates that machines are advancing, and each time a method is discovered to teach a machine to overcome CAPTCHA, we are moving closer to true artificial intelligence.
As new ways to bypass CAPTCHA are discovered, experts like von Ahn continue to design CAPTCHA systems that address other challenges in AI. Even a compromised CAPTCHA contributes to that field: each defeat for the test is a step forward for AI.
However, web administrators may not share von Ahn's optimistic outlook. From their perspective, they are still grappling with the issue of spammers and hackers. Those in charge of websites or online polls must recognize that many CAPTCHA systems have become ineffective over time.
It's crucial for web administrators to research which CAPTCHA systems are still dependable. Staying current with developments in CAPTCHA technology is equally important. If a CAPTCHA system becomes obsolete, administrators will need to remove it and replace it with a more effective solution.
CAPTCHA creators face a delicate challenge. As technology improves, the difficulty of these tests must also increase. However, if the test becomes so advanced that humans can no longer solve it with reasonable success, the system fails. The solution may not always lie in distorting text, but could involve tasks like solving math problems or answering questions based on a brief passage. As these challenges become more complex, there's a risk that users will lose interest. For example, how many people would still be willing to engage in an online forum discussion if they had to solve a quadratic equation first?
In 2014, Google, which had acquired reCAPTCHA in 2009, began phasing out its traditional CAPTCHA service. Instead, users were asked to check a box that said "I am not a robot," a system called No CAPTCHA reCAPTCHA. Then in 2017, Google introduced Invisible reCAPTCHA, which no longer required users to check a box. This system analyzed user behavior, such as mouse movements and browsing history, to determine whether a visitor was a human or a robot. If any suspicious activity was detected, users would still be prompted with an old-style reCAPTCHA challenge for further verification.
