Key role

This is why those annoying CAPTCHA boxes that make you type in words to prove you’re a human are actually great

Next time you are confronted by one of these, you can take pride in the fact that you are doing your part to preserve history

ONE of the larger annoyances of our day-to-day dealings online is having to constantly prove we’re human amid a sea of robots by filling out those painful CAPTCHAs.

You know the things: you are presented with a few wavy, blurry words and have to dutifully type them into a box through clenched teeth, muttering, “I can’t read that mess.”

We’ve all filled in a CAPTCHA box, but did you know what you were really doing?

Well, next time you are confronted by one of these, you can take pride in the fact that you are doing your part to preserve history.

Yes, those annoying CAPTCHAs are actually being used to help digitise decades of old texts (books, magazines and newspapers) that scanning programs struggle to decipher.

The words aren’t blurry or warped to test your patience. They are taken from scanned texts, which are often misread by auto-digitising programs, or optical character recognition (OCR) software if you wish to get technical. That’s where we step in.

Through the use of CAPTCHAs, humans around the world digitised 20 years’ worth of New York Times back issues in mere months. Within the first year, 440 million words had been deciphered: the equivalent of 17,600 books.

Google bought the technology in 2009, and is using it as the cornerstone of its ambitious Google Books project, which digitises ancient, rare and out-of-print works and offers them for free.

The technology came to be used in this way after Luis von Ahn, one of the inventors of the CAPTCHA, realised that while it took only a few seconds to type the letters, humans were collectively wasting hundreds of thousands of man-hours each day doing so. He set about finding the best way to harness this energy.

“Human computation” is the less-than-charming term von Ahn uses to describe the process he arrived at. The updated software was dubbed reCAPTCHA.


Initially, CAPTCHAs worked by offering up a series of jumbled letters, intentionally warped just enough that humans could still read them but robots could not.

For a ticketing company, for example, this would stop scalpers from using software to automatically buy up multiple tickets.

But the same weakness in machine reading that allowed CAPTCHAs to trip up robots also meant that OCR programs often failed to accurately decipher scanned text with any imperfections.

Fading, damage to the paper and printing flaws mean that OCR software misreads around 20 per cent of words, an unacceptable error rate by any standard.
