An Unsuccessful Attempt to Use OCR to Pass Text-Based Captchas in Selenium Automated Tests

Not recommended, but a fun exercise in using OCR

Nov 15, 2022

This article is included in the “How to in Selenium WebDriver” series.

First of all, Captchas are designed to stop automation. Ideally, Captchas would be disabled for automated testing, but this is not always possible (it may be out of your control, or ruled out for non-technical reasons).

Most Captchas these days are more advanced: they use images and very heavily distorted text. Some sites still use more basic text captchas, like the one below.

Example of a basic text captcha still in use

Today, I came up with the idea of using OCR (Optical Character Recognition) in automated test scripts to parse text-based captchas like the one above. To give the result first: the accuracy was very low, so it has no practical use. However, I think it is a good exercise.

Tesseract OCR library

Tesseract is a popular, free and open-source OCR library, and it runs on multiple platforms. Use the package manager to install it, for example, Homebrew for macOS.

brew install tesseract

The usage is tesseract <image_file> <output_file>, where Tesseract appends .txt to the output name. For example:

tesseract captcha1.png output

The on-screen output looks like this:

Estimating resolution as 272

A new file, output.txt, is also created. In this case, its content is:

4g34

Advanced Tesseract

We can further improve the recognition accuracy by tweaking some of Tesseract’s configuration, e.g. telling it that the image contains a single word, a single character or vertically aligned text.

Here I will use the page segmentation mode. For instance, if I know the Captcha is always one word (as in our 3g34 example), I can specify this with the relevant page segmentation mode (8: Treat the image as a single word.). Our new command looks like this:

tesseract captcha1.png output --psm 8

And when reading output.txt you will see the result is:

3g34

It is correct! For more information on the other modes and configurations, see the tesseract manual page.
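The same options can be assembled from Ruby before the command is run. Below is a minimal sketch, not part of the original test script: `tesseract_args` is a hypothetical helper name, and it assumes `tesseract` is on the PATH. It uses `stdout` as the output base, so the text is printed instead of being written to output.txt, and it optionally passes the `tessedit_char_whitelist` config variable. Whether the whitelist improves results depends on the Tesseract version and engine, so treat it as something to experiment with.

```ruby
# Hypothetical helper (not from the original post): builds the argument
# list for the tesseract CLI without running it.
def tesseract_args(image_path, psm: 8, whitelist: nil)
  # "stdout" as the output base makes tesseract print the recognised text
  # instead of writing it to output.txt
  args = ["tesseract", image_path, "stdout", "--psm", psm.to_s]
  # tessedit_char_whitelist limits the characters tesseract may return
  args += ["-c", "tessedit_char_whitelist=#{whitelist}"] if whitelist
  args
end

puts tesseract_args("captcha1.png").join(" ")
# prints: tesseract captcha1.png stdout --psm 8
```

Keeping the arguments in an array makes it easy to try other page segmentation modes without editing a command string by hand.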

Use Tesseract in Automated Tests

Test Design:

Save the Image
Run Tesseract to ‘read’ the captcha text
Use it in the test scripts

Saving the Image

Firstly, we need to get the Captcha image for Tesseract to analyse. Selenium 4 allows you to take a screenshot of a web element.

require "fileutils"

tmp_dir = File.expand_path File.join(File.dirname(__FILE__), "..", "testdata")
dest_image_file_path = File.join(tmp_dir, "page_captcha.png")
FileUtils.rm dest_image_file_path if File.exist?(dest_image_file_path)

# save the captcha image, selenium 4 new feature
# (elem is the captcha image element located earlier)
elem.save_screenshot(dest_image_file_path)
expect(File.exist?(dest_image_file_path)).to be true
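The clean-up step can also be wrapped in a small helper so that a missing testdata folder or a leftover image from a previous run does not trip up the test. This is only a sketch under my own assumptions: `fresh_image_path` is a hypothetical name, and a temporary directory stands in for the real testdata folder.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: returns the path for the captcha image, creating
# the directory if needed and deleting any stale image from an earlier run.
def fresh_image_path(dir, file_name = "page_captcha.png")
  FileUtils.mkdir_p(dir)
  path = File.join(dir, file_name)
  FileUtils.rm_f(path) # rm_f does not raise when the file is absent
  path
end

# Usage with a temporary directory standing in for the testdata folder
Dir.mktmpdir do |dir|
  path = fresh_image_path(dir)
  puts File.exist?(path) # false until elem.save_screenshot(path) runs
end
```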

Run Tesseract to ‘read’ the captcha text

Now we want to execute the command. In Ruby, we can simply wrap it in backticks (`), which return the command’s output as a string. Alternatively, you can look into the system command.

captcha_value = `tesseract #{dest_image_file_path} output; cat output.txt`
captcha_value = captcha_value.force_encoding('UTF-8').gsub(" ", "").strip
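As an alternative to backticks, the command can be run without a shell via `Open3.capture2` from Ruby’s standard library, which avoids problems with spaces in the image path. The sketch below is not the code used in the test; `read_captcha` and `clean_ocr_text` are hypothetical names, `tesseract` is assumed to be on the PATH, and the clean-up assumes the captcha contains only ASCII letters and digits.

```ruby
require "open3"

# Assumption: the captcha contains only ASCII letters and digits, so
# everything else (spaces, line breaks, stray punctuation) is dropped.
def clean_ocr_text(raw)
  raw.to_s.scrub("").gsub(/[^A-Za-z0-9]/, "")
end

# Hypothetical helper: runs tesseract without going through a shell and
# returns the cleaned text. Requires tesseract to be installed.
def read_captcha(image_path)
  # "stdout" as the output base prints the text, so no output.txt is needed
  text, status = Open3.capture2("tesseract", image_path, "stdout", "--psm", "8")
  raise "tesseract failed for #{image_path}" unless status.success?
  clean_ocr_text(text)
end

puts clean_ocr_text(" 3g 34\n\f") # prints 3g34
```

Dropping every non-alphanumeric character is stricter than removing spaces only, so it would be the wrong choice for captchas that include punctuation.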

Now that we have the Captcha text that Tesseract ‘read’, we want to put it back into the page as our ‘guess’.

driver.find_element(:name, "checkcode").send_keys(captcha_value)
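Since many OCR guesses will be wrong, it can save a form submission to discard guesses that cannot possibly be right and refresh the Captcha instead. The following is a rough sketch only: the helper names are hypothetical, and the 4-character length is an assumption based on the examples in this article (3g34, sCg5). The browser and OCR calls are passed in as lambdas so the logic can be tried without Selenium.

```ruby
# Assumption: captchas on this site are exactly 4 letters or digits;
# adjust the length for other sites.
def plausible_captcha?(text, length: 4)
  text.to_s.match?(/\A[A-Za-z0-9]{#{length}}\z/)
end

# Hypothetical helper: calls `read` (returns the OCR guess) up to
# max_attempts times, calling `refresh` after each rejected guess, and
# returns the first plausible guess or nil.
def first_plausible_guess(read:, refresh:, max_attempts: 5)
  max_attempts.times do
    guess = read.call
    return guess if plausible_captcha?(guess)
    refresh.call
  end
  nil
end

# Usage with stubbed OCR results instead of a real browser
guesses = ["3g", "4g3 4", "sCg5"]
refreshes = 0
result = first_plausible_guess(read: -> { guesses.shift },
                               refresh: -> { refreshes += 1 })
puts result # prints sCg5
```

In a real test, `read` would take the element screenshot and run Tesseract, and `refresh` would click the Captcha image. A plausible guess can of course still be wrong; the check only filters out obvious misreads.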

Demo

Below is a video of the test refreshing the Captcha and running Tesseract on it 5 times.

Run test steps in ‘debugging mode’ in TestWise IDE

In the animated GIF above, we can see that it got the correct Captcha once (sCg5) 🥳.

As with our first example, Tesseract is not accurate and is not suitable for automated testing at all. Having said that, it was an interesting exercise, and I only played with Tesseract for 30 minutes. There might be tweaks that work better for our image type.

Complete Code

load File.dirname(__FILE__) + "/../test_helper.rb"
describe "Use Tesseract to get through Captcha" do
  include TestHelper
 
  before(:all) do
    # browser_type, browser_options, site_url are defined in test_helper.rb
    @driver = $driver = Selenium::WebDriver.for(browser_type, browser_options)
    driver.manage().window().resize_to(1280, 720)
    driver.get(site_url)
    visit("/member/login")
  end

  after(:all) do
    driver.quit unless debugging?
  end

  it "Tesseract Captcha" do
    login_page = LoginPage.new(driver)
    5.times do
      elem = driver.find_element(:xpath, "//img[@title='refresh']")
      elem.click
      sleep 1
      tmp_dir = File.expand_path File.join(File.dirname(__FILE__), "..", "testdata")
      dest_image_file_path = File.join(tmp_dir, "page_captcha.png")
      FileUtils.rm dest_image_file_path if File.exist?(dest_image_file_path)
      # save the captcha image, selenium 4 new feature
      elem.save_screenshot(dest_image_file_path)
      expect(File.exist?(dest_image_file_path)).to be true
      captcha_value = `tesseract #{dest_image_file_path} output; cat output.txt`
      captcha_value = captcha_value.force_encoding('UTF-8').gsub(" ", "").strip
      puts captcha_value
      login_page.enter_captcha_code(captcha_value)
      sleep 2
    end
  end
end

A guest post by

Courtney Zhan

Software Engineer at Amazon. I'm interested in test automation.

The Agile Way
