An Unsuccessful Attempt to Use OCR to Pass Text-Based Captchas in Selenium Automated Tests
Not recommended, but a fun exercise in using OCR
This article is included in the “How to in Selenium WebDriver” series.
First of all, Captchas are designed to stop automation. Ideally, Captchas would be disabled for automated testing, but this is not always possible (out of your control or due to human reasons).
Most Captchas these days are more advanced. They use images and very heavily distorted text. Some sites still use more basic text captchas, like below.
Today, I had the idea of using OCR (Optical Character Recognition) in automated test scripts to parse text-based captchas like the one above. The result up front: accuracy was very low, so it is of no practical use. Still, I think it is a good exercise.
Tesseract OCR library
Tesseract is a popular, free and open-source OCR library, and it runs on multiple platforms. Use the package manager to install it, for example, Homebrew for macOS.
brew install tesseract
The usage is tesseract <image_file> <output_file>, for example:
tesseract captcha1.png output
The on-screen output looks like this:
Estimating resolution as 272
A new file, output.txt, was also created. In this case, its content is:
4g34
Advanced Tesseract
We can further improve the recognition accuracy by tweaking some of Tesseract’s configurations, e.g. telling it that the image holds a single word, a single character, or vertically aligned text.
Here I will use the page segmentation mode. For instance, if I know the Captcha is always one word (as in our 3g34 example), I can specify this with the relevant page segmentation mode (8: Treat the image as a single word). Our new command looks like this:
tesseract captcha1.png output --psm 8
And when reading output.txt you will see the result is:
3g34
It is correct! For more information on the other modes and configurations, see the tesseract manual page.
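When the command is later driven from Ruby test scripts, it can help to assemble the arguments in one place. The sketch below is only an illustration: the method name tesseract_args and its psm and whitelist parameters are my own, not part of Tesseract or of the original scripts. It relies on two Tesseract features: passing stdout as the output base so the text is printed rather than written to output.txt, and the tessedit_char_whitelist config variable. Whether the whitelist is honoured depends on the Tesseract version and engine in use, so treat it as something to experiment with.

```ruby
# Build the argument list for a tesseract call.
# tesseract_args, psm and whitelist are hypothetical names used for illustration.
def tesseract_args(image_path, psm: 8, whitelist: nil)
  # "stdout" as the output base prints the text instead of creating output.txt
  args = ["tesseract", image_path, "stdout", "--psm", psm.to_s]
  # Optionally restrict recognition to a known character set
  args += ["-c", "tessedit_char_whitelist=#{whitelist}"] if whitelist
  args
end

# Assumption: the captcha only contains ASCII letters and digits
alphanumeric = (("a".."z").to_a + ("A".."Z").to_a + ("0".."9").to_a).join

p tesseract_args("captcha1.png", psm: 8, whitelist: alphanumeric)
```

Returning an array rather than a single string means the arguments can be handed to Ruby's process APIs without any shell quoting.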
Use Tesseract in Automated Tests
Test Design:
Save the Image
Run Tesseract to ‘read’ the captcha text
Use it in the test scripts
Saving the Image
Firstly, we need to get the Captcha image for Tesseract to analyse. Selenium 4 allows you to take a screenshot of a web element.
require "fileutils"

tmp_dir = File.expand_path File.join(File.dirname(__FILE__), "..", "testdata")
dest_image_file_path = File.join(tmp_dir, "page_captcha.png")
# remove the image left over from a previous run (File.exists? was removed in Ruby 3.2)
FileUtils.rm dest_image_file_path if File.exist?(dest_image_file_path)
# save the captcha image, selenium 4 new feature
elem.save_screenshot(dest_image_file_path)
expect(File.exist?(dest_image_file_path)).to be true
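The path handling can also be wrapped in a small helper so that the target folder is created if it is missing and any stale image is removed in one step. This is a minimal sketch under my own naming (fresh_image_path is hypothetical); it uses a temporary directory only to demonstrate the behaviour without touching a real testdata folder.

```ruby
require "fileutils"
require "tmpdir"

# Ensure the directory exists and no screenshot from a previous run remains.
# fresh_image_path is a hypothetical helper name, not part of Selenium.
def fresh_image_path(dir, file_name = "page_captcha.png")
  FileUtils.mkdir_p(dir)      # no error if the directory already exists
  path = File.join(dir, file_name)
  FileUtils.rm_f(path)        # no error if the file is absent
  path
end

# Demonstration in a throwaway directory
Dir.mktmpdir do |dir|
  path = fresh_image_path(File.join(dir, "testdata"))
  puts path
  puts File.exist?(path)
end
```

In the test script, the returned path would then be passed to elem.save_screenshot.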
Run Tesseract to ‘read’ the captcha text
Now we want to execute the command. In Ruby, we can simply use backticks (`) around it. Alternatively, you can look into the system command.
captcha_value = `tesseract #{dest_image_file_path} output; cat output.txt`
captcha_value = captcha_value.force_encoding('UTF-8').gsub(" ", "").strip
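Backticks run the command through a shell, so a path containing spaces would need quoting. One alternative is Ruby's standard Open3 module, which takes the arguments as a list and keeps the diagnostic messages (such as the resolution estimate) separate from the recognised text. The sketch below is an assumption-laden variant rather than what the test above does: read_captcha and clean_ocr_text are hypothetical helper names, and the cleaning step assumes the captcha consists only of letters and digits, which is stricter than just removing spaces.

```ruby
require "open3"

# Keep only letters and digits from raw OCR output.
# Assumption: the captcha never contains other characters.
def clean_ocr_text(raw)
  raw.to_s.dup.force_encoding("UTF-8").scrub("").gsub(/[^A-Za-z0-9]/, "")
end

# Run tesseract without a shell and return the cleaned text,
# or nil when tesseract is missing or exits with an error.
def read_captcha(image_path)
  # "stdout" as the output base prints the text instead of writing output.txt
  stdout, _stderr, status = Open3.capture3("tesseract", image_path, "stdout")
  status.success? ? clean_ocr_text(stdout) : nil
rescue Errno::ENOENT
  nil # the tesseract executable was not found
end

puts clean_ocr_text(" 3g 34\n")
```

With this in place, captcha_value = read_captcha(dest_image_file_path) would replace the two lines above, with a nil result signalling that the OCR step itself failed.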
Now that we have the Captcha text that Tesseract ‘read’, we want to put it back into the page as our ‘guess’.
driver.find_element(:name, "checkcode").send_keys(captcha_value)
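Since Tesseract often returns too few or too many characters, a cheap sanity check before typing the guess can save a wasted submission. The check below assumes the captcha is always exactly four letters or digits, which is only inferred from the examples in this article (3g34, sCg5); plausible_captcha? and CAPTCHA_FORMAT are hypothetical names.

```ruby
# Assumed format: exactly four ASCII letters or digits, e.g. "3g34" or "sCg5".
CAPTCHA_FORMAT = /\A[A-Za-z0-9]{4}\z/

# True when the OCR result at least has the expected shape.
def plausible_captcha?(text)
  CAPTCHA_FORMAT.match?(text.to_s)
end

["3g34", "3g3", "3g 34", ""].each do |guess|
  puts "#{guess.inspect}: #{plausible_captcha?(guess)}"
end
```

In the test, the send_keys call could be skipped (and the Captcha refreshed instead) whenever plausible_captcha?(captcha_value) is false. Passing the check does not mean the guess is right, only that it is worth trying.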
Demo
Below is a video of the test refreshing the Captcha and running Tesseract on it 5 times.
We can see in the above video (an animated GIF) that it got the correct Captcha once (sCg5) 🥳.
As with our first example, Tesseract is not accurate and not suitable for automated testing at all. Having said that, it was interesting, and I only played with Tesseract for 30 minutes. There might be settings that could be tweaked to better suit our image type.
Complete Code
load File.dirname(__FILE__) + "/../test_helper.rb"
require "fileutils"

describe "Use Tesseract to get through Captcha" do
  include TestHelper

  before(:all) do
    # browser_type, browser_options, site_url are defined in test_helper.rb
    @driver = $driver = Selenium::WebDriver.for(browser_type, browser_options)
    driver.manage().window().resize_to(1280, 720)
    driver.get(site_url)
    visit("/member/login")
  end

  after(:all) do
    driver.quit unless debugging?
  end

  it "Tesseract Captcha" do
    login_page = LoginPage.new(driver)
    5.times do
      # click the captcha image to load a new one
      elem = driver.find_element(:xpath, "//img[@title='refresh']")
      elem.click
      sleep 1

      tmp_dir = File.expand_path File.join(File.dirname(__FILE__), "..", "testdata")
      dest_image_file_path = File.join(tmp_dir, "page_captcha.png")
      # remove the image left over from the previous iteration
      FileUtils.rm dest_image_file_path if File.exist?(dest_image_file_path)

      # save the captcha image, selenium 4 new feature
      elem.save_screenshot(dest_image_file_path)
      expect(File.exist?(dest_image_file_path)).to be true

      # run the OCR, then read the text it wrote to output.txt
      captcha_value = `tesseract #{dest_image_file_path} output; cat output.txt`
      captcha_value = captcha_value.force_encoding('UTF-8').gsub(" ", "").strip
      puts captcha_value

      login_page.enter_captcha_code(captcha_value)
      sleep 2
    end
  end
end