Use Regular Expressions in Test Automation

Extracting dynamic text is made a lot easier with regular expressions

Nov 15, 2022

This article is included in my “How to in Selenium WebDriver” series.

Regular Expressions are patterns used to match character combinations in strings. To put it simply in the context of test automation, I use regular expressions to find or extract dynamic text in strings. In this article, I will illustrate a couple of examples.

Regular expressions originated in 1951 and are widely used in Unix programs. Most programming languages, if not all, have built-in support for them, and a mid-level programmer should be comfortable with them. However, few of the test engineers I have met knew about regular expressions.

Let me share a funny story first. After I explained my test scripts to Nick, a newly hired senior tester (in a government project many years ago), Nick said this to me: “Oh, Regular Expression, I know it. It is specific to Ruby.😱 ” This ‘senior automated tester’ claimed to have developed over 10,000 IBM RFT automated tests; of course, he had not, as I never saw him write a single automated test in 6+ months (until I left the project). In fact, he had been sabotaging test automation. I often use this real story to warn companies to beware of fake test automation engineers.

I will use Ruby in the following two exercises, but the regular expressions themselves work in all mainstream languages. Ruby’s String class has built-in Regex support, which makes extracting pattern text simpler.

Table of Contents:
· Extract dynamic text from web page
· Scan pattern text
· Quick Regex syntax
· Recommend Regex Tool: Rubular

Extract dynamic text from the web page

After executing a typical payment test, we often need to save the receipt number for later use. For example, in the case below, the number (in this case, called the booking number) is 604.

If the developer has a testing mindset, the booking number will have been wrapped in a dedicated tag with its own ID, such as booking_number.

Then the task is easy.

booking_no = driver.find_element(:id, "booking_number").text

However, web pages are often not test-automation friendly. Test automation engineers then need to extract the booking number from the page text.

The first step is to get the whole page text.

body_text = driver.find_element(:tag_name, "body").text

We can print it out ( puts body_text ) or write it to a file ( File.write('tmp_file.txt', body_text) ) to inspect.

Below is a part of the page text.

Confirmation
Booking number: 604
Flights (oneway Trip)

To extract the booking number, we look for the pattern Booking number: %NUMBER%. It is quite easy with a regular expression.

body_text = driver.find_element(:tag_name, "body").text
booking_number = nil 
if body_text =~ /Booking number:\s+(\d+)/
   booking_number = $1
end
puts booking_number # => 604, might change next time

The / / delimiters mark a regular expression literal, and =~ does pattern matching in Ruby. The regex Booking number:\s+(\d+) means: find text matching “Booking number:” followed by one or more whitespace characters and then one or more digits. The parentheses capture the digits, which Ruby saves into the variable $1.
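The same extraction can be tried without a browser. Below is a minimal sketch that assumes the page text is already in a string; the constant SAMPLE_BODY_TEXT and the helper extract_booking_number are made-up names for illustration. It uses String#match with a named capture group instead of $1, which some readers find easier to follow.

```ruby
# Sample page text, hard-coded here so the example runs without Selenium.
SAMPLE_BODY_TEXT = <<~TEXT
  Confirmation
  Booking number: 604
  Flights (oneway Trip)
TEXT

# Hypothetical helper: returns the booking number as a String,
# or nil when the pattern is not found in the text.
def extract_booking_number(text)
  match = text.match(/Booking number:\s+(?<number>\d+)/)
  match && match[:number]
end

puts extract_booking_number(SAMPLE_BODY_TEXT)           # prints 604
puts extract_booking_number("No booking here").inspect  # prints nil
```

Note that the captured value is a String ("604"); call to_i on it if the test needs a number.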

Scan pattern text

Task: extract the hidden version number (in a comment) from a web page like the one below.


<!-- Version: 2.19.1.9798 -->

It just needs a one-line statement.

(driver.page_source)[/<!-- Version: (.*?) -->/, 1]

The (.*?) matches, non-greedily, the text between <!-- Version: and -->, and the argument 1 returns the first capturing group. If there is no match, nil is returned.
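To see both outcomes (a match and nil) without a browser, here is a small sketch. The two page-source strings and the helper hidden_version are assumptions for illustration, not part of any real page.

```ruby
# Hypothetical page sources, hard-coded so the example runs without Selenium.
PAGE_WITH_VERSION = "<html><body>Home</body></html>\n<!-- Version: 2.19.1.9798 -->"
PAGE_WITHOUT_VERSION = "<html><body>Home</body></html>"

VERSION_COMMENT = /<!-- Version: (.*?) -->/

# Hypothetical helper: returns the version from the HTML comment,
# or nil if the comment is missing.
def hidden_version(page_source)
  page_source[VERSION_COMMENT, 1]
end

puts hidden_version(PAGE_WITH_VERSION)             # prints 2.19.1.9798
puts hidden_version(PAGE_WITHOUT_VERSION).inspect  # prints nil
```

Because nil is a possible result, a test script would normally check for it before calling methods such as split on the value.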

Here is a complete version of the test script to verify the version number.

ver = (driver.page_source)[/<!-- Version: (.*?) -->/, 1]
puts ver   #=> in format of 2.19.1.9798 
expect(ver.split(".").length).to eq(4) # break into a list
expect(ver.split(".")[0]).to eq("2")   # major version
expect(ver.split(".")[1]).to eq("19")  # minor version
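As a variation, the version string itself can be checked with a second regular expression rather than repeated split calls. This is only a sketch: parse_version is a made-up helper, and it assumes the version always has exactly four numeric parts, as in the sample above.

```ruby
# Hypothetical helper: splits a four-part version such as "2.19.1.9798"
# into integers. Returns nil when the input does not have exactly four
# numeric parts (including when it is nil).
def parse_version(version)
  match = /\A(\d+)\.(\d+)\.(\d+)\.(\d+)\z/.match(version.to_s)
  match && match.captures.map(&:to_i)
end

major, minor = parse_version("2.19.1.9798")
puts major                          # prints 2
puts minor                          # prints 19
puts parse_version("2.19").inspect  # prints nil
```

The \A and \z anchors make the pattern cover the whole string, so a value with extra text around it is rejected instead of being partially matched.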

How about extracting multiple occurrences of a pattern text in a web page?

<!-- TestWise Version: 6.6.12 -->
...
<!-- WhenWise Version: 3.0.6 -->

Use String’s scan method, which returns an array of matched text for a given pattern in Regex.

app_vers = driver.page_source.scan(/<!-- (\w+) Version: (.*?) -->/)
puts app_vers.inspect # => [["TestWise", "6.6.12"], ["WhenWise", "3.0.6"]]
expect(app_vers.size).to eq(2)
expect(app_vers.last).to eq(["WhenWise", "3.0.6"])
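Since scan returns name and version pairs when the pattern has two capturing groups, the result converts naturally into a Hash for lookups by application name. The sketch below assumes a hard-coded page source; MULTI_VERSION_PAGE and app_versions are made-up names.

```ruby
# Hypothetical page source containing two version comments.
MULTI_VERSION_PAGE = <<~HTML
  <!-- TestWise Version: 6.6.12 -->
  <p>Some content</p>
  <!-- WhenWise Version: 3.0.6 -->
HTML

# Hypothetical helper: returns a Hash mapping each application name
# to its version string.
def app_versions(page_source)
  page_source.scan(/<!-- (\w+) Version: (.*?) -->/).to_h
end

versions = app_versions(MULTI_VERSION_PAGE)
puts versions["TestWise"]  # prints 6.6.12
puts versions["WhenWise"]  # prints 3.0.6
```

If the page has no matching comments, scan returns an empty array and the helper returns an empty Hash.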

Quick Regex syntax

It takes a lot of effort to become a regular expression master, and I certainly am not one. But I have met one, and I was deeply impressed by his work. You would be surprised how much Regex can simplify your programming and testing tasks. Anyway, below is the core regex syntax I use often; it has served me well over the years.

^  : the beginning of a line
$  : the end of a line
.  : any single character (except a newline)
|  : logical OR
\  : escaping character
     The sequence \\ matches \ and \( matches (
() : match the string and store it to variables $1, $2, ...
[] : any single character listed in []
?  : 0 or 1 occurrence of the preceding element
+  : 1 or more occurrences of the preceding element
*  : any number of occurrences of the preceding element
     (including 0)
\s : a whitespace character, including tab and newline
\d : a digit, equivalent to [0-9]
\w : a word character: a letter, a digit, or an underscore
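A quick way to get familiar with the syntax above is to try each item against a short string. The table below is a sketch with made-up sample strings; each row holds a pattern, a text, and whether a match is expected.

```ruby
# Each row: [pattern, text, expected result of pattern.match?(text)]
SYNTAX_EXAMPLES = [
  [/^Book/,     "Booking number: 604",   true],   # ^ anchors at the line start
  [/604$/,      "Booking number: 604",   true],   # $ anchors at the line end
  [/b..k/,      "book",                  true],   # . matches any single character
  [/one|two/,   "two-way",               true],   # | is a logical OR
  [/\(oneway/,  "Flights (oneway Trip)", true],   # \( matches a literal (
  [/gr[ae]y/,   "grey",                  true],   # [] matches one listed character
  [/colou?r/,   "color",                 true],   # ? makes the u optional
  [/\d+/,       "no digits",             false],  # + needs at least one digit
  [/ab*c/,      "ac",                    true],   # * allows zero b characters
  [/\w+\s\w+/,  "one_way trip",          true]    # \w includes underscore, \s is whitespace
].freeze

SYNTAX_EXAMPLES.each do |pattern, text, expected|
  actual = pattern.match?(text)
  status = actual == expected ? "as expected" : "UNEXPECTED"
  puts "#{pattern.inspect} on #{text.inspect}: #{actual} (#{status})"
end
```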

Example: A simple regular expression for Email.

^[A-Za-z0-9+_.-]+@(.+)$
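One Ruby-specific detail is worth a short sketch: ^ and $ match at line boundaries, so for checking a whole string the anchors \A and \z are stricter. The constant names and the email_domain helper below are made up for illustration, and the pattern is still a simple one, not a full email validator.

```ruby
# The pattern above, anchored with ^ and $ (line boundaries in Ruby).
EMAIL_LINE = /^[A-Za-z0-9+_.-]+@(.+)$/
# The same pattern anchored with \A and \z (whole-string boundaries).
EMAIL_WHOLE = /\A[A-Za-z0-9+_.-]+@(.+)\z/

# Hypothetical helper: returns the part after @ (the first capturing
# group), or nil when the text does not match.
def email_domain(text)
  text[EMAIL_WHOLE, 1]
end

puts email_domain("tester@example.com")    # prints example.com
puts email_domain("not an email").inspect  # prints nil

multi_line = "junk\ntester@example.com"
puts EMAIL_LINE.match?(multi_line)   # prints true
puts EMAIL_WHOLE.match?(multi_line)  # prints false
```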

Recommend Regex Tool: Rubular

A visual (with progressive feedback) Regex tool will be very helpful when working with a regular expression. I recommend Rubular, a free online utility.

Summary

The above are just two examples of using regular expressions in web tests. Regular expressions will be much more useful in API testing, where text-parsing is essential.

For more examples like the above, please check out my book: “Selenium WebDriver Recipes in Ruby”.

The Agile Way
