Automated Testing PDF Download in Selenium WebDriver

How to test downloading PDFs in Selenium WebDriver

Nov 15, 2022

This article is part of the “How to in Selenium WebDriver” series. You can find more Selenium examples like this one in the eBook Selenium WebDriver Recipes in Ruby.

Many websites feature links that download a PDF document. These PDF files might be static (e.g. a restaurant menu) or dynamically generated (e.g. a receipt).

This tutorial will show you how to download a PDF document and verify its contents in a Selenium WebDriver automated test script.

Table of Contents:
· Test Design
  ∘ Saving the download file to a specific location
  ∘ PDF verification library
· Open a browser with the specified download folder
· Download and Verify the downloaded file
· Verify the PDF
  ∘ Verify PDF page count
  ∘ Verify PDF contents
· Completed Test Script

Test Design

1. Navigate to a web page and download the PDF.
   For my example, I’m downloading a book sample PDF (static) at http://zhimin.com/books/pwta.
2. Verify the downloaded PDF exists.
   Once the browser has downloaded the file, check that it was saved successfully on the test machine.
3. Read and verify the PDF’s contents.
   It’s good practice to verify the PDF’s content, to make sure the right document was downloaded.

Saving the download file to a specific location

By default, the browser saves downloaded files into a standard folder, such as /Users/Me/Downloads on macOS. This folder differs on other platforms such as Windows, and test automation may even run into permission issues with it. To test safely and avoid conflicts, we should specify the download folder in the automated test scripts.

PDF verification library

I use the ‘pdf-reader’ gem, a PDF parser library, to verify the PDF. Install it from the command line:

gem install pdf-reader

Open a browser with the specified download folder

To set a download location in Selenium WebDriver, we can set it in the browser (Chrome) options.

before(:all) do
  # set up download settings
  @download_path = "/Users/zhimin/tmp"
  options = Selenium::WebDriver::Chrome::Options.new
  options.add_preference("download.prompt_for_download", false)
  options.add_preference("download.default_directory", @download_path)
    
  @driver = Selenium::WebDriver.for(:chrome, 
                                    :capabilities => options)  
  driver.get(site_url)
end

Setting prompt_for_download to false means that you won’t receive a pop-up asking what to name the file and where to save it to. The default_directory is the download location we want Chrome to save into.

Run the test script to start a Chrome browser, then quickly check that downloaded files land in the specified folder.

Download and Verify the downloaded file

To download the PDF, click the web app's download link. In the test script, I added some delays to allow time for the file download to complete.

driver.find_element(:link_text, "Download").click
sleep 10

The 10 seconds is an upper limit, chosen to allow for slow connections.
I don’t recommend long fixed waits; poll for the file and retry instead. See this article: Test AJAX Properly and Efficiently with Selenium WebDriver, and Avoid ‘Automated Waiting’.
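As a rough illustration of the polling approach, here is a minimal sketch that uses only Ruby's standard library. The helper name wait_for_download and its timeout and interval parameters are my own, not part of Selenium. It also assumes that Chrome keeps in-progress downloads in files ending with .crdownload, so the download only counts as finished once none remain in the folder.

```ruby
# Hypothetical helper (not part of Selenium): poll until the downloaded file
# is present and non-empty, or give up after `timeout` seconds.
# Assumption: Chrome stores in-progress downloads as "*.crdownload" files in
# the same folder, so none should remain once the download has finished.
def wait_for_download(file_path, timeout: 10, interval: 0.5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  partial_pattern = File.join(File.dirname(file_path), "*.crdownload")
  loop do
    finished = File.exist?(file_path) &&
               File.size(file_path) > 0 &&
               Dir.glob(partial_pattern).empty?
    return true if finished
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Possible use in the test, in place of the fixed sleep:
#   driver.find_element(:link_text, "Download").click
#   expect(wait_for_download(saved_file, timeout: 10)).to be_truthy
```

With this approach the test continues as soon as the file is ready, and only waits the full 10 seconds when the download fails.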

After that, we want to verify that the PDF is there. This can easily be done with Ruby’s File.exist?(file_path) method (the older alias File.exists? is deprecated and was removed in Ruby 3.2).

expect(File.exist?("#{@download_path}/sample.pdf")).to be_truthy

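Note that here expect(...).to be_truthy is equivalent to expect(...).to eq(true), because the file existence check returns either true or false. (In general, be_truthy passes for any value other than false or nil.) I find be_truthy more readable than eq(true).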

Run the test script; it should pass.

it "Download PWTA sample" do
  visit("/books/pwta")
  saved_file = "#{@download_path}/practical-web-test-automation-sample.pdf"
  # remove any copy left over from a previous run
  FileUtils.rm(saved_file) if File.exist?(saved_file)
  driver.find_element(:link_text, "Download").click
  sleep 10
  expect(File.exist?(saved_file)).to be_truthy
end

Note that we delete the destination file before downloading, so that a file left over from a previous run cannot make the check pass. This is a common pattern.

FileUtils.rm(saved_file) if File.exist?(saved_file)

Verify the PDF

We aren’t entirely done yet. How can we be sure the PDF we downloaded is valid (openable) and correct (contents-wise)?

Here I will use the pdf-reader gem to extract the text content and verify it.

First, let’s load the PDF file.

reader = PDF::Reader.new("#{@download_path}/practical-web-test-automation-sample.pdf")
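As a cheap complement to parsing the file with pdf-reader, a test can first confirm that the download really is a PDF and not, say, an HTML error page saved under a .pdf name. PDF files begin with the signature %PDF-, so a minimal sketch only needs the standard library. The helper name pdf_header? is my own, for illustration only.

```ruby
# Hypothetical helper: returns true when the file exists and starts with the
# "%PDF-" signature that opens every PDF file.
def pdf_header?(file_path)
  return false unless File.file?(file_path)
  # Read the first five bytes in binary mode; an empty file yields nil.
  File.open(file_path, "rb") { |f| f.read(5) } == "%PDF-"
end

# Possible use in the test:
#   expect(pdf_header?(saved_file)).to be_truthy
```

This only checks the header. Opening the file with pdf-reader and reading its pages remains the stronger check.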

Verify PDF page count

Use pdf-reader’s page_count.

expect(reader.page_count).to eq(62)

Verify PDF contents

We can extract the text version of the PDF document using pdf-reader, which works by treating each page separately. This means that you will need to loop through all the pages to read the whole PDF or use indexing to go to a particular page.

For this sample PDF, the first page is the cover image. I will verify the second page, which is reader.pages[1] because the pages array is zero-based.

second_page_text = reader.pages[1].text
expect(second_page_text).to include("Test web applications wisely with Selenium WebDriver")
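When the expected text could be on any page, the loop over all pages mentioned above can be sketched as follows. The helper pages_containing is a name I made up. It only assumes an object that responds to pages, with each page responding to text, which is how PDF::Reader is used in this article. Collapsing whitespace before matching is my own precaution, since extracted text may wrap lines differently from the visible layout.

```ruby
# Hypothetical helper: returns the 1-based numbers of the pages whose
# extracted text contains `phrase`.
# `reader` needs #pages, and each page needs #text (as with PDF::Reader).
def pages_containing(reader, phrase)
  matches = []
  reader.pages.each_with_index do |page, index|
    # Collapse runs of whitespace so line wraps do not break the match.
    normalized = page.text.gsub(/\s+/, " ")
    matches << index + 1 if normalized.include?(phrase)
  end
  matches
end

# Possible use in the test:
#   reader = PDF::Reader.new(saved_file)
#   expect(pages_containing(reader, "Selenium WebDriver")).not_to be_empty
```

Returning page numbers rather than a boolean makes a failure message easier to interpret.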

Apart from the text content, pdf-reader can also extract PDF metadata, page orientation and raw-content streams, which may be helpful in assertions.

Completed Test Script

load File.dirname(__FILE__) + "/../test_helper.rb"
require "fileutils"
require "pdf-reader"
describe "PDF Download and Verification" do
  include TestHelper
  before(:all) do
    @download_path = "/Users/me/tmp"
    options = Selenium::WebDriver::Chrome::Options.new
    options.add_preference("download.prompt_for_download", false)
    options.add_preference("download.default_directory", @download_path)
    @driver = Selenium::WebDriver.for(:chrome, :capabilities => options)
    driver.get(site_url)
  end

  after(:all) do
    driver.quit unless debugging?
  end

  it "Download Practical Web Test Automation sample" do
    visit("/books/pwta")
    saved_file = "#{@download_path}/practical-web-test-automation-sample.pdf"
    # remove any copy left over from a previous run
    FileUtils.rm(saved_file) if File.exist?(saved_file)
    driver.find_element(:link_text, "Download").click
    sleep 5
    expect(File.exist?(saved_file)).to be_truthy
    reader = PDF::Reader.new(saved_file)
    puts reader.info
    expect(reader.page_count).to eq(62)
    second_page_text = reader.pages[1].text
    puts second_page_text
    expect(second_page_text).to include("Test web applications wisely with Selenium WebDriver")
  end
end

Notes

I recommend defining the download directory relative to the test script, as in the snippet below. For simplicity, I used an absolute path above, but that could cause problems if the test is shared or run on a different machine.

@download_path = File.expand_path File.join(File.dirname(__FILE__), "..", "tmp", "download")
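A relative download folder may not exist yet on a fresh checkout, and it may hold files from earlier runs. Here is a minimal sketch of a setup helper covering both cases. The name prepare_download_dir is hypothetical, and clearing every *.pdf file in the folder is my assumption about what is safe for a dedicated test folder.

```ruby
require "fileutils"

# Hypothetical helper: builds an absolute download folder relative to the
# directory of the test script, creates it if missing, and removes leftover
# PDFs so an old file cannot satisfy a later existence check.
def prepare_download_dir(script_dir)
  download_path = File.expand_path(File.join(script_dir, "..", "tmp", "download"))
  FileUtils.mkdir_p(download_path)
  # Assumption: the folder is dedicated to tests, so every PDF may be deleted.
  FileUtils.rm_f(Dir.glob(File.join(download_path, "*.pdf")))
  download_path
end

# Possible use in before(:all):
#   @download_path = prepare_download_dir(File.dirname(__FILE__))
```

The helper returns an absolute path, so it can be passed straight to the browser's download preference.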

How about Firefox?

The above Selenium settings are specific to Chrome, but the same concept applies to other browsers. Below is the equivalent for Firefox.

download_path = "/Users/me/tmp"
profile = Selenium::WebDriver::Firefox::Profile.new
profile["browser.download.folderList"] = 2
profile["browser.download.dir"] = download_path
profile["browser.helperApps.neverAsk.saveToDisk"] = 'application/pdf'

# disable Firefox's built-in PDF viewer
profile["pdfjs.disabled"] = true

options = Selenium::WebDriver::Firefox::Options.new
options.profile = profile

driver = Selenium::WebDriver.for(:firefox, :options => options)

The Agile Way
