Case Study: Extract All Substack Article Titles and Links. Part B: Extract 25 articles on one page

Add looping.

Dec 04, 2024

This article series:

Part A: Extract Individual Article Data
Part B: Extract 25 articles on one page
Part C: Extract All
Part D: Publish
Part E: Annotation by Zhimin Zhan *
(offering valuable tips for test automation engineers to level up their skills, exclusively available on Substack)

Continuing from Part A: after successfully extracting the title and link of a single article, the next step is to retrieve up to 25 articles from a single Substack list page.

Extract all 25 articles on one page

In the special `debugging_spec.rb` (still in TestWise Debugging mode), change the code to loop over and extract all 25 articles.

 article_links.each do |article_link_elem|
    the_data = extract_article_data(article_link_elem)
    # "a" = append mode; the block form closes the file after each write
    File.open("/Users/me/tmp.csv", "a") { |f| f.puts(the_data.inspect) }
 end

Note that I used the `a` (append) flag when writing to the file, so the data from multiple attempts accumulates and I can view it as it comes in.
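A minimal, self-contained sketch of that append behaviour in plain Ruby, using hypothetical placeholder data rather than real article records: opening with `"a"` creates the file if needed and never truncates it, so output from earlier attempts stays in place. `append_record` is a name introduced here for illustration only.

```ruby
require "tmpdir"

# Hypothetical helper: append one record per line.
# "a" creates the file if it is missing and never truncates existing content;
# the block form closes the file handle automatically.
def append_record(path, record)
  File.open(path, "a") { |f| f.puts(record.inspect) }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "tmp.csv")
  # First "attempt" (placeholder data)
  append_record(path, ["Title A", "https://example.com/p/a"])
  # Second "attempt": the line written earlier is preserved
  append_record(path, ["Title B", "https://example.com/p/b"])
  puts File.readlines(path).size
end
```

Because each write reopens and closes the file, the file on disk is always up to date, which is what makes it handy for watching progress during a long debugging run.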

It was going OK for about 20 seconds.

Then, it failed.

Why? After inspecting the web page and the error stack trace shown in TestWise, I found the cause: the script was unable to click the “View post” button.

Challenge: Scrolling

The reason: when the “View post” button was hidden behind the “Ask a question” layer, it could not be clicked.

The solution is simple (and logical): scroll down a little after extracting each individual article.

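  story_data = []   # collected rows, written to the CSV later

  article_links.each do |article_link_elem|
      the_data = extract_article_data(article_link_elem)
      story_data << the_data

      # Scroll down 100 pixels so the next "View post" button is not
      # covered by the "Ask a question" layer.
      driver.action.scroll_by(0, 100).perform
  end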

How did I come up with `100` (pixels)? Just by experimenting; `100` happened to work well.
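The fix above scrolls a fixed amount after every article. A different approach, not the one used in this article, is to scroll only when a click is actually blocked and then retry. The sketch is hypothetical: `with_scroll_retry` is an invented helper, and the error class is passed in so the logic can be exercised without a browser. With Selenium you would presumably pass `Selenium::WebDriver::Error::ElementClickInterceptedError` (raised when another element would receive the click) and a lambda wrapping `driver.action.scroll_by(0, 100).perform`.

```ruby
# Hypothetical helper: run the block (e.g. a click). If it raises error_class,
# call the scroll lambda and try again, up to max_attempts attempts in total.
def with_scroll_retry(error_class, scroll:, max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue error_class
    raise if attempts >= max_attempts # give up: re-raise the last error
    scroll.call
    retry # re-runs the begin block; attempts keeps its value
  end
end

# Stand-in for Selenium's "element click intercepted" error, so this runs offline.
class FakeInterceptedError < StandardError; end

clicks = 0
scrolls = 0
result = with_scroll_retry(FakeInterceptedError, scroll: -> { scrolls += 1 }) do
  clicks += 1
  raise FakeInterceptedError if clicks < 3 # blocked twice, then succeeds
  :clicked
end
puts result
puts scrolls

# In a Selenium script (view_post_button is a hypothetical element):
# with_scroll_retry(Selenium::WebDriver::Error::ElementClickInterceptedError,
#                   scroll: -> { driver.action.scroll_by(0, 100).perform }) do
#   view_post_button.click
# end
```

The trade-off: a fixed scroll per article is simpler and predictable, while retry-on-intercept avoids guessing a scroll amount but hides the problem unless the attempt limit is reached.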

Get the Proper CSV

The above process was still in the experimentation phase. The generated file wasn't a proper CSV, which wasn't surprising: each line was just the `inspect` output of the data, so when opened in a spreadsheet, the columns came out wrong due to delimiter issues (for example, commas inside a title).

Creating a proper CSV is easy in Ruby with the standard `csv` library.

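 require "csv"

 # Write the collected article data to a CSV file one level above this script.
 csv_file = File.join(File.dirname(__FILE__), "..", "substack-published-articles.csv")
 CSV.open(csv_file, "w") do |csv|
    csv << ["Title", "Subtitle", "Published On", "Link"]   # header row

    story_data.each do |sd|
       csv << sd   # each sd: [title, subtitle, published_on, link]
    end
 end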
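To see why the `csv` library fixes the delimiter problem, here is a small sketch with a hypothetical row (made-up title and URL) whose title contains a comma and double quotes. It compares a naive comma split of the `inspect` output with a round trip through `CSV.generate` and `CSV.parse`.

```ruby
require "csv"

# Hypothetical record: the title contains a comma and double quotes.
row = ['Why "Record/Playback" Fails, Again', "A subtitle", "Dec 04, 2024", "https://example.com/p/x"]

# The quick experiment: inspect output is a Ruby literal, not CSV.
inspect_line = row.inspect
puts inspect_line.split(",").size # => 6 pieces instead of 4

# The csv library quotes any field containing commas or quotes.
csv_text = CSV.generate do |csv|
  csv << ["Title", "Subtitle", "Published On", "Link"]
  csv << row
end
puts csv_text

# Parsing it back recovers the original four fields intact.
parsed = CSV.parse(csv_text, headers: true)
puts parsed.first["Title"]
```

A spreadsheet applies the same quoting rules when it opens the file, which is why the generated CSV lines up into four columns.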

Run the automation script again in TestWise.

Verify

After running this automation script, the output file `substack-published-articles.csv` contains the data for the 25 articles on the first page.
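The output was checked by eye here. A hedged sketch of an automated check, assuming the header row from the script above and 25 data rows: `verify_articles_csv` is a hypothetical helper (not part of TestWise), and it returns a list of problems, empty when the file looks right.

```ruby
require "csv"

EXPECTED_HEADERS = ["Title", "Subtitle", "Published On", "Link"].freeze

# Hypothetical helper: check headers, row count, and that every row has a link.
def verify_articles_csv(path, expected_count: 25)
  table = CSV.read(path, headers: true)
  errors = []
  unless table.headers == EXPECTED_HEADERS
    errors << "unexpected headers: #{table.headers.inspect}"
  end
  unless table.size == expected_count
    errors << "expected #{expected_count} rows, got #{table.size}"
  end
  blank_links = table.count { |r| r["Link"].to_s.strip.empty? }
  errors << "#{blank_links} row(s) missing a link" if blank_links > 0
  errors
end

# Usage: only runs if the generated file is present.
csv_path = File.join(File.dirname(__FILE__), "..", "substack-published-articles.csv")
if File.exist?(csv_path)
  problems = verify_articles_csv(csv_path)
  puts(problems.empty? ? "Looks good." : problems)
end
```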

Looks good.

The Agile Way
