Automate Chromium, Firefox, and WebKit with Go (preview)

Control browsers in the cloud with Golang by using the Playwright library.

By Max Schmitt, Published on 9/10/2020

⚠️This blog post is outdated. Playwright for Go is now part of the Playwright Community.

Introduction

Automating browsers in the cloud, like on Kubernetes with a lightweight, type-safe, and easy to use programming language is especially useful for critical applications. In this blog post, we're gonna focus on the Go version of Playwright, what the current state is and what their limits are.

Playwright is a Node.js library to automate Chromium, Firefox and WebKit with a single API. Playwright is built to enable cross-browser web automation that is ever-green, capable, reliable and fast. Headless execution is supported for all the browsers on all platforms.

Playwright for Go aims to be a 1:1 wrapper of the JavaScript version. It uses like Playwright for Python the official driver implementation internally. Go has key differences compared to JavaScript and Python which affected on the Playwright implementation:

Examples

The core concept of the driver implementations like Go or Python is based on a sub process which is started in the background. This will be done with the Run() and the corresponding Stop() method.

Create screenshots

package main
import (
"log"
"github.com/mxschmitt/playwright-go"
)
func main() {
// Launching the driver internally
pw, err := playwright.Run()
if err != nil {
log.Fatalf("could not launch playwright: %v", err)
}
// Start the Chromium browser
browser, err := pw.Chromium.Launch()
if err != nil {
log.Fatalf("could not launch Chromium: %v", err)
}
// Creates internally a context and a new page
page, err := browser.NewPage()
if err != nil {
log.Fatalf("could not create page: %v", err)
}
// Visit the website and wait for a network idle for at least 500ms
if _, err = page.Goto("http://example.org", playwright.PageGotoOptions{
WaitUntil: playwright.String("networkidle"),
}); err != nil {
log.Fatalf("could not goto: %v", err)
}
if _, err = page.Screenshot(playwright.PageScreenshotOptions{
Path: playwright.String("foo.png"),
}); err != nil {
log.Fatalf("could not create screenshot: %v", err)
}
if err = browser.Close(); err != nil {
log.Fatalf("could not close browser: %v", err)
}
if err = pw.Stop(); err != nil { log.Fatalf("could not stop Playwright: %v", err) }
}

The core concept of Playwright is based on a browser that has multiple contexts. A single context is an isolated entity that has separate state for cookies or local storage. Each context has multiple pages with browser session shared between each other.

We specify networkidle to the Page.Goto() call which waits after there is a network idle on the page for at least 500ms. You can either pass then a path to the Page.Screenshot() method or use the returned data ([]byte) directly.

Automate websites

In this example we're navigating to Hacker News (popular tech news) site, crawling the current entries, and printing them to the standard output. This could in this case be done using HTML scrapers since Hacker News is based on returning rendered HTML but our approach would also work for dynamically generated sites with JavaScript which use e.g. React, Vue or Angular as a frontend framework.

package main
import (
"fmt"
"log"
"github.com/mxschmitt/playwright-go"
)
func main() {
pw, err := playwright.Run()
if err != nil {
log.Fatalf("could not start playwright: %v", err)
}
browser, err := pw.Chromium.Launch()
if err != nil {
log.Fatalf("could not launch browser: %v", err)
}
page, err := browser.NewPage()
if err != nil {
log.Fatalf("could not create page: %v", err)
}
if _, err = page.Goto("https://news.ycombinator.com"); err != nil {
log.Fatalf("could not goto: %v", err)
}
// Looping over the DOM elements
entries, err := page.QuerySelectorAll(".athing")
if err != nil {
log.Fatalf("could not get entries: %v", err)
}
for i, entry := range entries {
// Finding the next title link element
titleElement, err := entry.QuerySelector("td.title > a")
if err != nil {
log.Fatalf("could not get title element: %v", err)
}
title, err := titleElement.TextContent()
if err != nil {
log.Fatalf("could not get text content: %v", err)
}
// Printing it to the console
fmt.Printf("%d: %s\n", i+1, title)
}
if err = browser.Close(); err != nil {
log.Fatalf("could not close browser: %v", err)
}
if err = pw.Stop(); err != nil {
log.Fatalf("could not stop Playwright: %v", err)
}
}

In this example, we're making use of a selector on the latest news entries. We then loop over them, get its corresponding header/title element and print it to the console.

Summary

In this introduction, we went through two examples and an introduction of Playwright for Go and its current state. We are working to make it bulletproof, concurrent-safe, and adding more tests in the next few weeks to ensure it can be used in production. For further reference, see on GitHub.