Puppeteer

Browser Automation using Node.js and Puppeteer

Updated: 03 September 2023

Installing

To get started with Puppeteer, install it into your project with npm i puppeteer. This installs the required packages as well as a Chrome build for Puppeteer to use.
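As a quick sanity check after installing, a small script can confirm that Node.js is able to resolve the package. This is only a sketch: isInstalled is a hypothetical helper name, not part of Puppeteer, and it checks that the module can be found rather than that the bundled browser was downloaded.

```javascript
// Hypothetical helper: returns true if Node.js can resolve the given module name
const isInstalled = (name) => {
  try {
    require.resolve(name)
    return true
  } catch {
    return false
  }
}

// Example: log a hint if the package is missing
// if (!isInstalled('puppeteer')) console.log('Run: npm i puppeteer')
```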

Basic Usage

In general, you create a browser instance and interact with it through the Puppeteer API. The basic example below launches a headless browser, opens a page, and takes a screenshot:

const puppeteer = require('puppeteer')

;(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto('https://www.google.com')
  await page.screenshot({ path: 'google/index.png' })

  await browser.close()
})()
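The basic example assumes the google directory already exists and that nothing fails before browser.close() is reached. A slightly more defensive sketch might look like the following. prepareOutputPath and takeScreenshot are hypothetical helper names introduced here for illustration, and puppeteer is required lazily so the helpers can be loaded without starting a browser.

```javascript
const fs = require('fs')
const path = require('path')

// Hypothetical helper: creates the output directory if needed and returns the full file path
const prepareOutputPath = (dir, fileName) => {
  fs.mkdirSync(dir, { recursive: true })
  return path.join(dir, fileName)
}

// Hypothetical helper: takes one screenshot and closes the browser even if a step throws
const takeScreenshot = async (url, outputPath) => {
  const puppeteer = require('puppeteer') // loaded only when a browser is actually needed
  const browser = await puppeteer.launch()
  try {
    const page = await browser.newPage()
    await page.goto(url)
    await page.screenshot({ path: outputPath })
  } finally {
    await browser.close()
  }
}

// Usage:
// takeScreenshot('https://www.google.com', prepareOutputPath('google', 'index.png'))
```

The try/finally is a design choice rather than a requirement: it avoids leaving a browser process running when navigation or the screenshot fails.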

Non-Headless Mode

Two of the options available when launching a browser are headless: false, which shows the browser window, and slowMo, which slows each operation down by the given number of milliseconds:

const browser = await puppeteer.launch({
  headless: false,
  slowMo: 250,
})
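One possible way to use these options is to switch them on only while debugging. The sketch below assumes a hypothetical helper getLaunchOptions and a hypothetical environment variable DEBUG_BROWSER. Neither is part of Puppeteer.

```javascript
// Hypothetical helper: a visible, slowed-down browser while debugging, Puppeteer defaults otherwise
const getLaunchOptions = (debug) =>
  debug ? { headless: false, slowMo: 250 } : {}

// Hypothetical usage, with DEBUG_BROWSER as an assumed environment variable:
// const browser = await puppeteer.launch(getLaunchOptions(process.env.DEBUG_BROWSER === '1'))
```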

Screenshots

The following example navigates to a page, types into an input, submits the search, and takes screenshots along the way:

const puppeteer = require('puppeteer')
const fs = require('fs')

const run = async () => {
  const browser = await puppeteer.launch({
    headless: false,
    slowMo: 150,
    defaultViewport: null, // disable the default viewport so the page fills the window
  })
  const page = await browser.newPage()
  await page.goto('https://www.google.com')
  await page.screenshot({ path: 'google/index.png' })

  await page.type('input[type = text]', 'Hello World')
  // wait for the results page to load before taking the next screenshot
  await Promise.all([page.waitForNavigation(), page.keyboard.press('Enter')])
  await page.screenshot({ path: 'google/search.png' })

  // '*' matches the first element in the document, so this reads the text of the whole page
  const searchText = await page.$eval('*', (el) => el.innerText)
  fs.writeFileSync('google/text.txt', searchText)

  await browser.close()
}

run()
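When a script takes several screenshots, it can help to generate the file names in order so the sequence of steps is easy to follow. The sketch below assumes a hypothetical helper createShotOptions that returns options for page.screenshot. The numbering scheme is an illustration only, and fullPage: true is used to capture the whole scrollable page rather than just the visible viewport.

```javascript
const path = require('path')

// Hypothetical helper: builds page.screenshot options for a numbered series of captures
const createShotOptions = (dir) => {
  let count = 0
  return (name) => {
    count += 1
    const fileName = `${String(count).padStart(2, '0')}-${name}.png`
    return { path: path.join(dir, fileName), fullPage: true }
  }
}

// Usage inside a script like the one above:
// const nextShot = createShotOptions('google')
// await page.screenshot(nextShot('index'))
// await page.screenshot(nextShot('search'))
```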

Running JS Code in the Browser

It can sometimes be useful to execute arbitrary code in the browser window that interacts with the DOM, for example to replace some text in the HTML. This can be done with the page.evaluate function:

await page.evaluate(() => {
  document.body.innerHTML = 'Hello World' // this will update the DOM
})

The page.evaluate function can also pass data from the Node.js process to the browser process. Any arguments after the function are handed to it inside the page, like so:

const name = 'Bob'
const age = 32

await page.evaluate(
  (props) => {
    console.log(props.name, props.age) // this will be logged in the browser console
  },
  { name, age }
)
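Data can also flow the other way: whatever the function returns inside the page is sent back to Node.js, as long as it can be serialized. The sketch below combines both directions. greetInPage and greet are hypothetical names, and page is assumed to be a Puppeteer page created elsewhere. The page function is defined on its own because it runs in the browser and cannot see variables from the surrounding Node.js scope.

```javascript
// Runs in the browser: receives data from Node.js, updates the DOM, and returns a result
const greetInPage = ({ name, age }) => {
  document.body.innerHTML = `Hello ${name}`
  return { heading: document.body.innerHTML, isAdult: age >= 18 }
}

// Hypothetical wrapper: `page` is a Puppeteer page created elsewhere
const greet = async (page, person) => page.evaluate(greetInPage, person)

// Usage:
// const result = await greet(page, { name: 'Bob', age: 32 })
// console.log(result.heading) // logged in the Node.js terminal, not the browser console
```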

Connect to a Running Chrome Instance

To connect to a running Chrome instance, start Chrome from your terminal with the following argument:

chrome.exe --remote-debugging-port=9222

The above works on Windows. On macOS, use the following:

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222

You can use any other port in place of 9222. Then use puppeteer.connect instead of puppeteer.launch, like so:

const browser = await puppeteer.connect({
  browserURL: 'http://localhost:9222',
  slowMo: 250,
})

The port in browserURL must match the one passed to --remote-debugging-port.
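When the script is attached to a Chrome instance it did not start, it is often preferable to detach with browser.disconnect() rather than browser.close(), so that Chrome keeps running afterwards. A minimal sketch, assuming hypothetical helpers getBrowserURL and withRunningChrome:

```javascript
// Hypothetical helper: builds the browserURL for a given remote debugging port
const getBrowserURL = (port = 9222) => `http://localhost:${port}`

// Hypothetical helper: connects, runs a task, then detaches without closing Chrome
const withRunningChrome = async (port, task) => {
  const puppeteer = require('puppeteer') // loaded only when a connection is needed
  const browser = await puppeteer.connect({ browserURL: getBrowserURL(port) })
  try {
    return await task(browser)
  } finally {
    await browser.disconnect() // leaves the Chrome process running
  }
}

// Usage:
// withRunningChrome(9222, async (browser) => {
//   const page = await browser.newPage()
//   await page.goto('https://www.google.com')
// })
```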