Finally Making Puppeteer work on AWS Lambda
After months of on-and-off attempts without success, I finally put together a working solution for Node.js 18+.
The solution provided here is universal and does not require any layers, external dependencies, or other special configuration to work. I'm not providing infrastructure examples, since there are many ways of deploying to AWS Lambda and I assume you already have a way of deploying functions.
TL;DR: here is a gist with most of what you need.
Dependencies
The following production dependency is required: `puppeteer-core`. Two dev dependencies are needed to make it work locally: `puppeteer` and `@sparticuz/chromium-min`. Before you go ahead and install them, please check the Puppeteer and Chromium compatibility matrix here and choose the versions carefully. Below is a working example with the latest compatible versions at the time of writing:
npm install puppeteer-core@21.5.0
npm install --save-dev puppeteer@21.5.0
npm install --save-dev @sparticuz/chromium-min@119.0.2
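After installing, the relevant sections of your `package.json` should look roughly like this (npm will typically record the versions with a caret prefix; the exact form doesn't matter as long as the major versions stay compatible):

"dependencies": {
  "puppeteer-core": "^21.5.0"
},
"devDependencies": {
  "puppeteer": "^21.5.0",
  "@sparticuz/chromium-min": "^119.0.2"
}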
Node.js code
In the Lambda code itself, we need to import the dependencies specified above and conditionally instantiate the browser depending on the execution environment: AWS or local.
The last piece of magic is downloading a compatible version of Chromium at runtime, which happens automatically during the first execution. The key is to use the same version (119.0.2 in our case) as specified in `package.json`.
Here is a working example (index.ts):
import puppeteer from 'puppeteer-core';
import chromium from '@sparticuz/chromium-min';

export const handler = async (): Promise<any> => {
  // identify whether we are running locally or in AWS
  const isLocal = process.env.AWS_EXECUTION_ENV === undefined;

  const browser = isLocal
    // if we are running locally, use the puppeteer that is installed in the node_modules folder
    ? await require('puppeteer').launch()
    // if we are running in AWS, download and use a compatible version of chromium at runtime
    : await puppeteer.launch({
        args: chromium.args,
        defaultViewport: chromium.defaultViewport,
        executablePath: await chromium.executablePath(
          'https://github.com/Sparticuz/chromium/releases/download/v119.0.2/chromium-v119.0.2-pack.tar',
        ),
        headless: chromium.headless,
      });

  console.log('browser is up');

  await browser.close();
  console.log('browser is closed');
};
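Launching and closing proves the setup works, but in practice you'll want to do something with the browser. As a rough sketch, the tail of the handler body could become something like the following; the URL and return shape are illustrative, not part of the original example:

// hypothetical extension of the handler body: do real work before closing
const page = await browser.newPage();
await page.goto('https://example.com', { waitUntil: 'networkidle0' }); // illustrative target URL
const title = await page.title();
console.log('page title:', title);

await browser.close();
return { statusCode: 200, body: JSON.stringify({ title }) };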
Running locally
Given that the example uses ES modules, you need to specify `"type": "module"` at the top level of `package.json`. I also encourage you to add a script to test the function locally; here is how `package.json` could look after the changes:
"type": "module",
"scripts": {
  "execute": "npx ts-node -e \"console.log(require('./index.ts').handler());\""
}
Run `npm run execute` and you should see output similar to this, indicating that the browser started and exited without errors (the `Promise { <pending> }` line is just `console.log` printing the handler's promise before it resolves):
Promise { <pending> }
browser is up
browser is closed
Next, go ahead and deploy your function to AWS Lambda to test it! As I mentioned earlier, there is no need for layers, external dependencies, or any other configuration changes.
Optimizing for production
Now that you’ve made it work, there are a few optimizations that can improve the performance and overall reliability of the solution.
Unzipping an archive and running a browser are resource-intensive. If your budget allows you to allocate more memory to your function, it will speed up both the cold start and subsequent executions, up to a point. In my experience, going below 1024 MB is extremely slow; there is a significant improvement from 1024 MB to 2048 MB, a noticeable improvement from 2048 MB to 3072 MB, and diminishing returns afterward. The sweet spot seems to be the 2048-3072 MB range, especially if you are optimizing for cold-start performance.
Downloading a 58 MB archive at runtime can take some time, especially when the function runs on AWS and the package is hosted on GitHub. There is also no real guarantee that the package won't be taken down at some point. You may want to create a private S3 bucket in the same region as the Lambda, upload the .tar there, and point the code to the S3 object URL instead of GitHub, as sketched below. You will get better reliability and speed, and you remove a potential security threat, since you no longer need to download Chromium from outside your VPC. At a massive scale, it could also reduce your AWS bill thanks to lower network traffic costs.
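A minimal sketch of that change, assuming you have uploaded the pack to a bucket of your own. The bucket name and region are hypothetical, and a truly private bucket would additionally need a pre-signed URL or a policy that lets the function read the object:

// hypothetical bucket and region; replace with your own
const chromiumPackUrl =
  'https://my-chromium-packs.s3.us-east-1.amazonaws.com/chromium-v119.0.2-pack.tar';

const browser = await puppeteer.launch({
  args: chromium.args,
  defaultViewport: chromium.defaultViewport,
  // @sparticuz/chromium-min downloads the pack from the given URL on first execution
  executablePath: await chromium.executablePath(chromiumPackUrl),
  headless: chromium.headless,
});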