Hi Robert, this is very cool! What are the downsides of having the Chromium dependency as part of the Docker image? I need the best possible performance, so it would make more sense to me to download it when the image is built rather than at runtime. Cheers!
Hi, for optimal startup performance you would definitely need to bundle everything together, so the only overhead is starting Chromium for the first time, not downloading + unzipping + starting. Downloading from S3 in the same region and using larger Lambdas (2 GB+) would help, but only to some degree.
The strength of the approach described here is simplicity: no need for Docker-based Lambda images, layers, external modules, or any other configuration changes compared to a typical Lambda.
In my experience, if you want a lean-ish image with Puppeteer in it, there are a few gotchas with Node versions more recent than 14.x. Maybe it was just me, but it wasn't easy to figure out at the time.
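To see how much of a cold start goes to downloading and unpacking versus launching Chromium, the phases can be timed separately. This is only a minimal sketch: `timed` is a hypothetical helper, not part of Puppeteer or `@sparticuz/chromium-min`, and the commented usage assumes `chromium`, `puppeteer` and a `packUrl` are already in scope.

```javascript
// Hypothetical helper: runs an async step and logs how long it took.
async function timed(label, step) {
  const start = process.hrtime.bigint();
  try {
    return await step();
  } finally {
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    console.log(`${label}: ${ms.toFixed(0)} ms`);
  }
}

// Possible use inside a handler (names other than `timed` are assumed):
//   const executablePath = await timed("download + unpack", () =>
//     chromium.executablePath(packUrl));
//   const browser = await timed("launch", () =>
//     puppeteer.launch({ args: chromium.args, executablePath }));
```

On a warm container the first number should mostly disappear, which is one way to check where the overhead actually sits.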
You have just saved me from one week of pain trying to figure this out. Thank you good sir.
Thank you so much for this
Hi. This is great, and it worked for me. I got it working on Node 20 using this Chromium link:
executablePath: await chromium.executablePath(
  'https://github.com/Sparticuz/chromium/releases/download/v123.0.1/chromium-v123.0.1-pack.tar',
),
and the latest Puppeteer version :)
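For context, that `executablePath` line sits inside the options passed to `puppeteer.launch`. A rough sketch of how the options could be assembled is below; `buildLaunchOptions` and `PACK_URL` are hypothetical names, and the `chromium` module is passed in as a parameter so the snippet stays self-contained.

```javascript
// Pack URL quoted in the comment above.
const PACK_URL =
  "https://github.com/Sparticuz/chromium/releases/download/v123.0.1/chromium-v123.0.1-pack.tar";

// Sketch: assemble launch options from a chromium-min style module.
// `chromium` is expected to expose args, defaultViewport, headless and
// executablePath(url), as used elsewhere in this thread.
async function buildLaunchOptions(chromium, packUrl) {
  return {
    args: chromium.args,
    defaultViewport: chromium.defaultViewport,
    executablePath: await chromium.executablePath(packUrl),
    headless: chromium.headless,
  };
}

// In a Lambda handler (assumes puppeteer-core and chromium-min are imported):
//   const browser = await puppeteer.launch(
//     await buildLaunchOptions(chromium, PACK_URL));
```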
Which Puppeteer version did you use? Should 22.6 be good?
I see this in the matrix that was mentioned above:
Chrome for Testing 123.0.6312.122 - Puppeteer v22.6.4
Chrome for Testing 123.0.6312.105 - Puppeteer v22.6.3
Chrome for Testing 123.0.6312.86 - Puppeteer v22.6.2
Chrome for Testing 123.0.6312.58 - Puppeteer v22.6.0
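If it helps, the rows above can be kept as a small lookup so the pairing is checked in code rather than by eye. This is just an illustration built from the four rows quoted here; `puppeteerVersionsForMajor` is a hypothetical helper, and it says nothing about versions outside those rows.

```javascript
// The four rows quoted above, keyed by Chrome for Testing version.
const PUPPETEER_FOR_CHROME = {
  "123.0.6312.122": "22.6.4",
  "123.0.6312.105": "22.6.3",
  "123.0.6312.86": "22.6.2",
  "123.0.6312.58": "22.6.0",
};

// Returns the Puppeteer versions listed for a given Chrome major version.
function puppeteerVersionsForMajor(major) {
  return Object.entries(PUPPETEER_FOR_CHROME)
    .filter(([chrome]) => chrome.split(".")[0] === String(major))
    .map(([, puppeteerVersion]) => puppeteerVersion);
}

console.log(puppeteerVersionsForMajor(123));
```

Going by these rows alone, a Chromium 123 pack lines up with the Puppeteer 22.6.x releases.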
Worked for me, thanks very much.
//package.json
{
  "type": "module",
  "name": "puppeteer_new_v2",
  "version": "1.0.0",
  "scripts": {
    "execute": "node ./index.js"
  },
  "devDependencies": {
    "@sparticuz/chromium-min": "^119.0.2",
    "puppeteer": "^21.5.0"
  },
  "dependencies": {
    "puppeteer-core": "^21.5.0"
  }
}
//index.js
import puppeteer from "puppeteer-core";
import chromium from "@sparticuz/chromium-min";

export const handler = async () => {
  // identify whether we are running locally or in AWS
  const isLocal = process.env.AWS_EXECUTION_ENV === undefined;
  const browser = isLocal
    ? // if we are running locally, use the puppeteer that is installed in the node_modules folder
      // (dynamic import, because require() is not available in an ES module)
      await (await import("puppeteer")).default.launch()
    : // if we are running in AWS, download and use a compatible version of chromium at runtime
      await puppeteer.launch({
        args: chromium.args,
        defaultViewport: chromium.defaultViewport,
        executablePath: await chromium.executablePath(
          "https://name.s3.region.amazonaws.com/chromium-v119.0.2-pack.tar"
        ),
        headless: chromium.headless,
      });
  console.log("browser is up");
  await browser.close();
  console.log("browser is closed");
};
//serverless.yml (with all necessary iam roles and statements)
functions:
  webAPI:
    handler: ./index.handler
    events:
      - http:
          path: /call
          method: post
But I am getting this error:
{
"errorType": "Error",
"errorMessage": "Cannot find package '@sparticuz/chromium-min' imported from /var/task/index.js",
"trace": [
"Error [ERR_MODULE_NOT_FOUND]: Cannot find package '@sparticuz/chromium-min' imported from /var/task/index.js",
" at packageResolve (node:internal/modules/esm/resolve:858:9)",
" at moduleResolve (node:internal/modules/esm/resolve:931:18)",
" at moduleResolveWithNodePath (node:internal/modules/esm/resolve:1161:14)",
" at defaultResolve (node:internal/modules/esm/resolve:1204:79)",
" at ModuleLoader.defaultResolve (node:internal/modules/esm/loader:383:12)",
" at ModuleLoader.resolve (node:internal/modules/esm/loader:352:25)",
" at ModuleLoader.getModuleJob (node:internal/modules/esm/loader:227:38)",
" at ModuleWrap.<anonymous> (node:internal/modules/esm/module_job:87:39)",
" at link (node:internal/modules/esm/module_job:86:36)"
]
}
Not sure what I am doing wrong! Thanks for the help
Thank you for the response! I just downloaded the chromium-v119.0.2-pack.tar and uploaded it to an S3 bucket so my executable path looks like this:
let browser = await puppeteer.launch({
  args: chromium.args,
  defaultViewport: chromium.defaultViewport,
  executablePath: await chromium.executablePath("https://bucket-name.s3.us-west-1.amazonaws.com/chromium-v119.0.2-pack.tar"),
  headless: "new",
  ignoreHTTPSErrors: true
});
I did it your way but got the error:
Error: Failed to launch the browser process!
/tmp/chromium: /tmp/chromium: cannot execute binary file
TROUBLESHOOTING: https://pptr.dev/troubleshooting
at Interface.l (/var/task/src/modules/documents/route.js:2593:462)
at Interface.emit (node:events:529:35)
at Interface.close (node:internal/readline/interface:534:10)
at Socket.onend (node:internal/readline/interface:260:10)
at Socket.emit (node:events:529:35)
at endReadableNT (node:internal/streams/readable:1368:12)
at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
Environment:
- AWS lambda
- Node 18
Any ideas for me?
Thank you!
Most probably, the problem is one of the following:
- the dependencies are not installed in exactly the same way, with the same versions and the same dev/prod dependency split
- the Lambda is in a subnet without internet connectivity and hence cannot download the binaries
You may also want to upload the same file to S3 and download it from there to see if there is any difference.
Hope it helps!
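On the dev/prod dependency point: a quick way to check it is to compare what the handler imports at runtime with what is listed under `dependencies`. The sketch below uses the dependency sections from the `package.json` posted in this thread. `findMissingRuntimeDeps` is a hypothetical helper, and it assumes the deployment tooling leaves `devDependencies` out of the bundle, which is common but not confirmed here.

```javascript
// Returns the runtime imports that are not declared under `dependencies`.
function findMissingRuntimeDeps(pkg, runtimeImports) {
  const prod = pkg.dependencies ?? {};
  return runtimeImports.filter((name) => !(name in prod));
}

// Dependency sections copied from the package.json posted in this thread.
const pkg = {
  devDependencies: {
    "@sparticuz/chromium-min": "^119.0.2",
    puppeteer: "^21.5.0",
  },
  dependencies: {
    "puppeteer-core": "^21.5.0",
  },
};

// Packages that index.js imports when running in AWS.
const missing = findMissingRuntimeDeps(pkg, [
  "puppeteer-core",
  "@sparticuz/chromium-min",
]);
console.log(missing);
```

With that `package.json`, the check reports `@sparticuz/chromium-min`, which is the package named in the `ERR_MODULE_NOT_FOUND` error.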
I have done this and opened up my S3 bucket, but I am seeing this error:
[Error: ENOENT: no such file or directory, open '/tmp/chromium-pack/aws.tar.br'] {
errno: -2,
code: 'ENOENT',
syscall: 'open',
path: '/tmp/chromium-pack/aws.tar.br'
}
Any ideas? When I view the contents of chromium-v119.0.2-pack.tar I don't see any aws.tar.br in there. Could that be the issue?
Feels like you have a mismatch in the filename. Where is the `aws.tar.br` coming from? I'd recommend uploading the file as-is and also downloading it to the `/tmp` folder as-is, without renaming it. Should work!
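To illustrate the "as-is" part: if you do download the archive yourself, the local path can be derived from the URL so the file name never changes along the way. `localPackPath` is a hypothetical helper for illustration only and is not something `@sparticuz/chromium-min` exposes.

```javascript
// Hypothetical helper: local path for a downloaded pack, keeping the
// file name from the URL unchanged.
function localPackPath(packUrl, dir = "/tmp") {
  const fileName = new URL(packUrl).pathname.split("/").pop();
  return `${dir}/${fileName}`;
}

console.log(
  localPackPath(
    "https://bucket-name.s3.us-west-1.amazonaws.com/chromium-v119.0.2-pack.tar"
  )
);
```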
Actually, your example worked perfectly. I was still importing chromium from the base chromium module instead of chromium-min. Once I updated that, it worked. Thank you so much!
Hello, could you please share the working repo or code for this?
Sure, I created a simple gist here
- https://gist.github.com/konarskis/7217d16f943d0c2405629fdef268f806
Hope it helps!