Filename for PDF rendered with Puppeteer in NodeJS and Express

When generating PDF files in NodeJS, one of the most common approaches is to render them with Puppeteer. However, Puppeteer offers no way to set the filename when the resulting Buffer is served through Express (or any other webserver). In this post I explain how to change the filename by using the appropriate HTTP header.

Puppeteer

If you’ve been writing webservices using NodeJS, you’ve probably heard about the almighty, awesome, open source project Puppeteer. As the official introduction in their GitHub repo says:

Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium.

Puppeteer makes it easy to drive the Chrome/Chromium web browser programmatically through simple commands sent over the DevTools Protocol. It is a perfect match for test automation and, more importantly for this post, it can generate screenshots and PDFs of the pages it renders. That makes it an easy way to produce PDFs from plain HTML, CSS and JS with all the flexibility you’d expect from a normal Chrome/Chromium installation.

Generating a PDF in Puppeteer using NodeJS is as simple as:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://news.ycombinator.com', {waitUntil: 'networkidle2'});
  await page.pdf({path: 'hn.pdf', format: 'A4'});

  await browser.close();
})();

Rendering On-Demand

Rendering PDFs with Puppeteer couldn’t be easier. However, when generating files in webservices, the best practice is to transfer them directly to the client without storing them on the instance that runs the NodeJS process. Why would we want to avoid storing files on the webserver instance? Because storing files complicates what would otherwise be a normal API interaction. Just compare these 2 pipelines:

Storing the file:

  1. Launch Puppeteer.
  2. Load HTML, CSS and JS and wait for networkidle [1].
  3. Render the PDF.
  4. Store the PDF in some volume/directory of the webserver instance.
  5. Wait for the entire file to be stored.
  6. Read the file and transmit it back to the client.

Without storing the file:

  1. Launch Puppeteer.
  2. Load HTML, CSS and JS and wait for networkidle [1].
  3. Render the PDF.
  4. Send the generated Buffer through the res object immediately to the client.

The second approach lets our webserver instance run with minimal disk interaction (which can be slower than the network transmission itself!) and lets the webserver chunk and handle the data without clogging the NodeJS process with unnecessary file IO operations.

A simple implementation (assuming you’re using NodeJS and Express, although the same idea applies to any webserver environment):

import { Response } from 'express';
import * as puppeteer from 'puppeteer';

/**
 * Generate a simple PDF from the Google homepage.
 * @param res The Express response object.
 */
export async function generateGooglePDF(res: Response): Promise<void> {
  // Launch puppeteer + chrome/chromium.
  const browser = await puppeteer.launch();

  try {
    // Get the page object (newPage() returns a promise, so it must be awaited).
    const page = await browser.newPage();

    // Go to Google.com and wait until the network is idle.
    await page.goto('https://google.com', { waitUntil: 'networkidle0' });

    // No `path` option here: the rendered PDF is handed to us in memory.
    await page.pdf({ format: 'A4' })
      .then((renderedPdfBuffer: Buffer) => {
        // Set the headers appropriately.
        res.setHeader('Content-Type', 'application/pdf');               // Content Type is PDF.
        res.setHeader('Content-Length', renderedPdfBuffer.byteLength);  // The length of the file.
        res.setHeader('Content-Description', 'File Transfer');          // The description of the content.
        res.setHeader('Content-Transfer-Encoding', 'binary');           // PDFs are in binary format.

        // Send the file.
        res.send(renderedPdfBuffer);
      })
      .catch((renderError) => console.error(renderError));
  } finally {
    // Give us back that RAM kid! This runs even if something above throws.
    await browser.close();
  }
}

This is probably a pretty reductionist implementation, but you get the general idea. With this method, you can render a webpage, get a PDF and send it back to a client without the file ever touching a disk. However, if you try the code above you’ll get the file, but its filename will be pdf.pdf which, well, ain’t that good! So how do we change the filename?

If we go back to the first example, you’ll notice that, when generating the PDF file, we passed the .pdf() function an option named path. That key only matters when the file is saved to disk: it tells Puppeteer where to write it. Since we send the file through the response (res) object instead, we have no use for it, and it would not change the name the client sees anyway. This has been asked on Stack Overflow, with complex solutions that involve saving the file, running command line utilities like exiftool to modify the PDF/image metadata, and only then sending the file back to the client. But there’s a simpler way.

Content Disposition

The missing part is the Content-Disposition HTTP header which, in the words of the awesome MDN documentation:

The Content-Disposition header is defined in the larger context of MIME messages for e-mail, but only a subset of the possible parameters apply to HTTP forms and POST requests. Only the value form-data, as well as the optional directive name and filename, can be used in the HTTP context.

This header describes how the browser should handle the response, and it offers a few options depending on the flow we’d like on the client (receiver) side: inline asks the browser to display the content in the page, attachment asks it to download the file, and attachment can also carry a filename parameter that suggests a name for the download.

See where we are heading? Great! Strictly speaking, the documented syntax of the Content-Disposition header has no option that sets only the filename of a response, because the filename parameter is meant to accompany a disposition type. However, we can simply set the header like this:

  Content-Disposition: filename="my-file.pdf"

And it will work perfectly!
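
If you would rather stay within the documented syntax, the filename parameter can also be attached to an explicit disposition type. The following is only a minimal sketch, not part of the original code: contentDisposition is a hypothetical helper name, and it assumes the filename is plain ASCII.

```typescript
// Hypothetical helper (an assumption for illustration, not an Express or
// Puppeteer API): builds a Content-Disposition value with an explicit type.
type DispositionType = 'inline' | 'attachment';

function contentDisposition(type: DispositionType, filename?: string): string {
  // Without a filename, the disposition type alone is a valid value.
  if (filename === undefined) {
    return type;
  }
  // Escape backslashes and double quotes so the quoted string stays valid.
  const escaped = filename.replace(/(["\\])/g, '\\$1');
  return `${type}; filename="${escaped}"`;
}
```

With Express this could be used as res.setHeader('Content-Disposition', contentDisposition('inline', 'report.pdf')). Choosing inline asks the browser to display the PDF, while attachment asks it to download the file.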

The changes that would have to be made to the code to add this new header are:

  // Render the PDF in memory and handle the result in a promise callback.
  await page.pdf({ format: 'A4' })
    .then((renderedPdfBuffer: Buffer) => {
      // Set the headers appropriately.
      res.setHeader('Content-Type', 'application/pdf');               // Content Type is PDF.
      res.setHeader('Content-Length', renderedPdfBuffer.byteLength);  // The length of the file.
      res.setHeader('Content-Description', 'File Transfer');          // The description of the content.
      res.setHeader('Content-Transfer-Encoding', 'binary');           // PDFs are in binary format.

      // Add the Content Disposition header.
      res.setHeader('Content-Disposition', `filename="${new Date().toLocaleString()}.pdf"`);

      // Send the file.
      res.send(renderedPdfBuffer);
    })
    // Log the error itself (passing console.error without calling it logs nothing).
    .catch((renderError) => console.error(renderError));
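
One caveat with the snippet above: the output of toLocaleString() depends on the server’s locale and usually contains characters such as "/", "," or ":" that browsers or file systems may replace. A minimal sketch of an alternative, assuming a UTC timestamp is acceptable as a name; timestampedPdfName is a hypothetical helper, not something from Puppeteer or Express:

```typescript
// Hypothetical helper (an assumption for illustration): builds a PDF filename
// from a timestamp using only characters that are safe in filenames.
function timestampedPdfName(date: Date = new Date()): string {
  // toISOString() always has the shape "YYYY-MM-DDTHH:mm:ss.sssZ" (UTC).
  // Keep the first 19 characters (up to the seconds) and swap ":" for "-".
  const stamp = date.toISOString().slice(0, 19).replace(/:/g, '-');
  return `${stamp}.pdf`;
}
```

The header line would then become res.setHeader('Content-Disposition', `filename="${timestampedPdfName()}"`), which yields the same name regardless of where the server runs.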

That’s it

Now you can serve files rendered with Puppeteer with custom filenames through Express in NodeJS! If you have any comments and/or feedback, you can always find me on twitter. Thank you for reading.

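  1. The networkidle values of the waitUntil option (networkidle0 and networkidle2) tell Puppeteer to consider the navigation finished once there have been no more than 0 or 2 network connections for at least 500 ms, so that external resources (images, CSS, scripts, etc.) have had a chance to load and render before the page is captured to an image or PDF.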