Implementing Encryption Feature in pdf-lib

At work, a requirement came up to password-protect existing PDF files. I was tasked with adding this feature to our existing backend, which uses Express.

I thought it would be a straightforward task, since there are probably plenty of libraries on npm capable of this.

For the sake of simplicity, password protection and encryption are used interchangeably in this article.

After half a day of searching, I realized that not many libraries support modifying existing PDF files, let alone encrypting them. The notable candidate was hummusRecipe (mostly written in C/C++). However, it does not support output as a Buffer (as of this writing), so in the end I had to save the encrypted PDF file to the local disk and load it back as a Buffer, which is extremely inefficient.

My next preferred candidate was pdf-lib. The library provides many useful PDF modification features and works in all JavaScript environments. However, the encryption feature is not yet available in this library.

So I thought to myself, can I implement this feature?

To get started, I read the PDF specification to understand the overall structure of a PDF file. I’ve written a short article here.

Encryption in PDF Specification

A Conforming Reader determines whether a PDF file is encrypted by first looking for the Encrypt entry in the Trailer dictionary. If it exists, the file is encrypted and the Conforming Reader goes through the decryption routine to extract the content before rendering the file.

The following is an example of the Trailer dictionary of an encrypted PDF file:

The Encrypt entry references the Indirect Object with object number 7 and generation number 0. This object is known as the Encryption dictionary and contains all encryption-related information.

The following is an example of an Encryption dictionary:

It might seem very complicated, but let’s go through each of the entries in the dictionary to understand what it means.
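As a rough illustration of that check, the snippet below embeds a made-up trailer and pulls out the Encrypt reference with a regular expression. The helper name `findEncryptRef` and every trailer value other than `7 0 R` are assumptions for this sketch; a real reader parses the trailer properly (and also has to handle cross-reference streams or an Encrypt entry written as a direct dictionary) instead of pattern-matching text.

```javascript
// Illustrative trailer text. Only "/Encrypt 7 0 R" comes from the article;
// the other object numbers and the ID value are made up for this sketch.
const trailerText = `trailer
<< /Size 8
   /Root 1 0 R
   /Info 6 0 R
   /Encrypt 7 0 R
   /ID [<0123456789abcdef0123456789abcdef> <0123456789abcdef0123456789abcdef>]
>>`;

// Returns the indirect reference to the Encryption dictionary, or null when
// the trailer has no indirect Encrypt entry (treated here as "not encrypted").
function findEncryptRef(trailer) {
  const match = /\/Encrypt\s+(\d+)\s+(\d+)\s+R/.exec(trailer);
  if (!match) return null;
  return {
    objectNumber: Number(match[1]),
    generationNumber: Number(match[2]),
  };
}
```

Calling `findEncryptRef(trailerText)` yields object number 7 and generation number 0, which is where a reader would then look for the Encryption dictionary.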

The following fields are only used if /Filter is /Standard:

A security handler is a software module that implements various aspects of the encryption process and controls access to the contents of the encrypted document. While the PDF standard specifies a “Standard password-based security handler” that every Conforming Reader must support, a Conforming Reader could add a custom security handler to enhance the security of an encrypted PDF file.

Crypt filters provide finer-grained control over encryption within a PDF file (only for PDF 1.5 and above, with the /V entry equal to 4). For example, you could define different encryption mechanisms for streams ( /StmF ) and strings ( /StrF ) using different crypt filters defined in the /CF entry of the Encryption dictionary. There are a couple of entries in a crypt filter that we should understand, as follows:

Furthermore, crypt filters also support public-key security handlers (outside the scope of this article).

For the sake of simplicity, this article focuses on the “Standard password-based security handler” and the algorithm used in conjunction with the crypt filter shown in the example above.
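To make the relationship between /CF, /StmF and /StrF more concrete, here is a small sketch that models an Encryption dictionary as a plain JavaScript object and looks up which method applies to streams or strings. The object shape and the `cryptMethodFor` helper are hypothetical (this is not pdf-lib's API), the /O and /U entries are left out, and the values are illustrative rather than taken from the original example.

```javascript
// Hypothetical in-memory form of an Encryption dictionary (values illustrative).
const encryptDict = {
  Filter: 'Standard',
  V: 4,
  R: 4,
  Length: 128, // key length in bits
  CF: {
    // A crypt filter named StdCF that uses AES (AESV2) with a 16-byte key.
    StdCF: { CFM: 'AESV2', AuthEvent: 'DocOpen', Length: 16 },
  },
  StmF: 'StdCF', // filter applied to streams
  StrF: 'StdCF', // filter applied to strings
  P: -3896,
};

// Resolves the crypt filter method ('None', 'V2' for RC4, or 'AESV2') for a
// given kind of data. A missing /StmF or /StrF means the Identity filter,
// that is, the data is left unencrypted.
function cryptMethodFor(dict, kind) {
  const filterName = kind === 'stream' ? dict.StmF : dict.StrF;
  if (filterName === undefined || filterName === 'Identity') return 'None';
  const filter = dict.CF && dict.CF[filterName];
  if (!filter) throw new Error(`Unknown crypt filter: ${filterName}`);
  return filter.CFM || 'None';
}
```

With the dictionary above, both `cryptMethodFor(encryptDict, 'stream')` and `cryptMethodFor(encryptDict, 'string')` resolve to `'AESV2'`; pointing /StrF at a second filter with `CFM: 'V2'` would encrypt strings with RC4 instead.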

The encryption of data in a PDF file is based on an encryption key computed by the security handler. While we could define a custom security handler with a different mechanism to compute the encryption key, the process of encrypting the data never changes (it follows “Algorithm 1: Encryption of data using the RC4 or AES algorithms” defined in the specification).
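The part of Algorithm 1 that is easy to miss is that data is not encrypted with the document-wide key directly: each indirect object gets its own key, derived from the encryption key plus the object and generation numbers. The sketch below shows that derivation for the RC4 and AES (AESV2) cases; the function name `deriveObjectKey` and its parameters are assumptions for illustration, not pdf-lib's API.

```javascript
const crypto = require('crypto');

// Derives the per-object key: MD5 over the encryption key, the low-order
// 3 bytes of the object number and the low-order 2 bytes of the generation
// number (both little-endian), plus the bytes "sAlT" when AES is used.
function deriveObjectKey(encryptionKey, objectNumber, generationNumber, useAes = false) {
  const suffix = Buffer.alloc(5);
  suffix.writeUIntLE(objectNumber & 0xffffff, 0, 3);
  suffix.writeUIntLE(generationNumber & 0xffff, 3, 2);

  const parts = [encryptionKey, suffix];
  if (useAes) parts.push(Buffer.from('sAlT', 'latin1'));

  const digest = crypto.createHash('md5').update(Buffer.concat(parts)).digest();
  // Use the first (n + 5) bytes of the hash, capped at 16 bytes.
  return digest.subarray(0, Math.min(encryptionKey.length + 5, 16));
}
```

A 5-byte (40-bit) encryption key therefore produces a 10-byte object key, while a 16-byte key produces a 16-byte one. The resulting key is what is handed to RC4 or AES for that object's strings and streams.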

The next question would be, how do we compute the encryption key following the standard security handler in PDF specification?

The process of computing the encryption key is described in “Algorithm 2: Computing an encryption key” of the PDF specification. It requires two inputs that have to be computed beforehand, namely the O entry of the Encryption dictionary and the ID entry of the Trailer dictionary.

Computing the U and O entries involves padding the password (to ensure a consistent length), MD5 hashing and RC4 encryption.

The U entry is computed using either “Algorithm 4: Computing the encryption dictionary’s U (user password) value (Security handlers of revision 2)” or “Algorithm 5: Computing the encryption dictionary’s U (user password) value (Security handlers of revision 3 or greater)”. Both algorithms start from the encryption key, so U can only be finalized once that key has been computed.

Sample code for computing U entry
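The original snippet is not reproduced here, so the following is a minimal sketch of Algorithms 4 and 5 rather than the code that went into the pull request. It assumes the encryption key has already been computed, implements RC4 by hand (Node's built-in `rc4` cipher may be unavailable in newer OpenSSL builds), and uses hypothetical function names.

```javascript
const crypto = require('crypto');

// 32-byte padding string defined by the standard security handler.
const PADDING = Buffer.from(
  '28bf4e5e4e758a4164004e56fffa01082e2e00b6d0683e802f0ca9fe6453697a',
  'hex'
);

// Plain RC4: key scheduling followed by keystream generation and XOR.
function rc4(key, data) {
  const s = Array.from({ length: 256 }, (_, i) => i);
  for (let i = 0, j = 0; i < 256; i++) {
    j = (j + s[i] + key[i % key.length]) & 0xff;
    [s[i], s[j]] = [s[j], s[i]];
  }
  const out = Buffer.alloc(data.length);
  for (let n = 0, i = 0, j = 0; n < data.length; n++) {
    i = (i + 1) & 0xff;
    j = (j + s[i]) & 0xff;
    [s[i], s[j]] = [s[j], s[i]];
    out[n] = data[n] ^ s[(s[i] + s[j]) & 0xff];
  }
  return out;
}

// Algorithm 4 (revision 2): RC4-encrypt the padding string with the key.
function computeUserEntryR2(encryptionKey) {
  return rc4(encryptionKey, PADDING);
}

// Algorithm 5 (revision 3 or greater): hash the padding string and the first
// ID string, then apply RC4 20 times, XOR-ing the key with the round counter.
function computeUserEntryR3(encryptionKey, firstId) {
  let out = crypto.createHash('md5').update(PADDING).update(firstId).digest();
  for (let round = 0; round < 20; round++) {
    const roundKey = Uint8Array.from(encryptionKey, (b) => b ^ round);
    out = rc4(roundKey, out);
  }
  // The entry is 32 bytes long; the last 16 bytes are arbitrary padding.
  return Buffer.concat([out, Buffer.alloc(16)]);
}
```

Because RC4 is symmetric, a reader can validate a revision 2 user password by decrypting U with the candidate key and checking that the result equals the padding string.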

The O entry is computed using “Algorithm 3: Computing the encryption dictionary’s O (owner password) value”.

Sample code for computing O entry
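As with the U entry, what follows is a hedged sketch of Algorithm 3 and not the code from the pull request. The function names are hypothetical, passwords are treated as Latin-1 bytes (a simplification of the encoding the specification calls for), and RC4 is implemented by hand.

```javascript
const crypto = require('crypto');

// 32-byte padding string defined by the standard security handler.
const PADDING = Buffer.from(
  '28bf4e5e4e758a4164004e56fffa01082e2e00b6d0683e802f0ca9fe6453697a',
  'hex'
);

const md5 = (data) => crypto.createHash('md5').update(data).digest();

// Truncates or pads a password to exactly 32 bytes.
function padPassword(password) {
  const bytes = Buffer.from(password, 'latin1');
  return Buffer.concat([bytes, PADDING]).subarray(0, 32);
}

// Plain RC4: key scheduling followed by keystream generation and XOR.
function rc4(key, data) {
  const s = Array.from({ length: 256 }, (_, i) => i);
  for (let i = 0, j = 0; i < 256; i++) {
    j = (j + s[i] + key[i % key.length]) & 0xff;
    [s[i], s[j]] = [s[j], s[i]];
  }
  const out = Buffer.alloc(data.length);
  for (let n = 0, i = 0, j = 0; n < data.length; n++) {
    i = (i + 1) & 0xff;
    j = (j + s[i]) & 0xff;
    [s[i], s[j]] = [s[j], s[i]];
    out[n] = data[n] ^ s[(s[i] + s[j]) & 0xff];
  }
  return out;
}

// Algorithm 3: the owner password produces an RC4 key, which then encrypts
// the padded user password. keyLengthBytes is ignored for revision 2.
function computeOwnerEntry(ownerPassword, userPassword, revision, keyLengthBytes) {
  // Fall back to the user password when no owner password is given.
  let digest = md5(padPassword(ownerPassword || userPassword));
  if (revision >= 3) {
    for (let i = 0; i < 50; i++) digest = md5(digest);
  }
  const baseKey = digest.subarray(0, revision === 2 ? 5 : keyLengthBytes);

  let out = padPassword(userPassword);
  const rounds = revision >= 3 ? 20 : 1;
  for (let round = 0; round < rounds; round++) {
    out = rc4(Uint8Array.from(baseKey, (b) => b ^ round), out);
  }
  return out; // 32 bytes, stored as the O entry
}
```

For example, `computeOwnerEntry('owner', 'user', 3, 16)` returns the 32-byte value for a 128-bit, revision 3 handler. Note that O depends only on the two passwords, which is why it can be computed before the encryption key.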

The PDF specification does not mandate how the ID entry should be computed. However, since it is also used as a unique file identifier, we could pass the contents of the Info entry from the Trailer dictionary through an MD5 hash to generate the required byte strings.

Sample code for computing ID entry
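One possible way to do this is sketched below, under the assumption that the Info dictionary is available as a plain object. The field names in the example and the `key=value` serialization are made up for illustration; any stable serialization would work equally well.

```javascript
const crypto = require('crypto');

// Hashes the entries of a (hypothetical) Info object into a 16-byte ID.
function generateFileId(info) {
  const hash = crypto.createHash('md5');
  for (const [key, value] of Object.entries(info)) {
    hash.update(`${key}=${value}\n`, 'utf8');
  }
  return hash.digest();
}

// Example usage with illustrative values.
const fileId = generateFileId({
  Title: 'Example document',
  Producer: 'example-producer',
  CreationDate: 'D:20210101000000Z',
});
```

The same 16 bytes would typically be written twice in the trailer's ID array for a newly created file, and the first element is the one fed into the key computation.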

(Fun fact: No matter how the ID entry is computed, it will not cause encryption or decryption to fail, as long as the value stored in the file is the same one used to compute the key.)

Once we have all the ingredients, we can compute the encryption key as follows:

Sample code for computing encryption key
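Here is a minimal sketch of Algorithm 2, assuming the O entry and the first ID string are already available as Buffers. The function and option names are hypothetical, and passwords are again treated as Latin-1 bytes for simplicity.

```javascript
const crypto = require('crypto');

// 32-byte padding string defined by the standard security handler.
const PADDING = Buffer.from(
  '28bf4e5e4e758a4164004e56fffa01082e2e00b6d0683e802f0ca9fe6453697a',
  'hex'
);

// Truncates or pads a password to exactly 32 bytes.
function padPassword(password) {
  const bytes = Buffer.from(password, 'latin1');
  return Buffer.concat([bytes, PADDING]).subarray(0, 32);
}

// Algorithm 2: MD5 over the padded user password, the O entry, the /P value
// as 4 little-endian bytes and the first ID string.
function computeEncryptionKey({
  userPassword,
  ownerEntry,
  permissions,
  firstId,
  revision,
  keyLengthBytes,
  encryptMetadata = true,
}) {
  const p = Buffer.alloc(4);
  p.writeInt32LE(permissions | 0, 0);

  const hash = crypto
    .createHash('md5')
    .update(padPassword(userPassword))
    .update(ownerEntry)
    .update(p)
    .update(firstId);
  // Revision 4 or greater: extra bytes when metadata is left unencrypted.
  if (revision >= 4 && !encryptMetadata) {
    hash.update(Buffer.from('ffffffff', 'hex'));
  }

  const n = revision === 2 ? 5 : keyLengthBytes;
  let digest = hash.digest();
  if (revision >= 3) {
    // Re-hash the first n bytes of the previous digest 50 times.
    for (let i = 0; i < 50; i++) {
      digest = crypto.createHash('md5').update(digest.subarray(0, n)).digest();
    }
  }
  return digest.subarray(0, n);
}
```

For a revision 2 handler this always yields a 5-byte (40-bit) key; for revision 3 or greater the length follows the /Length entry, for example 16 bytes for 128-bit encryption.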

With the computed encryption key, we can create an encryption function (which encrypts the actual data in the PDF file) for use by the PDFWriter (the code that saves the PDF in binary format).

Sample code to generate Encryption Function
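A sketch of what such a function could look like for the AESV2 crypt filter is shown below. It is an assumption-laden illustration and not pdf-lib's or pdfkit's actual code: `createAesEncryptor` is a hypothetical name, a 16-byte (128-bit) encryption key is assumed, and Node's `aes-128-cbc` cipher supplies the CBC mode and padding.

```javascript
const crypto = require('crypto');

// Returns a function that encrypts the bytes of one string or stream, given
// the object and generation number of the indirect object that contains it.
function createAesEncryptor(encryptionKey) {
  if (encryptionKey.length !== 16) {
    throw new Error('This sketch assumes a 128-bit (16-byte) encryption key');
  }
  return function encrypt(objectNumber, generationNumber, data) {
    // Per-object key: MD5(key + object number + generation number + "sAlT").
    const suffix = Buffer.alloc(5);
    suffix.writeUIntLE(objectNumber & 0xffffff, 0, 3);
    suffix.writeUIntLE(generationNumber & 0xffff, 3, 2);
    const objectKey = crypto
      .createHash('md5')
      .update(Buffer.concat([encryptionKey, suffix, Buffer.from('sAlT', 'latin1')]))
      .digest();

    // AES-128 in CBC mode with a random IV stored in front of the ciphertext.
    const iv = crypto.randomBytes(16);
    const cipher = crypto.createCipheriv('aes-128-cbc', objectKey, iv);
    return Buffer.concat([iv, cipher.update(data), cipher.final()]);
  };
}
```

A writer would create the encryptor once per document and call it for every string and stream it serializes, passing the numbers of the enclosing indirect object. Because the IV is random, encrypting the same data twice produces different bytes.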

Permission Flag

Coming back to the /P entry in the Encryption dictionary, it is also known as the user access permission. It is an unsigned 32-bit integer containing a set of flags specifying which access permissions shall be granted when a document is opened with the user password. Only bit positions 3, 4, 5, 6, 9, 10, 11 and 12 are meaningful.

Extracted from PDF 32000–1:2008 (Table 22 — User Access Permission)

In the example above, the /P entry has a value of -3896 in decimal (read as a signed integer), whose low-order 16 bits are 1111000011001000 in binary. Of the meaningful bit positions, only bit 4 is set, so the user would be able to modify the file but not print it.
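The bit arithmetic can be checked with a few lines of JavaScript. The permission names below are shorthand labels chosen for this sketch (bit positions are counted from 1, starting at the low-order bit), and `decodePermissions` and `low16Bits` are hypothetical helpers.

```javascript
// Meaningful bit positions of /P, labeled with shorthand names.
const PERMISSION_BITS = {
  print: 3,
  modify: 4,
  copy: 5,
  annotate: 6,
  fillForms: 9,
  extractForAccessibility: 10,
  assemble: 11,
  printHighQuality: 12,
};

// Maps each permission name to true when its bit is set in /P.
function decodePermissions(p) {
  const result = {};
  for (const [name, bit] of Object.entries(PERMISSION_BITS)) {
    // >>> treats p as an unsigned 32-bit value, so negative /P values work.
    result[name] = ((p >>> (bit - 1)) & 1) === 1;
  }
  return result;
}

// Low-order 16 bits of /P as a binary string.
function low16Bits(p) {
  return (p >>> 0).toString(2).padStart(32, '0').slice(-16);
}
```

For `-3896`, `low16Bits` gives `1111000011001000` and `decodePermissions` reports `modify` as the only permission that is granted.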

With all the above steps in place, all that is left is to plug the code in and ensure that the encryption function is run at the right place for the right data in the PDFWriter of pdf-lib. Many thanks to the security module of [pdfkit](https://github.com/foliojs/pdfkit) for the code implementing the encryption algorithms.

The link to the Pull Request for the encryption feature in pdf-lib is below.

Feature: Document Encryption by PhakornKiong · Pull Request #917 · Hopding/pdf-lib
Document encryption feature development under #243. Able to encrypt newly created document using PDFWriter. Still facing… (github.com)

Useful References

  1. Developing with PDF by Leonard Rosenthol (Chapter 1. PDF Syntax)
  2. PDF 32000–1:2008

