Implementing Encryption Feature in pdf-lib

At work, a requirement to password-protect existing PDF files came up, and I was tasked with adding this feature to our existing Express backend.

I expected it to be a straightforward task, since there are probably plenty of libraries on npm that provide this functionality.

For the sake of simplicity, password protection and encryption are used interchangeably in this article.

After half a day of searching, I realized that few libraries support modifying existing PDF files, let alone encrypting them. The most notable candidate is [hummusRecipe](https://github.com/chunyenHuang/hummusRecipe) (mostly written in C/C++). However, it does not support output to a Buffer (as of this writing), so I would have to save the encrypted PDF file to the local disk and load it back as a Buffer, which is extremely inefficient.

My next preferred candidate was [pdf-lib](https://github.com/Hopding/pdf-lib). The library provides many useful PDF modification features and works in all JavaScript environments. However, it does not yet support encryption.

So I thought to myself, can I implement this feature?

To get started, I read the PDF Specification to understand the overall structure of a PDF file. I’ve written a short article about it here.

Encryption in PDF Specification

A Conforming Reader determines whether a PDF file is encrypted by first looking for the /Encrypt entry in the Trailer dictionary. If the entry exists, the file is encrypted, and the Conforming Reader goes through the decryption routine to extract the content before rendering the file.

The following is an example of the Trailer dictionary of an encrypted PDF file.

The /Encrypt entry references an indirect object with object number 7 and generation number 0 (written as 7 0 R). This object is known as the Encryption dictionary and contains all encryption-related information.
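As a rough illustration of that lookup, the TypeScript sketch below scans raw PDF text for the last `trailer` keyword and extracts the /Encrypt reference. `findEncryptRef` is a hypothetical helper, not part of pdf-lib. It assumes a classic cross-reference table and an indirect reference like the 7 0 R above; a real parser has to handle far more cases, such as cross-reference streams and incremental updates.

```typescript
// Reference to an indirect object, e.g. "7 0 R".
interface IndirectRef {
  objectNumber: number;
  generation: number;
}

// Naive sketch: finds the last "trailer" keyword and extracts "/Encrypt n g R".
// Assumes a classic xref table; files using cross-reference streams (PDF 1.5+)
// keep these entries in the xref stream dictionary instead, which this ignores.
function findEncryptRef(pdfText: string): IndirectRef | undefined {
  const start = pdfText.lastIndexOf("trailer");
  if (start === -1) return undefined;
  const match = /\/Encrypt\s+(\d+)\s+(\d+)\s+R/.exec(pdfText.slice(start));
  if (match === null) return undefined; // no /Encrypt entry: not encrypted
  return { objectNumber: Number(match[1]), generation: Number(match[2]) };
}

// Hypothetical trailer; only the /Encrypt entry mirrors the example above.
const sample = "trailer\n<< /Size 8 /Root 1 0 R /Encrypt 7 0 R >>\nstartxref\n1234\n%%EOF";
console.log(findEncryptRef(sample)); // { objectNumber: 7, generation: 0 }
```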

The following is an example of an Encryption dictionary.

It might seem complicated, but let’s go through each entry in the dictionary to understand what it means.

  • /Filter — Identifies the security handler for the document. /Standard means the “Standard password-based security handler” is used.

  • /V — Specifies the algorithm to be used in encrypting and decrypting the document.

  • /Length — Length of the encryption key, in bits

  • /CF — A dictionary whose keys are crypt filter names and whose values are the corresponding crypt filter dictionaries (only meaningful for PDF 1.5 and above)

  • /StmF — Name of the crypt filter to be used by default when decrypting streams in the document

  • /StrF — Name of the crypt filter to be used by default when decrypting all strings in the document

The following fields are only used if /Filter is /Standard:

  • /R — A number specifying the revision of the standard security handler

  • /O — A 32-byte string based on both the owner and user passwords. Used in computing the encryption key and determining whether a valid owner password was entered

  • /U — A 32-byte string based on the user password. Used in determining whether a valid owner or user password was entered

  • /P — A set of flags specifying which operations are permitted when the document is opened with the user password. We will get back to this later in the article.
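To keep these entries straight in code, the dictionary can be modelled as a TypeScript type and serialized into PDF dictionary syntax. This is a minimal sketch: the type and function names are hypothetical (not pdf-lib’s API), and the values in the usage example, including the crypt filter name `StdCF` and the all-zero O and U strings, are placeholders.

```typescript
type CryptMethod = "None" | "V2" | "AESV2";

interface CryptFilter {
  CFM: CryptMethod;
  AuthEvent: "DocOpen" | "EFOpen";
  Length: number; // key length in bits
}

// Entries of an Encryption dictionary for the Standard security handler.
interface StandardEncryptDict {
  Filter: "Standard";
  V: number;
  Length: number; // key length in bits
  CF?: Record<string, CryptFilter>;
  StmF?: string;
  StrF?: string;
  R: number;
  O: Uint8Array; // 32 bytes
  U: Uint8Array; // 32 bytes
  P: number; // permission flags, written as a signed 32-bit integer
}

const toHex = (bytes: Uint8Array): string =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");

// Serializes the dictionary; byte strings are written as hex strings <...>.
function serializeEncryptDict(d: StandardEncryptDict): string {
  const parts = [`/Filter /${d.Filter}`, `/V ${d.V}`, `/Length ${d.Length}`];
  if (d.CF) {
    const filters = Object.entries(d.CF)
      .map(([name, f]) => `/${name} << /CFM /${f.CFM} /AuthEvent /${f.AuthEvent} /Length ${f.Length} >>`)
      .join(" ");
    parts.push(`/CF << ${filters} >>`);
  }
  if (d.StmF) parts.push(`/StmF /${d.StmF}`);
  if (d.StrF) parts.push(`/StrF /${d.StrF}`);
  parts.push(`/R ${d.R}`, `/O <${toHex(d.O)}>`, `/U <${toHex(d.U)}>`, `/P ${d.P}`);
  return `<< ${parts.join(" ")} >>`;
}

// Placeholder values; real O and U come from Algorithms 3 and 4/5.
const dict: StandardEncryptDict = {
  Filter: "Standard", V: 4, Length: 128,
  CF: { StdCF: { CFM: "V2", AuthEvent: "DocOpen", Length: 128 } },
  StmF: "StdCF", StrF: "StdCF",
  R: 4, O: new Uint8Array(32), U: new Uint8Array(32), P: -3896,
};
console.log(serializeEncryptDict(dict));
```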

A security handler is a software module that implements various aspects of the encryption process and controls access to the contents of the encrypted document. While the PDF standard specifies a “Standard password-based security handler” that every Conforming Reader must support, Conforming Readers may add custom security handlers to enhance the security of encrypted PDF files.

Crypt filters provide finer-grained control over encryption within a PDF file (only for PDF 1.5 and above, when the /V entry is 4). For example, you could apply different encryption methods to streams (/StmF) and strings (/StrF) by pointing them to different crypt filters defined in the /CF entry of the Encryption dictionary. A couple of crypt filter entries are worth understanding:

  • /CFM — Defines the method used by a Conforming Reader to decrypt data. Can be None, V2 or AESV2. The value V2 uses the RC4 algorithm, while AESV2 uses the AES algorithm in Cipher Block Chaining (CBC) mode with a 16-byte block size.

  • /AuthEvent — Defines the event used to trigger authorization to access the encryption key used by this filter. Either DocOpen or EFOpen. The value DocOpen means authorization is required when the document is opened, while EFOpen means authorization is required when accessing embedded files. Defaults to DocOpen.

  • /Length — The bit length of the encryption key. A multiple of 8 in the range of 40 to 128.
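The sketch below shows how a reader might interpret these crypt filter entries: it fills in the defaults, enforces the /Length rule above, and maps /CFM to a cipher. The names are hypothetical, treating a missing /CFM as None is an assumption of this sketch, and a production reader may need to be more lenient with values found in real files.

```typescript
type CryptMethod = "None" | "V2" | "AESV2";
type AuthEvent = "DocOpen" | "EFOpen";

// A crypt filter dictionary as read from /CF (all entries optional).
interface CryptFilterDict {
  CFM?: CryptMethod;
  AuthEvent?: AuthEvent;
  Length?: number; // bits
}

interface ResolvedFilter {
  cipher: "identity" | "rc4" | "aes-128-cbc";
  authEvent: AuthEvent;
  keyLengthBits?: number;
}

function resolveCryptFilter(cf: CryptFilterDict): ResolvedFilter {
  const method: CryptMethod = cf.CFM ?? "None"; // assumption: absent /CFM means None
  const authEvent: AuthEvent = cf.AuthEvent ?? "DocOpen"; // default described above
  if (cf.Length !== undefined && (cf.Length % 8 !== 0 || cf.Length < 40 || cf.Length > 128)) {
    throw new RangeError(`Invalid crypt filter /Length: ${cf.Length}`);
  }
  // V2 = RC4; AESV2 = AES in CBC mode (16-byte blocks); None = data left as-is.
  const cipher = method === "V2" ? "rc4" : method === "AESV2" ? "aes-128-cbc" : "identity";
  return { cipher, authEvent, keyLengthBits: cf.Length };
}

console.log(resolveCryptFilter({ CFM: "AESV2", Length: 128 }));
// { cipher: 'aes-128-cbc', authEvent: 'DocOpen', keyLengthBits: 128 }
```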

Crypt filters also support public-key security handler implementations (not in the scope of this article).

For the sake of simplicity, this article focuses on the “Standard password-based security handler” and the algorithm used with the crypt filter shown in the example above.

The encryption of data in a PDF file is based on an encryption key computed by the security handler. While a custom security handler could compute the encryption key using a different mechanism, the process of encrypting the data itself never changes (it follows “Algorithm 1: Encryption of data using the RC4 or AES algorithms” defined in the specification).

The next question is: how do we compute the encryption key using the standard security handler from the PDF specification?

The process of computing the encryption key is described in “Algorithm 2: Computing an encryption key” of the PDF specification. Besides the user password and the /P value, it requires two inputs that have to be computed beforehand: the O entry of the Encryption dictionary and the ID entry (its first element) of the Trailer dictionary.

Computing the U and O entries involves padding (to a consistent 32-byte length), MD5 hashing and RC4 encryption.
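The padding step shared by these algorithms truncates or pads the password to exactly 32 bytes using a fixed padding string from the specification. A minimal TypeScript sketch, assuming an ASCII password so that each character maps to one byte (this skips the character-encoding conversion a complete implementation needs):

```typescript
// 32-byte padding string defined in the PDF specification (Algorithm 2, step a).
const PASSWORD_PADDING = new Uint8Array([
  0x28, 0xbf, 0x4e, 0x5e, 0x4e, 0x75, 0x8a, 0x41, 0x64, 0x00, 0x4e, 0x56, 0xff, 0xfa, 0x01, 0x08,
  0x2e, 0x2e, 0x00, 0xb6, 0xd0, 0x68, 0x3e, 0x80, 0x2f, 0x0c, 0xa9, 0xfe, 0x64, 0x53, 0x69, 0x7a,
]);

// Truncates or pads a password to exactly 32 bytes.
// Assumption: ASCII password, so each char code fits in one byte.
function padPassword(password: string): Uint8Array {
  const out = new Uint8Array(32);
  const n = Math.min(password.length, 32);
  for (let i = 0; i < n; i++) out[i] = password.charCodeAt(i) & 0xff;
  // Fill the remaining 32 - n bytes with the start of the padding string.
  out.set(PASSWORD_PADDING.subarray(0, 32 - n), n);
  return out;
}

console.log(Array.from(padPassword("abc").subarray(0, 5))); // [ 97, 98, 99, 40, 191 ]
```

An empty password therefore becomes the padding string itself, which is why a document with no user password can still be opened without prompting.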

The U entry is computed using either “Algorithm 4: Computing the encryption dictionary’s U (user password) value (Security handlers of revision 2)” or “Algorithm 5: Computing the encryption dictionary’s U (user password) value (Security handlers of revision 3 or greater)”. Both algorithms start from the encryption key produced by Algorithm 2.

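A minimal TypeScript sketch of Algorithms 4 and 5, assuming a Node.js runtime for MD5 (via the built-in `crypto` module) and a hand-written RC4, since recent Node.js builds may not expose RC4 through `crypto`. `computeU` and its parameters are hypothetical names; `key` is assumed to be the output of Algorithm 2 and `id0` the first element of the ID array.

```typescript
import { createHash } from "crypto";

// 32-byte padding string from the PDF specification.
const PASSWORD_PADDING = new Uint8Array([
  0x28, 0xbf, 0x4e, 0x5e, 0x4e, 0x75, 0x8a, 0x41, 0x64, 0x00, 0x4e, 0x56, 0xff, 0xfa, 0x01, 0x08,
  0x2e, 0x2e, 0x00, 0xb6, 0xd0, 0x68, 0x3e, 0x80, 0x2f, 0x0c, 0xa9, 0xfe, 0x64, 0x53, 0x69, 0x7a,
]);

// MD5 over the concatenation of all parts.
function md5(...parts: Uint8Array[]): Uint8Array {
  const hash = createHash("md5");
  for (const part of parts) hash.update(part);
  return new Uint8Array(hash.digest());
}

// RC4 stream cipher; encryption and decryption are the same operation.
function rc4(key: Uint8Array, data: Uint8Array): Uint8Array {
  const s = new Uint8Array(256);
  for (let i = 0; i < 256; i++) s[i] = i;
  for (let i = 0, j = 0; i < 256; i++) { // key scheduling
    j = (j + s[i] + key[i % key.length]) & 0xff;
    [s[i], s[j]] = [s[j], s[i]];
  }
  const out = new Uint8Array(data.length);
  for (let k = 0, i = 0, j = 0; k < data.length; k++) { // keystream XOR data
    i = (i + 1) & 0xff;
    j = (j + s[i]) & 0xff;
    [s[i], s[j]] = [s[j], s[i]];
    out[k] = data[k] ^ s[(s[i] + s[j]) & 0xff];
  }
  return out;
}

// XORs every key byte with the iteration counter i.
const xorKey = (key: Uint8Array, i: number): Uint8Array => key.map((b) => b ^ i);

function computeU(key: Uint8Array, id0: Uint8Array, revision: number): Uint8Array {
  // Algorithm 4 (revision 2): RC4-encrypt the padding string.
  if (revision === 2) return rc4(key, PASSWORD_PADDING);
  // Algorithm 5: MD5(padding + ID[0]), encrypt, then 19 more RC4 passes.
  let value = rc4(key, md5(PASSWORD_PADDING, id0));
  for (let i = 1; i <= 19; i++) value = rc4(xorKey(key, i), value);
  const u = new Uint8Array(32); // last 16 bytes: arbitrary padding (zeros here)
  u.set(value);
  return u;
}

// Placeholder inputs: a real key comes from Algorithm 2, id0 from the trailer.
const demoKey = new Uint8Array(16).fill(1);
const demoId = new Uint8Array(16).fill(2);
console.log(computeU(demoKey, demoId, 4).length); // 32
```

When checking a password, a reader recomputes U and compares only the first 16 bytes, since the rest is arbitrary.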

The O entry is computed using “Algorithm 3: Computing the encryption dictionary’s O (owner password) value”.

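Algorithm 3 can be sketched the same way, again assuming Node.js for MD5, an inline RC4 and ASCII passwords. `ownerKey` and `computeO` are hypothetical helper names.

```typescript
import { createHash } from "crypto";

const PASSWORD_PADDING = new Uint8Array([
  0x28, 0xbf, 0x4e, 0x5e, 0x4e, 0x75, 0x8a, 0x41, 0x64, 0x00, 0x4e, 0x56, 0xff, 0xfa, 0x01, 0x08,
  0x2e, 0x2e, 0x00, 0xb6, 0xd0, 0x68, 0x3e, 0x80, 0x2f, 0x0c, 0xa9, 0xfe, 0x64, 0x53, 0x69, 0x7a,
]);

// Pads/truncates an ASCII password to 32 bytes.
function padPassword(password: string): Uint8Array {
  const out = new Uint8Array(32);
  const n = Math.min(password.length, 32);
  for (let i = 0; i < n; i++) out[i] = password.charCodeAt(i) & 0xff;
  out.set(PASSWORD_PADDING.subarray(0, 32 - n), n);
  return out;
}

function md5(...parts: Uint8Array[]): Uint8Array {
  const hash = createHash("md5");
  for (const part of parts) hash.update(part);
  return new Uint8Array(hash.digest());
}

// RC4 stream cipher (symmetric).
function rc4(key: Uint8Array, data: Uint8Array): Uint8Array {
  const s = new Uint8Array(256);
  for (let i = 0; i < 256; i++) s[i] = i;
  for (let i = 0, j = 0; i < 256; i++) {
    j = (j + s[i] + key[i % key.length]) & 0xff;
    [s[i], s[j]] = [s[j], s[i]];
  }
  const out = new Uint8Array(data.length);
  for (let k = 0, i = 0, j = 0; k < data.length; k++) {
    i = (i + 1) & 0xff;
    j = (j + s[i]) & 0xff;
    [s[i], s[j]] = [s[j], s[i]];
    out[k] = data[k] ^ s[(s[i] + s[j]) & 0xff];
  }
  return out;
}

const xorKey = (key: Uint8Array, i: number): Uint8Array => key.map((b) => b ^ i);

// Steps a-d: RC4 key from the padded owner password.
function ownerKey(ownerPassword: string, revision: number, keyLengthBits: number): Uint8Array {
  let hash = md5(padPassword(ownerPassword));
  if (revision >= 3) for (let i = 0; i < 50; i++) hash = md5(hash); // 50 extra rounds
  const n = revision === 2 ? 5 : keyLengthBits / 8;
  return hash.slice(0, n);
}

// Steps e-h: encrypt the padded user password to get the 32-byte O entry.
function computeO(ownerPassword: string, userPassword: string, revision: number, keyLengthBits: number): Uint8Array {
  // An empty owner password falls back to the user password (step a).
  const key = ownerKey(ownerPassword || userPassword, revision, keyLengthBits);
  let value = rc4(key, padPassword(userPassword));
  if (revision >= 3) for (let i = 1; i <= 19; i++) value = rc4(xorKey(key, i), value);
  return value;
}

console.log(computeO("owner-secret", "user-secret", 4, 128).length); // 32
```

Because RC4 is symmetric, applying the same 20 keys in reverse order to O recovers the padded user password, which is how an owner password can be checked.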

The PDF specification does not mandate how the ID entry is computed. However, since it also serves as a unique file identifier, we could pass the Info entry from the Trailer dictionary into an MD5 hash to generate the required byte strings.

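A possible sketch of that approach in TypeScript, assuming Node.js for MD5. The serialization of the Info entries (sorted `/Key (value)` pairs) is an illustrative choice, the helper names are hypothetical, and the Info values in the usage line are placeholders.

```typescript
import { createHash } from "crypto";

// Hypothetical helper: MD5 over a stable serialization of the Info entries.
function computeFileId(info: Record<string, string>): Uint8Array {
  const hash = createHash("md5");
  // Sort keys so the same Info dictionary always yields the same ID.
  for (const key of Object.keys(info).sort()) {
    hash.update(`/${key} (${info[key]})`);
  }
  return new Uint8Array(hash.digest());
}

// Formats the trailer's ID entry; both elements are identical for a new file.
function formatIdEntry(id: Uint8Array): string {
  const hex = Buffer.from(id).toString("hex");
  return `/ID [<${hex}> <${hex}>]`;
}

// Placeholder Info values for illustration.
const fileId = computeFileId({ Title: "Quarterly Report", Producer: "pdf-lib" });
console.log(formatIdEntry(fileId));
```

Algorithms 2 and 5 use only the first element of this array, so that is the value that must match between encryption and the Trailer dictionary.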

(Fun fact: no matter how the ID entry is computed, it will not cause encryption or decryption to fail, as long as it stays constant, i.e. the value used during encryption is the one written to the Trailer dictionary.)

Once we have all the ingredients, we can compute the encryption key.

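A minimal TypeScript sketch of Algorithm 2, assuming Node.js for MD5 and an ASCII user password. The function name and parameter order are hypothetical; `o` is the 32-byte O entry, `p` the /P value and `id0` the first element of the ID array.

```typescript
import { createHash } from "crypto";

const PASSWORD_PADDING = new Uint8Array([
  0x28, 0xbf, 0x4e, 0x5e, 0x4e, 0x75, 0x8a, 0x41, 0x64, 0x00, 0x4e, 0x56, 0xff, 0xfa, 0x01, 0x08,
  0x2e, 0x2e, 0x00, 0xb6, 0xd0, 0x68, 0x3e, 0x80, 0x2f, 0x0c, 0xa9, 0xfe, 0x64, 0x53, 0x69, 0x7a,
]);

function padPassword(password: string): Uint8Array {
  const out = new Uint8Array(32);
  const n = Math.min(password.length, 32);
  for (let i = 0; i < n; i++) out[i] = password.charCodeAt(i) & 0xff;
  out.set(PASSWORD_PADDING.subarray(0, 32 - n), n);
  return out;
}

function md5(...parts: Uint8Array[]): Uint8Array {
  const hash = createHash("md5");
  for (const part of parts) hash.update(part);
  return new Uint8Array(hash.digest());
}

function computeEncryptionKey(
  userPassword: string,
  o: Uint8Array,
  p: number,
  id0: Uint8Array,
  revision: number,
  keyLengthBits: number,
  encryptMetadata = true,
): Uint8Array {
  const n = revision === 2 ? 5 : keyLengthBits / 8; // key length in bytes
  // P as an unsigned 32-bit value, low-order byte first.
  const pBytes = new Uint8Array(4);
  new DataView(pBytes.buffer).setUint32(0, p >>> 0, true);
  const parts = [padPassword(userPassword), o, pBytes, id0];
  // Revision 4+: extra 0xFFFFFFFF when metadata is left unencrypted.
  if (revision >= 4 && !encryptMetadata) parts.push(new Uint8Array([0xff, 0xff, 0xff, 0xff]));
  let hash = md5(...parts);
  if (revision >= 3) {
    for (let i = 0; i < 50; i++) hash = md5(hash.subarray(0, n)); // re-hash first n bytes
  }
  return hash.slice(0, n);
}

// Placeholder O and ID values; real ones come from Algorithm 3 and the trailer.
const fileKey = computeEncryptionKey("user-secret", new Uint8Array(32), -3896, new Uint8Array(16), 4, 128);
console.log(fileKey.length); // 16
```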

With the computed encryption key, we can create an encryption function (which encrypts the actual data in the PDF file) for use by the PDFWriter (the code that saves the PDF in binary format).

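Algorithm 1 derives a per-object key from the file key plus the object’s number and generation, then encrypts with that key. The sketch below covers only the RC4 (V2) path, assuming Node.js for MD5; `createEncryptFn` is a hypothetical factory, not the hook pdf-lib’s PDFWriter actually uses.

```typescript
import { createHash } from "crypto";

function md5(...parts: Uint8Array[]): Uint8Array {
  const hash = createHash("md5");
  for (const part of parts) hash.update(part);
  return new Uint8Array(hash.digest());
}

// RC4 stream cipher (symmetric).
function rc4(key: Uint8Array, data: Uint8Array): Uint8Array {
  const s = new Uint8Array(256);
  for (let i = 0; i < 256; i++) s[i] = i;
  for (let i = 0, j = 0; i < 256; i++) {
    j = (j + s[i] + key[i % key.length]) & 0xff;
    [s[i], s[j]] = [s[j], s[i]];
  }
  const out = new Uint8Array(data.length);
  for (let k = 0, i = 0, j = 0; k < data.length; k++) {
    i = (i + 1) & 0xff;
    j = (j + s[i]) & 0xff;
    [s[i], s[j]] = [s[j], s[i]];
    out[k] = data[k] ^ s[(s[i] + s[j]) & 0xff];
  }
  return out;
}

type EncryptFn = (objectNumber: number, generation: number, data: Uint8Array) => Uint8Array;

// Returns a function that RC4-encrypts the string or stream data of one indirect object.
function createEncryptFn(key: Uint8Array): EncryptFn {
  return (objectNumber, generation, data) => {
    // Steps a-b: append the low-order 3 bytes of the object number and the
    // low-order 2 bytes of the generation number, low-order byte first.
    const extra = new Uint8Array([
      objectNumber & 0xff, (objectNumber >> 8) & 0xff, (objectNumber >> 16) & 0xff,
      generation & 0xff, (generation >> 8) & 0xff,
    ]);
    // Step c: MD5, keeping the first (n + 5) bytes, at most 16.
    const objectKey = md5(key, extra).slice(0, Math.min(key.length + 5, 16));
    // Step d: RC4 with the per-object key.
    return rc4(objectKey, data);
  };
}

const encrypt = createEncryptFn(new Uint8Array(16).fill(7)); // placeholder file key
const cipherText = encrypt(12, 0, Buffer.from("BT /F1 12 Tf (Hello) Tj ET"));
console.log(cipherText.length); // 26
```

For AESV2, Algorithm 1 additionally appends the bytes "sAlT" before hashing, and the data is encrypted with AES in CBC mode, with a random 16-byte initialization vector placed before the ciphertext.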

Permission Flag

Coming back to the /P entry in the Encryption dictionary: it is also known as the user access permissions. It is interpreted as an unsigned 32-bit integer containing a set of flags specifying which access permissions shall be granted when the document is opened with the user password, although it is written in the file as a signed integer (which is why it is often negative). Only bit positions 3, 4, 5, 6, 9, 10, 11 and 12 are meaningful, with bit 1 being the low-order bit.

Extracted from PDF 32000–1:2008 (Table 22 — User Access Permission)

In the example above, the /P entry has the decimal value -3896, whose low-order 16 bits (in two’s complement) are 1111000011001000. Among the meaningful bit positions, only bit 4 is set (the other 1 bits are reserved positions), so the user would be able to modify the file but not print it.
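The flag arithmetic can be checked in code. A minimal TypeScript sketch, where the permission names are my own labels for the Table 22 bit positions and the reserved-bit base value 0xFFFFF0C0 is inferred from the example value -3896:

```typescript
// Bit positions (1 = low-order bit) from Table 22; names are illustrative labels.
const PERMISSION_BITS = {
  print: 3,
  modify: 4,
  copy: 5,
  annotate: 6,
  fillForms: 9,
  extractForAccessibility: 10,
  assemble: 11,
  printHighQuality: 12,
} as const;

type Permission = keyof typeof PERMISSION_BITS;

// Reserved bits 7, 8 and 13 to 32 set to 1, as the example value -3896 implies;
// bits 1-2 and all permission bits start at 0.
const RESERVED_BASE = 0xfffff0c0;

// Builds the /P value; `| 0` yields the signed 32-bit form written in the file.
function encodePermissions(allowed: Permission[]): number {
  let p = RESERVED_BASE;
  for (const perm of allowed) p |= 1 << (PERMISSION_BITS[perm] - 1);
  return p | 0;
}

// Lists the permissions granted by a /P value.
function decodePermissions(p: number): Permission[] {
  return (Object.keys(PERMISSION_BITS) as Permission[]).filter(
    (perm) => (p & (1 << (PERMISSION_BITS[perm] - 1))) !== 0,
  );
}

console.log(encodePermissions(["modify"])); // -3896
console.log(decodePermissions(-3896)); // [ 'modify' ]
console.log((-3896 >>> 0).toString(2).slice(-16)); // 1111000011001000
```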

With all the above steps in place, all that is left is to plug the code in and ensure that the encryption function runs at the right place, on the right data, in the PDFWriter of pdf-lib. Many thanks to the security module of [pdfkit](https://github.com/foliojs/pdfkit) for the encryption algorithm code.

The Pull Request for the encryption feature in pdf-lib is linked below.

Feature: Document Encryption by PhakornKiong · Pull Request #917 · Hopding/pdf-lib

Useful References

  1. Developing with PDF by Leonard Rosenthol (Chapter 1. PDF Syntax)

  2. PDF 32000–1:2008