DocRaptor

DocRaptor vs Open-Source HTML-to-PDF Libraries

Like many developers, we lean towards using open-source libraries when possible. They’re free and often simpler than commercial tools. And for some PDF documents, an open-source HTML-to-PDF tool is the best answer.

So why did we build the DocRaptor HTML-to-PDF API on top of the PrinceXML commercial engine? After struggling for weeks with open-source libraries, we realized there is a significant technical gap between Prince and all the browser-based HTML-to-PDF libraries:

Prince knows PDFs have multiple pages.

Browsers assume you’re making a long scrolling webpage. When there isn't even a concept of a “page”, repeating elements such as headers or watermarks aren’t supported. This one fact, along with a dedicated focus on PDFs, means Prince’s commercial HTML to PDF engine has far superior support for common requirements such as:

  • Headers and footers
  • Page numbers
  • Page breaks
  • Columns
  • Watermarks
  • Odd/even page styling
  • Cover page styling
  • Tables or images spanning multiple pages
  • Floats across pages
  • Footnotes
  • Varying page sizes or layouts
  • Accessible PDFs
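Several of the features above are expressed through CSS paged-media rules such as `@page`, margin boxes, and page counters. The sketch below builds a small HTML document that uses a few of them. It is illustrative only: the function name `build_paged_html`, the header text, the page size, and the margins are assumptions made for this example, not DocRaptor defaults.

```python
from html import escape

# Paged-media CSS: these rules only have meaning when the renderer
# understands that a document is split into pages.
PAGED_CSS = """
@page {
  size: Letter;
  margin: 1in;
  /* Running header repeated on every page */
  @top-center { content: "Confidential Draft"; }
  /* Page numbers in the footer */
  @bottom-right { content: "Page " counter(page) " of " counter(pages); }
}
/* Cover page styling: no header on the first page */
@page :first {
  @top-center { content: none; }
}
/* Odd/even page styling: extra inside margin for binding */
@page :left  { margin-right: 1.25in; }
@page :right { margin-left: 1.25in; }
/* Page breaks: start every chapter on a new page */
h1.chapter { page-break-before: always; }
"""


def build_paged_html(title, chapters):
    """Return an HTML string with one heading per chapter (hypothetical helper)."""
    body = "\n".join(
        '<h1 class="chapter">' + escape(name) + "</h1>\n<p>Chapter text goes here.</p>"
        for name in chapters
    )
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        "<title>" + escape(title) + "</title>\n"
        "<style>" + PAGED_CSS + "</style>\n"
        "</head>\n<body>\n" + body + "\n</body>\n</html>\n"
    )
```

The resulting string can be handed to any HTML-to-PDF converter; how many of these rules take effect depends on that converter's paged-media support.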

If you don’t need any of that functionality, the open-source tools may be a good solution for your document. They’re great for one-page invoices, simple letters, or exact copies of existing webpages. With enough effort and polyfills, slightly more complex documents can be supported.

Well Documented

Code Examples

Review DocRaptor’s documentation library to learn more about how we work and to see our code examples for languages like C#, Java, JavaScript, jQuery, .NET, Node, PHP, Python, Rails, Ruby, and others.

Style & Formatting

Consistency and clarity for PDF and Excel documents can be yours with DocRaptor. Learn how to adjust styling with basic HTML and CSS.

Scalability Concerns

Beyond technical capabilities, the other problem we faced was scalability. PDF generation is more complex than it appears.

A traditional web server returns the HTML content for a website in milliseconds and is then free to respond to other requests. A PDF generator must download all the external assets, such as JavaScript and images, and then completely render the document before delivering the PDF. This work is traditionally done by the browser.

Because of all this additional server-side work, a PDF generator takes much longer to execute than a typical web server request. Instead of milliseconds, it takes seconds, or in the case of an image-heavy document even minutes, to generate a PDF. Sometimes assets time out and document generation time spikes.

There is also greater startup time, CPU usage, and memory consumption to consider. A popular Node-based open-source HTML to PDF library notes in its readme:

Note: It is strongly recommended that you keep Chrome running side-by-side with Node.js. There is significant overhead starting up Chrome for each PDF generation which can be easily avoided.

It's suggested to use pm2 to ensure Chrome continues to run. If it crashes, it will restart automatically.

As of this writing, headless Chrome uses about 65mb of RAM while idle.

These problems are all solvable. Background jobs are the common solution (it's what we do). Lambda functions are another tactic.
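As a rough illustration of the background-job approach, the sketch below accepts PDF requests immediately and renders them on worker threads. It is a minimal in-process example under stated assumptions, not a description of DocRaptor's infrastructure: `PdfJobQueue` is a hypothetical class, and `render_pdf` is a stand-in that only simulates slow work and returns placeholder bytes.

```python
import queue
import threading
import time


def render_pdf(html):
    """Stand-in for a slow renderer (hypothetical); returns placeholder bytes."""
    time.sleep(0.05)  # simulate asset downloads and layout
    return b"RENDERED:" + html.encode("utf-8")


class PdfJobQueue:
    """Accept jobs immediately and render them on background threads."""

    def __init__(self, workers=2):
        self._jobs = queue.Queue()
        self._results = {}
        self._lock = threading.Lock()
        self._next_id = 0
        for _ in range(workers):
            threading.Thread(target=self._work, daemon=True).start()

    def submit(self, html):
        """Queue a document and return a job id without waiting for rendering."""
        with self._lock:
            self._next_id += 1
            job_id = self._next_id
            self._results[job_id] = ("pending", None)
        self._jobs.put((job_id, html))
        return job_id

    def _work(self):
        while True:
            job_id, html = self._jobs.get()
            try:
                outcome = ("done", render_pdf(html))
            except Exception as exc:  # keep the worker alive if a render fails
                outcome = ("failed", str(exc))
            with self._lock:
                self._results[job_id] = outcome
            self._jobs.task_done()

    def wait_all(self):
        """Block until every queued job has been processed."""
        self._jobs.join()

    def result(self, job_id):
        """Return a (status, payload) tuple for the given job id."""
        with self._lock:
            return self._results[job_id]
```

In a web application, the request handler would call `submit` and return the job id right away, and the client would poll or be notified once the status changes from `pending`. A production setup would typically use a persistent queue rather than in-process threads.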

If your usage numbers are small, this may not be a concern. If you want to generate a lot of documents or support simultaneous PDF creation, scalability planning will be required.

In the end, our own difficulties with creating a high-quality PDF and scaling production led us to create DocRaptor. We thought we could save other developers from wasting time and energy as we did. That said, many DocRaptor users have tried an open-source tool before switching to DocRaptor. You should review all your potential solutions.

Open-Source Options

There are two kinds of open source tools: browser-based and...others. They each have different advantages and disadvantages.

Browser-Based HTML-to-PDF Libraries

As you normally write HTML, CSS, and JavaScript for web browsers, browser-based libraries are the easiest for web developers to use. You submit an HTML document or URL and you get a PDF back.

We’d recommend HTML to PDF libraries based on Headless Chrome. It provides modern CSS and JavaScript support and a strong developer community. There are quite a few now, but the most popular Headless Chrome libraries include:

Historically, wkhtmltopdf and PhantomJS were the primary open-source HTML to PDF libraries. Do not use these for PDF generation. They are buggy, lack support for modern CSS, have poor font support, and are a pain to install. Additionally, PhantomJS has been officially abandoned by its maintainer. Stick with the Headless Chrome-based libraries.
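For a sense of what the Headless Chrome libraries do underneath, Chrome itself can print a page to PDF from the command line with its `--headless` and `--print-to-pdf` flags. The sketch below wraps that call in Python. The binary name `google-chrome`, the helper names, and the 60-second timeout are assumptions for illustration; the executable is named differently on some systems.

```python
import subprocess


def build_chrome_pdf_command(source, output_path, chrome_binary="google-chrome"):
    """Build the argument list for printing a URL or file path to PDF."""
    return [
        chrome_binary,
        "--headless",
        "--print-to-pdf=" + output_path,
        source,
    ]


def render_with_chrome(source, output_path, timeout_seconds=60,
                       chrome_binary="google-chrome"):
    """Run Chrome and raise if it fails or exceeds the timeout.

    The timeout guards against slow or hanging asset downloads.
    """
    command = build_chrome_pdf_command(source, output_path, chrome_binary)
    subprocess.run(command, check=True, timeout=timeout_seconds)
```

Calling `render_with_chrome("https://example.com", "example.pdf")` starts a new Chrome process for every document, which is exactly the startup overhead described in the scalability section above.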

Non-Browser-Based Libraries

Not all libraries are based on browser engines. WeasyPrint is a very popular HTML-to-PDF library that is not based on an actual browser. We haven’t used it, but it’s generally well-reviewed. It has one major limitation though: it does not support JavaScript, only HTML and CSS. That limitation aside, it offers more advanced PDF options than Headless Chrome and is probably the closest open-source alternative to DocRaptor (except we support JavaScript!).

The other non-browser libraries require you to programmatically create your PDF element by element, line by line. This flexibility and power comes at the cost of development time and extensive documentation consumption. If you need a level of pixel precision that HTML and CSS cannot provide, then these libraries may be a good option. Popular choices include:

Simple Pricing

Volume-Based Packages

Choose the pricing package that’s right for your needs. Have an increased demand? You can increase your package and scale up instantly.

Easy Overages

A few extra documents won’t force you into the next package. We offer transparent per-document overage costs.

Play the Long Game

There’s no limit on the length of the documents you produce. Build the PDFs you want, no matter the length, and enjoy the same per-document pricing.

Which is Right for You?

It primarily depends on your document requirements and your budget. Beyond easy access to the Prince engine, DocRaptor’s online HTML to PDF API provides a lot of benefits such as:

  • Fast and easy setup
  • No maintenance
  • 99.999% uptime guarantee
  • Professional support and debugging
  • Document hosting

On the other hand, open-source HTML to PDF libraries give you full control over your document generation pipeline and might save you money in the long run. That said, when we compared self-hosting to DocRaptor, using DocRaptor was almost always more cost-effective (assuming US-based developer salaries). The choice is yours! If you have any questions about DocRaptor, feel free to contact us at support@docraptor.com.