How to Convert HTML-to-PDF with Python

DocRaptor's HTML to PDF API is the most powerful way to create PDF documents with Python. As the only API that uses the Prince PDF engine, DocRaptor uniquely supports advanced PDF generation features such as mixed layouts and header placements, accessible PDF tagging, crop marks, and more.

We've been maintaining the DocRaptor Python package since 2016. Our library only supports Python 3 (we're proud members of the movement to drop Python 2 support).

No Signup Required

We've done our best to make the DocRaptor package the easiest way to convert HTML to PDF with Python. That includes a no-signup testing and trial mode. Our public API key (YOUR_API_KEY_HERE) works with any of the Python code examples below.

You can optionally sign up for a free account which removes the DocRaptor watermark and enables our support team to help you debug any document issues. DocRaptor paid plans don't have any document size limits and start at $15/month.

Package Installation

pip install --upgrade docraptor
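
If you want to confirm the install worked before running the examples, a minimal sketch using only the standard library is shown here. The helper name `installed_version` is ours for illustration and is not part of the DocRaptor package; it assumes Python 3.8 or newer, where `importlib.metadata` is available.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("docraptor")
if version is None:
    print("docraptor is not installed; run: pip install --upgrade docraptor")
else:
    print(f"docraptor {version} is installed")
```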

Hello World! Example

The simplest example of the Python agent is:

import docraptor

doc_api = docraptor.DocApi()
# this key works in test mode!
doc_api.api_client.configuration.username = 'YOUR_API_KEY_HERE'

try:
    response = doc_api.create_doc({
        'test': True,  # test documents are free but watermarked
        'document_type': 'pdf',
        'document_content': '<html><body>Hello World!</body></html>',
        # 'document_url': 'https://docraptor.com/examples/invoice.html',
        # 'javascript': True,
        # 'prince_options': {
        #    'media': 'print', # @media 'screen' or 'print' CSS
        #    'baseurl': 'https://yoursite.com', # the base URL for any relative URLs
        # },
    })

    # create_doc() returns a binary string
    with open('docraptor-hello.pdf', 'w+b') as f:
        binary_formatted_response = bytearray(response)
        f.write(binary_formatted_response)
    print('Successfully created docraptor-hello.pdf!')
except docraptor.rest.ApiException as error:
    print(error.status)
    print(error.reason)
    print(error.body)

To create the PDF, save the code above as docraptor-hello.py and run this in your console:

python docraptor-hello.py

Advanced HTML Example

Those were the basics. Let's get into the fun stuff now. These Python code examples demonstrate features and functionality unique to DocRaptor, including footnotes, repeating headers, a table of contents with leaders, and a title page without a header. We'll start with the HTML and CSS:

<html>
  <head>
    <style>
      /* Create a running element */
      #header-and-footer {
        position: running(header-and-footer);
        text-align: right;
      }

      /* Add that running element to the top and bottom of every page */
      @page {
        @top {
          content: element(header-and-footer);
        }
        @bottom {
          content: element(header-and-footer);
        }
      }

      /* Add a page number */
      #page-number {
        content: "Page " counter(page);
      }

      /* Create a title page with a full-bleed background and no header */
      #title-page {
        page: title-page;
      }

      @page title-page {
        background: url('https://docraptor-production-cdn.s3.amazonaws.com/tutorials/raptor.svg') 50% 50px / 80% no-repeat #DDE4F3;
        @top {
          content: "";
        }
      }

      #title-page h1 {
        padding: 500px 0 40px 0;
        font-size: 75px;
      }

      /* Dynamically create a table of contents with leaders */
      #table-of-contents a {
        content: target-content(attr(href)) leader('.') target-counter(attr(href), page);
        color: #135697;
        text-decoration: none;
        display: block;
        padding-top: 5px;
      }

      /* Float the footnote to a footnotes area on the page */
      .footnote {
        float: footnote;
        font-size: small;
      }

      .page {
        page-break-after: always;
      }

      body {
        counter-reset: chapter;
        font-family: 'Open Sans';
        color: #135697;
      }
    </style>
    <link href="https://fonts.googleapis.com/css2?family=Open+Sans:wght@400;600;800&display=swap" rel="stylesheet">
  </head>
  <body>
    <div id="title-page" class="page">
      <h1>The Official DocRaptor eBook</h1>
      <div id="table-of-contents">
        <a href="#chapter-1"></a>
        <a href="#chapter-2"></a>
        <a href="#chapter-3"></a>
      </div>
      <div id="header-and-footer">
        <span id="page-number"></span> | The Official DocRaptor eBook
      </div>
    </div>
    <div class="page">
      <h2 id="chapter-1">What is DocRaptor?</h2>
      <p>DocRaptor is an HTML to PDF API!</p>
    </div>
    <div class="page">
      <h2 id="chapter-2">Why should I use DocRaptor?</h2>
      <p>It's super easy and also has the most powerful conversion capabilities!</p>
    </div>
    <div class="page">
      <h2 id="chapter-3">How much does it cost?</h2>
      <p>
        We have a free plan!<span class="footnote">Includes five documents a month</span>
      </p>
    </div>
  </body>
</html>

The Python code for generating a PDF from that HTML is very similar to the basic example. It just reads the external HTML file instead of using inline HTML:

import docraptor

doc_api = docraptor.DocApi()
# this key works in test mode!
doc_api.api_client.configuration.username = 'YOUR_API_KEY_HERE'

with open('docraptor-advanced-content.html', 'r') as f:
    document_content = f.read()

try:
    response = doc_api.create_doc({
        'test': True,  # test documents are free but watermarked
        'document_type': 'pdf',
        'document_content': document_content,
        # 'document_url': 'https://docraptor.com/examples/invoice.html',
        # 'javascript': True,
        # 'prince_options': {
        #    'media': 'print', # @media 'screen' or 'print' CSS
        #    'baseurl': 'https://yoursite.com', # the base URL for any relative URLs
        # },
    })

    # create_doc() returns a binary string
    with open('docraptor-advanced.pdf', 'w+b') as f:
        binary_formatted_response = bytearray(response)
        f.write(binary_formatted_response)
    print('Successfully created docraptor-advanced.pdf!')
except docraptor.rest.ApiException as error:
    print(error.status)
    print(error.reason)
    print(error.body)

To make the PDF, save the HTML above as docraptor-advanced-content.html and the Python code as docraptor-advanced.py, and then run:

python docraptor-advanced.py
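
If you convert more than one file, the read-convert-write steps can be wrapped in a small helper. The following is a minimal sketch, not part of the DocRaptor package: the function name `convert_html_file` and its parameters are illustrative, and it assumes `create_doc()` returns binary data, as in the examples above.

```python
from pathlib import Path


def convert_html_file(doc_api, html_path, pdf_path=None, test=True):
    """Read an HTML file, convert it with create_doc(), and write the PDF.

    doc_api is a configured docraptor.DocApi() instance. When pdf_path is
    omitted, the PDF is written next to the HTML file with a .pdf suffix.
    """
    html_path = Path(html_path)
    pdf_path = Path(pdf_path) if pdf_path else html_path.with_suffix(".pdf")

    response = doc_api.create_doc({
        "test": test,  # test documents are free but watermarked
        "document_type": "pdf",
        "document_content": html_path.read_text(encoding="utf-8"),
    })

    # Assumes a bytes-like response, as in the examples above
    pdf_path.write_bytes(bytes(response))
    return pdf_path
```

With a configured client, a call such as `convert_html_file(doc_api, 'docraptor-advanced-content.html')` would write `docraptor-advanced-content.pdf` to the same directory. Errors from the API still surface as `docraptor.rest.ApiException`, so wrap the call in the same `try`/`except` shown earlier.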

Important API Parameters

Our Create Document API is likely the only endpoint you'll interact with. It has numerous well-documented parameters, but let's review the most important ones here:

`test`

DocRaptor does not count test documents towards your plan limits, ensuring that you can take the time to get your documents looking perfect. However, test documents are watermarked. When you're ready to remove the watermark, simply set test to False.
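
One way to avoid editing code when you switch from test to production documents is to derive the `test` flag from your environment. This is a minimal sketch under our own assumptions: the helper `build_doc_options` and the `DOCRAPTOR_PRODUCTION` environment variable are hypothetical names chosen for illustration, not something the DocRaptor package reads itself.

```python
import os


def build_doc_options(html, production=None):
    """Build the options dict for create_doc(), defaulting to test mode.

    If production is None, the hypothetical DOCRAPTOR_PRODUCTION environment
    variable decides: only the value "1" turns test mode off.
    """
    if production is None:
        production = os.environ.get("DOCRAPTOR_PRODUCTION", "") == "1"
    return {
        "test": not production,  # test documents are free but watermarked
        "document_type": "pdf",
        "document_content": html,
    }


# Usage with a configured client:
# response = doc_api.create_doc(build_doc_options("<html><body>Hello World!</body></html>"))
```

Defaulting to test mode means a missing or mistyped variable produces a watermarked document rather than an unexpected charge against your plan.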

`javascript`

JavaScript processing is disabled by default. This speeds up the PDF generation. To enable JavaScript processing, set javascript to True.

It can sometimes be difficult to know when your JavaScript has finished accessing data, rendering graphs, and so on. In such cases, you can delay rendering for a set period of time or until an element exists.

Optionally, you can use Prince's JavaScript engine instead of our primary one. Prince's engine has somewhat limited support for modern JavaScript, but it offers powerful custom features such as multiple PDF rendering passes. The Prince website has detailed documentation on its JavaScript support.

`async`

While DocRaptor does not have any input or output file size limits, we do have a generation time limit. Our default, synchronous API is limited to 60 seconds. For large documents, you can use our asynchronous API, which has a 10-minute limit. The asynchronous API will ping your supplied callback URL when the document is completed, or you can query our API for the status.

Creating asynchronous documents with the Python agent requires using the create_async_doc method. Here's a complete code example:

import docraptor
import time

doc_api = docraptor.DocApi()
# this key works in test mode!
doc_api.api_client.configuration.username = 'YOUR_API_KEY_HERE'

try:
    # different method than the synchronous documents
    response = doc_api.create_async_doc({
        'test': True,  # test documents are free but watermarked
        'document_type': 'pdf',
        'document_content': '<html><body>Hello World!</body></html>',
        # 'document_url': 'https://docraptor.com/examples/invoice.html',
        # 'javascript': True,
        # 'prince_options': {
        #    'media': 'print', # @media 'screen' or 'print' CSS
        #    'baseurl': 'https://yoursite.com', # the base URL for any relative URLs
        # },
    })

    while True:
        status_response = doc_api.get_async_doc_status(response.status_id)
        if status_response.status == "completed":
            pdf = doc_api.get_async_doc(status_response.download_id)
            # get_async_doc() returns a binary string
            with open("docraptor-async.pdf", "wb") as f:
                f.write(pdf)
            print('Wrote PDF to docraptor-async.pdf')
            break
        elif status_response.status == "failed":
            print("FAILED")
            print(status_response)
            break
        else:
            time.sleep(1)
except docraptor.rest.ApiException as error:
    print(error.status)
    print(error.reason)
    print(error.body)
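
The polling loop above keeps checking once a second until the document completes or fails. In a long-running service you may prefer a deadline and a gradually increasing delay. The following is a minimal sketch of that idea: `wait_for_async_doc`, its parameters, and the 600-second default (chosen to match the 10-minute asynchronous limit) are illustrative and not part of the DocRaptor package. Only `get_async_doc_status` and `get_async_doc` come from the example above.

```python
import time


def wait_for_async_doc(doc_api, status_id, timeout=600.0, poll_interval=1.0,
                       max_interval=10.0, sleep=time.sleep, clock=time.monotonic):
    """Poll an async document until it completes and return the PDF bytes.

    Raises RuntimeError if generation fails and TimeoutError if the deadline
    passes first. sleep and clock are injectable so the loop can be tested
    without real waiting.
    """
    deadline = clock() + timeout
    interval = poll_interval
    while True:
        status_response = doc_api.get_async_doc_status(status_id)
        if status_response.status == "completed":
            return doc_api.get_async_doc(status_response.download_id)
        if status_response.status == "failed":
            raise RuntimeError(f"Document generation failed: {status_response}")
        if clock() >= deadline:
            raise TimeoutError(f"Document {status_id} not ready after {timeout} seconds")
        sleep(interval)
        # Double the delay after each attempt, up to max_interval
        interval = min(interval * 2, max_interval)


# Usage with a configured client:
# response = doc_api.create_async_doc({...})
# pdf = wait_for_async_doc(doc_api, response.status_id)
```

Raising exceptions instead of printing lets the caller decide whether to retry, log, or alert.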

`hosted`

DocRaptor can also host your documents for you, either temporarily or permanently. This is handy when your goal is to include a link to the PDF in an email or use the PDF in no-code automations. Document hosting is a paid add-on, but it can also save you money when you’re repeatedly generating the same document.

To create hosted documents with the Python agent, use the create_hosted_doc or create_hosted_async_doc method. Here’s a working code example for synchronous hosted document generation:

import docraptor

doc_api = docraptor.DocApi()
# this key works in test mode!
doc_api.api_client.configuration.username = 'YOUR_API_KEY_HERE'

try:
    # different method than the non-hosted documents
    response = doc_api.create_hosted_doc({
        'test': True,  # test documents are free but watermarked
        'document_type': 'pdf',
        'document_content': '<html><body>Hello World!</body></html>',
        # 'document_url': 'https://docraptor.com/examples/invoice.html',
        # 'javascript': True,
        # 'prince_options': {
        #    'media': 'print', # @media 'screen' or 'print' CSS
        #    'baseurl': 'https://yoursite.com', # the base URL for any relative URLs
        # },
    })

    print(f"The PDF is hosted at {response.download_url}")
except docraptor.rest.ApiException as error:
    print(error.status)
    print(error.reason)
    print(error.body)
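
For temporary hosting, the options passed to create_hosted_doc can carry an expiry time or a download limit. The sketch below assumes the API accepts `hosted_expires_at` (an ISO 8601 timestamp) and `hosted_download_limit` parameters; please confirm both names against the Create Document API reference before relying on them. The helper `hosted_doc_options` and its arguments are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta, timezone


def hosted_doc_options(html, days_to_keep=None, download_limit=None):
    """Build options for create_hosted_doc() with optional hosting limits.

    The hosted_expires_at and hosted_download_limit keys are assumed
    parameter names; verify them in the API reference.
    """
    options = {
        "test": True,  # test documents are free but watermarked
        "document_type": "pdf",
        "document_content": html,
    }
    if days_to_keep is not None:
        expires_at = datetime.now(timezone.utc) + timedelta(days=days_to_keep)
        options["hosted_expires_at"] = expires_at.isoformat()
    if download_limit is not None:
        options["hosted_download_limit"] = download_limit
    return options


# Usage with a configured client:
# response = doc_api.create_hosted_doc(hosted_doc_options(html, days_to_keep=7))
# print(response.download_url)
```

Leaving both arguments unset produces the same options as the example above, which matches the permanent hosting case.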

Advanced HTML & CSS for PDF Creation

Most of DocRaptor’s unique PDF generation capabilities are managed through CSS, not the Python agent. Page breaks, footnotes, headers and footers, page floats, and more are all controlled in your HTML and CSS. Here are a few of our more popular guides:

Other Resources & Documentation

In addition to our documentation, troubleshooting guides, and tutorials (including Python code examples), we have sample documents that showcase advanced HTML-to-PDF techniques.

And if you’re still evaluating whether DocRaptor’s Python library and API are the best fit for your project, we’ve written comparisons of DocRaptor versus other top Python HTML to PDF libraries and open-source HTML to PDF tools. We've also outlined the differences between online APIs and HTML to PDF libraries. PDF conversion can be surprisingly tricky to get right, and we believe DocRaptor's API is the easiest and highest-quality option.

Lastly, one of the major benefits of DocRaptor is access to our professional support team. Creating complex (or simple!) PDFs can be a challenging paradigm shift from creating standard web pages. If you need help getting the perfect PDF for your Python project, don’t hesitate to email support@docraptor.com or open a help request on your document.
