Documentation
Document API
Converting HTML into PDF and XLS documents is fast and painless with DocRaptor. Browse the documentation below to get started, or check out a code example in your language.
Authentication
You'll authenticate with your API key, which can be found on your account dashboard. You can use your API key in one of two ways:
- The username for HTTP Basic Authentication (preferred)
- The value for the query parameter
user_credentials
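Both options can be sketched in Python as follows. This is a minimal illustration, not an official client: it assumes the Basic Authentication password is left empty (the text above only specifies the username), and the helper names are hypothetical.

```python
import base64
from urllib.parse import urlencode

DOCS_URL = "https://docraptor.com/docs"


def basic_auth_header(api_key: str) -> dict:
    """Build an HTTP Basic Authentication header with the API key as the username.

    Assumption: the password part is left empty.
    """
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def url_with_credentials(api_key: str, base_url: str = DOCS_URL) -> str:
    """Alternative: pass the API key in the user_credentials query parameter."""
    return f"{base_url}?{urlencode({'user_credentials': api_key})}"
```

The header form keeps the key out of URLs, which tend to end up in logs, and that is one reason to favor it.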
Making a Document
To make documents, POST
to https://docraptor.com/docs
with JSON. Only a few options are required and most have reasonable defaults. Here are the minimum options for making a PDF:
{
"type": "pdf",
"document_content": "<html><body>Hello World!</body></html>"
}
Some options are nested and are referred to like prince_options[media]
in the documentation. As JSON they should look like this:
{
"type": "pdf",
"document_content": "<html><body>Hello World!</body></html>",
"prince_options": {
"media": "screen"
}
}
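As a rough sketch, the request above could be prepared in Python with the standard library. The function name is hypothetical, the authentication headers are supplied by the caller, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_document_request(payload: dict, auth_headers: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST to the document endpoint."""
    headers = {"Content-Type": "application/json", **auth_headers}
    return urllib.request.Request(
        "https://docraptor.com/docs",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


payload = {
    "type": "pdf",
    "document_content": "<html><body>Hello World!</body></html>",
    "prince_options": {"media": "screen"},  # nested option as a nested object
}
request = build_document_request(payload, auth_headers={})
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```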
JSON is preferred, but you can also send form encoded variables by wrapping the option with doc[]
and adding another []
for sub options.
type becomes doc[type]
prince_options[media] becomes doc[prince_options][media]
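The wrapping rule can be illustrated with a small Python helper that flattens the JSON-style options into form field names. The function name is hypothetical; it only produces the field names and values and does not send anything.

```python
def to_form_fields(options: dict, prefix: str = "doc") -> dict:
    """Flatten nested options into doc[...] style form field names."""
    fields = {}
    for key, value in options.items():
        name = f"{prefix}[{key}]"
        if isinstance(value, dict):
            # Sub-options get one more [] level appended to the parent name.
            fields.update(to_form_fields(value, prefix=name))
        else:
            fields[name] = value
    return fields


fields = to_form_fields({"type": "pdf", "prince_options": {"media": "screen"}})
```

Here `fields` maps `doc[type]` to `pdf` and `doc[prince_options][media]` to `screen`.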
API Response
If successful, the POST
call returns a binary string containing your PDF. Save this string as a file on your server or send it directly to the browser for the user to download. Alternatively, instead of a binary string, hosted documents return a public URL while asynchronously generated documents return a status_id
used to retrieve the document.
If an error occurs while generating your document, an XML error message will be returned instead of the binary string:
<?xml version="1.0" encoding="UTF-8"?>
<errors>
<error>Error downloading document content from supplied url.</error>
</errors>
The HTTP status code returned can be used to determine generation success or failure.
For PDF documents, the X-DocRaptor-Num-Pages
response header will contain the number of pages contained in the document.
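One way to handle the response is sketched below in Python. It assumes that 200 is the only success status for a synchronous request and that the caller passes headers as a plain dict with the header name spelled exactly as above; both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_error_messages(xml_body: bytes) -> list:
    """Extract the text of every <error> element from an XML error response."""
    root = ET.fromstring(xml_body)
    return [element.text or "" for element in root.iter("error")]


def handle_response(status_code: int, body: bytes, headers: dict) -> dict:
    """Return the document bytes on success, or the error messages on failure.

    Assumption: any status other than 200 carries an XML error body.
    """
    if status_code == 200:
        pages = headers.get("X-DocRaptor-Num-Pages")
        return {"ok": True, "document": body, "pages": int(pages) if pages else None}
    return {"ok": False, "errors": parse_error_messages(body)}
```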
API Parameters
The kind of document DocRaptor should create from the provided content.
This field was previously called document_type and is still available for applications that depend on it.
The HTML or XML that DocRaptor should use to create the document. Excel files should be XML while PDFs can be converted from HTML.
The URL that DocRaptor should request the content from to create the document.
Any string that you find meaningful to describe the document. It is used for identification in the documents log.
Creates the document in test mode. All plans have unlimited test documents that do not count against monthly limits. This way you can play with styles until you get a good looking document without wasting any of your allotted documents.
When test
is set to true
there are a few things to note:
- Generated PDFs will be watermarked.
- Excel documents will be cut off after 20 rows.
- Hosted documents will be limited to 5 downloads.
- Hosted documents will expire after 1 day.
Explicitly defining the referrer is useful if you have to make JavaScript calls that depend on the referrer. For example, Adobe's Typekit functionality requires this to be set explicitly when using document_content.
The DocRaptor pipeline allows you to choose the combination of Prince and JavaScript engines you want to use.
If not set, your documents will use Pipeline version 8 by default. You can change your default Pipeline version via the Settings link on your user dashboard. For users not on the newest version, we recommend upgrading after using this API parameter to test your documents on that version.
The following describes the mapping of Pipeline version to Prince and JavaScript engine versions:
Pipeline | Prince | JavaScript | Release Notes |
---|---|---|---|
8 | 13 | 2 | Release Notes |
7 | 12 | 2 | Release Notes |
6 | 11 | 2 | Release Notes |
Basic PDF Options
We convert HTML into PDFs using the industry-leading Prince PDF engine. Many of our API options are specific to Prince and only apply to PDF documents. Our usage of the Prince engine means we have multiple JavaScript parsing engines to select from.
We apply "print" media rules by default as most DocRaptor documents are destined for the printer. Using "print", when you really want "screen" (which is what browsers use), is the most common issue users experience. If your document looks really incorrect, try changing this first.
This API parameter is deprecated and pipeline should be used instead.
The base URL is used for all relative URLs in a document. Without a base URL, relative URLs such as "../images/photo.jpg" will fail to load. This can also be accomplished by using the HTML base tag.
Note that relative URLs starting with /
will use the naked domain instead of the full baseurl for their base.
Here's how that works:
baseurl | rel. url | full url |
---|---|---|
a.com | /b/c/d.jpg | a.com/b/c/d.jpg |
a.com/b | c/d.jpg | a.com/b/c/d.jpg |
a.com/b | /c/d.jpg | a.com/c/d.jpg |
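The same resolution behavior can be reproduced with Python's standard `urljoin`, which is handy for checking what a relative URL will resolve to before sending a document. This is only an approximation of the rules in the table: it assumes the base URL includes a scheme and should be treated as a directory, so a trailing slash is added when missing. The `resolve` helper is hypothetical.

```python
from urllib.parse import urljoin


def resolve(baseurl: str, relative_url: str) -> str:
    """Resolve a relative URL against a base URL, treating the base as a directory."""
    if not baseurl.endswith("/"):
        baseurl += "/"
    return urljoin(baseurl, relative_url)


# The three rows of the table, with an explicit scheme added.
rows = [
    resolve("http://a.com", "/b/c/d.jpg"),
    resolve("http://a.com/b", "c/d.jpg"),
    resolve("http://a.com/b", "/c/d.jpg"),
]
```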
JavaScript PDF Options
DocRaptor offers two options for JavaScript parsing and both are disabled by default (it makes your document processing a lot faster). Most users will only want to use one JavaScript engine as enabling both engines simultaneously will cause all JavaScript code to be evaluated twice.
If enabled, we will use DocRaptor's custom JavaScript engine to run any JavaScript in your HTML before sending it to Prince for conversion.
DocRaptor's JavaScript engine is separate from Prince's JavaScript engine. Our engine has been specifically designed to provide support for popular JavaScript tools and libraries, such as Typekit and Highcharts. We generally recommend using our engine, not the Prince engine.
By default, we stop running JavaScript when your page is finished rendering. If you have any delayed or asynchronous JavaScript on your page, simply define a function called docraptorJavaScriptFinished()
. It should return true
if all of your JavaScript has finished executing, and false
otherwise. Any other return value is considered an error.
If there are any JavaScript errors, the document creation process will fail, and the errors will be returned so you can see what went wrong.
By default, JavaScript console messages, such as console.log('hello')
, halt PDF generation and cause an error message to be returned. If this parameter is enabled, we will instead ignore and log JavaScript console messages, as most web browsers do.
In Pipelines 1-6, the default value for this option is false, meaning console messages WILL halt document generation.
In Pipelines 7 and greater, the default value for this option is true, meaning console messages WILL NOT halt document generation by default.
If enabled, we will use Prince's built-in JavaScript engine. Note that this is a separate engine from DocRaptor's JavaScript engine. We generally recommend using DocRaptor's engine, but Prince's engine is necessary in certain cases, such as drawing in a canvas element with JavaScript or when accessing Prince's custom PDF JavaScript object.
Feel free to contact us if you have any questions about running JavaScript in your document.
Advanced PDF Options
99% of users don't need these options. Almost all PDF styling (headers, footers, page size, etc.) is controlled via simple CSS styles. But if you are looking for uncommon file options such as password protection and DPI settings, take a look at these:
By default, errors downloading resources (CSS and images) are ignored when generating the PDF. When this is disabled, the following resource issues will cause document creation to fail: 400s, 500s, DNS resolution errors, unknown mime types, connection timeouts, SSL issues, rejected connections, and use of protocol-independent URIs without proper HTTP baseurl.
Disables parallel fetching of assets during PDF creation. Useful if your asset host has strict rate limiting.
Specify how long to wait, in seconds, when requesting an external resource. Accepts a value ranging from 1 to 60. By default, DocRaptor will attempt to fetch any external resource for up to 10 seconds. You can set a longer timeout to force DocRaptor to wait for a large file, for example, or shorten it to skip resources that are unavailable.
Specify the input type of the document to be used by Prince during processing.
By default, Prince sets the page DPI for generated PDFs to 96. However, when using Prince 9.0 or higher, you can override this setting to use the DPI you prefer.
If you need PDF profile support (Pipeline 3+ only), you can set this option. See Prince's documentation for details.
Option | Pipelines |
---|---|
 | 7+ |
 | 3+ |
 | 7+ |
 | 6+ |
 | 7+ |
 | 6+ |
 | 6+ |
 | 6+ |
 | 4+ |
 | 4+ |
The title of your PDF, part of the document's metadata. Many PDF viewers use the title as the name of your document. This setting is primarily used for XML-based PDFs as HTML documents automatically use the text of the HTML <title>
element.
DocRaptor attempts to create documents using synchronous creation by default. We set a time limit of 60 seconds for synchronous creation. When a synchronous request completes, DocRaptor will return your generated document.
If you have very large or complex documents, you may wish to switch to asynchronous job creation. Setting this to true
queues your document for background creation and extends the time limit for your job to 600 seconds. DocRaptor then returns JSON with a status_id
key set, for example:
{
"status_id": "123454321"
}
Making an authenticated request against https://docraptor.com/status/{status_id}
will give you the status of your document job. The returned JSON from that call should look something like:
{
"download_url": "https://docraptor.com/download/12345asdf",
"message": "Completed at Mon Jun 06 18:33:17 +0000 2011",
"number_of_pages": 2,
"status": "completed"
}
When the job is complete, DocRaptor will call the specified
callback_url
if one was provided, via a POST request.
Querying the status URL after the doc has been successfully created will provide a download_url
in the returned JSON. The value associated with that key is a URL from which you can download your doc. This download URL can be used to download your document up to 5 times, and will expire after your account's data retention period. For accounts with the "as short as possible" data retention setting, documents can only be downloaded once.
If DocRaptor encounters an error generating your document, the status value will be failed
. A key validation_errors
will be set with a value corresponding to the reason for the failure. An example of this is:
{
"status": "failed",
"validation_errors": "Name can't be blank\nName is too long (maximum is 200 characters)"
}
If your document has been queued but processing has not yet begun, it will have a status of queued
. If your document is currently being processed, it will have a status of working
.
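A polling loop over these statuses might look like the following Python sketch. The HTTP call itself is left to a caller-supplied `fetch_status` function (hypothetical) that performs the authenticated GET against the status URL and returns the decoded JSON; the polling interval and the maximum number of polls are arbitrary choices, not values from the API.

```python
import time
from typing import Callable


def wait_for_document(
    fetch_status: Callable[[], dict],
    poll_interval: float = 2.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the status endpoint until the job completes; return its download_url."""
    for _ in range(max_polls):
        status = fetch_status()
        state = status.get("status")
        if state == "completed":
            return status["download_url"]
        if state == "failed":
            raise RuntimeError(status.get("validation_errors", "document generation failed"))
        # "queued" and "working" both mean the job is still in progress.
        sleep(poll_interval)
    raise TimeoutError("document was not ready after the allotted number of polls")
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real delays.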
Our client libraries include links to asynchronous examples on GitHub.
Note: If there is an error creating your document, the callback_url
will never be called. The status page will explain the error.
When set and the async
option is set to true
, DocRaptor will send a POST request to this URL after successfully completing an asynchronous job. The POST will contain the parameter download_url
with the value being a URL where your document can be downloaded, and a download_id
, which will correspond to the status_id
returned by the document creation call.
If the callback URL does not return a 2XX HTTP status code within 10 seconds, we will retry the callback up to three times.
Note: If there is an error creating your document, the callback_url
will never be called. The status page will explain the error.
Creates a hosted document when set to
true
. Hosted documents are a paid add-on.
Asynchronous documents work as usual, except that the download URL in the final status response will be an unbranded domain. Synchronous hosted document requests respond with a JSON object rather than a binary blob. The JSON will look like this:
{
"download_id": "123-456-abc",
"download_url": "http://<<unbranded domain>>/download/123-456-abc",
"number_of_pages": 1
}
The download URL is publicly-accessible and doesn't require authentication.
By default, hosted documents do not have limits on downloads or hosting time, though you may pass additional parameters to the document generation call to set your own limits (see below).
When set and the hosted
option is set to true
, the
hosted_download_limit
option allows you to restrict the number of times the
hosted document can be downloaded. This attribute can be set to any number. Once the
number of downloads has been reached, the document will be made unavailable for download
and permanently removed from DocRaptor.
If no limit is specified, hosted documents will be available for an unlimited number of downloads. You may manually expire a hosted document at any time.
Test documents are limited to 5 downloads regardless of the value you provide for this option.
Keep in mind, hosted documents are a paid add-on and each download of a hosted document is
billed at a rate set by your plan. Limiting the number of downloads by setting
hosted_download_limit
can help control costs.
When set and the hosted
option is set to true
, the
hosted_expires_at
option will allow hosted documents to be available for
download until a specific date and time. The value for this attribute must be a properly
formatted ISO 8601 date time, specifically down to the second, with the optional addition of
a timezone offset. This option will be stored in the UTC timezone.
Once the hosted_expires_at
time has been reached, the document will be made
unavailable for download and permanently removed from DocRaptor.
If no expiration is specified, the document will be available for download indefinitely. You may manually expire a hosted document at any time.
Test documents will expire after 1 day regardless of the value you provide for this option.
Keep in mind, hosted documents are a paid add-on and each download of a hosted document is
billed at a rate set by your plan. Limiting the time a document is available by setting
hosted_expires_at
can help control costs.
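A timestamp in the required shape can be produced with Python's standard library, as in this sketch. The helper name, the seven-day lifetime, and the download limit of 10 are illustrative assumptions; only the option names come from the text above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expires_at(lifetime: timedelta, now: Optional[datetime] = None) -> str:
    """Return an ISO 8601 timestamp, precise to the second, with a UTC offset."""
    now = now or datetime.now(timezone.utc)
    # Dropping microseconds keeps the value at second precision.
    return (now + lifetime).replace(microsecond=0).isoformat()


payload = {
    "type": "pdf",
    "document_content": "<html><body>Hello World!</body></html>",
    "hosted": True,
    "hosted_download_limit": 10,  # illustrative value
    "hosted_expires_at": expires_at(timedelta(days=7)),  # illustrative lifetime
}
```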
When set to html
, if input does not pass our HTML validation, the document will fail and we'll report any HTML errors.
For Excel files, we always validate input as XML. Unlike PDFs, XLS files are not free-form, and elements must map to XLS cells clearly and exactly.
Enabling help mode will trigger an email to you and to support, letting us know you'd like assistance in troubleshooting the document styling.
When a document is in help mode, we'll store your document contents for review until it's resolved. You can have up to five active help requests at any given time.
HTTP Status Codes
Your request was made successfully, and DocRaptor has returned a document. We will also return a 200 code when an asynchronous document has successfully generated.
This error code means your request cannot be completed as expected. DocRaptor will return this code if the download key is not valid, if the requested document has not completed, if there is an error in your generated document, or if there are errors in your HTTP POST request.
This status code means authorization is required, but has either not been provided or is incorrect. DocRaptor will return this status code if the API key provided is incorrect.
A 403 status code means the request was made correctly, but the server is refusing to respond. We will return this status code if you do not have permission to view the status of that document, or if you are making too many simultaneous document generation requests.
This error means your input document has syntax errors and DocRaptor cannot process it as expected.
Limits
We do not impose hard limits on numbers of pages, document complexity, input size, or output size (except for hosted documents). We limit generation time, simultaneous requests, and documents created per billing period (this is defined by your DocRaptor plan).
Simple documents may only take a few seconds to generate, and a more complicated document with many external resources to fetch or scripts to run may take several minutes to create.
DocRaptor has four limits enforced:
Synchronous Document Generation Time (default): | 1 minute |
Asynchronous Document Generation Time: | 10 minutes |
Simultaneous Request Limit: | 30 |
Hosted Document Output Size: | 100 MB |
If your document goes over these limits, it will be killed, and an error will be returned.
Test Documents
All DocRaptor plans have unlimited test documents so you can make sure the document looks exactly the way you want. When you set the 'test' parameter to 'true', your documents will not count against your plan limit.
Some things to keep in mind:
- Test PDFs will be watermarked.
- Test Excel documents will be cut off after 20 rows.
- Test hosted documents will be limited to 5 downloads.
- Test hosted documents will expire after 1 day.
Support & Debugging
We are here to guide you while integrating, running, and debugging any issue you might have. You can always reach us via email at support@docraptor.com. You can also chat with us live from our website, or send us a message by selecting "Help!" when logged into your dashboard.
If you need a hand with a specific document, you can open a Help Request right from your dashboard. This will share the document input, output, and log so we can get right to fixing your issue. Just log into your account, select "Doc Log" in the top left corner, then "Request Help" for the document you're having trouble with. Our support team will be with you as soon as we've received your request.
Hosted Documents
As a paid add-on, DocRaptor can provide long-term, publicly accessible hosting for your documents. We'll host the document on your behalf at a completely unbranded URL for as long as you want, or within the limits you specify.
To create a hosted document, simply set the hosted
parameter to true
when using the HTTP API, or use the hosted document method in one of our language-specific libraries.
File Size Limit
Unlike regular documents, the output file size is limited to keep hosting affordable. Hosted document output must be less than 100 MB.
Removing a hosted document
By default, we'll permanently host the document, but you can define a download limit or expiration date when you create the document. You can also expire the document at any time through the API or your DocRaptor dashboard.
To remove a hosted document through the interface, visit your documents log, view the desired document, and click "expire this document".
To expire a document through the API, make an authenticated
PATCH request to http://docraptor.com/expire/{download_id}.json
(or .xml) where {download_id}
is the value provided in the original document
request. The response will return a 200 status code to let you know that the
document was successfully expired.
If DocRaptor encounters an error expiring your document, the response will return a
4XX HTTP status code and a key of
errors
will be set with an array of values corresponding to the reasons for the
failure. An example of this is:
{
"errors": [
"[Invalid Action Error] Requested document cannot be expired."
]
}
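The expiration call could be prepared in Python roughly as follows. The function names are hypothetical, the authentication headers are supplied by the caller, and the URL is copied from the text above; the request is built but not sent.

```python
import json
import urllib.request


def build_expire_request(download_id: str, auth_headers: dict) -> urllib.request.Request:
    """Prepare (but do not send) the PATCH request that expires a hosted document."""
    return urllib.request.Request(
        f"http://docraptor.com/expire/{download_id}.json",  # URL as given above
        headers=auth_headers,
        method="PATCH",
    )


def expiration_errors(status_code: int, body: bytes) -> list:
    """Return an empty list on success, or the reasons listed under the errors key."""
    if status_code == 200:
        return []
    return json.loads(body).get("errors", [])
```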
Referrer-Based Documents
DocRaptor makes it easy to convert any webpage you have control over into a document using a simple anchor tag. On your account management page, you can add the domains you would like to link to DocRaptor. Requests to create docs that have one of those domains as part of their HTTP_REFERER HTTP header will be generated using your account, without the need for an API key. Click "domains" after logging in to manage your domains!
URL
Once you've set up your domains, you can make a GET request against:
- https://docraptor.com/docs/from_site
JavaScript & PDF Options
You can enable JavaScript or any of our PDF Options by using the same parameters as our Document API, but in a query string format.
Example Code
As an example, you can download a PDF of this page using the below code:
<a href='https://docraptor.com/docs/from_site/?name=example&type=pdf'>
Download a PDF
</a>
With JavaScript enabled and media set to screen
:
<a href='https://docraptor.com/docs/from_site/?javascript=true&prince_options[media]=screen&name=example&type=pdf'>
Download a PDF
</a>
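If the links are generated server-side, the query string can be assembled programmatically. The Python sketch below uses the `prince_options` key from the Document API and percent-encodes the square brackets, on the assumption that the encoded form is accepted the same way as the literal brackets; both helper names are hypothetical.

```python
from html import escape
from urllib.parse import urlencode

FROM_SITE_URL = "https://docraptor.com/docs/from_site/"


def from_site_url(options: dict) -> str:
    """Build a referrer-based document URL from Document API style options."""
    return f"{FROM_SITE_URL}?{urlencode(options)}"


def download_link(options: dict, label: str = "Download a PDF") -> str:
    """Render an anchor tag, escaping the URL for use inside an HTML attribute."""
    return f'<a href="{escape(from_site_url(options), quote=True)}">{escape(label)}</a>'


url = from_site_url({
    "javascript": "true",
    "prince_options[media]": "screen",
    "name": "example",
    "type": "pdf",
})
```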
Document Listing API
You can also get a list of previously created documents through the API. This is just information like the name, the date, and if it was a test document. Since we don't actually store the created document, we can't return that. Info about the documents is returned as XML or JSON in a paginated list, ordered by date of creation (most recent first).
URL
You can make a GET request against:
- https://docraptor.com/docs.json (or .xml)
Parameters
The following are the parameters that DocRaptor expects when you request the document listing.
Page
Specifies the page (in terms of pagination) of documents to return
Name: page
Default: 1
Per Page
Specifies the number of documents per page (in terms of pagination) to return
Name: per_page
Default: 100
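Walking through the paginated listing could look like this Python sketch. It assumes each page decodes to a list of document records and that a page shorter than per_page is the last one; neither detail is stated above. The `fetch_page` function is a caller-supplied, hypothetical helper that performs the authenticated GET and returns the decoded JSON.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode


def listing_url(page: int = 1, per_page: int = 100) -> str:
    """Build the JSON document listing URL for one page of results."""
    return f"https://docraptor.com/docs.json?{urlencode({'page': page, 'per_page': per_page})}"


def iter_documents(fetch_page: Callable[[str], list], per_page: int = 100) -> Iterator[dict]:
    """Yield document records page by page until a short page is returned."""
    page = 1
    while True:
        items = fetch_page(listing_url(page, per_page))
        yield from items
        if len(items) < per_page:  # assumption: a short page is the last page
            break
        page += 1
```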
Document Log Listing API
You can also get a list of all previously attempted document creations. The returned data here includes information about document creation success and failure, any errors that were encountered, and information about generation time. The information is returned as XML or JSON in a paginated list, ordered by date of creation (most recent first).
URL
You can make a GET request against:
- https://docraptor.com/doc_logs.json (or .xml)
Parameters
The following are the parameters that DocRaptor expects when you request the doc log listing.
Page
Specifies the page (in terms of pagination) of documents to return
Name: page
Default: 1
Per Page
Specifies the number of documents per page (in terms of pagination) to return
Name: per_page
Default: 100
Search Start
Specifies the start date of the search range for returning doc log entries.
Name: search[start]
Default: 1 month ago
Search End
Specifies the end date of the search range for returning doc log entries.
Name: search[end]
Default: The current date
IP Listing API
Using IPs for securing assets is not recommended, as these addresses will change over time without notice. Instead, the recommended approach is to use HTTP basic auth over HTTPS or prince_options[http_proxy]. This API is provided if no other alternative is available.
Use this endpoint to get a list of IP addresses that DocRaptor currently uses to download assets.
URL
You can make a GET request against either: