Generate good looking PDFs with WeasyPrint and Jinja2

Most Python-based PDF generation tools either suck or cost a ton of money. One example is ReportLab, which provides a free version that's completely useless for generating good-looking PDFs, and a premium version that costs way too much money if you're a small business.

Are you doomed to relying on external APIs for generating PDFs? Of course not. Open source will never let you down.

Enter WeasyPrint

After a lot of testing, I settled on WeasyPrint because it allows you to generate PDFs using HTML and CSS, which is great for two reasons:

  1. Every web developer is familiar with those technologies.
  2. CSS (especially flexbox) allows you to build great looking PDFs.

Dynamic HTML rendering with Jinja2

But you don't just want to turn static HTML pages into PDFs. You want your application to dynamically inject data into the HTML before rendering it as a PDF.

That's where Jinja2 shines.

  • You can break down your HTML templates into separate pieces so that they're more manageable.
  • You can use filters, control flow, tests, and much more to control how your templates receive data and display it.
  • You can sanitize user input before inserting it into the template, which helps shield you from security issues.
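
To illustrate that last point, here is a minimal sketch of Jinja2's autoescaping. Note that autoescaping is off by default in a plain Environment, so it has to be switched on explicitly. The template string and the malicious name below are made up for this example.

```python
from jinja2 import Environment

# Autoescaping is disabled by default, so enable it explicitly.
env = Environment(autoescape=True)

# Hypothetical template and input, for illustration only.
template = env.from_string("<p>Hello {{ name }}</p>")
rendered = template.render(name="<script>alert(1)</script>")

print(rendered)  # <p>Hello &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The same autoescape=True argument can be passed to any Environment you create with a loader.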

Setting up WeasyPrint

There are different ways of using the WeasyPrint developer API. In this post, I'll show you my preferred way of using a local CSS file, and passing a Python string containing HTML text directly to the constructor.

Dump the following in your Python file (you can name it app.py):

import tempfile
import webbrowser
from pathlib import Path

from weasyprint import CSS, HTML


def get_html():
  # Read the raw HTML skeleton from templates/index.html.
  with (Path("templates") / "index.html").open("r", encoding="utf-8") as fp:
    return fp.read()


def generate_pdf():
  css = CSS('styles.css')
  html = HTML(string=get_html())
  # Reserve a temporary file for the PDF (tempfile.mktemp() is deprecated).
  with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    out_path = tmp.name
  html.write_pdf(target=out_path, stylesheets=[css])
  return out_path


if __name__ == "__main__":
  pdf_path = generate_pdf()
  webbrowser.open(pdf_path)

app.py

This simple program does the following:

  1. Load CSS rules from a file called styles.css in the project directory.
  2. Load index.html from a directory called templates.
  3. Generate the PDF given the styles and HTML skeleton provided.
  4. Write the newly generated PDF to a temporary location.
  5. Open the PDF file using the default web browser on your computer.

To try it out, simply create the following index.html file in your templates directory:

<!DOCTYPE html>
<html>

<head>
  <meta charset="UTF-8" />
  <title>PDF Invoice</title>
</head>

<body>
    <h1>Hello world!</h1>
</body>

</html>

templates/index.html

And then, create the styles.css file for testing:

h1 {
  color: red;
}

styles.css

Run the program and a PDF file will open in your web browser containing an HTML H1 element colored red.

Now that you've created a simple static PDF, it's time to generate a PDF containing dynamic data from your database or other sources. For that, we'll rely on Jinja2.

Loading Dynamic Data

Earlier, you loaded index.html containing some static HTML. This time, you'll generate HTML containing your own data.

First, update your generate_pdf function to accept an extra argument called pdf_data.

...
def generate_pdf(pdf_data):
  ...
...

app.py

Create a file called pdf-data.json in the project root:

{
  "customer": {
    "name": "Josh Karamuth",
    "email": "[email protected]"
  },
  "line_items": [
    {
      "name": "Chicken",
      "qty": 10,
      "total": 99.99
    },
    {
      "name": "Sauce",
      "qty": 1,
      "total": 147.00
    }
  ],
  "created_at": "2024-01-10T15:32:35.539227"
}

pdf-data.json

This is fictional data for generating a customer's PDF invoice after they made a purchase.

In your own production use, this data will likely come from a database, but JSON works fine for us right now.

Load the file when the script starts and pass it to the generate_pdf function:

import json
...

if __name__ == "__main__":
  with open('pdf-data.json', 'r', encoding='utf-8') as fp:
    pdf_path = generate_pdf(json.load(fp))
    ...

app.py

We load the JSON, turn it into a dict using the json library, and pass it along to the generate_pdf function.
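
A malformed file (a trailing comma, a missing key) tends to show up later as a confusing template error, so it can help to check the data right after loading it. Below is a minimal sketch. The load_pdf_data helper and its list of required keys are assumptions based on the sample file above, not part of WeasyPrint or Jinja2.

```python
import json

# Top-level keys the sample invoice data is assumed to contain.
REQUIRED_KEYS = ("customer", "line_items", "created_at")


def load_pdf_data(text: str) -> dict:
    """Parse JSON text and check that the expected top-level keys exist."""
    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"pdf data is missing keys: {', '.join(missing)}")
    return data


sample = (
    '{"customer": {"name": "Josh Karamuth"}, '
    '"line_items": [], '
    '"created_at": "2024-01-10T15:32:35.539227"}'
)
pdf_data = load_pdf_data(sample)
print(pdf_data["customer"]["name"])  # Josh Karamuth
```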

Now it's time to make use of this data.

HTML Templates with Jinja2

You can do crazy things using Jinja2 but here, we'll keep things simple because we're only interested in generating PDFs instead of building a web framework.

Update your get_html function to start making use of Jinja2:

from jinja2 import Environment, FileSystemLoader


def get_html(pdf_data):
  env = Environment(loader=FileSystemLoader("templates"))
  tmpl = env.get_template("index.html.j2")
  return tmpl.render({"pdf_data": pdf_data})

app.py

Notice 3 things:

  1. First, you're creating an Environment using a FileSystemLoader.
  2. Then, you retrieve the index template from the environment.
  3. Finally, you render it as a string, passing the PDF data along.
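
The same three steps can be tried without touching the filesystem. The sketch below swaps FileSystemLoader for Jinja2's DictLoader, which serves templates from an in-memory dict. The template text and the sample data are made up for illustration.

```python
from jinja2 import DictLoader, Environment

# 1. Create an Environment (backed by an in-memory dict instead of a directory).
env = Environment(
    loader=DictLoader(
        {"index.html.j2": "<h1>Invoice for {{ pdf_data.customer.name }}</h1>"}
    )
)

# 2. Retrieve the template by name.
tmpl = env.get_template("index.html.j2")

# 3. Render it to a string, passing the data along.
html_text = tmpl.render({"pdf_data": {"customer": {"name": "Josh Karamuth"}}})
print(html_text)  # <h1>Invoice for Josh Karamuth</h1>
```

The resulting string is what gets handed to WeasyPrint through HTML(string=...).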

Go ahead and rename your index.html file to index.html.j2 right now. Also make sure generate_pdf passes its argument along by calling get_html(pdf_data).

Run the program and you'll see the same PDF that you generated earlier show up in your web browser. Looks like Jinja2 has been set up correctly. Let's continue.

Rendering Dynamic Data with Jinja2

Earlier, you created an Environment containing templates from your templates directory.

Environments allow you to use template partials and inheritance. To see how this works, create a file called _header.html.j2 inside the templates directory.

<header>
  <h1>PDF Invoice</h1>
</header>

_header.html.j2

Update your index.html.j2 to use this partial:

...
<body>
  {% include "_header.html.j2" %}
</body>
...

Run the program again and you'll see a PDF containing this new H1.

Notice how I named the partial starting with an underscore. This isn't required; it's just my personal preference. It allows me to distinguish whole template files from partials.
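
Inheritance, which was mentioned earlier alongside partials, works along similar lines: a base template defines named blocks and a child template fills them in. The following is a small sketch that uses an in-memory DictLoader so it runs on its own. The template names base.html.j2 and invoice.html.j2 are hypothetical and aren't used elsewhere in this post.

```python
from jinja2 import DictLoader, Environment

templates = {
    # Base layout with a block that child templates can override.
    "base.html.j2": (
        "<html><body>"
        "{% block content %}Default content{% endblock %}"
        "</body></html>"
    ),
    # Child template: inherits the layout and fills in the block.
    "invoice.html.j2": (
        '{% extends "base.html.j2" %}'
        "{% block content %}<h1>PDF Invoice</h1>{% endblock %}"
    ),
}

env = Environment(loader=DictLoader(templates))
html_text = env.get_template("invoice.html.j2").render()
print(html_text)  # <html><body><h1>PDF Invoice</h1></body></html>
```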

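Now let's use the data we loaded earlier. Update the header partial to show the invoice date, and create a new partial called _line-items.html.j2 for the table of line items.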

<header>
  <h1>
    PDF Invoice
  </h1>
  <p><b>Date:</b> {{ pdf_data.created_at }}</p>
</header>

_header.html.j2

<section>
  <h2>Items</h2>
  {% with line_items = pdf_data.line_items %}
    {% if line_items|length > 0 %}
    <table>
      <thead>
        <tr>
          <th>Name</th>
          <th>Quantity</th>
          <th>Total</th>
        </tr>
      </thead>
      <tbody>
        {% for line_item in line_items %}
          <tr>
            <td>{{ line_item.name }}</td>
            <td>{{ line_item.qty }}</td>
            <td>{{ line_item.total }}</td>
          </tr>
        {% endfor %}
      </tbody>
    </table>
    {% else %}
    <p>There aren't any products in this order.</p>
    {% endif %}
  
  {% endwith %}
</section>

_line-items.html.j2

Don't forget to include the newly created _line-items.html.j2 partial in your index.html.j2 file.

<body>
  ...
  {% include "_line-items.html.j2" %}
</body>

index.html.j2

Run the program again and you'll see a PDF containing all your data in a nice table.
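
One thing to watch for: by default, Jinja2 renders a missing value as an empty string, so a typo such as pdf_data.created_att silently leaves a blank spot in the PDF. Here is a minimal sketch of making such mistakes fail loudly with StrictUndefined. The template string is made up, and whether you want this strictness is a judgment call.

```python
from jinja2 import Environment, StrictUndefined
from jinja2.exceptions import UndefinedError

pdf_data = {"created_at": "2024-01-10T15:32:35.539227"}
source = "<p>Date: {{ pdf_data.created_att }}</p>"  # note the typo in the key

# Default behaviour: the unknown key renders as an empty string.
lenient = Environment().from_string(source).render(pdf_data=pdf_data)
print(repr(lenient))  # '<p>Date: </p>'

# With StrictUndefined, rendering the unknown key raises an error instead.
strict_env = Environment(undefined=StrictUndefined)
try:
    strict_env.from_string(source).render(pdf_data=pdf_data)
    error = None
except UndefinedError as exc:
    error = exc
print("strict error:", error)
```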

Notice how the created_at date doesn't look user friendly though. Let's fix that next.

Transforming Data with Jinja2 filters

You can easily control how your data is rendered by passing it through a filter.

Jinja2 comes with a bunch of built-in ones. If you look closely, we already made use of one called length earlier when we checked whether the length of the line_items list is greater than zero. The length filter returns an integer representing the length of the list, as the name suggests.
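
As a further taste of the built-in filters, the sketch below computes an invoice total with sum and formats it with format. This total line is not part of the templates in this post. It's only an illustration, using the line items from the sample data.

```python
from jinja2 import Environment

line_items = [
    {"name": "Chicken", "qty": 10, "total": 99.99},
    {"name": "Sauce", "qty": 1, "total": 147.00},
]

env = Environment()

# length counts the items, sum adds up one attribute of each item,
# and format applies printf-style formatting to the result.
tmpl = env.from_string(
    "{{ line_items|length }} items, "
    'total {{ "%.2f"|format(line_items|sum(attribute="total")) }}'
)
summary = tmpl.render(line_items=line_items)
print(summary)  # 2 items, total 246.99
```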

You'll now create your own filter to format the date to make it easier to read:

from datetime import datetime


def format_date(iso_string: str, fmt="%m-%d-%Y"):
  # Parse the ISO 8601 timestamp and re-format it for display.
  return datetime.fromisoformat(iso_string).strftime(fmt)


def get_html(pdf_data):
  ...
  env.filters['format_date'] = format_date
  ...

app.py

You create a new function format_date and then you add it to the filters in your environment. Everything else stays the same.
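
Because a filter is just a Python function, it can be checked on its own before it ever touches a template. Here is a quick sketch using the created_at value from the sample data. The second parameter is named fmt here to avoid shadowing Python's built-in format, which is a naming choice made for this example.

```python
from datetime import datetime


def format_date(iso_string: str, fmt: str = "%m-%d-%Y") -> str:
    """Parse an ISO 8601 timestamp and return it in the given strftime format."""
    return datetime.fromisoformat(iso_string).strftime(fmt)


created_at = "2024-01-10T15:32:35.539227"
default_format = format_date(created_at)
custom_format = format_date(created_at, "%Y/%m/%d")

print(default_format)  # 01-10-2024
print(custom_format)   # 2024/01/10
```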

Now, use it in your template.

...
<p><b>Date:</b> {{ pdf_data.created_at|format_date }}</p>
...

_header.html.j2

After rendering the PDF again, you'll see the date printed as the default format string of Month-Day-Year.

If you need a different format, simply pass it along as an argument like so:

...
<p><b>Date:</b> {{ pdf_data.created_at|format_date("%d-%m-%Y") }}</p>
...

_header.html.j2

The date will now show up as Day-Month-Year.
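
The same pattern works for other values. For instance, the totals in the table print as raw floats, so 147.00 in the JSON shows up as 147.0. Below is a sketch of a second custom filter. The name format_money, the default "$" symbol, and the two-decimal format are assumptions for illustration, not something the templates above require.

```python
from jinja2 import Environment


def format_money(value, symbol="$"):
    """Format a number with two decimals and thousands separators."""
    return f"{symbol}{value:,.2f}"


env = Environment()
# Register the function under the name used inside templates.
env.filters["format_money"] = format_money

tmpl = env.from_string(
    "{{ total|format_money }} / {{ total|format_money('EUR ') }}"
)
rendered = tmpl.render(total=147.0)
print(rendered)  # $147.00 / EUR 147.00
```

In the line items partial, this would be used as {{ line_item.total|format_money }}.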

Good Looking Layouts with CSS

The best feature of WeasyPrint is its support for CSS Flexbox, which allows you to easily build any kind of layout you need.

Try it out by adding the following to styles.css:

header {
  display: flex;
  flex-direction: column;
  align-items: center;
  justify-content: center;
  gap: 20px;
}

You'll see the H1 and date nicely centered on the page, with breathing space between them.

You can even use most of the fonts available on your server. If you need a specific font, from Google Fonts or anywhere else, simply download the font file and move it to the /usr/local/share/fonts directory (this path assumes a typical Linux setup). Type fc-list at the CLI and you should see it show up.

Now simply use it in your CSS:

.my-paragraph {
  font-family: "Noto Serif";
}

Piece of cake.

Done

As you saw, setting up WeasyPrint and Jinja2 to generate PDFs is easy, and it lets you provide professional-looking PDF files to your users. All of that for free!
