PDF is the cockroach of document formats — it survives everything. In 2026, Python’s PDF tooling has matured to the point where you can generate, modify, and extract data from PDFs reliably and at scale.
pikepdf: Low-Level PDF Manipulation
pikepdf is a Python wrapper around the QPDF C++ library. It handles PDF manipulation at the object level — adding and removing pages, modifying metadata, encrypting and decrypting, and optimizing file size. It’s fast, memory-efficient, and handles edge cases that trip up pure-Python libraries.
Common use cases: merging PDFs from different sources, splitting a multi-page PDF into individual pages, adding watermarks, and compressing PDFs for email attachment. pikepdf handles these with a clean Python API and near-native performance.
ReportLab: Programmatic PDF Creation
ReportLab has been around since the early 2000s, and it’s still the best tool for creating PDFs programmatically. Its low-level canvas API gives you exact control over layout, with coordinates in points (1/72 inch) measured from the bottom-left corner of the page. That control is both a strength and a weakness: creating a multi-page invoice requires explicitly positioning every element, calculating page breaks, and managing fonts.
The Platypus layout engine in ReportLab adds a higher-level abstraction. You define a document as a sequence of flowables (paragraphs, tables, images) and Platypus handles pagination. For reports, invoices, and certificates, Platypus strikes the right balance between control and convenience.
WeasyPrint: HTML to PDF
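WeasyPrint renders HTML and CSS to PDF using its own layout engine rather than a browser's. It’s the right tool when a complex layout is easier to express as a styled web page than to position by hand. Design your document in HTML and CSS, preview it in a browser, and then check the PDF that WeasyPrint generates, since its output can differ from what the browser shows. The CSS support includes paged media features for headers, footers, and page numbers.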
The downside: complex HTML with JavaScript-rendered content doesn’t work. WeasyPrint renders static HTML and CSS. If your content depends on React or Vue for rendering, you need a headless browser like Playwright for PDF generation.
Document Automation at Scale
For generating thousands of documents: use ReportLab for structured documents (invoices, certificates), WeasyPrint for designed documents (reports, proposals), and pikepdf for post-processing (merging, compression). Run generation in parallel with a task queue like Celery. Store generated PDFs in cloud storage with a CDN for delivery.
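The fan-out pattern can be sketched without a broker. In the example below a local worker pool from the standard library's `concurrent.futures` stands in for a Celery deployment, and `render_one` is a hypothetical placeholder that writes dummy bytes where a real job would call ReportLab or WeasyPrint and then upload the result.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path


def render_one(job, out_dir):
    """Hypothetical renderer: a real one would build the PDF for this job."""
    target = Path(out_dir) / f"{job['id']}.pdf"
    target.write_bytes(b"%PDF-1.4\n% placeholder, not a valid document\n")
    return target


def generate_batch(jobs, out_dir, executor=None, render=render_one):
    """Render all jobs in parallel and collect successes and failures separately."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    owns_pool = executor is None
    pool = executor if executor is not None else ProcessPoolExecutor()
    done, failed = [], []
    try:
        futures = {pool.submit(render, job, out_dir): job for job in jobs}
        for future in as_completed(futures):
            job = futures[future]
            try:
                done.append(future.result())
            except Exception as exc:  # one bad document should not sink the batch
                failed.append((job["id"], exc))
    finally:
        if owns_pool:
            pool.shutdown()
    return done, failed
```

Collecting failures instead of raising on the first one matters at this scale: a batch of thousands will usually contain a few bad inputs, and those are better retried or reported than allowed to stop everything else. With a task queue the same idea appears as per-task retries and a results backend.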
PDF automation isn’t glamorous, but it’s the backbone of document-heavy industries. The tools are mature, the patterns are established, and the reliability is high.