The document format PDF ("Portable Document Format") has been around for 26 years now. It was developed by Adobe, has been an open format since 2008, and is based on the printer language PostScript. This is a page-oriented "move-to / line-to" language.
Apache PDFBox is an open-source Java project that makes it possible to read and write PDF documents. In this Blog I want to take a look at how to write a PDF document programmatically from Java.
The PDPageContentStream writer is a stateful object that requires calls in a certain order, else it will throw exceptions. All of these things you need to implement yourself when you want to use PDFBox. There are some GitHub projects that build on PDFBox and provide more, but they cover just pieces of the missing things.
Mind that you can also use the "Open HTML to PDF" library to generate PDF from HTML (its two required JAR files have about 1.5 MB). If that is an option for your project, it may save you from the complexity of PDFBox.
The following source will create a PDF file like this:
Here is the Java code:
import java.io.IOException;
Mind that the document must be saved before it gets closed!
The output of this program is:
Font size: 10.0, ascent: 7.18, descent: 2.07, width of 'M': 8.33
PDF page w=595/h=842
Wrote file: HelloWorld.pdf
Some say "leading" is the space between text lines, i.e. what is below the lower end of a 'g' but still above the upper end of a 'G' that is directly below it. This is the older meaning of the word.
Some say it is the line height, i.e. the distance from the lower end of an 'a' to the lower end of an 'a' that is directly below it, thus it includes line spacing. This is the modern meaning that software goes with.
Don't use this term, it is mis-leading :-)
This is how you can get a font's geometry. Mind that with font.getStringWidth(string) you can measure all kinds of strings except those that contain newlines.
final PDFont font = PDType1Font.HELVETICA;
final float fontSize = 10;
final float ascent = fontSize * font.getFontDescriptor().getAscent() / 1000f;
final float descent = fontSize * -font.getFontDescriptor().getDescent() / 1000f;
// standard line height is 150% of font height
final float lineHeight = (ascent + descent) * 1.5f; // also called "leading" sometimes
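Because font.getStringWidth(string) cannot handle newlines, a multi-line text has to be measured line by line. The following is just a minimal sketch of that idea in plain Java. The class MultiLineWidth and the lineWidth parameter are hypothetical names; with PDFBox the measurer would be something like fontSize * font.getStringWidth(line) / 1000f, wrapped to handle the IOException it can throw.

```java
import java.util.function.ToDoubleFunction;

/** Sketch: measure a multi-line text by measuring each line separately. */
final class MultiLineWidth {

    /**
     * @param text a text that may contain newlines
     * @param lineWidth hypothetical measurer for one line that contains no newline
     * @return the width of the widest line
     */
    static double widestLine(String text, ToDoubleFunction<String> lineWidth) {
        double max = 0;
        // "\\R" matches any line break (\n, \r\n, \r); -1 keeps trailing empty lines
        for (String line : text.split("\\R", -1)) {
            max = Math.max(max, lineWidth.applyAsDouble(line));
        }
        return max;
    }

    public static void main(String[] args) {
        // Stand-in measurer: every character counts as 5 units (NOT real font metrics)
        final ToDoubleFunction<String> fakeMeasurer = line -> line.length() * 5.0;
        System.out.println(widestLine("Hello\nPDFBox world", fakeMeasurer)); // prints 60.0
    }
}
```

The measurer is passed in as a function so that the splitting logic can be tried out without any font at hand.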
The descent is the part that hangs down below the text baseline for 'g', 'y', 'p', 'j', 'q'. Don't forget to subtract it after a text block, else these letters may be partially covered when they are on the last line.
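To make the descent rule concrete, here is a small sketch that computes how much vertical space a text block takes, including the descent of its last line. It is plain arithmetic and does not call PDFBox. The class and method names are made up for illustration, and it assumes that the first baseline sits one ascent below the top of the block.

```java
/** Sketch: vertical extent of a text block, including the descent of the last line. */
final class TextBlockHeight {

    /** Distance from the top of the first line to the lower end of the last line's descenders. */
    static float blockHeight(int lineCount, float ascent, float descent, float lineHeight) {
        if (lineCount <= 0) {
            return 0f;
        }
        // The first baseline is 'ascent' below the top, every further line adds one
        // line height, and the last line still needs room for its descenders.
        return ascent + (lineCount - 1) * lineHeight + descent;
    }

    public static void main(String[] args) {
        final float ascent = 7.18f;  // Helvetica at font size 10, see program output
        final float descent = 2.07f;
        final float lineHeight = (ascent + descent) * 1.5f;
        final float topY = 842f;     // top edge of an A4 page
        final float bottomY = topY - blockHeight(3, ascent, descent, lineHeight);
        System.out.println("A block of 3 lines ends at y=" + bottomY);
    }
}
```

Whatever comes after the text block should start at or below that y-coordinate, else it may cover the lower parts of 'g' or 'y'.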
The following is the way to get a PDPageContentStream to write text or draw lines:
final PDDocument document = new PDDocument();
final PDRectangle pageSize = PDRectangle.A4;
final PDPage page = new PDPage(pageSize);
document.addPage(page);
final PDPageContentStream contentStream = new PDPageContentStream(document, page);
Here is how to find out the page geometry:
final PDRectangle pageBox = page.getMediaBox();
final float width = pageBox.getWidth();
final float height = pageBox.getHeight();
final float startX = pageBox.getLowerLeftX();
final float startY = pageBox.getUpperRightY();
When you advance from startY at the top of the page towards the bottom, you need to subtract the line height, not add it, because the PDF coordinate origin is at the lower left corner and y grows upwards: nextY = currentY - lineHeight!
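The downward movement can be sketched as a small loop that collects the baseline y-coordinates of consecutive lines until the next line would no longer fit on the page. This is an illustration in plain Java, not PDFBox API; BaselineCursor and its parameters are hypothetical names, and bottomY stands for the lower limit of the writable area.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: baseline positions when writing lines from the top of a page downwards. */
final class BaselineCursor {

    /** @return the baseline y-coordinates of all lines that fit between topY and bottomY */
    static List<Float> baselines(int lineCount, float topY, float bottomY,
                                 float ascent, float descent, float lineHeight) {
        final List<Float> result = new ArrayList<>();
        float y = topY - ascent; // first baseline: leave room for the ascenders
        // A line fits only when its descenders stay at or above the lower limit
        for (int i = 0; i < lineCount && y - descent >= bottomY; i++) {
            result.add(y);
            y -= lineHeight; // moving down the page means subtracting
        }
        return result;
    }

    public static void main(String[] args) {
        final float ascent = 7.18f;
        final float descent = 2.07f;
        final float lineHeight = (ascent + descent) * 1.5f;
        // A4 page: top edge at y=842, lower limit at y=0
        for (float y : baselines(3, 842f, 0f, ascent, descent, lineHeight)) {
            System.out.println("baseline at y=" + y);
        }
    }
}
```

When the returned list is shorter than lineCount, the remaining lines did not fit and would have to go onto a new page.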
There is lots of work to do with PDFBox. The hard parts are pagination (text across multiple pages) and tables, and the combination of the two.
ɔ⃝ Fritz Ritzberger, 2019-05-05