Wednesday, March 28, 2007

PDF editing on Linux (and other open-source operating systems)

I recently received several PDF documents to fill out for a background check (I'm applying for a summer internship). However, the PDFs did not use proper form fields, so I could not use Acrobat Reader to fill them in. I surveyed two PDF editors that serve this particular purpose: adding text to an existing PDF that has no form fields.

Before I began, I did some searching on Google. I found this article (mentioning flpsed), which looks somewhat outdated. I also remembered reading about a Qt-based PDF editor (PDFedit) a while ago. I decided to try both.

I tried PDFedit first because it looks more recent. According to the user documentation, it is quite powerful. It has a JavaScript-like scripting language (QSA) that lets you automate many things. It can alter existing text attributes, delete objects, add lines and rectangles, and add text. It also lets you view the PDF object tree and edit it directly. PDFedit supports multi-page documents, which Adobe Illustrator doesn't.

As for the function I was looking for, adding text to a document to fill out forms, PDFedit performs rather poorly. The button to add text does not have an obvious icon. Once you find the button and click on it, your mouse pointer turns into "add text" mode, so you can now click anywhere in the document to add text.

When you click to add text, PDFedit creates a small overlaid text box that lets you type. However, if you type too much, part of what you typed scrolls out of view and is temporarily hidden. When you press the Enter key, everything you just typed (including the hidden part) is rendered onto the document, but in a different font and size. The desired font can be chosen from a drop-down box on the toolbar, but this setting is not previewed while you type.

Once you have entered the text, you may discover that the placement is slightly off. You must switch the mouse back to selection mode, select the text you just entered, and drag it. Mouse selection does not work well, and you can easily select the wrong object or some mysterious object that is not visible in the document.

After adding text, dragging it, and adding more text a few times, PDFedit becomes very slow. On a fairly recent machine (Pentium 4, 2.4 GHz), it can take 20 seconds from pressing the Enter key until the text appears in the document.

When I finally struggled through, I saved the resulting PDF file and tried to open it in Acrobat Reader. It opened the document only partially and complained about a "q" operator that is illegal in text. I tried opening it in GhostView, and it rejected the document completely. I tried loading it in PDFedit again, and it was fine. It seems that after you tinker with a PDF in PDFedit, the only program that can open it again is PDFedit.
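Acrobat's complaint is consistent with the PDF specification, which does not allow the special graphics-state operators (q, Q, and cm) inside a text object, that is, between BT and ET. As a rough illustration of what Acrobat is rejecting, here is a minimal sketch of a checker for that rule. It assumes you have already extracted and decompressed a page's content stream with some other tool; the function name `illegal_text_operators` is hypothetical, and the tokenizer is deliberately naive (it ignores comments and does not handle escaped or nested parentheses).

```python
import re

# The PDF specification disallows these special graphics-state operators
# inside a text object (between BT and ET).
FORBIDDEN_IN_TEXT = {"q", "Q", "cm"}

# Naive pattern for literal strings (...) and hex strings <...>, so that text
# such as "(a q b)" is not mistaken for operators. Escaped or nested
# parentheses are NOT handled; this is a sketch, not a real PDF lexer.
STRINGS = re.compile(rb"\([^)]*\)|<[0-9A-Fa-f\s]*>")


def illegal_text_operators(content: bytes) -> list[str]:
    """Return the forbidden operators found inside BT ... ET, in order."""
    tokens = STRINGS.sub(b" ", content).split()
    in_text = False
    problems = []
    for tok in tokens:
        op = tok.decode("latin-1")
        if op == "BT":        # begin text object
            in_text = True
        elif op == "ET":      # end text object
            in_text = False
        elif in_text and op in FORBIDDEN_IN_TEXT:
            problems.append(op)
    return problems


if __name__ == "__main__":
    bad = b"BT /F1 12 Tf 72 700 Td (Hello) Tj q 1 0 0 1 0 0 cm Q ET"
    print(illegal_text_operators(bad))
```

A stream that saves the graphics state outside the text object, such as `q BT (a q b) Tj ET Q`, passes this check, because the q inside the string literal is stripped before tokenizing.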

I decided to give flpsed a try. I didn't try it first because the article I read claimed that it only supports PostScript files. That wasn't a big problem, since PostScript can be converted to and from PDF using GhostScript, though with some loss of font detail.
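For reference, Ghostscript ships wrapper scripts (pdf2ps and ps2pdf) for this round trip, and both boil down to a single gs invocation with a different output device. The minimal sketch below builds that command line, assuming a Ghostscript build that provides the ps2write (PostScript) and pdfwrite (PDF) devices; the helper names `gs_convert_command` and `gs_convert` are hypothetical, and picking the device from the output file's suffix is just a convenience chosen for this example.

```python
import shutil
import subprocess
from pathlib import Path

# Output device chosen from the destination suffix (an assumption of this
# sketch): ps2write produces PostScript, pdfwrite produces PDF.
DEVICES = {".ps": "ps2write", ".pdf": "pdfwrite"}


def gs_convert_command(src: str, dst: str) -> list[str]:
    """Build a Ghostscript argv converting src into the format implied by dst."""
    suffix = Path(dst).suffix.lower()
    if suffix not in DEVICES:
        raise ValueError(f"unsupported output format: {suffix!r}")
    return [
        "gs",
        "-dSAFER",    # restrict file operations requested by the input document
        "-dNOPAUSE",  # do not pause after each page
        "-dBATCH",    # exit when finished instead of entering interactive mode
        f"-sDEVICE={DEVICES[suffix]}",
        f"-sOutputFile={dst}",
        src,
    ]


def gs_convert(src: str, dst: str) -> None:
    """Run the conversion; requires a gs binary on PATH."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) not found on PATH")
    subprocess.run(gs_convert_command(src, dst), check=True)


if __name__ == "__main__":
    # Print the commands for a PDF -> PostScript -> PDF round trip.
    print(" ".join(gs_convert_command("form.pdf", "form.ps")))
    print(" ".join(gs_convert_command("form.ps", "form-filled.pdf")))
```

Because fonts are re-embedded on each pass, a round trip like this is where the loss of font detail mentioned above can creep in.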

Compiling flpsed from source was a pleasure. It requires FLTK 2.0, which took only a minute to compile; flpsed itself took only a few seconds. By contrast, PDFedit depends on Boost, and the two together took tens of minutes to compile.

With flpsed, I first tried converting the PDF to PostScript, and it worked fine. The program is simple and serves only one purpose: adding text to an existing document. It also supports multi-page documents. I can choose the font size, but not the font family. Adding text is simple: click anywhere in the document and start typing. You can edit only the text you added; the original document text cannot be modified.

Flpsed is also much more responsive than PDFedit. Typing, selecting, and moving text are instantaneous. You can also move text with the arrow keys for more precise alignment.

After finishing the forms, I discovered that flpsed can also import directly from PDF and export back to PDF. This actually produces better results than converting the PDF to PostScript first. I started over with the PDF, and it was a breeze.

However, the simplicity of flpsed comes with a functionality trade-off. For example, there is no cut and paste. It also lacks input method support (which is required for many non-English languages). Nor can I add an image, which would be a way to simulate signing a document, although it may be a bad idea to make your signature infinitely reproducible by putting it inside a printable PDF document.

Here is my recommendation: if you need to fill PDF "forms" that do not have form fields, flpsed does the job beautifully and efficiently. But don't expect it to do much else.
