Kin Lane

Generating HTML from PDF Files

I came across a new tool this week called PDFMasher. PDFMasher converts PDF files containing text into HTML files.

Most e-Book readers support PDF files, but the format doesn't make for a very good user experience because there is limited control over formatting, such as font sizes. PDFs on e-Book readers also don't allow for annotations.

PDFMasher processes a PDF and efficiently asks the user about the role of each section of text. It handles headers and footers, and lets you manage content ordering and footnotes.
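
To make the idea concrete, here is a minimal Python sketch of that kind of role-based conversion. It is not PDFMasher's actual code or API. The role names, the `blocks_to_html` function and the sample data are all hypothetical, and it assumes the text blocks have already been extracted from the PDF and labeled by the user.

```python
from html import escape

# Hypothetical roles a user might assign to each extracted text block.
ROLE_TAGS = {"title": "h1", "normal": "p", "footnote": "aside"}
# Page headers and footers are assumed to be dropped from the output.
IGNORED_ROLES = {"header", "footer"}


def blocks_to_html(blocks):
    """Render (order, role, text) tuples as HTML, skipping headers and footers."""
    parts = []
    # Sort by the user-assigned order so content reads in the intended sequence.
    for _, role, text in sorted(blocks, key=lambda block: block[0]):
        if role in IGNORED_ROLES:
            continue
        tag = ROLE_TAGS.get(role, "p")  # unknown roles fall back to a paragraph
        parts.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(parts)


# Made-up sample blocks, deliberately out of order.
blocks = [
    (2, "normal", "PDF text & layout are separate concerns."),
    (0, "header", "Page 1"),
    (1, "title", "Chapter One"),
    (3, "footnote", "1. See the appendix."),
    (4, "footer", "example.com"),
]
html = blocks_to_html(blocks)
print(html)
```

The point of the sketch is that once every block has a role and a position, producing clean HTML is mostly a matter of filtering and mapping. The hard part, which a tool like PDFMasher helps with, is assigning those roles quickly.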

PDFMasher is still in early development, but Mac, Windows and Linux versions are available for download.

I hope they plan a web-based version as well as an API in the future. This is definitely an extremely valuable utility for publishing multiple versions of a document, whether it is delivered to e-Book readers or used for print on demand and self-publishing.

I published it in the Mimeo Connect Application Directory because distribution to e-Book readers is becoming a common part of publishing, alongside print.