This project is read-only.

Project Description
hOcr2Pdf.NET is a .NET library that converts hOCR HTML (.hocr) output produced by Tesseract or Cuneiform into highly compressed, searchable PDFs using HtmlAgilityPack, Jbig2 and iTextSharp. It is written in C#.

  • Simple design. Create or edit PDF files with PDFDoc.Open() or PDFDoc.Create()
  • Easily add new scanned image pages
  • OCR new or existing PDFs
  • Use different images for OCR and display
  • Optionally define the fonts used for OCR output, for perfectly underlaid text
  • Compress PDFs with Jbig2
  • Provides common utility methods for searching, rotating, bookmarking, and setting attributes such as title and author
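
To make the hOCR input more concrete, here is a minimal sketch of how word text and bounding boxes can be read from an hOCR fragment. It is not taken from hOcr2Pdf.NET: the library uses HtmlAgilityPack for parsing, while this sketch swaps in System.Xml.Linq from the .NET base class library so that it has no external dependencies, and it therefore assumes the input is well-formed XML. The HocrWord and HocrSketch names are hypothetical, and the sketch assumes Tesseract-style ocrx_word spans whose title attribute carries "bbox x0 y0 x1 y1".

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper type for this sketch; not part of hOcr2Pdf.NET.
public sealed class HocrWord
{
    public string Text { get; set; }
    public int Left { get; set; }
    public int Top { get; set; }
    public int Right { get; set; }
    public int Bottom { get; set; }
}

public static class HocrSketch
{
    // hOCR keeps geometry in the title attribute, e.g. "bbox 36 92 96 116; x_wconf 95".
    private static readonly Regex BBox =
        new Regex(@"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)");

    // Assumption: the input is well-formed XML (XHTML) without a DOCTYPE.
    public static List<HocrWord> ParseWords(string hocr)
    {
        var words = new List<HocrWord>();
        XDocument doc = XDocument.Parse(hocr);

        foreach (XElement el in doc.Descendants())
        {
            // Compare the local name so an XHTML namespace, if present, is ignored.
            if (el.Name.LocalName != "span") continue;
            if ((string)el.Attribute("class") != "ocrx_word") continue;

            Match m = BBox.Match((string)el.Attribute("title") ?? "");
            if (!m.Success) continue;

            words.Add(new HocrWord
            {
                Text = el.Value.Trim(),
                Left = int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture),
                Top = int.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture),
                Right = int.Parse(m.Groups[3].Value, CultureInfo.InvariantCulture),
                Bottom = int.Parse(m.Groups[4].Value, CultureInfo.InvariantCulture)
            });
        }
        return words;
    }

    public static void Main()
    {
        // A small hand-written hOCR fragment, for illustration only.
        string hocr =
            "<div class='ocr_page' title='bbox 0 0 600 800'>" +
            "<span class='ocr_line' title='bbox 36 92 200 116'>" +
            "<span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Hello</span> " +
            "<span class='ocrx_word' title='bbox 110 92 200 116; x_wconf 91'>world</span>" +
            "</span></div>";

        foreach (HocrWord w in ParseWords(hocr))
        {
            Console.WriteLine("{0}: ({1},{2})-({3},{4})",
                w.Text, w.Left, w.Top, w.Right, w.Bottom);
        }
    }
}
```

The bounding boxes are in image pixels with the origin at the top-left corner. A converter that places invisible text under a scanned page image would typically still need to scale them to PDF units and flip the vertical axis; that step is left out of this sketch.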

Special thanks to the developers of the tools this library builds on: Jbig2 (to encode images to JBIG2), HtmlAgilityPack (to parse the hOCR files), Tesseract and Cuneiform (both used for OCR and hOCR output), iTextSharp (used to create and edit PDFs), and the tool used for PDF page extraction.

Last edited Feb 17, 2015 at 2:36 PM by pwizzle, version 60