No color in PDFs

Apr 12, 2014 at 3:25 AM
I'm converting PDFs to color JPG images with Ghostscript, running tesseract-ocr on them to produce hOCR, and then using this library to convert the hOCR to PDF. The resulting PDFs are losing their color: only black-and-white PDFs are generated.

Any help would be appreciated.

Thanks
Coordinator
Apr 12, 2014 at 7:10 PM
Edited Apr 12, 2014 at 7:11 PM
Hi qayyum,
Whenever an hOCR file is added directly to the PdfCreator, it uses the image referenced in the hOCR file (which is the image used by Tesseract).

I would recommend this:
PdfCreator p = new PdfCreator(outputPDF);
p.PDFSettings.ImageType = ImageType.Jpeg;
// convert to black and white for OCR; p_OnProcessImageForOcr is your own handler
p.OnProcessImageForOcr += new ProcessImageForOcr(p_OnProcessImageForOcr);
// add the color image as the page
p.AddPage(colorImage);
p.SaveAndClose();
I hope this helps.

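The other way is to load the hOCR file yourself and pass a page image along with each hOCR page:
PdfCreator p = new PdfCreator();
hDocument doc = new hDocument();
doc.AddFile(hocr_file);
foreach (hPage pg in doc.Pages)
{
    // pageImage is the image to use for this page
    p.AddPage(pg, pageImage);
}
p.SaveAndClose();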
Apr 12, 2014 at 10:09 PM
Thanks very much, that did the trick. It works perfectly.