Saturday, June 11, 2011

Convert a PDF to a text file using PDFBox

import java.io.File;
import java.io.PrintWriter;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.util.PDFTextStripper;

public class PDFText
{
    public static void main(String[] args)
    {
        PDDocument pd = null;
        PrintWriter pw = null;
        try
        {
            File input = new File("D:\\Head First Servlets and JSP, Second Edition.pdf");
            File output = new File("C:\\SampleTex.txt");

            // Load the PDF and print some basic information about it.
            pd = PDDocument.load(input);
            System.out.println(pd.getNumberOfPages());
            System.out.println(pd.isEncrypted());

            // Optional: save a copy of the loaded document.
            pd.save("C:\\Copy_of_main.pdf");

            // Extract the text of pages 1 to 100 (page numbers start at 1).
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setStartPage(1);
            stripper.setEndPage(100);
            String parsedText = stripper.getText(pd);
            System.out.println(parsedText);

            // Write the extracted text to the output file.
            pw = new PrintWriter(output);
            pw.print(parsedText);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        finally
        {
            // Release the writer and the document even if extraction failed.
            if (pw != null)
            {
                pw.close();
            }
            if (pd != null)
            {
                try
                {
                    pd.close();
                }
                catch (Exception e)
                {
                    e.printStackTrace();
                }
            }
        }
    }
}


Required JAR files: fontbox-0.1.0.jar and pdfbox-0.7.3.jar. pdfbox-0.8.0-incubating.jar is optional.
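
One detail worth noting: new PrintWriter(File) writes with the platform's default character encoding, so accented or non-Latin text extracted from a PDF may come out differently on another machine. Below is a minimal sketch of writing the text as UTF-8 instead, using only the standard library. The class name TextFileWriter and the method writeUtf8 are hypothetical names chosen for illustration, not part of PDFBox, and the temp file stands in for the real output path.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

class TextFileWriter
{
    // Hypothetical helper (not a PDFBox API): writes the given text to a file as UTF-8.
    public static void writeUtf8(File output, String text) throws IOException
    {
        Writer writer = new OutputStreamWriter(new FileOutputStream(output), "UTF-8");
        try
        {
            writer.write(text);
        }
        finally
        {
            // Close the writer even if the write fails.
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException
    {
        // A temp file is used here only so the sketch runs anywhere.
        File output = File.createTempFile("SampleText", ".txt");
        writeUtf8(output, "caf\u00e9");
        System.out.println("Wrote " + output.length() + " bytes to " + output.getPath());
    }
}
```

In the listing above, the string returned by stripper.getText(pd) could be passed to writeUtf8 in place of the PrintWriter calls, assuming UTF-8 is the encoding you want for the output file.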

3 comments:

  1. Hi,

    This is a great blog. PDFBox is a library that can handle different types of PDF documents, including encrypted PDFs, and extract their text. It also has a command-line utility to convert PDF documents to text. It would be great if you could provide more details about it. Thank you.

  2. You can also convert a PDF to a text file using the Aspose.Pdf for Java library. I found a code sample on their technical article page for PDF to text file conversion. I hope it helps users better understand the conversion technique.

    http://www.aspose.com/docs/display/pdfjava/Converting+text+file+to+PDF

