পৃষ্ঠাসমূহ

Search Your Article

CS

 

Welcome to GoogleDG – your one-stop destination for free learning resources, guides, and digital tools.

At GoogleDG, we believe that knowledge should be accessible to everyone. Our mission is to provide readers with valuable ebooks, tutorials, and tech-related content that makes learning easier, faster, and more enjoyable.

What We Offer:

  • 📘 Free & Helpful Ebooks – covering education, technology, self-development, and more.

  • 💻 Step-by-Step Tutorials – practical guides on digital tools, apps, and software.

  • 🌐 Tech Updates & Tips – simplified information to keep you informed in the fast-changing digital world.

  • 🎯 Learning Support – resources designed to support students, professionals, and lifelong learners.

    Latest world News 

     

Our Vision

To create a digital knowledge hub where anyone, from beginners to advanced learners, can find trustworthy resources and grow their skills.

Why Choose Us?

✔ Simple explanations of complex topics
✔ 100% free access to resources
✔ Regularly updated content
✔ A community that values knowledge sharing

We are continuously working to expand our content library and provide readers with the most useful and relevant digital learning materials.

📩 If you’d like to connect, share feedback, or suggest topics, feel free to reach us through the Contact page.

Pageviews

Sunday, January 22, 2017

PDFBox - Reading Text

In the previous chapter, we have seen how to add text to an existing PDF document. In this chapter, we will discuss how to read text from an existing PDF document.

Extracting Text from an Existing PDF Document

Extracting text is one of the main features of the PDF box library. You can extract text using the getText() method of the PDFTextStripper class. This class extracts all the text from the given PDF document.
Following are the steps to extract text from an existing PDF document.

Step 1: Loading an Existing PDF Document

Load an existing PDF document using the static method load() of the PDDocument class. This method accepts a file object as a parameter, since this is a static method you can invoke it using class name as shown below.
File file = new File("path of the document") 
PDDocument document = PDDocument.load(file);

Step 2: Instantiate the PDFTextStripper Class

The PDFTextStripper class provides methods to retrieve text from a PDF document therefore, instantiate this class as shown below.
PDFTextStripper pdfStripper = new PDFTextStripper();

Step 3: Retrieving the Text

You can read/retrieve the contents of a page from the PDF document using the getText() method of the PDFTextStripper class. To this method you need to pass the document object as a parameter. This method retrieves the text in a given document and returns it in the form of a String object.
String text = pdfStripper.getText(document);

Step 4: Closing the Document

Finally, close the document using the close() method of the PDDocument class as shown below.
document.close();

Example

Suppose, we have a PDF document with some text in it as shown below.
Example PDF This example demonstrates how to read text from the above mentioned PDF document. Here, we will create a Java program and load a PDF document named new.pdf, which is saved in the path C:/PdfBox_Examples/. Save this code in a file with name ReadingText.java.
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
public class ReadingText {

   public static void main(String args[]) throws IOException {

      //Loading an existing document
      File file = new File("C:/PdfBox_Examples/new.pdf");
      PDDocument document = PDDocument.load(file);

      //Instantiate PDFTextStripper class
      PDFTextStripper pdfStripper = new PDFTextStripper();

      //Retrieving text from PDF document
      String text = pdfStripper.getText(document);
      System.out.println(text);

      //Closing the document
      document.close();

   }
}
Compile and execute the saved Java file from the command prompt using the following commands.
javac ReadingText.java 
java ReadingText
Upon execution, the above program retrieves the text from the given PDF document and displays it as shown below.
This is an example of adding text to a page in the pdf document. we can add as many lines
as we want like this using the ShowText() method of the ContentStream class.

No comments:

Post a Comment