পৃষ্ঠাসমূহ

Search Your Article

CS

 

Welcome to GoogleDG – your one-stop destination for free learning resources, guides, and digital tools.

At GoogleDG, we believe that knowledge should be accessible to everyone. Our mission is to provide readers with valuable ebooks, tutorials, and tech-related content that makes learning easier, faster, and more enjoyable.

What We Offer:

  • 📘 Free & Helpful Ebooks – covering education, technology, self-development, and more.

  • 💻 Step-by-Step Tutorials – practical guides on digital tools, apps, and software.

  • 🌐 Tech Updates & Tips – simplified information to keep you informed in the fast-changing digital world.

  • 🎯 Learning Support – resources designed to support students, professionals, and lifelong learners.

    Latest world News 

     

Our Vision

To create a digital knowledge hub where anyone, from beginners to advanced learners, can find trustworthy resources and grow their skills.

Why Choose Us?

✔ Simple explanations of complex topics
✔ 100% free access to resources
✔ Regularly updated content
✔ A community that values knowledge sharing

We are continuously working to expand our content library and provide readers with the most useful and relevant digital learning materials.

📩 If you’d like to connect, share feedback, or suggest topics, feel free to reach us through the Contact page.

Pageviews

Wednesday, January 25, 2017

TIKA - Extracting XML Document

Given below is the program to extract content and metadata from an XML document:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.xml.XMLParser;
import org.apache.tika.sax.BodyContentHandler;

import org.xml.sax.SAXException;

public class XmlParse {

   public static void main(final String[] args) throws IOException,SAXException, TikaException {

      //detecting the file type
      BodyContentHandler handler = new BodyContentHandler();
      Metadata metadata = new Metadata();
      FileInputStream inputstream = new FileInputStream(new File("pom.xml"));
      ParseContext pcontext = new ParseContext();
      
      //Xml parser
      XMLParser xmlparser = new XMLParser(); 
      xmlparser.parse(inputstream, handler, metadata, pcontext);
      System.out.println("Contents of the document:" + handler.toString());
      System.out.println("Metadata of the document:");
      String[] metadataNames = metadata.names();
      
      for(String name : metadataNames) {
         System.out.println(name + ": " + metadata.get(name));
      }
   }
}
Save the above code as XmlParse.java, and compile it from the command prompt by using the following commands:
javac XmlParse.java
java XmlParse 
Below given is the snapshot of pom.xml
XML Document This document has the following properties:
XML Document1 After executing the above program you will get the following output.
Output:
Contents of the document:   
4.0.0
org.apache.tika
tika
1.6
org.apache.tika
tika-core
1.6
org.apache.tika
tika-parsers
1.6
src
maven-compiler-plugin
3.1
1.7
1.7

Metadata of the document:
Content-Type:   application/xml 

No comments:

Post a Comment