পৃষ্ঠাসমূহ

Search Your Article

CS

 

Welcome to GoogleDG – your one-stop destination for free learning resources, guides, and digital tools.

At GoogleDG, we believe that knowledge should be accessible to everyone. Our mission is to provide readers with valuable ebooks, tutorials, and tech-related content that makes learning easier, faster, and more enjoyable.

What We Offer:

  • 📘 Free & Helpful Ebooks – covering education, technology, self-development, and more.

  • 💻 Step-by-Step Tutorials – practical guides on digital tools, apps, and software.

  • 🌐 Tech Updates & Tips – simplified information to keep you informed in the fast-changing digital world.

  • 🎯 Learning Support – resources designed to support students, professionals, and lifelong learners.

    Latest world News 

     

Our Vision

To create a digital knowledge hub where anyone, from beginners to advanced learners, can find trustworthy resources and grow their skills.

Why Choose Us?

✔ Simple explanations of complex topics
✔ 100% free access to resources
✔ Regularly updated content
✔ A community that values knowledge sharing

We are continuously working to expand our content library and provide readers with the most useful and relevant digital learning materials.

📩 If you’d like to connect, share feedback, or suggest topics, feel free to reach us through the Contact page.

Pageviews

Wednesday, January 25, 2017

TIKA - File Formats

File Formats Supported by Tika

The following table shows the file formats Tika supports.
File format Package Library Class in Tika
XML org.apache.tika.parser.xml XMLParser
HTML org.apache.tika.parser.html and it uses Tagsoup Library HtmlParser
MS-Office compound document Ole2 till 2007 ooxml 2007 onwards org.apache.tika.parser.microsoft
org.apache.tika.parser.microsoft.ooxml and it uses Apache Poi library
OfficeParser(ole2)
OOXMLParser(ooxml)
OpenDocument Format openoffice org.apache.tika.parser.odf OpenOfficeParser
portable Document Format(PDF) org.apache.tika.parser.pdf and this package uses Apache PdfBox library PDFParser
Electronic Publication Format (digital books) org.apache.tika.parser.epub EpubParser
Rich Text format org.apache.tika.parser.rtf RTFParser
Compression and packaging formats org.apache.tika.parser.pkg and this package uses Common compress library PackageParser and CompressorParser and its sub-classes
Text format org.apache.tika.parser.txt TXTParser
Feed and syndication formats org.apache.tika.parser.feed FeedParser
Audio formats org.apache.tika.parser.audio and org.apache.tika.parser.mp3 AudioParser MidiParser Mp3- for mp3parser
Imageparsers org.apache.tika.parser.jpeg JpegParser-for jpeg images
Videoformats org.apache.tika.parser.mp4 and org.apache.tika.parser.video this parser internally uses Simple Algorithm to parse flash video formats Mp4parser FlvParser
java class files and jar files org.apache.tika.parser.asm ClassParser CompressorParser
Mobxformat (email messages) org.apache.tika.parser.mbox MobXParser
Cad formats org.apache.tika.parser.dwg DWGParser
FontFormats org.apache.tika.parser.font TrueTypeParser
executable programs and libraries org.apache.tika.parser.executable ExecutableParser

No comments:

Post a Comment