File Formats Supported by Tika
The following table shows the file formats Tika supports.
| File format | Package and library | Class in Tika |
|---|---|---|
| XML | org.apache.tika.parser.xml | XMLParser |
| HTML | org.apache.tika.parser.html (uses the TagSoup library) | HtmlParser |
| MS Office compound documents: OLE2 (up to 2007) and OOXML (2007 onwards) | org.apache.tika.parser.microsoft and org.apache.tika.parser.microsoft.ooxml (use the Apache POI library) | OfficeParser (OLE2), OOXMLParser (OOXML) |
| OpenDocument Format (OpenOffice) | org.apache.tika.parser.odf | OpenOfficeParser |
| Portable Document Format (PDF) | org.apache.tika.parser.pdf (uses the Apache PDFBox library) | PDFParser |
| Electronic Publication Format (digital books) | org.apache.tika.parser.epub | EpubParser |
| Rich Text Format | org.apache.tika.parser.rtf | RTFParser |
| Compression and packaging formats | org.apache.tika.parser.pkg (uses the Apache Commons Compress library) | PackageParser, CompressorParser, and their subclasses |
| Text format | org.apache.tika.parser.txt | TXTParser |
| Feed and syndication formats | org.apache.tika.parser.feed | FeedParser |
| Audio formats | org.apache.tika.parser.audio and org.apache.tika.parser.mp3 | AudioParser, MidiParser, Mp3Parser (for MP3) |
| Image formats | org.apache.tika.parser.jpeg | JpegParser (for JPEG images) |
| Video formats | org.apache.tika.parser.mp4 and org.apache.tika.parser.video (the latter internally uses a simple algorithm to parse Flash video formats) | MP4Parser, FLVParser |
| Java class files and JAR files | org.apache.tika.parser.asm | ClassParser, CompressorParser |
| Mbox format (email messages) | org.apache.tika.parser.mbox | MboxParser |
| CAD formats | org.apache.tika.parser.dwg | DWGParser |
| Font formats | org.apache.tika.parser.font | TrueTypeParser |
| Executable programs and libraries | org.apache.tika.parser.executable | ExecutableParser |
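
To make the table easier to use from code, here is a minimal sketch that joins a few of the package and class names above into fully qualified class names and checks whether each one is available on the classpath. The class `TikaParserTable`, its methods, and the idea of keying the lookup by file extension are assumptions made for illustration; they are not part of Tika, which detects document types itself. The sketch uses only the Java standard library, so it runs with or without the Tika JARs present.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration only; not a Tika class.
final class TikaParserTable {

    // File extension -> fully qualified parser class, built from the table above.
    // Keying by extension is an assumption of this sketch.
    private static final Map<String, String> PARSERS = new LinkedHashMap<>();
    static {
        PARSERS.put("xml", "org.apache.tika.parser.xml.XMLParser");
        PARSERS.put("html", "org.apache.tika.parser.html.HtmlParser");
        PARSERS.put("pdf", "org.apache.tika.parser.pdf.PDFParser");
        PARSERS.put("epub", "org.apache.tika.parser.epub.EpubParser");
        PARSERS.put("rtf", "org.apache.tika.parser.rtf.RTFParser");
        PARSERS.put("txt", "org.apache.tika.parser.txt.TXTParser");
        PARSERS.put("mp3", "org.apache.tika.parser.mp3.Mp3Parser");
    }

    // Returns the parser class name listed for the file's extension, if any.
    static Optional<String> parserFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(PARSERS.get(ext));
    }

    // Checks whether a class can be located without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, TikaParserTable.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {"report.pdf", "notes.txt", "song.mp3", "unknown.xyz"};
        for (String name : samples) {
            Optional<String> parser = parserFor(name);
            if (parser.isPresent()) {
                System.out.println(name + " -> " + parser.get()
                        + " (on classpath: " + isOnClasspath(parser.get()) + ")");
            } else {
                System.out.println(name + " -> no parser listed in this sketch");
            }
        }
    }
}
```

If a class is reported as missing, the corresponding Tika parser JAR is probably not on the classpath. The package names follow the table and may differ between Tika releases.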