Indexing is one of the core functions provided by Lucene.
The following diagram illustrates the indexing process and the classes it uses.
IndexWriter is the most important, core component of the indexing
process.

We add Document(s) containing Field(s) to IndexWriter, which analyzes the
Document(s) using the Analyzer and then creates, opens, or edits indexes as
required and stores or updates them in a Directory. IndexWriter is used to
create or update indexes. It is not used to read indexes.
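
The flow described above can be sketched with a small toy model in plain Java. This is only an illustration of the idea, not Lucene's API: ToyDocument, ToyDirectory, ToyIndexWriter, and analyze are hypothetical stand-ins invented for this sketch, and the "field:term" key format is an assumption made to keep the example short. A document made of named fields is handed to a writer, the writer runs each field value through an analysis step, and the resulting terms are recorded in a storage object.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of the indexing flow. These classes are NOT Lucene's API;
// every name here is a hypothetical stand-in used for illustration.
final class ToyIndexSketch {

    // Stand-in for Document: a set of named fields (key -> text value).
    static final class ToyDocument {
        final Map<String, String> fields = new LinkedHashMap<>();

        ToyDocument add(String name, String value) {
            fields.put(name, value);
            return this;
        }
    }

    // Stand-in for Analyzer: lower-cases the text and splits it
    // on anything that is not a letter or a digit.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stand-in for Directory: holds the index, here a map from
    // "field:term" to the ids of the documents containing that term.
    static final class ToyDirectory {
        final Map<String, Set<Integer>> postings = new TreeMap<>();
    }

    // Stand-in for IndexWriter: it only writes to the index, never reads it.
    static final class ToyIndexWriter {
        private final ToyDirectory directory;
        private int nextDocId = 0;

        ToyIndexWriter(ToyDirectory directory) {
            this.directory = directory;
        }

        // Analyzes every field of the document and records its terms.
        int addDocument(ToyDocument doc) {
            int docId = nextDocId++;
            for (Map.Entry<String, String> field : doc.fields.entrySet()) {
                for (String term : analyze(field.getValue())) {
                    directory.postings
                            .computeIfAbsent(field.getKey() + ":" + term, k -> new TreeSet<>())
                            .add(docId);
                }
            }
            return docId;
        }
    }

    public static void main(String[] args) {
        ToyDirectory dir = new ToyDirectory();
        ToyIndexWriter writer = new ToyIndexWriter(dir);
        writer.addDocument(new ToyDocument().add("contents", "Lucene indexes text"));
        writer.addDocument(new ToyDocument().add("contents", "Text search with Lucene"));
        System.out.println(dir.postings);
    }
}
```

In this sketch the writer is the only object that touches the storage, which mirrors the point that IndexWriter creates and updates indexes but does not read them. Real Lucene analysis and index storage are considerably more involved than a split on non-alphanumeric characters and an in-memory map.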
Indexing Classes:
The following classes are commonly used during the indexing process.
Sr. No. | Class & Description |
1 | IndexWriter
This class acts as the core component that creates and updates indexes during the indexing process. |
2 | Directory
This class represents the storage location of the indexes. |
3 | Analyzer
The Analyzer class is responsible for analyzing a document and extracting the
tokens/words from the text that is to be indexed. Without this analysis,
IndexWriter cannot create an index. |
4 | Document
A Document represents a virtual document made up of Fields, where a Field is
an object that can contain the physical document's contents, its metadata,
and so on. The Analyzer can process content only when it is supplied as a Document. |
5 | Field
A Field is the lowest unit, or the starting point, of the indexing
process. It represents a key-value pair in which the key identifies
the value to be indexed. For example, a field used to represent the
contents of a document will have the key "contents", and its value may
contain part or all of the text or numeric content of the document.
Lucene can index only text or numeric content. |