Sunday, January 22, 2017

Lucene - Quick Guide

Lucene is a simple yet powerful Java-based search library. It can be used in any application to add search capability. Lucene is an open-source project. It is a scalable, high-performance library used to index and search virtually any kind of text. The Lucene library provides the core operations required by any search application: indexing and searching.

How does a search application work?

A search application performs some or all of the following operations.
Step | Title | Description
1 | Acquire Raw Content | The first step of any search application is to collect the target content on which searches are to be conducted.
2 | Build the document | The next step is to build the document(s) from the raw content in a form the search application can understand and interpret easily.
3 | Analyze the document | Before indexing starts, the document is analyzed to decide which parts of the text are candidates for indexing. This process is called analyzing the document.
4 | Index the document | Once documents are built and analyzed, the next step is to index them so that a document can be retrieved based on certain keys instead of its whole content. An index is similar to the index at the end of a book, where common words are listed with their page numbers so that they can be found quickly without searching the complete book.
5 | User Interface for Search | Once the index database is ready, the application can run searches. To let a user search, the application must provide a user interface where the user can enter text and start the search process.
6 | Build Query | Once the user requests a search for some text, the application should prepare a Query object from that text, which can be used to query the index database for the relevant details.
7 | Search Query | Using the Query object, the index database is checked to get the relevant details and the matching documents.
8 | Render Results | Once the results are received, the application should decide how to show them to the user in the user interface, for example how much information to show at first glance.
Apart from these basic operations, a search application can also provide an administration user interface that lets administrators control the level of search based on user profiles. Analytics of search results is another important and advanced aspect of any search application.
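
To make the book-index analogy from the steps above concrete, here is a minimal sketch of steps 2 to 7 using only the Java standard library. It is not Lucene code and does not reflect how Lucene is implemented. The class name TinyInvertedIndex, its methods, and the naive analyzer (lower-case the text, split on non-alphanumeric characters) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration class; this is NOT part of the Lucene API.
final class TinyInvertedIndex {
    // term -> sorted ids of the documents that contain it (the "index database")
    private final Map<String, Set<Integer>> postings = new TreeMap<>();
    // document id -> original raw content
    private final List<String> documents = new ArrayList<>();

    // Step 3 (analyze): lower-case the text and split on non-alphanumeric characters.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Steps 2 and 4 (build and index): store the document and record each of its terms.
    int add(String rawContent) {
        int docId = documents.size();
        documents.add(rawContent);
        for (String term : analyze(rawContent)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Steps 6 and 7 (build and run the query): ids of documents containing every query term.
    Set<Integer> search(String queryText) {
        Set<Integer> result = null;
        for (String term : analyze(queryText)) {
            Set<Integer> matches = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(matches);
            } else {
                result.retainAll(matches);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    // Returns the raw content stored for a document id.
    String document(int docId) {
        return documents.get(docId);
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("Lucene is a Java search library");
        index.add("Eclipse is a Java IDE");
        // prints: 0: Lucene is a Java search library
        for (int docId : index.search("java library")) {
            System.out.println(docId + ": " + index.document(docId));
        }
    }
}
```

The search looks up each query term in the map instead of scanning every document, which is the point of step 4. A real library such as Lucene adds much more on top of this idea, including configurable analysis and relevance ranking.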

Lucene's role in a search application

Lucene plays a role in steps 2 through 7 above and provides classes for the required operations. In a nutshell, Lucene is the heart of a search application and provides the vital operations pertaining to indexing and searching. Acquiring content and displaying results are left to the application. Let's start with a first simple search application using the Lucene search library in the next chapter.

Lucene - Environment Setup

Environment Setup

This tutorial will guide you through preparing a development environment to start your work with Lucene. It will also show you how to set up the JDK and Eclipse on your machine before you set up Lucene:

Step 1 - Setup Java Development Kit (JDK):

You can download the latest version of the JDK from Oracle's Java site: Java SE Downloads. You will find instructions for installing the JDK in the downloaded files; follow them to install and configure the setup. Finally, set the PATH and JAVA_HOME environment variables to refer to the directory that contains java and javac, typically java_install_dir/bin and java_install_dir respectively.
If you are running Windows and installed the JDK in C:\jdk1.6.0_15, you would put the following lines in your C:\autoexec.bat file.
set PATH=C:\jdk1.6.0_15\bin;%PATH%
set JAVA_HOME=C:\jdk1.6.0_15
Alternatively, on Windows NT/2000/XP, you could also right-click on My Computer, select Properties, then Advanced, then Environment Variables. Then, you would update the PATH value and press the OK button.
On Unix (Solaris, Linux, etc.), if the SDK is installed in /usr/local/jdk1.6.0_15 and you use the C shell, you would put the following into your .cshrc file.
setenv PATH /usr/local/jdk1.6.0_15/bin:$PATH
setenv JAVA_HOME /usr/local/jdk1.6.0_15
Alternatively, if you use an Integrated Development Environment (IDE) like Borland JBuilder, Eclipse, IntelliJ IDEA, or Sun ONE Studio, compile and run a simple program to confirm that the IDE knows where you installed Java. Otherwise, complete the setup as described in the IDE's documentation.
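
A simple program for this check could look like the following sketch. It prints which Java installation is running it and whether PATH contains the bin directory under JAVA_HOME. The class name JavaSetupCheck and the helper pathContains are hypothetical names chosen for this example; the system properties java.version and java.home are standard.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical helper class for checking the setup; the name is illustrative.
final class JavaSetupCheck {
    // Returns true when the PATH-style string contains the given directory as one of its entries.
    static boolean pathContains(String pathValue, String directory) {
        if (pathValue == null || directory == null) {
            return false;
        }
        File wanted = new File(directory);
        // File.pathSeparator is ";" on Windows and ":" on Unix-like systems.
        for (String entry : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (!entry.isEmpty() && new File(entry).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The installation that is actually running this program.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));

        // The environment variables set in Step 1 (JAVA_HOME may be null if it is not set).
        String javaHome = System.getenv("JAVA_HOME");
        System.out.println("JAVA_HOME    = " + javaHome);
        if (javaHome != null) {
            String binDir = new File(javaHome, "bin").getPath();
            boolean onPath = pathContains(System.getenv("PATH"), binDir);
            System.out.println("PATH contains " + binDir + ": " + onPath);
        }
    }
}
```

If the program compiles and runs from the IDE, the IDE has found a Java installation. The printed values show whether it is the one you installed in Step 1.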

Step 2 - Setup Eclipse IDE

All the examples in this tutorial were written using the Eclipse IDE, so I suggest you have the latest version of Eclipse installed on your machine.
To install the Eclipse IDE, download the latest Eclipse binaries from http://www.eclipse.org/downloads/. Once you have downloaded the installation, unpack the binary distribution into a convenient location, for example C:\eclipse on Windows or /usr/local/eclipse on Linux/Unix, and finally set the PATH variable appropriately.
Eclipse can be started by executing the following command on a Windows machine, or you can simply double-click eclipse.exe:
C:\eclipse\eclipse.exe
Eclipse can be started by executing the following command on a Unix (Solaris, Linux, etc.) machine:
/usr/local/eclipse/eclipse
After a successful startup, if everything is fine, Eclipse should display its home page:
[Image: Eclipse Home page]
