Sunday, January 22, 2017

Lucene - Analysis

As we saw in the earlier chapter Lucene - Indexing Process, Lucene uses IndexWriter, which analyzes each Document with an Analyzer and then creates, opens, or edits indexes as required. This chapter discusses the various types of Analyzer and the other objects used during the analysis process. Understanding the analysis process and how analyzers work gives good insight into how Lucene indexes documents.

The following classes are discussed in due course.

Sr. No. | Class & Description
1 | Token: represents a piece of text or a word in a document, together with its metadata (position, start offset, end offset, token type, and position increment).
2 | TokenStream: the output of the analysis process, consisting of a series of tokens. It is an abstract class.
3 | Analyzer: the abstract base class for every type of analyzer.
4 | WhitespaceAnalyzer: splits the text in a document on whitespace.
5 | SimpleAnalyzer: splits the text in a document on non-letter characters and then lowercases the resulting tokens.
6 | StopAnalyzer: works like SimpleAnalyzer and also removes common words such as 'a', 'an', 'the', etc.
7 | StandardAnalyzer: the most sophisticated analyzer, capable of handling names, email addresses, etc. It lowercases each token and removes common words and punctuation, if any.
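
To make the differences between the first three analyzers concrete, here is a minimal plain-Java sketch that imitates the behaviour described in the table. It does not use the Lucene API: AnalyzerSketch, SimpleToken, and the three-word STOP_WORDS set are hypothetical names made up for this illustration, and Lucene's real classes do more work than this. The position increment handling in stop() assumes that a removed stop word leaves a gap in token positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Plain-Java imitation of the analyzers in the table; this is not Lucene code. */
class AnalyzerSketch {

    /** Hypothetical stand-in for a token: text plus offsets and position increment. */
    static final class SimpleToken {
        final String text;
        final int startOffset;        // index of the first character in the input
        final int endOffset;          // index one past the last character
        final int positionIncrement;  // distance from the previously emitted token

        SimpleToken(String text, int startOffset, int endOffset, int positionIncrement) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "[" + startOffset + "," + endOffset + ",+" + positionIncrement + "]";
        }
    }

    // Illustrative stop-word list only (assumption: just the words named in the table).
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the");

    /** Like WhitespaceAnalyzer: split on whitespace, keep case and punctuation. */
    static List<SimpleToken> whitespace(String text) {
        return tokenize(text, false);
    }

    /** Like SimpleAnalyzer: split on non-letter characters, then lowercase. */
    static List<SimpleToken> simple(String text) {
        return tokenize(text, true);
    }

    /** Like StopAnalyzer: SimpleAnalyzer output with stop words removed. */
    static List<SimpleToken> stop(String text) {
        List<SimpleToken> out = new ArrayList<>();
        int skipped = 0; // stop words dropped since the last emitted token
        for (SimpleToken t : simple(text)) {
            if (STOP_WORDS.contains(t.text)) {
                skipped++;
                continue;
            }
            out.add(new SimpleToken(t.text, t.startOffset, t.endOffset, 1 + skipped));
            skipped = 0;
        }
        return out;
    }

    /** Shared scanner: a token is a maximal run of letters or of non-whitespace. */
    private static List<SimpleToken> tokenize(String text, boolean lettersOnly) {
        List<SimpleToken> tokens = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i <= text.length(); i++) {
            boolean inToken = i < text.length() && (lettersOnly
                    ? Character.isLetter(text.charAt(i))
                    : !Character.isWhitespace(text.charAt(i)));
            if (inToken && start < 0) {
                start = i;
            } else if (!inToken && start >= 0) {
                String term = text.substring(start, i);
                if (lettersOnly) {
                    term = term.toLowerCase(Locale.ROOT);
                }
                tokens.add(new SimpleToken(term, start, i, 1));
                start = -1;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "The Quick-Brown fox, an Example";
        System.out.println("whitespace: " + whitespace(input));
        System.out.println("simple:     " + simple(input));
        System.out.println("stop:       " + stop(input));
    }
}
```

Running main on the sample sentence shows the whitespace version keeping "Quick-Brown" and "fox," intact, the simple version splitting and lowercasing them, and the stop version dropping "the" and "an" while recording the gap in the position increment of the token that follows.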
