About Duffbert...

Duffbert's Random Musings is a blog where I talk about whatever happens to be running through my head at any given moment... I'm Thomas Duff, and you can find out more about me here...





How The Fuzzy Search Option Works...

Category Software Development

This KnowledgeBase item gives a very good explanation of how the Fuzzy Search option works in Notes, and how you can adjust it to get the level of "fuzziness" you want...

How Does the Fuzzy Search Option Work?

Document Number:  1088269

Problem
In the Notes client, the Fuzzy Search option is available on the Search Bar of a full-text-indexed database.  What type of results does the Fuzzy Search option generate?

Content
The power of Fuzzy Search is that it finds results that are not an exact match to the query term.  Fuzzy Search logic can be thought of as an "Expanded Or" search that lets users find as many of the query terms as possible, but not necessarily all of them.  It matches on the similarity of character strings, not on meaning.  The Fuzzy Search logic used by the Notes Client is designed specifically for text searching: it can recognize incomplete hits in a document's text.

This matters when documents contain errors or variations in their words or terms.  For example, documents converted through optical character recognition (OCR) may have many unrecognized characters within words.  Wildcards and word stemming cannot fully compensate for these errors because there is no way to predict where they will occur.  If the error occurs in the stem word, both methods are ineffective.  Fuzzy Search logic instead treats a term as a possible hit if it has at least some of the characters of the query term.

Fuzzy Search finds matches using the base word described in the Notes Client Help under the topic "Word Variants" (as shown in the Supporting Information section below).  The Matchinglevel parameter sets the size of the base word as a percentage of the query term, counted from the left side of the term; the base word must be at least three letters long.  The default value for this parameter is 75%.

Take "Rossberg" as an example: it has eight letters, so 75% of the word is the six-letter base word "Rossbe".  If a user types Rossburg with a "u", the base word becomes "Rossbu", so documents containing Rossberg with an "e" are not returned: Rossberg does not match 75% of the query term.
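To make the arithmetic concrete, here is a minimal Python sketch of the base-word rule described above. It is only an illustration, not the Notes implementation: the function names are made up, the length is assumed to round down (the KnowledgeBase item does not say how rounding works), and a "hit" is modeled simply as a word that starts with the base word.

```python
def base_word(query: str, matching_level: int = 75) -> str:
    """Return the left-hand part of the query that a hit must share."""
    # Assumption: the length is rounded down; the source does not
    # state the rounding rule.
    length = len(query) * matching_level // 100
    return query[:length]


def is_base_word_hit(query: str, candidate: str, matching_level: int = 75) -> bool:
    """Model a hit as a candidate that starts with the query's base word."""
    base = base_word(query, matching_level)
    # The source says the base word must be at least three letters long.
    if len(base) < 3:
        return False
    return candidate.lower().startswith(base.lower())


if __name__ == "__main__":
    print(base_word("Rossberg"))                     # 75% of 8 letters
    print(is_base_word_hit("Rossburg", "Rossberg"))  # the "u" vs "e" case
```

Lowering the level shortens the base word, which is why a lower Matchinglevel gives a "fuzzier" search.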

To change the Matchinglevel parameter, in a Search bar, type in the following:

matchinglevel XX searchword

where XX is a number between 5 and 95, and "searchword" is the word to match.  Typing zero for this number yields zero matches for the base word, and 100 yields only exact matches.  Matchinglevel and the number must be entered to the left of the word to match.
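If you build these search strings from a script rather than typing them, a small helper can enforce the ordering and the range. This is a hedged sketch: `build_fuzzy_query` is a hypothetical name, and it only formats the text you would paste into the Search bar. It does not call any Notes API.

```python
def build_fuzzy_query(search_word: str, matching_level: int) -> str:
    """Format a Search bar query as 'matchinglevel XX searchword'."""
    word = search_word.strip()
    if not word:
        raise ValueError("search word must not be empty")
    # The source gives 5 to 95 as the range for the number.
    if not 5 <= matching_level <= 95:
        raise ValueError("matching level must be between 5 and 95")
    # Matchinglevel and the number go to the left of the word to match.
    return f"matchinglevel {matching_level} {word}"


if __name__ == "__main__":
    print(build_fuzzy_query("Rossberg", 60))
```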

In the case of double letters, a word will be returned if one of the letters is missing.  Using the Rossberg example, searching on Rosberg with one "s" will return documents containing Rossberg with two "s" characters.
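One way to picture the double-letter behavior is to collapse repeated letters before comparing. The KnowledgeBase item does not say how Notes does this internally, so treat the following as a stand-in technique with made-up function names, not the actual algorithm.

```python
import re


def collapse_doubles(word: str) -> str:
    """Lower-case the word and reduce any run of a repeated character to one."""
    return re.sub(r"(.)\1+", r"\1", word.lower())


def matches_ignoring_doubles(query: str, candidate: str) -> bool:
    """Compare two words after collapsing their repeated letters."""
    return collapse_doubles(query) == collapse_doubles(candidate)


if __name__ == "__main__":
    print(collapse_doubles("Rossberg"))
    print(matches_ignoring_doubles("Rosberg", "Rossberg"))
```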

NOTE:  Fuzzy Search is not designed to work on small words.  Consider searching for the names James and Janet: if Janes is the search word and matchinglevel is set to 40%, the resulting base word is "Ja" (40% of five letters is two letters).  Fuzzy Search will not work with a base word of this size.


Supporting Information:

Examples of different types of search done by Fuzzy Search:

Various expressions of phrases

Search for: new technology
Returns: new CMOS technology

Search for: user requirement
Returns: user group requirement OR user has a requirement

Incorrect Writing
Search for: Califorrnia (misspelled)
Returns: California (spelled correctly)

Search for: Palalto (misspelled)
Returns: Palo Alto (spelled correctly)

Example of Inflection Searching
Search for: communication
Returns: communicate OR communicating OR communi-cation

Search for:  Study
Returns: studies OR studied OR studio

Note:  For the query term "Study", "studio" is a valid Fuzzy Search result, but it may not be what the user expects.

Alternative expression
Search for: database
Returns: data-base OR data base

Search for: run-time
Returns: runtime OR run time


From the Notes Client Help (R5):
Use Word Variants
This option finds words made up of the base word plus certain suffixes. For example, a search for "swim" will also find "swims," "swimming," "swimmer," and even "swimmed." It will not find the variation "swam," however, because the base word has changed, nor "swimmet" or "swimsed," because those suffixes are not acceptable with that word.
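The "swim" example can be sketched in a few lines of Python. This is a rough model only: the suffix list is an assumption (the Help text does not list the suffixes Notes accepts), and the doubled-final-letter rule is added just so that "swimming" and "swimmer" pass.

```python
# Hypothetical suffix list; the real list used by Notes is not given in the source.
SUFFIXES = {"s", "es", "ed", "ing", "er"}


def is_word_variant(base: str, candidate: str) -> bool:
    """Return True if candidate is the base word plus an accepted suffix."""
    base, candidate = base.lower(), candidate.lower()
    if not base or not candidate.startswith(base):
        return False  # e.g. "swam": the base word itself has changed
    rest = candidate[len(base):]
    if rest == "" or rest in SUFFIXES:
        return True
    # Allow a doubled final letter before the suffix ("swim" + "m" + "ing").
    return len(rest) > 1 and rest[0] == base[-1] and rest[1:] in SUFFIXES


if __name__ == "__main__":
    for word in ["swims", "swimming", "swimmer", "swimmed", "swam", "swimmet", "swimsed"]:
        print(word, is_word_variant("swim", word))
```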

Comments

1 - The Fuzzy Search option helps to find words that are similar but not identical to the search term. This can be useful when variations occur in the texts and it helps to bypass spelling errors.

