Search inside Lucene in Action

Query parsed to: index fileindex

8.2.3 : LIMO: Lucene Index Monitor

starts on page 279 under section 8.2 (Interacting with an index) in chapter 8 (Tools and extensions)

...Julien Nioche is the creator of Lucene Index Monitor (LIMO). It's available online at http://limo.sourceforge.net/. LIMO provides a web browser interface to Lucene indexes, giving you a quick look at index status information such as whether an index is locked, the last modification date ... preconfigured indexes. To install LIMO, follow these steps: 1. Download the LIMO distribution, which ... a couple of references to Lucene index directories. LIMO uses context parameters in the web.xml file...

4.0 : Analysis

starts on page 102

... text into its most fundamental indexed representation, terms. These terms are used to determine what documents match a query during searches. For example, if this sentence were indexed into a field ... that text. In order for Lucene to know what "words" are, it analyzes the text during indexing, extracting ... too common. However, Google doesn't throw away these stop words during indexing, as you can ... . This is an interesting phenomenon: An astounding number of stop words are being indexed! How does Google...

4.9 : Nutch analysis

starts on page 145 in chapter 4 (Analysis)

... something very interesting with stop words, which it calls common terms. If all words are indexed ... we detail in this section. Nutch combines an index-time analysis bigram (grouping two consecutive ... >] 3: [brown:] 4: [fox:] Because additional tokens are created during analysis, the index ... faster. And there's a bonus: No terms were discarded during indexing. During querying, phrases are also ... that contains an additional type token. This was a quick view of what Nutch does with indexing...
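The index-time bigram idea mentioned in this excerpt can be sketched in plain Java. This is only an illustration under stated assumptions, not Nutch's actual analyzer: the class name `CommonGramsSketch`, the `analyze` method, the `-` separator, and the choice of `the` as the only common term are all hypothetical. Each word is kept, and when a word is a common term, an extra token joining it to the following word is emitted at the same position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: not the Nutch implementation.
final class CommonGramsSketch {

    // Returns one list of tokens per word position. A common term
    // contributes itself plus a bigram with the word that follows it.
    static List<List<String>> analyze(String text, Set<String> commonTerms) {
        List<List<String>> positions = new ArrayList<>();
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty()) {
            return positions;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            List<String> tokens = new ArrayList<>();
            tokens.add(words[i]); // the original word is never discarded
            if (commonTerms.contains(words[i]) && i + 1 < words.length) {
                tokens.add(words[i] + "-" + words[i + 1]); // extra bigram token
            }
            positions.add(tokens);
        }
        return positions;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("the"));
        System.out.println(analyze("The quick brown fox", common));
    }
}
```

Because the bigram is an additional token rather than a replacement, a phrase query containing a common term could be matched against the rarer bigram instead of the very frequent single word.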

7.10 : Summary

starts on page 265 in chapter 7 (Parsing common document formats)

... type of data that can be converted to text can be indexed and made searchable with Lucene. If you can extract textual data from sound or graphics files, you can index those, too. As a matter of fact, section 10.6 describes one interesting approach to indexing JPEG images. We used a number ... framework capable of recursively parsing and indexing a file system. What you've learned in this chapter isn't limited to indexing files stored in your local file system. You can use the same...

8.2.1 : lucli: a command-line interface

starts on page 269 under section 8.2 (Interacting with an index) in chapter 8 (Tools and extensions)

...Rather than write code to interact with an index, it can be easier to do a little command-line tap ... of commands and reexecute a previously entered command to enhance its usability. Using the WordNet index ... 8.1 lucli in action % java lucli.Lucli Lucene CLI. Using directory:index Open existing lucli> index ../WordNet/index index by path Lucene CLI. Using directory:../WordNet/index Index has 39718 documents All Fields:[syn, word] Indexed Fields:[word] Perform lucli> search jump search Searching for: syn:jump...

1.2.3 : History of Lucene

starts on page 9 under section 1.2 (Understanding Lucene) in chapter 1 (Meet Lucene)

... 2002 First Apache Jakarta release 1.3 December 2003 Compound index format, QueryParser enhancements ... .org); it's designed to handle crawling, indexing, and searching of several billion frequently updated...

3.0 : Adding search to your application

starts on page 68

...This chapter covers: Querying a Lucene index; Working with search results; Understanding Lucene scoring; Parsing human-entered query expressions. If we can't find it, it effectively doesn't exist. Even if we have indexed documents, our effort is wasted unless it pays off by providing a reliable and fast way to find those documents. For example, consider this scenario: Give me a list of all books published in the last 12 months on the subject of "Java" where "open source" or "Jakarta" is mentioned...

5.1.3 : Sorting by index order

starts on page 153 under section 5.1 (Sorting search results) in chapter 5 (Advanced search techniques)

...If the order documents were indexed is relevant, you can use Sort.INDEXORDER. Note the increasing document ID column: example.displayHits(allBooks, Sort.INDEXORDER);

Results for: pubmonth:[190001 TO 201012] sorted by
Title                          pubmonth  id  score
A Modern Art of Education      198106    0   0.086743
    /education/pedagogy
Imperial Secrets of Health...  199401    1   0.086743
    /health/alternative/chinese
Tao Te Ching                   198810    2   0.086743
    /philosophy/eastern
Gödel, Escher, Bach...

10.2 : Using Lucene at jGuru

starts on page 329 in chapter 10 (Case studies)

... of interesting goodies such as its StringTemplate engine (http://www.antlr.org/stringtemplate/index ... . By building search indexes with Lucene directly from our database instead of spidering, the time dropped...

10.7.4 : Searching the index

starts on page 377 under section 10.7 (I love Lucene: TheServerSide) in chapter 10 (Case studies)

...Now we have an index. It is built from the various sources of information that we have and is just waiting for someone to search it. Lucene made this very simple for us to whip up. The innards of searching are hidden behind the IndexSearch class, as mentioned in the high-level overview. The work is so simple that I can even paste it here: public static SearchResults search(String inputQuery, int resultsStart, int resultsCount) throws SearchException { try { Searcher searcher = new IndexSearcher...

10.7.6 : Web tier: TheSeeeeeeeeeeeerverSide?

starts on page 383 under section 10.7 (I love Lucene: TheServerSide) in chapter 10 (Case studies)

...At this point we have a nice clean interface into building an index and searching on one. Since we need users to search the content via a web interface, the last item on the development list was to create the web layer hook into the search interface. TheServerSide portal infrastructure uses a home-grown MVC web tier. It is home grown purely because it was developed before the likes of Struts ... such as [footnote 15, authors' note: Digester is also used for indexing XML documents in section 7.2.] if (dateRangeType.equals...

7.3.2 : Built-in Lucene support

starts on page 239 under section 7.3 (Indexing a PDF document) in chapter 7 (Parsing common document formats)

... fine control over Lucene Document creation. If you just need a quick way to index a directory ... and indexed. PDFBox's org.pdfbox.searchengine.lucene package contains two classes: IndexFiles ... a single method for indexing a single file system directory. Here's how you can use it: public class ... = new IndexFiles(); indexFiles.index(new File(args[0]), true, args[1]); } } This code calls the index method in IndexFiles class passing it arguments from the command line. The output of this program...

3.5.4 : Field selection

starts on page 95 under section 3.5 (Parsing query expressions: QueryParser) in chapter 3 (Adding search to your application)

...QueryParser needs to know the field name to use when constructing queries, but it would generally be unfriendly to require users to identify the field to search (the end user may not need or want to know the field names). As you've seen, the default field name is provided to the parse method. Parsed queries aren't restricted, however, to searching only the default field. Using field selector notation, you can specify terms in nondefault fields. For example, when HTML documents are indexed...

5.1.9 : Performance effect of sorting

starts on page 157 under section 5.1 (Sorting search results) in chapter 5 (Advanced search techniques)

...Sorting comes at the expense of resources. More memory is needed to keep the fields used for sorting available. For numeric types, each field being sorted for each document in the index requires that four bytes be cached. For String types, each unique term is also cached for each document. Only the actual fields used for sorting are cached in this manner. Plan your system resources accordingly if you want to use the sorting capabilities, knowing that sorting by a String is the most expensive...
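The four-bytes-per-document figure quoted in this excerpt lends itself to a quick back-of-the-envelope estimate. The sketch below is an illustration only: the class `SortCacheEstimate`, its method, and the one-million-document index are hypothetical, and it covers numeric sort fields alone, since the excerpt gives no per-term figure for String fields.

```java
// Hypothetical helper: estimates cache size for numeric sort fields only.
final class SortCacheEstimate {

    // From the excerpt: four bytes cached per document per numeric sort field.
    static final long BYTES_PER_NUMERIC_VALUE = 4L;

    static long numericSortCacheBytes(long docCount, int sortFieldCount) {
        if (docCount < 0 || sortFieldCount < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        // multiplyExact fails loudly on overflow instead of wrapping around
        long values = Math.multiplyExact(docCount, (long) sortFieldCount);
        return Math.multiplyExact(values, BYTES_PER_NUMERIC_VALUE);
    }

    public static void main(String[] args) {
        // Assumed example: 1,000,000 documents sorted on 2 numeric fields
        long bytes = numericSortCacheBytes(1_000_000L, 2);
        System.out.println(bytes + " bytes"); // 4 * 1,000,000 * 2 = 8,000,000
    }
}
```

String sort fields would cost more than this estimate, because the unique terms themselves are cached as well.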

7.1 : Handling rich-text documents

starts on page 224 in chapter 7 (Parsing common document formats)

...In addition to showing you how to parse and index individual document formats, our goal in this chapter is to create a small framework that you can use to index documents commonly found in the office environment as well as on the Internet. Such a framework is useful when your goal is to index and enable users to search for files that reside in multiple directories and are of different formats, or if you need to fetch and index web pages of different content types. In both cases, using...

3.1.1 : Searching for a specific term

starts on page 70 under section 3.1 (Implementing a simple search feature) in chapter 3 (Adding search to your application)

...IndexSearcher is the central class used to search for documents in an index. It has several overloaded search methods. You can search for a specific term using the most commonly used search method. A term is a value that is paired with its containing field name--in this case, subject. 2 The word ... . Using our example book data index, we'll query for the words ant and junit, which are words we know were indexed. Listing 3.1 performs a term query and asserts that the single document expected...

3.3.1 : Lucene, you got a lot of `splainin' to do!

starts on page 80 under section 3.3 (Understanding Lucene scoring) in chapter 3 (Adding search to your application)

... != 2) { System.err.println("Usage: Explainer <index dir> "); System.exit(1); } String ... our sample index produced the following output; notice that the most relevant title scored best: Query ... the term junit twice in its contents field. The contents field in our index is an aggregation...

3.4.7 : Searching for similar terms: FuzzyQuery

starts on page 92 under section 3.4 (Creating queries programmatically) in chapter 3 (Adding search to your application)

...The final built-in query is one of the more interesting. Lucene's FuzzyQuery matches terms similar to a specified term. The Levenshtein distance algorithm determines how similar terms in the index are to a specified target term. Edit distance is another term for Levenshtein distance ... 't indexed but was close enough to match. FuzzyQuery uses a threshold rather than a pure edit distance ... in an index to find terms within the allowable threshold. Use this type of query sparingly, or at least...
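For readers unfamiliar with Levenshtein (edit) distance, here is a minimal plain-Java sketch of the textbook dynamic-programming algorithm. It is not Lucene's FuzzyQuery code: the class `EditDistanceSketch`, the `similarity` normalization (one minus distance over the longer length), and the sample words are assumptions chosen for illustration, and the excerpt does not show the exact threshold formula FuzzyQuery uses.

```java
// Hypothetical sketch of edit distance; not taken from Lucene.
final class EditDistanceSketch {

    // Minimum number of single-character insertions, deletions, or
    // substitutions needed to turn a into b (two-row DP table).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1], prev[j]) + 1;
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Illustrative normalization only (an assumption, not Lucene's formula):
    // 1.0 means identical, 0.0 means nothing in common.
    static double similarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        if (longer == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(a, b) / longer;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));
        System.out.println(similarity("wuzza", "wuzzu"));
    }
}
```

Since the excerpt says similar terms are found by examining terms in the index, running a comparison like this against many terms is the likely reason the text advises using fuzzy matching sparingly.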

5.7 : Leveraging term vectors

starts on page 185 in chapter 5 (Advanced search techniques)

... 5.7.2. To enable term-vector storage, during indexing you enable the store term vectors attribute ... support for the field, as we did for the subject field when indexing our book data (see figure ... vectors during indexing Retrieving term vectors for a field in a given document by ID requires a call...

5.1.2 : Sorting by relevance

starts on page 152 under section 5.1 (Sorting search results) in chapter 5 (Advanced search techniques)

... ,. Score and index order are special types of sorting: The results are returned first ... ID order. Document ID order is the order in which the documents were indexed. In our case, index order isn't relevant, and order is unspecified (see section 8.4 on the Ant <index> task, which is how we indexed our sample data). As an aside, you may wonder why the score of the last two books...