Search inside Lucene in Action

Query parsed to: index fileindex

161 - 180 of 230 results (Page 9 of 12)

6.1 : Using a custom sort method

starts on page 195 in chapter 6 (Extending search)

...'t be determined during indexing. An interesting idea for a custom sorting mechanism is to order search ... and their fictitious grid coordinates on a sample 10x10 grid.2 The test data is indexed as shown in listing ... of businesses and could allow us to filter search results to specific types of places. Listing 6.1 Indexing ... .addDocument(doc); } } The coordinates are indexed into a single location field as a string ... implementa- tion makes space to store a float for every document in the index and computes the distance from...

4.2.2 : TokenStreams uncensored

starts on page 109 under section 4.2 (Analyzing the analyzer) in chapter 4 (Analysis)

...There are two different styles of TokenStreams: Tokenizer and TokenFilter. A good generalization to explain the distinction is that Tokenizers deal with indi- vidual characters, and TokenFilters deal with words. Figure 4.2 shows this archi- tecture graphically. A Tokenizer is a TokenStream that tokenizes the input from a Reader. When you're indexing a String through Field.Text(String, String) or Field. UnStored(String, String) (that is, the indexed field constructors which accept a String...
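
To make the distinction in this excerpt concrete, here is a minimal sketch in plain Java. It does not use Lucene's own Tokenizer or TokenFilter classes; the names TokenStreamSketch, tokenize, and lowerCaseFilter are hypothetical and chosen only for illustration. The first step looks at individual characters to cut the input into words, and the second step works on whole words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: plain Java, not Lucene's API. All names are hypothetical.
public class TokenStreamSketch {

    // Tokenizer-like step: inspects individual characters and
    // emits a word each time a run of letters ends.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // TokenFilter-like step: never sees raw characters from the input,
    // only the words produced by the previous step.
    static List<String> lowerCaseFilter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        // Chain the two steps, as an analyzer chains a tokenizer and filters.
        System.out.println(lowerCaseFilter(tokenize("The quick brown Fox")));
        // prints [the, quick, brown, fox]
    }
}
```

Chaining the steps this way mirrors the general idea of wrapping a character-level stage in one or more word-level stages; the real Lucene classes expose a different interface.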

4.7.3 : Hole lot of trouble

starts on page 138 under section 4.7 (Stemming analysis) in chapter 4 (Analysis)

... for this case, repre- sents how many stop words could be present in the original text between indexed ... .length()); } Both laziness and the phrase "fox jumped" matched our indexed document, allowing users...

5.1.1 : Using a sort

starts on page 150 under section 5.1 (Sorting search results) in chapter 5 (Advanced search techniques)

... on the index path provided as a system property: String indexDir = System.getProperty("index.dir...

5.3 : Querying on multiple fields at once

starts on page 159 in chapter 5 (Advanced search techniques)

...In our book data, several fields were indexed. Users may want to query for terms regardless of which field they are in. One way to handle this is with MultiField- QueryParser, which builds on QueryParser. Under the covers, it parses a query expression using QueryParser's static parse method for each field as the default field and combines them into a BooleanQuery. The default operator OR is used ... practice for user-entered queries. More commonly, all words you want searched are indexed into a contents...

7.8.1 : FileHandler interface

starts on page 255 under section 7.8 (Creating a document-handling framework) in chapter 7 (Parsing common document formats)

... framework, combined with a file-indexing application that uses it Listing 7.13 FileHandler interface ... to a Document * @return a ready-to-index instance of Document */ Document getDocument(File file...

8.6 : Synonyms from WordNet

starts on page 292 in chapter 8 (Tools and extensions)

... into a Lucene index. This allows for rapid synonym lookup--for example, for synonym injection during indexing or querying (see section 8.6.2 for such an implementation)....

3.5.6 : Phrase queries

starts on page 98 under section 3.5 (Parsing query expressions: QueryParser) in chapter 3 (Adding search to your application)

... was indexed, the slop factor can be set to something other than zero automatically if it isn't specified using... [Full sample chapter]

4.2.3 : Visualizing analyzers

starts on page 112 under section 4.2 (Analyzing the analyzer) in chapter 4 (Analysis)

... the terms that would be indexed. Listing 4.2 AnalyzerDemo: seeing analysis in action /** * Adapted from ... to the text and the tokens are extracted. AnalyzerUtils passes text to an analyzer without indexing it and pulls the results in a manner similar to what happens during the indexing process under ... in section 8.7 uses a Token- Stream and the resulting Tokens outside of indexing to determine where...

4.8.1 : Unicode and encodings

starts on page 140 under section 4.8 (Language analysis issues) in chapter 4 (Analysis)

...Internally, Lucene stores all characters in the standard UTF-8 encoding. Java frees us from many struggles by automatically handling Unicode within Strings and providing facilities for reading in external data in the many encodings. You, however, are responsible for getting external text into Java and Lucene. If you're indexing files on a file system, you need to know what encoding the files were saved as in order to read them properly. If you're reading HTML or XML from an HTTP server...
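
The point about knowing a file's encoding can be sketched as follows. This is a minimal example that assumes a modern JDK (java.nio.file and StandardCharsets, which postdate the book) and uses hypothetical names (EncodingSketch, readAll). It writes a small file as ISO-8859-1 and then reads it back twice, once with the right encoding and once with the wrong one.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration only: names are hypothetical and no Lucene classes are involved.
public class EncodingSketch {

    // Reads the whole file, decoding its bytes with the encoding the caller names
    // instead of relying on the platform default.
    static String readAll(Path file, Charset encoding) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), encoding))) {
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("encoding-sketch", ".txt");
        // Save "café" as ISO-8859-1, where é is the single byte 0xE9.
        Files.write(file, "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1));

        String right = readAll(file, StandardCharsets.ISO_8859_1);
        String wrong = readAll(file, StandardCharsets.UTF_8);

        System.out.println(right.equals("caf\u00e9")); // prints true
        System.out.println(wrong.equals("caf\u00e9")); // prints false
        Files.delete(file);
    }
}
```

The string returned by readAll is what would be handed on for indexing, so a wrong guess about the encoding silently changes the text that ends up in the index.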

5.5.5 : Caching filter results

starts on page 177 under section 5.5 (Filtering a search) in chapter 5 (Advanced search techniques)

... of IndexSearcher to benefit from the caching. When index changes need to be reflected in searches...

7.4.1 : Getting the HTML source data

starts on page 242 under section 7.4 (Indexing an HTML document) in chapter 7 (Parsing common document formats)

...Listing 7.6 contains the HTML document that we'll be parsing using the HTML parsers featured in this section. A large percentage of HTML documents avail- able on the Web aren't well formed, and not all parsers deal with that situation equally well. In this section, we use the JTidy and NekoHTML parsers, both of which are solid HTML parsers capable of dealing with broken HTML. Listing 7.6 The HTML document that we'll parse, index, and ultimately search Laptop power...

8.6.2 : Tying WordNet synonyms into an analyzer

starts on page 296 under section 8.6 (Synonyms from WordNet) in chapter 8 (Tools and extensions)

... SynonymEngine { RAMDirectory directory; IndexSearcher searcher; public WordNetSynonymEngine(File index) throws IOException { directory = new RAMDirectory( FSDirectory.getDirectory(index, false)); searcher = new IndexSearcher(directory); Load synonym } index into RAM for rapid access public String ... up in the WordNet index. These are issues that need to be addressed based on your envi- ronment...

9.2.4 : Performance

starts on page 317 under section 9.2 (CLucene) in chapter 9 (Lucene ports)

...According to a couple of reports captured in the archives of the Lucene Developers mailing list, CLucene indexes documents faster than Lucene. We haven't done any benchmarks ourselves because doing so would require going back to version 1.2 of Lucene (not something a new Lucene user would do)....

What's up with the hyphens in some of the search results?

This is an artifact of how the book content was indexed (a text version of the PDF was processed, including the words split across lines). These split words are, however, searchable! There is a fair bit of analysis trickery going on to piece this stuff back together during indexing, but the stored content still contains the hyphens. [Permalink]

Of course

The paragraph that begins with "During indexing..." has a typo - it should read "...even this per-Document analysis is too coarse grained." instead of "course". [Permalink]

SearchMorph - javadoc searching

David Spencer at SearchMorph (http://www.searchmorph.com) has just updated his . From the announcement (http://www.searchmorph.com/wp/2005/06/17/lucene-javadoc-index-updated-now-162000-pages-are-indexed/): "I have been collecting URLs to javadoc-generated pages and have updated the index of javadoc trees. Now the Lucene index includes over 162,000 documents (individual pages or URLs) from 630 javadoc trees." [Permalink]

3.4.5 : Searching by phrase: PhraseQuery

starts on page 87 under section 3.4 (Creating queries programmatically) in chapter 3 (Adding search to your application)

...An index contains positional information of terms. PhraseQuery uses this infor- mation to locate documents where terms are within a certain distance of one another. For example, suppose a field contained the phrase "the quick brown fox jumped over the lazy dog". Without knowing the exact phrase, you can still find this document by searching for documents with fields having quick and fox near each ... to index a single document and a custom matched (String[], int) method to construct, execute, and assert... [Full sample chapter] (http://www.manning.com/hatcher2)

4.8.3 : Analyzing Asian languages

starts on page 142 under section 4.8 (Language analysis issues) in chapter 4 (Analysis)

... for all tokenized fields in our index, which tokenizes each English word as expected (tao ... are likely to be kept together (as well as disconnected characters, increasing the index size...

8.6.3 : Calling on Lucene

starts on page 297 under section 8.6 (Synonyms from WordNet) in chapter 8 (Tools and extensions)

.... Constructing the T9 index We wrote a utility class to preprocess the original WordNet index into a special- ized T9 index. Each word is converted into a t9 keyword field. Each word, its T9 equivalent, and the text length of the word are indexed, as shown here: Document newDoc = new Document ... the book's source code distribution (see the "About this book" section). The word length is indexed ... prototype is a compellingly fast and accurate T9 lookup implementation. However, the Lucene index used...

© 2005 Erik Hatcher & Otis Gospodnetić