Search inside Lucene in Action

6.3.3 : Handling numeric field-range queries

starts on page 205 under section 6.3 (Extending QueryParser) in chapter 6 (Extending search)

... up to you. In this section, our example scenario indexes an integer id field so that range queries can be performed. If we indexed toString representations of the integers 1 through 10, the order in the index would be 1, 10, 2, 3, 4, 5, 6, 7, 8, 9, not the intended order at all. However, if we pad ... formatter.format(n); } } The numbers need to be padded during indexing. This is done in our test ... (); } } With this index-time padding, we're only halfway there. A query expression for IDs 37 through...
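
The ordering problem described in this excerpt can be reproduced without Lucene, since index terms are compared as strings. The following is a minimal sketch, not the book's listing (which is elided above): the class name PaddedIds, the ten-digit width, and the helper methods are assumptions made for illustration, and it assumes non-negative ids.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class PaddedIds {
    // Assumed width: ten digits covers every non-negative int (max 2147483647).
    private static final String PATTERN = "0000000000";

    // Left-pads n with zeros. A new DecimalFormat is created per call because
    // DecimalFormat is not thread-safe; Locale.ROOT keeps the digits ASCII.
    public static String pad(int n) {
        DecimalFormat formatter =
            new DecimalFormat(PATTERN, DecimalFormatSymbols.getInstance(Locale.ROOT));
        return formatter.format(n);
    }

    // Returns the ids from..to as strings, sorted lexicographically,
    // which is how string terms are ordered.
    public static List<String> sortedTerms(int from, int to, boolean padded) {
        List<String> terms = new ArrayList<>();
        for (int i = from; i <= to; i++) {
            terms.add(padded ? pad(i) : Integer.toString(i));
        }
        Collections.sort(terms);
        return terms;
    }

    public static void main(String[] args) {
        // Unpadded: "10" sorts between "1" and "2".
        System.out.println(sortedTerms(1, 10, false));
        // Padded: string order now matches numeric order.
        System.out.println(sortedTerms(1, 10, true));
    }
}
```

With padding applied at index time, the same padding has to be applied to the endpoints of a range query, which is the second half of the problem the excerpt goes on to describe.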

7.0 : Parsing common document formats

starts on page 223

... a document indexing framework and application. So far in this book, we have covered various aspects ... rich-text documents like these? Yes, you can! Although Lucene doesn't include tools to automatically index ... to extract the textual data from rich media. Once extracted, you can index the data with Lucene ... as an abstraction to nest within a rich framework for parsing and indexing documents of any type. Next, we'll walk through examples to show you how to parse and index various document types...

8.7.2 : Highlighting Hits

starts on page 303 under section 8.7 (Highlighting query terms) in chapter 8 (Tools and extensions)

...Whether to store the original field text in the index is up to you (see section 2.2 for field indexing options). If the original text isn't stored in the index (generally for size considerations ... .getBestFragment(stream, title); System.out.println(fragment); } With our sample book index, the output ... the original text was tokenized during indexing. However, during indexing, the positional information ... offsets are stored in the index). Because of the computational needs of highlighting, it should only...

9.5.1 : API compatibility

starts on page 320 under section 9.5 (Lupy) in chapter 9 (Lucene ports)

...Python syntax aside, Lupy's API resembles that of Lucene. In listing 9.3, which shows how to index ... IndexWriter without specifying the analyzer; that is something we can't do in Lucene. Listing 9.3 Indexing a file with Lupy, and demonstrating Lupy's indexing API from lupy.index.indexwriter import IndexWriter from lupy import document # open index for writing indexer = IndexWriter('/tmp/index', True ... that # the actual text of s is not stored in the index f = document.Text('text', s, False) d.add(f) # add...

8.9.1 : Coding to DbDirectory

starts on page 308 under section 8.9 (Storing an index in Berkeley DB) in chapter 8 (Tools and extensions)

... DbDirectory being used for indexing. Listing 8.8 Indexing with DbDirectory public class BerkeleyDbIndexer ... != 1) { System.err.println("Usage: BerkeleyDbIndexer <index dir>"); System.exit(-1); } String indexDir = args[0]; DbEnv env = new DbEnv(0); Db index = new Db(env, 0); Db blocks = new Db(env, 0); File dbHome ... [i].delete(); dbHome.delete(); } dbHome.mkdir(); env.open(indexDir, Db.DB_INIT_MPOOL | flags, 0); index.open(null, "__index__", null, Db.DB_BTREE, flags, 0); blocks.open(null, "__blocks__", null, Db.DB_BTREE...

8.6.1 : Building the synonym index

starts on page 294 under section 8.6 (Synonyms from WordNet) in chapter 8 (Tools and extensions)

...To build the synonym index, follow these steps: 1 Download and expand the prolog16.tar.gz file from ... ; see section 8.10) of the Sandbox WordNet package. 3 Build the synonym index using the Syns2Index ... in the WordNet distribution from step 1. The second argument specifies the path where the Lucene index will be created: java org.apache.lucene.wordnet.Syns2Index prologwn/wn_s.pl wordnetindex The Syns2Index program converts the WordNet Prolog synonym database into a standard Lucene index with an indexed field...

3.4 : Creating queries programmatically

starts on page 81 in chapter 3 (Adding search to your application)

... expressions to a subset of the index, like documents only within a category. Depending on your search...

5.0 : Advanced search techniques

starts on page 149

...This chapter covers: sorting search results; spanning queries; filtering; multiple and remote index searching; leveraging term vectors. Many applications that implement search with Lucene can do so using the API introduced in chapter 3. Some projects, though, need more than the basic searching mechanisms. In this chapter, we explore the more sophisticated searching capabilities built into Lucene. A couple of odds and ends, PhrasePrefixQuery and MultiFieldQueryParser, round out our coverage...

5.5.3 : Security filters

starts on page 174 under section 5.5 (Filtering a search) in chapter 5 (Advanced search techniques)

...Another example of document filtering constrains documents with security in mind. Our example assumes documents are associated with an owner, which is known at indexing time. We index two documents; both have the term info in their keywords field, but each document has a different owner: public ... during indexing, using a QueryFilter will work nicely. However, this scenario is oversimplified ... the index itself. In section 6.4, we develop a more sophisticated filter implementation that leverages...

6.5.1 : Testing the speed of a search

starts on page 213 under section 6.5 (Performance testing) in chapter 6 (Extending search)

... we determine that a searching performance issue is caused by how we index, and find out how we can easily fix ... . We're indexing documents that have a last-modified timestamp. For example purposes, we index ... is returning the expected results by searching over a timestamp range that encompasses all documents indexed ... = new RangeQuery(beginTerm, endTerm, true); return newSearcher( index.byTimestampIndexDirName()).search(query); } At this point, all is well. We've indexed 1,000 documents and found them all using...

8.4 : Java Development with Ant and Lucene

starts on page 284 in chapter 8 (Tools and extensions)

...A natural integration point with Lucene incorporates document indexing into a build process. As part of Java Development with Ant (Hatcher and Loughran, Manning Publications, 2002), Erik created an Ant task to index a directory of file-based documents. This code has since been enhanced and is maintained in the Sandbox. Why index documents during a build process? Imagine a project ... for a particular version of the system, and having a read-only index created at build-time fits perfectly...

1.1 : Evolution of information organization and access

starts on page 4 in chapter 1 (Meet Lucene)

... of Mac OS X (nicknamed Tiger); it integrates indexing and searching across all file types including ... , Microsoft acquired Lookout, a product leveraging the Lucene.Net port of Lucene to index and search...

3.4.6 : Searching by wildcard: WildcardQuery

starts on page 90 under section 3.4 (Creating queries programmatically) in chapter 3 (Adding search to your application)

... as an exact term under the covers. Internally, it's used as a pattern to match terms in the index ... with a wildcard query forces the term enumeration to search all terms in the index for matches. Oddly...

5.1.4 : Sorting by a field

starts on page 154 under section 5.1 (Sorting search results) in chapter 5 (Advanced search techniques)

...Sorting by a field first requires that you follow the rules for indexing a sortable field, as detailed in section 2.6. Our category field was indexed as a single Field.Keyword per document, allowing it to be used for sorting. To sort by a field, you must create a new Sort object, providing the field name: example.displayHits(allBooks, new Sort("category")); Results for: pubmonth:[190001 TO 201012] sorted by "category", Title pubmonth id score A Modern Art of Education...

7.4 : Indexing an HTML document

starts on page 241 in chapter 7 (Parsing common document formats)

...HTML is everywhere. Most web documents are in HTML format. The Web is currently the largest repository of information on the planet. Add two and two together, and it's clear that we need to be able to index and search volumes of existing HTML documents. That is the bread and butter of web search engines, and many companies have built businesses based on this need. Parsing HTML is nontrivial, though, because many sites still don't conform to the latest W3C standards for XHTML (HTML as an XML...

8.4.3 : Installation

starts on page 290 under section 8.4 (Java Development with Ant and Lucene) in chapter 8 (Tools and extensions)

...The <index> task requires three libraries and at least Ant 1.5.4 (although Ant 1.6 or higher is recommended to take advantage of the Antlib feature). The Lucene JAR, JTidy's JAR, and the JAR of the <index> task itself are required. Obtain these JARs, place them in a single directory together, and use the -lib Ant 1.6 command-line switch to point to this directory (or use <taskdef> with the proper classpath). See section 8.10 for elaboration on how to obtain JARs from the Sandbox...

3.1.2 : Parsing a user-entered query expression: QueryParser

starts on page 72 under section 3.1 (Implementing a simple search feature) in chapter 3 (Adding search to your application)

... of the contents field, however, were lowercased when indexed. QueryParser, in this example, uses SimpleAnalyzer ... in the next chapter, but it's intimately intertwined with indexing text and searching ... to query on the actual terms indexed. QueryParser is the only searching piece that uses an 3 Query ... but does rely on matching terms to what was indexed. In section 4.1.2, we talk more about ... , you're more than ready to begin searching your indexes. There are, of course, many more details to know about...

Indexer command-line example whitespace issue

In section 1.4.1, sub-section Running Indexer, the command-line example appears to only be passing a single argument to Indexer. However, there should be a space between build/index and /lucene. The full command-line is:

% java lia.meetlucene.Indexer build/index /lucene

7.1.1 : Creating a common DocumentHandler interface

starts on page 225 under section 7.1 (Handling rich-text documents) in chapter 7 (Parsing common document formats)

... handy when you're indexing files in a file system, because you can turn each File class instance ... >. * * @param is the InputStream to convert to a Document * @return a ready-to-index instance ... to be indexed by the caller. Finally, each implementation of DocumentHandler that we present ... Document instances, not on the actual indexing. After all, once these parsers convert their input to ready-to-index Lucene Documents, the indexing step is identical for all document types; we don...

7.9.1 : Document-management systems and services

starts on page 264 under section 7.9 (Other text-extraction tools) in chapter 7 (Parsing common document formats)

...In addition to individual libraries that you can use to implement document parsing and indexing ... do that--and, interestingly enough, rely on Lucene to handle document indexing: DocSearcher (http://www.brownsite.net ... ://tockit.sourceforge.net/docco/index.html) is a small, personal document management system built on top of Lucene. It provides indexing and searching with Lucene; the latter is enhanced by using ... as a web application. It's controlled and customized via a web browser interface, and it can index...