Search inside Lucene in Action

10.7.5 : Configuration: one place to rule them all

starts on page 379 under section 10.7 (I love Lucene: TheServerSide) in chapter 10 (Case studies)

...There have been settings in both the indexing process and the search process that were crying out for abstraction. Where should we put the index location, the category lists, and the boost values, and where should we register the index sources? We didn't want to have this in code, and since the configuration ... 10.9. Listing 10.9, Abstracting indexing and search configuration: /** Wrap around a Singleton instance ... up with the object model (ourConfig). XML configuration file: The config file drives the index process...
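Listing 10.9 is elided here. The class below is a hedged, self-contained sketch of the idea it describes: a Singleton that gives both the indexing code and the search code one place to read settings. Where the case study reads an XML file, this sketch hard-codes placeholder values. The class name SearchConfig, its getters, and every value are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical configuration singleton. The case study loads its settings
 * from an XML file; here they are hard-coded placeholders.
 */
final class SearchConfig {
    // Created once during class initialization, which the JVM makes thread-safe.
    private static final SearchConfig INSTANCE = new SearchConfig();

    private final String indexLocation;
    private final List<String> categories;
    private final Map<String, Float> boosts;

    private SearchConfig() {
        indexLocation = "/var/search/index";                    // placeholder path
        categories = Collections.unmodifiableList(
                Arrays.asList("news", "articles", "reviews"));  // placeholder categories
        Map<String, Float> b = new HashMap<>();
        b.put("title", 2.0f);                                   // placeholder boosts
        b.put("contents", 1.0f);
        boosts = Collections.unmodifiableMap(b);
    }

    static SearchConfig getInstance() {
        return INSTANCE;
    }

    String getIndexLocation() {
        return indexLocation;
    }

    List<String> getCategories() {
        return categories;
    }

    /** Returns the configured boost for a field, or 1.0 if none is set. */
    float getBoost(String field) {
        return boosts.getOrDefault(field, 1.0f);
    }
}
```

Because every caller goes through getInstance(), moving the values into a file later changes only the private constructor.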

10.1.2 : Other Nutch features

starts on page 328 under section 10.1 (Nutch: "The NPR of search engines") in chapter 10 (Case studies)

...The Query Handler asks each Index Searcher for only a small number of documents (usually 10). Since results are integrated from many Index Searchers, there's no need for a lot of documents from any ... is actually expanded to quite a complicated Lucene query before it is processed.[2] Each indexed document ... the search engine user's text in each of the three fields. Nutch also specially indexes combinations ... as a single unit Nutch must detect at index time. Also, before contacting the Index Searcher, the Query...
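A generic top-N merge shows why asking each Index Searcher for only a few documents is enough. This is a sketch, not Nutch's code. ScoredHit and QueryHandlerSketch are hypothetical types, and each searcher's list is assumed to arrive already sorted by descending score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** A scored hit returned by one index searcher (hypothetical type). */
final class ScoredHit {
    final String docId;
    final float score;

    ScoredHit(String docId, float score) {
        this.docId = docId;
        this.score = score;
    }
}

final class QueryHandlerSketch {
    /**
     * Merges the per-searcher result lists and keeps only the overall top n.
     * The final page shows at most n hits, so no searcher needs to return
     * more than n of its own.
     */
    static List<ScoredHit> mergeTopN(List<List<ScoredHit>> perSearcher, int n) {
        // Min-heap on score: the root is the weakest hit currently kept.
        PriorityQueue<ScoredHit> heap =
                new PriorityQueue<>(Comparator.comparingDouble((ScoredHit h) -> h.score));
        for (List<ScoredHit> results : perSearcher) {
            for (ScoredHit hit : results) {
                heap.offer(hit);
                if (heap.size() > n) {
                    heap.poll(); // drop the lowest-scoring hit
                }
            }
        }
        List<ScoredHit> merged = new ArrayList<>(heap);
        // Highest score first for display.
        merged.sort(Comparator.comparingDouble((ScoredHit h) -> h.score).reversed());
        return merged;
    }
}
```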

1.2.2 : What Lucene can do for you

starts on page 7 under section 1.2 (Understanding Lucene) in chapter 1 (Meet Lucene)

...Lucene allows you to add indexing and searching capabilities to your applications (these functions are described in section 1.3). Lucene can index and make searchable any data that can be converted ... as you can convert it to text. This means you can use Lucene to index and search data stored ... information. Similarly, with Lucene's help you can index data stored in your databases, giving your ... , animal:monkey AND food:banana, and so on. With Lucene, you can index and search email messages...

2.10 : Debugging indexing

starts on page 66 in chapter 2 (Indexing)

...Let's discuss one final, fairly unknown Lucene feature (if we may so call it). If you ever need to debug Lucene's index-writing process, remember that you can get Lucene to output information about its indexing operations by setting IndexWriter's public instance variable infoStream to one ... here, and may help you tune indexing parameters described earlier in the chapter: merging segments ... ) _u (1 docs) _v (1 docs) into _w (10 docs). In addition, if you need to peek inside your index once...

2.2.1 : Adding documents to an index

starts on page 31 under section 2.2 (Basic index operations) in chapter 2 (Indexing)

... for unit tests in this chapter. The code in listing 2.1 creates a compound index imaginatively named index-dir, stored in the system's temporary directory: /tmp on UNIX, or C:\TEMP on computers using Windows. (Compound indexes are covered in appendix B.) We use SimpleAnalyzer to analyze the input text, and we then index two simple Documents, each containing all four types of Fields: Keyword, UnIndexed, UnStored, and Text. Listing 2.1: Preparing a new index before each test in a base test case...
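The four Field types differ in whether a value is indexed, tokenized, and stored. The enum below is a hypothetical summary of those Lucene 1.x semantics, not a Lucene class. Note that the Reader-based variant of Text is indexed but not stored, so the TEXT row describes the String variant.

```java
/**
 * Summary of the properties behind Lucene 1.x's four Field factory methods.
 * Illustrative only; this enum is not part of Lucene's API.
 */
enum FieldKind {
    //         indexed, tokenized, stored
    KEYWORD   (true,    false,     true),  // e.g. IDs or dates, kept as one term
    UNINDEXED (false,   false,     true),  // stored for display, not searchable
    UNSTORED  (true,    true,      false), // searchable text that is not retrievable
    TEXT      (true,    true,      true);  // searchable and retrievable (String variant)

    final boolean indexed;
    final boolean tokenized;
    final boolean stored;

    FieldKind(boolean indexed, boolean tokenized, boolean stored) {
        this.indexed = indexed;
        this.tokenized = tokenized;
        this.stored = stored;
    }

    /** A query can match a field only if its value was indexed. */
    boolean searchable() {
        return indexed;
    }
}
```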

2.2.2 : Removing Documents from an index

starts on page 33 under section 2.2 (Basic index operations) in chapter 2 (Indexing)

...Although most applications are more concerned with getting Documents into a Lucene index, some also ... 's worth of news in its searchable indexes. Other applications may want to remove all Documents ... called IndexReader. This class doesn't delete Documents from the index immediately. Instead, it marks ... means that before each test method is run, the base class re-creates the two-Document index, as described in section 2.2.1. Listing 2.2: Removing Documents from a Lucene index by internal Document number...
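Mark-then-remove deletion by internal document number can be modeled roughly as follows. DeletionMarks is a hypothetical class. Its method names mirror IndexReader's delete, isDeleted, maxDoc, and numDocs, but physical removal, which in Lucene happens later during segment merging, is not modeled.

```java
import java.util.BitSet;

/**
 * Hypothetical model of deferred deletion: delete() only records a mark
 * against an internal document number; the document data is left in place.
 */
final class DeletionMarks {
    private final int maxDoc;
    private final BitSet deleted = new BitSet();

    DeletionMarks(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    void delete(int docNum) {
        if (docNum < 0 || docNum >= maxDoc) {
            throw new IllegalArgumentException("no such document: " + docNum);
        }
        deleted.set(docNum); // mark only
    }

    boolean isDeleted(int docNum) {
        return deleted.get(docNum);
    }

    /** Counts every document slot, including marked ones. */
    int maxDoc() {
        return maxDoc;
    }

    /** Counts only documents that are not marked as deleted. */
    int numDocs() {
        return maxDoc - deleted.cardinality();
    }
}
```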

2.4 : Indexing dates

starts on page 39 in chapter 2 (Indexing)

... modification. Chances are, like many other Lucene users, you'll need to index dates. Lucene comes ... indexing easy. For example, to index today's date, you can do this: Document doc = new Document(); doc ... the given date to a String suitable for indexing. Handling dates this way is simple, but you must be careful when using this method: Dates converted to indexable Strings by DateField include all ... Field values are eventually turned into text, you may very well index dates as Strings...
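DateField encodes the full timestamp, down to the millisecond. If day resolution is enough for an application, one alternative is to index a fixed-width yyyyMMdd string instead. This sketch uses java.time rather than any Lucene API, and DayTerms is a hypothetical name.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/**
 * Converts a date to a fixed-width yyyyMMdd term. Fixed width means the
 * Strings sort lexicographically in date order, and dropping the time of
 * day leaves one unique term per day instead of one per millisecond.
 */
final class DayTerms {
    static String toDayTerm(LocalDate date) {
        // BASIC_ISO_DATE prints a LocalDate as yyyyMMdd.
        return date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }
}
```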

2.5 : Indexing numbers

starts on page 40 in chapter 2 (Indexing)

...There are two common scenarios in which number indexing is important. In one scenario, numbers are embedded in the text to be indexed, and you want to make sure those numbers are indexed ... , you have Fields that contain only numeric values, and you want to be able to index them and use them ... , if you're indexing email messages, one of the possible index Fields could hold the message size ... can index numeric values by treating them as strings internally. If you need to index numbers...
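If numbers are stored as strings, lexicographic order must match numeric order. Zero-padding to a fixed width achieves that for non-negative values. The following is a minimal sketch that assumes int values that are never negative. NumberPadding is hypothetical.

```java
/**
 * Pads non-negative ints to a fixed width so that comparing the indexed
 * Strings gives the same order as comparing the numbers. Ten digits cover
 * Integer.MAX_VALUE (2147483647).
 */
final class NumberPadding {
    static final int WIDTH = 10;

    static String pad(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        return String.format("%0" + WIDTH + "d", value);
    }
}
```

Without padding, "9" sorts after "10". A range query over unpadded values could therefore include or skip the wrong terms.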

2.2.3 : Undeleting Documents

starts on page 36 under section 2.2 (Basic index operations) in chapter 2 (Indexing)

...Because Document deletion is deferred until the closing of the IndexReader instance, Lucene allows an application to change its mind and undelete Documents that have been marked as deleted. A call to IndexReader's undeleteAll() method undeletes all deleted Documents by removing all .del files from the index directory. Subsequently closing the IndexReader instance therefore leaves all Documents in the index. Documents can be undeleted only if the call to undeleteAll() was done using the same...
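Suppose undeleting has to happen on the same open instance that did the deleting. The lifecycle can then be modeled like this: deletions stay pending while the reader is open, undeleteAll() cancels them, and the reader cannot be reused after close(). PendingDeletes is hypothetical and does not touch real .del files.

```java
import java.util.BitSet;

/**
 * Hypothetical model of pending deletions on one reader instance.
 * Marks can be cancelled with undeleteAll() until close() commits them.
 */
final class PendingDeletes {
    private final BitSet marked = new BitSet();
    private boolean closed;

    void delete(int docNum) {
        ensureOpen();
        marked.set(docNum);
    }

    /** Cancels every pending deletion on this instance. */
    void undeleteAll() {
        ensureOpen();
        marked.clear();
    }

    boolean isDeleted(int docNum) {
        return marked.get(docNum);
    }

    /** Commits the pending deletions and returns how many there were. */
    int close() {
        ensureOpen();
        closed = true;
        return marked.cardinality();
    }

    private void ensureOpen() {
        if (closed) {
            throw new IllegalStateException("reader already closed");
        }
    }
}
```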

2.9.3 : Index locking

starts on page 62 under section 2.9 (Concurrency, thread-safety, and locking issues) in chapter 2 (Indexing)

...Related to the concurrency issues in Lucene is the topic of locking. To prevent index corruption ... to be executed by a single process at a time. Each index has its own set of lock files; by default, all lock ... property. If you look at that directory while indexing documents, you'll see Lucene's write.lock file ... computers that need to access the same index stored on a shared disk, you should set the lock directory ... from concurrently attempting to modify an index. More precisely, the write.lock is obtained...
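The write.lock idea depends on creating a file atomically, so that a second writer finds it already present and backs off. Here is a hedged sketch using java.nio. WriteLock is a hypothetical class. It does not reproduce Lucene's own Lock implementation or the way Lucene names its lock files.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Lock-file mutex: Files.createFile fails if the file already exists, so
 * only one holder at a time can create "write.lock" in the lock directory.
 */
final class WriteLock implements AutoCloseable {
    private final Path lockFile;

    private WriteLock(Path lockFile) {
        this.lockFile = lockFile;
    }

    /** Returns a held lock, or null if someone else already holds it. */
    static WriteLock tryObtain(Path lockDir) throws IOException {
        Path file = lockDir.resolve("write.lock");
        try {
            Files.createFile(file); // atomic check-and-create
            return new WriteLock(file);
        } catch (FileAlreadyExistsException e) {
            return null;
        }
    }

    /** Releases the lock by deleting the lock file. */
    @Override
    public void close() throws IOException {
        Files.deleteIfExists(lockFile);
    }
}
```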

10.3.5 : Reporting Engine

starts on page 344 under section 10.3 (Using Lucene in SearchBlox) in chapter 10 (Case studies)

...A key element of SearchBlox is the Reporting Engine. It is crucial to know what end users are searching for. Most commercial search tools provide a reporting tool, which is either a log analyzer or a database-based tool. In SearchBlox, the Reporting Engine is based on Lucene. Details of every search query are indexed as a Lucene document. Precanned searches are executed on this Lucene index to retrieve the various reporting statistics. This Lucene-based reporting engine offers all...

10.6.1 : Indexing content

starts on page 362 under section 10.6 (Artful searching at Michaels.com) in chapter 10 (Case studies)

... information, and projects. All searchable types are indexed in Lucene with a document containing at least ... As such, an art print is indexed in Lucene with a document containing orientation, subject ... is first added to the site and is not performed every time that a print is indexed in Lucene. Running the indexers: The search index is rebuilt from scratch once per hour. A background thread awakens, creates a new empty index, and then proceeds to add content data to the index. This is simply...
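The hourly rebuild can be sketched with a scheduled background thread. The swap through an AtomicReference, the generic index type, and the name PeriodicRebuilder are all assumptions for illustration. The case study text above does not say how searchers switch over to the new index.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical rebuild-from-scratch scheduler: each run builds a brand-new
 * index and then publishes it in one step, so searches never see a
 * half-built index.
 */
final class PeriodicRebuilder<I> {
    private final AtomicReference<I> live = new AtomicReference<>();
    private final Supplier<I> buildFreshIndex; // creates an empty index and adds all content

    PeriodicRebuilder(Supplier<I> buildFreshIndex) {
        this.buildFreshIndex = buildFreshIndex;
    }

    /** Builds a new index from scratch and makes it the live one. */
    void rebuildNow() {
        I fresh = buildFreshIndex.get();
        live.set(fresh);
    }

    /** The index that searches should currently use (null before the first build). */
    I current() {
        return live.get();
    }

    /** Runs rebuildNow() immediately and then once per hour on a daemon thread. */
    ScheduledExecutorService startHourly() {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "index-rebuilder");
            t.setDaemon(true);
            return t;
        });
        // Note: if a run throws, scheduleAtFixedRate cancels later runs.
        exec.scheduleAtFixedRate(this::rebuildNow, 0, 1, TimeUnit.HOURS);
        return exec;
    }
}
```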

1.6.5 : Hits

starts on page 24 under section 1.6 (Understanding the core searching classes) in chapter 1 (Meet Lucene)

...The Hits class is a simple container of pointers to ranked search results: documents that match a given query. For performance reasons, Hits instances don't load from the index all documents that match a query, but only a small portion of them at a time. Chapter 3 describes this in more detail...
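The lazy-loading behavior of Hits can be modeled as a container that knows the total number of matches but fetches documents in batches, only when asked. LazyHits and its fixed batch size are hypothetical, and the sketch does not reproduce Lucene's actual fetching policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/**
 * Hypothetical lazy result container: documents are fetched in batches,
 * only as far as the highest rank requested so far.
 */
final class LazyHits<D> {
    private final int length;
    private final int batchSize;
    private final IntFunction<D> loadDoc; // loads the document at a given rank
    private final List<D> cache = new ArrayList<>();

    LazyHits(int length, int batchSize, IntFunction<D> loadDoc) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.length = length;
        this.batchSize = batchSize;
        this.loadDoc = loadDoc;
    }

    /** Total number of matches, known without loading any document. */
    int length() {
        return length;
    }

    /** Returns the n-th ranked document, loading more batches only if needed. */
    D doc(int n) {
        if (n < 0 || n >= length) {
            throw new IndexOutOfBoundsException("no hit " + n);
        }
        while (cache.size() <= n) {
            int start = cache.size();
            int end = Math.min(start + batchSize, length);
            for (int i = start; i < end; i++) {
                cache.add(loadDoc.apply(i));
            }
        }
        return cache.get(n);
    }

    /** Number of documents fetched so far. */
    int loadedCount() {
        return cache.size();
    }
}
```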

2.1.3 : Index writing

starts on page 31 under section 2.1 (Understanding the indexing process) in chapter 2 (Indexing)

...After the input has been analyzed, it's ready to be added to the index. Lucene stores the input in a data structure known as an inverted index. This data structure makes efficient use of disk space while allowing quick keyword lookups. What makes this structure inverted is that it uses tokens extracted from input documents as lookup keys instead of treating documents as the central entities ... search engines are inverted indexes. What makes each search engine different is a set of closely...
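A minimal in-memory inverted index shows what using tokens as lookup keys means in practice. Each token maps to the list of document numbers that contain it. This is an illustration only. InvertedIndex is hypothetical, and Lucene's on-disk format also records term frequencies and positions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Token -> sorted list of document numbers (a "postings list").
 * Only document numbers are kept; frequencies and positions are omitted.
 */
final class InvertedIndex {
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    private int nextDocId;

    /** Adds a document given its already-analyzed tokens; returns its number. */
    int addDocument(List<String> tokens) {
        int docId = nextDocId++;
        for (String token : tokens) {
            List<Integer> docs = postings.computeIfAbsent(token, t -> new ArrayList<>());
            // Doc numbers only grow, so checking the last entry avoids duplicates.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
        return docId;
    }

    /** Direct lookup by token, with no scan over documents. */
    List<Integer> lookup(String token) {
        return postings.getOrDefault(token, Collections.emptyList());
    }
}
```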

1.5.3 : Analyzer

starts on page 19 under section 1.5 (Understanding the core indexing classes) in chapter 1 (Meet Lucene)

...Before text is indexed, it's passed through an Analyzer. The Analyzer, specified in the IndexWriter constructor, is in charge of extracting tokens out of text to be indexed and eliminating the rest. If the content to be indexed isn't plain text, it should first be converted to plain text, as depicted in figure 2.1. Chapter 7 shows how to extract text from the most common rich-media document formats. Analyzer is an abstract class, but Lucene comes with several implementations of it. Some of them deal...
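SimpleAnalyzer is one of the bundled implementations. It splits text at non-letter characters and lowercases the resulting tokens. The code below is a plain-Java approximation of that behavior. It deliberately avoids Lucene's TokenStream API, so SimpleTokenizerSketch is a hypothetical stand-in rather than Lucene code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Approximates SimpleAnalyzer: runs of letters become lowercase tokens;
 * digits, punctuation, and whitespace are discarded.
 */
final class SimpleTokenizerSketch {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // a non-letter ends the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the final token
        }
        return tokens;
    }
}
```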

1.8 : Summary

starts on page 27 in chapter 1 (Meet Lucene)

..., we quickly got to the point by showing you two standalone applications, Indexer and Searcher, which are capable of indexing and searching text files stored in a file system. We then briefly described ... We've organized the next couple of chapters as we did this chapter. The first thing we need to do is index...

5.6.1 : Using MultiSearcher

starts on page 178 under section 5.6 (Searching across multiple Lucene indexes) in chapter 5 (Advanced search techniques)

...With MultiSearcher, all indexes can be searched with the results merged in a specified ... two indexes that are split alphabetically by keyword. The index is made up of animal names beginning with each letter of the alphabet. Half the names are in one index, and half are in the other. A search is performed with a range that spans both indexes, demonstrating that results are merged ... WhitespaceAnalyzer(); (b) two indexes: Directory aTOmDirectory = new RAMDirectory(); Directory nTOzDirectory...
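Merging results from several indexes requires document numbers that are unique across all of them. Lucene's MultiSearcher offsets each sub-index's numbers by the combined size of the sub-indexes before it. The standalone sketch below shows that mapping for the two alphabetical indexes and leaves out scoring and merging. DocNumberSpace is hypothetical, not MultiSearcher's source.

```java
import java.util.Arrays;

/**
 * Maps between (sub-index, local doc number) and one global doc number.
 * starts[i] is the first global number of sub-index i; the last entry is the total.
 */
final class DocNumberSpace {
    private final int[] starts;

    DocNumberSpace(int... maxDocs) {
        starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    int toGlobal(int subIndex, int localDoc) {
        return starts[subIndex] + localDoc;
    }

    /** Finds the sub-index that owns a global document number. */
    int subIndexOf(int globalDoc) {
        int last = starts.length - 1;
        if (globalDoc < 0 || globalDoc >= starts[last]) {
            throw new IndexOutOfBoundsException("no document " + globalDoc);
        }
        int pos = Arrays.binarySearch(starts, 0, last, globalDoc);
        if (pos >= 0) {
            // Skip empty sub-indexes that share the same start.
            while (pos + 1 < last && starts[pos + 1] == globalDoc) {
                pos++;
            }
            return pos;
        }
        return -pos - 2; // insertion point minus one
    }

    int toLocal(int globalDoc) {
        return globalDoc - starts[subIndexOf(globalDoc)];
    }
}
```

For example, if the a-to-m index held 3 documents and the n-to-z index held 2, document 0 of the second index would become global document 3.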

10.2.3 : Index fields

starts on page 332 under section 10.2 (Using Lucene at jGuru) in chapter 10 (Case studies)

...All jGuru Lucene databases have the same form for consistency, although some fields are unused depending on the indexed entity type. For example, the foreign search database stores a site ... are used for searching. The complete list of fields is shown in table 10.1. Table 10.1: jGuru Lucene index ... field to store indexed text. For example, a FAQ entry provides the question, answer, and any related comments as contents (that is, the indexed text). The title is set to the FAQ question, the link...
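The FAQ example can be sketched as a function that fills the shared field layout. Only the title and contents mappings come from the text. FaqFields and the newline separator are assumptions, and the link field is left out because its value is elided above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical mapping of one FAQ entry onto shared field names:
 * "title" holds the question; "contents" joins question, answer, and comments.
 */
final class FaqFields {
    static Map<String, String> toFields(String question, String answer, List<String> comments) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", question);
        StringBuilder contents = new StringBuilder(question).append('\n').append(answer);
        for (String comment : comments) {
            contents.append('\n').append(comment);
        }
        fields.put("contents", contents.toString()); // the indexed text
        return fields;
    }
}
```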

5.1.7 : Selecting a sorting field type

starts on page 156 under section 5.1 (Sorting search results) in chapter 5 (Advanced search techniques)

... Indexing time is when the decision about sorting capabilities should be made; however, custom sorting ... index-time sorting design. By indexing an Integer.toString or Float.toString, sorting can be based on numeric values. In our example data, pubmonth was indexed as a String but is a valid, parsable ... performance issues further. It's important to understand that you index numeric values this way ... indexing and searching in order to use numeric fields for searching. All terms in an index are Strings...
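All terms are Strings, so the same values can sort differently depending on whether they are compared as text or parsed as numbers at sort time. The hypothetical SortTypeDemo below contrasts the two orders. It is plain Java, not Lucene's SortField machinery.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Contrasts text ordering with numeric ordering of the same String terms. */
final class SortTypeDemo {
    /** Compares terms character by character. */
    static List<String> sortAsStrings(List<String> terms) {
        List<String> copy = new ArrayList<>(terms);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    /** Parses each term as an int first; throws NumberFormatException on non-numeric terms. */
    static List<String> sortAsInts(List<String> terms) {
        List<String> copy = new ArrayList<>(terms);
        copy.sort(Comparator.comparingInt(Integer::parseInt));
        return copy;
    }
}
```

For equal-width values the two orders agree. They diverge as soon as values differ in length.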

10.1 : Nutch: "The NPR of search engines"

starts on page 326 in chapter 10 (Case studies)

... and effort put into Nutch exist for just two reasons: to help build a Lucene index, and to help query that index. In fact, Nutch uses lots of Lucene indexes. The system is designed to scale to process ... indexing and querying must take place across lots of machines simultaneously. Further, the system ... and forwards the search terms to a large set of Index Searcher machines. The Nutch query system ... one. This is discussed further later. Each Index Searcher works in parallel and returns a ranked...