Search inside Lucene in Action

Query parsed to: index fileindex

21 - 40 of 230 results (Page 2 of 12)

3.2.3 : Reading indexes into memory

starts on page 77 under section 3.2 (Using IndexSearcher) in chapter 3 (Adding search to your application)

...Using RAMDirectory is suitable for situations requiring only transient indexes, but most applications need to persist their indexes. They will eventually need to use FSDirectory, as we've shown in the previous two chapters. However, in some scenarios, indexes are used in a read-only fashion. Suppose, for instance, that you have a computer whose main memory exceeds the size of a Lucene index stored in the file system. Although it's fine to always search the index stored in the index directory... [Full sample chapter]

1.5.5 : Field

starts on page 20 under section 1.5 (Understanding the core indexing classes) in chapter 1 (Meet Lucene)

...Each Document in an index contains one or more named fields, embodied in a class called Field. Each field corresponds to a piece of data that is either queried against or retrieved from the index ... --Isn't analyzed, but is indexed and stored in the index verbatim. This type is suitable for fields whose ... path in Indexer (listing 1.1) as a Keyword field. UnIndexed--Is neither analyzed nor indexed, but its value is stored in the index as is. This type is suitable for fields that you need to display... [Full sample chapter]

1.4.2 : Searching an index

starts on page 15 under section 1.4 (Lucene in action: a sample application) in chapter 1 (Meet Lucene)

...Searching in Lucene is as fast and simple as indexing; the power of this functionality ... that we'll use to search the index created by Indexer. (Keep in mind that our Searcher serves ... of a web or desktop application with a GUI, an EJB, and so on.) In the previous section, we indexed a directory of text files. The index, in this example, resides in a directory of its own on the file system. We instructed Indexer to create a Lucene index in a build/index directory, relative... [Full sample chapter]

10.7.2 : High-level infrastructure

starts on page 373 under section 10.7 (I love Lucene: TheServerSide) in chapter 10 (Case studies)

... main tasks: building an index, and searching that index. This is definitely the case with Lucene ... an index goes through the IndexBuilder (figure 10.7). This is a simple interface that provides two entry points to the indexing process. To do an incremental build and control how often to optimize the Lucene index as you add records, pass individual configuration settings to the class. To control ... that are used to create the index itself. As we will see in the next section, TheServerSide has various...

2.9.4 : Disabling index locking

starts on page 66 under section 2.9 (Concurrency, thread-safety, and locking issues) in chapter 2 (Indexing)

...We strongly discourage meddling with Lucene's locking mechanism and disregarding the lock-related exception. However, in some situations you may want to disable locking in Lucene, and doing so won't corrupt your index. For instance, your application may need to access a Lucene index stored ... mode, too. In other words, your application will be using Lucene only to search the index and won't modify the index in any way. Although Lucene already stores its lock files in the system's temporary...

2.7.2 : In-memory indexing: RAMDirectory

starts on page 48 under section 2.7 (Controlling the indexing process) in chapter 2 (Indexing)

... control over indexing, its memory use, and the frequency of flushing the in-memory buffer to disk ... in listing 2.5 creates two indexes: one backed by an FSDirectory and the other by RAMDirectory. Except ... .io.tmpdir", "tmp") + System.getProperty("file.separator") + "fs-index"; Create Directory whose ramDir = new ... ); /** // change to adjust performance of indexing with FSDirectory writer.mergeFactor = writer ... you can use JUnitPerf to measure performance of index searching), this benchmark is sufficient...

10.7.3 : Building the index

starts on page 374 under section 10.7 (I love Lucene: TheServerSide) in chapter 10 (Case studies)

...We have seen that the external interface to building our search index is the class IndexBuilder. Now we will discuss the index building process and the design choices that we made. What fields should make up our index? We wanted to create a fairly generic set of fields that our index would contain. We ended up with the fields shown in table 10.7. Table 10.7 TheServerSide index field structure ... A summary paragraph introducing the content. fullcontents Field.UnStored The entire contents to index...

2.1 : Understanding the indexing process

starts on page 29 in chapter 2 (Indexing)

...As you saw in chapter 1, only a few methods of Lucene's public API need to be called in order to index a document. As a result, from the outside, indexing with Lucene looks like a deceptively simple and monolithic operation. However, behind the simple API lies an interesting and relatively complex set of operations that we can break down into three major and functionally distinct groups, as described in the following sections and depicted in figure 2.1....

2.9 : Concurrency, thread-safety, and locking issues

starts on page 59 in chapter 2 (Indexing)

...In this section, we cover three closely related topics: concurrent index access, thread-safety of IndexReader and IndexWriter, and the locking mechanism that Lucene uses to prevent index corruption. These issues are often misunderstood by users new to Lucene. Understanding these topics is important, because it will eliminate surprises that can result when your indexing application starts serving multiple users simultaneously or when it has to deal with a sudden need to scale by parallelizing...

1.5.4 : Document

starts on page 20 under section 1.5 (Understanding the core indexing classes) in chapter 1 (Meet Lucene)

... modified, and so on, are indexed and stored separately as fields of a document. NOTE When we refer ... . Although various types of documents can be indexed and made searchable, processing them ... Java type. You'll learn more about handling nontext documents in chapter 7. In our Indexer, we're concerned with indexing text files. So, for each text file we find, we create a new instance of the Document class, populate it with Fields (described next), and add that Document to the index... [Full sample chapter]

2.7.1 : Tuning indexing performance

starts on page 42 under section 2.7 (Controlling the indexing process) in chapter 2 (Indexing)

...In a typical indexing application, the bottleneck is the process of writing index files onto a disk. If you were to profile an indexing application, you'd see that most of the time is spent in code sections that manipulate index files. Therefore, you need to instruct Lucene to be smart about indexing new Documents and modifying existing index files. As shown in figure 2.2, when new Documents are added to a Lucene index, they're initially buffered in memory instead of being immediately written...

2.7.3 : Limiting Field sizes: maxFieldLength

starts on page 54 under section 2.7 (Controlling the indexing process) in chapter 2 (Indexing)

...Some applications index documents whose sizes aren't known in advance. To control the amount of RAM and hard-disk memory used, they need to limit the amount of input they index. Other applications deal with documents of known size but want to index only a portion of each document. For example, you may want to index only the first 200 words of each document. Lucene's IndexWriter exposes ... . With a default value of 10,000, Lucene indexes only the first 10,000 terms in each Document Field...
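
The effect of such a limit can be illustrated with a minimal plain-Java sketch. It does not use Lucene's API: `FieldLengthLimitSketch` and `limitTerms` are hypothetical names introduced here, and the sketch only assumes that a field has already been split into a list of terms.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not part of Lucene: shows what "index only the
// first N terms of a field" means for an already-tokenized field.
final class FieldLengthLimitSketch {

    // Returns a view of at most maxTerms leading terms; later terms are dropped.
    static List<String> limitTerms(List<String> terms, int maxTerms) {
        if (maxTerms < 0) {
            throw new IllegalArgumentException("maxTerms must not be negative");
        }
        return terms.subList(0, Math.min(maxTerms, terms.size()));
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("lucene", "in", "action", "sample", "text");
        // With a limit of 3, only the first three terms would reach the index.
        System.out.println(limitTerms(terms, 3));
    }
}
```

Terms beyond the limit are never seen by the index, so a query for a word that appears only late in a long document would not match it.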

2.9.2 : Thread-safety

starts on page 60 under section 2.9 (Concurrency, thread-safety, and locking issues) in chapter 2 (Indexing)

...It's important to know that although making simultaneous index modifications with multiple ... calls to its index-modifying methods will be properly synchronized so that index modifications ... must ensure that index-modifying operations of these two classes don't overlap. That is to say, before adding new documents to an index, you must close all IndexReader instances that have deleted Documents from the same index. Similarly, before deleting or updating documents in an index, you must...

1.0 : Meet Lucene

starts on page 3

...This chapter covers: Understanding Lucene; Using the basic indexing API; Working with the search API; Considering alternative products. One of the key factors behind Lucene's popularity and success is its simplicity. The careful exposure of its indexing and searching API is a sign of well-designed software. Consequently, you don't need in-depth knowledge about how Lucene's information indexing ... indexing and searching with Lucene with ready-to-use code examples. We then briefly introduce all... [Full sample chapter]

1.7 : Review of alternate search products

starts on page 24 in chapter 1 (Meet Lucene)

...: Information Retrieval libraries Indexing and searching applications The first group is smaller; it consists of full-text indexing and searching libraries similar to Lucene. Products in this group let ... up of ready-to-use indexing and searching software. This software is typically designed to index... [Full sample chapter]

2.1.2 : Analysis

starts on page 30 under section 2.1 (Understanding the indexing process) in chapter 2 (Indexing)

...Once you've prepared the data for indexing and created Lucene Documents populated with Fields, you can call IndexWriter's addDocument(Document) method and hand your data off to Lucene to index. When you do that, Lucene first analyzes the data to make it more suitable for indexing. To do so, it splits the textual data into chunks, or tokens, and performs a number of optional operations on them. For instance, the tokens could be lowercased before indexing, to make searches case-insensitive...
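
The two steps named in this excerpt, splitting text into tokens and lowercasing them, can be sketched in plain Java. This is only an illustration of the idea, not Lucene's analyzer implementation: `SimpleAnalysisSketch` and `analyze` are hypothetical names, and the rule of splitting on anything that is not a letter or digit is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration, not part of Lucene: a toy "analysis" step.
final class SimpleAnalysisSketch {

    // Splits text on runs of characters that are neither letters nor digits
    // (an assumed tokenization rule), then lowercases every token so that
    // later matching is case-insensitive.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.split("[^\\p{L}\\p{N}]+")) {
            if (!chunk.isEmpty()) { // split() can yield a leading empty string
                tokens.add(chunk.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Lucene in Action: Indexing Basics"));
    }
}
```

Because the same normalization would be applied to query text, a search for "LUCENE" and one for "lucene" end up looking for the same token.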

2.6 : Indexing Fields used for sorting

starts on page 41 in chapter 2 (Indexing)

... to be able to sort results by a Field value, you must add it as a Field that is indexed ... ", "Arthur C. Clark"); Although we've indexed numeric values as Strings, you can specify the correct Field ... for sorting have to be indexed and must not be tokenized....

5.6 : Searching across multiple Lucene indexes

starts on page 178 in chapter 5 (Advanced search techniques)

...If your architecture consists of multiple Lucene indexes, but you need to search across them using a single query with search results interleaving documents from different indexes, MultiSearcher is for you. In high-volume usage of Lucene, your architecture may partition sets of documents into different indexes....
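
The idea of one result list that interleaves documents from different indexes can be sketched without Lucene. The example below is a hypothetical illustration, not MultiSearcher's implementation: `MergedSearchSketch`, `Hit`, and `merge` are invented names, and it assumes that scores from the separate indexes are directly comparable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration, not part of Lucene: merge per-index hit lists.
final class MergedSearchSketch {

    // One search hit, tagged with the (assumed) name of the index it came from.
    static final class Hit {
        final String indexName;
        final String docId;
        final double score;

        Hit(String indexName, String docId, double score) {
            this.indexName = indexName;
            this.docId = docId;
            this.score = score;
        }

        @Override
        public String toString() {
            return indexName + "/" + docId + " (" + score + ")";
        }
    }

    // Combines the hits of every index and orders them by descending score,
    // so documents from different indexes interleave in a single ranking.
    static List<Hit> merge(List<List<Hit>> perIndexHits) {
        List<Hit> merged = new ArrayList<>();
        for (List<Hit> hits : perIndexHits) {
            merged.addAll(hits);
        }
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> books = Arrays.asList(new Hit("books", "b1", 0.9), new Hit("books", "b2", 0.2));
        List<Hit> articles = Arrays.asList(new Hit("articles", "a1", 0.5));
        for (Hit hit : merge(Arrays.asList(books, articles))) {
            System.out.println(hit);
        }
    }
}
```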

10.3.4 : Language support

starts on page 343 under section 10.3 (Using Lucene in SearchBlox) in chapter 10 (Case studies)

... across multiple collections in several languages Indexing documents with different encodings ... is converted to UTF-8 before indexing. SearchBlox uses several mechanisms to detect the encoding of the document that is to be indexed. Detecting the language of the content--The language of the content is required for two purposes when indexing: to choose the correct analyzer and to use the correct...

10.1.1 : More in depth

starts on page 327 under section 10.1 (Nutch: "The NPR of search engines") in chapter 10 (Case studies)

... It contacts many Index Searchers simultaneously because the document set is too large to be searched ... contacts one of the Index Searchers that can search it. If an Index Searcher cannot be contacted ... is whether to divide the overall text index by document or by search term. Should a single Index ... segmentation, the Query Handler could simply forward to a single Index Searcher and skip ... percentage of the indexed documents will be ignored during search. That's not great...