Search inside Lucene in Action

Query parsed to: index fileindex

7.8.3 : FileIndexer application

starts on page 260 under section 7.8 (Creating a document-handling framework) in chapter 7 (Parsing common document formats)

... a parser capable of handling their file format. FileIndexer may remind you of the Indexer application ... is limited to indexing plain-text files, FileIndexer can parse and index all the document formats covered in this chapter. Listing 7.15 FileIndexer: a recursive file-system indexer /** * A File Indexer capable of recursively indexing a directory tree. */ public class FileIndexer { protected ... IndexWriter(dir, analyzer, true); h FileIndexer indexer = new FileIndexer(props); i Create FileIndexer...

7.8.4 : Using FileIndexer

starts on page 262 under section 7.8 (Creating a document-handling framework) in chapter 7 (Parsing common document formats)

...The FileIndexer class includes a main method that can be used to invoke the class from the command line and recursively index files in a given directory tree. To run FileIndexer from the command line ... that you want to index: $ java lia.handlingtypes.framework.FileIndexer ~/handler.properties ~/data ~/index Indexing /home/otis/data/FileWithoutExtension Cannot handle /home/otis/data/FileWithoutExtension ... indexed: 6 Total time: 3046 ms As it works through a directory tree, FileIndexer prints out information...

7.8.5 : FileIndexer drawbacks, and how to extend the framework

starts on page 263 under section 7.8 (Creating a document-handling framework) in chapter 7 (Parsing common document formats)

... to index and make searchable files of a type that our framework doesn't handle? You extend ... file, mapping it to the appropriate file extension. 3 Keep using FileIndexer as shown. This leads...
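
The extension step above amounts to adding one entry to an extension-to-handler mapping. Here is a minimal sketch of such a lookup in plain Java. The class name HandlerLookup, the property-key format (a bare extension such as "txt" mapped to a handler class name), and the handler class names are all assumptions made for illustration; the snippet does not show the book's actual properties format or its ExtensionFileHandler code.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Properties;

/**
 * Hypothetical sketch: resolves a handler class name from a file's extension.
 * The key format (bare extension -> class name) is an assumption, not the book's format.
 */
class HandlerLookup {
    private final Properties handlers;

    HandlerLookup(Properties handlers) {
        this.handlers = handlers;
    }

    /** Returns the configured handler class name, or empty if the file cannot be handled. */
    Optional<String> handlerClassFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // No extension (or a trailing dot): nothing to map, so the file cannot be handled.
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(handlers.getProperty(extension));
    }
}
```

Under this assumed layout, supporting a new file type means writing the handler class and adding one property line for its extension; the lookup code itself does not change.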

7.8 : Creating a document-handling framework

starts on page 254 in chapter 7 (Parsing common document formats)

... to create a minimal framework for handling and indexing documents of various types without worrying ... a file-indexing framework Java class Purpose DocumentHandler Defines the getDocument(InputStream) method ... the getDocument(File) method Finally, we create a FileIndexer command-line application that uses all ... . This ready-to-use application can recursively traverse file-system directories, along the way indexing files...

1.5.1 : IndexWriter

starts on page 19 under section 1.5 (Understanding the core indexing classes) in chapter 1 (Meet Lucene)

...IndexWriter is the central component of the indexing process. This class creates a new index and adds documents to an existing index. You can think of IndexWriter as an object that gives you write access to the index but doesn't let you read or search it. Despite its name, IndexWriter isn't the only class that's used to modify an index; section 2.2 describes how to use the Lucene API to modify an index....

1.4.1 : Creating an index

starts on page 12 under section 1.4 (Lucene in action: a sample application) in chapter 1 (Meet Lucene)

...In this section you'll see a single class called Indexer and its four static methods; together, they recursively traverse file system directories and index all files with a .txt extension. When Indexer completes execution it leaves behind a Lucene index for its sibling, Searcher (presented ... in this example--we'll explain them shortly. After the annotated code listing, we show you how to use Indexer; if it helps you to learn how Indexer is used before you see how it's coded, go directly...
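
The traversal step described in this snippet (walk a directory tree and pick out the .txt files) can be sketched with the standard library alone. This is not the book's Indexer listing, which is not reproduced here; the class name TxtFileFinder and the use of java.nio.file.Files.walk are assumptions chosen for brevity, and the sketch stops at collecting paths rather than feeding them to Lucene.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: recursively collects regular files whose name ends with ".txt". */
class TxtFileFinder {
    static List<Path> findTxtFiles(Path root) throws IOException {
        // Files.walk visits the whole tree below root; the stream must be closed after use.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".txt"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

An indexer built on this would loop over the returned paths and add one document per file to the index.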

1.3 : Indexing and searching

starts on page 10 in chapter 1 (Meet Lucene)

...At the heart of all search engines is the concept of indexing: processing the original data into a highly efficient cross-reference lookup in order to facilitate rapid searching. Let's take a quick high-level look at both the indexing and searching processes....

2.0 : Indexing

starts on page 28

...This chapter covers: Performing basic index operations; Boosting Documents and Fields during indexing; Indexing dates, numbers, and Fields for use in sorting search results; Using parameters that affect Lucene's indexing performance and resource consumption; Optimizing indexes; Understanding concurrency, multithreading, and locking issues in the context of indexing. So you want to search files stored ... . Lucene can help you do that. However, before you can search something, you have to index...

2.8 : Optimizing an index

starts on page 56 in chapter 2 (Indexing)

...Index optimization is the process that merges multiple index files together in order to reduce their number and thus minimize the time it takes to read in the index at search time. Recall from section 2.7 that while it's adding new Documents to an index, Lucene buffers several Documents in memory ... with mergeFactor, maxMergeDocs, and minMergeDocs, when indexing is done you could still be left with several segments in the index. Searching an index made up of multiple segments works properly, but Lucene...

1.4 : Lucene in action: a sample application

starts on page 11 in chapter 1 (Meet Lucene)

...Let's see Lucene in action. To do that, recall the problem of indexing and searching files, which we described in section 1.3.1. Furthermore, suppose you need to index and search files stored in a directory tree, not just in a single directory. To show you Lucene's indexing and searching capabilities, we'll use a pair of command-line applications: Indexer and Searcher. First we'll index a directory tree containing text files; then we'll search the created index. These example applications...

2.9.1 : Concurrency rules

starts on page 59 under section 2.9 (Concurrency, thread-safety, and locking issues) in chapter 2 (Indexing)

...Lucene provides several operations that can modify an index, such as document indexing, updating, and deletion; when using them, you need to follow certain rules to avoid index corruption ... search the same index in parallel. Any number of read-only operations may be executed while an index is being modified. For example, users can search an index while it's being optimized or while new documents are being added to the index, updated, or deleted from the index. Only a single...

1.3.1 : What is indexing, and why is it important?

starts on page 10 under section 1.3 (Indexing and searching) in chapter 1 (Meet Lucene)

... files are very large. This is where indexing comes in: To search large amounts of text quickly, you must first index that text and convert it into a format that will let you search it rapidly, eliminating the slow sequential scanning process. This conversion process is called indexing, and its output is called an index. You can think of an index as a data structure that allows fast random access to words stored inside it. The concept behind it is analogous to an index at the end of a book...
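
To make the "fast random access to words" idea concrete, here is a toy inverted index in plain Java. It is only an illustration of the concept, not Lucene's data structure or API: the class name TinyInvertedIndex, the integer document ids, and the crude tokenization on non-alphanumeric characters are all assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy inverted index: maps each lowercased word to the ids of the documents containing it. */
class TinyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Splits text on runs of non-letter, non-digit characters and records docId under each word. */
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Looks the word up directly in the map; no document text is scanned at query time. */
    SortedSet<Integer> lookup(String word) {
        return postings.getOrDefault(
                word.toLowerCase(Locale.ROOT), Collections.<Integer>emptySortedSet());
    }
}
```

The cost of scanning the text is paid once, in add; each lookup afterwards is a single map access, which is the trade that the book-index analogy describes.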

2.2 : Basic index operations

starts on page 31 in chapter 2 (Indexing)

...In chapter 1, you saw how to add documents to an index. But we'll summarize the process here, along with descriptions of delete and update operations, to provide you with a convenient single reference point....

7.8.2 : ExtensionFileHandler

starts on page 257 under section 7.8 (Creating a document-handling framework) in chapter 7 (Parsing common document formats)

...--and that is exactly what we do from the FileIndexer application, described in the next section....

2.11 : Summary

starts on page 67 in chapter 2 (Indexing)

...This chapter has given you a solid understanding of how a Lucene index operates. In addition to adding Documents to an index, you should now be able to remove and update indexed Documents as well as manipulate a couple of indexing factors to fine-tune several aspects of indexing to meet your needs. The knowledge about concurrency, thread-safety, and locking is essential if you're using Lucene in a multithreaded application or a multiprocess system. By now you should be dying to learn how...

2.7 : Controlling the indexing process

starts on page 42 in chapter 2 (Indexing)

...Indexing small and midsized document collections works well with the default Lucene setup. However, if your application deals with very large indexes, you'll probably want some control over Lucene's indexing process to ensure optimal indexing performance. For instance, you may be indexing several million documents and want to speed up the process so it takes minutes instead of hours. Your ... several parameters that allow you to control its performance and resource use during indexing....

1.5 : Understanding the core indexing classes

starts on page 18 in chapter 1 (Meet Lucene)

...As you saw in our Indexer class, you need the following classes to perform the simplest indexing procedure: IndexWriter, Directory, Analyzer, Document, Field. What follows is a brief overview of these classes, to give you a rough idea about their role in Lucene. We'll use these classes throughout this book. (Footnote 3: Neal Stephenson details this process nicely in "In the Beginning Was the Command Line": http://www.cryptonomicon.com/beginning.html.)...

1.6.1 : IndexSearcher

starts on page 23 under section 1.6 (Understanding the core searching classes) in chapter 1 (Meet Lucene)

...IndexSearcher is to searching what IndexWriter is to indexing: the central link to the index that exposes several search methods. You can think of IndexSearcher as a class that opens an index in a read-only mode. It offers a number of search methods, some of which are implemented in its abstract parent class Searcher; the simplest takes a single Query object as a parameter and returns a Hits ... .getDirectory("/tmp/index", false)); Query q = new TermQuery(new Term("contents", "lucene")); Hits...

2.1.1 : Conversion to text

starts on page 29 under section 2.1 (Understanding the indexing process) in chapter 2 (Indexing)

...To index data with Lucene, you must first convert it to a stream of plain-text tokens, the format that Lucene can digest. In chapter 1, we limited our examples to indexing and searching .txt files ... 't always that simple. Suppose you need to index a set of manuals in PDF format. To prepare these manuals for indexing, you must first find a way to extract the textual information from the PDF ... the same situation if you want to index Microsoft Word documents or any document format other than...

1.5.2 : Directory

starts on page 19 under section 1.5 (Understanding the core indexing classes) in chapter 1 (Meet Lucene)

...The Directory class represents the location of a Lucene index. It's an abstract class that allows its subclasses (two of which are included in Lucene) to store the index as they see fit. In our Indexer example, we used a path to an actual file system directory to obtain an instance of Directory ... implementations, FSDirectory, and created our index in a directory in the file system. In your applications, you will most likely be storing a Lucene index on a disk. To do so, use FSDirectory...