Search inside Lucene in Action

Query parsed to: index fileindex

61 - 80 of 230 results (Page 4 of 12)

3.2 : Using IndexSearcher

starts on page 75 in chapter 3 (Adding search to your application)

... We recommend using the Directory constructor--it's better to decouple searching from where the index resides, allowing your searching code to be agnostic to whether the index being searched is on the file ... implementation. Its actual implementation is an FSDirectory loaded from a file system index. Our setUp() method opens an index using the static FSDirectory.getDirectory() method, with the index path ... String indexDir = System.getProperty("index.dir"); protected Directory directory; protected void setUp... [Full sample chapter]

4.1.1 : Indexing analysis

starts on page 105 under section 4.1 (Using analyzers) in chapter 4 (Analysis)

...During indexing, an Analyzer instance is handed to the IndexWriter in this manner: Analyzer ... library. Each tokenized field of each document indexed with the IndexWriter instance uses ... is stored. However, the output of the designated Analyzer dictates what is indexed. The following code demonstrates indexing of a document with these two field types: Document doc = new Document(); doc ... ); During indexing, the granularity of analyzer choice is at the IndexWriter or per-Document level...

3.4.1 : Searching by term: TermQuery

starts on page 82 under section 3.4 (Creating queries programmatically) in chapter 3 (Adding search to your application)

...The most elementary way to search an index is for a specific term. A term is the smallest indexed ... is case-sensitive, so be sure to match the case of terms indexed; this may not be the exact case in the original document text, because an analyzer (see chapter 4) may have indexed things differently. TermQuerys are especially useful for retrieving documents by a key. If documents were indexed ... that it's unique, though. It's up to you to ensure uniqueness during indexing. In our data, isbn is unique... [Full sample chapter]
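The case-sensitivity point in this snippet can be illustrated without Lucene itself. The following is a minimal sketch, assuming a toy in-memory index whose "analyzer" simply splits on whitespace and lowercases; `ToyTermIndex`, its methods, and the key value `isbn-0001` are hypothetical names invented for this illustration and are not part of Lucene's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy stand-in for an index; none of these names are Lucene's API.
class ToyTermIndex {
    // Maps "field:term" to the ids of documents that contain that exact term.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Index-time "analysis" (an assumption of this sketch): split on whitespace, lowercase.
    void addAnalyzed(int docId, String field, String text) {
        for (String token : text.trim().split("\\s+")) {
            addKeyword(docId, field, token.toLowerCase(Locale.ROOT));
        }
    }

    // Index a value verbatim, as one might for a unique key.
    void addKeyword(int docId, String field, String value) {
        postings.computeIfAbsent(field + ":" + value, k -> new ArrayList<>()).add(docId);
    }

    // Exact, case-sensitive lookup of a single term.
    List<Integer> search(String field, String term) {
        List<Integer> hits = postings.get(field + ":" + term);
        return hits == null ? Collections.<Integer>emptyList() : hits;
    }

    public static void main(String[] args) {
        ToyTermIndex index = new ToyTermIndex();
        index.addAnalyzed(0, "contents", "Lucene in Action");
        index.addKeyword(0, "isbn", "isbn-0001"); // hypothetical key value

        System.out.println(index.search("contents", "Lucene")); // [] because the term was indexed as "lucene"
        System.out.println(index.search("contents", "lucene")); // [0]
        System.out.println(index.search("isbn", "isbn-0001"));  // [0]
    }
}
```

The lookup only succeeds when the query term matches what the index-time analysis actually emitted, which is why a verbatim key field is convenient for retrieval by key.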

10.7.7 : Summary

starts on page 385 under section 10.7 (I love Lucene: TheServerSide) in chapter 10 (Case studies)

... it as Editor, and now I manage to find exactly what I want. Indexing our data is so fast that we don ... frequently, we brought down the index time to a matter of seconds. It used to take a lot longer, even ... with this approach. We can tweak the way we index and search our content with little effort. Thanks...

1.6 : Understanding the core searching classes

starts on page 22 in chapter 1 (Meet Lucene)

...The basic search interface that Lucene provides is as straightforward as the one for indexing. Only a few classes are needed to perform the basic search operation: IndexSearcher, Term, Query, TermQuery, and Hits. The following sections provide a brief introduction to these classes. We'll expand on these explanations in the chapters that follow, before we dive into more advanced topics.... [Full sample chapter]

10.4.1 : The system architecture

starts on page 347 under section 10.4 (Competitive intelligence with Lucene in XtraMind's XM-InformationMinder™) in chapter 10 (Case studies)

... parts is based upon the functionalities provided by Lucene, with each employing its own index ... of the information that can be found in the Lucene index for two specific reasons: Failure recovery--If the index somehow becomes corrupted (for example, through disk failure), it can easily and quickly ... have to search its whole index for the document with the identifier stored in one of the document ... processing. 3 The agent process continues by feeding the Lucene indexer with the stored content data...

1.6.2 : Term

starts on page 23 under section 1.6 (Understanding the core searching classes) in chapter 1 (Meet Lucene)

...A Term is the basic unit for searching. Similar to the Field object, it consists of a pair of string elements: the name of the field and the value of that field. Note that Term objects are also involved in the indexing process. However, they're created by Lucene's internals, so you typically don't need to think about them while indexing. During searching, you may construct Term objects and use them together with TermQuery: Query q = new TermQuery(new Term("contents", "lucene")); Hits hits... [Full sample chapter]

1.7.1 : IR libraries

starts on page 24 under section 1.7 (Review of alternate search products) in chapter 1 (Meet Lucene)

.... Egothor A full-text indexing and searching Java library, Egothor uses core algorithms that are very ... ready-to-use applications, such as a web crawler called Capek, a file indexer with a Swing GUI, and more ... indexer and document parsers are similar to the small document parsing and indexing framework presented ... project is comparable to Lucene in most aspects. If you have yet to choose a full-text indexing ... , PHP, and (soon) Java; remote index searching; and so on. In addition to providing an IR library... [Full sample chapter]

4.1.2 : QueryParser analysis

starts on page 106 under section 4.1 (Using analyzers) in chapter 4 (Analysis)

...The Analyzer is the key to the terms indexed. As you saw in chapter 3, you need to be sure to query on the exact terms indexed in order to find documents (we covered QueryParser expression parsing ... , it's the developer's responsibility to ensure that the terms used will match what was indexed ... to do its best job to match the terms that were indexed. An analyzer is specified on the static parse ... text equally, without knowledge of how it was indexed. This is a particularly thorny issue when...

10.2.4 : Indexing and content preparation

starts on page 333 under section 10.2 (Using Lucene at jGuru) in chapter 10 (Case studies)

... over what part of the content is indexed. jGuru indexes new content as it is added so you can post ... databases and build them during startup. jGuru highly processes content before letting Lucene index it. The same processing occurs for index and query operations; otherwise, queries probably ... indexing/querying. As it turns out, users want to be able to find non-Java keywords such as broken ... up the frequency information Lucene computes during indexing. I gradually built up the following...

1.2.1 : What Lucene is

starts on page 7 under section 1.2 (Understanding Lucene) in chapter 1 (Meet Lucene)

...Lucene is a high performance, scalable Information Retrieval (IR) library. It lets you add indexing ... a simple yet powerful core API that requires minimal understanding of full-text indexing ... into an application. Because Lucene is a Java library, it doesn't make assumptions about what it indexes ... if you will, not a full-featured search application. It concerns itself with text indexing and searching ... to its problem domain while hiding the complexity of indexing and searching implementation behind... [Full sample chapter]

2.2.4 : Updating Documents in an index

starts on page 36 under section 2.2 (Basic index operations) in chapter 2 (Indexing)

..."How do I update a document in an index?" is a frequently asked question on the Lucene user mailing ... from an index and then re-added to it, as shown in listing 2.3. Listing 2.3 Updating indexed Documents ... Field. We have effectively updated one of the Documents in the index. Updating by batching deletions ... . 6 Close IndexWriter. This is important to remember: Batching Document deletion and indexing ... under your belt, let's discuss how to fine-tune the performance of indexing and make the best use...
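The delete-then-re-add pattern this snippet describes can be sketched independently of Lucene. The example below is a minimal illustration, assuming a toy in-memory index and a hypothetical two-field document (`id`, `city`); `ToyUpdatableIndex` and all of its methods are invented for this sketch and do not reproduce listing 2.3 or Lucene's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy in-memory stand-in for an index; none of these names are Lucene's API.
class ToyUpdatableIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Add a copy of the document so later changes to the caller's map do not leak in.
    void add(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
    }

    // Delete every document whose field holds exactly this value; returns the count removed.
    int delete(String field, String value) {
        Objects.requireNonNull(value, "value");
        int before = docs.size();
        docs.removeIf(d -> value.equals(d.get(field)));
        return before - docs.size();
    }

    // "Update" is modeled as: delete the old version by its key, then add the new version.
    void update(String keyField, Map<String, String> newDoc) {
        delete(keyField, newDoc.get(keyField));
        add(newDoc);
    }

    // Return the first document whose field holds this value, or null if none does.
    Map<String, String> find(String field, String value) {
        for (Map<String, String> d : docs) {
            if (value.equals(d.get(field))) {
                return d;
            }
        }
        return null;
    }

    int size() {
        return docs.size();
    }

    // Hypothetical two-field document used only for this illustration.
    static Map<String, String> doc(String id, String city) {
        Map<String, String> d = new HashMap<>();
        d.put("id", id);
        d.put("city", city);
        return d;
    }

    public static void main(String[] args) {
        ToyUpdatableIndex index = new ToyUpdatableIndex();
        index.add(doc("1", "Amsterdam"));
        index.add(doc("2", "Venice"));

        index.update("id", doc("1", "Haag"));

        System.out.println(index.size());                      // 2
        System.out.println(index.find("id", "1").get("city")); // Haag
    }
}
```

Because the update is keyed on a field value, it only replaces exactly one document when that field is unique; keeping it unique is the indexing code's job.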

3.3 : Understanding Lucene scoring

starts on page 78 in chapter 3 (Adding search to your application)

..., as set during indexing. lengthNorm(t.field in d) Normalization value of a field, given the number of terms within the field. This value is computed during indexing and stored in the index. coord ... come in explicitly in the equation as the boost(t.field in d) factor, set at indexing time. The default value of field boosts, logically, is 1.0. During indexing, a Document can be assigned a boost ... multiplied together. Section 2.3 discusses index-time boosting in more detail. In addition... [Full sample chapter]
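As a rough illustration of the index-time factors named in this snippet, the sketch below multiplies a document boost, a field boost, and a length normalization. It assumes the length normalization is 1/sqrt(number of terms in the field), which is how Lucene's default similarity is commonly described; treat that formula, the `ToyNorms` class, and its method names as assumptions of this sketch rather than the book's code. Any precision loss from how the value is stored in the index is not modeled.

```java
// Toy arithmetic only; the class and method names are hypothetical, not Lucene's API.
class ToyNorms {
    // Assumed length normalization: shorter fields get a larger value.
    static double lengthNorm(int numTerms) {
        if (numTerms <= 0) {
            throw new IllegalArgumentException("numTerms must be positive");
        }
        return 1.0 / Math.sqrt(numTerms);
    }

    // Index-time factors multiplied together; both boosts default to 1.0.
    static double fieldNorm(double docBoost, double fieldBoost, int numTerms) {
        return docBoost * fieldBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        System.out.println(lengthNorm(4));          // 0.5
        System.out.println(fieldNorm(1.0, 1.0, 4)); // 0.5 with default boosts
        System.out.println(fieldNorm(1.5, 2.0, 4)); // 1.5 with boosted document and field
    }
}
```

With the default boosts of 1.0, only the field length affects the value; raising either boost scales it proportionally.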

10.3.1 : Why choose Lucene?

starts on page 341 under section 10.3 (Using Lucene in SearchBlox) in chapter 10 (Case studies)

...While selecting an indexing and searching engine for SearchBlox, we were faced with two choices: either use one of the several open-source toolkits that are available or build our own search toolkit. After looking at several promising toolkits, we decided to use Lucene. The reasons behind ... the compound index format, making the file handle situation much less of an issue. 5 There are still open ... is being used for 4 million document index with <100 millisecond search times. Extensive adoption...

4.10 : Summary

starts on page 147 in chapter 4 (Analysis)

...Analysis, while only a single facet of using Lucene, is the aspect that deserves the most attention and effort. The words that can be searched are those emitted during indexing analysis. Sure ... an analyzer during indexing. Many sophisticated processes may occur under the covers, such as stop-word removal and stemming of words. Removing words decreases your index size but can have a negative impact ... , you should rebuild your index using the new analyzer so that all documents are analyzed in the same...

9.5.2 : Index compatibility

starts on page 322 under section 9.5 (Lupy) in chapter 9 (Lucene ports)

...As is the case with dotLucene and Plucene, an index created with Lupy is compatible with that of Lucene. Again, that compatibility is limited to a particular version. In Lupy's case, indexes are compatible with Lucene 1.2's indexes....

10.3.2 : SearchBlox architecture

starts on page 342 under section 10.3 (Using Lucene in SearchBlox) in chapter 10 (Case studies)

...Figure 10.1 shows the overall architecture of SearchBlox. Compared to Lucene, which is a text indexing and search API, SearchBlox is a complete search tool. It features integrated crawlers, support for different document types, provision for several languages, and customizable search results; all controlled from a browser-based Admin Console. [Figure 10.1: SearchBlox system architecture] As a pure Java solution, SearchBlox can be deployed to any Servlet/JSP container, giving the customer complete...

1.7.2 : Indexing and searching applications

starts on page 26 under section 1.7 (Review of alternate search products) in chapter 1 (Meet Lucene)

....net/ng/ Microsoft Index Server--http://www.microsoft.com/NTServer/techresources/webserv/IndxServ.asp... [Full sample chapter]

8.4.1 : Using the task

starts on page 285 under section 8.4 (Java Development with Ant and Lucene) in chapter 8 (Tools and extensions)

...Listing 8.2 shows a simplistic Ant 1.6.x-compatible build file that indexes a directory of text and HTML files. Listing 8.2 Using the Ant <index> task index"> Lucene Ant index example Parent of index index.base.dir" location="build"/> directory Root directory of documents to index index"> index...

8.2 : Interacting with an index

starts on page 269 in chapter 8 (Tools and extensions)

...You've created a great index. Now what? Wouldn't it be nice to browse the index and perform ad hoc ... application to interact with the index. Thankfully, though, some nice utilities have already been created to let you interact with Lucene file system indexes. We'll explore three such utilities, each unique and having a different type of interface into an index: lucli (Lucene Command-Line Interface)--A CLI that allows ad-hoc querying and index inspection Luke (Lucene Index Toolbox)--A desktop application...