Response to an Amazon review
After all the good reviews and very positive feedback about Lucene in Action that we have received over the last 10 months, we finally came across a not-so-positive review on Amazon. The review's criticism can be broken down into four main points:
- Lack of import statements
- Authors didn't test the code
- OOP is not suitable for Lucene code examples and there are no direct Lucene calls
- Need for a command-line tool for HTML indexing
As Amazon's site doesn't let us respond to a review directly, we thought we would address these issues here and, hopefully, help the reviewer get more out of our book. Let's address each of the four concerns:
- Lack of import statements
Code examples in the book purposely don't contain import statements. Often the list of import statements would be rather long. If we included all the imports, the code examples would be much longer and would often span multiple pages, making them harder for readers to follow. The list of import statements would also be largely repetitive, as most examples import the same or a very similar set of Lucene classes. Including imports would result in a thicker, heavier, and thus more expensive book.
So how should one deal with the lack of import statements?
Firstly, all code examples from Lucene in Action are free and available for download, even for those who don't own a copy of the book. The code is packaged with an Ant build script that compiles all the code, builds all the indexes the examples need, and runs the code examples from the book.
Secondly, one can import all the code into any modern Java IDE and easily see which classes come from which packages.
This is also described in the book itself, in the "About the Book" section on page xxvii, in the last sentence in the paragraph titled "Code examples".
- Authors didn't test the code
One of the novel and interesting aspects of Lucene in Action is that most of its code examples are written as unit tests. These examples are, therefore, automatically tested. We used the excellent JUnit unit test framework to build the examples, and we explained the reasoning behind this in the "About the Book" section on page xxvii, in the paragraph titled "Why JUnit?".
- OOP is not suitable for Lucene code examples and there are no direct Lucene calls
All the calls to Lucene are direct calls; they are simply presented inside unit tests. It sounds like the reviewer is confusing OOP with unit testing.
- Need for a command-line tool for HTML indexing
We present just such a tool in Chapter 7, in section 7.4.2. The chapter also includes a whole mini-framework for indexing other file types, such as XML, Word, and PDF.
Related sections of the book:
- 7.4.3, Using NekoHTML: page 245, in section 7.4 (Indexing an HTML document) of chapter 7 (Parsing common document formats)
- Index: page 416
- 7.2.2, Parsing and indexing using Digester: page 230, in section 7.2 (Indexing XML) of chapter 7 (Parsing common document formats)
Memory leak in custom sort code
Hello,
First, *huge* thanks for your book Lucene in Action. Between it and the Lucene developers and users mailing lists, I have been able to give our site a much better search infrastructure.
In the last phase of rolling out our new search system, we discovered a memory leak in listing 6.2, DistanceComparatorSource. I used that code as the basis for a modified integer sort, which in and of itself was pretty straightforward. The problem was that there were no equals() and hashCode() methods, which means that DistanceScoreDocLookupComparator inherits equals() and hashCode() from Object.
Therein lies the memory leak. Every time a new DistanceComparatorSource was retrieved, the lookup failed to find the cached ScoreDocComparator, so a new one was added to the cache of ScoreDocComparators kept by o.a.l.s.FieldCacheImpl. The fix was to add hashCode() and equals() methods to our ScoreDocComparator implementation.
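The fix amounts to giving the custom sort class value semantics instead of identity semantics. Below is a minimal sketch, assuming a sort class that, like listing 6.2, is configured by an origin point (x, y). The class name DistanceSource is hypothetical, and the Lucene interface a real implementation would implement is deliberately left out so the snippet compiles without Lucene on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

public class DistanceSourceDemo {

    // Hypothetical stand-in for a custom sort class such as listing 6.2's
    // DistanceComparatorSource; the Lucene interface is omitted here.
    static final class DistanceSource {
        private final int x;
        private final int y;

        DistanceSource(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // Two instances are equal when they would sort identically,
        // i.e. when they share the same origin point.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof DistanceSource)) return false;
            DistanceSource other = (DistanceSource) o;
            return x == other.x && y == other.y;
        }

        // Uses the same fields as equals(), so equal objects hash alike.
        @Override
        public int hashCode() {
            return 31 * x + y;
        }
    }

    public static void main(String[] args) {
        Map<DistanceSource, String> cache = new HashMap<>();
        cache.put(new DistanceSource(10, 20), "comparator");
        // A fresh instance with the same origin now finds the cached entry.
        System.out.println(cache.containsKey(new DistanceSource(10, 20))); // prints "true"
    }
}
```

Whichever fields influence the sort order belong in equals(), and hashCode() should be computed from exactly those fields.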
The big clue came from using the profiler from www.yourkit.com to see what was allocating so much memory, and from reading the last paragraph on page 199 a couple of times:
"The sorting infrastructure within Lucene caches (based on a key combining the hashcode of the indexReader, the field name, and the custom sort object) ..."
That sentence gave the clue to what was happening, but it is also a little misleading. Looking at o.a.l.s.FieldCacheImpl, the index reader is used as the key for the internal WeakHashMap that holds the different Entry(fieldName, ScoreDocComparator) pairs used in an application.
If implementations of ScoreDocComparator do not implement hashCode() and equals() correctly, a new entry is added to this internal cache of field/comparator pairs every time they are used.
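The effect can be reproduced without Lucene. The following is a simplified, hypothetical model of the cache structure described above (a WeakHashMap keyed by reader, holding a map from Entry(fieldName, custom sort object) to a cached value); it is not Lucene's actual FieldCacheImpl code, and the names ComparatorCacheDemo, LeakySource, and FixedSource are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

public class ComparatorCacheDemo {

    // Cache key pairing a field name with the custom sort object,
    // loosely modeled on the Entry(fieldName, custom) pair described above.
    static final class Entry {
        final String field;
        final Object custom;

        Entry(String field, Object custom) {
            this.field = field;
            this.custom = custom;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Entry)) return false;
            Entry other = (Entry) o;
            // Delegates to the custom object's equals(): if it is inherited
            // from Object, only the very same instance will match.
            return field.equals(other.field) && custom.equals(other.custom);
        }

        @Override
        public int hashCode() {
            return field.hashCode() ^ custom.hashCode();
        }
    }

    // Outer map is keyed by reader. Weak keys let entries for an unreachable
    // reader be collected, but entries for a live reader are kept.
    static final Map<Object, Map<Entry, Object>> cache = new WeakHashMap<>();

    static Object lookupOrCreate(Object reader, String field, Object custom) {
        Map<Entry, Object> perReader = cache.computeIfAbsent(reader, r -> new HashMap<>());
        // On a miss, a new value (standing in for a comparator) is created and stored.
        return perReader.computeIfAbsent(new Entry(field, custom), e -> new Object());
    }

    static int cachedEntries(Object reader) {
        Map<Entry, Object> perReader = cache.get(reader);
        return perReader == null ? 0 : perReader.size();
    }

    // No equals()/hashCode(): identity semantics inherited from Object.
    static final class LeakySource {
        final int x, y;
        LeakySource(int x, int y) { this.x = x; this.y = y; }
    }

    // Value semantics: the same configuration yields the same cache key.
    static final class FixedSource {
        final int x, y;
        FixedSource(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            return o instanceof FixedSource && ((FixedSource) o).x == x && ((FixedSource) o).y == y;
        }
        @Override public int hashCode() { return 31 * x + y; }
    }

    public static void main(String[] args) {
        // Plain objects stand in for long-lived IndexReader instances.
        Object leakyReader = new Object();
        Object fixedReader = new Object();
        for (int i = 0; i < 1000; i++) {
            // Each "search" builds a fresh sort object with the same configuration.
            lookupOrCreate(leakyReader, "location", new LeakySource(10, 20));
            lookupOrCreate(fixedReader, "location", new FixedSource(10, 20));
        }
        System.out.println("Leaky entries: " + cachedEntries(leakyReader)); // prints "Leaky entries: 1000"
        System.out.println("Fixed entries: " + cachedEntries(fixedReader)); // prints "Fixed entries: 1"
    }
}
```

Because a long-lived reader stays strongly reachable, its weak key is never cleared, so with identity-based equality the per-reader map in this model grows by one entry for every search.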
This was completely my fault, as I usually add these methods to every class I write, just not in this case. I hope you can add this to the errata for the current edition (as well as fix the code) and expand on it in the second edition so others won't be bitten by this bug.
Thanks again for the book; you guys *rock*.
PS. We are using lucene-1.4.3.jar, JSDK 1.4.2, and JRE 1.5 on Solaris and Linux.