Synopses & Reviews
With the introduction of Ferret, Ruby users now have one of the fastest and most flexible search libraries available. And it's surprisingly easy to use.
This book will show you how to quickly get up and running with Ferret. You'll learn how to index different document types such as PDF, Microsoft Word, and HTML, as well as how to deal with foreign languages and different character encodings. The book describes the Ferret Query Language in detail, along with the object-oriented approach to building queries.
You will be introduced to sorting, filtering, and highlighting your search results, with an explanation of exactly how you need to set up your index to perform these tasks. You will also learn how to optimize a Ferret index for lightning-fast indexing and split-second query results.
About the Author
David Balmain is a freelance software developer and the primary developer of the open source search library Ferret. He gained an interest in high-performance text processing at university, where he earned a BSc specializing in natural language processing. Recently he has taken an interest in web application development and become enamored with the scripting language Ruby.
Currently David resides with his girlfriend in a 12-square-meter apartment in Tokyo, where he practices Judo five hours a day and is trying to learn Japanese.
Table of Contents
Preface
  Conventions Used in This Book
  Using Code Examples
  Safari® Enabled
  How to Contact Us
Chapter 1: Getting Started
  1.1 Installing Ferret
  1.2 A Quick Example: Indexing the Filesystem
  1.3 Summary
Chapter 2: Indexing
  2.1 Index Storage
  2.2 Documents, Fields, and Boosts
  2.3 Setting Up the Index
  2.4 Basic Indexing Operations
  2.5 Indexing Non-String Datatypes
  2.6 Summary
Chapter 3: Advanced Indexing
  3.1 How the Indexing Process Works
  3.2 Tuning Indexing Performance
  3.3 Optimizing the Index
  3.4 Index Locking and Concurrency Issues
  3.5 Summary
Chapter 4: Search
  4.1 Overview of Searching Classes
  4.2 Building Queries
  4.3 QueryParser
  4.4 Filtering Search Results
  4.5 Sorting Search Results
  4.6 Highlighting Query Results
  4.7 Summary
Chapter 5: Analysis
  5.1 Token
  5.2 TokenStream
  5.3 Analyzer
  5.4 Custom Analysis
Chapter 6: Ferret in Practice
  6.1 Indexing Multiple Document Types
  6.2 Other Indexing Improvements
  6.3 Search Improvements
  6.4 Putting It All Together
  6.5 Summary
Colophon