Compass must be configured to work with a specific application's domain model. A large number of configuration parameters are available (with default settings), which control how Compass works internally and with the underlying Search Engine. This section describes the configuration API and parameters.
An instance of CompassConfiguration represents a set of mappings (one or more OSEM or Resource mappings), Common Meta Data definitions, transaction and Search Engine parameters. CompassConfiguration is used to build an immutable Compass instance.
CompassConfiguration provides several APIs for adding OSEM and Resource mappings (suffixed .cpm.xml), as well as Common Meta Data definitions (suffixed .cmd.xml). The following table summarizes the most important APIs:
Table 2.1.
| API | Description |
|---|---|
| addFile(String) | Loads the mapping file (cpm or cmd) according to the specified file path string. |
| addFile(File) | Loads the mapping file (cpm or cmd) according to the specified file object reference. |
| addClass(Class) | Loads the mapping file (cpm) according to the specified class. test.Author.class will map to test/Author.cpm.xml within the class path. |
| addURL(URL) | Loads the mapping file (cpm or cmd) according to the specified URL. |
| addResource(String) | Loads the mapping file (cpm or cmd) according to the specified resource path from the class path. |
| addInputStream(InputStream) | Loads the mapping file (cpm or cmd) according to the specified input stream. |
| addDirectory(String) | Loads all the files named *.cpm.xml or *.cmd.xml from within the specified directory. |
| addJar(File) | Loads all the files named *.cpm.xml or *.cmd.xml from within the specified Jar file. |
| addMappingResolver(MappingResolver) | Uses a class that implements the MappingResolver to get an InputStream for xml mapping definitions. |
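The naming rule described for addClass(Class) in the table above can be sketched in plain Java. This is only an illustration of the rule (package dots become path separators and .cpm.xml is appended); the class and method names are hypothetical and this is not Compass's own lookup code.

```java
// Illustrative sketch of the addClass naming rule; not part of the Compass API.
final class MappingPathSketch {

    /** Hypothetical helper: converts a fully qualified class name to its cpm resource path. */
    static String cpmResourceFor(String className) {
        return className.replace('.', '/') + ".cpm.xml";
    }

    /** Same rule, starting from a Class object. */
    static String cpmResourceFor(Class<?> type) {
        return cpmResourceFor(type.getName());
    }

    public static void main(String[] args) {
        System.out.println(cpmResourceFor("test.Author")); // prints test/Author.cpm.xml
    }
}
```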
Besides the mapping file configuration API (CompassConfiguration), Compass::Core can be configured through the CompassSettings interface. CompassSettings is similar to the Java Properties class and is accessible via the CompassConfiguration.getSettings() or the CompassConfiguration.setSetting(String setting, String value) methods. Compass's many different settings are explained later in this section (the concepts will become clearer once the remaining documentation is read).
Compass settings can also be defined programmatically using the org.compassframework.core.config.CompassEnvironment and org.compassframework.core.lucene.LuceneEnvironment classes (which hold programmatic manifestations of all the different settings).
In terms of required settings, Compass only requires the compass.engine.connection parameter (which maps to CompassEnvironment.CONNECTION) to be defined.
Again, many words and so little code. The following code example shows a minimal CompassConfiguration with programmatic control:
CompassConfiguration conf = new CompassConfiguration()
    .setSetting(CompassEnvironment.CONNECTION, "my/index/dir")
    .addResource("DublinCore.cmd.xml")
    .addClass(Author.class);
All of Compass's operational configuration (apart from mapping definitions) can be defined in a single XML configuration file, with the default name compass.cfg.xml. You can define the environmental settings and mapping file locations within this file. The following table shows the different CompassConfiguration APIs for locating the main configuration file:
Table 2.2.
| API | Description |
|---|---|
| configure() | Loads a configuration file called compass.cfg.xml from the root of the class path. |
| configure(String) | Loads a configuration file from the specified path. |
And here is an example of the xml configuration file:
<!DOCTYPE compass-core-configuration PUBLIC
"-//Compass/Compass Core Configuration DTD 1.0//EN"
"http://static.compassframework.org/dtd/compass-core-configuration-1.0.dtd">
<compass-core-configuration>
<compass>
<setting name="compass.engine.connection">my/index/dir</setting>
<meta-data resource="vocabulary/DublinCore.cmd.xml" />
<mapping resource="test/Author.cpm.xml" />
</compass>
</compass-core-configuration>
After CompassConfiguration has been set up (either programmatically or using the XML configuration file), you can build a Compass instance. Compass is intended to be shared among different application threads. The following simple code example shows how to obtain a Compass reference:
Compass compass = conf.buildCompass();
Note: It is possible to have multiple Compass instances within the same application, each with a different configuration.
Compass's various settings are logically grouped in the following sections, with a short description of each setting. A more detailed definition of each can be found in the Compass::Core documentation. Note: the only mandatory setting is the index file location specified in compass.engine.connection.
Sets the Search Engine index connection string.
Table 2.3.
| Connection | Description |
|---|---|
| file:// prefix or no prefix | The path to the file system based index, using default file handling. This is a JVM level setting for all the file based prefixes. |
| mmap:// prefix | Uses the Java 1.4 NIO MMap class. Considered slower than the general file system one, but might have memory benefits (according to the Lucene documentation). This is a JVM level setting for all the file based prefixes. |
| ram:// prefix | Creates a memory based index that follows the Compass life-cycle: it is created when Compass is created and disposed of when Compass is closed. |
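To make the prefix rules in the table above concrete, here is a minimal sketch that classifies a connection string by its prefix. The enum and method names are hypothetical, and treating any unrecognized string as a plain file path is an assumption; Compass's own parsing may differ.

```java
// Illustrative sketch of the connection prefixes listed above; not Compass code.
final class ConnectionSketch {

    enum IndexStore { FILE, MMAP, RAM }

    /** Picks the store type from the prefix; no prefix means a file system index. */
    static IndexStore storeFor(String connection) {
        if (connection.startsWith("ram://")) return IndexStore.RAM;
        if (connection.startsWith("mmap://")) return IndexStore.MMAP;
        return IndexStore.FILE; // file:// prefix or no prefix
    }

    /** Strips one of the three known prefixes, if present. */
    static String pathOf(String connection) {
        for (String prefix : new String[] {"file://", "mmap://", "ram://"}) {
            if (connection.startsWith(prefix)) {
                return connection.substring(prefix.length());
            }
        }
        return connection;
    }
}
```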
Controls Compass registration through JNDI, using Compass JNDI lookups.
Table 2.4.
| Setting | Description |
|---|---|
| compass.name | The name that Compass will be registered under. Note that you can also specify it in the XML configuration file with a name attribute on the compass element. If undefined, Compass will not register itself using JNDI. |
| compass.jndi.class | JNDI initial context class, Context.INITIAL_CONTEXT_FACTORY. |
| compass.jndi.url | JNDI provider URL, Context.PROVIDER_URL. |
| compass.jndi.* | Prefix for arbitrary JNDI InitialContext properties. |
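As a rough illustration of the table above, the following sketch turns compass.jndi.* settings into an environment table suitable for a JNDI InitialContext. The class name is hypothetical. The mapping of compass.jndi.class and compass.jndi.url follows the table; passing the remaining compass.jndi.* entries through with the prefix stripped is an assumption, not confirmed Compass behavior.

```java
import java.util.Hashtable;
import java.util.Properties;
import javax.naming.Context;

// Illustrative sketch; not Compass code.
final class JndiSettingsSketch {

    static final String PREFIX = "compass.jndi.";

    /** Builds a JNDI environment from the compass.jndi.* entries in the given settings. */
    static Hashtable<String, String> jndiEnvironment(Properties settings) {
        Hashtable<String, String> env = new Hashtable<>();
        for (String key : settings.stringPropertyNames()) {
            if (!key.startsWith(PREFIX)) {
                continue; // not a JNDI setting
            }
            String value = settings.getProperty(key);
            if (key.equals(PREFIX + "class")) {
                env.put(Context.INITIAL_CONTEXT_FACTORY, value);
            } else if (key.equals(PREFIX + "url")) {
                env.put(Context.PROVIDER_URL, value);
            } else {
                // Assumption: arbitrary properties are passed on with the prefix removed.
                env.put(key.substring(PREFIX.length()), value);
            }
        }
        return env;
    }
}
```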
Controls Compass automatic properties, and property names.
Table 2.5.
| Setting | Description |
|---|---|
| compass.property.alias | The name of the "alias" property that Compass will use (a property that holds the alias property value of a resource). Defaults to alias (set it only if one of your mapped meta data is called alias). |
| compass.property.all | The name of the "all" property that Compass will use (a property that accumulates all the properties/meta-data). Defaults to all (set it only if one of your mapped meta data is called all). Note that it can be overridden in the mapping files. |
| compass.property.all.termVector | The default setting for the term vector of the all property. Can be one of no, yes, positions, offsets, or positions_offsets. Defaults to no. |
Defines the transaction levels and special transaction levels that Compass::Core supports. The two most common transaction levels that Compass::Core supports are read_committed and serializable (Compass::Core uses a sophisticated mechanism for the read_committed level, which operates as fast as the same transaction with a hint that it is read only). A special transaction level, batch_insert, is also supported, which specializes in handling batch indexing. You can set the transaction level using the compass.transaction.isolation setting. The following is a list of available Compass transaction levels:
Table 2.6.
| Transaction Level | Description |
|---|---|
| none | Not supported, upgraded to read_committed. |
| read_uncommitted | Not supported, upgraded to read_committed. |
| read_committed | The same as read committed in database systems. As fast for read-only transactions. |
| repeatable_read | Not supported, upgraded to serializable. |
| serializable | The same as serializable in database systems. Performance killer, basically results in transactions executed sequentially. |
| batch_insert | Specialized transaction level, mainly used for batch indexing. Note that it does not support queries, delete and save actions. Only the create operation is supported (if a resource with the same id and alias is already in the index, you will have two after the create operation). Extremely fast for batch indexing, especially when tweaked with the matching settings in the Search Engine section. |
Please read more about how Compass::Core implements its transaction management in the Search Engine section.
When using the Compass::Core transaction API, you must specify a factory class for the CompassTransaction instances. This is done by setting the property compass.transaction.factory. The CompassTransaction API hides the underlying transaction mechanism, allowing Compass::Core code to run in both managed and non-managed environments. The two standard strategies are:
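The upgrade rules in the transaction level table can be summarized in a few lines of Java. This is a sketch of the table only, with a hypothetical class and method name; rejecting unknown values with an exception is an assumption and may not match how Compass handles them.

```java
// Illustrative sketch of the transaction level table; not Compass code.
final class IsolationSketch {

    /** Returns the level that is effectively used for a requested compass.transaction.isolation value. */
    static String effectiveLevel(String requested) {
        switch (requested) {
            case "none":
            case "read_uncommitted":
                return "read_committed"; // not supported, upgraded
            case "repeatable_read":
                return "serializable";   // not supported, upgraded
            case "read_committed":
            case "serializable":
            case "batch_insert":
                return requested;        // supported as-is
            default:
                // Assumption: unknown values are treated as errors in this sketch.
                throw new IllegalArgumentException("Unknown transaction level: " + requested);
        }
    }
}
```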
Table 2.7.
| Transaction Strategy | Description |
|---|---|
| org.compassframework.core.transaction.LocalTransactionFactory | Manages a local transaction which does not interact with other transaction mechanisms. |
| org.compassframework.core.transaction.JTASyncTransactionFactory | Uses the JTA synchronization support to synchronize with the JTA transaction (not the same as two phase commit, meaning that if the transaction fails, the other resources that participate in the transaction will not roll back). If there is no existing JTA transaction, a new one will be started. |
The J2EE specification does not provide a standard way to reference a JTA TransactionManager in order to register with a transaction synchronization service, so Compass provides several lookups, which can be set with the compass.transaction.managerLookup setting (thanks Hibernate!).
The following table lists them all:
Table 2.8.
| Transaction Manager Lookup | Application Server |
|---|---|
| org.compassframework.core.transaction.manager.JBoss | JBoss |
| org.compassframework.core.transaction.manager.Weblogic | Weblogic |
| org.compassframework.core.transaction.manager.WebSphere | WebSphere |
| org.compassframework.core.transaction.manager.Orion | Orion |
| org.compassframework.core.transaction.manager.JOTM | JOTM |
| org.compassframework.core.transaction.manager.JOnAS | JOnAS |
| org.compassframework.core.transaction.manager.JRun4 | JRun4 |
| org.compassframework.core.transaction.manager.BEST | Borland ES |
The JTA transaction mechanism will use the JNDI configuration to look up the JTA UserTransaction. The transaction manager lookup provides the JNDI name, but if you wish to set it yourself, you can set the compass.transaction.userTransactionName setting.
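As a sketch of how the transaction settings above fit together, the following collects them in a plain java.util.Properties object. The setting names and class names are taken from the tables above; the class name JtaSettingsSketch, the choice of the JBoss lookup, and the index path are just examples. Each entry could then be handed to CompassConfiguration.setSetting(String setting, String value).

```java
import java.util.Properties;

// Illustrative sketch: JTA-related settings gathered in one place; not Compass code.
final class JtaSettingsSketch {

    static Properties jtaSettings() {
        Properties settings = new Properties();
        // The only mandatory setting (example path).
        settings.setProperty("compass.engine.connection", "my/index/dir");
        // Synchronize Compass transactions with an existing JTA transaction.
        settings.setProperty("compass.transaction.factory",
                "org.compassframework.core.transaction.JTASyncTransactionFactory");
        // Application-server specific lookup (JBoss chosen as an example).
        settings.setProperty("compass.transaction.managerLookup",
                "org.compassframework.core.transaction.manager.JBoss");
        return settings;
    }

    public static void main(String[] args) {
        Properties settings = jtaSettings();
        for (String name : settings.stringPropertyNames()) {
            System.out.println(name + " = " + settings.getProperty(name));
        }
    }
}
```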
Controls the different settings for the search engine.
Table 2.9.
| Setting | Description |
|---|---|
| compass.engine.connection | The index engine file system location. |
| compass.engine.defaultsearch | When searching using a query string, the default property/meta-data that Compass will use for non-prefixed strings. Defaults to the compass.property.all value. |
| compass.engine.all.analyzer | The name of the analyzer to use for the all property (see the next section about Search Engine Analyzers). |
| compass.transaction.lockDir | The directory where the search engine will maintain its locking file mechanism for intra- and inter-process transaction synchronization. Defaults to the java.io.tmpdir Java system property. This is a JVM level property. |
| compass.transaction.lockTimeout | The amount of time a transaction will wait in order to obtain its specific lock (in seconds). Defaults to 10 seconds. |
| compass.transaction.commitTimeout | The amount of time a transaction will wait in order to commit its data. Defaults to 10 seconds. |
| compass.transaction.lockPollInterval | The interval at which the transaction will check to see if it can obtain the lock (in milliseconds). Defaults to 100 milliseconds. This is a JVM level property. |
| compass.engine.optimizer.type | The fully qualified class name of the search engine optimizer that will be used. Defaults to org.compassframework.core.lucene.engine.optimizer.AdaptiveOptimizer. Please see the following section for a list of optimizers. |
| compass.engine.optimizer.schedule | Determines if the optimizer will be scheduled or not (true or false), defaults to true. If it is scheduled, it will run periodically, check if the index needs optimization, and optimize it if it does. |
| compass.engine.optimizer.schedule.period | The period at which the optimizer will check if the index needs optimization and, if it does, optimize it (in seconds, can be a float number). Defaults to 10 seconds. The setting applies only if the optimizer is scheduled. |
| compass.engine.optimizer.schedule.daemon | Sets the optimizer thread to be a daemon (true) or not (false). Defaults to false. The setting applies only if the optimizer is scheduled. |
| compass.engine.optimizer.schedule.fixedRate | Determines if the schedule will run at a fixed rate or not. If it is set to false, each execution is scheduled relative to the actual execution time of the previous execution. If it is set to true, each execution is scheduled relative to the execution time of the initial execution. |
| compass.engine.optimizer.adaptive.mergeFactor | For the adaptive optimizer, determines how often the optimizer will optimize the index. Smaller values give faster searches, but the index is optimized more often. Larger values result in slower searches and fewer optimizations. |
| compass.engine.optimizer.aggressive.mergeFactor | For the aggressive optimizer, determines how often the optimizer will optimize the index. Smaller values give faster searches, but the index is optimized more often. Larger values result in slower searches and fewer optimizations. |
| compass.engine.mergeFactor | Applies only for batch_insert transaction. With smaller values, less RAM is used, but indexing is slower. With larger values, more RAM is used, and the indexing speed is faster. Defaults to 10. |
| compass.engine.maxBufferedDocs | Applies only for batch_insert transaction. Determines the minimal number of resources required before the buffered in-memory resources will be flushed to disk. Large values give faster indexing. At the same time, compass.engine.mergeFactor limits the number of files open. |
| compass.engine.maxFieldLength | The number of terms that will be indexed for a single property in a resource. This limits the amount of memory required for indexing, so that collections with very large resources will not crash the indexing process by running out of memory. Note, that this effectively truncates large resources, excluding from the index terms that occur further in the resource. Defaults to 10,000 terms. |
| compass.engine.useCompoundFile | Turns on (true) or off (false) the use of compound files. If used, lowers the number of open files, but has a very small performance overhead. Defaults to true. |
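The difference between the two values of the schedule.fixedRate setting described in the table above can be illustrated with a small simulation. It is modeled on the fixed-delay and fixed-rate semantics of java.util.Timer, whose documentation uses very similar wording; whether Compass relies on that class internally is an assumption. The class and method names are hypothetical, and a single optimizer thread is assumed, so a run never starts before the previous one has finished.

```java
// Illustrative simulation of fixed-rate versus non-fixed-rate scheduling; not Compass code.
final class ScheduleSketch {

    /**
     * Computes the start time (in seconds) of each run of a periodic task.
     * durations[i] is how long run i takes; the first run starts at time 0.
     */
    static double[] startTimes(boolean fixedRate, double period, double[] durations) {
        double[] starts = new double[durations.length];
        for (int i = 1; i < durations.length; i++) {
            double previousEnd = starts[i - 1] + durations[i - 1];
            double scheduled = fixedRate
                    ? i * period               // relative to the initial execution
                    : starts[i - 1] + period;  // relative to the previous execution
            // A single thread cannot start a run before the previous one has finished.
            starts[i] = Math.max(scheduled, previousEnd);
        }
        return starts;
    }

    public static void main(String[] args) {
        double[] durations = {2, 15, 2, 2}; // the second run is unusually slow
        System.out.println(java.util.Arrays.toString(startTimes(true, 10, durations)));
        System.out.println(java.util.Arrays.toString(startTimes(false, 10, durations)));
    }
}
```

With a slow second run, the fixed-rate schedule returns to the original 10 second grid, while the non-fixed-rate schedule stays shifted by the delay.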
The following section lists the different optimizers that are available with Compass::Core. Note that all the optimizers can be scheduled or not.
Table 2.10.
| Optimizer | Description |
|---|---|
| org.compassframework.core.lucene.engine.optimizer.AdaptiveOptimizer | When the number of segments exceeds the specified mergeFactor, the segments will be merged starting from the last segment, until a segment with a higher resource count is encountered. |
| org.compassframework.core.lucene.engine.optimizer.AggressiveOptimizer | When the number of segments exceeds the specified mergeFactor, all the segments are merged into a single segment. |
| org.compassframework.core.lucene.engine.optimizer.NullOptimizer | Does no optimization, starts no threads. |
With Compass, multiple Analyzers can be defined (each under a different analyzer name) and then referenced in the configuration and mapping definitions. Compass defines two internal analyzer names: default and search. The default analyzer is the one used when no other analyzer can be found; it defaults to the standard analyzer with English stop words. The search analyzer is the one used to analyze search queries and, if not set, defaults to the default analyzer (note that the search analyzer can also be set using the CompassQuery API). Changing the settings for the default analyzer can be done using the compass.engine.analyzer.default.* settings (as explained in the next table). Setting the search analyzer (so it will differ from the default analyzer) can be done using the compass.engine.analyzer.search.* settings.
Table 2.11.
| Setting | Description |
|---|---|
| compass.engine.analyzer.[analyzer name].type | The type of the search engine analyzer. Please see the available analyzer types later in this section. |
| compass.engine.analyzer.[analyzer name].stopwords | A comma separated list of stop words to use with the chosen analyzer. If the string starts with +, the list of stop words will be added to the default set of stop words defined for the analyzer. Note that not all analyzer types support this feature. |
| compass.engine.analyzer.[analyzer name].factory | If the compass.engine.analyzer.[analyzer name].type setting is not enough to configure your analyzer, use this setting to define the fully qualified class name of your analyzer factory, which implements LuceneAnalyzerFactory. |
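The + convention of the stopwords setting described in the table above can be sketched as follows. The class and method names are hypothetical, and trimming whitespace around each word is an assumption; this is not Compass's own parsing code.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative sketch of the stopwords setting; not Compass code.
final class StopWordsSketch {

    /**
     * Resolves the effective stop words for an analyzer.
     * A leading '+' adds the listed words to the analyzer defaults; otherwise the list replaces them.
     */
    static Set<String> resolve(String setting, Set<String> analyzerDefaults) {
        boolean addToDefaults = setting.startsWith("+");
        String list = addToDefaults ? setting.substring(1) : setting;
        Set<String> result = new LinkedHashSet<>();
        if (addToDefaults) {
            result.addAll(analyzerDefaults);
        }
        for (String word : list.split(",")) {
            String trimmed = word.trim(); // assumption: surrounding whitespace is ignored
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```

For example, with analyzer defaults of "the" and "a", the value "+foo,bar" yields four stop words, while "foo,bar" yields only the two listed.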
Compass comes with core analyzers (which are part of the lucene-core jar). They are: standard, simple, whitespace, and stop. See the Analyzers Section.
Compass also allows simple configuration of the snowball analyzer type (which comes with the lucene-snowball jar). An additional setting that must be set when using the snowball analyzer is the compass.engine.analyzer.[analyzer name].name setting. The setting can have the following values: Danish, Dutch, English, Finnish, French, German, German2, Italian, Kp, Lovins, Norwegian, Porter, Portuguese, Russian, Spanish, and Swedish.
Another set of analyzer types comes with the lucene-analyzers jar. They are: brazilian, cjk, chinese, czech, german, greek, french, dutch, and russian.
With Compass, multiple Highlighters can be defined (each under a different highlighter name) and then referenced when using CompassHighlighter. Within Compass, an internal default highlighter is defined, and can be configured when using default as the highlighter name.
Table 2.12.
| Setting | Description |
|---|---|
| compass.engine.highlighter.[highlighter name].factory | Low level. Optional (defaults to DefaultLuceneHighlighterFactory). The fully qualified name of the class that creates highlighter settings. Must implement the LuceneHighlighterFactory interface. |
| compass.engine.highlighter.[highlighter name].textTokenizer | Optional (defaults to auto). Defines how a text will be tokenized to be highlighted. Can be analyzer (use an analyzer to tokenize the text), term_vector (use the term vector info stored in the index), or auto (will first try term_vector, and if no info is stored, will try to use analyzer). |
| compass.engine.highlighter.[highlighter name].rewriteQuery | Low level. Optional (defaults to true). If the query used to highlight the text will be rewritten or not. |
| compass.engine.highlighter.[highlighter name].computeIdf | Low level. Optional (set according to the formatter used). |
| compass.engine.highlighter.[highlighter name].maxNumFragments | Optional (defaults to 3). Sets the maximum number of fragments that will be returned. |
| compass.engine.highlighter.[highlighter name].separator | Optional (defaults to ...). Sets the separator string between fragments if using the combined fragments highlight option. |
| compass.engine.highlighter.[highlighter name].maxBytesToAnalyze | Optional (defaults to 50*1024). Sets the maximum bytes of text to analyze. |
| compass.engine.highlighter.[highlighter name].fragmenter.type | Optional (defaults to simple). The type of the fragmenter that will be used, can be simple or the fully qualified class name of the fragmenter (implements the org.apache.lucene.search.highlight.Fragmenter). |
| compass.engine.highlighter.[highlighter name].fragmenter.simple.size | Optional (defaults to 100). Sets the size (in bytes) of the fragments for the simple fragmenter. |
| compass.engine.highlighter.[highlighter name].encoder.type | Optional (defaults to default). The type of the encoder that will be used to encode fragmented text. Can be default (does nothing), html (escapes html tags), or the fully qualified class name of the encoder (implements org.apache.lucene.search.highlight.Encoder). |
| compass.engine.highlighter.[highlighter name].formatter.type | Optional (defaults to simple). The type of the formatter that will be used to highlight the text. Can be simple (simply wraps the highlighted text with pre and post strings), htmlSpanGradient (wraps the highlighted text with an html span tag with optional background and foreground gradient colors), or the fully qualified class name of the formatter (implements org.apache.lucene.search.highlight.Formatter). |
| compass.engine.highlighter.[highlighter name].formatter.simple.pre | Optional (defaults to <b>). In case the highlighter uses the simple formatter, controls the text that is prepended to the highlighted text. |
| compass.engine.highlighter.[highlighter name].formatter.simple.post | Optional (defaults to </b>). In case the highlighter uses the simple formatter, controls the text that is appended after the highlighted text. |
| compass.engine.highlighter.[highlighter name].formatter.htmlSpanGradient.maxScore | In case the highlighter uses the htmlSpanGradient formatter, the score above which text is displayed in the max color. |
| compass.engine.highlighter.[highlighter name].formatter.htmlSpanGradient.minForegroundColor | Optional (if not set, foreground will not be set on the span tag). In case the highlighter uses the htmlSpanGradient formatter, the hex color used for representing IDF scores of zero, e.g. #FFFFFF (white). |
| compass.engine.highlighter.[highlighter name].formatter.htmlSpanGradient.maxForegroundColor | Optional (if not set, foreground will not be set on the span tag). In case the highlighter uses the htmlSpanGradient formatter, the largest hex color used for representing IDF scores, e.g. #000000 (black). |
| compass.engine.highlighter.[highlighter name].formatter.htmlSpanGradient.minBackgroundColor | Optional (if not set, background will not be set on the span tag). In case the highlighter uses the htmlSpanGradient formatter, the hex color used for representing IDF scores of zero, e.g. #FFFFFF (white). |
| compass.engine.highlighter.[highlighter name].formatter.htmlSpanGradient.maxBackgroundColor | Optional (if not set, background will not be set on the span tag). In case the highlighter uses the htmlSpanGradient formatter, the largest hex color used for representing IDF scores, e.g. #000000 (black). |
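To show how a few of the highlighter settings above interact, here is a deliberately simplified sketch of the simple formatter (pre and post strings) combined with maxNumFragments and separator. All names are hypothetical. A real highlighter selects the best scoring fragments; taking the first ones, as done here, is a simplification and not how Compass or Lucene chooses them.

```java
import java.util.List;

// Illustrative sketch of simple formatting and fragment combination; not Compass code.
final class HighlightSketch {

    /** Wraps a matched term with the formatter.simple.pre and formatter.simple.post strings. */
    static String wrap(String term, String pre, String post) {
        return pre + term + post;
    }

    /** Joins at most maxNumFragments fragments using the configured separator. */
    static String combine(List<String> fragments, int maxNumFragments, String separator) {
        int count = Math.min(maxNumFragments, fragments.size());
        // Simplification: keeps the first fragments rather than the best scoring ones.
        return String.join(separator, fragments.subList(0, count));
    }
}
```

With the documented defaults (<b>, </b>, 3 fragments, and ... as the separator), wrapping the term compass gives <b>compass</b>, and four fragments are reduced to the first three joined by "...".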
Several other settings that control Compass.