Chapter 2. Configuration

2.1. Introduction

Compass must be configured to work with a specific application's domain model. A large number of configuration parameters are available (with default settings), which control how Compass works internally and with the underlying Search Engine. This section describes the configuration API and parameters.

2.2. Programmatic Configuration

An instance of CompassConfiguration represents a set of mappings (one or more OSEM or Resource mappings), Common Meta Data definitions, transaction and Search Engine parameters. CompassConfiguration is used to build an immutable Compass instance.

CompassConfiguration provides several APIs for adding OSEM and Resource mappings (suffixed .cpm.xml), as well as Common Meta Data definitions (suffixed .cmd.xml). The following table summarizes the most important APIs:

Table 2.1. 

API | Description
addFile(String) | Loads the mapping file (cpm or cmd) according to the specified file path string.
addFile(File) | Loads the mapping file (cpm or cmd) according to the specified file object reference.
addClass(Class) | Loads the mapping file (cpm) according to the specified class. test.Author.class will map to test/Author.cpm.xml within the class path.
addURL(URL) | Loads the mapping file (cpm or cmd) according to the specified URL.
addResource(String) | Loads the mapping file (cpm or cmd) according to the specified resource path from the class path.
addInputStream(InputStream) | Loads the mapping file (cpm or cmd) according to the specified input stream.
addDirectory(String) | Loads all the files named *.cpm.xml or *.cmd.xml from within the specified directory.
addJar(File) | Loads all the files named *.cpm.xml or *.cmd.xml from within the specified Jar file.
addMappingResolver(MappingResolver) | Uses a class that implements the MappingResolver to get an InputStream for xml mapping definitions.

In addition to the mapping file configuration API (CompassConfiguration), Compass::Core can be configured through the CompassSettings interface. CompassSettings is similar to the Java Properties class and is accessible via the CompassConfiguration.getSettings() or the CompassConfiguration.setSetting(String setting, String value) methods. Compass's many different settings are explained later in this section (the concepts will become clearer once the remaining documentation is read).

Compass settings can also be defined programmatically using the org.compassframework.core.config.CompassEnvironment and org.compassframework.core.lucene.LuceneEnvironment classes (which hold the programmatic form of all the different settings).

In terms of required settings, Compass only requires the compass.engine.connection parameter (which maps to CompassEnvironment.CONNECTION) to be defined.

Again, many words and so little code. The following code example shows the minimal CompassConfiguration with programmatic control:

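// Minimal configuration: only the index location is mandatory.
CompassConfiguration conf = new CompassConfiguration()
     .setSetting(CompassEnvironment.CONNECTION, "my/index/dir")
     // addResource takes the resource path as a String.
     .addResource("DublinCore.cmd.xml")
     .addClass(Author.class);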

2.3. XML Configuration

All of Compass's operational configuration (apart from mapping definitions) can be defined in a single XML configuration file, with the default name compass.cfg.xml. You can define the environmental settings and mapping file locations within this file. The following table shows the different CompassConfiguration APIs for locating the main configuration file:

Table 2.2. 

API | Description
configure() | Loads a configuration file called compass.cfg.xml from the root of the class path.
configure(String) | Loads a configuration file from the specified path.

And here is an example of the XML configuration file:

<!DOCTYPE compass-core-configuration PUBLIC
"-//Compass/Compass Core Configuration DTD 1.0//EN"
"http://static.compassframework.org/dtd/compass-core-configuration-1.0.dtd">

<compass-core-configuration>
  <compass>
    <setting name="compass.engine.connection">my/index/dir</setting>

    <meta-data resource="vocabulary/DublinCore.cmd.xml" />
    <mapping resource="test/Author.cpm.xml" />

  </compass>
</compass-core-configuration>

2.4. Obtaining a Compass reference

After CompassConfiguration has been set (either programmatically or using the XML configuration file), you can build a Compass instance. Compass is intended to be shared among different application threads. The following simple code example shows how to obtain a Compass reference.

Compass compass = conf.buildCompass();

Note: It is possible to have multiple Compass instances within the same application, each with a different configuration.

2.5. Compass Settings

Compass's various settings are logically grouped in the following sections, with a short description of each setting. A more detailed definition of each can be found in the Compass::Core documentation. Note: the only mandatory setting is the index file location specified in compass.engine.connection.

2.5.1. compass.engine.connection

Sets the Search Engine index connection string.

Table 2.3. 

Connection | Description
file:// prefix or no prefix | The path to a file system based index, using default file handling. This is a JVM level setting for all the file based prefixes.
mmap:// prefix | Uses the Java 1.4 NIO mmap support. Considered slower than the general file system one, but might have memory benefits (according to the Lucene documentation). This is a JVM level setting for all the file based prefixes.
ram:// prefix | Creates a memory based index that follows the Compass life-cycle: it is created when Compass is created, and disposed of when Compass is closed.
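
To make the prefix rules in Table 2.3 concrete, here is a minimal sketch of how a connection string could be split into an index kind and a path. The ConnectionStrings class and its members are hypothetical names invented for this illustration; they are not part of the Compass API, and Compass's own parsing may differ.

```java
/** Hypothetical helper (not part of Compass): classifies a connection string by prefix. */
final class ConnectionStrings {

    enum Kind { FILE, MMAP, RAM }

    /** The index kind plus whatever follows the prefix. */
    static final class Parsed {
        final Kind kind;
        final String path;

        Parsed(Kind kind, String path) {
            this.kind = kind;
            this.path = path;
        }
    }

    static Parsed parse(String connection) {
        if (connection.startsWith("ram://")) {
            return new Parsed(Kind.RAM, connection.substring("ram://".length()));
        }
        if (connection.startsWith("mmap://")) {
            return new Parsed(Kind.MMAP, connection.substring("mmap://".length()));
        }
        if (connection.startsWith("file://")) {
            return new Parsed(Kind.FILE, connection.substring("file://".length()));
        }
        // No prefix: treated as a plain file system path, as Table 2.3 describes.
        return new Parsed(Kind.FILE, connection);
    }

    public static void main(String[] args) {
        Parsed parsed = parse("mmap://my/index/dir");
        System.out.println(parsed.kind + " -> " + parsed.path);
    }
}
```

Under these assumptions, "my/index/dir" and "file://my/index/dir" resolve to the same file based index, while the ram:// form needs no directory on disk.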

2.5.2. JNDI

Controls Compass registration through JNDI, using Compass JNDI lookups.

Table 2.4. 

Setting | Description
compass.name | The name that Compass will be registered under. Note that you can specify it in the XML configuration file with a name attribute on the compass element. If undefined, Compass will not register itself using JNDI.
compass.jndi.class | JNDI initial context class, Context.INITIAL_CONTEXT_FACTORY.
compass.jndi.url | JNDI provider URL, Context.PROVIDER_URL.
compass.jndi.* | Prefix for arbitrary JNDI InitialContext properties.
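
The mapping in Table 2.4 can be pictured with a small sketch that turns compass.jndi.* settings into a JNDI environment table. JndiSettingsSketch is a hypothetical class written for this illustration. In particular, passing other compass.jndi.* settings through with the prefix removed is an assumption, not documented Compass behaviour.

```java
import java.util.Hashtable;
import java.util.Properties;
import javax.naming.Context;

/** Hypothetical sketch (not Compass code): builds a JNDI environment from settings. */
final class JndiSettingsSketch {

    private static final String PREFIX = "compass.jndi.";

    static Hashtable<String, String> toJndiEnvironment(Properties settings) {
        Hashtable<String, String> env = new Hashtable<>();
        for (String name : settings.stringPropertyNames()) {
            if (!name.startsWith(PREFIX)) {
                continue; // not a JNDI setting (for example compass.name)
            }
            String value = settings.getProperty(name);
            if (name.equals("compass.jndi.class")) {
                env.put(Context.INITIAL_CONTEXT_FACTORY, value);
            } else if (name.equals("compass.jndi.url")) {
                env.put(Context.PROVIDER_URL, value);
            } else {
                // Assumption: arbitrary properties are passed on with the prefix removed.
                env.put(name.substring(PREFIX.length()), value);
            }
        }
        return env;
    }
}
```

The resulting table has the shape that javax.naming.InitialContext accepts in its constructor.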

2.5.3. Property

Controls Compass's automatic properties and property names.

Table 2.5. 

Setting | Description
compass.property.alias | The name of the "alias" property that Compass will use (a property that holds the alias property value of a resource). Defaults to alias (set it only if one of your mapped meta data is called alias).
compass.property.all | The name of the "all" property that Compass will use (a property that accumulates all the properties/meta-data). Defaults to all (set it only if one of your mapped meta data is called all). Note that it can be overridden in the mapping files.
compass.property.all.termVector | The default setting for the term vector of the all property. Can be one of no, yes, positions, offsets, or positions_offsets. Defaults to no.

2.5.4. Transaction Level

Defines the transaction levels, including the special ones, that Compass::Core supports. The two most common transaction levels that Compass::Core supports are read_committed and serializable (Compass::Core uses a sophisticated mechanism for the read_committed level, which operates as fast as the same transaction with a hint that it is read only). A special transaction level, batch_insert, is also supported; it specializes in handling batch indexing. You can set the transaction level using the compass.transaction.isolation setting. The following is a list of available Compass transaction levels:

Table 2.6. 

Transaction Level | Description
none | Not supported, upgraded to read_committed.
read_uncommitted | Not supported, upgraded to read_committed.
read_committed | The same as read committed in database systems. As fast for read-only transactions.
repeatable_read | Not supported, upgraded to serializable.
serializable | The same as serializable in database systems. Performance killer, basically results in transactions executed sequentially.
batch_insert | Specialized transaction level, mainly used for batch indexing. Note that it does not support queries, delete and save actions. Only the create operation is supported (if a resource with the same id and alias is already in the index, you will have two after the create operation). Extremely fast for batch indexing, especially when tweaked with the matching settings in the Search Engine section.

Please read more about how Compass::Core implements its transaction management in the Search Engine section.
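
The upgrade rules in Table 2.6 amount to a simple mapping from the requested level to the level that actually takes effect. The sketch below restates that table in code. IsolationLevels and effectiveLevel are hypothetical names, and throwing an exception for an unknown level is an assumption made for the example.

```java
/** Hypothetical sketch (not Compass code): the level upgrades listed in Table 2.6. */
final class IsolationLevels {

    /** Returns the level that takes effect for a requested compass.transaction.isolation value. */
    static String effectiveLevel(String requested) {
        switch (requested) {
            case "none":
            case "read_uncommitted":
            case "read_committed":
                return "read_committed";
            case "repeatable_read":
            case "serializable":
                return "serializable";
            case "batch_insert":
                return "batch_insert";
            default:
                // Assumption: unknown values are rejected rather than silently defaulted.
                throw new IllegalArgumentException("Unknown transaction level: " + requested);
        }
    }

    public static void main(String[] args) {
        System.out.println(effectiveLevel("repeatable_read"));
    }
}
```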

2.5.5. Transaction Strategy

When using the Compass::Core transaction API, you must specify a factory class for the CompassTransaction instances. This is done by setting the property compass.transaction.factory. The CompassTransaction API hides the underlying transaction mechanism, allowing Compass::Core code to run in both managed and non-managed environments. The two standard strategies are:

Table 2.7. 

Transaction Strategy | Description
org.compassframework.core.transaction.LocalTransactionFactory | Manages a local transaction which does not interact with other transaction mechanisms.
org.compassframework.core.transaction.JTASyncTransactionFactory | Uses the JTA synchronization support to synchronize with the JTA transaction (not the same as two phase commit, meaning that if the transaction fails, the other resources that participate in the transaction will not roll back). If there is no existing JTA transaction, a new one will be started.

The J2EE specification does not provide a standard way to reference a JTA TransactionManager in order to register with a transaction synchronization service. Compass therefore provides several lookups, which can be set with the compass.transaction.managerLookup setting (thanks, Hibernate!).

The following table lists them all:

Table 2.8. 

Transaction Manager Lookup | Application Server
org.compassframework.core.transaction.manager.JBoss | JBoss
org.compassframework.core.transaction.manager.Weblogic | Weblogic
org.compassframework.core.transaction.manager.WebSphere | WebSphere
org.compassframework.core.transaction.manager.Orion | Orion
org.compassframework.core.transaction.manager.JOTM | JOTM
org.compassframework.core.transaction.manager.JOnAS | JOnAS
org.compassframework.core.transaction.manager.JRun4 | JRun4
org.compassframework.core.transaction.manager.BEST | Borland ES

The JTA transaction mechanism will use the JNDI configuration to look up the JTA UserTransaction. The transaction manager lookup provides the JNDI name, but if you wish to set it yourself, you can use the compass.transaction.userTransactionName setting.
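
As a rough illustration of how the transaction settings fit together, the sketch below collects them in a java.util.Properties object. JtaSettingsExample is a hypothetical class; the setting names and class names are the ones listed in this section, while the choice of JBoss and read_committed is arbitrary. Each pair could then be handed to CompassConfiguration.setSetting(String setting, String value).

```java
import java.util.Properties;

/** Hypothetical example: gathers the settings for a JTA-synchronized setup. */
final class JtaSettingsExample {

    static Properties jtaSettings() {
        Properties settings = new Properties();
        // The only mandatory setting.
        settings.setProperty("compass.engine.connection", "my/index/dir");
        settings.setProperty("compass.transaction.isolation", "read_committed");
        // Synchronize with an existing JTA transaction instead of a local one.
        settings.setProperty("compass.transaction.factory",
                "org.compassframework.core.transaction.JTASyncTransactionFactory");
        // Arbitrary choice for the example: the JBoss transaction manager lookup.
        settings.setProperty("compass.transaction.managerLookup",
                "org.compassframework.core.transaction.manager.JBoss");
        return settings;
    }

    public static void main(String[] args) {
        Properties settings = jtaSettings();
        for (String name : settings.stringPropertyNames()) {
            System.out.println(name + " = " + settings.getProperty(name));
        }
    }
}
```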

2.5.6. Search Engine

Controls the different settings for the search engine.

Table 2.9. 

Setting | Description
compass.engine.connection | The index engine file system location.
compass.engine.defaultsearch | When searching using a query string, the default property/meta-data that Compass will use for non-prefixed strings. Defaults to the value of compass.property.all.
compass.engine.all.analyzer | The name of the analyzer to use for the all property (see the next section about Search Engine Analyzers).
compass.transaction.lockDir | The directory where the search engine will maintain its locking file mechanism for transaction synchronization within and across processes. Defaults to the java.io.tmpdir Java system property. This is a JVM level property.
compass.transaction.lockTimeout | The amount of time a transaction will wait in order to obtain its specific lock (in seconds). Defaults to 10 seconds.
compass.transaction.commitTimeout | The amount of time a transaction will wait in order to commit its data. Defaults to 10 seconds.
compass.transaction.lockPollInterval | The interval at which the transaction will check whether it can obtain the lock (in milliseconds). Defaults to 100 milliseconds. This is a JVM level property.
compass.engine.optimizer.type | The fully qualified class name of the search engine optimizer that will be used. Defaults to org.compassframework.core.lucene.engine.optimizer.AdaptiveOptimizer. Please see the following section for a list of optimizers.
compass.engine.optimizer.schedule | Determines whether the optimizer will be scheduled (true or false). Defaults to true. If it is scheduled, it will run periodically, check whether the index needs optimization, and optimize it if it does.
compass.engine.optimizer.schedule.period | The period at which the optimizer will check whether the index needs optimization, and optimize it if it does (in seconds, can be a float number). Defaults to 10 seconds. The setting applies if the optimizer is scheduled.
compass.engine.optimizer.schedule.daemon | Sets the optimizer thread to be a daemon (true) or not (false). Defaults to false. The setting applies if the optimizer is scheduled.
compass.engine.optimizer.schedule.fixedRate | Determines whether the schedule will run at a fixed rate. If it is set to false, each execution is scheduled relative to the actual execution time of the previous execution. If it is set to true, each execution is scheduled relative to the execution time of the initial execution.
compass.engine.optimizer.adaptive.mergeFactor | For the adaptive optimizer, determines how often the optimizer will optimize the index. Smaller values give faster searches, but the index is optimized more often. Larger values result in slower searches and fewer optimizations.
compass.engine.optimizer.aggressive.mergeFactor | For the aggressive optimizer, determines how often the optimizer will optimize the index. Smaller values give faster searches, but the index is optimized more often. Larger values result in slower searches and fewer optimizations.
compass.engine.mergeFactor | Applies only to batch_insert transactions. With smaller values, less RAM is used, but indexing is slower. With larger values, more RAM is used, and indexing is faster. Defaults to 10.
compass.engine.maxBufferedDocs | Applies only to batch_insert transactions. Determines the minimal number of resources required before the buffered in-memory resources are flushed to disk. Large values give faster indexing. At the same time, compass.engine.mergeFactor limits the number of files open.
compass.engine.maxFieldLength | The number of terms that will be indexed for a single property in a resource. This limits the amount of memory required for indexing, so that collections with very large resources will not crash the indexing process by running out of memory. Note that this effectively truncates large resources, excluding from the index terms that occur further in the resource. Defaults to 10,000 terms.
compass.engine.useCompoundFile | Turns the use of compound files on (true) or off (false). When used, it lowers the number of open files, but has a very small performance overhead. Defaults to true.

The following table lists the different optimizers that are available with Compass::Core. Note that all the optimizers can be scheduled or not.

Table 2.10. 

Optimizer | Description
org.compassframework.core.lucene.engine.optimizer.AdaptiveOptimizer | When the number of segments exceeds the specified mergeFactor, the segments are merged starting from the last segment, until a segment with a higher resource count is encountered.
org.compassframework.core.lucene.engine.optimizer.AggressiveOptimizer | When the number of segments exceeds the specified mergeFactor, all the segments are merged into a single segment.
org.compassframework.core.lucene.engine.optimizer.NullOptimizer | Does no optimization, starts no threads.
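
The difference between the adaptive and aggressive strategies in Table 2.10 may be easier to see on a list of segment sizes. The sketch below is a simplified model, not the actual optimizer code: OptimizerSketch is a hypothetical class, segments are reduced to their resource counts, and "a higher resource count" is interpreted here as higher than the count merged so far, which is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model (not Compass code) of the merge decisions in Table 2.10. */
final class OptimizerSketch {

    /** Aggressive: once there are more segments than mergeFactor, merge all into one. */
    static List<Integer> aggressive(List<Integer> segments, int mergeFactor) {
        if (segments.size() <= mergeFactor) {
            return new ArrayList<>(segments);
        }
        int total = 0;
        for (int count : segments) {
            total += count;
        }
        List<Integer> merged = new ArrayList<>();
        merged.add(total);
        return merged;
    }

    /** Adaptive: merge backwards from the last segment until a larger segment is met. */
    static List<Integer> adaptive(List<Integer> segments, int mergeFactor) {
        if (segments.size() <= mergeFactor) {
            return new ArrayList<>(segments);
        }
        int index = segments.size() - 1;
        int mergedCount = segments.get(index);
        // Assumption: an earlier segment is absorbed while it is no larger than
        // the resources merged so far.
        while (index > 0 && segments.get(index - 1) <= mergedCount) {
            index--;
            mergedCount += segments.get(index);
        }
        List<Integer> result = new ArrayList<>(segments.subList(0, index));
        result.add(mergedCount);
        return result;
    }
}
```

In this model, segment counts of 1000, 8, 5, 3 and 4 with a mergeFactor of 3 leave the large first segment untouched under the adaptive strategy and fold the four small ones into one, whereas the aggressive strategy rewrites everything into a single segment.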

2.5.7. Search Engine Analyzers

With Compass, multiple Analyzers can be defined (each under a different analyzer name) and then referenced in the configuration and mapping definitions. Compass defines two internal analyzer names: default and search. The default analyzer is the one used when no other analyzer can be found; it defaults to the standard analyzer with English stop words. The search analyzer is used to analyze search queries and, if not set, defaults to the default analyzer (note that the search analyzer can also be set using the CompassQuery API). Changing the settings for the default analyzer can be done using the compass.engine.analyzer.default.* settings (as explained in the next table). Setting the search analyzer (so it will differ from the default analyzer) can be done using the compass.engine.analyzer.search.* settings.

Table 2.11. 

Setting | Description
compass.engine.analyzer.[analyzer name].type | The type of the search engine analyzer. Please see the available analyzer types later in the section.
compass.engine.analyzer.[analyzer name].stopwords | A comma separated list of stop words to use with the chosen analyzer. If the string starts with +, the list of stop words will be added to the default set of stop words defined for the analyzer. Note that not all analyzer types support this feature.
compass.engine.analyzer.[analyzer name].factory | If the compass.engine.analyzer.[analyzer name].type setting is not enough to configure your analyzer, use this setting to define the fully qualified class name of your analyzer factory, which implements LuceneAnalyzerFactory.

Compass comes with core analyzers (which are part of the lucene-core jar). They are: standard, simple, whitespace, and stop. See the Analyzers Section.

Compass also allows simple configuration of the snowball analyzer type (which comes with the lucene-snowball jar). An additional setting that must be set when using the snowball analyzer is compass.engine.analyzer.[analyzer name].name. This setting can have the following values: Danish, Dutch, English, Finnish, French, German, German2, Italian, Kp, Lovins, Norwegian, Porter, Portuguese, Russian, Spanish, and Swedish.

Another set of analyzer types comes with the lucene-analyzers jar. They are: brazilian, cjk, chinese, czech, german, greek, french, dutch, and russian.
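
The "+" convention of the stopwords setting in Table 2.11 can be sketched as follows. StopWordsSketch is a hypothetical class written for this illustration, not Compass code, and trimming whitespace around each word is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch (not Compass code) of the "+" rule for the stopwords setting. */
final class StopWordsSketch {

    /**
     * Resolves the effective stop words for an analyzer.
     *
     * @param setting          the comma separated value of the stopwords setting
     * @param analyzerDefaults the default stop words defined for the analyzer
     */
    static Set<String> resolve(String setting, Set<String> analyzerDefaults) {
        boolean addToDefaults = setting.startsWith("+");
        String list = addToDefaults ? setting.substring(1) : setting;

        Set<String> result = new LinkedHashSet<>();
        if (addToDefaults) {
            // A leading + extends the analyzer's defaults instead of replacing them.
            result.addAll(analyzerDefaults);
        }
        for (String word : list.split(",")) {
            String trimmed = word.trim(); // assumption: surrounding whitespace is ignored
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```

With defaults of "a" and "the", the value "+foo,bar" would yield four stop words, while "foo,bar" would yield only the two listed.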

2.5.8. Search Engine Highlighters

With Compass, multiple Highlighters can be defined (each under a different highlighter name) and then referenced when using CompassHighlighter. Within Compass, an internal default highlighter is defined, and can be configured by using default as the highlighter name.

Table 2.12. 

Setting | Description
compass.engine.highlighter.[highlighter name].factory | Low level. Optional (defaults to DefaultLuceneHighlighterFactory). The fully qualified name of the class that creates highlighter settings. Must implement the LuceneHighlighterFactory interface.
compass.engine.highlighter.[highlighter name].textTokenizer | Optional (defaults to auto). Defines how text will be tokenized for highlighting. Can be analyzer (use an analyzer to tokenize the text), term_vector (use the term vector info stored in the index), or auto (first tries term_vector and, if no info is stored, tries to use analyzer).
compass.engine.highlighter.[highlighter name].rewriteQuery | Low level. Optional (defaults to true). Whether the query used to highlight the text will be rewritten.
compass.engine.highlighter.[highlighter name].computeIdf | Low level. Optional (set according to the formatter used).
compass.engine.highlighter.[highlighter name].maxNumFragments | Optional (defaults to 3). Sets the maximum number of fragments that will be returned.
compass.engine.highlighter.[highlighter name].separator | Optional (defaults to ...). Sets the separator string between fragments if using the combined fragments highlight option.
compass.engine.highlighter.[highlighter name].maxBytesToAnalyze | Optional (defaults to 50*1024). Sets the maximum bytes of text to analyze.
compass.engine.highlighter.[highlighter name].fragmenter.type | Optional (defaults to simple). The type of the fragmenter that will be used. Can be simple or the fully qualified class name of the fragmenter (implements org.apache.lucene.search.highlight.Fragmenter).
compass.engine.highlighter.[highlighter name].fragmenter.simple.size | Optional (defaults to 100). Sets the size (in bytes) of the fragments for the simple fragmenter.
compass.engine.highlighter.[highlighter name].encoder.type | Optional (defaults to default). The type of the encoder that will be used to encode fragmented text. Can be default (does nothing), html (escapes html tags), or the fully qualified class name of the encoder (implements org.apache.lucene.search.highlight.Encoder).
compass.engine.highlighter.[highlighter name].formatter.type | Optional (defaults to simple). The type of the formatter that will be used to highlight the text. Can be simple (simply wraps the highlighted text with pre and post strings), htmlSpanGradient (wraps the highlighted text with an html span tag with optional background and foreground gradient colors), or the fully qualified class name of the formatter (implements org.apache.lucene.search.highlight.Formatter).
compass.engine.highlighter.[highlighter name].formatter.simple.pre | Optional (defaults to <b>). If the highlighter uses the simple formatter, controls the text that is inserted before the highlighted text.
compass.engine.highlighter.[highlighter name].formatter.simple.post | Optional (defaults to </b>). If the highlighter uses the simple formatter, controls the text that is appended after the highlighted text.
compass.engine.highlighter.[highlighter name].formatter.htmlSpanGradient.maxScore | If the highlighter uses the htmlSpanGradient formatter, the score above which text is displayed in the max color.
compass.engine.highlighter.[highlighter name].formatter.htmlSpanGradient.minForegroundColor | Optional (if not set, foreground will not be set on the span tag). If the highlighter uses the htmlSpanGradient formatter, the hex color used for representing IDF scores of zero, e.g. #FFFFFF (white).
compass.engine.highlighter.[highlighter name].formatter.htmlSpanGradient.maxForegroundColor | Optional (if not set, foreground will not be set on the span tag). If the highlighter uses the htmlSpanGradient formatter, the largest hex color used for representing IDF scores, e.g. #000000 (black).
compass.engine.highlighter.[highlighter name].formatter.htmlSpanGradient.minBackgroundColor | Optional (if not set, background will not be set on the span tag). If the highlighter uses the htmlSpanGradient formatter, the hex color used for representing IDF scores of zero, e.g. #FFFFFF (white).
compass.engine.highlighter.[highlighter name].formatter.htmlSpanGradient.maxBackgroundColor | Optional (if not set, background will not be set on the span tag). If the highlighter uses the htmlSpanGradient formatter, the largest hex color used for representing IDF scores, e.g. #000000 (black).
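
The [highlighter name] placeholder in Table 2.12 is replaced by the name the highlighter is defined under. The following sketch shows how the full setting names could be assembled. HighlighterSettingsSketch and the "summary" highlighter name are hypothetical, chosen only for the example; the option names come from the table above.

```java
import java.util.Properties;

/** Hypothetical example: builds highlighter setting names from a highlighter name. */
final class HighlighterSettingsSketch {

    /** Substitutes the highlighter name into the setting name pattern of Table 2.12. */
    static String key(String highlighterName, String option) {
        return "compass.engine.highlighter." + highlighterName + "." + option;
    }

    static Properties exampleSettings() {
        Properties settings = new Properties();
        // Tune the internal highlighter, which is addressed by the name "default".
        settings.setProperty(key("default", "maxNumFragments"), "5");
        // Define a second highlighter under the made-up name "summary".
        settings.setProperty(key("summary", "formatter.type"), "simple");
        settings.setProperty(key("summary", "formatter.simple.pre"), "<em>");
        settings.setProperty(key("summary", "formatter.simple.post"), "</em>");
        return settings;
    }
}
```

The same substitution applies to the [analyzer name] placeholder in the analyzer settings.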

2.5.9. Other Settings

Several other settings that control Compass.

Table 2.13. 

Setting | Description
compass.managedId.index | Can be either un_tokenized or no (defaults to no). It is the index setting that will be used when creating an internal managed id for a class property mapping. It applies only when the property is not a property id; a property id is always un_tokenized.