Chapter 4. OSEM

4.1. Introduction

Compass::Core provides the ability to map Java Objects to the underlying Search Engine through simple XML mapping files; we call this technology OSEM (Object Search Engine Mapping). OSEM provides a rich syntax for describing Object attributes and relationships. At run-time, Compass uses the OSEM files to extract the required properties from the Object model and to insert the required meta-data into the Search Engine index.

4.2. Searchable Classes

Searchable classes are normally classes that represent the state of the application, implementing the entities within the business model. Compass works best if the classes follow the simple Plain Old Java Object (POJO) programming model. The following class is an example of a searchable class:

import java.util.Date;
import java.util.HashSet;
import java.util.Set;

public class Author {
   private Long id; // identifier
   private String name;
   private Date birthday;
   private Set books = new HashSet(); // initialized so addBook() works on a new Author

   private void setId(Long id) {
      this.id = id;
   }

   public Long getId() {
      return this.id;
   }

   public void setName(String name) {
      this.name = name;
   }

   public String getName() {
      return this.name;
   }

   public void setBirthday(Date birthday) {
      this.birthday = birthday;
   }

   public Date getBirthday() {
      return this.birthday;
   }

   public void setBooks(Set books) {
      this.books = books;
   }

   public Set getBooks() {
      return this.books;
   }

   // addBook not needed by Compass::Core
   public void addBook(Book book) {
      this.books.add(book);
   }
} 

Compass works non-intrusively with application Objects, but these Objects must follow several rules:

4.2.1. Implement a Default Constructor

Author has an implicit default (no-argument) constructor. All persistent classes must have a default constructor (which may be non-public) so that Compass::Core can instantiate them using Constructor.newInstance().
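The following sketch uses plain Java reflection to show why a non-public no-argument constructor is enough: the constructor is looked up with getDeclaredConstructor(), made accessible, and invoked through Constructor.newInstance(). The Person class is hypothetical, and this only approximates what Compass::Core does internally.

```java
import java.lang.reflect.Constructor;

public class DefaultConstructorDemo {

    // Hypothetical searchable class whose only no-argument constructor is private.
    static class Person {
        private String name;

        private Person() {
            // left empty; fields are populated after instantiation
        }

        String getName() {
            return name;
        }
    }

    // Creates an instance through the (possibly non-public) default constructor.
    static <T> T instantiate(Class<T> type) throws Exception {
        Constructor<T> ctor = type.getDeclaredConstructor();
        ctor.setAccessible(true); // required when the constructor is not public
        return ctor.newInstance();
    }

    public static void main(String[] args) throws Exception {
        Person person = instantiate(Person.class);
        System.out.println(person.getName()); // prints "null": nothing has been set yet
    }
}
```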

4.2.2. Provide Property Identifier(s)

OSEM requires that every mapped Object defines one or more properties (JavaBean properties) that identify it. The id properties can have any name, and their type can be any primitive type, primitive "wrapper" type, java.lang.String or java.util.Date.

4.2.3. Declare Accessors and Mutators (Optional)

Even though Compass can persist instance variables directly, it is usually better to decouple this implementation detail from the Search Engine mechanism. Compass::Core recognizes JavaBean-style properties (getFoo, isFoo and setFoo accessors), and this mechanism works with any level of visibility.
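As an illustration of the naming convention, the sketch below resolves the getter side of a JavaBean-style property (getFoo, then isFoo) regardless of its visibility. The findGetter helper and the Sample class are hypothetical; Compass's own accessor resolution may differ in details, for example in how inherited methods are handled (getDeclaredMethod only sees methods declared on the class itself).

```java
import java.lang.reflect.Method;

public class AccessorLookup {

    // Hypothetical helper: finds getFoo() or isFoo() for property "foo",
    // whatever its visibility (private, package, protected or public).
    static Method findGetter(Class<?> type, String property) throws NoSuchMethodException {
        String suffix = Character.toUpperCase(property.charAt(0)) + property.substring(1);
        for (String prefix : new String[] {"get", "is"}) {
            try {
                Method method = type.getDeclaredMethod(prefix + suffix);
                method.setAccessible(true); // allow non-public accessors
                return method;
            } catch (NoSuchMethodException ignored) {
                // try the next prefix
            }
        }
        throw new NoSuchMethodException("No getter for property " + property);
    }

    static class Sample {
        private String name = "Compass";
        private boolean active = true;

        private String getName() {
            return name;
        }

        boolean isActive() {
            return active;
        }
    }

    public static void main(String[] args) throws Exception {
        Sample sample = new Sample();
        System.out.println(findGetter(Sample.class, "name").invoke(sample));   // Compass
        System.out.println(findGetter(Sample.class, "active").invoke(sample)); // true
    }
}
```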

4.2.4. Implementing equals() and hashCode()

You have to override the equals() and hashCode() methods if you intend to mix objects of persistent classes (e.g. in a Set). You could implement them by comparing the identifiers of both objects, but Compass::Core works best with surrogate identifiers (and will provide a way to generate them automatically), so it is best to implement these methods using business keys instead.
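A minimal sketch of business-key equality follows. It assumes, purely for illustration, that a book's ISBN is its business key; the surrogate id is deliberately left out of equals() and hashCode(), so two instances describing the same book collapse into one Set entry even before any id has been assigned.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class Book {
    private Long id;      // surrogate identifier, not used by equals/hashCode
    private String isbn;  // hypothetical business key
    private String title;

    public Book() {
    }

    public Book(String isbn, String title) {
        this.isbn = isbn;
        this.title = title;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Book)) {
            return false;
        }
        // Compare on the business key only, never on the surrogate id.
        return Objects.equals(isbn, ((Book) other).isbn);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(isbn);
    }

    public static void main(String[] args) {
        Set<Book> books = new HashSet<>();
        books.add(new Book("0-123-45678-9", "First printing"));
        books.add(new Book("0-123-45678-9", "Second printing"));
        System.out.println(books.size()); // 1: same business key
    }
}
```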

4.3. Mapping

Object/Search Engine mappings are defined in an XML document. The mapping language is Java-centric, meaning that mappings are constructed around the classes themselves and not around internal Resources. A possible OSEM file for the previous Author class example follows:

<?xml version="1.0"?>
<!DOCTYPE compass-core-mapping PUBLIC
    "-//Compass/Compass Core Mapping DTD 1.0//EN"
    "http://static.compassframework.org/dtd/compass-core-mapping-1.0.dtd">

<compass-core-mapping package="eg">

  <class name="Author" alias="author">

    <id name="id" />

    <constant>
      <meta-data>type</meta-data>
      <meta-data-value>person</meta-data-value>
      <meta-data-value>author</meta-data-value>
    </constant>

    <property name="name">
      <meta-data>name</meta-data>
      <meta-data>authorName</meta-data>
    </property>

    <property name="birthday">
      <meta-data>birthday</meta-data>
    </property>

    <component name="books" ref-alias="book" />

    <!-- can be a reference instead of component
    <reference name="books" ref-alias="book" />
    -->

  </class>

  <class name="Book" alias="book">

    ...

  </class>

</compass-core-mapping>

The above example defines the mapping for Author and Book classes. It introduces some key Compass mapping concepts and syntax. Before explaining the concepts, it is essential that the terminology used is clearly understood.

The first issue to address is the usage of the term Property. Because the term is common both in Java and in Compass (where it expresses Search Engine and semantic terminology), it is always qualified to make its meaning clear. A class Property refers to a Java class attribute. A Resource Property refers to Search Engine meta-data, which contains the value of the mapped class Property. In the previous OSEM example, the value of the class Property "name" is mapped to two Resource Property instances called "name" and "authorName", each containing the value of the class Property "name".
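To make the distinction concrete, the sketch below models a Resource as a map from Resource Property names to values and mirrors the Author mapping: one class Property ("name") produces two Resource Properties, and the constant "type" meta-data holds two values. The map representation is an assumption for illustration only, not Compass's Resource API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ResourcePropertySketch {

    // Adds a value under a Resource Property name (one name may hold several values).
    static void add(Map<String, List<String>> resource, String name, String value) {
        resource.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Mirrors the Author mapping for the class Property "name" plus the constant.
    static Map<String, List<String>> marshal(String authorName) {
        Map<String, List<String>> resource = new LinkedHashMap<>();
        add(resource, "type", "person");         // <meta-data-value>person</meta-data-value>
        add(resource, "type", "author");         // <meta-data-value>author</meta-data-value>
        add(resource, "name", authorName);       // <meta-data>name</meta-data>
        add(resource, "authorName", authorName); // <meta-data>authorName</meta-data>
        return resource;
    }

    public static void main(String[] args) {
        System.out.println(marshal("Jane Doe"));
        // {type=[person, author], name=[Jane Doe], authorName=[Jane Doe]}
    }
}
```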

The OSEM example above shows:

  • The unique class identifier, which maps to the "id" class property.

  • Constant meta data, a feature that allows Compass to insert extra meta data and values (not expressed in the Object). Compass::Core will save the Resource Property "type" with the specified values "person" and "author".

  • The mapping for the class Property "name", saved as two Resource Properties called "name" and "authorName".

  • A dependency between Author and Book managed using a component mapping.

Each of these concepts is explained in detail in the following sections.

All XML mappings should declare the doctype shown. The actual DTD may be found at the URL above, or in the compass-core-x.x.x.jar. Compass will always look for the DTD in the classpath first.

4.3.1. compass-core-mapping

The root element, which holds all the other mapping definitions.

<compass-core-mapping package="packageName"/>
        

Table 4.1. 

Attribute | Description
package (optional) | Specifies a package prefix for unqualified class names in the mapping document.

4.3.2. class

Declaring a searchable class using the class element.

<class
        name="className"
        alias="alias"
        sub-index="sub index name"
        analyzer="name of the analyzer"
        root="true|false"
        poly="false|true"
        extends="a comma separated list of aliases to extend"
        boost="boost value for the class"
        all="true|false"
        all-term-vector="no|yes|positions|offsets|positions_offsets"
        all-metadata="all meta-data"
        all-analyzer="name of the analyzer used for the all property"
        converter="fully qualified converter class name|converter lookup name"
        converter-param="parameter for the converter"
>
    (converter-param)*,
    (id)*,
    parent?,
    (analyzer?),
    (property|component|reference|constant)*
</class>

Table 4.2. 

Attribute | Description
name | The fully qualified class name (or a relative name if the package is declared in compass-core-mapping).
alias | The alias of the Resource that will be mapped to the class.
sub-index (optional, defaults to the alias value) | The name of the sub-index that the alias will map to.
analyzer (optional, defaults to the default analyzer) | The name of the analyzer that will be used to analyze TOKENIZED properties. The default analyzer is one of the internal analyzers that come with Compass. Note that when an analyzer mapping is defined (a child mapping of the class mapping whose property value controls the analyzer), the analyzer attribute has no effect.
root (optional, defaults to true) | Specifies whether the class is a "root" class.
poly (optional, defaults to false) | Specifies whether the class is enabled to support polymorphism.
extends (optional) | A comma separated list of aliases to extend. Each alias can be a class mapping or a contract mapping, so a class can extend more than one mapping.
boost (optional, defaults to 1.0) | Specifies the boost level for the class.
all (optional, defaults to true) | Specifies whether the class supports the "all" feature.
all-term-vector (optional, defaults to the configuration setting compass.property.all.termVector) | The term vector value of the all property.
all-metadata (optional, defaults to the configuration setting compass.property.all) | The name of the all property.
all-analyzer (optional, defaults to the configuration setting compass.engine.all.analyzer, which in turn defaults to the default analyzer) | The name of the analyzer that will be used to analyze the all property.
converter (optional) | The fully qualified class name of a custom converter, or the lookup name of a global converter registered with the configuration.
converter-param (optional) | A single parameter for the converter (the default converter requires none).

Root classes have their own index within the search engine index directory. Classes that depend on a root class and do not require an index of their own (e.g. components) should set root to false. You can control the sub-index that a root class maps to using the sub-index attribute; otherwise a sub-index is created based on the alias name.

If the class can be mapped to several classes (e.g. it is an interface or an abstract class), then set poly to true. Compass will then persist the fully qualified class name in the index.
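The reason the class name has to be stored can be sketched in plain Java: an abstract type cannot be instantiated, so the concrete class recorded at marshalling time is needed to recreate the object on load. The Shape, Circle and Square classes are hypothetical, and this is not Compass's actual unmarshalling code.

```java
import java.lang.reflect.Constructor;

public class PolySketch {

    // Hypothetical abstract type that would be mapped once with poly="true".
    static abstract class Shape {
    }

    static class Circle extends Shape {
    }

    static class Square extends Shape {
    }

    // Marshalling: remember the concrete class name next to the object's data.
    static String storedClassName(Shape shape) {
        return shape.getClass().getName();
    }

    // Unmarshalling: Shape itself cannot be instantiated, so use the stored name.
    static Shape load(String className) throws Exception {
        Class<?> type = Class.forName(className);
        Constructor<?> ctor = type.getDeclaredConstructor();
        ctor.setAccessible(true);
        return (Shape) ctor.newInstance();
    }

    public static void main(String[] args) throws Exception {
        String stored = storedClassName(new Square());
        System.out.println(load(stored).getClass().getSimpleName()); // Square
    }
}
```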

You can set the boost level at the class level; it is applied to all of the class meta-data unless overridden at the meta-data level.

The class mapping can extend other class mappings (more than one), as well as contract mappings. All the mappings defined within the extended class or contract mappings are inherited. You can add to any inherited mapping by defining the same mapping in the class mapping, except for id mappings, which will be overridden. Note that XML attributes defined on the extended mappings (like root, sub-index, ...) are not inherited.

By default, a searchable class supports the "all" feature, which means that Compass creates an "all" meta-data that represents all the other meta-data (with several exceptions, like a Reader class property). The name of the "all" meta-data defaults to the Compass setting, but you can also set it using the all-metadata attribute.
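Conceptually, the "all" meta-data gathers the values of every other meta-data that is not excluded from it. The sketch below reduces that idea to joining strings; the buildAll helper and the "internalCode" meta-data are hypothetical, and the real feature works at the search engine field level rather than producing a single joined string.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

public class AllPropertySketch {

    // Joins the values of every meta-data that is not excluded from "all".
    static String buildAll(Map<String, String> metaData, Set<String> excludedFromAll) {
        StringJoiner all = new StringJoiner(" ");
        for (Map.Entry<String, String> entry : metaData.entrySet()) {
            if (!excludedFromAll.contains(entry.getKey())) {
                all.add(entry.getValue());
            }
        }
        return all.toString();
    }

    public static void main(String[] args) {
        Map<String, String> metaData = new LinkedHashMap<>();
        metaData.put("name", "Jane Doe");
        metaData.put("birthday", "1970-01-01");
        metaData.put("internalCode", "X42"); // imagine exclude-from-all="true"
        System.out.println(buildAll(metaData, Collections.singleton("internalCode")));
        // Jane Doe 1970-01-01
    }
}
```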

Compass provides support for custom converters. Please refer to the converter section later on.

4.3.3. contract

Declaring a searchable contract using the contract element.

<contract
        alias="alias"
>
    (id)*,
    (analyzer?),
    (property|component|reference|constant)*
</contract>

Table 4.3. 

Attribute | Description
alias | The alias of the contract. It is the alias name used in the extends attribute of a class mapping.

A contract acts like an interface in the Java language. You can define the same mappings within it that you can define in the class mapping, without defining the class that it will map to.

If you have several classes with similar properties, you can define a contract that holds the shared property definitions, and then extend the contract within the mapped classes (even if there is no concrete interface or class for it in your Java code).

4.3.4. id

Declaring a searchable id class property (a.k.a JavaBean property) of a class using the id element.

<id
      name="property name"
      accessor="property|field"
      boost="boost value for the class property"
      class="explicit declaration of the property class"
      managed-id="auto|true|false"
      exclude-from-all="false|true"
      converter="fully qualified converter class name|converter lookup name"
      converter-param="parameter for the converter"
  >
 (converter-param)*,
 (meta-data)*
  </id>

Table 4.4. 

Attribute | Description
name | The class property (a.k.a JavaBean property) name, with an initial lowercase letter.
accessor (optional, defaults to property) | The strategy used to access the class property value: property uses the JavaBean accessor methods, while field accesses the class fields directly.
boost (optional, defaults to 1.0f) | The boost level that will be propagated to all the meta-data defined within the id.
class (optional) | An explicit declaration of the property's class; helps certain converters.
managed-id (optional, defaults to auto) | The strategy for creating or using a class property meta-data id (which maps to a Resource Property).
exclude-from-all (optional, defaults to false) | Excludes the class property from participating in the "all" meta-data, unless overridden at the meta-data level.
converter (optional) | The fully qualified class name of a custom converter, or the lookup name of a global converter registered with the configuration.
converter-param (optional) | A single parameter for the converter (the default converter requires none).

The id mapping is used to map the class property that identifies the class. You can define several id properties, even though we recommend using one. You can use the id mapping for all the Java primitive types (e.g. int), Java primitive wrapper types (e.g. Integer) and the String type.

Compass::Core requires that id and property mappings be identifiable at the root class (Resource) level. Compass achieves this either by using one of the meta-data names (one that is unique among ALL the meta-data in the class mapping) or by creating an internal one. Compass creates an internal one if no meta-data is defined in the id or property mapping. You can control this behavior with the managed-id attribute. The value auto leaves the id assignment or creation to Compass: it analyzes all the meta-data defined in the mappings and decides whether it needs to create an internal id for an id or property mapping. The value true always creates an internal id for the id or property, and the value false always uses the first meta-data as the id.
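The decision described above can be sketched as a small function. This is a simplified reading of the text, not Compass's implementation: the INTERNAL placeholder stands in for whatever internal id name Compass actually generates, and auto is interpreted as "use the first of this mapping's meta-data names that is unique in the class mapping, otherwise create an internal id".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ManagedIdSketch {

    enum ManagedId { AUTO, TRUE, FALSE }

    // Placeholder for an internally generated id; the real naming is not shown here.
    static final String INTERNAL = "<internal>";

    // ownMetaData: meta-data names declared in this id/property mapping.
    // allMetaData: every meta-data name in the class mapping, repeats included.
    static String chooseId(ManagedId mode, List<String> ownMetaData, List<String> allMetaData) {
        switch (mode) {
            case TRUE:
                return INTERNAL;
            case FALSE:
                return ownMetaData.get(0); // assumes at least one meta-data is declared
            default:
                for (String name : ownMetaData) {
                    if (Collections.frequency(allMetaData, name) == 1) {
                        return name; // unique within the class mapping
                    }
                }
                return INTERNAL; // no meta-data, or none of it is unique
        }
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("name", "authorName", "birthday");
        System.out.println(chooseId(ManagedId.AUTO, Arrays.asList("name", "authorName"), all)); // name
        System.out.println(chooseId(ManagedId.AUTO, Collections.<String>emptyList(), all));     // <internal>
    }
}
```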

Compass provides support for custom converters. Please refer to the converter section later on.

4.3.5. property

Declaring a searchable class property (a.k.a JavaBean property) of a class using the property element.

<property
      name="property name"
      accessor="property|field"
      boost="boost value for the property"
      class="explicit declaration of the property class"
      analyzer="name of the analyzer"
      override="true|false"
      managed-id="auto|true|false"
      managed-id-index="[compass.managedId.index setting]|no|un_tokenized"
      exclude-from-all="false|true"
      col-class="fully qualified class of the collection class"
      converter="fully qualified converter class name|converter lookup name"
      converter-param="parameter for the converter"
>
   (converter-param)*,
   (meta-data)*
</property>

Table 4.5. 

Attribute | Description
name | The class property (a.k.a JavaBean property) name, with an initial lowercase letter.
accessor (optional, defaults to property) | The strategy used to access the class property value: property uses the JavaBean accessor methods, while field accesses the class fields directly.
boost (optional, defaults to 1.0f) | The boost level that will be propagated to all the meta-data defined within the class property.
class (optional) | An explicit declaration of the property's class; helps certain converters.
analyzer (optional, defaults to the class mapping analyzer decision scheme) | The name of the analyzer that will be used to analyze TOKENIZED meta-data mappings defined for the given property. Defaults to the class mapping analyzer decision scheme, based on the analyzer attribute or the analyzer mapping property.
col-class (optional) | The collection class that will be used when unmarshalling. It serves two purposes: it improves performance, since Compass::Core does not need to save the collection class, and, for collection proxies (e.g. with Hibernate), it lets you define the actual collection class. Only applies when mapping a java.util.Collection or one of its implementations.
override (optional, defaults to true) | Controls whether another definition with the same mapping name is overridden or kept as an additional mapping. Mainly used to override definitions made in extended mappings.
managed-id (optional, defaults to auto) | The strategy for creating or using a class property meta-data id (which maps to a Resource Property).
managed-id-index (optional, defaults to the compass.managedId.index setting, which defaults to no) | Either un_tokenized or no. The index setting used when creating an internal managed id for a class property mapping. For an id mapping, the internal id is always un_tokenized.
exclude-from-all (optional, defaults to false) | Excludes the class property from participating in the "all" meta-data, unless overridden at the meta-data level.
converter (optional) | The fully qualified class name of a custom converter, or the lookup name of a global converter registered with the configuration.
converter-param (optional) | A single parameter for the converter (the default converter requires none).

Compass::Core maps a class property to a set of meta-data (Resource Property).

You can map all the Java primitive data types, their primitive wrappers and most of the common Java classes (e.g. Date and Calendar). You can also map Arrays and Collections of these data types. When mapping a Collection, you must specify the element class (like java.lang.String) in the class attribute of the property mapping.

The same managed-id rules that apply to the id mapping also apply to property mappings.

Note that you can define a property with no meta-data mapping within it. Such a property is not searchable, but its value is stored when the object is persisted to the search engine and is loaded from it as well (unless it is of type java.io.Reader).

Compass provides support for custom converters. Please refer to the converter section later on.

4.3.6. analyzer

Declaring an analyzer controller property (a.k.a JavaBean property) of a class using the analyzer element.

<analyzer
      name="property name"
      null-analyzer="analyzer name if value is null"
      accessor="property|field"
      converter="fully qualified converter class name|converter lookup name"
      converter-param="parameter for the converter"
>
</analyzer>

Table 4.6. 

Attribute | Description
name | The class property (a.k.a JavaBean property) name, with an initial lowercase letter.
accessor (optional, defaults to property) | The strategy used to access the class property value: property uses the JavaBean accessor methods, while field accesses the class fields directly.
null-analyzer (optional, defaults to an error in case of a null value) | The name of the analyzer that will be used if the property value is null.
converter (optional) | The fully qualified class name of a custom converter, or the lookup name of a global converter registered with the configuration.
converter-param (optional) | A single parameter for the converter (the default converter requires none).

The analyzer class property mapping controls the analyzer that will be used when indexing the class data (the underlying Resource). If the mapping is defined, it overrides the analyzer attribute of the class mapping.

For example, suppose Compass is configured with two additional analyzers: one called an1 (with settings in the form compass.engine.analyzer.an1.*) and another called an2. The values that the class property can hold are then default (an internal Compass analyzer, which can be configured as well), an1 and an2. If the property can legitimately be null in your application, you can configure a null-analyzer to be used in that case. If the class property holds a value that does not match any analyzer, an exception is thrown.
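The selection rules in this example can be sketched as follows. The CONFIGURED set mirrors the analyzer names from the text, while the resolve method and the exception types are hypothetical choices for illustration; Compass's own error handling is not shown here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class AnalyzerSelectionSketch {

    // Analyzer names assumed to be configured: the internal default plus an1 and an2.
    static final Set<String> CONFIGURED = new HashSet<>(Arrays.asList("default", "an1", "an2"));

    // value: current value of the analyzer class property (may be null)
    // nullAnalyzer: the null-analyzer attribute, or null when it is not set
    static String resolve(String value, String nullAnalyzer) {
        if (value == null) {
            if (nullAnalyzer == null) {
                throw new IllegalStateException("Analyzer property is null and no null-analyzer is set");
            }
            return nullAnalyzer;
        }
        if (!CONFIGURED.contains(value)) {
            throw new IllegalArgumentException("No analyzer named [" + value + "]");
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(resolve("an1", null));     // an1
        System.out.println(resolve(null, "default")); // default
    }
}
```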

4.3.7. meta-data

Declaring and using the meta-data element.

<meta-data
      store="yes|no|compress"
      index="tokenized|un_tokenized|no"
      boost="boost value for the meta-data"
      analyzer="name of the analyzer"
      reverse="no|reader|string"
      exclude-from-all="[parent's exclude-from-all]|false|true"
      converter="fully qualified converter class name|converter lookup name"
      converter-param="parameter for the converter"
>
   (converter-param)*,
</meta-data>

Table 4.7. 

Attribute | Description
store (optional, defaults to yes) | Whether the value of the class property that the meta-data maps to is stored in the index.
index (optional, defaults to tokenized) | Whether the value of the class property that the meta-data maps to is indexed (searchable). If it is, controls whether the value is broken down and analyzed (tokenized) or used as is (un_tokenized).
boost (optional, defaults to 1.0f) | Controls the boost level for the meta-data.
analyzer (optional, defaults to the parent analyzer) | The name of the analyzer that will be used to analyze TOKENIZED meta-data. Defaults to the parent property mapping, which in turn defaults to the class mapping analyzer decision scheme, based on the analyzer attribute or the analyzer mapping property.
reverse (optional, defaults to no) | Whether the meta-data value is reversed. no: no reversing happens; string: the value is reversed and stored as a reversed string; reader: a special reader wraps the string and reverses it. The reader option performs better, but the store and index settings are discarded.
exclude-from-all (optional, defaults to the parent's exclude-from-all value) | Excludes the meta-data from participating in the "all" meta-data.
converter (optional) | The fully qualified class name of a custom converter, or the lookup name of a global converter registered with the configuration.
converter-param (optional) | A single parameter for the converter (the default converter requires none).
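The effect of reverse="string" on the value itself is simply a character reversal, sketched below. A common motivation, assumed here rather than stated in this chapter, is that a suffix match on the original value (such as "*.properties") becomes a prefix match on the reversed value.

```java
public class ReverseSketch {

    // reverse="string": the value is stored and indexed in reversed form.
    static String reverse(String value) {
        return new StringBuilder(value).reverse().toString();
    }

    public static void main(String[] args) {
        String indexed = reverse("compass.properties");
        System.out.println(indexed); // seitreporp.ssapmoc
        // A suffix on the original value is a prefix on the reversed one.
        System.out.println(indexed.startsWith(reverse(".properties"))); // true
    }
}
```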

The element meta-data is a Property within a Resource.

You can control the format of the marshalled values when mapping a java.lang.Number (or the equivalent primitive value) using the format provided by the java.text.DecimalFormat. You can also format a java.util.Date using the format provided by java.text.SimpleDateFormat. You set the format string in the converter-param attribute.
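The sketch below shows what such format strings produce when handed to the JDK formatters directly, e.g. a converter-param of "000000.00" for a number or "yyyy-MM-dd" for a date. The locale and time zone are fixed here only to make the output repeatable; how Compass chooses them is not covered by this sketch. Zero-padded patterns like the numeric one also keep the lexical order of non-negative values consistent with their numeric order.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class FormatSketch {

    // Interprets a converter-param such as "000000.00" as a DecimalFormat pattern.
    static String formatNumber(double value, String pattern) {
        return new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.US)).format(value);
    }

    // Interprets a converter-param such as "yyyy-MM-dd" as a SimpleDateFormat pattern.
    static String formatDate(Date value, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone for a repeatable result
        return format.format(value);
    }

    public static void main(String[] args) {
        System.out.println(formatNumber(42.5, "000000.00"));         // 000042.50
        System.out.println(formatDate(new Date(0L), "yyyy-MM-dd")); // 1970-01-01
    }
}
```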

4.3.8. component

Declaring and using the component element.

<component
      name="the class property name"
      ref-alias="name of the alias"
      accessor="property|field"
      converter="fully qualified converter class name|converter lookup name"
      converter-param="parameter for the converter"
>
   (converter-param)*,
</component>

Table 4.8. 

Attribute | Description
name | The class property (a.k.a JavaBean property) name, with an initial lowercase letter.
ref-alias | The class mapping alias that defines the component.
col-class (optional) | The collection class that will be used when unmarshalling. It serves two purposes: it improves performance, since Compass::Core does not need to save the collection class, and, for collection proxies (e.g. with Hibernate), it lets you define the actual collection class. Only applies when mapping a java.util.Collection or one of its implementations.
override (optional, defaults to true) | Controls whether another definition with the same mapping name is overridden or kept as an additional mapping. Mainly used to override definitions made in extended mappings.
accessor (optional, defaults to property) | The strategy used to access the class property value: property uses the JavaBean accessor methods, while field accesses the class fields directly.
converter (optional) | The fully qualified class name of a custom converter, or the lookup name of a global converter registered with the configuration.
converter-param (optional) | A single parameter for the converter (the default converter requires none).

The component element defines a class dependency within the root class. The dependency is identified by ref-alias, whose class mapping can be non-root and does not need id mappings.

An embedded (component) class means that all the mappings (meta-data values) defined in the referenced class are stored within the Resource of the root class. As a result, a search that hits one of the component's mapped meta-data returns its owning class.
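The sketch below illustrates this flattening with hypothetical Author and Book classes: the books' titles are written into the author's own resource, so a match on a title is a match on the Author. The map-based resource and the matches helper are simplifications, not Compass's storage format or query API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ComponentSketch {

    static class Book {
        final String title;
        Book(String title) { this.title = title; }
    }

    static class Author {
        final String name;
        final List<Book> books = new ArrayList<>();
        Author(String name) { this.name = name; }
    }

    // Component mapping: the book meta-data is written into the author's own resource.
    static Map<String, List<String>> marshal(Author author) {
        Map<String, List<String>> resource = new LinkedHashMap<>();
        resource.computeIfAbsent("name", k -> new ArrayList<>()).add(author.name);
        for (Book book : author.books) {
            resource.computeIfAbsent("title", k -> new ArrayList<>()).add(book.title);
        }
        return resource;
    }

    // Toy "search": a hit on any meta-data value is a hit on the owning resource.
    static boolean matches(Map<String, List<String>> resource, String field, String value) {
        List<String> values = resource.get(field);
        return values != null && values.contains(value);
    }

    public static void main(String[] args) {
        Author author = new Author("Jane Doe");
        author.books.add(new Book("Searching Objects"));
        Map<String, List<String>> resource = marshal(author);
        System.out.println(resource); // {name=[Jane Doe], title=[Searching Objects]}
        System.out.println(matches(resource, "title", "Searching Objects")); // true
    }
}
```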

The type of the JavaBean property can be the class mapping class itself, an Array of it, or a Collection.

Support for cyclic mapping (from a component back to its parent class) is implemented using the parent mapping.

4.3.9. reference

Declaring and using the reference element.

<reference
        name="the class property name"
        ref-alias="name of the alias"
        ref-comp-alias="name of an optional alias mapped as component"
        accessor="property|field"
        converter="fully qualified converter class name"
        converter-param="parameter for the converter"
  >
 (converter-param)*,
</reference>

Table 4.9. 

Attribute | Description
name | The class property (a.k.a JavaBean property) name, with an initial lowercase letter.
ref-alias | The class mapping alias that defines the reference.
col-class (optional) | The collection class that will be used when unmarshalling. It serves two purposes: it improves performance, since Compass::Core does not need to save the collection class, and, for collection proxies (e.g. with Hibernate), it lets you define the actual collection class. Only applies when mapping a java.util.Collection or one of its implementations.
ref-comp-alias (optional) | The class mapping alias that defines a "shadow component": a component-like mapping, based on that alias, is marshalled into the current class. It is best to create a dedicated class mapping (with root="false") that holds only the required information. Searching on that information then returns the encompassing class as part of the hits. Note also that when the referenced object changes, you have to save all the relevant encompassing objects for the change to be reflected in the ref-comp-alias data.
accessor (optional, defaults to property) | The strategy used to access the class property value: property uses the JavaBean accessor methods, while field accesses the class fields directly.
converter (optional) | The fully qualified class name of a custom converter, or the lookup name of a global converter registered with the configuration.
converter-param (optional) | A single parameter for the converter (the default converter requires none).

The reference element defines a "pointer" to a class dependency identified in ref-alias.

The type of the JavaBean property can be the class mapping class itself, an Array of it, or a Collection.

Currently there is no support for lazy behavior or cascading. This means that saving an object does not persist the objects it references, and loading an object loads all of its references. Future versions will support lazy and cascading features.

Compass::Core supports cyclic references, which means that two classes can have a cyclic reference defined between them.
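A minimal sketch of eager reference loading with cycle handling follows. It assumes each reference is stored as the referenced object's id (the "pointer"), and that the referenced objects were saved separately, since there is no cascading. The StoredResource and Node classes and the load method are hypothetical; the map of already-loaded objects is what lets a cyclic Author/Book reference terminate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReferenceSketch {

    // Stored form of an object: its own data plus the ids of the objects it references.
    static class StoredResource {
        final String id;
        final String name;
        final List<String> refIds;

        StoredResource(String id, String name, List<String> refIds) {
            this.id = id;
            this.name = name;
            this.refIds = refIds;
        }
    }

    // Loaded object graph node.
    static class Node {
        final String id;
        final String name;
        final List<Node> refs = new ArrayList<>();

        Node(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Eagerly loads an object and everything it references (no lazy loading).
    // The map of already-loaded objects makes cyclic references terminate.
    static Node load(String id, Map<String, StoredResource> index, Map<String, Node> loaded) {
        Node existing = loaded.get(id);
        if (existing != null) {
            return existing;
        }
        StoredResource stored = index.get(id);
        Node node = new Node(stored.id, stored.name);
        loaded.put(id, node); // register before following references
        for (String refId : stored.refIds) {
            node.refs.add(load(refId, index, loaded));
        }
        return node;
    }

    public static void main(String[] args) {
        Map<String, StoredResource> index = new HashMap<>();
        index.put("a1", new StoredResource("a1", "Jane Doe", Arrays.asList("b1")));
        index.put("b1", new StoredResource("b1", "Searching Objects", Arrays.asList("a1")));
        Node author = load("a1", index, new HashMap<>());
        System.out.println(author.refs.get(0).name);                  // Searching Objects
        System.out.println(author.refs.get(0).refs.get(0) == author); // true
    }
}
```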

4.3.10. parent

Declaring and using the parent element.

<parent
        name="the class property name"
        accessor="property|field"
        converter="fully qualified converter class name"
        converter-param="parameter for the converter"
  >
 (converter-param)*,
</parent>

Table 4.10. 

Attribute | Description
name | The class property (a.k.a JavaBean property) name, with an initial lowercase letter.
accessor (optional, defaults to property) | The strategy used to access the class property value: property uses the JavaBean accessor methods, while field accesses the class fields directly.
converter (optional) | The fully qualified class name of a custom converter, or the lookup name of a global converter registered with the configuration.
converter-param (optional) | A single parameter for the converter (the default converter requires none).

The parent mapping provides support for cyclic mappings for components. If the component class mapping wishes to map the enclosing class, it can use the parent mapping to do so. The parent mapping does not marshal (persist to the search engine) the parent object; it only initializes the reference when the parent object is loaded from the search engine.
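The sketch below illustrates this behavior with hypothetical Author and Address classes, where Address is a component whose owner field would be covered by a parent mapping. Marshalling writes only the component's own data, so the cycle is never followed; unmarshalling rebuilds the component and then sets the back-pointer. The map-based resource is a simplification, not Compass's format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ParentSketch {

    static class Author {
        final String name;
        Address address;
        Author(String name) { this.name = name; }
    }

    // Component class; "owner" plays the role of the parent mapping.
    static class Address {
        String city;
        Author owner; // never marshalled, only set on load
    }

    // Marshalling skips the owner, which keeps the cycle from being followed.
    static Map<String, String> marshal(Author author) {
        Map<String, String> resource = new LinkedHashMap<>();
        resource.put("name", author.name);
        resource.put("city", author.address.city);
        return resource;
    }

    // Unmarshalling rebuilds the component, then fills in the parent reference.
    static Author unmarshal(Map<String, String> resource) {
        Author author = new Author(resource.get("name"));
        Address address = new Address();
        address.city = resource.get("city");
        address.owner = author; // the parent mapping at work
        author.address = address;
        return author;
    }

    public static void main(String[] args) {
        Author author = new Author("Jane Doe");
        author.address = new Address();
        author.address.city = "Oslo";
        author.address.owner = author;
        Map<String, String> resource = marshal(author);
        System.out.println(resource); // {name=Jane Doe, city=Oslo}
        Author loaded = unmarshal(resource);
        System.out.println(loaded.address.owner == loaded); // true
    }
}
```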

4.3.11. constant

Declaring a constant set of meta-data using the constant element.

<constant
          exclude-from-all="false|true"
          converter="fully qualified converter class name"
          converter-param="parameter for the converter"
    >
   (converter-param)*,
   meta-data,
   meta-data-value+
</constant>

Table 4.11. 

Attribute | Description
exclude-from-all (optional, defaults to false) | Excludes the constant meta-data and all its values from participating in the "all" feature.
override (optional, defaults to true) | Controls whether another definition with the same mapping name is overridden or kept as an additional mapping. Mainly used to override definitions made in extended mappings.
converter (optional) | The fully qualified class name of a custom converter, or the lookup name of a global converter registered with the configuration.
converter-param (optional) | A single parameter for the converter (the default converter requires none).

If you wish to define a set of constant meta-data that will be embedded within the searchable class (Resource), you can use the constant element. You define the usual meta-data element, followed by one or more meta-data-value elements, each holding a value for that meta-data.

4.3.12. converter

Compass::Core uses converters to convert all the different types and mappings. Compass::Core has default converters for most needs, and you can define your own using the converter attribute where appropriate. If your converter requires parameters, you can use the converter-param attribute or the converter-param elements.
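To illustrate how a single converter-param can configure a custom converter, the sketch below defines a hypothetical ValueConverter interface (not Compass's actual converter contract, which works with Resources) and a converter that stores booleans as configurable words, with the parameter given as "trueWord,falseWord" (for example converter-param="yes,no").

```java
public class ConverterSketch {

    // Hypothetical converter contract, for illustration only.
    interface ValueConverter<T> {
        String toIndexValue(T value);
        T fromIndexValue(String stored);
    }

    // Stores a boolean as one of two words taken from the converter parameter.
    static class BooleanWordConverter implements ValueConverter<Boolean> {
        private final String trueWord;
        private final String falseWord;

        BooleanWordConverter(String converterParam) {
            String[] words = converterParam.split(",");
            if (words.length != 2) {
                throw new IllegalArgumentException("Expected 'trueWord,falseWord' but got: " + converterParam);
            }
            this.trueWord = words[0].trim();
            this.falseWord = words[1].trim();
        }

        @Override
        public String toIndexValue(Boolean value) {
            return value ? trueWord : falseWord; // assumes a non-null value
        }

        @Override
        public Boolean fromIndexValue(String stored) {
            if (trueWord.equals(stored)) {
                return Boolean.TRUE;
            }
            if (falseWord.equals(stored)) {
                return Boolean.FALSE;
            }
            throw new IllegalArgumentException("Unexpected stored value: " + stored);
        }
    }

    public static void main(String[] args) {
        BooleanWordConverter converter = new BooleanWordConverter("yes,no");
        System.out.println(converter.toIndexValue(Boolean.TRUE)); // yes
        System.out.println(converter.fromIndexValue("no"));       // false
    }
}
```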