Magnolia 4.5 reached end of life on June 30, 2016. This branch is no longer supported; see End-of-life policy.

Magnolia search functionality is provided by the Jackrabbit repository. An indexer based on Apache Lucene extracts text from content nodes and properties. Content of Web pages and documents is included in the index. To search the index, you can write queries in a query language supported by the JCR repository. You can test the queries in AdminCentral and execute them in code. The Standard Templating Kit includes a complete example of site search. This article explains how the STK default search implementation works and walks you through the example.

Making content searchable

There are two processes involved in making content searchable: indexing and querying. Indexing collects and parses Web pages and documents and stores the data in an index to make information retrieval fast and accurate. Querying searches the data in the index and returns results.

Indexing

Magnolia search is based on the default Jackrabbit search implementation. Jackrabbit uses an Apache Lucene-based indexer to process the data stored in the Java Content Repository. An index makes it faster to retrieve requested portions of the data. Node names and property values are indexed immediately as they are stored in the repository. Text from documents is extracted in a background process, which makes document content searchable after a short delay.

You can find the physical index folders and files in the webapps installation directory at repositories/magnolia/workspaces/*/index. See the Jackrabbit Search wiki to learn how to configure the search indexing and options available with the implementation. The workspace.xml file mentioned on the wiki is under the repositories/magnolia/workspaces/<name of workspace> directory.

Each Magnolia instance has its own repository and its own index. This means that the author instance index is different from public instance indexes. Any content that has not been activated to a public instance cannot be found when running a search on that public instance.

Querying

Use a query to search the index. You can write the query in SQL-2 (grammar, examples). A query returns a result set which you can display on a Web page.

Tools > JCR Queries is a good place to test queries. When you get the result set you want, you can implement the query in code. Select the workspace to search from the dropdown.

Example queries written in SQL-2:

Pages that have the word "article".

SELECT * from [mgnl:page] AS t WHERE 
    ISDESCENDANTNODE([/demo-project]) AND 
    contains(t.*, 'article')

Modules that provide commands. This query finds folders named commands in the config workspace and returns the full path to the child nodes.

select * from [mgnl:content] as t where 
    ISDESCENDANTNODE([/modules]) and 
    name(t) = 'commands'

Folders and documents under /demo-docs/sheet-music in the dms workspace.

select * from [nt:base] as t where 
    ([jcr:primaryType] = 'mgnl:contentNode' OR 
    [jcr:primaryType] = 'mgnl:content') AND 
    ISDESCENDANTNODE([/demo-docs/sheet-music]) 
order by [t].title asc

See JCR Query Cheat Sheet for more examples.

Joins are slow in JCR SQL-2. See Queries in Jackrabbit 2.4.x for issue description and hints.

STK search

The Standard Templating Kit (STK) is a set of common templates and functionality. It also includes a complete example of search.

What follows is a walkthrough of the STK search, starting from the search box and ending on the results page.

We recommend the STK search as a best practice over the non-STK search functionality. It will get you started faster.

Try the STK search example from the user interface. In the demo-project and demo-features example sites, the search box is in the top right corner.

The search area script in Templating Kit > Templates (/templating-kit/pages/global/search) renders the box on the page:

<div id="search-box">
    <h6>${i18n['accessibility.header.search']}</h6>
    <form action="${stkfn.searchPageLink(content)!}" >
    <div>
        <label for="searchbar">${i18n['accessibility.header.searchFor']}</label>
        <input required="required" 
               id="searchbar" 
               name="queryStr" 
               type="text" 
               value="${ctx.queryStr!?html}" />
        <input class="button" 
               type="submit" 
               value="${i18n['button.label.search']}" />
    </div>
    </form>
</div>

When a user types a search term into the box and submits the form, the term is assigned to the queryStr parameter.

Notice how the box is prefilled with the previously run search term. The template script reads it from the queryStr context attribute, available through the ctx templating support object.

Query execution in the model

The information sent in the form is processed in the SearchResultModel Java class (Git).
The class gets the search term from the request using the getQueryStr method.

public String getQueryStr() {
    return MgnlContext.getParameter("queryStr");
}

The search term is then embedded into a hard-coded SQL query pattern. The value of the jcr:path parameter is set to the site root node.

select * from nt:base where 
    jcr:path like ''{0}/%'' and 
    contains(*, ''{1}'') 
order by jcr:path

The model class executes the query and gets results back from the JCR repository. The results are stored in the model's result property, which is available to the template script for rendering the results on the page.
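The doubled single quotes and the {0} and {1} placeholders in the query pattern suggest that it is filled in with java.text.MessageFormat, where '' stands for one literal quote. The following is a minimal sketch under that assumption, not the actual SearchResultModel code. The class name, the buildQuery method and the escapeTerm helper are hypothetical and exist only for illustration.

```java
import java.text.MessageFormat;

class SearchQueryPattern {

    // Pattern copied from the walkthrough. In MessageFormat, '' produces one literal single quote.
    static final String PATTERN =
            "select * from nt:base where jcr:path like ''{0}/%'' and contains(*, ''{1}'') order by jcr:path";

    // Hypothetical helper: doubles single quotes so the term cannot end the string literal early.
    // It does not handle other characters that have a meaning in full-text expressions.
    static String escapeTerm(String term) {
        return term.replace("'", "''");
    }

    // {0} receives the site root path, {1} receives the search term.
    static String buildQuery(String siteRoot, String queryStr) {
        return MessageFormat.format(PATTERN, siteRoot, escapeTerm(queryStr));
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("/demo-project", "article"));
    }
}
```

Running the sketch with the site root /demo-project and the term article prints a statement that restricts the search to nodes below /demo-project and matches the term in any property.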

Displaying the results

The stkSearchResult component displays the result to the user. The modelClass property in the component definition is set to the SearchResultModel. This makes the results of the search execution available to the template script. You can find the component definition in Templating Kit > Template Definitions > /components/features/stkSearchResult.

The component definition references a Freemarker script /templating-kit/components/features/searchResult.ftl. The script loops through the result set, rendering each result as a list item.

[#assign result = model.result!]

[#list result as item]
    [#-- Macro: Item Assigns --]
    [@assignItemValues item=item/]

    [#-- Rendering: Item rendering --]
    <li>
        <h2><a href="${itemLink}" >${itemTitle}</a></h2>
        [#if hasDate || hasAuthor || hasCategory]
            <div class="text-meta" role="contentinfo">
                <ul class="text-data">
                    [#if hasDate]
                        <li class="date">${itemDate?date?string.medium}</li>
                    [/#if]
                    [#if hasAuthor]
                        <li class="author">${itemAuthor!}</li>
                    [/#if]
                    [#if hasCategory]
                        <li class="cat">${i18n['search.category']} ${itemCategory!}</li>
                    [/#if]
                </ul>
            </div><!-- end text-meta -->
        [/#if]
        <p>${itemText!}</p>
    </li>
[/#list]

The script renders the following details about each search hit:

  • Title of the page, rendered as a link
  • Date last modified
  • Author
  • Category
  • Snippet of item text with the search term highlighted

Finally, the script renders another search box and prefills it with the search term.

Non-STK search

This search functionality does not use or require the Standard Templating Kit and is present in all Magnolia editions.

QueryManager

QueryManagerImpl is a utility class that allows you to execute queries in code. To execute a search, first assign the query statement to a variable. You then get an instance of QueryManager from the JCRSession.

Here is an example of using QueryManager in a Groovy script. The query finds pages that have the word "Article" in their title. To execute the statement, get a JCR session for the website workspace. Pass the query statement and the language used (JCR-SQL2) as parameters. You will get a QueryResultImpl object in return. The Groovy code iterates through the result set and prints the path of each node.

queryString = "select * from [mgnl:page] as p where contains([p].title,'Article')"
q = ctx.getJCRSession("website").getWorkspace().getQueryManager().createQuery(queryString, "JCR-SQL2")
queryResult = q.execute()
queryResult.nodes.each {
    println it.path
}

The ctx templating support object is a shortcut for MgnlContext and stands for Magnolia context. It is an abstraction layer that represents the current process such as a request for a Web page. The context query is recommended as a best practice for executing queries programmatically when not using the STK.

simpleSearch templating function

The simpleSearch templating function allows you to run a search from a template script without the need for a custom model class. The results are available to the script in a collection variable of your choice. The function belongs to the CMS TemplatingFunctions class which is exposed to template scripts as cmsfn.

The function takes four parameters:

  • workspace - workspace to search, such as website or dms.
  • statement - a set of labels the target has to contain. Insert the labels as one string, separated by commas.
  • returnItemType - node type to return, such as mgnl:page.
  • startPath - path to search. To search without a path restriction, set it to a forward slash (/).

Example: Usage in a Freemarker script

[#assign results = cmsfn.simpleSearch("website", "interesting,article", "mgnl:page", "/demo-project") /]
[#list results as result]
    <p>${result!}</p>
[/#list]
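If the labels come from free text typed by a visitor, they first have to be turned into the comma-separated statement string that the function expects. The following is a minimal sketch of that step in plain Java. The class and method names are hypothetical, and splitting on whitespace is an assumption about how you want to treat the input, not documented simpleSearch behavior.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class SimpleSearchStatement {

    // Hypothetical helper: splits the input on whitespace, drops empty pieces
    // and joins the remaining words with commas.
    static String fromUserInput(String input) {
        return Arrays.stream(input.trim().split("\\s+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(fromUserInput("interesting article"));
    }
}
```

The resulting string can then be passed as the statement parameter, for example from a model class that exposes it to the template script.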

Multisite

All sites that run on the same Magnolia instance store their content in the same repository. When you execute a query in the website workspace you will get results from all sites.

In order to limit the search to a specific site, add the jcr:path parameter to the query and set its value to the root node of the site. The example below searches the demo-project website only.

select * from nt:base where jcr:path like '/demo-project/%' order by jcr:path

Multilanguage

If you use the single-tree approach to enter multilanguage content, the system stores all language variants under the same page node. This means there is no separate hierarchy for each language and site visitors will get search results from all language variants at the same time.

Single language results

To return results from one language only:

  • Maintain each language tree separately so that you can limit the search to a particular path.
  • Index the content on the public instance using an external search implementation such as Google Custom Search. Use a language identifier in the URL, such as example.com/de/article.html, so that the external search can be configured to return results from that path only.
  • If you are using the single-tree approach, customize the search query by adding a language parameter. In the JCR repository, language variants are stored in nodes whose names have a language suffix, such as subtitle_de.
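For the single-tree option, one way to add a language parameter is to restrict the full-text condition to the language-suffixed property. The following is a minimal sketch that only builds the SQL-2 statement. The class and method names are hypothetical, and the choice of the mgnl:page node type and the subtitle property is an assumption for illustration. It also assumes that every language variant carries a suffix, which may not hold for the default language.

```java
import java.util.Locale;

class LocalizedSearchQuery {

    // Builds a property name that follows the suffix convention, for example subtitle_de.
    static String localizedProperty(String baseName, Locale locale) {
        return baseName + "_" + locale.getLanguage();
    }

    // Hypothetical helper: limits the contains() condition to one language variant of a property.
    static String buildQuery(String baseName, Locale locale, String term) {
        String property = localizedProperty(baseName, locale);
        // Double single quotes so the term cannot end the string literal early.
        String safeTerm = term.replace("'", "''");
        return "select * from [mgnl:page] as p where contains(p.[" + property + "], '" + safeTerm + "')";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("subtitle", Locale.GERMAN, "Artikel"));
    }
}
```

You can paste the printed statement into Tools > JCR Queries to check whether it returns the expected language variant before using it in code.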

The Jackrabbit wiki Search page includes a description of the process for indexing Chinese, Korean, and Japanese.

Special characters

Jackrabbit stores all character data (node names and values) in Unicode. This ensures that special characters such as accents and umlauts are indexed and can be used in search. Issues with special characters are often due to character set conversion problems in the application server. See URI encoding in Tomcat.

Security

Search within Magnolia is access controlled. Search results include only content the user has permission to access. Permissions are controlled through Security. When you execute a query in Magnolia context (MgnlContext), contextual factors such as the current user's permissions are taken into account. If you do not have permission to access the items you are querying, they will not show up in the results. Contrast this with running the same query in SystemContext, which provides full access.

External search

Because the data on a page is aggregated from many sources, the Jackrabbit search functionality does not guarantee results for all searches. For example, a search query that only looks for a particular term in the website workspace will not find a term that is stored in the Data module only. In a real-life scenario (such as an online shop) product descriptions and images may be stored in nodes in the Data workspace. To ensure that a single search executes two queries and then aggregates the results, you can use the External Indexing module.

External Indexing module

Page content can be stored in disparate workspaces (such as website, dms, commenting, data) and can be of different types (such as Web pages, documents, forum threads, shop products). The External Indexing module makes content available for external third-party indexers by providing the content in a uniform way (plain text, object, URL etc.). In addition, the External Indexing module can integrate semantic search implementations for features such as suggestions for pages related to the page the visitor is viewing. For more information, see Opening the door to semantic search.

Apache Solr module

The Magnolia Solr module provides integration with the Apache Solr search platform. Solr uses the same Lucene library for full-text indexing as the default Jackrabbit search. In addition, Solr provides advanced features such as faceted search, distributed search, and index replication. Solr uses REST-like HTTP/XML and JSON APIs. See the beginner's tutorial on Solr.

Download the External Indexing module and the Apache Solr indexing module here.