Magnolia 5.3 reached end of life on June 30, 2017. This branch is no longer supported; see End-of-life policy.

Magnolia search functionality is provided by the Jackrabbit repository. An indexer extracts text from nodes and properties. The content of pages and documents is included in the index. To search the index, write queries in a query language supported by the JCR repository. You can test your queries in the JCR Queries app and execute them in code. The Standard Templating Kit includes a complete example of site search. This document explains how the STK default search implementation works.

Making content searchable

There are two processes involved in making content searchable: indexing and querying. Indexing collects and parses Web pages and documents and stores the data in an index to make information retrieval fast and accurate. Querying searches the data in the index and returns results.

Indexing

Magnolia search is based on the default Jackrabbit search implementation. Jackrabbit uses an Apache Lucene-based indexer to process the data stored in the Java Content Repository. An index makes it faster to retrieve requested portions of the data. Node names and property values are indexed immediately as they are stored in the repository. Text from documents is extracted in a background process, which makes document content searchable after a short delay.

You can find the physical index folders and files in the webapps installation directory at repositories/magnolia/workspaces/*/index. See the Jackrabbit Search wiki to learn how to configure the search indexing and options available with the implementation. The workspace.xml file mentioned on the wiki is under the repositories/magnolia/workspaces/<name of workspace> directory.

Magnolia uses a custom Jackrabbit/Lucene indexing configuration. The custom configuration excludes all properties that start with jcr: or mgnl: from the index. This means you get fewer results but those results are more relevant. The configuration also boosts the title property of the mgnl:page node type since page titles are important. The indexing configuration file is in the Magnolia Core module under src/main/resources/info/magnolia/jackrabbit; org.apache.jackrabbit.core.query.lucene.SearchIndex loads it from the classpath. See Limiting search results to relevant matches.

Each Magnolia instance has its own repository and its own index. This means that the author instance index is different from public instance indexes. Any content that has not been activated to a public instance cannot be found when running a search on that public instance.

Querying

Use a query to search the index. You can write the query in SQL-2 (grammar, examples). A query returns a result set which you can display on a page. The JCR Queries app is a good place to test queries: select the workspace to search from the dropdown and enter your query. When you get the result set you want, you can implement the query in code.

Example queries

The following queries are written in SQL-2. See JCR Query Cheat Sheet for more examples.

1) Find pages that contain the word "article".

Workspace: website

SELECT * FROM [mgnl:page] AS t WHERE 
   ISDESCENDANTNODE([/demo-project]) AND 
   CONTAINS(t.*, 'article')

2) Find modules that have commands. This query looks for a folder named commands in the module configuration.

Workspace: config

SELECT * FROM [mgnl:content] AS t WHERE 
   ISDESCENDANTNODE([/modules]) AND 
   name(t) = 'commands'

3) Find assets that are not JPG images under the /demo-project path in the DAM.

Workspace: dam

SELECT * FROM [nt:base] AS t WHERE 
   ([jcr:primaryType] = 'mgnl:asset' AND 
   [type] <> 'jpg') AND 
   ISDESCENDANTNODE([/demo-project]) 
ORDER BY [t].title asc

Joins are slow in JCR SQL-2. See Queries in Jackrabbit 2.4.x for issue description and hints.

STK search

The Standard Templating Kit (STK) is a set of common templates and functionality. It also includes a complete example of search. What follows is a walkthrough of the STK search, starting from the search box and ending on the results page. We recommend the STK search as a best practice over the non-STK search functionality. It will get you started faster.

Try the STK search example from the user interface. In the demo-project and demo-features example sites, the search box is in the top right corner.

The search area script (STK > Templates > /templating-kit/pages/global/search) renders the box on the page:

<div id="search-box">
    <h6>${i18n['accessibility.header.search']}</h6> 
    <form action="${stkfn.searchPageLink(content)!}" >
    <div>
        <label for="searchbar">${i18n['accessibility.header.searchFor']}</label>
        <input required="required" 
               id="searchbar" 
               name="queryStr" 
               type="text" 
               value="${ctx.queryStr!?html}" />
        <input class="button" 
               type="submit" 
               value="${i18n['button.label.search']}" />
    </div>
    </form>
</div>

When a user types a search term into the box and submits the form, the term is assigned to the queryStr parameter.

Notice how the box is prefilled with the previously run search term. The template script reads it from the queryStr context attribute, available through the ctx templating support object.

Query execution in the model

The information sent in the form is processed in the SearchResultModel Java class (Git).
The class gets the search term from the request using the getQueryStr method.

public String getQueryStr() {
    return MgnlContext.getParameter("queryStr");
}

The search term is then embedded into a hard-coded SQL query pattern. The value of the jcr:path parameter is set to the site root node.

SELECT * FROM nt:base WHERE 
   jcr:path LIKE ''{0}/%'' AND 
   CONTAINS(*, ''{1}'') 
ORDER BY jcr:path
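
The doubled single quotes and the {0} and {1} placeholders follow the java.text.MessageFormat pattern syntax, in which '' stands for one literal quote. The sketch below shows how such a pattern could be filled in. It is not the STK source: the class name, the escapeTerm helper, and the quote-doubling step are assumptions added for illustration.

```java
import java.text.MessageFormat;

final class SearchQueryPatternSketch {

    // Pattern from the text above, joined onto one line.
    // In MessageFormat, '' produces a single literal quote in the output.
    static final String PATTERN =
        "SELECT * FROM nt:base WHERE jcr:path LIKE ''{0}/%'' "
        + "AND CONTAINS(*, ''{1}'') ORDER BY jcr:path";

    // Hypothetical helper: doubles single quotes so that the search term
    // cannot terminate the SQL string literal early.
    static String escapeTerm(String term) {
        return term.replace("'", "''");
    }

    // Fills {0} with the site root path and {1} with the escaped search term.
    static String buildStatement(String siteRoot, String queryStr) {
        return MessageFormat.format(PATTERN, siteRoot, escapeTerm(queryStr));
    }

    public static void main(String[] args) {
        // Prints: SELECT * FROM nt:base WHERE jcr:path LIKE '/demo-project/%'
        //         AND CONTAINS(*, 'article') ORDER BY jcr:path   (on one line)
        System.out.println(buildStatement("/demo-project", "article"));
    }
}
```

Escaping the term before formatting matters because the value comes straight from a request parameter. Characters that have a meaning in the full-text search expression itself would need separate handling, which this sketch does not attempt.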

The model class executes the query and gets results back from the JCR repository. The results are stored in the model, where the template script reads them as model.result to render the results page.

Displaying the results

The stkSearchResult component displays the result to the user. The modelClass property in the component definition is set to the SearchResultModel. This makes the results of the search execution available to the template script. You can find the component definition in STK > Template Definitions > /components/features/stkSearchResult.

Node name                  Value
components
  features
    stkSearchResult
      description          paragraphs.features.stkSearchResult.description
      dialog               standard-templating-kit:components/features/stkSearchResult
      i18nBasename         info.magnolia.module.templatingkit.messages
      modelClass           info.magnolia.module.templatingkit.search.SearchResultModel
      renderType           stk
      templateScript       /templating-kit/components/features/searchResult.ftl
      title                paragraphs.features.stkSearchResult.title

The component definition references a Freemarker script /templating-kit/components/features/searchResult.ftl. The script loops through the result set, rendering each result as a list item.

[#assign result = model.result!]

[#list result as item]
    [#-- Macro: Item Assigns --]
    [@assignItemValues item=item/]

    [#-- Rendering: Item rendering --]
    <li>
        <h2><a href="${itemLink}" >${itemTitle}</a></h2>
        [#if hasDate || hasAuthor || hasCategory]
            <div class="text-meta" role="contentinfo">
                <ul class="text-data">
                    [#if hasDate]
                        <li class="date">${itemDate?date?string.medium}</li>
                    [/#if]
                    [#if hasAuthor]
                        <li class="author">${itemAuthor!}</li>
                    [/#if]
                    [#if hasCategory]
                        <li class="cat">${i18n['search.category']} ${itemCategory!}</li>
                    [/#if]
                </ul>
            </div><!-- end text-meta -->
        [/#if]
        <p>${itemText!}</p>
    </li>
[/#list]

The script renders the following details about each search hit:

  • Title of the page, rendered as a link
  • Date last modified
  • Author
  • Category
  • Snippet of item text with the search term highlighted

Finally, the script renders another search box and prefills it with the search term.

Non-STK search

This search functionality does not use or require the Standard Templating Kit and is present in all Magnolia editions.

Using QueryManager

Use javax.jcr.query.QueryManager to create a javax.jcr.query.Query on which you can operate.

Here is an example of using QueryManager in a Groovy script. The query finds pages that have the word "Article" in their title. To execute the statement, get a JCR session for the website workspace. Pass the query statement and the language used (JCR-SQL2) as parameters. You will get a Query object in return. The Groovy code iterates through the result set.

queryString = "select * from [mgnl:page] as p where contains([p].title,'Article')"
q = ctx.getJCRSession("website").getWorkspace().getQueryManager().createQuery(queryString, "JCR-SQL2")
queryResult = q.execute()
queryResult.nodes.each {
    println it.path
}

The ctx templating support object is a shortcut for MgnlContext and stands for Magnolia context. It is an abstraction layer that represents the current process such as a request for a Web page. The context query is recommended as a best practice for executing queries programmatically when not using the STK.

The same example in Java looks like this:

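// Requires javax.jcr.{Node, NodeIterator, Session}, javax.jcr.query.Query, MgnlContext and RepositoryConstants.
String queryStatement = "select * from [mgnl:page] as p where contains([p].title,'Article')";
Session jcrSession = MgnlContext.getJCRSession(RepositoryConstants.WEBSITE);
final Query query = jcrSession.getWorkspace().getQueryManager().createQuery(queryStatement, Query.JCR_SQL2);
final NodeIterator nodeIterator = query.execute().getNodes();
// Iterate over the matching nodes, for example to print each path as the Groovy script does.
while (nodeIterator.hasNext()) {
    Node node = nodeIterator.nextNode();
    System.out.println(node.getPath());
}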

Multisite search

All sites that run on the same Magnolia instance store their content in the same repository. When you execute a query in the website workspace you will get results from all sites.

In order to limit the search to a specific site, add the jcr:path parameter to the query and set its value to the root node of the site. The example below searches the demo-project website only.

SELECT * FROM nt:base WHERE 
   jcr:path LIKE '/demo-project/%' 
ORDER BY jcr:path

Multilanguage search

If you use the single-tree approach to enter multilanguage content, the system stores all language variants under the same page node. This means there is no separate hierarchy for each language and site visitors will get search results from all language variants at the same time.

Single language results

Options for returning results from one language only:

  • Maintain each language tree separately so that you can limit the search to a particular path.
  • Index the content on the public instance using an external search implementation such as Google Custom Search. If you use a language identifier in the URL, such as example.com/de/article.html, the external search can be configured to return results from that path only.
  • If you are using the single-tree approach, customize the search query by adding a language parameter. In the JCR repository, language variants are stored in nodes whose names have a language suffix, such as subtitle_de.
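
As a minimal sketch of the third option, a query can limit the full-text constraint to properties that carry the language suffix. The helper below only builds the SQL-2 statement string. The class and method names are hypothetical, and so is the choice of the title and subtitle properties; which properties to search depends on your own dialogs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class LanguageQuerySketch {

    // Builds the suffixed property name, for example subtitle + de -> subtitle_de.
    static String localizedName(String property, String language) {
        return property + "_" + language;
    }

    // Builds a SQL-2 statement that searches only the given language variant
    // of each listed property below the site root.
    static String buildStatement(String siteRoot, List<String> properties,
                                 String language, String term) {
        // Double single quotes so the term stays inside the string literal.
        String safeTerm = term.replace("'", "''");
        String constraints = properties.stream()
            .map(p -> "CONTAINS(t.[" + localizedName(p, language) + "], '" + safeTerm + "')")
            .collect(Collectors.joining(" OR "));
        return "SELECT * FROM [nt:base] AS t WHERE ISDESCENDANTNODE([" + siteRoot
            + "]) AND (" + constraints + ")";
    }

    public static void main(String[] args) {
        // Illustrative values only: searches title_de and subtitle_de for "Artikel".
        System.out.println(
            buildStatement("/demo-project", Arrays.asList("title", "subtitle"), "de", "Artikel"));
    }
}
```

The resulting statement can be tried out in the JCR Queries app before it is used in code.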

The Jackrabbit wiki Search page includes a description of the process for indexing Chinese, Korean, and Japanese.

Special characters

Jackrabbit stores all character data (node names and values) in Unicode. This ensures that special characters such as accents and umlauts are indexed and can be used in search. Issues with special characters are often due to character set conversion problems in the application server. See URI encoding in Tomcat.
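
A character set conversion problem of this kind can be reproduced with the standard library alone. The sketch below percent-encodes a search term as UTF-8, the way a browser would send it, and then decodes it once with UTF-8 and once with ISO-8859-1. The class name and the sample term are illustrative, and the code does not involve Magnolia or Tomcat.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class QueryStrEncodingSketch {

    // Percent-encodes a value using the named character set.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decodes a percent-encoded value using the named character set.
    static String decode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String term = "M\u00fcnchen";                 // "München"
        String sent = encode(term, "UTF-8");          // M%C3%BCnchen
        // Decoding with the same character set restores the term.
        System.out.println(decode(sent, "UTF-8").equals(term));      // prints true
        // Decoding with ISO-8859-1 turns the two UTF-8 bytes into two characters,
        // so the index is searched for a term that was never stored.
        System.out.println(decode(sent, "ISO-8859-1").equals(term)); // prints false
    }
}
```

If a search for a term with an umlaut returns nothing while the same term without it works, comparing the decoded queryStr value with what the user typed is a quick first check.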

Security

Search within Magnolia is access controlled. Search results include only content the user has permission to access. Permissions are controlled through Security. When you execute a query in Magnolia context (MgnlContext), contextual factors such as the current user's permissions are taken into account. If you do not have permission to the items you are querying, they will not show up in the results. Contrast this with running the same query in SystemContext, which provides full access.

External search

Page content may be aggregated from many sources. Not all of those sources are necessarily persisted in the repository. Therefore, Jackrabbit search does not guarantee results for all searches. For example, a search query that only looks for a particular term in the website workspace will not find a term that is stored in another workspace. In a scenario such as an online shop, product descriptions and images may be stored in the data workspace. To ensure that a single search executes two queries and then aggregates the results, you can use the External Indexing module.

External Indexing module

Page content can be stored in disparate workspaces (such as website, dam, commenting, data) and can be of different types (pages, documents, forum threads, shop products). The External Indexing module (source, builds) makes content available for external third-party indexers by providing the content in a uniform way (plain text, object, URL etc.). The External Indexing module can implement semantic search options for features such as suggestions for pages related to the page the visitor is viewing. For more information, see Opening the door to semantic search.

Apache Solr module

The Magnolia Solr module provides integration with the Apache Solr search platform. Solr uses the same Lucene library for full-text indexing as the default Jackrabbit search. In addition, Solr provides advanced features such as faceted search, distributed search, and index replication. Solr uses REST-like HTTP/XML and JSON APIs. See Solr Quick Start.
