Magnolia 4.5 reached end of life on June 30, 2016. This branch is no longer supported. See End-of-life policy.
Magnolia search functionality is provided by the Jackrabbit repository. An indexer based on Apache Lucene extracts text from content nodes and properties. Content of Web pages and documents is included in the index. To search the index, you can write queries in a query language supported by the JCR repository. You can test the queries in AdminCentral and execute them in code. The Standard Templating Kit includes a complete example of site search. This article explains how the STK default search implementation works and walks you through the example.
There are two processes involved in making content searchable: indexing and querying. Indexing collects and parses Web pages and documents and stores the data in an index to make information retrieval fast and accurate. Querying searches the data in the index and returns results.
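The split between indexing and querying can be illustrated with a toy inverted index. This is only a sketch of the general idea, not how Jackrabbit or Lucene work internally; the class name ToyIndex and its methods are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

/** Toy inverted index: maps each lowercase word to the paths of the documents that contain it. */
final class ToyIndex {
    private final Map<String, TreeSet<String>> postings = new HashMap<>();

    /** Indexing step: split the text into words and record the document path under each word. */
    void index(String path, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, w -> new TreeSet<>()).add(path);
            }
        }
    }

    /** Querying step: look the word up in the index instead of scanning every document. */
    List<String> query(String word) {
        TreeSet<String> hits = postings.get(word.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<String>emptyList() : new ArrayList<>(hits);
    }
}
```

The work of parsing text happens once, at indexing time, so a query is reduced to a map lookup. A real indexer adds tokenization rules, ranking, and persistence on top of this idea.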
Magnolia search is based on the default Jackrabbit search implementation. Jackrabbit uses an Apache Lucene-based indexer to process the data stored in the Java Content Repository. An index makes it faster to retrieve requested portions of the data. Node names and property values are indexed immediately as they are stored in the repository. Text from documents is extracted in a background process, which makes document content searchable after a short delay.
You can find the physical index folders and files in the webapps installation directory at repositories/magnolia/workspaces/*/index. See the Jackrabbit Search wiki to learn how to configure search indexing and the options available with the implementation. The workspace.xml file mentioned on the wiki is under the repositories/magnolia/workspaces/<name of workspace> directory.
Each Magnolia instance has its own repository and its own index. This means that the author instance index is different from public instance indexes. Any content that has not been activated to a public instance cannot be found when running a search on that public instance.
Tools > JCR Queries is a good place to test queries. When you get the result set you want, you can implement the query in code. Select the workspace to search from the dropdown.
Example queries written in SQL-2:
Pages that have the word "article".
Folders and documents under
/demo-docs/sheet-music in the
See JCR Query Cheat Sheet for more examples.
Joins are slow in JCR SQL-2. See Queries in Jackrabbit 2.4.x for issue description and hints.
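The two kinds of query described in the list above (a full-text match and a path restriction) could be written along the following lines. This is a sketch, not the original examples: the helper class ExampleQueries is hypothetical, and using mgnl:content as the page node type is an assumption about this Magnolia version. CONTAINS and ISDESCENDANTNODE are standard JCR-SQL2 constraints.

```java
/** Builds two simple JCR-SQL2 statements as plain strings for pasting into a query tool. */
final class ExampleQueries {

    /**
     * Full-text query: nodes of the given type that contain the word in any property.
     * containingWord("mgnl:content", "article") returns
     * SELECT * FROM [mgnl:content] AS t WHERE CONTAINS(t.*, 'article')
     */
    static String containingWord(String nodeType, String word) {
        // The word is inserted verbatim, so this is only suitable for fixed test strings.
        return "SELECT * FROM [" + nodeType + "] AS t WHERE CONTAINS(t.*, '" + word + "')";
    }

    /**
     * Path query: all nodes of the given type below the given absolute path.
     * below("nt:base", "/demo-docs/sheet-music") returns
     * SELECT * FROM [nt:base] AS t WHERE ISDESCENDANTNODE(t, '/demo-docs/sheet-music')
     */
    static String below(String nodeType, String path) {
        // nt:base matches every node type, so folders and documents are both returned.
        return "SELECT * FROM [" + nodeType + "] AS t WHERE ISDESCENDANTNODE(t, '" + path + "')";
    }
}
```

Neither statement uses a join, which keeps them clear of the performance issue noted above.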
The Standard Templating Kit (STK) is a set of common templates and functionality. It also includes a complete example of search.
What follows is a walkthrough of the STK search, starting from the search box and ending on the results page.
We recommend the STK search as a best practice over the non-STK search functionality. It will get you started faster.
Try the STK search example from the user interface. In the demo-project and demo-features example sites, the search box is in the top right corner.
The search area script (in Templating Kit > Templates, /templating-kit/pages/global/search) renders the box on the page.
When a user types a search term into the box and submits the form, the term is assigned to the
Notice how the box is prefilled with the previously run search term. The template script reads it from the queryStr context attribute, available through the ctx templating support object.
The search term is then embedded into a hard-coded SQL query pattern. The value of the jcr:path parameter is set to the site root node.
The model class executes the query and gets results back from the JCR repository. The results are stored in an array named results, which is available to the template script for rendering the results on the page.
The stkSearchResult component displays the results to the user. The modelClass property in the component definition is set to SearchResultModel. This makes the results of the search execution available to the template script. You can find the component definition in Templating Kit > Template Definitions >
The component definition references a Freemarker script, /templating-kit/components/features/searchResult.ftl. The script loops through the result set, rendering each result as a list item.
The script renders the following details about each search hit:
Finally, the script renders another search box and prefills it with the search term.
This search functionality does not use or require the Standard Templating Kit and is present in all Magnolia editions.
QueryManagerImpl is a utility class that allows you to execute queries in code. To execute a search, first assign the query statement to a variable. You then get an instance of QueryManager from the JCRSession.
Here is an example of using QueryManager in a Groovy script. The query finds pages that have the word "Article" in their title. To execute the statement, get a JCR session for the website workspace. Pass the query statement and the language used (JCR-SQL2) as parameters. You will get a result object in return. The Groovy code iterates through the result set.
ctx templating support object is a shortcut for
and stands for . It is an abstraction layer that represents the current process such as a request for a Web page. The context query is recommended as a best practice for executing queries programmatically when not using the STK.
simpleSearch templating function allows you to run a search from a template script without the need for a custom model class. The results are available to the script in a collection variable of your choice. The function belongs to the CMS
class which is exposed to template scripts as
The function takes four parameters:
statement - a set of labels the target has to contain. Insert the labels as one string, separated by commas.
returnItemType - to return such as
startPath - path to search. To search without a path restriction, set it to a forward slash (/).
Example: Usage in a Freemarker script
All sites that run on the same Magnolia instance store their content in the same repository. When you execute a query in the website workspace, you will get results from all sites.
To limit the search to a specific site, add the jcr:path parameter to the query and set its value to the root node of the site. The example below searches the demo-project website only.
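A minimal sketch of such a site-limited statement follows, assuming the older JCR-SQL syntax, in which jcr:path can be used in a LIKE constraint. The class SiteSearchQuery and its method names are hypothetical and are not part of Magnolia or the STK.

```java
/** Builds a JCR-SQL full-text statement that is restricted to one site through jcr:path. */
final class SiteSearchQuery {

    /** Doubles single quotes so the value cannot end the SQL string literal early. */
    static String escapeLiteral(String value) {
        return value.replace("'", "''");
    }

    /**
     * @param siteRoot absolute path of the site root node, for example /demo-project
     * @param term     search term typed by the visitor
     */
    static String build(String siteRoot, String term) {
        // Strip a trailing slash so the LIKE pattern never contains "//".
        String root = siteRoot.endsWith("/")
                ? siteRoot.substring(0, siteRoot.length() - 1)
                : siteRoot;
        return "SELECT * FROM nt:base"
                + " WHERE jcr:path LIKE '" + escapeLiteral(root) + "/%'"
                + " AND CONTAINS(*, '" + escapeLiteral(term) + "')";
    }
}
```

Calling build("/demo-project", "article") yields SELECT * FROM nt:base WHERE jcr:path LIKE '/demo-project/%' AND CONTAINS(*, 'article'). The pattern matches nodes below the site root, not the root node itself. The escaping covers the SQL string literal only; it does not neutralize full-text operators inside the search term. In JCR-SQL2 the same restriction would typically be expressed with ISDESCENDANTNODE instead of jcr:path.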
If you use theto enter multilanguage content, the system stores all language variants under the same page node. This means there is no separate hierarchy for each language and site visitors will get search results from all language variants at the same time.
To return results from one language only:
example.com/de/article.htmlthe external search can be configured to return results from that path only.)
The Jackrabbit wiki Search page includes a description of the process for indexing Chinese, Korean, and Japanese.
Jackrabbit stores all character data (node names and values) in Unicode. This ensures that special characters such as accents and umlauts are indexed and can be used in search. Issues with special characters are often due to character set conversion problems in the application server. See URI encoding in Tomcat.
Search within Magnolia is access controlled. Search results include only content the user has permission to access. Permissions are controlled through Security. When you execute a query in Magnolia context (MgnlContext), contextual factors such as the current user's permissions are taken into account. If you do not have permission to access the items you are querying, they will not show up in the results. Contrast this with running the same query in SystemContext, which provides full access.
Because the data on a page can be aggregated from many sources, the Jackrabbit search functionality does not guarantee results for all searches. For example, a search query that only looks for a particular term in the website workspace will not find a term that is stored in the Data module only. In a real-life scenario (such as an online shop), product descriptions and images may be stored in nodes in the data workspace. To have a single search execute two queries and then aggregate the results, you can use the External Indexing module.
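The idea of running two queries and aggregating their results can be sketched independently of any module. The following is an illustration under stated assumptions, not the External Indexing module's implementation: the Hit and ResultAggregator classes are hypothetical, and it assumes each query yields a relevance score that can be compared across workspaces, which is not guaranteed when the scores come from separate indexes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** One search hit: where it was found and how relevant the index considered it. */
final class Hit {
    final String workspace;
    final String path;
    final double score;

    Hit(String workspace, String path, double score) {
        this.workspace = workspace;
        this.path = path;
        this.score = score;
    }
}

final class ResultAggregator {
    /** Combines two result lists into one, highest score first. The inputs are not modified. */
    static List<Hit> merge(List<Hit> first, List<Hit> second) {
        List<Hit> all = new ArrayList<>(first);
        all.addAll(second);
        // List.sort is stable, so hits with equal scores keep their original relative order.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return all;
    }
}
```

Each hit keeps its workspace name so that the rendering code can still tell a page in the website workspace from a product in the data workspace.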
Page content can be stored in disparate workspaces (such as data) and can be of different types (such as Web pages, documents, forum threads, shop products). The External Indexing module makes content available for external third-party indexers by providing the content in a uniform way (plain text, object, URL, etc.). In addition, the External Indexing module can integrate semantic search implementations for features such as suggestions for pages related to the page the visitor is viewing. For more information, see Opening the door to semantic search.
The Magnolia Solr module provides integration with the Apache Solr search platform. Solr uses the same Lucene library for full-text indexing as the default Jackrabbit search. In addition, Solr provides advanced features such as faceted search, distributed search, and index replication. Solr uses REST-like HTTP/XML and JSON APIs. See the beginner's tutorial on Solr.
Download the External Indexing module and the Apache Solr indexing module here.