Magnolia search functionality is provided by the Jackrabbit repository. An indexer extracts text from nodes and properties. The content of pages and documents is included in the index. To search the index, write queries in a query language supported by the JCR repository. You can test your queries in the JCR Queries app and execute them in code. The Standard Templating Kit (STK) includes a complete example of site search. This document explains how the default STK search implementation works.
Making content searchable
There are two processes involved in making content searchable: indexing and querying. Indexing collects and parses Web pages and documents and stores the data in an index to make information retrieval fast and accurate. Querying searches the data in the index and returns results.
Magnolia search is based on the default Jackrabbit search implementation. Jackrabbit uses an Apache Lucene-based indexer to process the data stored in the Java Content Repository. An index makes it faster to retrieve requested portions of the data. Node names and property values are indexed as soon as they are stored in the repository. Text from documents is extracted in a background process, so document content becomes searchable after a short delay.
You can find the physical index folders and files in the web application's installation directory at repositories/magnolia/workspaces/*/index. See the Jackrabbit Search wiki to learn how to configure search indexing and which options the implementation offers. The workspace.xml file mentioned on the wiki is in the repositories/magnolia/workspaces/<name of workspace> directory.
Magnolia uses a custom Jackrabbit/Lucene indexing configuration. The custom configuration excludes all properties whose names start with mgnl: from the index. This means you get fewer results, but those results are more relevant. The configuration also boosts the title property of the mgnl:page node type so that matches in page titles rank higher. The indexing configuration file is in the Magnolia Core module, and org.apache.jackrabbit.core.query.lucene.SearchIndex loads it from the classpath. See Limiting search results to relevant matches.
Each Magnolia instance has its own repository and its own index. This means that the author instance index is different from public instance indexes. Any content that has not been activated to a public instance cannot be found when running a search on that public instance.
Use a query to search the index. You can write the query in JCR-SQL2 (grammar, examples). A query returns a result set which you can display on a page. The JCR Queries app is a good place to test queries: select the workspace to search from the dropdown and run the query. When you get the result set you want, you can implement the query in code.
The following queries are written in JCR-SQL2. See the JCR Query Cheat Sheet for more examples.
1) Find pages that contain the word "article".
3) Find assets that are not JPG images under the /demo-project path in the DAM.
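As a hedged sketch, the Java helper below returns JCR-SQL2 statements that match the descriptions of examples 1 and 3; you would pass such a string to QueryManager.createQuery(statement, Query.JCR_SQL2) or paste it into the JCR Queries app. The class and method names are hypothetical, and the DAM query assumes that assets are mgnl:asset nodes with a type property holding the file extension (such as jpg). Check the property names on your own asset nodes before relying on it.

```java
/**
 * Hypothetical helper returning JCR-SQL2 statements for examples 1 and 3.
 */
final class ExampleQueries {

    private ExampleQueries() {
    }

    /** Example 1: pages whose indexed content contains the given word. */
    static String pagesContaining(String word) {
        return "SELECT * FROM [mgnl:page] AS p WHERE CONTAINS(p.*, '" + literal(word) + "')";
    }

    /**
     * Example 3: assets below a DAM folder whose type differs from the given extension.
     * ASSUMPTION: assets are mgnl:asset nodes with a "type" property such as "jpg".
     */
    static String assetsBelowExcludingType(String folder, String extension) {
        return "SELECT * FROM [mgnl:asset] AS a"
                + " WHERE ISDESCENDANTNODE(a, '" + literal(folder) + "')"
                + " AND a.[type] <> '" + literal(extension) + "'";
    }

    /** Doubles single quotes so the value stays inside a JCR-SQL2 string literal. */
    static String literal(String value) {
        return value.replace("'", "''");
    }

    public static void main(String[] args) {
        System.out.println(pagesContaining("article"));
        // Run this one against the DAM workspace.
        System.out.println(assetsBelowExcludingType("/demo-project", "jpg"));
    }
}
```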
Joins are slow in JCR-SQL2. See Queries in Jackrabbit 2.4.x for a description of the issue and hints.
The Standard Templating Kit (STK) is a set of common templates and functionality. It also includes a complete example of search. What follows is a walkthrough of the STK search, starting from the search box and ending with the results page. We recommend the STK search over the non-STK search functionality as a best practice because it gets you started faster.
Try the STK search example from the user interface. In the demo-project and demo-features example sites, the search box is in the top right corner.
The search area script (in STK > Templates at /templating-kit/pages/global/search) renders the box on the page.
When a user types a search term into the box and submits the form, the term is assigned to the
Notice how the box is prefilled with the previously run search term. The template script reads it from the queryStr context attribute, available through the ctx templating support object.
Query execution in the model
The search term is then embedded into a hard-coded SQL query pattern. The value of the jcr:path parameter is set to the site root node.
The model class executes the query and gets results back from the JCR repository. The results are stored in an array named results, which is available to the template script for rendering the results on the page.
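The following Java sketch is hypothetical and is not the STK source: it illustrates embedding a term and a site root into a hard-coded pattern, using JCR-SQL2 with ISDESCENDANTNODE in the role of the jcr:path restriction. Because the search term comes straight from the visitor, the sketch doubles single quotes before embedding it, so input such as O'Reilly cannot break out of the string literal, and it skips the query when the term is blank. The class name, method name, and pattern are assumptions.

```java
import java.util.Locale;

/**
 * Hypothetical sketch (not the STK source) of embedding a search term and
 * a site root into a hard-coded JCR-SQL2 pattern.
 */
final class SiteSearchQuery {

    // ASSUMPTION: illustrative pattern. ISDESCENDANTNODE restricts the search
    // to the site, the way the jcr:path parameter does in the STK model.
    private static final String PATTERN =
            "SELECT * FROM [mgnl:page] AS p"
            + " WHERE ISDESCENDANTNODE(p, '%s') AND CONTAINS(p.*, '%s')";

    private SiteSearchQuery() {
    }

    /** Returns the statement, or null when the visitor submitted a blank term. */
    static String build(String siteRoot, String queryStr) {
        if (queryStr == null || queryStr.trim().isEmpty()) {
            return null; // nothing to search for, so no query is run
        }
        return String.format(Locale.ROOT, PATTERN, literal(siteRoot), literal(queryStr.trim()));
    }

    /** Doubles single quotes so user input cannot terminate the string literal. */
    private static String literal(String value) {
        return value.replace("'", "''");
    }

    public static void main(String[] args) {
        System.out.println(build("/demo-project", "O'Reilly"));
    }
}
```

Characters that are special in the full-text syntax itself, such as a leading minus sign for negation or double quotes for phrases, pass through unchanged; escape or strip them too if visitors should not be able to use that syntax.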
Displaying the results
The stkSearchResult component displays the results to the user. The modelClass property in the component definition is set to SearchResultModel, which makes the results of the search available to the template script. You can find the component definition in STK > Template Definitions >
The component definition references a FreeMarker script, /templating-kit/components/features/searchResult.ftl. The script loops through the result set, rendering each result as a list item.
The script renders the following details about each search hit:
- Title of the page, rendered as a link
- Date last modified
- Snippet of item text with the search term highlighted
Finally, the script renders another search box and prefills it with the search term.
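To make the highlighting step concrete, here is a simplified, hypothetical Java helper; it is not the STK implementation, and the class name, method name, and the use of a strong tag as highlight markup are assumptions. It finds the first case-insensitive occurrence of the search term, keeps a fixed number of characters on each side, HTML-escapes the text, and wraps the match in the highlight tag.

```java
/**
 * Hypothetical sketch of building a search snippet with the term highlighted.
 */
final class SnippetHighlighter {

    private SnippetHighlighter() {
    }

    /**
     * Returns the text around the first case-insensitive match of term, at most
     * radius characters on each side, with the match wrapped in {@code <strong>}
     * and everything HTML-escaped. Returns null if the term does not occur.
     */
    static String snippet(String text, String term, int radius) {
        if (term.isEmpty()) {
            return null;
        }
        int at = indexOfIgnoreCase(text, term);
        if (at < 0) {
            return null;
        }
        int matchEnd = at + term.length();
        int start = Math.max(0, at - radius);
        int end = Math.min(text.length(), matchEnd + radius);

        StringBuilder out = new StringBuilder();
        if (start > 0) {
            out.append("...");
        }
        out.append(escapeHtml(text.substring(start, at)))
           .append("<strong>")
           .append(escapeHtml(text.substring(at, matchEnd)))
           .append("</strong>")
           .append(escapeHtml(text.substring(matchEnd, end)));
        if (end < text.length()) {
            out.append("...");
        }
        return out.toString();
    }

    /** Case-insensitive indexOf that compares char by char, so offsets stay aligned. */
    private static int indexOfIgnoreCase(String text, String term) {
        for (int i = 0; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                return i;
            }
        }
        return -1;
    }

    private static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(snippet("Read the latest Article about search", "article", 5));
    }
}
```

Comparing with String.regionMatches instead of lowercasing the whole text keeps the match offsets aligned with the original string, because lowercasing can change the length of some strings.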
This search functionality does not use or require the Standard Templating Kit and is present in all Magnolia editions.
Use javax.jcr.query.QueryManager to create a javax.jcr.query.Query on which you can operate.
Here is an example of using QueryManager in Groovy. The query finds pages that have the word "Article" in their title. To execute the statement, get a JCR session for the website workspace. Pass the query statement and the language used (JCR-SQL2) to QueryManager.createQuery. You get a Query object in return. The Groovy code iterates through the result set.
The ctx templating support object is a shortcut for the current Magnolia context, an abstraction layer that represents the current process, such as a request for a Web page. The context query is recommended as a best practice for executing queries programmatically when not using the STK.
The same example in Java looks like this:
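A minimal Java sketch of the title query, assuming the page title is stored in a title property of mgnl:page nodes. The statement-building part runs with the JDK alone; the JCR calls that would execute it need the JCR API and Magnolia on the classpath, so they appear as comments. Obtaining the session through MgnlContext.getJCRSession("website") is an assumption to verify against your Magnolia version, and TitleQuery and statement are hypothetical names.

```java
/**
 * Hypothetical sketch: builds the "Article in title" query from the example.
 * Only the statement building runs with the plain JDK.
 */
final class TitleQuery {

    /** Same value as javax.jcr.query.Query.JCR_SQL2. */
    static final String JCR_SQL2 = "JCR-SQL2";

    private TitleQuery() {
    }

    /** Pages whose title property contains the given word (full-text match). */
    static String statement(String word) {
        return "SELECT * FROM [mgnl:page] WHERE CONTAINS([title], '"
                + word.replace("'", "''") + "')";
    }

    // With javax.jcr and Magnolia on the classpath, execution would look roughly like:
    //   Session session = MgnlContext.getJCRSession("website");
    //   QueryManager manager = session.getWorkspace().getQueryManager();
    //   Query query = manager.createQuery(TitleQuery.statement("Article"), Query.JCR_SQL2);
    //   NodeIterator pages = query.execute().getNodes();
    //   while (pages.hasNext()) {
    //       System.out.println(pages.nextNode().getPath());
    //   }

    public static void main(String[] args) {
        System.out.println(statement("Article"));
    }
}
```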
All sites that run on the same Magnolia instance store their content in the same repository. When you execute a query in the website workspace, you will get results from all sites.
To limit the search to a specific site, add the jcr:path parameter to the query and set its value to the root node of the site. The example below searches the demo-project website only.
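In JCR-SQL2 a path restriction is usually written with ISDESCENDANTNODE. The hypothetical Java sketch below derives the site root from the path of the page being viewed and restricts a full-text query to it. It assumes each site is a top-level node in the website workspace, as /demo-project is; SiteScope and its methods are illustrative names.

```java
/**
 * Hypothetical sketch: limit a full-text query to the site that contains
 * the current page. ASSUMPTION: every site is a top-level node in the
 * website workspace, as /demo-project is.
 */
final class SiteScope {

    private SiteScope() {
    }

    /** "/demo-project/about/news" becomes "/demo-project". */
    static String siteRootOf(String pagePath) {
        if (pagePath == null || !pagePath.startsWith("/") || pagePath.length() < 2) {
            throw new IllegalArgumentException("Not a page path: " + pagePath);
        }
        int nextSlash = pagePath.indexOf('/', 1);
        return nextSlash < 0 ? pagePath : pagePath.substring(0, nextSlash);
    }

    /** Full-text query over pages strictly below the site root. */
    static String searchInSite(String siteRoot, String term) {
        return "SELECT * FROM [mgnl:page] AS p"
                + " WHERE ISDESCENDANTNODE(p, '" + siteRoot.replace("'", "''") + "')"
                + " AND CONTAINS(p.*, '" + term.replace("'", "''") + "')";
    }

    public static void main(String[] args) {
        String root = siteRootOf("/demo-project/about/news");
        System.out.println(searchInSite(root, "news"));
    }
}
```

ISDESCENDANTNODE matches only nodes strictly below the given path, so the site's root page itself is not returned; combine it with ISSAMENODE if you need that page as well.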
If you use the single-tree approach to enter multilanguage content, the system stores all language variants under the same page node. This means there is no separate hierarchy for each language, and site visitors will get search results from all language variants at the same time.
Single language results
Options for returning results from one language only:
- Maintain each language tree separately so that you can limit the search to a particular path.
- Index the content on the public instance using an external search implementation such as Google Custom Search. If you use a language identifier in the URL, such as example.com/de/article.html, the external search can be configured to return results from that path only.
- If you are using the single-tree approach, customize the search query by adding a language parameter. In the JCR repository, the language variants are stored in properties whose names have a language suffix, such as subtitle_de.
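For the single-tree option, a hedged Java sketch: pick the property name for the visitor's language and run the full-text condition against that property only. It assumes the suffix convention described above, assumes the default language uses the unsuffixed name (subtitle) while other languages append an underscore and the language code (subtitle_de), and assumes the property sits on the page node; if your templates store it on a component, query mgnl:component instead. LocalizedQuery and its methods are hypothetical.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of a language-aware query for the single-tree approach.
 * ASSUMPTIONS: the default language uses the plain property name (subtitle),
 * other languages append "_" plus the language code (subtitle_de), and the
 * property is stored on the page node.
 */
final class LocalizedQuery {

    private LocalizedQuery() {
    }

    /** Property name holding the variant for the requested language. */
    static String propertyFor(String property, Locale language, Locale defaultLanguage) {
        if (language.getLanguage().equals(defaultLanguage.getLanguage())) {
            return property;
        }
        return property + "_" + language.getLanguage();
    }

    /** Full-text query limited to that one language variant. */
    static String containsIn(String property, Locale language, Locale defaultLanguage, String term) {
        return "SELECT * FROM [mgnl:page] WHERE CONTAINS(["
                + propertyFor(property, language, defaultLanguage) + "], '"
                + term.replace("'", "''") + "')";
    }

    public static void main(String[] args) {
        System.out.println(containsIn("subtitle", Locale.GERMAN, Locale.ENGLISH, "Haus"));
    }
}
```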
The Jackrabbit wiki Search page includes a description of the process for indexing Chinese, Korean, and Japanese.
Jackrabbit stores all character data (node names and values) in Unicode. This ensures that special characters such as accents and umlauts are indexed and can be used in search. Issues with special characters are often due to character set conversion problems in the application server. See URI encoding in Tomcat.
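The Java sketch below shows what such a conversion problem looks like. A browser sends ü as the UTF-8 bytes C3 BC, percent-encoded as %C3%BC. Decoding with UTF-8 restores the original character; decoding with ISO-8859-1, which Tomcat used by default for URIs before version 8.0, turns the two bytes into two separate characters, so the term no longer matches the indexed text. The class name is hypothetical; URLDecoder is the standard JDK class.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/**
 * Shows how decoding a percent-encoded URI parameter with the wrong charset
 * garbles non-ASCII characters. The class name is hypothetical.
 */
final class UriDecodingDemo {

    private UriDecodingDemo() {
    }

    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // A search term containing u-umlaut (U+00FC), encoded as UTF-8 and
        // then percent-encoded, as a browser sends it.
        String param = "M%C3%BCller";
        System.out.println(decode(param, "UTF-8"));      // one character for the umlaut
        System.out.println(decode(param, "ISO-8859-1")); // two characters: mojibake
    }
}
```

In Tomcat, the charset used to decode request URIs is set with the URIEncoding attribute of the Connector element in server.xml.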
Search within Magnolia is access controlled. Search results include only content the user has permission to access. Permissions are controlled through Security. When you execute a query in the Magnolia context (MgnlContext), contextual factors such as the current user's permissions are taken into account. If you do not have permission to access the items you are querying, they will not show up in the results. Contrast this with running the same query in SystemContext, which provides full access.
Page content may be aggregated from many sources, and not all of those sources are necessarily persisted in the repository being searched. Therefore, a Jackrabbit search does not necessarily find everything that appears on a page. For example, a search query that only looks for a particular term in the website workspace will not find a term that is stored in another workspace. In a scenario such as an online shop, product descriptions and images may be stored in the data workspace. To execute two queries with a single search and aggregate the results, you can use the External Indexing module.
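The aggregation step can be pictured with the naive Java sketch below, which concatenates hits from separate per-workspace queries and sorts them by score. It is an illustration only, not how the External Indexing module works: the Hit type and all names are hypothetical, and Lucene scores from separate queries on separate indexes are not normalized against each other, so ranking by raw score across workspaces is only approximate.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Naive, hypothetical sketch of aggregating hits from several per-workspace
 * queries into one list. Not how the External Indexing module works.
 */
final class AggregatedSearch {

    /** A single hit; hypothetical type for illustration. */
    static final class Hit {
        final String workspace;
        final String path;
        final double score;

        Hit(String workspace, String path, double score) {
            this.workspace = workspace;
            this.path = path;
            this.score = score;
        }

        @Override
        public String toString() {
            return workspace + ":" + path;
        }
    }

    private AggregatedSearch() {
    }

    /** Concatenates the result lists and sorts them by score, highest first. */
    static List<Hit> merge(List<List<Hit>> resultsPerWorkspace) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : resultsPerWorkspace) {
            all.addAll(hits);
        }
        // List.sort is stable, so hits with equal scores keep their original order.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return all;
    }

    public static void main(String[] args) {
        List<Hit> website = List.of(new Hit("website", "/demo-project/shop", 0.5));
        List<Hit> data = List.of(new Hit("data", "/products/p1", 0.9));
        System.out.println(merge(List.of(website, data)));
    }
}
```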
External Indexing module
Page content can be stored in disparate workspaces (such as the data workspace) and can be of different types (pages, documents, forum threads, shop products). The External Indexing module (source, builds) makes content available to external third-party indexers by providing it in a uniform way (plain text, object, URL, and so on). The External Indexing module can be used to implement semantic search features such as suggestions for pages related to the page the visitor is viewing. For more information, see Opening the door to semantic search.
Apache Solr module
The Magnolia Solr module provides integration with the Apache Solr search platform. Solr uses the same Lucene library for full-text indexing as the default Jackrabbit search. In addition, Solr provides advanced features such as faceted search, distributed search, and index replication. Solr uses REST-like HTTP/XML and JSON APIs. See Solr Quick Start.
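As an illustration of the HTTP API, the Java sketch below builds a Solr select URL with a query and a facet on one field. The parameters q, wt, facet, and facet.field are standard Solr request parameters; the base URL (Solr's default port 8983), the core name magnolia, and the field name type are assumptions for illustration, and the Magnolia Solr module may name its cores and fields differently.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: build a Solr select URL with a faceted query.
 * Base URL, core name, and field names are assumptions.
 */
final class SolrSelectUrl {

    private SolrSelectUrl() {
    }

    static String build(String baseUrl, String core, String query, String facetField) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", query);
        params.put("wt", "json");            // response format
        params.put("facet", "true");         // enable faceting
        params.put("facet.field", facetField);

        StringBuilder url = new StringBuilder(baseUrl)
                .append("/solr/").append(core).append("/select");
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(separator).append(encode(p.getKey())).append('=').append(encode(p.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983", "magnolia", "search engine", "type"));
    }
}
```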