Cache

Magnolia CMS employs a web cache to store server responses so that future requests for the same content can be served faster. Using a cache reduces the amount of information that needs to be transmitted across the network, easing the bandwidth and processing requirements and improving responsiveness. Caching functionality is provided by the Cache module.

Installing

The Cache module is installed by default. To restore the default configuration, delete the /modules/cache node and restart your server.

Uninstalling

Remove the /modules/cache node and its subnodes. Shut down Magnolia CMS, remove the Cache module JAR file from WEB-INF/lib and start Magnolia CMS again.

How caching works

Caching is performed by the Cache filter, which is part of the standard Magnolia CMS filter chain. When a request arrives at the Cache filter, the filter passes it first to the browser cache policy. If the content has not been modified and the client already has the latest version, the browser cache policy instructs the filter to simply respond with "304 Not Modified".

If the content has been modified or does not exist in the cache, the filter passes the request to the server cache policy. The server cache policy analyzes the request and replies with the expected behavior. Based on that behavior, the filter invokes the appropriate executor. This mechanism allows you to add and remove executors, and to put them to use by switching the current cache policy to a different one.

If the content is not available in the cache, the filter passes the request on to the CMS part of the filter chain. On the return trip the filter reads the content from the response and stores it in the cache store for future use.

The flush policy, on the other hand, is completely independent of this chain: it reacts to content changes rather than serving content.
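
The request flow described above can be sketched in a few lines. The following Python sketch is an illustration only, not Magnolia CMS code (the module itself is Java). The Request and Response classes, the client_has_latest flag, the server_policy rules and the dictionary used as a cache store are all assumptions made for the example. Only the three behavior names (useCache, store, bypass) are taken from this page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Request:
    url: str
    # Hypothetical flag standing in for "the client already has the latest version".
    client_has_latest: bool = False


@dataclass
class Response:
    status: int
    body: Optional[str] = None


def server_policy(request: Request, store: Dict[str, str]) -> str:
    """Name the expected behavior for a request (hypothetical rules)."""
    if request.url.startswith("/.magnolia"):
        return "bypass"
    return "useCache" if request.url in store else "store"


def handle_request(request: Request, store: Dict[str, str],
                   render: Callable[[str], str]) -> Response:
    # Step 1: browser cache policy. Nothing to send if the client is up to date.
    if request.client_has_latest:
        return Response(304)

    # Step 2: the server cache policy names a behavior.
    # Step 3: the executor registered for that behavior does the work.
    def use_cache() -> Response:
        return Response(200, store[request.url])

    def store_response() -> Response:
        body = render(request.url)   # the rest of the filter chain
        store[request.url] = body    # kept for future requests
        return Response(200, body)

    def bypass() -> Response:
        return Response(200, render(request.url))

    executors = {"useCache": use_cache, "store": store_response, "bypass": bypass}
    return executors[server_policy(request, store)]()
```

Because the executors are looked up by name, a different policy can return different behavior names without any change to the filter logic, which is the point of the design.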

Configuration

Magnolia CMS allows you to define several alternative cache configurations in the Cache module. The configurations are in /modules/cache/config/configurations/. Within each configuration you define what and when to cache, when to flush the cache, what header data to pass to browsers and specific implementations of tasks.

To select one of the cache configurations, set the cacheConfigurationName parameter in the Cache filter. The chosen configuration is read into a JavaBean using the Content2Bean mechanism, which makes it dynamically available to your own module code.

Policy configuration

Caching behavior for each configuration is defined with policies:

  • Server cache policy defines whether the requested content should be cached or not. The standard Magnolia CMS way to make such decisions is with voters. Voters are used whenever configuration values are not assigned at startup but depend on rules. Voters evaluate a rule such as "should content residing at this URL be cached" and return a positive or negative response. By default, all content on public instances is cached except the AdminCentral UI at /.magnolia. Server cache policy is configured in /modules/cache/config/configurations/default/cachePolicy. The default implementation checks if the content exists in the Ehcache store and requests caching if not found.

  • Client (browser) cache policy defines how long the browser may cache a document. The time is passed to the browser in the response header. The default FixedDuration option instructs the browser to cache the document for 30 minutes. Another option is Never, which tells the browser not to cache the document. Client cache policy is configured in /modules/cache/config/configurations/default/browserCachePolicy.

  • Flush policy defines when to flush the cache. The default configuration observes changes (activation, import, edit) in a repository and flushes the cache if new or modified content is detected. You can flush the cache completely, partially or not at all. Multiple flush policies can be registered. Each module can register its own flush policy (or multiple policies) and be notified about new or modified content in each repository. Flush policies are informed about changes in observed workspaces. The list of observed workspaces can be defined per policy under the repositories subnode of each policy. The Cache module also provides a RegisterWorkspaceForCacheFlushingTask install task that custom modules can use to register their workspace with the default FlushAll policy. Once a workspace is registered, any cached content originating from it is flushed from the cache when a change to any content anywhere in that workspace is detected.

  • Executors are actions taken once a caching decision has been made. There are three possible actions: useCache retrieves the cached item from the cache and streams it to the client, store stores the response in the cache for future use, and bypass skips caching altogether, which is useful for content that cannot or should not be cached. Executors can be configured at /modules/cache/config/configurations/default/executors. Each executor is also responsible for configuring expiration headers.
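
As an illustration of the flush policy idea, here is a minimal Python sketch of a policy that empties the whole cache when content changes in an observed workspace. It is not the Magnolia CMS implementation. The class and method names are hypothetical, and the workspace names used in the usage note are only examples.

```python
from typing import Dict, Set


class FlushAllPolicy:
    """Hypothetical stand-in for a flush policy that empties the whole cache."""

    def __init__(self, observed_workspaces: Set[str]) -> None:
        self.observed_workspaces = set(observed_workspaces)

    def register_workspace(self, workspace: str) -> None:
        # Roughly the role this page gives to RegisterWorkspaceForCacheFlushingTask.
        self.observed_workspaces.add(workspace)

    def on_content_changed(self, workspace: str, cache: Dict[str, str]) -> bool:
        """Flush the cache if the change happened in an observed workspace."""
        if workspace not in self.observed_workspaces:
            return False  # change is ignored, cache stays as it is
        cache.clear()
        return True
```

With a policy created as FlushAllPolicy({"website"}), a change reported for a workspace named "dms" leaves the cache untouched until register_workspace("dms") has been called.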

Ehcache backend

Magnolia CMS uses Ehcache for its back-end cache functionality. Ehcache is a robust, proven and full-featured cache product, which has made it the most widely used Java cache. Ehcache has its own configuration options. You can set them in /modules/cache/config/cacheFactory/defaultCacheConfiguration.

Parameter | Default value | Description
diskExpiryThreadIntervalSeconds | 120 | The number of seconds between runs of the disk expiry thread.
diskPersistent | false | Whether the disk store persists between restarts of the Virtual Machine.
diskSpoolBufferSizeMB | 30 | The size to allocate to the DiskStore for a spool buffer. Writes are made to this area and then asynchronously written to disk. Each spool buffer is used only by its cache. If you get OutOfMemory errors, consider lowering this value. To improve DiskStore performance, consider increasing it. Trace level logging in the DiskStore shows whether puts are backing up.
eternal | true | Sets whether elements are eternal. If eternal, timeouts are ignored and the element never expires.
maxElementsInMemory | 10000 | Sets the maximum number of objects that will be created in memory. 0 = no limit.
maxElementsOnDisk | 10000000 | Sets the maximum number of objects that will be maintained in the DiskStore. Ehcache's own default is 0, meaning unlimited.
memoryStoreEvictionPolicy | LRU | The policy enforced upon reaching the maxElementsInMemory limit. Available policies: Least Recently Used (specified as LRU), First In First Out (specified as FIFO), Least Frequently Used (specified as LFU).
overflowToDisk | true | Sets whether elements can overflow to disk when the memory store has reached the maxElementsInMemory limit.
timeToIdleSeconds | 0 | Sets the time to idle for an element before it expires, i.e. the maximum amount of time between accesses before an element expires. Only used if the element is not eternal. Optional attribute. A value of 0 means that an element can idle for infinity.
timeToLiveSeconds | 0 | Sets the time to live for an element before it expires, i.e. the maximum time between creation time and when an element expires. Only used if the element is not eternal. Optional attribute. A value of 0 means that an element can live for infinity.
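
The interplay of eternal, timeToIdleSeconds and timeToLiveSeconds is easy to misread, so here is a small Python sketch of the rules as the descriptions above state them. It is not Ehcache code. The function name and its time arguments are hypothetical, and the exact boundary behavior (expiring strictly after the limit) is an assumption.

```python
def is_expired(created_at: float, last_accessed_at: float, now: float,
               eternal: bool = True,
               time_to_idle_seconds: float = 0,
               time_to_live_seconds: float = 0) -> bool:
    """Apply the expiry rules from the parameter descriptions (all times in seconds)."""
    if eternal:
        return False  # timeouts are ignored for eternal elements
    # A value of 0 disables the respective limit.
    if time_to_live_seconds > 0 and now - created_at > time_to_live_seconds:
        return True
    if time_to_idle_seconds > 0 and now - last_accessed_at > time_to_idle_seconds:
        return True
    return False
```

The defaults of the sketch mirror the default values listed above, so with the default configuration no element ever expires and entries leave the cache only through eviction or flushing.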

Tip

You can use a different cache engine as long as you implement the Java interfaces that allow you to configure caching behavior from AdminCentral. The engine can be changed by implementing the info.magnolia.module.cache.Cache and info.magnolia.module.cache.CacheFactory interfaces.

Compression

Magnolia CMS compresses content in order to reduce its size. Compression is a simple and effective way to save bandwidth and speed up your site. It is a common practice used by Google and Yahoo! for example. (How to Optimize Your Site with GZIP Compression is a great general introduction to the topic.)

Compression in Magnolia CMS is performed in the gzip filter, configured in /server/filters/gzip. When a client requests a resource such as index.html, Magnolia CMS delivers it gzip-compressed. A typical HTML page is compressed to about 20% of its original size. So if your page is 100 kB uncompressed, it is roughly 20 kB compressed.
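
To get a feel for the savings, you can measure the ratio yourself. The Python sketch below compresses a made-up, highly repetitive HTML page with the standard gzip module. The page content is invented for the example, so the ratio it prints says nothing about your own pages.

```python
import gzip


def compression_ratio(payload: bytes) -> float:
    """Return the compressed size divided by the original size."""
    return len(gzip.compress(payload)) / len(payload)


# A made-up page built from repeated markup; real pages compress differently.
paragraphs = "".join(
    f"<p class='teaser'>Paragraph number {i} of the page.</p>" for i in range(500)
)
page = f"<html><body>{paragraphs}</body></html>".encode("utf-8")

ratio = compression_ratio(page)
print(f"{len(page)} bytes compressed to {ratio:.0%} of the original size")
```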

You can configure which content types to compress. By default, HTML, JavaScript and CSS are compressed because they are explicitly selected for compression in the Cache module configuration. These types can be compressed efficiently because they are text.

The decision to compress a particular content type is made with voters. Voters are used whenever configuration values are not assigned at startup but depend on rules instead. In the Cache module configuration there are three voting rules based on content type:

  • text/html: HTML
  • application/x-javascript: JavaScript
  • text/css: Cascading Style Sheets

To add more content types, such as XML, create a numbered data node under allowed. Use the Internet media type (MIME type) as value. Here are some common media types:

  • application/xhtml+xml: XHTML
  • text/csv: Comma-separated values
  • text/plain: Textual data
  • text/xml: Extensible Markup Language
  • application/pdf: Portable Document Format

As a rule, compressing the big three (HTML, JavaScript, CSS) is enough. It does not make sense to compress binary content such as images as they are already compressed.
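
The effect on already compressed content can be demonstrated with a short Python sketch. Random bytes are used here as a stand-in for image data, which is an assumption made for the example: like compressed image formats, random data contains no redundancy for gzip to remove.

```python
import gzip
import os

# Stand-in for already compressed binary content such as an image.
already_compressed = os.urandom(50_000)

recompressed = gzip.compress(already_compressed)
difference = len(recompressed) - len(already_compressed)
print(f"gzip changed the size by {difference:+d} bytes")
```

The result is not smaller than the input, because gzip adds its own header and trailer on top of data it cannot shrink.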

The browser sends a header telling the server it accepts compressed content: Accept-Encoding: gzip. While all modern browsers support compression, a few older browsers don't, notably Internet Explorer 6 before Service Pack 2. We make an exception for IE6 using a userAgent voter. This voter rejects compression and delivers uncompressed content if the browser identifies itself as IE 6 in the User-Agent field in request headers.
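
Taken together, the rules above amount to three checks per request. The following Python sketch combines them. It is not the Magnolia CMS voter implementation: the function names are hypothetical, matching IE 6 by the "MSIE 6" substring is an assumption, and quality values such as gzip;q=0 are ignored for brevity.

```python
from typing import Mapping, Sequence

# The three content types listed above.
ALLOWED_TYPES = ("text/html", "application/x-javascript", "text/css")


def accepts_gzip(headers: Mapping[str, str]) -> bool:
    """True if the Accept-Encoding request header lists gzip."""
    tokens = headers.get("Accept-Encoding", "").split(",")
    return "gzip" in [token.split(";")[0].strip().lower() for token in tokens]


def is_ie6(headers: Mapping[str, str]) -> bool:
    """True if the User-Agent identifies the browser as Internet Explorer 6."""
    return "MSIE 6" in headers.get("User-Agent", "")


def should_compress(headers: Mapping[str, str], content_type: str,
                    allowed: Sequence[str] = ALLOWED_TYPES) -> bool:
    # Drop parameters such as ";charset=UTF-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    return accepts_gzip(headers) and not is_ie6(headers) and media_type in allowed
```

Adding a content type then corresponds to extending the allowed sequence, for example with "text/xml".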

To test your compression configuration, use a tool such as Web-Sniffer that allows you to change the Accept-Encoding and User-Agent request headers easily. Here's what the headers look like when the Magnolia CMS demo site home page is submitted to the sniffer.

Request header:

GET /demo-project.html HTTP/1.1
Host: demopublic.magnolia-cms.com
Connection: close
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9) Gecko/2008052906 Firefox/3.0
Accept-Encoding: gzip
Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7
Cache-Control: no
Accept-Language: de,en;q=0.7,en-us;q=0.3
Referer: http://web-sniffer.net/

Response header:

Status: HTTP/1.1 200 OK
Date: Fri, 23 Jul 2010 07:45:10 GMT
Server: Apache/2.2.9
X-Magnolia-Registration: Registered
Cache-Control: max-age=900
Last-Modified: Thu, 01 Jul 2010 14:03:12 GMT
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 3852
Connection: close
Content-Type: text/html;charset=UTF-8
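
If you prefer to check such a response programmatically, the two headers of interest are Content-Encoding and Cache-Control. The Python sketch below parses a few of the response headers shown above from a string. The parse_headers helper is hypothetical and deliberately simple, and it assumes one header per line.

```python
def parse_headers(raw: str) -> dict:
    """Turn 'Name: value' lines into a dictionary with lower-case names."""
    headers = {}
    for line in raw.strip().splitlines():
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


RESPONSE = """\
Cache-Control: max-age=900
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 3852
Content-Type: text/html;charset=UTF-8
"""

headers = parse_headers(RESPONSE)
is_gzipped = headers.get("content-encoding") == "gzip"
max_age_seconds = int(headers["cache-control"].split("=")[1])

# max-age is given in seconds: 900 s / 60 = 15 minutes of browser caching.
print(is_gzipped, max_age_seconds / 60)
```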

Advanced strategies

Advanced caching strategies are available in a separate Advanced Cache module.

Commands

Cache-related commands are in the cache catalog:
  • flushAll completely flushes all available caches.
  • flushByUUID completely flushes all entries related to the given UUID from all available caches. Expects repository and uuid as parameters.
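
The difference between the two commands can be illustrated with a toy cache that remembers which entries were produced from which content. This Python sketch is purely conceptual: the class, its methods and the way entries are indexed are assumptions made for the example and do not describe how the Cache module tracks UUIDs internally.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class UuidIndexedCache:
    """Toy cache that records which entries belong to which (repository, uuid)."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}
        self.keys_by_source: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def put(self, key: str, body: str, repository: str, uuid: str) -> None:
        self.entries[key] = body
        self.keys_by_source[(repository, uuid)].add(key)

    def flush_all(self) -> None:
        """Counterpart of flushAll: drop everything."""
        self.entries.clear()
        self.keys_by_source.clear()

    def flush_by_uuid(self, repository: str, uuid: str) -> int:
        """Counterpart of flushByUUID: drop only the related entries."""
        keys = self.keys_by_source.pop((repository, uuid), set())
        for key in keys:
            self.entries.pop(key, None)
        return len(keys)
```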

What is cached and what is not

By default, the following URLs are cached:
  • on a public instance everything except /.magnolia/*
  • on an author instance all static resources /.resources/* (if the magnolia.develop property is set to false).
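
Expressed as code, the two default rules look roughly like this. The Python function below is a hypothetical restatement of the list above, not the voter configuration itself. The instance names and the prefix matching are assumptions made for the example.

```python
def is_cached_by_default(url: str, instance: str, develop: bool = False) -> bool:
    """Restate the default rules: instance is 'public' or 'author'."""
    if instance == "public":
        # Everything except /.magnolia/*
        return not url.startswith("/.magnolia/")
    if instance == "author":
        # Only static resources, and only when magnolia.develop is false.
        return url.startswith("/.resources/") and not develop
    raise ValueError(f"unknown instance type: {instance!r}")
```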

Caching while developing

Important! The system caches resources such as JavaScript files and CSS files on the author instance by default to make authoring more responsive. When you develop, you want to disable this behavior. Set the magnolia.develop property to true in the default magnolia.properties file.

For more complex configurations, you need to adjust the configuration under the /modules/cache/config/configurations/default/cachePolicy/voters node.

Flushing the cache

To flush the cache, choose one:
  • Shut down Magnolia CMS, delete the cache directory and restart.
  • Enable Java Management Extensions (JMX). It is enabled by default on some application servers. Connect to the server using jconsole or use your server's own JMX administration interface if provided. Find the bean called net.sf.ehcache.CacheManager and invoke the flush() operation of the default instance of the cache.

Note that the cache is not aware of the web application context, so when changing the context of a previously deployed application, you need to flush the cache to make sure served pages do not contain absolute links still pointing to the old context path.

Excluding content from cache

There are several reasons to exclude content from the cache. The most common is a paragraph that queries an external data source dynamically, so the rendered HTML can change even if the content of the Magnolia CMS page has not changed.

By default, all pages requested with query arguments are excluded from the cache, so the simplest way to exclude content is to link to the page with a dummy query parameter in the URL. A more subtle solution is to add a bypass to the Cache filter so that the filter is not executed at all for particular URLs. A more manageable solution is to add such URLs to the deny list of the cachePolicy. Entries on the deny list are not cached by Magnolia CMS but still pass through the whole filter chain, so other policies such as BrowserCachePolicy are still applied.
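
The first and the last of these options boil down to a simple check on the requested URL. The Python sketch below illustrates it. The function name is hypothetical, and treating deny list entries as path prefixes is an assumption, since this page does not say how entries are matched.

```python
from typing import Sequence
from urllib.parse import urlsplit


def is_cacheable(url: str, deny_list: Sequence[str] = ()) -> bool:
    """Return False for URLs with query arguments or on the deny list."""
    parts = urlsplit(url)
    if parts.query:
        return False  # pages with query arguments are not cached
    if any(parts.path.startswith(prefix) for prefix in deny_list):
        return False  # denied, but the request still runs through the filter chain
    return True
```

For example, is_cacheable("/news.html?nocache=1") is False because of the dummy parameter, while is_cacheable("/stock/quotes.html", deny_list=("/stock/",)) is False because of the deny list.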

A more sophisticated strategy still is to cache the page but flush it from the cache at regular intervals. To apply this cache flushing strategy to the whole site, reconfigure the underlying cache engine (Ehcache). To apply the strategy to a subset of URLs only, the best way is to deploy a custom implementation of CachePolicy. The cache policy then maintains information (directly, or by requesting cache entry creation info from the cache engine) about when each URL was last re-cached, and can decide when the cache entry needs to be refreshed.
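
The bookkeeping such a custom policy needs is small. The following Python sketch shows the idea only. It does not implement the Java CachePolicy interface, and the class name, method names and prefix-based URL selection are all hypothetical.

```python
import time
from typing import Callable, Dict, Iterable


class TimedRefreshPolicy:
    """Track when each URL was last cached and decide when to refresh it."""

    def __init__(self, max_age_seconds: float, url_prefixes: Iterable[str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_age = max_age_seconds
        self._prefixes = tuple(url_prefixes)  # the subset of URLs this policy governs
        self._clock = clock
        self._cached_at: Dict[str, float] = {}

    def record_cached(self, url: str) -> None:
        """Call whenever the URL has been stored in the cache again."""
        self._cached_at[url] = self._clock()

    def needs_refresh(self, url: str) -> bool:
        if not url.startswith(self._prefixes):
            return False  # outside the subset, leave it to the normal policy
        cached_at = self._cached_at.get(url)
        if cached_at is None:
            return True  # never cached so far
        return self._clock() - cached_at >= self._max_age
```

The clock is injectable so that the policy can be tested without waiting. In production use the default monotonic clock would apply.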