Content storage and structure
Magnolia CMS stores all content (web pages, images, documents, configuration, data) in a content repository. The repository implementation we have chosen, Apache Jackrabbit, adheres to the Java Content Repository standard (JCR).
- Magnolia CMS workspaces
- Hierarchical content store
- JCR standard API for content repositories
- Persistent storage
A content repository is a high-level information management system that is a superset of traditional data repositories. It implements content services such as:
- Hierarchical, structured and unstructured content
- Granular content access and access control
- Node types, property types (text, number, date, binary)
- Queries (XPath, SQL)
- Import and export
- Referential integrity
- Multiple persistence models
Magnolia CMS has one repository, magnolia, which in turn contains several workspaces. There is a workspace for storing website content, another for user accounts, a third for configuration and so on. In JCR terms these are all workspaces. In Magnolia terms we sometimes refer to them as repositories for historical reasons.
Each workspace contains a single-rooted tree of items. An item is either a node or a property. Each node may have child nodes and child properties. Properties cannot have children; they are the leaves of the tree. All of the actual content in the repository is stored within the values of the properties.
Here's what the object hierarchy in a repository looks like:
- Repository contains workspaces
- Workspace contains nodes
- Each workspace has a single root node.
- Node contains properties
- Properties store values (data)
The changes you make to nodes and properties are transient in the workspace. Only when the system performs a workspace save operation are the changes persisted into permanent storage such as a database.
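The difference between transient and persisted changes can be sketched with a toy model. The class below is an illustration only: `ToySession` and its methods are hypothetical names, not part of the JCR API, and plain maps stand in for the workspace and the database. In the real JCR API the equivalent step is a call to `Session.save()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model of a session on a workspace. NOT the JCR API.
 * Edits land in a transient map and reach "permanent storage"
 * (here just another map) only when save() is called.
 */
class ToySession {
    // Stands in for permanent storage such as a database.
    private final Map<String, String> persisted = new LinkedHashMap<>();
    // Pending changes keyed by property path; storage does not see them until save().
    private final Map<String, String> transientChanges = new LinkedHashMap<>();

    /** Records a property value as a transient change. */
    void setProperty(String path, String value) {
        transientChanges.put(path, value);
    }

    /** Reads through the transient layer first, then falls back to storage. */
    String getProperty(String path) {
        if (transientChanges.containsKey(path)) {
            return transientChanges.get(path);
        }
        return persisted.get(path);
    }

    /** Returns what permanent storage currently holds, ignoring pending changes. */
    String getPersisted(String path) {
        return persisted.get(path);
    }

    /** Writes all pending changes to storage and clears the transient layer. */
    void save() {
        persisted.putAll(transientChanges);
        transientChanges.clear();
    }

    /** Throws away pending changes without touching storage. */
    void discardChanges() {
        transientChanges.clear();
    }

    public static void main(String[] args) {
        ToySession session = new ToySession();
        session.setProperty("/home/title", "Welcome");
        // The session sees the new value, storage does not yet.
        System.out.println("session: " + session.getProperty("/home/title"));
        System.out.println("storage before save: " + session.getPersisted("/home/title"));
        session.save();
        System.out.println("storage after save: " + session.getPersisted("/home/title"));
    }
}
```

The point of the two maps is that a reader of permanent storage never observes a half-finished edit: either `save()` has run and all pending changes are there, or none are.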
Magnolia CMS workspaces
In its default configuration Magnolia CMS Community Edition has the following workspaces:
config. Configuration settings for everything.
website. Websites, pages and paragraphs. This is where most of your site content is stored.
users. System, administrative and public user accounts.
usergroups. Groups of users.
userroles. Roles and access control lists (ACL) for granting permissions to users.
dms. Documents and images, typically binary data.
mgnlSystem. For Magnolia internal use. Created by Magnolia core.
mgnlVersion. Versioning information for Magnolia internal use. Created by Magnolia core.
Individual modules such as the Standard Templating Kit can add their own workspaces:
templates. Page and paragraph templates in the Standard Templating Kit (STK).
data. Custom data types and data items. Created by the Data module.
imaging. Images created by the Imaging module.
forum. Page comments and forum posts. Created by the Forum module.
packager. Content stored by the Packager module.
Expressions. Workflow definitions. Created by the Workflow module.
Store. Running workflows. Created by the Workflow module.
Each workspace is mapped to a named repository. The mapping for three of the default workspaces looks like this:
<RepositoryMapping>
  <Map name="website" repositoryName="magnolia" workspaceName="website" />
  <Map name="config" repositoryName="magnolia" workspaceName="config" />
  <Map name="users" repositoryName="magnolia" workspaceName="users" />
</RepositoryMapping>
The ability to map a workspace to a named repository provides interesting possibilities. While Magnolia CMS keeps all workspaces in one repository by default, you don't have to. You could map a workspace to a different repository in order to:
- Provide a workspace with its own persistence manager. You could store users in one database and website content in another.
- Use a different persistence mechanism according to how often the data is requested and how quickly it needs to be available. For example, it might make sense to store search indexes on a fast disk but archived web content on a slower disk.
- Store shared content such as user-generated comments in a clustered storage. This allows various Magnolia CMS instances to access the same comments. See Jackrabbit clustering for an illustrated example.
- Integrate your system with an external repository. As long as the external repository adheres to the JCR standard, Magnolia CMS can access content in it.
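To make the mapping idea concrete, here is a minimal sketch that reads a `<RepositoryMapping>` fragment with the JDK's built-in XML parser and resolves a logical name to its repository and workspace. `RepositoryMappingReader` is a hypothetical helper written for this example, not Magnolia code, and the second repository name `accounts` is an assumption used to show a workspace living outside the default `magnolia` repository.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper that reads <Map> elements into a lookup table. */
class RepositoryMappingReader {

    /**
     * Returns a map from logical name to a two-element array:
     * index 0 is the repository name, index 1 is the workspace name.
     */
    static Map<String, String[]> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList maps = doc.getElementsByTagName("Map");
            Map<String, String[]> result = new LinkedHashMap<>();
            for (int i = 0; i < maps.getLength(); i++) {
                Element map = (Element) maps.item(i);
                result.put(map.getAttribute("name"), new String[] {
                        map.getAttribute("repositoryName"),
                        map.getAttribute("workspaceName") });
            }
            return result;
        } catch (Exception e) {
            // Parsing problems are reported as an unchecked exception to keep callers simple.
            throw new IllegalArgumentException("Invalid repository mapping XML", e);
        }
    }

    public static void main(String[] args) {
        // "accounts" is an assumed second repository, for illustration only.
        String xml = "<RepositoryMapping>"
                + "<Map name=\"website\" repositoryName=\"magnolia\" workspaceName=\"website\" />"
                + "<Map name=\"users\" repositoryName=\"accounts\" workspaceName=\"users\" />"
                + "</RepositoryMapping>";
        for (Map.Entry<String, String[]> entry : read(xml).entrySet()) {
            System.out.println(entry.getKey() + " -> repository " + entry.getValue()[0]
                    + ", workspace " + entry.getValue()[1]);
        }
    }
}
```

Because callers only ask for the logical name, moving `users` to another repository changes one attribute in the mapping and nothing in the calling code.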
Hierarchical content store
A content repository is designed to store, search and retrieve hierarchical data. Data consists of a tree of nodes with associated properties. Data is stored in the properties. They may store simple values such as numbers and strings or binary data (images, documents) of arbitrary length. Nodes may optionally have one or more types associated with them, which in turn dictates the type of their properties, the number and type of their child nodes, and certain behavioral characteristics.
Example: A, B, C and D are nodes. The boxes represent properties with Boolean, numerical, string and binary values.
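The node-and-property tree can be sketched in a few lines of Java. This is a toy model, not `javax.jcr.Node`: the class `ToyNode` and its methods are hypothetical, `Object` stands in for JCR's typed values, and the arrangement of A, B, C and D below is an assumption, since the text does not state how the four nodes are nested.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy content node. NOT javax.jcr.Node. */
class ToyNode {
    final String name;
    // Child nodes, in insertion order.
    final Map<String, ToyNode> children = new LinkedHashMap<>();
    // Properties are the leaves of the tree and hold the actual values.
    final Map<String, Object> properties = new LinkedHashMap<>();

    ToyNode(String name) {
        this.name = name;
    }

    /** Creates a child node and returns it so calls can be chained. */
    ToyNode addNode(String childName) {
        ToyNode child = new ToyNode(childName);
        children.put(childName, child);
        return child;
    }

    /** Resolves a path such as "A/C/D" below this node; returns null if a segment is missing. */
    ToyNode getNode(String path) {
        ToyNode current = this;
        for (String segment : path.split("/")) {
            if (segment.isEmpty()) {
                continue; // tolerate a leading slash
            }
            current = current.children.get(segment);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    /** Counts the nodes in this subtree, including this node. */
    int countNodes() {
        int count = 1;
        for (ToyNode child : children.values()) {
            count += child.countNodes();
        }
        return count;
    }

    public static void main(String[] args) {
        ToyNode root = new ToyNode(""); // single root of the workspace
        ToyNode a = root.addNode("A");
        a.properties.put("visible", Boolean.TRUE);      // Boolean value
        ToyNode b = a.addNode("B");
        b.properties.put("order", 42L);                 // numerical value
        ToyNode c = a.addNode("C");
        c.properties.put("title", "Hello");             // string value
        ToyNode d = c.addNode("D");
        d.properties.put("data", new byte[] {1, 2, 3}); // binary value

        System.out.println(root.getNode("/A/C").properties.get("title"));
        System.out.println(root.countNodes());
    }
}
```

Note that values live only in `properties`; a node without properties carries structure but no content, which mirrors the rule that all actual content is stored in property values.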
JCR standard API for content repositories
Java Content Repository (JCR) is a standard interface for accessing content repositories. JCR version 1.0 was specified in Java Specification Request 170 (JSR-170). Version 2.0, specified in JSR-283, is also final. JCR specifies a hierarchical content store with support for structured and unstructured content.
Magnolia CMS was the first open-source content management system built specifically to leverage JCR. The standard decouples the responsibilities of content storage from content management and provides a common API that enables standardized content reuse across the enterprise and between applications. Magnolia CMS uses the open-source Jackrabbit reference implementation.
Application developers benefit from standardization as they don't need to learn several vendor-specific APIs. Learning one standard API allows them to work with any compliant repository and write code against it.
Businesses enjoy the freedom of choice. Open standards like JCR are the best insurance against vendor lock-in; any CMS that supports the JCR standard becomes a viable alternative. Costs associated with switching vendors are smaller when your content is already in the correct format.
Persistent storage
A persistence manager (PM) is an internal Jackrabbit component that handles the persistent storage of content nodes and properties. Each workspace of a Jackrabbit content repository can use a separate persistence manager to store content for that workspace. The persistence manager sits at the bottom layer of the Jackrabbit system architecture. Reliability, integrity and performance of the PM are crucial to the overall stability and performance of the repository.
In order to avoid integrity issues and to benefit from services such as observation, clustering and indexing, you should always access the content through the JCR API. Changing the data directly (bypassing the API) causes serious issues. This may sound restrictive but the API is actually quite versatile. You can even access the content repository from external applications using the API.
Common persistence options include:
- Database: Magnolia CMS uses a database as persistence manager by default. This is the most common option. We ship WAR files and operating-system-specific bundles with the Derby database. Derby is an embedded database that allows us to package a fully operational Magnolia CMS example into a single download, including configuration details and demonstration websites. It requires minimal installation effort from users. However, for production environments we recommend an enterprise-scale database such as MySQL, PostgreSQL, Oracle or MS SQL Server. All of them work with JCR. Database connections are based on JDBC, involve zero deployment, and run fast. Note: Magnolia supports the MySQL InnoDB storage engine but not the MyISAM engine. InnoDB is the default engine in MySQL 5.5 and later.
- File system: This kind of data store is typically not meant to run in production environments, except in read-only cases, but it can be very fast.
- In-memory: This is a great persistence manager for testing and for small workspaces. All content is kept in memory and lost as soon as the repository is closed. It is even faster than a file system. Again, it is not for production use.
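The idea that each workspace can be given its own persistence manager can be illustrated with a small sketch. Everything here is hypothetical: `ToyPersistenceManager`, `InMemoryToyPersistenceManager` and `ToyWorkspace` are names invented for this example, and Jackrabbit's real persistence manager contract is larger and different. The sketch only shows the plug-in relationship and the in-memory behavior described above, where content is lost once the store is closed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, heavily simplified storage contract. Not Jackrabbit's interface. */
interface ToyPersistenceManager {
    void store(String itemPath, String value);
    String load(String itemPath);
    void close();
}

/** Keeps everything in a map; the content is gone after close(). */
class InMemoryToyPersistenceManager implements ToyPersistenceManager {
    private final Map<String, String> items = new HashMap<>();

    @Override
    public void store(String itemPath, String value) {
        items.put(itemPath, value);
    }

    @Override
    public String load(String itemPath) {
        return items.get(itemPath);
    }

    @Override
    public void close() {
        items.clear();
    }
}

/** A workspace delegates storage to whichever manager it was constructed with. */
class ToyWorkspace {
    final String name;
    private final ToyPersistenceManager persistenceManager;

    ToyWorkspace(String name, ToyPersistenceManager persistenceManager) {
        this.name = name;
        this.persistenceManager = persistenceManager;
    }

    void write(String itemPath, String value) {
        persistenceManager.store(itemPath, value);
    }

    String read(String itemPath) {
        return persistenceManager.load(itemPath);
    }

    void shutdown() {
        persistenceManager.close();
    }

    public static void main(String[] args) {
        // Each workspace gets its own manager instance, so their content is isolated.
        ToyWorkspace website = new ToyWorkspace("website", new InMemoryToyPersistenceManager());
        ToyWorkspace users = new ToyWorkspace("users", new InMemoryToyPersistenceManager());
        website.write("/home/title", "Welcome");
        System.out.println(website.read("/home/title"));
        System.out.println(users.read("/home/title")); // not stored in this workspace
        website.shutdown();
        System.out.println(website.read("/home/title")); // lost after close
    }
}
```

A database-backed or file-system-backed variant would implement the same interface, which is why the workspace code would not need to change when the storage type does.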