The Cloud Foundry Integration module provides the adjustments necessary to easily deploy Magnolia on Cloud Foundry.
Cloud Foundry is an open-source, cloud computing platform as a service (PaaS) originally developed by VMware. The platform provides a technical stack that includes the run-time and services on top of a cloud infrastructure. Users can build and deploy any kind of application without the cost and complexity of buying and managing the underlying hardware and software, and provisioning hosting capabilities. All languages and platforms are supported.
In a production-grade PaaS environment several adjustments are typically necessary. The module addresses these and overcomes any platform constraints.
Different providers, such as IBM BlueMix and Pivotal, use Cloud Foundry to deliver their PaaS. Each provider has its own specifics and features. A dedicated Magnolia bundle will be provided for each provider. At present, the IBM BlueMix platform and a generic Cloud Foundry platform are supported.
Features of the Cloud Foundry Integration module include:
- Out-of-the-box Magnolia bundle.
- Auto-injection of bound service credentials into Magnolia.
- Java API exposing the Cloud Foundry context (application settings such as host, IP, quotas and bound services).
- Native support for Cloud Foundry providers.
- Default setup provides an author and clustered public instances.
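The module's own Java API is not reproduced here. As a rough illustration of the kind of information such a context can expose, the sketch below reads a few environment variables that Cloud Foundry sets for each application instance (`VCAP_APPLICATION`, `VCAP_SERVICES`, `CF_INSTANCE_IP`, `CF_INSTANCE_INDEX`, `MEMORY_LIMIT`). The class `CloudFoundryEnv` and its method names are hypothetical and are not part of the module.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper (not the module's API): reads standard Cloud Foundry environment variables. */
final class CloudFoundryEnv {
    private final Map<String, String> env;

    /** The environment is passed in so that the class can be exercised without a real platform. */
    CloudFoundryEnv(Map<String, String> env) {
        this.env = env;
    }

    /** Builds the context from the real process environment. */
    static CloudFoundryEnv fromSystem() {
        return new CloudFoundryEnv(System.getenv());
    }

    /** True when the platform has injected its application descriptor. */
    boolean isCloudFoundry() {
        return env.containsKey("VCAP_APPLICATION");
    }

    /** IP address of the host running this instance, if the platform provided it. */
    Optional<String> instanceIp() {
        return Optional.ofNullable(env.get("CF_INSTANCE_IP"));
    }

    /** Index of this instance within the application, or -1 if unknown. */
    int instanceIndex() {
        String raw = env.get("CF_INSTANCE_INDEX");
        if (raw == null) {
            return -1;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Memory quota as reported by the platform, returned unparsed. */
    Optional<String> memoryLimit() {
        return Optional.ofNullable(env.get("MEMORY_LIMIT"));
    }

    /** True when VCAP_SERVICES holds something other than an empty JSON object. */
    boolean hasBoundServices() {
        String raw = env.get("VCAP_SERVICES");
        if (raw == null) {
            return false;
        }
        String compact = raw.replaceAll("\\s", "");
        return !compact.isEmpty() && !compact.equals("{}");
    }

    public static void main(String[] args) {
        CloudFoundryEnv cf = CloudFoundryEnv.fromSystem();
        System.out.println("On Cloud Foundry: " + cf.isCloudFoundry());
        System.out.println("Instance index:   " + cf.instanceIndex());
        System.out.println("Bound services:   " + cf.hasBoundServices());
    }
}
```

Extracting the actual credentials requires parsing the JSON held in `VCAP_SERVICES`, which is left out here because its layout depends on the bound services.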
To install the Cloud Foundry module, include it in your custom web app by adding the following Maven dependencies to your POM.
Generic Cloud Foundry platform
This Magnolia bundle can be used for your own private cloud or existing Cloud Foundry providers. It uses only standard Cloud Foundry features.
IBM BlueMix platform
This Magnolia bundle can be used only on IBM BlueMix. It leverages the BlueMix Liberty buildpack and DB2 integration.
Once you have a valid Magnolia project using one of the Cloud Foundry bundles, the resulting WAR file can be deployed using the manifest available in the Git project of the bundle. A deployment script is also provided as a reference.
The module has the following known limitations:
- Janitor is deactivated on the public instances (see Clustering below). As a result, the Journal can grow to a critical size. One way to solve this issue is to recreate a fresh cluster regularly.
- On BlueMix, a single DEA has fairly limited CPU resources. This is not really an issue for the public instances, since they can scale horizontally (adding an instance to the cluster). However, it does limit the author instance where scaling is not feasible.
Security exceptions while loading driver
This warning is raised during startup of both the public and author instances. It might be due to incorrect settings on the JDBC driver of the DB2 database.
Unable to reset the Stream
This exception is thrown during startup of the public instances (in a cluster). A large blob is added to the Journal (
Error code = -302 states that the issue could be due to the size of the blob, but the blob appears to be correctly inserted in the table, and the Journal also looks correct (the Journal Idx and the number of rows are equal, so the Journal is complete). In addition, adding new instances does not raise any issue.
One possible explanation is a bug in the JDBC layer that misinterprets the response from the DB2 database.
Since the Journal is valid, this issue can be ignored.
Cloud Foundry constraints
Cloud Foundry imposes certain constraints on applications in order to maximize its scalability and reliability.
An application managed by Cloud Foundry must be stateless. Therefore, it is not possible to permanently store any information locally inside an application container. Each time an application is started, a new Virtual Machine is created and the application reinstalled. This constraint allows you to scale horizontally by adding new instances of the application with a single click.
By default, Magnolia stores the data and the state of the application in the container. The module modifies the default behavior in order to make it compliant with the Cloud Foundry constraints.
Lucene indexes are stored on the local filesystem of the application. As a consequence, indexes are rebuilt each time the application container is started. There is no supported way to store the indexes elsewhere in the infrastructure. Although hacks to store the indexes in a database, in Infinispan or in shared memory do exist, they require coding that goes beyond the purpose of this evaluation.
Magnolia's cache implementation presents a similar issue to that of the Lucene indexes. The cache is emptied each time the application container is started. An alternative is to use a shared cache in front of the public instances (such as Varnish), or cache services offered by the Cloud Foundry provider. In that case, to save memory, the Magnolia cache must be configured not to cache the same assets or pages.
Logs are stored in the application container and are lost on restart. Cloud Foundry provides a mechanism to send the log to another shared service dedicated to monitoring. The mechanism depends on the chosen provider.
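Cloud Foundry's log aggregation collects whatever an application instance writes to standard output and standard error, so a common way to keep log lines beyond a restart is to write them to the console and let the platform forward them. The sketch below shows the idea with `java.util.logging` from the JDK. It is an illustration only: the class `StdoutLogging` and the logger name are hypothetical, and Magnolia's own logging configuration is not shown.

```java
import java.io.OutputStream;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

/** Hypothetical helper: routes log records to a stream such as System.out. */
final class StdoutLogging {

    /** Returns a logger that writes only to the given stream and flushes after every record. */
    static Logger streamLogger(String name, OutputStream out) {
        Logger logger = Logger.getLogger(name);
        // Avoid duplicate output through the root logger's handlers.
        logger.setUseParentHandlers(false);
        for (Handler existing : logger.getHandlers()) {
            logger.removeHandler(existing);
        }
        StreamHandler handler = new StreamHandler(out, new SimpleFormatter()) {
            @Override
            public synchronized void publish(LogRecord record) {
                super.publish(record);
                // StreamHandler buffers by default; flush so each line leaves the container promptly.
                flush();
            }
        };
        handler.setLevel(Level.ALL);
        logger.addHandler(handler);
        logger.setLevel(Level.INFO);
        return logger;
    }

    public static void main(String[] args) {
        Logger log = streamLogger("magnolia.demo", System.out);
        log.info("Public instance started");
    }
}
```

How the collected lines are then forwarded to a monitoring service depends on the provider, as noted above.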
Magnolia's Solr integration can only be deployed as a Cloud Foundry service since it needs to store its indexes on the local filesystem. A Cloud Foundry service does not have the same constraints as a Cloud Foundry application.
It may not be possible to deploy Magnolia's Solr integration in a Cloud Foundry platform managed by a third-party vendor, such as Pivotal or BlueMix. Typically these platforms do not allow users to create their own private service. The only options are to use a search engine supported by the Cloud Foundry platform (Pivotal supports Searchify), or an external search engine, such as ElasticSearch.
Cloud Foundry supports sticky sessions, but does not persist or replicate HTTP session data on the filesystem. If an instance goes down, the user's session is lost. In order to prevent this, sessions must be stored in a shared service such as Redis (provided out-of-the-box by Java buildpack) or Session Cache in BlueMix.
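External session stores typically persist session attributes in serialized form, so anything placed in the HTTP session generally has to be serializable. The sketch below is one way to check this ahead of deployment: it round-trips an object through standard Java serialization. The classes `SessionSerializationCheck` and `UserPreferences` are hypothetical examples, and the exact serialization mechanism used by a given session service is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical check that a session attribute survives Java serialization. */
final class SessionSerializationCheck {

    /** Example session attribute; every field must itself be serializable. */
    static final class UserPreferences implements Serializable {
        private static final long serialVersionUID = 1L;
        final String language;
        final int pageSize;

        UserPreferences(String language, int pageSize) {
            this.language = language;
            this.pageSize = pageSize;
        }
    }

    /** Serializes the value to bytes and reads it back, failing if it cannot be stored. */
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Value cannot be stored in an external session store", e);
        }
    }

    public static void main(String[] args) {
        UserPreferences copy = roundTrip(new UserPreferences("en", 20));
        System.out.println(copy.language + " / " + copy.pageSize);
    }
}
```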
Clustering works only if Janitor is deactivated. As a consequence, a new cluster must be recreated regularly in order to restart with a fresh and empty Journal (the old cluster can be destroyed). In Cloud Foundry, this approach can be considered since deploying a new instance is straightforward.
In order to activate Janitor, here are a few approaches which can be investigated:
- Use Jackrabbit API to recreate the Lucene Indexes based on the content of the database (without using the Journal).
- Implement https://github.com/chregu/Jackrabbit-clone-scripts/ in a dedicated buildpack or directly in Magnolia
Each supported Cloud Foundry platform provides an out-of-the-box setup that consists of an author instance and clustered public instances. This section provides details about this setup, and explains how to use it and extend it to your needs.
In the diagrams below, dark-grey boxes represent Cloud Foundry services. Black circle-ended lines represent service bindings. Magnolia instances are deployed through the Java buildpack in a standard DEA.
In Cloud Foundry, scalability is either vertical or horizontal:
- You can increase the allocated memory size for each DEA.
- You can increase the number of instances in a DEA.
To leverage horizontal scalability, the JCR needs to be configured in cluster mode and shared across all the public instances. You can increase the number of public instances with a single click.
The default deployment script and manifest allow you to instantiate such a public cluster.
Suggested enhancements to the setup
As it is, this setup exposes several single points of failure (SPOF) and limitations:
- The database: If the database crashes, the whole website is lost.
- The HTTP sessions: The loss of one of the clusters destroys part of the live sessions on the public instances.
- The instance cache: Adding a new instance in the cluster or a new cluster means starting with an empty cache.
Here are some approaches to bypass these issues:
- Using several sets of clusters reduces the risk at the database level. The Cloud Foundry router must be configured to route dynamically to one of the clusters in the set. Adding a new cluster is straightforward in Cloud Foundry. However, activation from the author instance to the public instances does take more time. Recreating a cluster also takes some time since the database must be copied and the Lucene indexes fully recreated. The Synchronization module can also be used in this context.
- Clustering the database itself, usually called master/slave or replication. It is not part of the SQL standard and is therefore specific to each database vendor. This approach looks interesting at first glance since it is totally abstracted from the application and performance is typically better. However, the configuration can be complex and relies heavily on the available tools.
- Using a shared cache: A shared cache can be used to cache pages and assets, and reduce the number of requests on public instances (for static content). Public instances are configured not to cache this content. Usually, Cloud Foundry platforms provide a global cache out of the box that can be set up with one click. However, efficient cache invalidation requires a deeper integration in Magnolia.
- Using a global HTTP session cache: Cloud Foundry platforms usually provide this mechanism out of the box. Sticky sessions are also supported, but do not help in this specific case, because they limit the scalability of the cluster.