PHPCR is an adaptation of the JCR standard which can be used to connect a PHP application with a JCR-compliant repository. Magnolia is a popular open-source, enterprise-grade Java CMS that uses a JCR repository to store web content. This article demonstrates how PHPCR can be used to easily integrate a PHP front-end with the Magnolia content repository without needing special Java knowledge or training.
Relational databases are great for storing and retrieving strongly-typed, structured data, and they’re extremely popular data storage containers for web applications. However, for application data that isn’t strongly typed or rigidly structured, the selection of an appropriate data storage container requires deeper thought. In this situation, most developers typically head to XML, which is easy to understand, highly flexible and widely supported in most programming languages.
XML and relational databases aren’t the only two options any longer, though. Consider the evolving PHP Content Repository project, aka PHPCR, which aims to “combine the best of document-oriented databases (weakly-structured data) and of XML databases (hierarchical trees).” The project, which provides a 100% PHP implementation of the Java Content Repository (JCR) standard, is rapidly attracting followers as a viable alternative to traditional data storage containers.
In this article, I’ll introduce you to PHPCR and demonstrate how it can be used to connect a PHP application with a JCR-compliant repository. The repository in this case belongs to Magnolia, a popular open-source, enterprise-grade Java CMS. The examples in this article will demonstrate how PHP can be used to add and update content in the Magnolia repository and have those changes reflected in the Magnolia user interface.
Understanding JCR and PHPCR
The PHP Content Repository is “an adaptation of the Java Content Repository (JCR) standard, an open API specification defined in JSR-283...[it] defines how to handle hierarchical semi-structured data in a consistent way”. It was originally ported from Java to PHP by Karsten Dambekalns with the help of others for the typo3/flow3 project and is currently maintained by David Buchmann. It is freely available under the Apache License v2.0 at http://phpcr.github.com/.
How does it work? In their simplest forms, JCR and PHPCR are APIs to access and manipulate content. This content is stored hierarchically using a tree structure with each node of the tree representing a single content fragment. Each node has properties that are used to store information about the node; this might include the node value, node type, node status, node identifier, and so on.
To better understand this, take a look at the repository model diagram from the JCR 2.0 specification by Day Software AG (see Related URLs), which illustrates what the JCR tree structure looks like. Then consider Figure 1, which shows a fully-realized Jackrabbit implementation of this tree structure from the Magnolia JCR browser.
Nodes can be accessed either by path or by identifier. Each node may itself expose a collection of child nodes, and PHPCR and JCR define API methods for node traversal using these parent-child relationships. There also exist API methods to add or remove nodes from the tree. Here are some examples of these API methods:
- The SessionInterface->getNode() and SessionInterface->getNodeByIdentifier() methods return a node either by path or by identifier.
- The NodeInterface->addNode() method adds a new child node.
- The NodeInterface->getNodes() method returns all the children of a particular parent node.
- The NodeInterface->getPropertyValue() method returns the value of a specified node property.
See Related URLs for links to more information on the JCR 2.0 API and the PHPCR API.
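To make these methods concrete, here is a minimal sketch of a recursive tree walk built on getPath() and getNodes(). It assumes a node object that behaves like a PHPCR NodeInterface; the helper name collectNodePaths() is hypothetical and not part of the PHPCR API.

```php
<?php
/**
 * Collect the path of a node and of all its descendants, depth-first.
 *
 * $node is expected to behave like a \PHPCR\NodeInterface; only
 * getPath() and getNodes() are used. The helper name is hypothetical.
 */
function collectNodePaths($node, array &$paths = array())
{
    $paths[] = $node->getPath();

    // getNodes() returns an iterable collection of child nodes
    foreach ($node->getNodes() as $child) {
        collectNodePaths($child, $paths);
    }

    return $paths;
}
```

Called with a session's root node, a helper like this would return every path in the workspace, so in practice you would start from a smaller subtree.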
Each node in a JCR tree is associated with a node type. JCR specifies a rich and configurable typing system for nodes. Available types include primitives, such as Booleans and strings, as well as types that are relevant in a hierarchical context, such as path. User-specified node types are also supported.
For example, the JCR "nt:file" node type is used to represent a file and includes a mandatory property for the file creation date. Similarly, the Magnolia-specific "mgnl:user" node type is used to represent a Magnolia user and so includes properties for user name, email address, and password, as well as child nodes for the user’s groups and roles.
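As a rough illustration of how node types surface in PHP, the sketch below filters a collection of nodes by type and prints a node's primary type. It assumes objects that behave like PHPCR nodes; both helper names are hypothetical.

```php
<?php
/**
 * Return only the nodes that are of the given node type.
 * isNodeType() also matches mixins and supertypes of the primary type.
 */
function filterNodesByType($nodes, $typeName)
{
    $matches = array();
    foreach ($nodes as $node) {
        if ($node->isNodeType($typeName)) {
            $matches[] = $node;
        }
    }
    return $matches;
}

/** Describe a node as "name (primary node type)". */
function describeNodeType($node)
{
    return $node->getName() . ' (' . $node->getPrimaryNodeType()->getName() . ')';
}
```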
Both PHPCR and JCR also support the concept of “workspaces”, each having its own node tree. Think of workspaces like branches in a version control system: they can be used independently, but they can also be merged, moved, copied and cloned. Workspaces make it possible to logically separate content, yet have it reside within the same physical repository.
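A session is always bound to one workspace. As a small sketch, assuming an open PHPCR session object, the hypothetical helper below checks whether a named workspace is visible to the logged-in user before you try to open a second session on it.

```php
<?php
/**
 * Check whether the current user can access a given workspace.
 * $session is expected to behave like a \PHPCR\SessionInterface.
 * The helper name is hypothetical.
 */
function canAccessWorkspace($session, $workspaceName)
{
    $names = $session->getWorkspace()->getAccessibleWorkspaceNames();

    return in_array($workspaceName, $names, true);
}
```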
It’s important to note at this point that both PHPCR and JCR merely define an implementation standard; they are not implementations themselves. There are several implementations for each:
- Apache Jackrabbit is a well-known open-source implementation of JCR.
- Content Repository Extreme (CRX) (http://www.day.com/day/en/products/crx.html) is a commercial implementation of JCR by Day Software.
- Jackalope (https://github.com/jackalope/jackalope-jackrabbit) is an open-source implementation of PHPCR with both Doctrine and Jackrabbit bindings.
- Midgard2 PHPCR is a Symfony Content Management Framework (CMF) PHPCR provider.
Installing and configuring required components
With the basics out of the way, let’s get started with some practical examples. In terms of software, this tutorial assumes that you have a properly-configured PHP and Java development environment, including the following components in your system path:
- Java (JDK 5 or JDK 6)
- PHP (v5.3.3 or better)
- Git (v1.6 or better) - http://git-scm.com/downloads
On the PHP end of things, you’ll need to download and install PHPCR and its dependencies.
The easiest way to install PHPCR and its dependencies is by using Composer, the popular dependency manager for PHP. Create a new working directory for the project, change to it, and then run the following command at the console to download Composer:
Within your working directory, create a file named composer.json and fill it with the following content:
Then, use Composer to download the necessary components using the console command below:
Note that the download process might take a while, so this is a good time to grab a cup of coffee and a slice of toast.
On the Java end of things, you’ll need to download and install:
- Magnolia Community Edition (v5.3 or better) - http://www.magnolia-cms.com/
- Magnolia Jackrabbit DavEx module (v0.2) - https://nexus.magnolia-cms.com/content/repositories/magnolia.forge.releases/info/magnolia/davex/magnolia-module-jackrabbit-davex/
You’ll find detailed instructions on how to perform the Magnolia installation in the “Installing” section of its documentation or in the beginner tutorial at webreference.com/authoring/MagnoliaCMS/Setup/. Note that the Magnolia download includes a bundled version of Apache Tomcat.
Once you’ve downloaded and installed Magnolia, copy all the JAR files from the Jackrabbit DavEx module to the $MAGNOLIA_INSTALL_PATH/apache-tomcat-x.y.z/webapps/magnoliaAuthor/WEB-INF/lib/ directory. DavEx is WebDAV with JCR extensions, and the DavEx module is necessary to enable access to the content in the JCR repository over HTTP.
Once the module files are copied, restart Magnolia by running the following command at the console:
Next, browse to the URL http://localhost:8080 and log in with the user name “superuser” and password “superuser”. You’ll be prompted to update the Magnolia installation with the new DavEx module. Do so, and once the process completes, you should end up at the Apps Launcher, which looks like Figure 2.
With the addition of the DavEx module, Magnolia should already be configured to allow DavEx access to the Jackrabbit repository. To check this, select the Configuration App in the Apps Launcher, and then navigate your way to the node at /server/IPConfig/allow-all/methods. Select the node value and check that it contains the following values: GET,POST,PROPFIND,PUT,DELETE,REPORT,HEAD,SEARCH. If these values are not present, update the node by double-clicking it and entering them.
Figure 3 shows what the result should look like.
Back in the Apps Launcher, select the Tools category and the JCR App. Navigate your way to the node /demo-project/about/history. Pay attention to the value of the node’s “title” property (Figure 4), which holds the title for the corresponding webpage.
Now, try doing the same thing with PHP. As noted previously, a PHPCR implementation can connect to any JCR-compliant repository and both read and write data to it. Listing 1 demonstrates by logging in to the Magnolia JCR repository with PHPCR, opening a session to Magnolia’s “website” workspace, navigating to the node above, and retrieving the content of its “title” property.
Listing 1 begins by setting up the Composer auto-loader, which takes care of loading PHP classes as needed. Next, Jackalope’s getRepository() method is used to initialize and configure a repository factory object referencing the Magnolia repository. The Repository object’s login() method is then used to log in to the repository with the “superuser” credentials and create a JCR session. This session object serves as the primary object for all subsequent repository operations.
Once a JCR session has been established, it’s quite easy to navigate the JCR tree using the PHPCR API. This listing illustrates the getNode() method, which returns a node object representing the corresponding JCR node. It’s now possible to use the node object’s getPropertyValue() method to return the value of any node property, as illustrated in Figure 5. Once you’re done interacting with the repository, it’s a good idea to clean up, by unsetting the session object and closing the connection to the repository.
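The process described above can be sketched roughly as follows. This is an approximation under stated assumptions, not the original listing: the DavEx URI is a placeholder that depends on how the module is mapped in your installation, the auto-loader path assumes Composer's default vendor directory, and both function names are hypothetical.

```php
<?php
/**
 * Open a PHPCR session on a Magnolia workspace through Jackalope.
 *
 * $davexUri is an assumption: pass whatever endpoint the DavEx module
 * exposes in your installation.
 */
function openMagnoliaSession($davexUri, $workspace)
{
    // Composer auto-loader (assumes the default "vendor" directory)
    require_once __DIR__ . '/vendor/autoload.php';

    $factory = new \Jackalope\RepositoryFactoryJackrabbit();
    $repository = $factory->getRepository(
        array('jackalope.jackrabbit_uri' => $davexUri)
    );

    // Default Magnolia credentials used throughout this article
    $credentials = new \PHPCR\SimpleCredentials('superuser', 'superuser');

    return $repository->login($credentials, $workspace);
}

/** Read a single property of the node at an absolute path. */
function readNodeProperty($session, $path, $property)
{
    return $session->getNode($path)->getPropertyValue($property);
}

// Usage, against a running Magnolia instance (URI is a placeholder):
// $session = openMagnoliaSession('http://localhost:8080/your-davex-endpoint/', 'website');
// echo readNodeProperty($session, '/demo-project/about/history', 'title');
// $session->logout(); // close the connection when done
```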
The steps above make up a standard process for interacting with any JCR-compliant repository... although, as you’ll shortly see, PHPCR allows you to do much more than simply reading property values.
Retrieving nodes and properties
Every node in a JCR repository has a unique identifier, and so, just as you can retrieve a node by path with the getNode() method, you can also retrieve a node by identifier with the getNodeByIdentifier() method. Consider Listing 2, which produces output equivalent to the previous one using this method.
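A minimal sketch of identifier-based lookup might look like this, assuming $session is an open PHPCR session. The helper names are hypothetical; the first one is only there to show where an identifier can come from.

```php
<?php
/** Look up the identifier of the node at an absolute path. */
function getIdentifierForPath($session, $path)
{
    return $session->getNode($path)->getIdentifier();
}

/** Read a property from a node addressed by identifier instead of path. */
function readPropertyByIdentifier($session, $identifier, $property)
{
    $node = $session->getNodeByIdentifier($identifier);

    return $node->getPropertyValue($property);
}
```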
The getPropertyValue() method described above is useful when you need to retrieve a specific value. However, you can also retrieve all the properties of a particular node as a PHP associative array, with the node object’s getPropertiesValues() method. Listing 3 has an example.
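As a sketch of how such an array might be handled, the hypothetical helper below turns every property of a node into a printable "name: value" line. It assumes a PHPCR node object and passes false as the second argument so that reference properties come back as identifiers rather than node objects.

```php
<?php
/**
 * Format all properties of a node as "name: value" strings.
 * Dates, multi-valued properties, booleans and binary streams
 * need special handling before they can be printed.
 */
function formatNodeProperties($node)
{
    $lines = array();

    // null = no name filter, false = do not dereference references
    foreach ($node->getPropertiesValues(null, false) as $name => $value) {
        if ($value instanceof \DateTime) {
            $value = $value->format('c');
        } elseif (is_array($value)) {
            $value = implode(', ', $value);
        } elseif (is_bool($value)) {
            $value = $value ? 'true' : 'false';
        } elseif (is_resource($value)) {
            $value = '(binary)';
        }
        $lines[] = $name . ': ' . $value;
    }

    return $lines;
}
```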
Figure 6 is a side-by-side comparison that displays the node properties from the Magnolia JCR repository and the properties returned by Listing 3.
Retrieving binary node content
The Magnolia repository also includes a “dam” workspace to store binary content. A node in this workspace represents a binary file and, as per the JCR specification, includes a child node named “jcr:content” whose “jcr:data” property holds the actual binary data for the file. With PHPCR, it’s possible to access and retrieve this binary data from PHP in the same way as one would retrieve any other node content.
Consider Listing 4, which illustrates by retrieving a PDF file from Magnolia’s repository and prompting the user to download it to his or her desktop. This listing creates a JCR session to the “dam” workspace, which is where Magnolia stores binary data, such as images, PDF documents and videos. It then retrieves the node at /demo-project/downloads/Magnolia_Flyer_4-0, which contains a PDF file with an advertisement for Magnolia.
This node contains only basic information about the file, such as its name and last modified date. The real meat is found in its “jcr:content” child node, which holds the binary content for the file in its "jcr:data" property. With PHPCR’s node traversal methods, it’s possible to access the child node, retrieve the binary data for the file via the "jcr:data" property, and send the output to the client (browser) as a binary stream.
By accessing other properties of the “jcr:content” node, it’s also possible to derive the file name, extension and MIME type, and send the client the appropriate response headers to force it to perform a download rather than displaying the binary content directly in the browser window.
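The download flow described above could be sketched as follows, assuming $session is an open PHPCR session on the “dam” workspace. The “fileName” property name is an assumption about how Magnolia labels the asset, “jcr:mimeType” is the standard JCR property, and both function names are hypothetical.

```php
<?php
/** Build the response headers that make a browser download a file. */
function buildDownloadHeaders($fileName, $mimeType)
{
    return array(
        'Content-Type: ' . $mimeType,
        'Content-Disposition: attachment; filename="' . basename($fileName) . '"',
    );
}

/** Stream the binary content of an asset node to the client. */
function sendAsset($session, $path)
{
    $content = $session->getNode($path)->getNode('jcr:content');

    // "fileName" is an assumed property name; fall back to the node name
    $fileName = $content->hasProperty('fileName')
        ? $content->getPropertyValue('fileName')
        : basename($path);
    $mimeType = $content->hasProperty('jcr:mimeType')
        ? $content->getPropertyValue('jcr:mimeType')
        : 'application/octet-stream';

    foreach (buildDownloadHeaders($fileName, $mimeType) as $header) {
        header($header);
    }

    // Binary properties are exposed as PHP stream resources
    $stream = $content->getProperty('jcr:data')->getBinary();
    fpassthru($stream);
    fclose($stream);
}
```

Keeping the header construction in its own function makes it easy to check without a repository connection.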
Creating nodes and setting properties
Just as you can read data from Magnolia’s repository, so too can you write data. Consider Listing 5, which illustrates the process. In this example, the node object’s setProperty() method is used to create a new property, as well as modify an existing property, of the node. The changes are then saved back to the repository with the session object’s save() method.
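A minimal sketch of this write cycle, assuming $session is an open PHPCR session, might look like the following. The helper name is hypothetical; note that nothing reaches the repository until save() is called.

```php
<?php
/**
 * Set one or more properties on the node at $path and persist them.
 * setProperty() creates the property if it does not exist yet and
 * overwrites it otherwise.
 */
function updateNodeProperties($session, $path, array $properties)
{
    $node = $session->getNode($path);

    foreach ($properties as $name => $value) {
        $node->setProperty($name, $value);
    }

    // Changes stay in the session until they are saved
    $session->save();

    return $node;
}
```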
Figure 7 shows the impact of the change in Magnolia.
Listing 6 has another example, this one adding a folder to Magnolia’s “dam” workspace. In this example, the addNode() method is used to add a new child node to the “dam” workspace. The new node is assigned the type "mgnl:folder", which is Magnolia’s custom type for folders; using this type ensures it shows up in Magnolia’s Assets App. Next, the node title is set using the setProperty() method discussed previously, and the changes are saved back to the repository.
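The folder creation described above can be sketched like this, assuming $session is an open PHPCR session on the “dam” workspace. The helper name is hypothetical, and storing the title in a property named “title” is an assumption.

```php
<?php
/**
 * Add a folder node below $parentPath and give it a title.
 * "mgnl:folder" is the Magnolia node type named in the text.
 */
function addDamFolder($session, $parentPath, $name, $title)
{
    $parent = $session->getNode($parentPath);

    // Second argument is the primary node type of the new child
    $folder = $parent->addNode($name, 'mgnl:folder');
    $folder->setProperty('title', $title); // property name is an assumption

    $session->save();

    return $folder;
}
```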
If you now examine the repository using Magnolia’s Assets App, you’ll see the newly-added folder (Figure 8).
Traversing and searching the node tree
PHPCR also provides methods for iterating over a collection of nodes using standard loop constructs. Consider Listing 7, which iterates over all the children of a specified node and returns their abstracts. In this case, the node object’s getNodes() method is used to retrieve a collection of child nodes from the Magnolia repository. PHPCR makes it possible to iterate over this node collection using a standard “foreach” loop and perform operations or calculations on each child node. In this example, it checks if each node exposes an “abstract” property, and if it does, it prints the node path and property value.
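The loop described above might be sketched as follows, assuming $parent is a PHPCR node object. Rather than printing directly, this hypothetical helper returns the abstracts keyed by node path so the caller decides how to display them.

```php
<?php
/**
 * Collect the "abstract" property of every direct child that has one.
 * Returns an array of path => abstract.
 */
function listChildAbstracts($parent)
{
    $abstracts = array();

    foreach ($parent->getNodes() as $child) {
        // Skip children that do not expose an abstract
        if ($child->hasProperty('abstract')) {
            $abstracts[$child->getPath()] = $child->getPropertyValue('abstract');
        }
    }

    return $abstracts;
}

// Usage, with an open session on the "website" workspace:
// foreach (listChildAbstracts($session->getNode('/demo-project/about')) as $path => $text) {
//     echo $path . ': ' . $text . PHP_EOL;
// }
```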
Figure 9 shows the output.
Like JCR, PHPCR also makes it possible to perform custom searches for nodes across the JCR repository via its support for JCR-SQL2. Part of the JCR 2.0 specification, JCR-SQL2 makes it possible to retrieve a result set of nodes matching specific criteria using SQL-like syntax. You’ll find more information at http://www.day.com/specs/jcr/2.0/6_Query.html.
To illustrate, consider Listing 8 which enhances the previous one to retrieve a listing of all nodes (pages) in the “website” workspace under the /demo-project/about branch. This example uses the query manager object, which makes it possible to define a JCR-SQL2 query and execute it against the current workspace. As the query string illustrates, the syntax is very similar to SQL, with selectors, clauses, filters, and functions. The result of query execution is a collection of node objects, which can be processed using a regular “foreach” loop, in the same way as the previous example.
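As a rough sketch of such a query, the code below builds a JCR-SQL2 statement and runs it through the query manager. It assumes $session is an open PHPCR session; the function names are hypothetical, and the node type you pass in (for instance "mgnl:page" for Magnolia pages) is an assumption about your repository's schema.

```php
<?php
/** Build a JCR-SQL2 query selecting all descendants of a path. */
function buildDescendantQuery($nodeType, $ancestorPath)
{
    // Escape single quotes for use inside a JCR-SQL2 string literal
    $path = str_replace("'", "''", $ancestorPath);

    return sprintf(
        "SELECT * FROM [%s] AS page WHERE ISDESCENDANTNODE(page, '%s')",
        $nodeType,
        $path
    );
}

/** Execute the query and return the matching nodes. */
function findDescendants($session, $nodeType, $ancestorPath)
{
    $queryManager = $session->getWorkspace()->getQueryManager();
    $query = $queryManager->createQuery(
        buildDescendantQuery($nodeType, $ancestorPath),
        \PHPCR\Query\QueryInterface::JCR_SQL2
    );

    return $query->execute()->getNodes();
}

// Usage:
// foreach (findDescendants($session, 'mgnl:page', '/demo-project/about') as $node) {
//     echo $node->getPath() . PHP_EOL;
// }
```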
Figure 10 shows the output.
Example: page abstract editor
From the previous examples, it’s clear that PHPCR offers PHP developers some key capabilities: searching for JCR nodes using pre-defined criteria, reading their properties and writing new values to them. These capabilities make it possible to create PHP-based web applications that can directly interact with content in a JCR-powered CMS like Magnolia, without needing any special Java knowledge or training.
To illustrate this, consider Listing 9: a PHP-based editor that allows users to directly edit the abstracts of webpages inside Magnolia. This script is essentially a giant "if" test keyed on the presence or absence of the $_POST['submit'] variable, which is used to check whether the web form in the script has been submitted or not. Here’s how it works:
- The script begins by loading all required classes and opening a connection to the Magnolia repository. It also creates a session object for the “website” workspace. It then checks to see if the web form has been submitted by checking for the $_POST['submit'] variable.
- If it hasn’t, it initializes and executes a JCR-SQL2 query to return a list of all the pages with abstracts under the /demo-project URL in the CMS. It then presents these abstracts as editable fields within a web form. Each input field is accompanied by a hidden field holding the corresponding node’s unique identifier. The user can now edit the abstracts and submit the form once done.
- Once the form has been submitted, the second half of the conditional test is invoked. Here, the submitted abstracts and accompanying node identifiers are used to retrieve the corresponding nodes from the Magnolia repository via the getNodeByIdentifier() method, and the setProperty() method is used to update their “abstract” properties to the user-submitted values. The changes are saved back to the repository via the session object’s save() method, and immediately become visible in the Magnolia instance.
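The steps above can be sketched as a handful of small functions. This is an approximation of the flow, not the original listing: it assumes $session is an open PHPCR session on the “website” workspace, the form field names “abstracts” and “ids” and all function names are hypothetical, and “mgnl:page” is an assumed node type for Magnolia pages.

```php
<?php
/** Find all pages under /demo-project that have an abstract. */
function findPagesWithAbstracts($session)
{
    $sql = "SELECT * FROM [mgnl:page] AS page "
         . "WHERE ISDESCENDANTNODE(page, '/demo-project') "
         . "AND page.abstract IS NOT NULL";
    $query = $session->getWorkspace()->getQueryManager()
        ->createQuery($sql, \PHPCR\Query\QueryInterface::JCR_SQL2);

    return $query->execute()->getNodes();
}

/** Render one editable field plus a hidden identifier per page. */
function renderAbstractForm($pages)
{
    $html = '<form method="post">';
    foreach ($pages as $page) {
        $html .= sprintf(
            '<p><label>%s<br><textarea name="abstracts[]">%s</textarea></label>'
            . '<input type="hidden" name="ids[]" value="%s"></p>',
            htmlspecialchars($page->getPath(), ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($page->getPropertyValue('abstract'), ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($page->getIdentifier(), ENT_QUOTES, 'UTF-8')
        );
    }

    return $html . '<input type="submit" name="submit" value="Save"></form>';
}

/** Write the submitted abstracts back to their nodes and save. */
function saveAbstracts($session, array $ids, array $abstracts)
{
    foreach ($ids as $index => $id) {
        if (!isset($abstracts[$index])) {
            continue; // no matching field was submitted for this identifier
        }
        $session->getNodeByIdentifier($id)
            ->setProperty('abstract', $abstracts[$index]);
    }
    $session->save();
}

/** The "giant if": save on submit, otherwise show the form. */
function handleAbstractEditor($session, array $post)
{
    if (isset($post['submit'])) {
        $ids = isset($post['ids']) ? (array) $post['ids'] : array();
        $abstracts = isset($post['abstracts']) ? (array) $post['abstracts'] : array();
        saveAbstracts($session, $ids, $abstracts);

        return '<p>Abstracts saved.</p>';
    }

    return renderAbstractForm(findPagesWithAbstracts($session));
}

// Usage: echo handleAbstractEditor($session, $_POST);
```

A production version would also need access control and input validation, since anyone who can reach the script can rewrite page content.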
Figure 11 illustrates the abstract editor in action, and the resulting change in Magnolia page content.
As the concluding example demonstrates, PHPCR provides a full-featured and robust implementation of the JCR specification. With PHPCR, PHP developers can easily build and integrate PHP-based web applications and frontends with JCR-powered Java applications, without needing special Java knowledge or training. It’s truly the best of both worlds...so what are you waiting for? Fork a copy of PHPCR today, and start coding!
Vaswani, Vikram: Integrating PHP Web Applications with JCR and Magnolia. This article was first published in the October 2012 issue of php|architect magazine. Vikram Vaswani is the founder and CEO of Melonfire, a consulting services firm with special expertise in open-source tools and technologies. He is also the author of the books Zend Framework: A Beginners Guide and PHP: A Beginners Guide.
- Kahwe Smith, Lukas: Code examples in this article. Liip AG.
- Day Software AG: Repository diagram. Day JCR License.