Data

Overview

The purpose of the Data Module is to manage structured data (as opposed to the unstructured or semi-structured data commonly used for web pages). The Data Module stores content independently of any page structure. To place the content of an entry stored in the Data Module on a page, you retrieve it from a paragraph or page template. In that regard, the Data Module is similar to the DMS, which allows you to store documents independently of the page structure.

The Data Module also provides functionality for the automated import of external data, which is especially useful for exposing data from other internal systems, such as warehousing or CRM solutions, in Magnolia. Using the Data Module as intermediary storage for imported data provides a single point of entry for such data, which can then be used from web pages in a uniform way, the same as any other Magnolia content. It also allows post-import editing, so automatically imported data can be improved when it is not directly suitable for publishing.

Usage

As mentioned above, the idea is to use the Data Module similarly to the DMS, except that the Data Module is designed to deal with content rather than with files (images, movies, documents). The following are just a few ideas for how the Data Module can be used in projects:
  • Boiler Plates: You can define multiple text modules which can be referenced either within the template script or through a simple "Reference Paragraph".
  • Look-up Tables: Another widely used concept is the so-called "Look-up Table". For example, you can define a folder "US States" and, for each state, a node with properties like "abbreviation", "state name", "size of state" and more.
  • Product or manual catalogues for e-commerce solutions
  • Other Ideas: For example, the list of Magnolia References on Magnolia's website is driven from the Data Module. By storing the client information in the Data Module, Magnolia can generate different types of lists, with different amounts of detail, from the same set of data.

The Data Module introduces users to the concept of Node Types. Node Types themselves are nothing new; they underpin the whole working of the repositories. Before reading further, it is worth getting familiar with the concept, for example by reading the JSR 170 specification.

Just a brief explanation here: a Node Type is, (very) simply speaking, a tag for a node. A Node Type makes it easier to search "for nodes of type 'myNodeType'". Once you understand why the Data Module starts with creating a custom Node Type, the rest should be straightforward.

Another important note about Node Types: they have to be registered before they can be used! While this is done automatically when you create a new Data Module entry, a custom module that uses the Data Module must first bootstrap its custom Node Types before it can bootstrap the config and content nodes. The module provides a special install task to do this, which is explained in more detail further down the page.

How it works

Node Types

Everything in the Data Module starts with the definition of a type. In the image below you can see the Data/Type menu listing all the existing types. To create a new type, simply click the New Type button in the toolbar at the bottom of the page.

Types Overview

The "New Type" dialog (below) pops up and the type can be defined.

Types Definition

  • Name: Each type needs a unique type name. This name is not used for the visual representation of the type; it is the real type definition. If you ever export content created with this type, the type name will be the primary type of the content, in the same way that anything created in the Website repository has the primary type mgnl:content.
  • Title: The visual representation of the new type. This name is used to identify the type when interacting with users. For example, it is used as the label of the sub menu entry in the Data menu that shows the tree with all the instances of the given type.
  • Icon: Optional icon to represent the type. The best result is achieved with an icon image of 16x16 pixels. The icon is used to represent instances of the type in the tree, and as the icon for the type's menu entry in the Data menu.
  • Root Path: Path under which all nodes of this type are stored. It is possible to use the same path for different types, but if you plan to import large amounts of content of a given type and to run search queries over it, specify a unique path so that search queries can easily be limited to that sub path.
  • JCR Node Type: Optional additional definition for the type. It is possible to constrain the type to certain conditions, to define auto-created properties or sub nodes, and so on. Read the JCR specification mentioned above for more details and to see what the syntax looks like. Such extra definitions are not necessary in most use cases for the Data Module.
  • Allow Folder: Check this box if you want to allow folder creation for the given data type. If it is not checked, content of the given type can only be created in a flat list rather than in a tree.
  • Sort by Name: Check this box if you want items of the given type to be sorted automatically by name.
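
To illustrate why a unique Root Path helps with searching, here is a minimal sketch that builds a JSR 170 SQL statement selecting all nodes of one type below one path. The class DataTypeQuery, the type name "company" and the path "/companies" are hypothetical and used only for illustration; none of them is part of the Data Module.

```java
// Hypothetical helper for illustration; not part of the Data Module API.
final class DataTypeQuery {

    private DataTypeQuery() {
    }

    /**
     * Builds a JCR 1.0 (JSR 170) SQL statement that selects all nodes of
     * the given node type stored below the given root path.
     */
    static String nodesOfType(String nodeType, String rootPath) {
        // Strip a trailing slash so "/companies/" and "/companies" behave alike.
        String base = rootPath.endsWith("/")
                ? rootPath.substring(0, rootPath.length() - 1)
                : rootPath;
        // jcr:path LIKE '<base>/%' restricts the search to descendants of base.
        return "SELECT * FROM " + nodeType
                + " WHERE jcr:path LIKE '" + base + "/%'";
    }
}
```

A call such as DataTypeQuery.nodesOfType("company", "/companies") produces a statement that could then be handed to the JCR QueryManager.createQuery method with the Query.SQL language constant. When every type has its own root path, the path condition alone already narrows the search to a single type's content.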

Tip

When registering new node types programmatically (e.g. when bootstrapping types and their data from a custom module), you can use info.magnolia.module.data.setup.RegisterNodeTypeTask. This task needs to be added at the beginning of the basic install tasks of your VersionHandler to ensure that the type is already registered when content using it is bootstrapped. This can be done, for example, as follows:

    @Override
    protected List<Task> getBasicInstallTasks(InstallContext installContext) {
        final List<Task> installTasks = new ArrayList<Task>();
        // make sure we register the type before doing anything else
        installTasks.add(new RegisterNodeTypeTask("RSSAggegator"));
        installTasks.addAll(super.getBasicInstallTasks(installContext));
        return installTasks;
    }

Dialogs

Once the new type is created, Magnolia automatically defines a sub entry for the type in the Data menu, as well as a tree configuration and a basic dialog for the type. To see the menu sub entry, log out of Magnolia and log in again to refresh the menu. After you select the type's sub menu entry in the Data menu, an empty tree is shown and the "New Item" and "New Folder" buttons appear in the toolbar at the bottom of the page. Use these buttons to create new items and folders respectively.

Clicking "New Item" opens the automatically created dialog. By default this dialog defines only the name field. The dialog used to enter data for the newly created type is a standard Magnolia dialog, and other controls can be added to it as easily as to any other dialog. The dialog can be configured in config:/modules/data/dialogs/<type name>. The dialog definitions can also be accessed from the Data menu by clicking the Config (Dialogs) sub menu entry.

Controls

The Data Module also provides a selection of its own controls that can be used in dialogs. These controls deal with the differences in data structures between the Data repository and the other repositories. They are:
  • dataFile: Used to upload binary data into content in the Data repository.
  • dataButtonSet: Same as the buttonSet control; allows the creation of radio button or checkbox groups.
  • dataMultiSelect: Same as the multiselect control; allows the selection of multiple values.
  • dataUUIDButtonSet: Same as dataButtonSet, but stores the values as UUIDs rather than paths.
  • dataUUIDMultiSelect: Same as dataMultiSelect, but stores the values as UUIDs rather than paths.

As with other controls and dialogs, custom controls can also be used in the Data Module. The important thing to remember when writing custom controls for Data Module Node Types is that if a custom control creates a more complex structure underneath the Node Type, the sub nodes need to be of type dataItemNode in order to be activated as part of the content of the given type. If this is not possible, the activation command needs to be adapted accordingly. Read on to find out more about commands in the Data Module and activation-related details.

Commands

The Data Module installs a range of predefined commands. These commands are either used internally or are freely available for anyone to use.

Installed commands are divided into two catalogues. The dataType catalogue contains commands specific to Node Type definitions, and the data catalogue contains commands common to all defined types.

The dataType catalogue commands are:

  • delete: The delete commands ensure that all the structures defined when a new type was created (dialog, tree, sub menu entry) are deleted together with the type.
  • importData: Allows manual invocation of an ImportHandler of your choice directly from the toolbar of the Types sub menu. The ImportHandler is specified in the product/importer entry of the command. The value has to match one of the ImportHandlers defined at config:/modules/data/config/importers.

The data catalogue commands are:

  • activate: Activates content from the Data repository. The node of the given custom type and all its sub nodes of type dataItemNode are considered one piece of content and are activated together. Sub nodes of any other type are not activated together with the content.
  • import: The default ImportHandler invoker. This command tries to find an importer definition matching the name of the type at config:/modules/data/config/importers. If a matching importer is found, it is invoked. If not, the command delegates to the importData command.
  • importData: Same as the command of the same name in the dataType catalogue. When called, it tries to invoke the importer specified in its sub node product/importer.
  • activateAll: Activates all the content, including all the sub content. This command may optionally be called by an ImportHandler.
  • deactivateAll: Deactivates all the content in the given path. This command may optionally be called by an ImportHandler.
  • deleteAll: Deactivates and deletes all content in the given path. This command may optionally be called by an ImportHandler.
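
The lookup order of the import command can be pictured with the following sketch. It is a plain-Java model under stated assumptions, not the module's implementation: the class ImportDispatchSketch, its method and the importer names used with it are hypothetical, and the set merely stands in for the importer definitions found in the configuration.

```java
import java.util.Set;

// Hypothetical model of the lookup described above; not Data Module code.
final class ImportDispatchSketch {

    private ImportDispatchSketch() {
    }

    /**
     * Picks the importer to run for a type.
     *
     * @param typeName            name of the data type being imported
     * @param configuredImporters names of the configured importer definitions
     * @param fallbackImporter    importer named in the product/importer entry
     */
    static String resolveImporter(String typeName,
                                  Set<String> configuredImporters,
                                  String fallbackImporter) {
        // "import": prefer an importer definition named after the type.
        if (configuredImporters.contains(typeName)) {
            return typeName;
        }
        // Otherwise behave like "importData" and use the explicitly named importer.
        return fallbackImporter;
    }
}
```

In this model, a type whose name matches an importer definition is always served by that definition, and the product/importer value only matters for types without a matching definition.
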
Tip

Each of the activation commands defined for the Data Module lets you customize what should be considered part of the content during activation and what should be left out. By default, the commands are configured to include anything of type dataItemNode as part of the content and to activate it with the parent type. Other types can be added by changing the value of the itemTypes property of the activation command. If you update this entry, make sure all the activation commands defined by the Data Module are updated to contain the same values.
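
The effect of the itemTypes setting can be sketched with a small in-memory model. This is an illustration only: the classes ActivationSketch and Node are hypothetical and do not use Magnolia's content API, and the sketch assumes that activation does not descend into a sub node whose type is not listed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model; not Magnolia's content or activation API.
final class ActivationSketch {

    private ActivationSketch() {
    }

    /** A node with a name, a node type and child nodes. */
    static final class Node {
        final String name;
        final String type;
        final List<Node> children = new ArrayList<>();

        Node(String name, String type) {
            this.name = name;
            this.type = type;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /**
     * Returns the names of the nodes activated together: the item itself
     * plus every sub node whose type is listed in itemTypes.
     */
    static List<String> collect(Node item, Set<String> itemTypes) {
        List<String> result = new ArrayList<>();
        result.add(item.name);
        collectChildren(item, itemTypes, result);
        return result;
    }

    private static void collectChildren(Node parent, Set<String> itemTypes,
                                        List<String> result) {
        for (Node child : parent.children) {
            // Assumption: sub nodes of other types are skipped with their subtree.
            if (itemTypes.contains(child.type)) {
                result.add(child.name);
                collectChildren(child, itemTypes, result);
            }
        }
    }
}
```

With itemTypes containing only dataItemNode, a sub node of any other type is left out of the result; adding that type to the set brings it in. This is why all activation commands should carry the same itemTypes values.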

Import Handlers

Import handlers are special tasks supported by the Data Module that can be invoked periodically to synchronize data in the Data Module repository with an external data store. Import handlers, as the name suggests, are meant to import content into the repository rather than export it. Optionally, import handlers can be configured to activate all changes to the public instances after finishing an import run.

Import handlers are configured in the Data Module under config:/modules/data/config/importers. Each entry looks like the one shown below.

Import Handler

  • class: The import handler definition: a class extending the abstract info.magnolia.module.data.importer.ImportHandler. This class is called by the ImportCommand to execute the import.
  • repository: Target repository for the imported data. By default the value should be data, but if the import handler needs to import data into another repository, it can be changed here. Only one repository can be specified at a time.
  • lockDuringImport: Controls whether an exclusive lock is placed on the parent content to prevent any other updates while the importer runs. This property gives you a degree of control over long-running and frequently executed imports: while one import has not finished, the lock prevents any other import from running, and it also prevents manual editing of the content. However, if the import handler fails to connect to the external data source without timing out, or otherwise hangs while communicating with the external system, the lock prevents any further updates to the affected type. The only workaround in that case is to export the content, delete it and re-import it. Use content locking only when you are sure that your ImportHandler cannot hang while communicating with the external system during the import.
  • activateImport: Set to true if imported content should be activated automatically after a successful import.
  • deleteIOldData: Set to true if data not found in the external system should be automatically deleted (and deactivated).
  • automatedExecution/cron: Cron-like pattern for scheduling automatic execution of the ImportHandler. See Quartz Cron Pattern for more details on supported patterns.
  • automatedExecution/enabled: Set to true to enable automatic execution of the ImportHandler based on the specified cron pattern.
  • targets: Collection of targets into which imported data should be saved. In most cases a single target, main, is enough.
  • targets/main/class: Name of the class that transforms imported data into a format suitable for data storage. info.magnolia.module.data.importer.SimpleImportTarget is suitable for most cases.
  • targets/main/targetPath: The path under which this target stores the content. In most cases this path should match the root path specified for the given Node Type in the Node Type definition.
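
The synchronization idea behind deleting old data can be sketched as a set difference between what is stored and what the external system delivered. This is a simplified model, not the ImportHandler implementation: the class ImportSyncSketch, the method staleItems and the flag name deleteStale are hypothetical, with the flag standing in for the delete-old-data setting described above.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of one import run; not the Data Module's ImportHandler.
final class ImportSyncSketch {

    private ImportSyncSketch() {
    }

    /**
     * Returns the names of stored items that were not delivered by the
     * external system and would therefore be removed after the import.
     *
     * @param stored      item names currently in the repository
     * @param imported    item names delivered by the external system
     * @param deleteStale stand-in for the delete-old-data setting
     */
    static Set<String> staleItems(Set<String> stored, Set<String> imported,
                                  boolean deleteStale) {
        Set<String> stale = new TreeSet<>();
        if (deleteStale) {
            // Everything stored that the external system no longer knows about.
            stale.addAll(stored);
            stale.removeAll(imported);
        }
        return stale;
    }
}
```

With the flag off, nothing is ever removed and the repository only grows; with it on, the repository converges to the content of the external system after each run.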

Hierarchical Types

This feature was introduced in Data Module 1.4.

Hierarchical types are an extension that exposes more of the underlying content repository functionality to the user. Previously, the Data Module was capable of handling any sort of custom content type, but there was only one level of such types, and any dependencies had to be either incorporated directly into the type or maintained separately and linked together. With hierarchical types, it is possible to build types that are more realistic and more closely aligned with business needs.

An example: imagine using the Data Module to store information about various companies. For the sake of the example, let's say we only want to store some simple information about the departments in each company, plus employee information. Prior to Data Module 1.4, this scenario could be solved by creating the types "Employee", "Department" and "Company". Then, for each instance of the type "Company", we would use the multiselect control to specify the departments or employees we want to list under the company. While this is doable, you can quickly see the shortcomings of such a solution: name collisions, no way to see which department belongs to which company without opening the dialog, and so on.

As of Data Module 1.4, the same scenario can be solved with a hierarchy of types. The main type "Company" remains as before, but for this type we create two sub types: "Department" and "Employee". This is done simply by selecting the main type and clicking "New Type" in the toolbar or in the context menu. Since we also want to distinguish different kinds of employees, we create three distinct sub types for the "Employee" type: "Director", "Manager" and "Developer".

After reading the above, one might ask why the "Department" type is defined directly under "Company" while the employees are grouped in a sub type. This is a valid question, and there are no hard rules. In general, if you expect many instances of a sub type, it is better to group them so they can be split up if necessary; if only a few instances are expected, it is easier to keep them directly under the main type so that users don't have to browse through too deep a type hierarchy.

Hierarchical data type

The details of creating each type are as follows.

The only difference between the main type dialog and the sub type dialog is that you can't select a Root Path for sub types, nor can you change the "Sort Items" or "Allow Folders" settings.

Main Type Configuration Dialog

Sub Type Configuration Dialog

After the types are defined as described above, an instance of such a type might look like the one shown below:

Content of Hierarchical type

The creation of type instances, whether flat or hierarchical, has not changed. Just select the folder in which the instance should be created and click "New Item". To create an instance of a sub type inside it, select the instance of the main type and click "New Item" again. If more than one sub type can be chosen at the given level, a selection dialog similar to the paragraph selection dialog is displayed, letting the user choose which sub type to create. The number of instances of a given sub type is not limited in any way.

Content of Hierarchical type

The screenshot should be self-explanatory. The ability to assign a different icon to each sub type further helps to visualize the entries. Also, there are two different instances of the "Employee" type, Permanent and Temporary, to further aid working with the tree (imagine the company had 100+ employees). You can also see the other advantage of such a structure: when looking at a company, I am always interested in its departments, so I see them directly when expanding the Company tree item, but I am not slowed down by the need to fetch all the employees. This way, visually comparing the departments of two companies is still possible.

Known Limitations of hierarchical types implementation:

  • Each type or sub type is a separate content type in the underlying repository. Since the repository requires type names to be unique, it is not possible to have a sub type of the same name in multiple main types.

Examples

Other modules using the Data Module as data storage: