The RSS Aggregator module displays external feed content on a Magnolia page (feed aggregation) and generates feeds from Magnolia content (feed syndication). Feed syndication increases your content exposure. This in turn generates more traffic and backlinks that improve your site rank. Displaying aggregated feeds provides continuous fresh content on your site, which encourages regular crawling by search engines. RSS feeds can also add a sense of timeliness and community involvement to your site.

(warning) Magnolia 4.5.10+ / RSS Aggregator 1.4.2+: Planet feeds enhance standard feeds by collecting feed statistics. Planet Magnolia is an example site that uses the module to collect blog posts about Magnolia.

Feed aggregation

A feed aggregator displays external feed content on a Magnolia page. This provides continuous fresh content and encourages regular crawling by search engines.

Creating an aggregate feed

In this example we collect two individual RSS feeds into one aggregate feed:

Both have a common topic – search engine optimization (SEO) – so they are good candidates for aggregation. A reader interested in one feed is likely to be interested in the other too.

To create an RSS aggregator:

  1. Go to Data > RSS Aggregator.
  2. Add a new item.
    • Name: SEOBlogs. This is the internal name of the aggregator.
    • Title: Feed title.
    • Description: Feed description.
    • Planet Feed: Check to mark this feed as a planet feed.
  3. Add the feed URLs from above.
    • Title: Leave blank to use the title of the external feed.
    • URL: Feed URL.
    • Image: Select an image from the DMS. 60x60 px works best. The image is displayed in the Planet Feeds component. Hackergotchi have become popular and are supported.
  4. Add filters to refine content.
    1. Condition: Select an AND, OR or NOT condition to include or exclude content.
    2. Option: Select one of the predefined options commonly found in feeds: Category, Title, Author or Description.
    3. Value: Type the value that the condition and option in the first two boxes are tested against.
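One possible reading of the filter settings is sketched below in Python. The combination rule (every AND filter must match, at least one OR filter must match if any are defined, and no NOT filter may match), the case-insensitive substring comparison, and the names `Filter` and `matches` are assumptions made for illustration. The module's actual matching logic may differ.

```python
from dataclasses import dataclass


@dataclass
class Filter:
    condition: str  # "AND", "OR" or "NOT"
    option: str     # entry field to test, e.g. "title" or "author"
    value: str      # text to look for in that field


def _hit(f, entry):
    # Assumed comparison: case-insensitive substring match.
    return f.value.lower() in str(entry.get(f.option, "")).lower()


def matches(entry, filters):
    """Return True if the entry passes all filters (assumed semantics)."""
    and_filters = [f for f in filters if f.condition == "AND"]
    or_filters = [f for f in filters if f.condition == "OR"]
    not_filters = [f for f in filters if f.condition == "NOT"]
    if any(_hit(f, entry) for f in not_filters):
        return False
    if not all(_hit(f, entry) for f in and_filters):
        return False
    if or_filters and not any(_hit(f, entry) for f in or_filters):
        return False
    return True


# Hypothetical sample entries.
entries = [
    {"title": "SEO basics", "author": "Ann"},
    {"title": "SEO spam offer", "author": "Bot"},
    {"title": "Cooking tips", "author": "Ann"},
]
filters = [Filter("AND", "title", "seo"), Filter("NOT", "title", "spam")]
kept = [e["title"] for e in entries if matches(e, filters)]
```

With these sample filters, only entries whose title mentions SEO and does not mention spam are kept.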

Importing feed data

To import the feed data:

  1. Go to Data > RSS Aggregator.
  2. Click Import data.

See also: Scheduling an automatic feed import.

Displaying a feed on the site

To display the aggregated feed on the site:

  1. Go to Website.
  2. Edit a section page such as /demo-features/aggregation-paragraphs/rss-aggregation.
  3. Add the RSS Combined Feed or RSS Feed List component in the content area.
  4. In the component dialog, select the SEOBlogs feed you created above.

Scheduling an automatic feed import

Importing feed data manually is not a long-term solution. You should configure an automatic import schedule. Choose the update frequency depending on how often the feed has new content. For example, if a blog gets one post a day, then importing once a day at 6 a.m. is enough. For more information, see Importing data.

The rssaggregator importer is configured in /modules/data/config/importers/rssaggregator.

To schedule automatic feed import in /automatedExecution:

  1. Set the enabled property to true.
  2. Define a Quartz Cron Pattern to create a suitable import schedule. For example, the pattern 0 0 6 * * ? imports data daily at 6 a.m. (Quartz expects a ? in either the day-of-month or the day-of-week field.) See Cron Maker for help.
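To make the pattern easier to read, the following Python sketch splits a Quartz cron pattern into its named fields. A Quartz pattern has six required fields (seconds, minutes, hours, day of month, month, day of week) and an optional year, and one of the two day fields is normally written as `?`. The helper name `split_quartz_pattern` is hypothetical, and the sketch only labels the fields; it does not validate their values.

```python
QUARTZ_FIELDS = (
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
)


def split_quartz_pattern(pattern):
    """Label the whitespace-separated fields of a Quartz cron pattern."""
    parts = pattern.split()
    if len(parts) not in (6, 7):
        raise ValueError("expected 6 or 7 fields, got %d" % len(parts))
    # zip stops at the shorter sequence, so "year" is only set when present.
    return dict(zip(QUARTZ_FIELDS, parts))


# Daily at 6 a.m.: second 0, minute 0, hour 6, every day of every month.
fields = split_quartz_pattern("0 0 6 * * ?")
```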

Properties:

  • targets: Defines where the imported data is stored.
    • main: Name of the target.
      • class: SimpleImportTarget stores the data at the path configured in targetPath.
      • targetPath: Path in the workspace. Typically the root /. The workspace name is configured in the repository property below.
  • automatedExecution: An import can happen either manually or automatically. This is the automatic import schedule.
    • cron: Cron job pattern for scheduled execution of the import handler.
    • enabled: Enables and disables scheduled import.
  • feedFetcher:
    • class: FastRSSFeedFetcher retrieves RSS feed content over HTTP. Fetches all feed channels defined in an aggregate feed.
  • activateImport: Allows imported data to be activated (published) automatically after a successful import. Default is false.
  • backup: Backs up feed content automatically if you are importing feeds automatically at set intervals. Default is true.
  • class: RSSFeedImportHandler imports RSS and Atom feeds over HTTP for aggregate feeds. You can optionally configure a feedFetcher that executes the actual import.
  • deleteOldData: Deletes and deactivates data that is no longer found in the external system. Default is false.
  • repository: Target workspace for the imported data. Default is data.

You can use SimpleRSSFeedFetcher as an alternative to FastRSSFeedFetcher. This simple, single-threaded fetcher reduces server load and is suitable when you don't have many feed channels and you don't import often.

Feed data storage

Feed data is stored in the data workspace. View the data in Data > JCR Browser.

Here's the SEOBlogs example.


Structure:

  • <RSS aggregator name>
    • importState: Internal feed properties.
    • .....
    • feeds: Feed information you entered into the dialog when you created the aggregator.
      • <feed number>
        • img: Image displayed in planet components.
        • link: Feed URL.
        • title: Title displayed in RSS components.
    • data: Data that was retrieved from the Internet. 
      • channel-<number>: A channel is created for each RSS feed.
        • description: Feed properties retrieved from the internet.
        • ....
        • entry-<number>: One entry in the feed such as one blog post.
          • author
          • channelTitle
          • content
          • .....
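As a rough illustration of this layout, the Python sketch below models an aggregator node as a nested dictionary and walks every entry of every channel. The dictionary is only a stand-in for the JCR tree. All sample values are invented, and the helper name `iter_entries` is hypothetical; only the node and property names come from the structure above.

```python
# Stand-in for an aggregator node; values are invented sample data.
aggregator = {
    "feeds": {
        "0": {"link": "http://example.com/feed-a.xml", "title": "Feed A"},
        "1": {"link": "http://example.com/feed-b.xml", "title": "Feed B"},
    },
    "data": {
        "channel-0": {
            "description": "Example channel A",
            "entry-0": {"author": "Ann", "channelTitle": "Feed A", "content": "First post"},
            "entry-1": {"author": "Ann", "channelTitle": "Feed A", "content": "Second post"},
        },
        "channel-1": {
            "description": "Example channel B",
            "entry-0": {"author": "Bob", "channelTitle": "Feed B", "content": "Hello"},
        },
    },
}


def iter_entries(aggregator_node):
    """Yield (channel name, entry name, entry properties) for all entries."""
    for channel_name, channel in aggregator_node["data"].items():
        if not channel_name.startswith("channel-"):
            continue
        for child_name, child in channel.items():
            # Channel properties such as "description" are skipped.
            if child_name.startswith("entry-") and isinstance(child, dict):
                yield channel_name, child_name, child


found = [(c, e, props["author"]) for c, e, props in iter_entries(aggregator)]
```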

Feed components

Content is displayed in the feed components only after it has been imported. Unless the activateImport property of the importer (configured in /modules/data/config/importers/rssaggregator) is set to true, you need to activate the aggregator in Data > RSS Aggregator to display feed content on the public instance. This activation also activates the feed data.

There are two standard feed components:

  • STKCombinedFeed: Combines all feeds in a channel and renders them sequentially. You can set the number, sort order and character limit of the entries.

  • STKSingleFeed: Displays a defined number of entries for each feed in the channel. The external feed title is used if no internal title is set. You can set the number, sort order and character limit of the entries.

For non-STK users, the RSS Aggregator module provides equivalent components, CombinedFeedParagraph and feedListParagraph. The two sets of components are essentially identical.

The component definitions are configured in STK > Template Definitions /components/teasers/stkCombinedFeed and /stkSingleFeed.

Here's the stkCombinedFeed definition:

The model class and template script are the important properties that determine content in the different components. All RSS model classes extend AbstractFeedModel, which provides the business logic to retrieve a defined feed and its data from the data workspace and supports:

  • Entry sorting by title or publication date.

  • Ascending and descending entry sorting.

  • Maximum results property. Default is 20.

  • Search capabilities

Any custom model class should extend AbstractFeedModel.
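The sorting and limiting behavior described above can be pictured with a short Python sketch. This is not the AbstractFeedModel code: the function name `select_entries`, its parameters and the sample entries are hypothetical, and only the default limit of 20 is taken from the text.

```python
from datetime import date


def select_entries(entries, sort_by="pubDate", descending=True, max_results=20):
    """Sort entries by one property and keep at most max_results of them."""
    ordered = sorted(entries, key=lambda e: e[sort_by], reverse=descending)
    return ordered[:max_results]


# Invented sample entries.
entries = [
    {"title": "B post", "pubDate": date(2013, 1, 5)},
    {"title": "A post", "pubDate": date(2013, 1, 9)},
    {"title": "C post", "pubDate": date(2013, 1, 1)},
]

newest_first = [e["title"] for e in select_entries(entries)]
first_two_by_title = [
    e["title"]
    for e in select_entries(entries, sort_by="title", descending=False, max_results=2)
]
```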

Planet feeds

(warning) Magnolia 4.5.10+ / RSS Aggregator 1.4.2+: Planet feeds have all the features of standard RSS feeds but also store additional data that is used to create feed statistics. For example, a planet feed will tell you the number of posts by an author. You can use this information in a Planet Statistics component.

Creating a planet feed

To create a planet feed, check the Planet Feed box in the aggregator dialog when creating or editing a feed. Feeds can be marked as a planet feed at any time. When you change a standard feed to a planet feed, re-import the feed data before generating the additional planet data (described below).

Planet commands

The module includes two custom commands that generate planet data. These are configured in /modules/rssaggregator/commands/planet/generatePlanetData and /collectPlanetStatistics.

  • PlanetDataGenerator command is executed when planet data needs to be updated. Here's what the command does:

    • Generates data for a planet feed.
    • Processes all feeds in the channel and creates an archive of feed items if the feed is marked as a planet feed.
    • Stores individual feed items in the planet archive and assigns author, channelTitle, title, content and description properties. Posts without these properties are not stored in the archive.
    • Builds checksums to eliminate duplicates. New items are only added if they don't already exist in the archive.
    • Ensures that the maximum number of nodes allowed under a single node is not exceeded.
    • Respects any filters set.
  • CollectStatisticsCommand command is executed when planet statistics need to be updated. Here's what the command does:

    • Collects and generates statistics from feeds in a planet archive.
    • Stores the collected statistics in the JCR tree.

Scheduling a planet update

Use the Scheduler module to schedule planet updates. The update schedules are stored as standard scheduler jobs in /modules/scheduler/config/jobs/generatePlanetData and /collectPlanetStatistics. The jobs execute the planet commands that do the actual work.

To schedule a planet update:

  1. Set the active property to true.
  2. Define a Quartz Cron Pattern to create a suitable schedule. For example, the pattern 0 0 6 * * ? runs the job daily at 6 a.m. See Cron Maker for help.

(warning) There is no way to manually generate the planet data. You can use the pattern 0 0/5 * 1/1 * to generate the data five minutes after changing the settings. 


Displaying planet components on a site

To display planet components on a site:

  1. Make the components available in any area. You can make them available in a specific page template or globally for all pages in the template prototype. The planet components are not available by default.

  2. In Website edit a page where the components are available.

  3. Add the components to the page.

Here are the planet components using our SEOBlogs example data.

  • Planet Feed: Displays feed posts, including images. Editors can define the length of the text, and pagination is available.

  • Planet Statistics: Shows a list of authors with statistics (number of posts).

  • Planet Authors: Shows a list of sites with individual feed subscriptions. 

Planet data storage

Planet data is used to generate content for planet components. The data is stored in the data workspace. View the data in Data > JCR Browser.

Here's the SEOBlogs example after marking it as a planet feed, re-importing the data, and running the planet commands.


Properties:

  • <aggregator name>
    • planetData: The GeneratePlanetData command reads feed data and stores it here.
      • posts-<number>: All posts from all feeds. First all entries from the first feed, newest entry first. Then all posts from the second feed, and so on.
        • entry-<number>: One entry in the feed such as one blog post.
          • author
          • authorLink
          • channelTitle
          • checksum1/2: The module uses checksums to handle duplicate entries. Because feed data is deleted and recreated on every run, there is a high probability that a subsequent run will include entries that were contained in a previous run. Some entries may only have changed slightly, for example a different publication date. To avoid duplicates, the PlanetDataGenerator command uses checksums for each entry. Two checksum properties are generated. If an archive node with one of the checksums exists, no data is stored for the new item and an INFO level entry is written in the logs.
          • description
          • hidden: Set to true to hide the entry from the planet feed. Useful for hiding spam entries.
          • link
          • pubDate
          • rssLink
          • title
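The duplicate check can be sketched in Python as follows. Which properties go into each of the two checksums is not stated here, so the choice below (title plus content, and title plus link, both leaving out the publication date) is an assumption, as are the SHA-256 hash and the names `checksums` and `add_to_archive`.

```python
import hashlib


def checksums(entry):
    """Return two checksums for an entry (field choice is an assumption)."""
    def digest(*fields):
        joined = "\n".join(entry.get(f, "") for f in fields)
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()

    # pubDate is deliberately left out so a changed date is still a duplicate.
    return digest("title", "content"), digest("title", "link")


def add_to_archive(archive, seen, entry):
    """Store the entry unless one of its checksums is already known."""
    first, second = checksums(entry)
    if first in seen or second in seen:
        return False  # duplicate: nothing is stored
    seen.update((first, second))
    archive.append(entry)
    return True


archive, seen = [], set()
post = {"title": "SEO tips", "content": "Text", "link": "http://example.com/1", "pubDate": "2013-01-01"}
same_post_new_date = dict(post, pubDate="2013-01-02")
other_post = {"title": "More tips", "content": "Other", "link": "http://example.com/2", "pubDate": "2013-01-03"}

results = [add_to_archive(archive, seen, p) for p in (post, same_post_new_date, other_post)]
```

In this sketch the second item is skipped because only its publication date differs from the first.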

Planet statistics storage

Planet statistics are generated from planet data and stored in the data workspace. View the data in Data > JCR Browser.

Here's the data for the SEOBlogs example.

Properties:

  • <aggregator name>
    • statistics: The CollectPlanetStatistics command extracts statistics from the /planetData node. This node is deleted and recreated on every run of the command.
      • authors: All authors from all feeds.
        • author-<number>: Each author is allocated a number.
          • author
          • blogLink
          • feedLink
          • postCount: Number of posts by this author in the aggregate feed.
          • counted-posts: Each child node is a reference to a post in the feed.
            • <post UUID>
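The kind of aggregation behind postCount can be illustrated with a few lines of Python. This is a minimal sketch over invented sample posts, not the command's implementation; the function name `collect_statistics` and the tie-break by author name are assumptions.

```python
from collections import Counter


def collect_statistics(posts):
    """Count posts per author, most frequent authors first."""
    counts = Counter(post["author"] for post in posts)
    # Sort by descending count, then by name to keep the order stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"author": author, "postCount": count} for author, count in ordered]


# Invented sample posts.
posts = [
    {"author": "Ann", "title": "One"},
    {"author": "Bob", "title": "Two"},
    {"author": "Ann", "title": "Three"},
    {"author": "Cid", "title": "Four"},
]
statistics = collect_statistics(posts)
```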

Configuring how long to keep planet data

Planet data is generated and stored for the last 3 months by default. You can configure the time period for which the data is retained in /modules/rssaggregator/config/planetOptions/lastMonthsIncluded.

Properties:

  • planetOptions
    • lastMonthsIncluded: Number of months. Default is 3.
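To show what a retention period in months means in practice, the Python sketch below computes a cutoff date by going back a number of calendar months from a given day. How the module actually determines the boundary is not described here, so this calculation and the name `retention_cutoff` are assumptions.

```python
import calendar
from datetime import date


def retention_cutoff(today, last_months_included=3):
    """Return the date last_months_included calendar months before today."""
    total_months = today.year * 12 + (today.month - 1) - last_months_included
    year, month_index = divmod(total_months, 12)
    month = month_index + 1
    # Clamp the day for shorter months, e.g. 31 May -> 28 February.
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(today.day, last_day))


cutoff = retention_cutoff(date(2013, 2, 15))
clamped = retention_cutoff(date(2013, 5, 31))
```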

Planet components

Planet component definitions are configured in /modules/rssaggregator/templates/components/.

Planet Feeds component

The Planet Feeds component is an enhanced version of the STK Combined Feed component and can only be used with planet feeds. The component:

  • Displays the entries of all feeds in a channel with the latest appearing first.
  • Uses the HTML of the source entry. This means that images and links are preserved and displayed.
  • Allows pagination. 

The component definition is in /modules/rssaggregator/templates/components/planetFeeds.

Planet Statistics component

The Planet Statistics component uses the planet statistics data. The component:

  • Displays a list of authors.
  • Displays subscription links to the authors' individual feeds (icon) and text links (author name) to their websites.
  • Orders the list by post frequency with the top contributors appearing first.
  • Allows editors to add a title and subtitle, define the number of authors to include, and select whether links and the post count should be included. 

The component definition is in /modules/rssaggregator/templates/components/feedStatistics.

Planet Authors component

The Planet Authors component allows users to subscribe directly to the external feeds on a Magnolia page. The links take the user to the external sites.

The component definition is in /modules/rssaggregator/templates/components/feedSubscriptions.

Feed syndication

Feed syndication increases your content exposure. This in turn generates more traffic and backlinks that improve your site rank.

Feed generators

Feed generators generate RSS feeds from Magnolia content and imported content stored in the data workspace.

Four feed generators are registered in /modules/rssaggregator/config/feedGenerators.

Custom generators should extend the convenience base class AbstractSyndFeedGenerator. Subclasses need to implement the template methods loadFeedEntries() and setFeedInfo(SyndFeed).

Servlet

FeedSyndicationServlet writes an XML feed to the response. Based on the request parameters, the feedGenerators configuration (above) is resolved and used to generate the XML feed. The content of the feed is written to the response with the appropriate character encoding.

The servlet is registered in /server/filters/servlets/FeedSyndicationServlet. 


Virtual URI mapping

The syndication components use virtual URI mappings to redirect the generated feeds. The mappings use regular expressions and are called by the feed generator classes to render appropriate content in the XML feed.

  • Two mappings are configured in the RSS Aggregator module in /modules/rssaggregator/virtualURIMappings/rssFeeds and /planetFeeds.

  • Standard Templating Kit module includes the contentFeeds mapping in /modules/standard-templating-kit/virtualURIMapping/contentFeeds.

  • Categorization module includes the categoryFeeds mapping in /modules/categorization/virtualURIMapping/categoryFeeds.

Syndication components

Three modules provide syndication components: Standard Templating Kit, Categorization and RSS Aggregator. All components rely on the functionality of the RSS Aggregator module.

Content Type RSS Feed component

The STK module includes the stkExtrasContentTypeRSSFeed component. The component renders an RSS subscription icon on the page. Editors can define feeds that aggregate pages based on the Article, News or Events templates, and select a parent page.

  

Here's the component definition in STK > Template Definitions /components/extras/stkContentTypeRSSFeed:

Properties:

  • modelClass: ContentTypeSyndicateModel creates an STK renderable definition and returns the appropriate content.

  • templateScript: syndicate.ftl (Git) renders the RSS icon in the component and itemLinks provided by the model class.

The contentFeeds URI mapping calls the templateContent feed generator class (PageSyndicator) that generates the XML feed for content based on templates with a category property equal to content, and a subCategory property equal to the selection in the dialog. Here's the generated feed for all article pages in the demo-project/about/subsection-articles section.

Category RSS Feed component

The Categorization module includes the categoryRSSFeed component. The component renders an RSS subscription icon on the page. The generated feed includes all pages tagged with specified categories. Editors can define the categories and root page for the feed.

The component definition is in /modules/categorization/templates/components/categoryRSSFeed.

Properties:

  • modelClass: The CategorySyndicateModel model class provides the business logic to select the relevant entries for the feed.

  • templateScript: syndicate.ftl (Git) renders the RSS icon in the component and itemLinks provided by the model class.

The categoryFeeds URI mapping calls the category feed generator class (CategorySyndicator) that generates the XML feed for content based on the categories and root page selected in the dialog. The URL for the generated feed of content in the demo-project/about section, tagged with the family category, is similar to http://localhost:8080/magnoliaAuthor/rss/?generatorName=category&categories=ab9437db-ab2c-4df5-bb41-87e55409e8e1&siteRoot=/demo-project/about/subsection-articles. Compare the feed URL to the categoryFeeds Virtual URI mapping. The long number sequence is the UUID of the family category set up in Data > Category.
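The query parameters in the example URL can be taken apart, and the URL rebuilt, with a few lines of Python from the standard library. The parameter names come from the URL above; the helper name `build_feed_url` is hypothetical, and the sketch assumes a single category UUID.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

feed_url = (
    "http://localhost:8080/magnoliaAuthor/rss/"
    "?generatorName=category"
    "&categories=ab9437db-ab2c-4df5-bb41-87e55409e8e1"
    "&siteRoot=/demo-project/about/subsection-articles"
)

# Split the query string into a simple name -> value dictionary.
params = {name: values[0] for name, values in parse_qs(urlsplit(feed_url).query).items()}


def build_feed_url(base, generator_name, category_uuid, site_root):
    """Assemble a feed URL from its parts, keeping slashes readable."""
    query = urlencode(
        {
            "generatorName": generator_name,
            "categories": category_uuid,
            "siteRoot": site_root,
        },
        safe="/",
    )
    return base + "?" + query


rebuilt = build_feed_url(
    "http://localhost:8080/magnoliaAuthor/rss/",
    params["generatorName"],
    params["categories"],
    params["siteRoot"],
)
```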

Planet Syndication component

The RSS Aggregator module includes the planet feedSyndication component that renders subscription Atom and RSS icons and links. Editors can add a title and select a planet feed.

The component definition is in /modules/rssaggregator/templates/components/feedSyndication.

Properties:

  • templateScript: The feedSyndication.ftl (Git) script renders the icons and links, and the PlanetFeedGenerator generator class renders the feed.

Security

The anonymous user does not have permissions to the data workspace on the author or public instance by default. Public users cannot see the component content. You can check this by logging out of the public instance.

The RSS Aggregator module creates the rss-aggregator-base role with the following permissions:

Workspace   Permission   Scope                    Path
Data        Read only    Selected and sub nodes   /rssaggregator

To give anonymous access to the RSS components assign the rss-aggregator-base role to the anonymous user on the public instance.
