The RSS Aggregator module displays external feed content on a Magnolia page (feed aggregation) and generates feeds from Magnolia content (feed syndication).

Planet feeds enhance standard aggregation feeds by collecting additional data and generating feed statistics.

Creating an aggregate feed

A feed aggregator retrieves external feeds and displays the content on a Magnolia page. 

Planet feeds have all the features of standard feeds but can also generate additional data and statistics that are displayed in planet components. See Feed components and Generating planet data for more.

In this example we collect three individual RSS feeds into one aggregate feed:

These have a common topic – celebrity gossip – so they are good candidates for aggregation. A reader interested in one feed is likely to be interested in the others too.

To create an RSS aggregator, in the Feeds app:

  1. Add a new feed.
    • Name: CelebrityGossip. This is the internal name of the aggregator.
    • Title: Feed title.
    • Short description: Feed description.
    • Check Planet Feed to mark the feed as a planet feed. 
  2. Add the feed URLs from above.
    • Title: Leave blank to use the title of the external feed.
    • URL: Feed URL.
    • Image: Select an image from the DAM. 60x60 px works best. The image is displayed in the Planet Feeds component. Hackergotchi images are supported.
  3. Add filters to refine content.
    1. Condition: Select an AND, OR or NOT condition to include or exclude content.
    2. Property: Select the entry property to test. The predefined options are Author, Category, Description and Title, which are commonly found in feeds.
    3. Regex: Type a regex pattern that is matched against the selected property.
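
The filter settings above can be pictured with a small sketch. It is an illustration only, not the module's implementation: the FilterSketch, FeedFilter and Condition names are hypothetical, and the way conditions combine (every AND filter must match, no NOT filter may match, and at least one OR filter must match when any OR filter is present) is an assumption made for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical types for illustration only; not the module's API.
final class FilterSketch {

    enum Condition { AND, OR, NOT }

    static final class FeedFilter {
        final Condition condition;
        final String property; // for example "title" or "author"
        final Pattern pattern;

        FeedFilter(Condition condition, String property, String regex) {
            this.condition = condition;
            this.property = property;
            this.pattern = Pattern.compile(regex);
        }

        boolean matches(Map<String, String> entry) {
            // A missing property is treated as an empty string.
            return pattern.matcher(entry.getOrDefault(property, "")).find();
        }
    }

    // Assumed semantics: all AND filters match, no NOT filter matches,
    // and at least one OR filter matches if any OR filter is defined.
    static boolean accept(Map<String, String> entry, List<FeedFilter> filters) {
        boolean hasOr = false;
        boolean orMatched = false;
        for (FeedFilter filter : filters) {
            boolean matched = filter.matches(entry);
            if (filter.condition == Condition.AND && !matched) return false;
            if (filter.condition == Condition.NOT && matched) return false;
            if (filter.condition == Condition.OR) {
                hasOr = true;
                orMatched = orMatched || matched;
            }
        }
        return !hasOr || orMatched;
    }
}
```

With an AND filter on the title and a NOT filter on the author, an entry is kept only if its title matches the first pattern and its author does not match the second.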

A feed can be marked as a planet feed at any time. When you change a standard feed to a planet feed, re-import the feed data before generating planet data.

Importing feed data

To import data:

  • All feeds: Click Import all feeds now.
  • A single feed: Click Import feed now.

See also: Scheduling automated feed imports.

Feed components

The module includes components that display external feeds and generate feeds from internal content. 

Content is only displayed in the feed components after it has been imported. To display feed content on the public instance, publish the feed in the Feeds app. This also publishes the feed data.

The feed components are configured in /modules/rssaggregator/templates/components.

Node name                        Value
rssaggregator
  templates
    components
      combinedFeedsParagraph
      feedListParagraph
      planetFeeds
      feedSyndication
      feedStatistics

combinedFeedsParagraph

Combines all feeds in a channel and displays them sequentially. Editors can set the number, sort order and character limit of the entries. Use with standard or planet feeds.

feedListParagraph

Displays a defined number of entries for each feed in the channel. The external feed title is used if no internal title is set. Editors can set the number, sort order and character limit of the entries. Use with standard or planet feeds.

planetFeeds

Displays the entries of all feeds in a channel with the latest appearing first. Uses the HTML of the source entry. This means that images and links are preserved and displayed. Allows pagination. Use with planet feeds only.

feedStatistics

Uses planet statistics data to display a list of authors with feed subscription icons and links to the external website. Author list is ordered by post frequency with top contributors appearing first. Editors can set a title and subtitle, define the number of authors to include, and whether links and the post count should be included. Use with planet feeds only.

feedSyndication

Displays Atom and RSS subscription icons and links that allow users to subscribe to the feed on your site. Editors can add a title and select a planet feed. Use with planet feeds only.

Creating a custom component

The model class and template script are the key properties that determine the content of each component. All RSS model classes extend AbstractFeedModel, which provides the business logic to retrieve a defined feed and its data from the rss workspace. It supports:

  • Entry sorting by title or publication date.
  • Ascending and descending entry sorting.
  • Maximum results property. Default is 20.
  • Search capabilities.

Any custom model class should extend AbstractFeedModel.
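
The sorting and limiting behavior listed above can be sketched in plain Java. This is not the AbstractFeedModel code and does not use the Magnolia API: FeedSortSketch, FeedEntry and select() are hypothetical names, and the publication date is assumed to be a timestamp in milliseconds.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Illustration only: FeedEntry and select() are hypothetical names,
// not part of AbstractFeedModel.
final class FeedSortSketch {

    static final int DEFAULT_MAX_RESULTS = 20; // default named in the text above

    static final class FeedEntry {
        final String title;
        final long pubDate; // publication date, assumed to be epoch milliseconds

        FeedEntry(String title, long pubDate) {
            this.title = title;
            this.pubDate = pubDate;
        }
    }

    // Sorts by title or publication date, ascending or descending,
    // and keeps at most maxResults entries.
    static List<FeedEntry> select(List<FeedEntry> entries, boolean byTitle,
                                  boolean descending, int maxResults) {
        Comparator<FeedEntry> order = byTitle
                ? Comparator.comparing((FeedEntry e) -> e.title, String.CASE_INSENSITIVE_ORDER)
                : Comparator.comparingLong((FeedEntry e) -> e.pubDate);
        if (descending) {
            order = order.reversed();
        }
        return entries.stream().sorted(order).limit(maxResults).collect(Collectors.toList());
    }
}
```

For example, select(entries, false, true, DEFAULT_MAX_RESULTS) returns up to 20 entries with the newest first.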

Adding components to templates

You can add the feed components to any template. 

Example: Template definition with all feed components in main area.

/my-module/templates/pages/myTemplate.yaml
templateScript: /my-module/templates/pages/my-script.ftl
renderType: freemarker
visible: true
title: My template
areas:
  main:
    availableComponents:
      combinedFeeds:
        id: rssaggregator:components/combinedFeedsParagraph
      feedList:
        id: rssaggregator:components/feedListParagraph
      planetFeeds:
        id: rssaggregator:components/planetFeeds
      feedStatistics:
        id: rssaggregator:components/feedStatistics
      feedSyndication:
        id: rssaggregator:components/feedSyndication

Node name                    Value
myTemplate
  areas
    main
      availableComponents
        combinedFeeds
          id                 rssaggregator:components/combinedFeedsParagraph
        feedList
          id                 rssaggregator:components/feedListParagraph
        planetFeeds
          id                 rssaggregator:components/planetFeeds
        feedStatistics
          id                 rssaggregator:components/feedStatistics
        feedSyndication
          id                 rssaggregator:components/feedSyndication
  templateScript             /my-module/templates/pages/my-script.ftl
  renderType                 freemarker
  visible                    true
  title                      My Template

Scheduling automated feed imports

Importing feed data manually is not a long-term solution. You should configure an automated import schedule. Choose the update frequency depending on how often feeds have new content. For example, if a blog gets one post a day then importing once a day at 6 a.m. is enough.

Setting automated imports for all feeds

You can schedule automated imports for all feeds in the Configuration app. The global settings can be overridden for single feeds.

  1. Periodic import
    1. Disabled: Automated imports are disabled by default. You can switch them off at any time.
    2. Import every: Quick option to set imports for a specified number of minutes, hours or days.
    3. Use cron time/date definition: Define a Quartz cron pattern to create a suitable import schedule. For example, the pattern 0 0 6 * * ? imports data daily at 6 a.m. See Cron Maker for help.
  2. Fetcher: Substitute the default feed fetcher class if necessary. See Properties table below for options.
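
A Quartz cron pattern has six mandatory fields (seconds, minutes, hours, day of month, month, day of week) and an optional seventh field for the year. The sketch below only labels the fields of a pattern so that it is easier to read. It does not validate values or compute run times, and the CronFieldsSketch class is a hypothetical helper, not part of Quartz or Magnolia.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: labels the fields of a Quartz-style cron expression.
// It does not validate the values or compute fire times.
final class CronFieldsSketch {

    private static final String[] NAMES = {
        "seconds", "minutes", "hours", "dayOfMonth", "month", "dayOfWeek", "year"
    };

    static Map<String, String> label(String expression) {
        String[] parts = expression.trim().split("\\s+");
        if (parts.length < 6 || parts.length > 7) {
            throw new IllegalArgumentException("Expected 6 or 7 fields: " + expression);
        }
        // LinkedHashMap keeps the fields in the order they appear in the pattern.
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i++) {
            fields.put(NAMES[i], parts[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Second 0, minute 0, every hour: an hourly schedule.
        System.out.println(label("0 0 0/1 1/1 * ? *"));
    }
}
```

Reading a pattern field by field makes it easier to spot mistakes, such as a schedule that runs every minute instead of once a day.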

The input in the app is stored in the module configuration in /modules/rssaggregator/config/.

Node name                  Value
rssaggregator
  config
    planetOptions
    feedGenerators
    automatedImport        true
    cron                   0 0 0/1 1/1 * ? *
    fetcherClass           info.magnolia.module.rssaggregator.importhandler.FastRSSFeedFetcher
    importTimingSetter     CronMaker

Properties:

config

required

Module configuration node.

automatedImport

required, default is false

Enables and disables automated imports.

cron

required

The cron pattern for scheduled imports.

fetcherClass

required

FastRSSFeedFetcher  retrieves RSS feed content over HTTP. Fetches all feed channels defined in an aggregate feed. You can use SimpleRSSFeedFetcher as an alternative. This is a simple, single-threaded fetcher that reduces server load and is suitable when you don't have many feed channels and you don't import often.

importTimingSetter

required

Utility used to build the cron expression.

Setting automated imports for single feeds

You can configure automated import settings for each feed in the Import Settings tab of the edit dialog. These settings override the global settings.

  1. Periodic import
    1. Select This feed has different import settings to override the global settings.
    2. Disabled: Stops any scheduled automated imports for the individual feed and the global settings take over.
    3. Import every: Quick option to set imports for a specified number of minutes, hours or days.
    4. Use cron time/date definition: Define a Quartz cron pattern to create a suitable import schedule. For example, the pattern 0 0 6 * * ? imports data daily at 6 a.m. See Cron Maker for help.
  2. Fetcher: Substitute the default feed fetcher class if necessary. See Properties table above for options.

Feed data storage

Feed data is stored in the rss workspace.

Example: CelebrityGossip feed in the JCR Browser app.

Node name                  Value
CelebrityGossip
  data
    channel-02
      entry-0
        categories
        author             Perez Hilton
        channelTitle       Perez Hilton
        content            <p><strong></p><p>Yolanda Foster</strong> has been <a href="http://perezhilton.com/2015-12-09-yo...
        description        Yolanda Foster has been coming to terms with her divorce from David Foster ever since announcing it...
        link               http://perezhilton.com/2015-12-10-yolanda-foster-divorce-david-foster-focus-on-her-health-on-the-mend
        pubDate            1,449,759,236,000
        title              Yolanda Foster Thinks Divorce Will Help Her Fight With Chronic Lyme Disease — Here's How!
      entry-1
      entry-2
      ....
      description          Perez Hilton dishes up the juiciest celebrity gossip on all your favorite stars, from Justin Bieber to Kim Kardas..
      link                 http://perezhilton.com
      rss                  http://i.perezhilton.com/?feed=rss2
      title                PerezHilton
      type                 rss_2.0
    channel-00
    channel-01
  automatedImport          true
  cron                     0 0/15 * 1/1 * ? *
  importTimerSetter        bgvCronMaker
  name                     CelebrityGossip
  overrideDefault          false
  title                    Celebrity Gossip
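
The pubDate value in the example looks like a timestamp in milliseconds since the Unix epoch. That reading is an assumption, although it agrees with the date in the entry link. Under that assumption, the value can be turned into a readable date with a few lines of Java. PubDateSketch is a hypothetical helper name.

```java
import java.time.Instant;

// Illustration only: assumes pubDate holds milliseconds since the Unix epoch.
final class PubDateSketch {

    static Instant toInstant(long pubDateMillis) {
        return Instant.ofEpochMilli(pubDateMillis);
    }

    public static void main(String[] args) {
        // 1,449,759,236,000 from the example, written without grouping commas.
        System.out.println(toInstant(1_449_759_236_000L)); // 2015-12-10T14:53:56Z
    }
}
```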

Structure:

<RSS aggregator name>
  data: Data retrieved from the Internet. The content of this folder is the same for standard and planet feeds. It is used by the standard feed components (combinedFeedsParagraph and feedListParagraph) but not by the planetFeeds component. See Generating planet data for more.
    channel-<number>: A channel is created for each RSS feed.
      entry-<number>: One entry in the feed, such as one blog post.
        author, channelTitle, content, ....
      description, link, ....: Feed properties retrieved from the Internet.
  automatedImport, cron, ....: Internal feed properties.

Generating planet data

Planet commands

The module includes two custom commands to generate planet data.

The commands are configured in /modules/rssaggregator/commands/planet/.

Node name                        Value
rssaggregator
  commands
    planet
      generatePlanetData
        class                    info.magnolia.module.rssaggregator.generator.PlanetDataGenerator
      collectPlanetStatistics
        class                    info.magnolia.module.rssaggregator.generator.CollectStatisticsCommand

Properties:

planet

required

Planet commands node.

generatePlanetData

required

Generate planet data command node.

class

required

  PlanetDataGenerator :

  • Generates data for a planet feed.
  • Processes all feeds in the channel and creates an archive of feed items if the feed is marked as a planet feed.
  • Stores individual feed items in the planet archive and assigns author, channelTitle, title, content and description properties. Posts without these properties are not stored in the archive.
  • Builds checksums to eliminate duplicates. New items are only added if they don't already exist in the archive.
  • Ensures that the maximum nodes allowed under a single node is not exceeded.
  • Respects any filters set.

collectPlanetStatistics

required

Collect planet statistics command node.

class

required

  CollectStatisticsCommand :

  •  Collects and generates statistics from feeds in a planet archive.
  • Stores the collected statistics in the JCR tree.

Scheduling planet updates

Use the Scheduler module to execute the planet commands to generate planet data and schedule regular updates.

 Two jobs are preconfigured in /modules/scheduler/config/jobs:

Node name                      Value
scheduler
  config
    jobs
      generatePlanetData
        params
          repository           rss
        active                 true
        catalog                planet
        command                generatePlanetData
        cron                   0 0/10 * 1/1 * ? *
        description            generate data for RSS planet
      collectPlanetStatistics
        params
          repository           rss
        active                 true
        catalog                planet
        command                collectPlanetStatistics
        cron                   0 0/18 * 1/1 * ? *
        description            generate statistics for RSS planet

Properties:

jobs

required

Node for scheduled jobs

<job name>

required

Name of job.

params

required/optional

Parameters passed to the command.

repository

required

Workspace where the content item resides.

active

required

Enables and disables the job. Set to true to run.

catalog

required

Name of the catalog where the command resides

command

required

Name of the command.

cron

required

Cron expression that sets the scheduled execution time. For example, the pattern 0 0/5 * 1/1 * ? * runs the job every five minutes, so the initial data is generated within five minutes of changing the settings. Cron Maker is a useful tool for building expressions.

description

optional

Description of the job.

Planet data storage

Planet data is used in planet components. The data is stored in the planetData folder in the rss workspace. View the data in your custom JCR browser.  

Example: CelebrityGossip feed after marking it as a planet feed, re-importing the data, and running the generatePlanetData job

Node name                  Value
CelebrityGossip
  data
  planetData
    posts-00000
      entry-1
        author             Perez Hilton
        authorLink         http://perezhilton.com
        channelTitle       PerezHilton
        checksum1          1f271e498d7aa6352d38a1d8ae6707bb
        checksum2          6d2abcefcb5e196c76494f0f4b7a1d04
        description        <p><strong></p><p>Yolanda Foster</strong> has been <a href="http://perezhilton.com/2015-12-09-yolanda-...
        hidden             false
        link               http://perezhilton.com/2015-12-10-yolanda-foster-divorce-david-foster-focus-on-her-health-on-the-mend
        pubDate            1,449,759,236,000
        rssLink            http://i.perezhilton.com/?feed=rss2
        title              Yolanda Foster Thinks Divorce Will Help Her Fight With Chronic Lyme Disease — Here's How!
      entry-2
      entry-3
      ....

Structure:

<aggregator name>
  planetData: The generatePlanetData command reads feed data and stores it here.
    posts-<number>: All posts from all feeds. First all entries from the first feed, newest entry first, then all entries from the second feed, and so on.
      entry-<number>: One entry in the feed, such as one blog post.
        author
        authorLink
        channelTitle
        checksum1/2: The module uses checksums to handle duplicate entries. Because feed data is deleted and recreated on every run, there is a high probability that a subsequent run will include entries that were contained in a previous run. Some entries may only have changed slightly, for example a different publication date. To avoid duplicates, the PlanetDataGenerator command uses checksums for each entry. Two checksum properties are generated. If an archive node with one of the checksums exists, no data is stored for the new item and an INFO level entry is written in the logs.
        description
        hidden
        link
        pubDate
        rssLink
        title
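
The duplicate handling described for checksum1/2 can be sketched as follows. This is an illustration, not the PlanetDataGenerator code. The use of MD5 is an assumption suggested by the 32-character hexadecimal values in the example, and the choice of fields that feed each checksum (title plus link, and content) is a guess made for this sketch. ChecksumDedupSketch is a hypothetical class name.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Illustration only. Which fields feed each checksum and the use of MD5
// are assumptions; the real PlanetDataGenerator may differ.
final class ChecksumDedupSketch {

    // Stands in for the checksums already stored in the planet archive.
    private final Set<String> knownChecksums = new HashSet<>();

    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Returns true if the entry was stored, false if it was skipped as a duplicate.
    boolean addIfNew(String title, String link, String content) {
        String checksum1 = md5Hex(title + link); // assumed input
        String checksum2 = md5Hex(content);      // assumed input
        if (knownChecksums.contains(checksum1) || knownChecksums.contains(checksum2)) {
            return false; // an archive entry with one of the checksums exists
        }
        knownChecksums.add(checksum1);
        knownChecksums.add(checksum2);
        return true;
    }
}
```

Using two checksums means an entry is still recognized as a duplicate when only part of it changes between runs.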

 

Planet statistics storage

Planet statistics are generated from planet data and stored in the statistics folder in the rss workspace. View the data in your custom JCR browser.

Example: CelebrityGossip feed after running the collectPlanetStatistics job

Node name                                      Value
CelebrityGossip
  data
  planetData
  statistics
    authors
      author-0
      author-1
      author-2
        countedPosts
          af0a8534-d0eb-4bd4-8c03-c1051438468a
          f4c3ca83-f766-4ae1-bb33-97761916fd7d
          ....
        author                                 Perez Hilton
        blogLink                               http://perezhilton.com
        feedLink                               http://i.perezhilton.com/?feed=rss2
        postCount                              20
      author-3
      ....

Structure:

<aggregator name>
  statistics: The collectPlanetStatistics command extracts statistics from the /planetData node. This node is deleted and recreated on every run of the command.
    authors: All authors from all feeds.
      author-<number>: Each author is allocated a number.
        countedPosts: Each child node is a reference to a post in the feed.
          <post UUID>
          ....
        author
        blogLink
        feedLink
        postCount: Number of posts by this author in the aggregate feed.
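
The postCount value and the ordering of authors by post frequency can be pictured with a short sketch. It is an illustration only: AuthorStatsSketch and postCounts() are hypothetical names and do not reflect the CollectStatisticsCommand API. Breaking ties by author name is an assumption.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustration only: counts posts per author and orders authors by post count,
// highest first. Names are hypothetical, not the CollectStatisticsCommand API.
final class AuthorStatsSketch {

    // Takes one author name per post and returns (author, postCount) pairs.
    static List<Map.Entry<String, Long>> postCounts(List<String> postAuthors) {
        return postAuthors.stream()
                .collect(Collectors.groupingBy(author -> author, Collectors.counting()))
                .entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .collect(Collectors.toList());
    }
}
```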

Configuring how long to keep planet data

Planet data is generated and stored for the last 3 months by default. You can configure the time period for which the data is retained in /modules/rssaggregator/config/planetOptions.

Node name                  Value
rssaggregator
  config
    planetOptions
      lastMonthsIncluded   3

Properties:

planetOptions

required

Planet options

lastMonthsIncluded

required, default is 3

Number of months.
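
As a rough picture of what lastMonthsIncluded means, the sketch below computes a cut-off date and checks whether a post falls inside the retention period. How the module actually calculates the period is not described here, so the calendar-month arithmetic and the RetentionSketch name are assumptions.

```java
import java.time.LocalDate;

// Illustration only: how a "last N months" cut-off could be computed.
// The module's actual date handling is not shown in this document.
final class RetentionSketch {

    static final int DEFAULT_LAST_MONTHS_INCLUDED = 3;

    static LocalDate cutoff(LocalDate today, int lastMonthsIncluded) {
        return today.minusMonths(lastMonthsIncluded);
    }

    // A post is kept if it was published on or after the cut-off date.
    static boolean isRetained(LocalDate pubDate, LocalDate today, int lastMonthsIncluded) {
        return !pubDate.isBefore(cutoff(today, lastMonthsIncluded));
    }
}
```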

Feed generators

Feed generators generate RSS feeds from Magnolia content and imported content stored in the rss workspace.

Four feed generators are registered in /modules/rssaggregator/config/feedGenerators.

Node name                  Value
modules
  rssaggregator
    config
      feedGenerators
        rss
          class            info.magnolia.module.rssaggregator.generator.RSSModuleFeedGenerator
        planet
          class            info.magnolia.module.rssaggregator.generator.PlanetFeedGenerator
        category
          class            info.magnolia.module.categorization.syndication.CategorySyndicator
        templateContent
          class            info.magnolia.module.rssaggregator.generator.PageSyndicator

Properties:

feedGenerators

required

Feed generators node.

<generator name>

required

Generator name.

class

required

Generator class:

Custom generators should extend the convenience base class AbstractSyndFeedGenerator. Subclasses need to implement the template methods loadFeedEntries() and setFeedInfo(SyndFeed).

Feed syndication servlet

The feed syndication servlet writes an XML feed to the response.

The servlet is registered in /server/filters/servlets/FeedSyndicationServlet. 

Node name                      Value
server
  filters
    context
    ....
    servlets
      ClasspathSpoolServlet
      ....
      FeedSyndicationServlet
        mappings
          -rss--
            pattern            /rss/*
        parameters
        class                  info.magnolia.cms.filters.ServletDispatchingFilter
        comment                Responsible for RSS Feed syndication
        enabled                true
        servletClass           info.magnolia.module.rssaggregator.servlet.FeedSyndicationServlet
        servletName            FeedSyndicationServlet

Properties:

FeedSyndicationServlet

required

Feed syndication servlet node.

servletClass

required

FeedSyndicationServlet writes an XML feed to the response. Based on the request parameters, the feedGenerators configuration is resolved and used to generate the XML feed. The content of the feed is written to the response with the appropriate character encoding.

Virtual URI mapping

Syndication components use virtual URI mappings to redirect generated feeds. The mappings use regular expressions and are called by the feed generator classes to render appropriate content in the XML feed. 

Mappings are configured in /modules/rssaggregator/virtualURIMapping.

Node name                  Value
modules
  rssaggregator
    virtualURIMapping
      rssFeeds
        class              info.magnolia.cms.beans.config.RegexpVirtualURIMapping
        fromURI            /rssFeeds/(.*)
        toURI              redirect:/rss/?feedPath=/$1
      planetFeeds
        class              info.magnolia.cms.beans.config.RegexpVirtualURIMapping
        fromURI            /planetFeeds/(.*)
        toURI              redirect:/rss/?feedPath=/$1&generatorName=planet
      categoryFeeds
        class              info.magnolia.cms.beans.config.RegexpVirtualURIMapping
        fromURI            /categoryFeeds/([a-zA-Z0-9,-]*)/(.*)
        toURI              redirect:/rss/?generatorName=category&categories=$1&siteRoot=/$2

Properties:

virtualURIMapping

required

Virtual URI mapping node

<mapping name>

required

Name of mapping.

class

required

RegexpVirtualURIMapping allows you to specify a regular expression pattern that matches a sequence of characters. 

fromURI

required

Pattern to match in the requested URI.

toURI

required

Concrete URI where the request is mapped to.
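
The effect of a fromURI and toURI pair can be reproduced with java.util.regex. The sketch below is not the RegexpVirtualURIMapping implementation. It assumes that the whole request URI must match fromURI and that $1, $2 and so on in toURI are replaced by the captured groups. UriMappingSketch is a hypothetical class name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: reproduces the fromURI -> toURI substitution with
// java.util.regex. It is not the RegexpVirtualURIMapping implementation.
final class UriMappingSketch {

    // Returns the mapped URI, or null if the request URI does not match fromURI.
    static String map(String fromUri, String toUri, String requestUri) {
        Matcher matcher = Pattern.compile(fromUri).matcher(requestUri);
        if (!matcher.matches()) { // assumption: the whole URI must match
            return null;
        }
        // $1, $2, ... in toURI refer to the capture groups of fromURI.
        return matcher.replaceFirst(toUri);
    }

    public static void main(String[] args) {
        System.out.println(map("/planetFeeds/(.*)",
                "redirect:/rss/?feedPath=/$1&generatorName=planet",
                "/planetFeeds/CelebrityGossip"));
    }
}
```

In this sketch, a request for /rssFeeds/CelebrityGossip is mapped to redirect:/rss/?feedPath=/CelebrityGossip, which is then handled by the servlet mapped to /rss/*.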

Security

Public users

The anonymous role does not have permissions to the rss workspace on the author or public instance. Public users cannot see the feed content by default. 

The RSS Aggregator module installs the rss-aggregator-base role that provides read permissions to the rss workspace.

Workspace    Permission    Scope                     Path
Rss          Read only     Selected and sub nodes    /

To provide public access to feed content, assign the rss-aggregator-base role to the anonymous system user on the public instance.

App access

By default only superuser can access the Feeds app and work with feeds because the superuser role includes read/write permissions to the rss and config workspaces.

Here's how to grant permissions for various feed tasks:

  • App launcher access: The Feeds app is in the Setup group and editors typically cannot access this group. Move the app to the Edit group to grant access in the app launcher. See App launcher layout for more. This allows the user to open the app but not view the content.
  • Read only access: Assign the rss-aggregator-base role to give read only access to feed content. This allows the user to view the list of feeds in the app, select feeds in components and view feed content on pages. 
  • Read/write access: Create a new role granting read/write access to the rss workspace and assign it to the user. This allows the user to create new feeds in the Feeds app, but they cannot access the Configuration subapp.
  • Configuration app access: Create a new role granting read/write access in the config workspace to /modules/rssaggregator to allow the user to schedule automated feed imports in the Configuration subapp.

2 Comments

  1. I implemented custom feed generators and added it  to /modules/rssaggregator/config/feedGenerators and made it available through virtualUriMapping. The feed works and can be accessed.

    However every time I restart my Magnolia instance the generator has to be re-published again otherwise it fails to be resolved, with the following error:

    info.magnolia.module.rssaggregator.generator.FeedGeneratorConstructionException: Failed to resolve feed generator. Set it up in config:/modules/rssaggregator/config/feedGenerators or in module descriptor as a value for info.magnolia.module.rssaggregator.generator.FeedGenerator property.

    Any idea how to make it persistent for every reboot?

    1. I have resolved my issue by replacing @Inject variables with Providers because during the bootstrap Magnolia does not know its reference and that's why it failed on the bootstrap only.