The Google Sitemap module creates an XML sitemap file that lists URLs for each page. Sitemaps are used to tell search engines which pages they should index. This improves search engine optimization (SEO) by ensuring that all site pages are found and indexed. This is particularly important for sites that use dynamic access to content such as Adobe Flash and for sites that have JavaScript menus that do not include HTML links. Where navigation is built with Flash, a search engine will probably find the site homepage automatically, but may not find subsequent pages unless they are provided in a Google Sitemap format.

Note that using Google Sitemaps does not guarantee that all links will be crawled, and even crawling does not guarantee indexing. Nevertheless, a Google Sitemap is still the best insurance for visibility in search engines. Webmasters can include additional information about each URL, such as when it was last updated, how often it changes, and how important it is in relation to other URLs in the site. Google Sitemaps adhere to the Sitemaps protocol and are ready to be submitted to search engines.

Download

The Google Sitemap module is not bundled with the Community or Enterprise Editions. Download the module from the Magnolia Store or the Nexus repository.

Installing

To install the module, follow the general module installation instructions.

Uninstalling

See the general module uninstalling instructions and advice.

Configuration

The Google Sitemap module is configured in /modules/google-sitemap.

Creating a sitemap

To create a sitemap:

  1. Create a new page and assign the GoogleSiteMap template. The page can be anywhere in the website tree. If you plan to have sitemaps for more than one site, you can move each sitemap page under the corresponding site root node. Multiple sitemaps are supported.

  2. In the page properties dialog (4.5.13+), set:
    1. Hide in Navigation: The page will be excluded from site navigation.
    2. Sitemap Type: Two sitemap types are available, standard and mobile. Google recommends that you use separate sitemaps for different content types. Mobile sitemaps use the mobile-specific tag and namespace that the mobile sitemap format requires.
      Note: If the dialog does not open, add the dialog property to the Sitemap template. For earlier versions, see below.

  3. Add the SiteComponent component to the page and select the site in the dialog. The component allows you to select a site root or branch. All pages under that branch will be listed in the sitemap. You can also select subpages as the root node to, for example, create different sitemaps for site sections.



    The root node of the selection will not be included in the site map. Assume you have the following trees: /a/b/c and /a/b/d. If you select /a/b as the root of the sitemap, only pages under c and d will be included in the map. The root node b will not be included.

  4. The sitemap renders on the page. You can edit each entry individually.
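The root-node rule described in step 3 can be illustrated with a short Python sketch. This is not the module's implementation; the function name `pages_in_sitemap` and the sample tree are assumptions made for illustration. It only shows the selection rule: every page below the selected root is listed, while the root itself is left out.

```python
from typing import Iterable, List


def pages_in_sitemap(root: str, page_paths: Iterable[str]) -> List[str]:
    """Return the paths strictly below `root`; the root itself is excluded."""
    # Appending "/" ensures "/a/b" does not match a sibling such as "/a/bc".
    prefix = root.rstrip("/") + "/"
    return [path for path in page_paths if path.startswith(prefix)]


# Hypothetical website tree, mirroring the /a/b/c and /a/b/d example.
tree = ["/a", "/a/b", "/a/b/c", "/a/b/d", "/a/x"]
selected = pages_in_sitemap("/a/b", tree)
print(selected)  # ['/a/b/c', '/a/b/d']
```

With /a/b selected as the root, the pages c and d are listed and b is not.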



Sitemap links are generated using the protocol that is defined in your site definition. The default protocol is HTTP. If you want HTTPS, define the protocol in the domain mapping.

For versions up to 4.5.13, the Google sitemap page does not have a properties dialog and mobile sitemaps are not supported. To exclude the sitemap page from site navigation, the options are: 

  1. Register a properties dialog. Add a dialog property in the siteMapsConfiguration template definition. Set the value to the generic standard-templating-kit:generic/master/basePageProperties dialog. Then open the dialog and check the Hide in navigation box.
  2. In the JCR Browser, add a hideInNav property under the sitemap page node and set it to true.

Editing sitemap entries

To define properties for the entries, click Edit properties:

  • Priority: Priority of the page relative to other site pages. Values range from 0.0 (low) to 1.0 (high). Default is 0.5. Set the priority of your most important page to 1.0. Setting all pages to 1.0 does not increase the rank of your site in search results since the importance is a relative measure among pages of the same site. A search engine may choose to rank the page higher than other pages of the site based on the value, however. See priority in XML Sitemap protocol.
  • Change frequency: Suggested frequency for search engines to crawl the page. Valid values are: always, hourly, daily, weekly, monthly, yearly and never. Use the value always for pages that change each time they are accessed. Use never for archived pages that will never change. See changefreq in XML Sitemap protocol.
  • Hide: Excludes a page from the sitemap. Child pages are not excluded automatically. Note: The hideIngoogleSiteMap property is stored in the page itself. This means you need to activate the page. Activating only the sitemap is not enough.
  • Hide children: Excludes child pages from the sitemap. To exclude both a parent and its children, check both boxes.
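The entry properties above can be modelled in a few lines of Python. This is a hedged sketch, not the module's code: the class `EntryProperties` and the function `visible_paths` are hypothetical names, and treating Hide children as applying to all descendants (rather than direct children only) is an assumption. The value ranges for priority and change frequency follow the list above.

```python
from dataclasses import dataclass
from typing import Dict, List

# Valid change frequency values, as listed in the Sitemaps protocol.
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


@dataclass
class EntryProperties:
    changefreq: str
    priority: float = 0.5       # protocol default
    hide: bool = False          # excludes this page only
    hide_children: bool = False  # excludes pages below this page

    def __post_init__(self) -> None:
        if not 0.0 <= self.priority <= 1.0:
            raise ValueError(f"priority must be between 0.0 and 1.0: {self.priority}")
        if self.changefreq not in VALID_CHANGEFREQ:
            raise ValueError(f"invalid change frequency: {self.changefreq}")


def _ancestors(path: str) -> List[str]:
    """Return all ancestor paths of `path`, e.g. /a/b/c -> ['/a', '/a/b']."""
    parts = path.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


def visible_paths(entries: Dict[str, EntryProperties]) -> List[str]:
    """Apply Hide and Hide children; assumes Hide children covers all descendants."""
    visible = []
    for path, props in entries.items():
        if props.hide:
            continue  # Hide removes the page itself, not its children
        if any(a in entries and entries[a].hide_children for a in _ancestors(path)):
            continue
        visible.append(path)
    return visible


entries = {
    "/site/news": EntryProperties("daily", priority=1.0, hide=True),
    "/site/news/2015": EntryProperties("never"),
    "/site/about": EntryProperties("monthly", hide_children=True),
    "/site/about/team": EntryProperties("monthly"),
}
print(visible_paths(entries))  # ['/site/news/2015', '/site/about']
```

In this example the hidden /site/news page still leaves its child in the sitemap, while /site/about stays listed and its child is dropped. Checking both boxes on the same page would remove both.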

Including virtual URIs

To include virtual URIs in your sitemap, add the VirtualUriComponent to the page. No dialog is associated with this component. The component directly renders the virtual URIs defined in this instance. Virtual URI mappings are a Magnolia method of redirecting requests and shortening URLs. The Google Sitemap module reads all virtual URI mappings from the system and lists them here. Set properties as required. The entries display as a list (as opposed to a tree) and you can set the same properties that are available for pages, except Hide children, which does not apply.

Hide default mappings defined in the adminInterface module, such as those for accessing AdminCentral. Public users will not access AdminCentral, so these URLs do not need to appear in the sitemap. Also, hide mappings that use regular expressions in the toURI property. These are not understood by search engines as regular expressions.

Activating

Activate the sitemap page to the public instance to ensure that it is accessible to search engines. Note: You also need to activate any pages you excluded from the sitemap.

Viewing the sitemap

You can view the XML sitemap on the author or public instance at /<CATALINA_HOME>/<contextPath>/<sitemap name>.xml, for example, http://localhost:8080/magnoliaPublic/sitemap.xml. Note that a filter mechanism removes duplicate URLs.
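The duplicate filter mentioned above is not documented in detail here. As a rough illustration of the idea only, the following Python sketch removes repeated URLs while keeping the first occurrence and the original order. The function name `unique_urls` is hypothetical and the module's actual filter may behave differently.

```python
from typing import Iterable, List


def unique_urls(urls: Iterable[str]) -> List[str]:
    """Drop repeated URLs, keeping the first occurrence and the input order."""
    # dict preserves insertion order, so this is an order-preserving dedupe.
    return list(dict.fromkeys(urls))


urls = [
    "http://localhost:8080/magnoliaPublic/about.html",
    "http://localhost:8080/magnoliaPublic/contact.html",
    "http://localhost:8080/magnoliaPublic/about.html",
]
print(unique_urls(urls))
```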

Here's the rendered XML for a standard and mobile sitemap for the demo-project site. Note the use of the mobile tags.
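To show the general shape of such output, here is a minimal Python sketch that builds a standard and a mobile sitemap with the standard library. It is not the module's rendering code and its output is not the demo-project output: the function `render_sitemap` is a hypothetical name, and the single sample entry is illustrative. The namespaces are the ones defined by the Sitemaps protocol and by Google's mobile sitemap extension.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MOBILE_NS = "http://www.google.com/schemas/sitemap-mobile/1.0"

# Serialize the sitemap namespace as the default and the mobile one as "mobile:".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("mobile", MOBILE_NS)

Entry = Tuple[str, str, str, str]  # (loc, lastmod, changefreq, priority)


def render_sitemap(entries: Iterable[Entry], mobile: bool = False) -> str:
    """Build a <urlset> document; mobile sitemaps add <mobile:mobile/> per URL."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod, changefreq, priority in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
        ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = priority
        if mobile:
            ET.SubElement(url, f"{{{MOBILE_NS}}}mobile")
    return ET.tostring(urlset, encoding="unicode")


# Illustrative entry only.
demo_entries = [
    ("http://www.example.com/meta/privacy.html", "2015-06-18", "weekly", "0.5"),
]
print(render_sitemap(demo_entries))
print(render_sitemap(demo_entries, mobile=True))
```

The only difference between the two documents is the extra mobile namespace declaration and the empty mobile element inside each url entry.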

 

Sitemap template

The siteMapsConfiguration page template renders the sitemap. The configuration is at /modules/google-sitemap/templates/pages/siteMapsConfiguration:

  • modelClass: SiteMapModel is the main model class for site map templates.
  • templateScript: main.ftl includes two alternative scripts, mainXml.ftl and mainConfiguration.ftl, which render text or XML content depending on the URL extension.
  • dialog: Note: Add this property and set the value to google-sitemap:pages/googleSitemapProperties.

Adding to robots.txt file

Add the following line in your robots.txt file. Include the full URL to the sitemap:

Sitemap: http://www.example.com/sitemap.xml
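
One way to check that the Sitemap line is picked up is to parse the file the way a crawler might. The sketch below uses Python's standard `urllib.robotparser` (the `site_maps()` method requires Python 3.8 or later). The robots.txt content is a made-up sample; the User-agent and Disallow lines are assumptions added only to make the file complete.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content; only the Sitemap line comes from the text above.
robots_txt = """\
User-agent: *
Disallow:
Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns the list of Sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)  # ['http://www.example.com/sitemap.xml']
```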

Submitting to search engines

Submit the sitemap to major search engines via the webmaster tools of each engine or wait for the engines to find the sitemap on their own.


1 Comment

  1. Hi,

    When I generate a sitemap, it contains vwqwertyp, de and default in the URLs. For example, the two URLs below are present in one sitemap:

    1. http://localhost:8080/vwqwertyp/default/de/contact-us
    2. http://localhost:8080/vwqwertyp/default/contact-us

    I don't want vwqwertyp, default and de in the URLs. I only want the true URLs in the sitemap. So in the above example only one URL should be included in the sitemap, and it should be:
    http://localhost:8080/contact-us because this is how the contact-us page is accessed by end users on the public site. We do not have a German site, so I don't know where de comes from.

    I also generated a sitemap using the demo site, and even that contains de. Is there some kind of configuration to include or exclude these from the sitemap.xml?

    ex:

    <url>
    <loc>http://travel-demo.magnolia-cms.com/meta/privacy.html</loc>
    <lastmod>2015-06-18</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
    </url>

    <url>
    <loc>http://travel-demo.magnolia-cms.com/de/meta/privacy.html</loc>
    <lastmod>2015-06-18</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
    </url>

    I would appreciate any advice you can give me.