Magnolia 4.5 reached end of life on June 30, 2016. This branch is no longer supported, see End-of-life policy.


This is a geolocation tutorial. It aims to lower the barrier to entry to Magnolia and get new developers writing modules. The topic is a popular integration question: how to detect the visitor's geographical location using a public geo-API.

Title

Display Localized Content with Magnolia and a Custom Geoservice Module

Author

Peter Wayner

Abstract

By taking a user's location into account when displaying content, web sites can add value to the user experience and increase their overall relevance and utility. This article explains how to add geo-location support to Magnolia by building a custom module that combines information from the public IPInfoDB geo-location API with Magnolia's JCR search features and bundled jQuery support to produce responsive web pages displaying localized content.

Introduction

Did a hurricane sweep through your town thirty years ago? Is there going to be a concert in your town hall in a few weeks? Is there a new business opening up on the town square? Wherever you go, you always want the news to follow you.

There's no reason why a web site needs to present the same information to everyone, and one way to choose the articles to present is by looking at the visitor's location. A site might pick only local articles for the front page, or it might fill a box in the sidebar with specific local content.

Magnolia's customization mechanism is flexible enough to accommodate all of these choices, and this article discusses one solution that injects a list of local stories containing the name of a user's city into the output web page. The server sends the user's IP address to a location service and then uses the result to run a keyword search on the content. Magnolia then bundles up the results so that they appear as a block in the output page.

This article examines how to build a Magnolia geo-location plugin by describing key steps involved in the process:

  • How to add a Java object that can send a query to a third-party web service.
  • How to search the Magnolia content repository for matching content.
  • How to add a paragraph and a template to display the information.
  • How to format the list of stories found through keyword search.
  • How to make a web page seem more responsive by adding the local information through an AJAX query.

Accomplishing this with a Magnolia plugin is relatively straightforward. The following sections approach each of these topics in turn.

Step 1: Identify the User's Location

The first task is choosing a good way to identify the location of the user. There are two main ways that a user can supply the information, one voluntary and the other involuntary.

  • From the browser: The user can volunteer a fairly precise estimate of his or her location by allowing the JavaScript layer in the browser to gather this data. Browsers running on newer smartphones often have a good estimate of their location from the GPS module. When the JavaScript code asks for this location, the browser will typically ask the user if it's okay to release the data.

There are several problems with this solution. First, asking the user for permission is polite but intrusive. It breaks the flow and slows down interaction with a web site. Second, the data is often not available. Desktop browsers might ask permission and then report that they don't know the location, so the intrusion is all for naught.

  • From the IP address: Another solution is to use the IP address from the request. This doesn't require asking permission, but the accuracy is often worse. Converting the IP address into a location requires asking a database and these databases may include old or inaccurate information.

After some debate, I chose to use the IP address because it was more likely to be useful for desktop users. If I were building a site mainly for mobile users, I might rely on the more accurate GPS data, but the IP-based approach is simpler and more general for the end user.

There are several different options for converting the IP address into a location:

  • One option is to buy a copy of the database and host it with the web site. IP2Location.com is one company that offers the data necessary for conversion. Adding and maintaining a local server with this capability may make sense for a web site with heavy traffic, but it's not ideal for an example designed to teach how to expand Magnolia.
  • Working with a third-party service is simpler because, for a price, they handle much of the work of maintaining the database. MaxMind.com, IPAddressAPI, IPLocationTools, and IPInfoDB are just a few of the companies that offer this service. Generally, you pay more for better accuracy and faster response.

I chose to use the free service from IPInfoDB because it's easier for users who want to experiment. They also offer an upgrade to a commercial service from IP2Location that provides better accuracy and reliability.

Converting an IP address is as simple as requesting a URL in this format:

http://api.ipinfodb.com/v3/ip-city/?key=<your_api_key>&ip=74.125.45.100

The API key is assigned to each user and you must register for one before the service will work.

Here's what it returns:

OK;;74.125.45.100;US;UNITED STATES;GEORGIA;ATLANTA;30301;33.809;-84.3548;-05:00

This API does not wrap the response in XML or JSON, but other services offer many of the typical formats.
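The semicolon-separated reply can be unpacked with a plain String.split. Here is a minimal sketch; the field positions are taken from the sample response above, and the class and method names are my own, not part of the module:

```java
// Parses an IPInfoDB v3 "ip-city" response such as:
// OK;;74.125.45.100;US;UNITED STATES;GEORGIA;ATLANTA;30301;33.809;-84.3548;-05:00
public class GeoResponse {
    // Field positions observed in the sample response above.
    static final int STATUS = 0, IP = 2, COUNTRY = 4, REGION = 5, CITY = 6;

    public static String[] parse(String response) {
        // The -1 limit keeps trailing empty fields instead of discarding them.
        return response.split(";", -1);
    }

    public static String city(String response) {
        String[] parts = parse(response);
        return parts.length > CITY ? parts[CITY] : null;
    }
}
```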

To fetch this information, I chose the Java library jsoup, a tool for scraping web sites that can request a URL and then parse the response HTML. It has more functionality than this application requires, but its ability to understand XML and HTML might be useful in other applications. To add this library, I just included this dependency in the Maven build file pom.xml:

<dependency>
  <!-- jsoup HTML parser library @ http://jsoup.org/ -->
  <groupId>org.jsoup</groupId>
  <artifactId>jsoup</artifactId>
  <version>1.6.3</version>
</dependency>

Step 2: Create a Java Object

The business logic for Magnolia is placed in a separate Java object that is attached to a template and queried for information. If the business logic is complex, it makes sense to use good programming practices to split up the Java code into multiple classes. This version is kept in one class file for simplicity.

The Java code does four things:

  1. Finds the IP address of the request.
  2. Maintains a very simple cache.
  3. Goes to the location API if necessary.
  4. Searches the web site for the location names.

Let's look at each of these aspects below.

Retrieve the User's IP Address

The IP address of the user is not hard to find, but it's not obvious. Magnolia keeps most of the information about the request out of sight of the Java model object unless the object requests it.

Here's the method used to find the IP address. The HttpServletRequest object is obtained from the WebContext object, which in turn comes from the static method MgnlContext.getWebContext(). HttpServletRequest extends ServletRequest, which provides the method for extracting the user's IP address, getRemoteAddr().

getIPAddress
	String getIPAddress(){
		WebContext c = MgnlContext.getWebContext();
		HttpServletRequest request = c.getRequest();
		String addr = request.getRemoteAddr();
		if (CheckForLocalHostDuringDebugging){
			// Replace 127.0.0.1 with a routable address during local testing
			addr = cleanseIPAddress(addr);
		}
		return addr;
	}

The code includes an extra feature helpful for debugging. If you're running the code on the same desktop you're using to test it, the IP address is likely to be 127.0.0.1, which can't be located using the API.

This test routine substitutes a debugging IP address that can be found in the main database. It should be switched off in production.

cleanseIPAddress
	String cleanseIPAddress(String s){
		if (s.equalsIgnoreCase("127.0.0.1")){
			return ReplacementIPAddress;
		}
		return s;
	}

Create a Simple Cache

Caching the results is very useful because calling a distant API is often relatively slow and can sometimes also be expensive. Many of the commercial APIs count the number of requests and bill accordingly. So, caching the results from the API makes plenty of sense.

The cache used here is very simple: it just uses a HashMap object to store the responses from the API. There's no logic for removing items, even when they get too old. This is probably not a practical problem because IP addresses seem to be relatively stable, but a full-featured implementation would do a better job.

getPartsWithCache
	public String[] getPartsWithCache(String addr){
		String[] ans = ipLocationCache.get(addr);
		if (ans == null){
			// Not cached; go to the web service.
			ans = getParts(addr);
			ipLocationCache.put(addr, ans);
		}
		return ans;
	}

Call the Location API

The location API from ipinfodb.com takes an IP address as a parameter and returns a semicolon-separated String. Here's a routine that calls the jsoup library:

getParts
	public String[] getParts(String addr){
		String fetchMe = urlBase + addr;
		try {
			Document doc = Jsoup.connect(fetchMe).get();
			String text = doc.text();
			// The response is a single semicolon-separated line.
			return text.split(splittingCharacter);
		} catch (IOException e) {
			e.printStackTrace();
		}
		return null;
	}

You'll notice that the jsoup routine does little more than fetch a URL. The result is split with the standard String split method. Most of jsoup's power to parse the results is left untapped here, but it may be useful if you work with an API that returns a more complex response.

Search for Matching Pages

When the location API returns the name of the city, state, and country associated with an IP address, the plugin must search for these keywords. Magnolia indexes the content with the Lucene search engine and implements search according to the Java Content Repository (JCR) standard.

There are two different ways to write a query for Magnolia. One syntax mimics SQL, and the other mimics the XPath format used to search XML. The general consensus is that the SQL structure is simpler when searching for keywords, and the XPath-like syntax is better when searching only content under a limited path. The XPath syntax, though, is deprecated and no longer supported, so the SQL-like syntax is the better choice for this example.

This search method calls the static object QueryUtil with a query:

search
	public Collection<Content> search(String s){
		String q="select * from mgnl:content where contains(*,'"+s+"')";
		Collection<Content> ans = QueryUtil.query("website", q);
		return ans;
	}

The results come in the form of a Java Collection object containing the Content objects.
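Because the keyword is spliced directly into the query string, a city name containing an apostrophe would produce a malformed query. Doubling embedded single quotes, the standard SQL escape, guards against this. A small sketch, with helper names of my own choosing:

```java
// Escapes a keyword for inclusion in a single-quoted SQL-style literal
// by doubling any embedded single quotes, then builds the query string.
public class QueryEscaper {
    public static String escape(String keyword) {
        return keyword.replace("'", "''");
    }

    public static String buildQuery(String keyword) {
        return "select * from mgnl:content where contains(*,'" + escape(keyword) + "')";
    }
}
```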

Put it All Together

This simple routine collects the IP address, calls the API, searches the local content repository, and then returns the result as a Collection:

getTextsBasedUponIP
	public Collection<Content> getTextsBasedUponIP(){
		String addr = getIPAddress();
		String[] parts = getPartsWithCache(addr);
		// fallBackSearch, defined in the full source, runs the keyword
		// search over the location parts returned by the API.
		Collection<Content> ans = fallBackSearch(parts);
		return ans;
	}

This result is visible to the template as the variable model.textsBasedUponIP. Magnolia connects the Java object used as the model through its getters.

Step 3: Format the Search Results with a Template

The job of the template is to convert the information from the Collection of Magnolia Content objects into something readable by a user. The code uses one looping construct and a number of techniques for extracting data from each individual Content object.

Magnolia offers two different languages for writing a template. The first, the classic JSP, is a good start, but the Freemarker templating system is typically simpler to use. This example uses the Freemarker approach, although it should be reasonably straightforward to use the same model object with a JSP-based template too.

Here's the code:

paragraph template
 [#assign cms=JspTaglibs["cms-taglib"]]

[@cms.editBar /]

<ul>
[#list model.textsBasedUponIP as n]
<li> <a href="${model.initialPath!}${n.@handle}">${n?node_type} -- ${n.title} -- ${n.metaData.creationDate} -- ${n.@name} -- ${n.@handle} -- ${n.@uuid}</a>
[#-- An alternative way to build the same link: --]
[#-- <a href="${mgnl.createLink(n)}">${n.title}</a> --]
The current page: ${page.@handle} <br>
The current node handle: ${n.@handle} <br>
The current node name: ${n.@name}<br>
The current node uuid: ${n.@uuid}<br>
The current paragraph definition: ${def.name}<br>
Paragraph model: ${model}<br>
Action result: ${actionResult!'... no action result here'}<br>
Current locale: ${ctx.locale}<br>
Aggregation state: ${aggregationState}<br>
</li>
[/#list]
</ul>

The template builds a bulleted list in HTML with the <ul> element and embeds each item in an <li> element. The loop is built with this syntax:

[#list model.textsBasedUponIP as n]

This construction loops through all elements in the Collection, binding each item to the variable 'n' in turn. The individual Content objects are constructed by Magnolia and use their internal format. The parts of each object can be extracted using the dollar-sign notation, like this: ${n.@name}.

The list should also include a clickable link in case a reader wants to go to the full page containing that information. The URL is built from two parts, ${model.initialPath} and ${n.@handle}; concatenating them produces a link that takes the user to the correct page.

Step 4: Add AJAX for Speed

The classic model of a dynamic web site requires that the server collect all of the information before assembling it into the page that is sent to the user. This is often manageable when all of the information is available locally, but it can lead to slow performance when the data comes from different locations around the Web. In this example, the call that determines the location of the IP address is predictably slow because the API lives on a different server.

One solution is to create a web page that loads the localized links after the fact. The initial page contains a blank DIV that is filled by a subsequent call back to the main site. The initial page arrives quickly because it can be the same for each user; it can be cached and served immediately. The second request can take longer to execute because the user is already happily looking at the main, static information.

Magnolia bundles the jQuery library, which includes several good routines for loading blocks of data after the main page arrives.
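As a rough sketch (the element id, the fragment URL, and the idea of a dedicated fragment page are placeholders of my own, not names from the module), the page can ship an empty placeholder and let jQuery's load() fill it in once the document is ready:

```html
<!-- Placeholder that the cached, user-independent page ships empty -->
<div id="local-news">Loading local stories...</div>

<script>
  // jQuery's load() fetches the fragment and injects it into the div.
  // "/local-news.html" stands in for whatever page renders only the
  // geo-location paragraph.
  jQuery(function () {
    jQuery("#local-news").load("/local-news.html");
  });
</script>
```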

Step 5: Download Code

The source code for this project can be found in the Magnolia Git repository under the project name Geo. The code includes a number of enhancements to make it easier to use the tool in a working web site. The bulk of the code is found in the class GeoserviceLocationParagraphModel and the other classes help integrate the core of the code with configuration options available through Magnolia. Future articles will discuss how to add these configuration options to make your plugin more reliable and easier to configure for people who want to use it in a production web server. The code in this article is stripped down to make it easier to understand. For the complete code, see http://git.magnolia-cms.com/gitweb/?p=forge/geo.git;a=summary

Conclusions and Other Solutions

This approach creates a simple connection between a user's location and the content shown on the web page by searching the site for all content with the name of the user's address. It places this information in a separate list that can be loaded independently to help speed up the creation of the main page.

This approach is just one of the different ways that IP address information can be used with Magnolia. Some developers create a separate set of filters in the chain that processes requests: one filter turns the IP address into a location and adds the location name as a parameter that a subsequent filter can use. This works best when the IP-to-location conversion is implemented locally, so that it doesn't slow down the entire chain.

A common refinement is to store the location in a cookie so subsequent requests do not need to look up the IP address at all, a solution that's even more efficient than the server-side cache. The cookie can also record which articles have already been presented, allowing the site to rotate them. The cookie should expire relatively quickly, because it is stored on the user's computer and the user may travel to different locations.
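To make the cookie idea concrete: a servlet would normally use javax.servlet.http.Cookie, but the equivalent Set-Cookie header value can be sketched in plain Java. The cookie name and the one-day expiry are my own choices, not part of the module:

```java
// Builds a Set-Cookie header value that caches the resolved city for a
// day, so repeat visits can skip the IP lookup entirely.
public class LocationCookie {
    static final int ONE_DAY_SECONDS = 24 * 60 * 60;

    public static String headerValue(String city) {
        // Max-Age keeps the cookie short-lived, since the visitor may travel.
        return "geoCity=" + city + "; Max-Age=" + ONE_DAY_SECONDS + "; Path=/";
    }
}
```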

As this article illustrates, Magnolia provides a powerful framework to add a geo-location module to your web site. The module illustrated in this article was reasonably simple, but you can leverage Magnolia's framework to make it as complex as you wish. Try it out, and let me know what you think!