Using Yahoo! Pipes to filter an RSS feed
I’m currently working on a redesign of the University’s research homepage which involves a requirement to pull in three RSS feeds:
- Recent research outputs (from the PURE research portal, which hasn’t gone public yet)
- Recent research activities (from the PURE research portal)
- Research news (from the main University news page, but this third feed needs to pull in only those stories that relate to research)
Using PHP SimpleXML
I’m using the PHP SimpleXML extension to pull in the news feeds, extract the data that I want and then output it onto the Web page. For the first two feeds that was simple: I created a PHP function to which I pass two parameters, the URL of the feed and the number of posts I want to display on the page.
As the name suggests, SimpleXML was very easy to use. It converts the XML document into an object that you can then iterate through like a collection of arrays and objects. It’s great if you already know the structure of the XML document (which we do because a) it follows the RSS 2.0 standard and b) we generated it).
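As a rough illustration of that kind of function, here is a minimal sketch. The function name getFeedItems(), its return shape and the error handling are assumptions made for the example, not the code actually used on the site, and it assumes the feed follows the usual RSS 2.0 layout of channel and item elements.

```php
<?php
// Hypothetical helper (name and return shape are assumptions):
// load an RSS 2.0 feed and return up to $limit items as title/link pairs.
function getFeedItems(string $url, int $limit): array
{
    // simplexml_load_file() returns false if the feed cannot be read or parsed,
    // so suppress the warning and check the result instead.
    $feed = @simplexml_load_file($url);
    if ($feed === false) {
        return [];
    }

    $items = [];
    foreach ($feed->channel->item as $item) {
        if (count($items) >= $limit) {
            break; // we already have as many posts as were asked for
        }
        $items[] = [
            'title' => (string) $item->title, // cast the element to a plain string
            'link'  => (string) $item->link,
        ];
    }
    return $items;
}
```

Calling getFeedItems('http://www.st-andrews.ac.uk/rss/news/index.xml', 5) would then return at most five title/link pairs, ready to be escaped with htmlspecialchars() and printed as a list.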
So to load a new XML document you use the simplexml_load_file() function:
$feed = simplexml_load_file('http://www.st-andrews.ac.uk/rss/news/index.xml');
And you then gain access to the different elements as you would using ordinary object properties. So for an RSS 2.0 file which contains this data:
<rss version="2.0">
  <channel>
    <title>News feed from The University of St Andrews</title>
    <link>http://www.st-andrews.ac.uk/rss/news/index.xml</link>
    ...
I could use this code to grab the feed title and URL:
$feedTitle = $feed->channel->title; // get name of RSS feed
$feedURL = $feed->channel->link; // get URL of RSS feed
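In this case $feedTitle would contain “News feed from The University of St Andrews” and $feedURL would contain “http://www.st-andrews.ac.uk/rss/news/index.xml”. (Strictly speaking each one is a SimpleXMLElement object rather than a string, but it behaves like a string when you echo it or cast it with (string).)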
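To try this without fetching anything over the network, the same property access can be run against an inline copy of the XML. This is only a sketch: it swaps simplexml_load_file() for simplexml_load_string(), and the explicit (string) casts are an addition so that the variables hold plain strings.

```php
<?php
// Inline copy of the start of the feed shown above, closed off so that it parses.
$xml = <<<XML
<rss version="2.0">
  <channel>
    <title>News feed from The University of St Andrews</title>
    <link>http://www.st-andrews.ac.uk/rss/news/index.xml</link>
  </channel>
</rss>
XML;

$feed = simplexml_load_string($xml);

// Each property is a SimpleXMLElement; casting gives an ordinary string.
$feedTitle = (string) $feed->channel->title;
$feedURL   = (string) $feed->channel->link;

echo $feedTitle, "\n", $feedURL, "\n";
```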
But I only want Research news
The third feed required a bit of thinking because I don’t want to simply pull in all the latest news; I only want the latest news articles that relate to research.
To help us, the RSS specification allows content creators to add any number (zero or more) of <category> tags to each news item.
When a content creator adds a news item to our content management system they have the option of selecting a category from a drop-down list; this populates a category tag in the RSS feed.
I realised that I then had two options:
- Write a script into my PHP code that would filter out any posts not tagged with the Research category.
- Use an external, third-party application to do this for me.
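For comparison, the first option could look something like the sketch below. It is not code from the site: the function name filterByCategory(), the case-insensitive match and the sample items are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: return only the items that carry the given category.
function filterByCategory(SimpleXMLElement $feed, string $category): array
{
    $matches = [];
    foreach ($feed->channel->item as $item) {
        // An item may have zero or more <category> elements, so check each one.
        foreach ($item->category as $cat) {
            if (strcasecmp(trim((string) $cat), $category) === 0) {
                $matches[] = $item;
                break; // one matching category is enough
            }
        }
    }
    return $matches;
}

// Made-up sample feed, used only to exercise the function.
$sample = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample news feed</title>
    <item><title>Story one</title><category>Research</category></item>
    <item><title>Story two</title><category>Events</category></item>
    <item><title>Story three</title><category>Awards</category><category>Research</category></item>
    <item><title>Story four</title></item>
  </channel>
</rss>
XML;

$research = filterByCategory(simplexml_load_string($sample), 'Research');
foreach ($research as $item) {
    echo $item->title, "\n";
}
```

Matching on the text of each category element keeps the check independent of how many categories an item has, including none at all.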
In the end I opted for Yahoo! Pipes simply because it was much quicker to have Pipes filter the news RSS feed for me than it was for me to write a PHP parser to do the same. (“I’m a PHP coder of very little brain”, as Winnie the Pooh might say!)
But I then realised that an attractive side-effect of this was that I now had a new RSS feed that not only I could use but that other users could subscribe to as well.
Yahoo! Pipes was really simple to use and the feed filter took less than a minute to set up. It just involved three stages:
- Fetch the feed.
- Filter the feed, permitting only items that matched the rules I set.
- Output the results.