
Yahoo!Pipes and its dating problem... (and a failure of RSS standardization)

In its short existence, I've become a great fan of Yahoo!Pipes, but until very recently it had a fundamental problem... dates. If you took a bunch of RSS feeds, combined them and then tried to sort by date... well, you had a problem. When I was out at ETel in San Francisco last week, I met someone from the Yahoo!Pipes team, and we chatted about the challenge of sorting by date when the date data in each feed looks so different.

Now, it seems that Yahoo!Pipes has fixed the problem! When I sat down to write this post today, I found that they appear to have figured out how to sort the dates. (My contact at ETel indicated that they were working hard to fix this issue.)

So for those interested in the problem and why it existed, take a look at my pipe combining my various RSS feeds. If you dig down into the actual RSS feed, you'll see the fundamental problem faced by Yahoo (or anyone else trying to mash up different RSS feeds).  Here is the date associated with an entry from Disruptive Telephony, a TypePad blog:

pubDate 2007-03-05T14:37:34-05:00

Here's the date from an entry from Voice of VOIPSA, a WordPress blog:

pubDate Mon, 05 Mar 2007 16:14:52 +0000

Here's the date from an entry from my LiveJournal account:

pubDate 2007-03-01T00:00:00-06:00

Here's the date from an RSS feed item from Twitter:

pubDate Mon, 05 Mar 2007 19:48:48 +0000

Here's the date from an RSS feed entry from Blue Box: The VoIP Security Podcast, also a TypePad blog:

pubDate Thu, 22 Feb 2007 22:39:48 -0600

Are we seeing the problem yet? Different feeds use different formats for the date. Indeed, even two of my blogs from the same host, TypePad, appear to use different date formats! Also note that some use GMT/UTC (the ones with +0000) while others use a local time zone offset (although why some are -5 and some are -6 is a bit confusing). I had another feed somewhere that used yet another date format as well. Since RSS is entirely text, Yahoo!Pipes has to parse each date string and try to make sense of it... and then presumably convert it to some neutral format that it can use for the actual sorting. Not exactly a fun task.
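For the curious, here's a rough sketch of that parse-and-normalize step in Python (3.7 or later). To be clear, this is not how Yahoo!Pipes actually does it. It simply assumes every date is either RFC 822 style or ISO 8601 style, like the samples above, and converts each one to UTC so that sorting becomes a plain comparison. The function name parse_feed_date is my own invention.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_feed_date(text):
    """Turn an RSS date string into a timezone-aware datetime in UTC.

    Tries the RFC 822 style first (what the RSS 2.0 spec calls for),
    then falls back to ISO 8601 style. Anything else raises ValueError.
    """
    text = text.strip()
    try:
        dt = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError on unparseable input, newer ones ValueError.
        dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        # No usable offset: assume UTC (a guess, not a rule).
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


# The five sample dates from above.
samples = [
    "2007-03-05T14:37:34-05:00",        # Disruptive Telephony (TypePad)
    "Mon, 05 Mar 2007 16:14:52 +0000",  # Voice of VOIPSA (WordPress)
    "2007-03-01T00:00:00-06:00",        # LiveJournal
    "Mon, 05 Mar 2007 19:48:48 +0000",  # Twitter
    "Thu, 22 Feb 2007 22:39:48 -0600",  # Blue Box (TypePad)
]

# Sort newest first on the normalized UTC value, not on the raw text.
for s in sorted(samples, key=parse_feed_date, reverse=True):
    print(parse_feed_date(s).isoformat(), "<-", s)
```

Sorting the five samples this way puts the Twitter item first and the Blue Box entry last. Note that the TypePad entry at 14:37 (-05:00) turns out to be newer than the WordPress entry at 16:14 (+0000) once both are in UTC, which is exactly the kind of thing a naive text sort would get wrong.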

When I first noticed this shortly after the launch of Yahoo!Pipes, there was also the problem that feeds used different names for the date field. In some RSS feeds it was "pubDate"; in others it was "dc:date". I think one was "publication date". This created a royal headache when you were trying to build a filter or sort in Yahoo!Pipes.
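The field-name mess could be handled with a similar normalization pass. Here's a minimal Python sketch using the standard library's ElementTree that copies whichever date element an item has into a single "pubDate" key. The list of candidate elements (only pubDate and Dublin Core's dc:date here), the helper names and the little two-item feed are all hypothetical, just for illustration; I have no idea what Yahoo!Pipes does internally.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Candidate date elements, in order of preference. Hypothetical list:
# real feeds may use other elements too.
DATE_FIELDS = ["pubDate", f"{{{DC_NS}}}date"]


def item_date_text(item):
    """Return the text of the first date element found on an RSS <item>."""
    for tag in DATE_FIELDS:
        el = item.find(tag)
        if el is not None and el.text:
            return el.text.strip()
    return None


def normalize_item(item):
    """Map an <item> onto a dict that always uses the key "pubDate"."""
    return {"title": item.findtext("title"), "pubDate": item_date_text(item)}


# A made-up two-item feed: one item uses pubDate, the other dc:date.
feed = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
  <item><title>A</title><pubDate>Mon, 05 Mar 2007 16:14:52 +0000</pubDate></item>
  <item><title>B</title><dc:date>2007-03-05T14:37:34-05:00</dc:date></item>
</channel>
</rss>"""

for item in ET.fromstring(feed).iter("item"):
    print(normalize_item(item))
```

Note that this only unifies the field name. The values still come through in differing formats, just as I'm seeing in Pipes now, so you'd still need a date-parsing step before sorting.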

Again, though, this seems to have gone away or at least been normalized by the Yahoo!Pipes team.  All my feeds now seem to have "pubDate", albeit in differing formats.  So kudos to the Yahoo! team for figuring out how to make it all make sense.

Interestingly, though, this really appears to be a failure of RSS standardization: perhaps not in the specification itself, but in adherence to it. Near the top of the RSS 2.0 Specification, in talking about channel elements, it states:

All date-times in RSS conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred).

This would argue for the "Mon, 05 Mar 2007 19:48:48 +0000" format, which is also the format shown in the spec's example for individual item entries. So it would appear that some vendors have not exactly implemented RSS feeds per the spec (is anyone surprised?).
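For feed publishers, producing the spec-friendly format isn't hard. As a minimal sketch (Python 3.7 or later, with a hypothetical helper name to_rss_pubdate), the standard library's email.utils.format_datetime turns an ISO 8601 timestamp like the TypePad one above into the RFC 822 style date the RSS 2.0 spec asks for. Strictly speaking it emits the RFC 2822 form, which is essentially RFC 822 with a four-digit year, the variant the RSS spec says is preferred.

```python
from datetime import datetime
from email.utils import format_datetime


def to_rss_pubdate(iso_text):
    """Convert an ISO 8601 timestamp to an RFC 822 style RSS date string.

    The UTC offset in the input is preserved. A timestamp with no offset
    would be written with "-0000" (meaning "zone unknown").
    """
    return format_datetime(datetime.fromisoformat(iso_text))


print(to_rss_pubdate("2007-03-05T14:37:34-05:00"))
# Mon, 05 Mar 2007 14:37:34 -0500
```

Had every vendor done something along these lines, Yahoo!Pipes would have had only one date format to parse.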
