Yahoo!Pipes and its dating problem… (and a failure of RSS standardization)

In the short time Yahoo!Pipes has existed, I’ve become a great fan of it, but until very recently it had a fundamental problem… dates. If you combined a bunch of RSS feeds and then tried to sort the result by date… well, you had a problem. When I was at ETel last week in San Francisco, I met someone from the Yahoo!Pipes team, and we had a chat about the challenge of sorting by date when every feed expresses its dates differently.

Now, it seems that Yahoo!Pipes has fixed the problem!  As I went to write this post today, it now looks like they have figured out how to sort the dates out.  (My contact out at ETel indicated that they were working hard to try to fix this issue.)

So for those interested in the problem and why it existed, take a look at my pipe combining my various RSS feeds. If you dig down into the actual RSS feed, you’ll see the fundamental problem faced by Yahoo (or anyone else trying to mash up different RSS feeds).  Here is the date associated with an entry from Disruptive Telephony, a TypePad blog:

pubDate 2007-03-05T14:37:34-05:00

Here’s the date from an entry from Voice of VOIPSA, a WordPress blog:

pubDate Mon, 05 Mar 2007 16:14:52 +0000

Here’s the date from an entry from my LiveJournal account:

pubDate 2007-03-01T00:00:00-06:00

Here’s the date from an RSS feed item from Twitter:

pubDate Mon, 05 Mar 2007 19:48:48 +0000

Here’s the date from an RSS feed entry from Blue Box: The VoIP Security Podcast, also a TypePad blog:

pubDate Thu, 22 Feb 2007 22:39:48 -0600

Are we seeing the problem yet? The feeds use different formats for the date. Indeed, even two of my blogs from the same host, TypePad, appear to be using different date formats! Also note that some use GMT/UTC (the ones with +0000) and some use a local timezone offset (although why some are -5 and some are -6 is a bit confusing). I had another feed somewhere that used yet another time format as well. Since RSS is entirely text, Yahoo!Pipes has to parse each date string, make sense of it… and then presumably convert it to some neutral format that it can use for the actual sorting. Not exactly a fun task.
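To make the parsing task concrete, here is a minimal sketch in Python of one way to handle the five dates above. I have no idea how Yahoo!Pipes does it internally, so treat this purely as an illustration: the function name `parse_feed_date` is my own invention, and treating a date with no offset as UTC is an assumption. It tries the RFC 822 style first, falls back to the ISO 8601 style, and converts everything to UTC so the entries can be compared.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_feed_date(text):
    """Parse an RFC 822 or ISO 8601 date string into an aware UTC datetime."""
    text = text.strip()
    try:
        # RFC 822 style, e.g. "Mon, 05 Mar 2007 16:14:52 +0000"
        parsed = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        # ISO 8601 style, e.g. "2007-03-05T14:37:34-05:00"
        parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assumption: a date without an offset is treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


# The five pubDate values quoted above, labeled by feed.
entries = [
    ("Disruptive Telephony", "2007-03-05T14:37:34-05:00"),
    ("Voice of VOIPSA", "Mon, 05 Mar 2007 16:14:52 +0000"),
    ("LiveJournal", "2007-03-01T00:00:00-06:00"),
    ("Twitter", "Mon, 05 Mar 2007 19:48:48 +0000"),
    ("Blue Box", "Thu, 22 Feb 2007 22:39:48 -0600"),
]

# Sort newest first using the normalized UTC value as the key.
newest_first = sorted(entries, key=lambda e: parse_feed_date(e[1]), reverse=True)

for name, raw in newest_first:
    print(parse_feed_date(raw).isoformat(), name)
```

Once every date is an aware UTC value, the sort itself is trivial. All of the pain is in the parsing step.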

When I first noticed this shortly after the launch of Yahoo!Pipes, there also was a problem that each feed seemed to have a different date field.  In some RSS feeds, it was “pubDate”.  In others, it was “dc:date”.  I think one was “publication date”.  This created a royal headache when you were trying to create a filter or sort in Yahoo!Pipes. 

Again, though, this seems to have gone away or at least been normalized by the Yahoo!Pipes team.  All my feeds now seem to have “pubDate”, albeit in differing formats.  So kudos to the Yahoo! team for figuring out how to make it all make sense.
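For what it’s worth, normalizing the field name is the easy half of the job. Here is a rough Python sketch of the idea, assuming each feed item has already been turned into a plain dictionary. That representation, the function name `normalize_date_field`, and the `DATE_FIELDS` list are all hypothetical and are not how Yahoo!Pipes necessarily works.

```python
# Candidate date field names, in order of preference. "pubDate" and "dc:date"
# are the two names I am sure I saw; real feeds may well use others.
DATE_FIELDS = ("pubDate", "dc:date")


def normalize_date_field(item):
    """Return a copy of item with its date stored under "pubDate", if it has one."""
    normalized = dict(item)
    for field in DATE_FIELDS:
        if field in item:
            # Move the value to the common "pubDate" key and stop looking.
            normalized["pubDate"] = normalized.pop(field)
            break
    return normalized


item = {"title": "An entry", "dc:date": "2007-03-01T00:00:00-06:00"}
print(normalize_date_field(item))
```

Note that this only renames the field. The value is left untouched, which matches what I see in my feeds: a common “pubDate” name with differing formats inside it.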

Interestingly, though, this really appears to be a failure in RSS standardization. Perhaps not in the specification, but in the adherence to the specification. Near the top of the RSS 2.0 Specification, in talking about channel elements, it states:

All date-times in RSS conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred).

This would argue for the “Mon, 05 Mar 2007 19:48:48 +0000” format, which is also the format shown in the spec’s example for individual item entries. So it would appear that some vendors have not exactly implemented RSS feeds per the spec (is anyone surprised?).
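Producing the format the spec asks for is not hard, either. As a small sketch, Python’s standard library can rewrite one of the ISO 8601 style dates into the RFC 822 style. The helper name `to_rfc822` is mine, and the sketch assumes the input string carries a timezone offset.

```python
from datetime import datetime
from email.utils import format_datetime


def to_rfc822(iso_text):
    """Convert an ISO 8601 date string with an offset into RFC 822 form."""
    return format_datetime(datetime.fromisoformat(iso_text))


# The Disruptive Telephony date from earlier in this post.
print(to_rfc822("2007-03-05T14:37:34-05:00"))
# prints "Mon, 05 Mar 2007 14:37:34 -0500"
```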


10 thoughts on “Yahoo!Pipes and its dating problem… (and a failure of RSS standardization)”

  1. Rogers Cadenhead

    I haven’t messed around with Pipes, but that problem looks like it’s occurring because Pipes supports RSS 1.0, RSS 2.0, and Atom and also supports the Dublin Core date element dc:date.
    Any tool that consumes RSS, as Pipes does, has to support all of these because they’re in such common use. Pipes will probably solve the problem by normalizing the date values to one format, regardless of how they were formatted in a feed.

  2. Daniel Raffel

    Hi Dan: It was a pleasure meeting you @ eTel last week.
    The Pipes team has been busy making many behind-the-scenes updates, including one that should significantly improve date sorting.
    Glad to hear it’s working for you! And, please don’t hesitate to let us know if you run into any further date sorting hiccups.
    – Daniel (from the Pipes Team @ Yahoo!)

  3. Dan York

    @Rogers: Ah, so Dublin Core is where “dc:date” is from. As I was looking through the RSS feeds initially, at least one of my feeds had that as the date element. It seems that what the Pipes team has done is precisely that – normalized all the variations to at least have a “pubDate” field, with the noted value variations I mentioned. But it would seem they have figured out how to then sort the dates correctly. Step 1 – get them all with the same field id. Step 2 – sort the data.
    @Daniel: It was likewise a pleasure meeting you, and yes, I will drop you a line if I run into other hiccups.
    Thanks,
    Dan

  4. David Tebbutt

    I’ve crashed into the same issue from the other end, so to speak. I want to display dates consistently in the output of five feeds.
    I notice that the day, month and year are inside the y:published section of the XML. I’m guessing that this is how Yahoo! sorted the, erm, sorting out.
    All I want to do is pick up those three values or, indeed, the version of pubDate that Yahoo! used internally and stick them on the output. I figured if {pubDate} worked as a variable then {day}, {month} and {year} might. But not so far…
    I’ve tried using regex on pubDate but my regex skills fall well short of what’s needed. If, indeed, it’s possible.

  5. David Tebbutt

    Right. I’ve now realised what to do.
    First of all I don’t think that Yahoo did the sorting the way I suggested. There’s an easier way – use utime: it’s the number of elapsed seconds since goodness knows when. I think it stands for Unix time.
    Anyway. I used the Copy part of the rename operator to create three new variables. I called them dd mm and yyyy. Then I appended them to my title using In title replace $ with [${dd}-${mm}-${yyyy}]
    Bingo. It doesn’t output leading zeroes on the day or month, but heck, this is a result for me.

  6. Dan York

    Unfortunately I’ve had to close off comments on this post because blog comment spammers seem to love to post comments here – and TypePad’s anti-comment-spam system doesn’t appear to be working. I’m quite tired of spending a part of each day removing all the spammy comments here.
    If you have a legitimate comment you would like to leave to this post, please email me and I would be glad to add it.
    PING:
    TITLE: OPML in Yahoo Pipes with Canonical Date Sorting
    BLOG NAME: TechBrew
    Tony Hirst tipped me off in an email to the new Fetch Data module in Yahoo Pipes. This feature allows you to parse XML or JSON, opening up a whole new world of possibility. With a bit of trickery and a bit of slop, I’ve created a …
    PING:
    TITLE: Yahoo! Pipes – a result
    BLOG NAME: Teblog
    Finally got Yahoo! Pipes to do my bidding. After sleeping on my utter failure to extract the results I wanted, I realised I was using the wrong approach. I was mixing search results with RSS feeds and getting grumpy that

  7. Dan York Post author

    David,
    Thanks for all the info… very interesting to read what you are trying to do. Have you “published” your Yahoo pipe that has this in it? I’d be curious to see the end result.
    Dan
