
March 05, 2007

Yahoo!Pipes and its dating problem... (and a failure of RSS standardization)

In the short time it has existed, I've become a great fan of Yahoo!Pipes, but until very recently it had a fundamental problem... dates. If you took a bunch of RSS feeds, combined them, and then tried to sort by date... well, you had a problem. When I was out at ETel last week in San Francisco, I met someone from the Yahoo!Pipes team, and we had a chat about the challenge of sorting dates when every feed formats them differently.

Now, it seems that Yahoo!Pipes has fixed the problem!  As I went to write this post today, it now looks like they have figured out how to sort the dates out.  (My contact out at ETel indicated that they were working hard to try to fix this issue.)

So for those interested in the problem and why it existed, take a look at my pipe combining my various RSS feeds. If you dig down into the actual RSS feed, you'll see the fundamental problem faced by Yahoo (or anyone else trying to mash up different RSS feeds).  Here is the date associated with an entry from Disruptive Telephony, a TypePad blog:

pubDate 2007-03-05T14:37:34-05:00

Here's the date from an entry from Voice of VOIPSA, a WordPress blog:

pubDate Mon, 05 Mar 2007 16:14:52 +0000

Here's the date from an entry from my LiveJournal account:

pubDate 2007-03-01T00:00:00-06:00

Here's the date from an RSS feed item from Twitter:

pubDate Mon, 05 Mar 2007 19:48:48 +0000

Here's the date from an RSS feed entry from Blue Box: The VoIP Security Podcast, also a TypePad blog:

pubDate Thu, 22 Feb 2007 22:39:48 -0600

Are we seeing the problem yet? Note that different feeds are using different formats for the date. Indeed, even two of my blogs from the same host, TypePad, appear to be using different date formats! Also note that some are using GMT/UTC (the ones with +0000) and some are using a local timezone offset (although why some are -5 and some are -6 is a bit confusing). I had another feed somewhere that used yet another time format as well. Since RSS is entirely text, Yahoo!Pipes has to parse the text and try to make sense of it... and then presumably convert it to some neutral format that it can use for the actual sorting. Not exactly a fun task.
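To make the "parse, then convert to a neutral format" idea concrete, here is a minimal Python sketch. It assumes only the two formats seen in the examples above (the RFC 822 style and the ISO 8601 style) and converts both to UTC before sorting. The function name `parse_feed_date` is made up for illustration, and I have no idea whether Yahoo!Pipes does anything like this internally.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_feed_date(text):
    """Parse an RFC 822 or ISO 8601 date string into an aware UTC datetime.

    Hypothetical helper: it only handles the two formats shown in this post.
    """
    text = text.strip()
    try:
        # ISO 8601 style, e.g. "2007-03-05T14:37:34-05:00"
        parsed = datetime.fromisoformat(text)
    except ValueError:
        # RFC 822 style, e.g. "Mon, 05 Mar 2007 16:14:52 +0000"
        parsed = parsedate_to_datetime(text)
    if parsed.tzinfo is None:
        # Assumption: a date without an offset is treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


# The five pubDate values quoted above, in the order they appear.
dates = [
    "2007-03-05T14:37:34-05:00",
    "Mon, 05 Mar 2007 16:14:52 +0000",
    "2007-03-01T00:00:00-06:00",
    "Mon, 05 Mar 2007 19:48:48 +0000",
    "Thu, 22 Feb 2007 22:39:48 -0600",
]

# Sort on the parsed UTC value, not on the raw text.
newest_first = sorted(dates, key=parse_feed_date, reverse=True)
```

Sorting on the parsed value puts the Twitter item (19:48:48 UTC) ahead of the Disruptive Telephony item (14:37:34 at -05:00, which is 19:37:34 UTC), something a plain text sort of these strings would never get right.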

When I first noticed this shortly after the launch of Yahoo!Pipes, there also was a problem that each feed seemed to have a different date field.  In some RSS feeds, it was "pubDate".  In others, it was "dc:date".  I think one was "publication date".  This created a royal headache when you were trying to create a filter or sort in Yahoo!Pipes. 

Again, though, this seems to have gone away or at least been normalized by the Yahoo!Pipes team.  All my feeds now seem to have "pubDate", albeit in differing formats.  So kudos to the Yahoo! team for figuring out how to make it all make sense.
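The field-name half of the problem can be sketched the same way. This is a rough Python illustration, assuming each feed item has already been parsed into a dictionary keyed by element name. The list of candidate names and the function `normalize_date_field` are my own assumptions for the example, not anything taken from Yahoo!Pipes.

```python
# Candidate element names, checked in order. "pubDate" and "dc:date" are the
# ones I saw in my feeds; "published" is an assumption added for Atom feeds.
DATE_FIELDS = ("pubDate", "dc:date", "published")


def normalize_date_field(item):
    """Return a copy of item with its date stored under "pubDate".

    Hypothetical helper: the value itself is left untouched, so the
    differing date formats still need to be parsed separately.
    """
    normalized = dict(item)
    for name in DATE_FIELDS:
        if name in item:
            # Move the first date-like field we find to the common name.
            normalized["pubDate"] = normalized.pop(name)
            break
    return normalized


item = {"title": "An entry", "dc:date": "2007-03-01T00:00:00-06:00"}
print(normalize_date_field(item))
```

That gives every item the same field to filter or sort on, which matches what I now see in my pipe: one "pubDate" per item, still in whatever format the original feed used.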

Interestingly, though, this really appears to be a failure in RSS standardization. Perhaps not in the specification, but in the adherence to the specification. Near the top of the RSS 2.0 Specification, in talking about channel elements, it states:

All date-times in RSS conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred).

This would argue for the "Mon, 05 Mar 2007 19:48:48 +0000" format which is also shown in the example for individual item entries in RSS.  So it would appear that some vendors have not exactly implemented RSS feeds per the spec (is anyone surprised?).
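For what it's worth, producing the format the spec asks for is not hard. Here is a small Python sketch, assuming the feed generator starts from an ISO 8601 timestamp like the ones above. The function name `to_rss_date` is hypothetical; `format_datetime` is from Python's standard `email.utils` module.

```python
from datetime import datetime
from email.utils import format_datetime


def to_rss_date(iso_text):
    """Convert an ISO 8601 timestamp to the RFC 822 style RSS 2.0 expects.

    Hypothetical helper; assumes iso_text includes a UTC offset.
    """
    return format_datetime(datetime.fromisoformat(iso_text))


print(to_rss_date("2007-03-05T14:37:34-05:00"))
# Mon, 05 Mar 2007 14:37:34 -0500
```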


Listed below are links to weblogs that reference Yahoo!Pipes and its dating problem... (and a failure of RSS standardization):

» Yahoo! Pipes - a result from Teblog
Finally got Yahoo! Pipes to do my bidding. After sleeping on my utter failure to extract the results I wanted, I realised I was using the wrong approach. I was mixing search results with RSS feeds and getting grumpy that [Read More]

» OPML in Yahoo Pipes with Canonical Date Sorting from TechBrew
Tony Hirst tipped me off in an email to the new Fetch Data module in Yahoo Pipes. This feature allows you to parse XML or JSON, opening up a whole new world of possibility. With a bit of trickery and a bit of slop, I've created a ... [Read More]

Comments

I haven't messed around with Pipes, but that problem looks like it's occurring because Pipes supports RSS 1.0, RSS 2.0, and Atom and also supports the Dublin Core date element dc:date.

Any tool that consumes RSS, as Pipes does, has to support all of these because they're in such common use. Pipes will probably solve the problem by normalizing the date values to one format, regardless of how they were formatted in a feed.

Hi Dan: It was a pleasure meeting you @ eTel last week.

The Pipes team has been busy making many behind-the-scenes updates, including one that should significantly improve date sorting.

Glad to hear it's working for you! And, please don't hesitate to let us know if you run into any further date sorting hiccups.

- Daniel (from the Pipes Team @ Yahoo!)

@Rogers: Ah, so Dublin Core is where "dc:date" is from. As I was looking through the RSS feeds initially, at least one of my feeds had that as the date element. It seems that what the Pipes team has done is precisely that - normalized all the variations to at least have a "pubDate" field, with the noted value variations I mentioned. But it would seem they have figured out how to then sort the dates correctly. Step 1 - get them all with the same field id. Step 2 - sort the data.

@Daniel: It was likewise a pleasure meeting you, and yes, I will drop you a line if I run into other hiccups.

Thanks,
Dan

I've crashed into the same issue from the other end, so to speak. I want to display dates consistently in the output of five feeds.

I notice that the day, month and year are inside the y:published section of the XML. I'm guessing that this is how Yahoo! sorted the, erm, sorting out.

All I want to do is pick up those three values or, indeed, the version of pubDate that Yahoo! used internally and stick them on the output. I figured if {pubDate} worked as a variable then {day}, {month} and {year} might. But not so far...

I've tried using regex on pubDate but my regex skills fall well short of what's needed. If, indeed, it's possible.

Right. I've now realised what to do.

First of all I don't think that Yahoo did the sorting the way I suggested. There's an easier way - use utime: it's the number of elapsed seconds since goodness knows when. I think it stands for Unix time.

Anyway. I used the Copy part of the rename operator to create three new variables. I called them dd mm and yyyy. Then I appended them to my title using In title replace $ with [${dd}-${mm}-${yyyy}]

Bingo. It doesn't output leading zeroes on the day or month, but heck, this is a result for me.

I should have mentioned that I took the day, month and year variables out of the y:published section of the RSS feed.

David,

Thanks for all the info... very interesting to read what you are trying to do. Have you "published" your Yahoo pipe that has this in it? I'd be curious to see the end result.

Dan

Sure. It's here:

http://pipes.yahoo.com/pipes/pipe.info?_id=aF0p7E_A2xGlVLm2JhOy0Q

Until y:published is available, I have a workaround using Fetch Data, Rename, and Regex. Might be useful to you:

http://techbrew.net/articles/200703/opml-in-yahoo-pipes-with-canonical-date-sorting/

Full Disclosure

  • Dan York, CISSP, is Director of Emerging Communication Technology at Voxeo Corporation. He is also the Best Practices Chair of the VOIP Security Alliance (VOIPSA).

    Note that neither Voxeo nor VOIPSA have any connection to this weblog and any opinions stated here are entirely Dan's.
