Do Formats Really Matter That Much?
In the entry below, I expressed concern regarding Dave Winer’s belief that it should be illegal for Google to demonstrate a preference for a given data format. As I’ve thought about this, it has become clearer to me just how wrong-headed such a belief is. Antitrust considerations aside, I’m not sure I believe that it is even fundamentally wrong for Google to demonstrate a preference for a particular format per se.
Historically, search engines only searched text documents. Binary documents were ignored. Today, Google and possibly other search engines index Word documents and PDFs as well as some other binary formats. During the early days of the search engine, however, binary document formats were effectively discriminated against. In fact, I would venture to say that there is now (as there was then) just as much valuable content stored in binary, proprietary formats as there is in public formats. I don’t remember Dave, or anyone else for that matter, wailing that non-HTML or non-text documents were not being indexed by the search engines. Even if the feature was openly requested, I definitely don’t remember anyone suggesting that it should be illegal for search engines to ignore such content.
Google is a content-centric tool. It finds data. The format is simply a barrier/shell for that data. Even if Google limits its crawl to open formats other than RSS 2.0, there is no material barrier to content publishers getting their content crawled, just a political one. The data provided in Atom and RSS 1.0 is, for all intents and purposes, equivalent to that contained in RSS 2.0. My aggregator of choice (NetNewsWire) reads all three formats quite well. To tell you the truth, I don’t really know which feeds are of which type because, other than some ego stroking (on all sides, I might add), the differences between the formats are largely irrelevant to their content and, therefore, to the end user. Sure, there are things that are easier with RDF-based data and there is something to be said for the simplicity of RSS 2.0, but a title is a title and a date is a date no matter what you call them.
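To make the "a title is a title" point concrete, here is a rough sketch of how little code it takes to pull the same two fields out of all three formats. Everything in it is my own illustration rather than how Google or NetNewsWire actually works: the function names, the field mapping, and the tiny sample feeds are made up, and I am assuming the Atom 1.0 namespace and its `updated` element (older Atom drafts use different names). Dates come back as the raw strings each format uses; I make no attempt to parse them.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes in ElementTree's "{uri}tag" notation.
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RSS1 = "{http://purl.org/rss/1.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"
ATOM = "{http://www.w3.org/2005/Atom}"  # assumption: Atom 1.0

# Hypothetical mapping: (entry tag, title tag, date tag) per format.
FIELD_MAP = {
    "rss2": ("item", "title", "pubDate"),
    "rss1": (RSS1 + "item", RSS1 + "title", DC + "date"),
    "atom": (ATOM + "entry", ATOM + "title", ATOM + "updated"),
}


def detect_format(root):
    """Guess the feed format from the root element's tag."""
    if root.tag == "rss":
        return "rss2"
    if root.tag == RDF + "RDF":
        return "rss1"
    if root.tag == ATOM + "feed":
        return "atom"
    raise ValueError("unrecognized feed root: " + root.tag)


def entries(xml_text):
    """Return a list of (title, date) pairs, whatever the feed format."""
    root = ET.fromstring(xml_text)
    entry_tag, title_tag, date_tag = FIELD_MAP[detect_format(root)]
    return [
        (entry.findtext(title_tag), entry.findtext(date_tag))
        for entry in root.iter(entry_tag)
    ]


# Made-up one-entry feeds carrying the same post in each format.
RSS2_SAMPLE = (
    '<rss version="2.0"><channel><title>Blog</title>'
    "<item><title>Hello</title>"
    "<pubDate>Mon, 01 Mar 2004 12:00:00 GMT</pubDate></item>"
    "</channel></rss>"
)
RSS1_SAMPLE = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns="http://purl.org/rss/1.0/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<item rdf:about="http://example.com/hello"><title>Hello</title>'
    "<dc:date>2004-03-01T12:00:00Z</dc:date></item>"
    "</rdf:RDF>"
)
ATOM_SAMPLE = (
    '<feed xmlns="http://www.w3.org/2005/Atom"><title>Blog</title>'
    "<entry><title>Hello</title>"
    "<updated>2004-03-01T12:00:00Z</updated></entry>"
    "</feed>"
)

if __name__ == "__main__":
    for sample in (RSS2_SAMPLE, RSS1_SAMPLE, ATOM_SAMPLE):
        print(entries(sample))
```

The only per-format knowledge is the three-row table at the top; the rest of the code does not care which format it was handed.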
If, on the other hand, Google were to limit the formats it was willing to crawl to only those created with expensive, encumbered, closed apps, there might be a worthwhile argument against such a decision. That is not the case here. RSS 1.0, RSS 2.0 and Atom are all competing open formats. Anyone can publish in any or all of the three formats with freely available tools.
In the final analysis, however, we’d be remiss if we failed to point out the largest, overriding check on Google’s power: the user. If users don’t find adequate results, they’ll go elsewhere. Unlike the Microsoft antitrust situation, there is no financial transaction between the searcher and Google. There is, therefore, no material cost to the user associated with switching away from Google. If another engine offers better results, users will leave. This has happened three or four times already over the history of the web. Yahoo was king for a while. AltaVista was king. AOL still does a TON of traffic for their search. Google may be at the top of the heap today, but history shows us that a fresher, newer option can gain ground incredibly quickly.
Any comparison to Microsoft (or other antitrust references) is weak. Google doesn’t control the desktop and they don’t control the network. Call me when Google starts blocking entire networks because those networks refuse to block traffic to Yahoo, AltaVista and other Google competitors. Then we can talk about Google behaving like Microsoft. Until then, the question of legality should not even be on the periphery of our thought process.





