
Study shows websites deny Google competitors web-crawler access to their sites

Have Penn State researchers stumbled upon a Google-DoubleClick anti-competitive smoking gun?  

In its article "Google Favored By Web Admins," PCWorld flagged some very troubling new research findings by Penn State researchers that are pertinent to the FTC/EU reviews of the Google-DoubleClick merger.

  • PCWorld: "Web-site policy makers are playing favorites, and Google is the big beneficiary, say Penn State researchers."
  • "'We expected that robots.txt files would treat all search engines equally, or maybe disfavor certain obnoxious bots, so we were surprised to discover a strong correlation between the robots favored and the search engines' market share,' said C. Lee Giles, the David Reese Professor of Information Sciences and Technology at Penn State who led the research team that developed BotSeer, in a statement."

 Why is this significant?

First, what this means is that the search advertising market is even less competitive than feared. 

  • Search advertising is a leading intermediary function of the Internet -- matching buyers (audience) and sellers (advertisers) using content (websites) as the "currency" of this intermediation marketplace.
    • If websites are broadly proscribing access to their content by competitive search engines, this practice effectively kneecaps search advertising competitors' ability to compete with a combined Google-DoubleClick.

The monster question these new Penn State findings raise for FTC investigators is: why is this happening?

  • Why is there such an eerie correlation between access to websites' content and search market share -- and could it be the result of anti-competitive behavior?  
    • The researchers stated: "The results also show a strong correlation between the search engine market share and the bias towards particular search engine robots."
  • Is this eerie correlation a result of natural factors or the result of non-disclosed side-deals between the website and Google?
    • There is certainly a strong financial incentive for a site that exclusively or predominantly monetizes its content through Google to favor Google in finding its content.
      • Simply, both sides make more money if people find their content through Google than through another search engine.
    • (In a truly competitive market, such deals may not be anti-competitive, but in an industry dominated by Google, with ~75% revenue share of search, it could very well be anti-competitive behavior -- as it could practically foreclose competition.)
  • If I were the FTC, this is what I would want to learn from Google and DoubleClick:
    • Is this phenomenon a result of normal market forces? Contractual arrangements? Understood mutual self-interest? Or a nod and a wink?
    • Does Google have sufficient internal controls to ensure the Google client manager for a particular website is not financially incented or encouraged to try to limit competitors' web crawler access?
    • Where are web crawler policies publicly disclosed and explained by websites? And if they are not, why not?
    • Does Google have any internal policy guidelines on web crawler access?
    • What is Google's official policy on competitive availability of web crawlers?
    • Are web crawlers/spiderbots mentioned in any way in any Google-website agreements?
    • Does DoubleClick, in any of its arrangements with websites, have any interest in, or influence over, which search engines get access to the content that DoubleClick advertises?
    • What incentives would the combined Google-DoubleClick have concerning competitive availability of search access to websites, given that:
      • The Google-DoubleClick publisher network would combine Google's dominant long tail and mid market share, with DoubleClick's top market dominance (17 of the top 20 publishing websites and most of the top global 1000 sites)?

Second, what these findings may mean is that consumers are not in fact choosing a search engine based on its algorithm, but choosing implicitly through learned behavior that Google has better access to the content the user wants than other search engines do.

  • This matters because if Google excels predominantly because of a superior algorithm, that is innovation and good for consumers, but if it excels in significant part because it is contributing to the denial of competitors' access to the lifeblood of search, content, then it ill-serves consumers and is anti-competitive.

Third, this study seriously undermines Google's already dubious claim that the next great garage search startup could knock Google off its #1 search perch.

  • This study shows that website administrators are "de facto" gatekeepers that discriminate over which select web crawlers get access to search their websites.
    • That "gate" serves as a substantial barrier for new entrants, as well as for Yahoo and Microsoft, trying to gain share from Google.
    • A search engine cannot search sites to which its web crawlers or spiderbots do not have access.
      • As Google CEO Eric Schmidt candidly shared at the most recent annual Google investor conference: "search is only as good as the content searched."  

Fourth, this is another powerful network effect reinforcing Google's search and online advertising dominance.  This is one of the main points made by the Penn State researchers in the abstract of their paper "Determining Bias to Search Engines from Robots.txt":

  • "Websites can explicitly specify an access preference for each robot by name."
  • "Such biases may lead to a "rich get richer" situation, in which a few popular search engines ultimately dominate the Web because they have preferred access to resources that are inaccessible to others."
    • This new and powerful network effect and barrier to competition is on top of the many barriers to entry I enumerated in my analysis on the merger (p.13-22), and in addition to the four new network effects that the Google-DoubleClick merger would create (p.25-27).
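
To make the "access preference for each robot by name" point concrete, here is a minimal sketch in Python of how a robots.txt file can favor one crawler. The robots.txt content, the example.com URL, and the helper name `crawler_access` are all invented for illustration; they are not taken from the Penn State paper or from any real site. The crawler names used are the user-agent tokens commonly associated with Google, Yahoo, and Microsoft at the time.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, invented for illustration only:
# one crawler is named and allowed everywhere, all others are shut out.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def crawler_access(robots_txt, crawlers, url="http://example.com/page.html"):
    """Return {crawler name: True/False} for whether each may fetch `url`.

    `crawler_access` is a hypothetical helper, not part of any library.
    """
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return {name: parser.can_fetch(name, url) for name in crawlers}


access = crawler_access(ROBOTS_TXT, ["Googlebot", "Slurp", "msnbot"])
for name, allowed in access.items():
    print(f"{name}: {'allowed' if allowed else 'blocked'}")
```

Under this assumed file, only the crawler named in the first rule group is allowed in, and every other crawler falls through to the catch-all `*` group and is blocked. That is the kind of per-robot bias the researchers describe, although real-world files are usually far less stark than this sketch.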

Bottom line: This new study should open a whole new line of inquiry in the Google-DoubleClick merger, if the FTC is serious about protecting competition on the Internet.

  • The fact that websites can and do deny search engines access to search their sites lessens competition.
    • What the FTC has to determine is whether Google or DoubleClick have been using their dominance to anti-competitively deny competitors access to websites -- the functional equivalent of "sucking the oxygen out of a competitor's space."