The Open Yellow Pages Database in the Sky

by andrewsho

Great discussion on TechCrunch about the need for an open database of local business listings.  My two favorite comments thus far – the first from a former Facebook engineer and the second from Dave Hyman, CEO of MOG, my fave web music service:

From Yishan Wong:

Disclosure: I worked at Facebook, and was at one point involved with location-related products. This comment doesn’t contain any info about any such products, but relates some information learned about location data in general.

Here is the practical difficulty with creating such a database:

The gathering of the data to create this database necessarily runs into murky legality issues. For comparison, Wikipedia is unable to straight-up import copyrighted data into its own corpus. “Fair use” doctrine doesn’t apply as easily to location data, because the amount of data needed to express an entity’s location (i.e. name + lat/long) is small so it is difficult for someone who is trying to collect this data to say it is “fair use” when they are basically harvesting 100% of the data they need and unlikely to show attribution (i.e. it’s not exactly an “excerpt”). Remember that even if you can make what you would consider a reasonable argument, any proprietary data provider is still going to sue you, because you are directly threatening their bread and butter. It’s a situation worse than Google scanning books, because this would make 100% of the data freely available.

These existing location databases consider their information proprietary. There are existing open databases, but they are small and quality is highly questionable. This means that anyone trying to build a comprehensive database of locations, whether they intend to keep it proprietary or to establish it as an open database, still faces the problem that all the data must be voluntarily entered by users (or employees of the organization) who have accepted an agreement stating that (1) the information they are entering is theirs to give – for users, this is information they are directly observing (e.g. describing the location of a place they know or see) – rather than importing data from another proprietary source and (2) that they agree to hand over the rights to that information to the collecting organization, so that collecting organization can do with it what they please (like open-source it) – note that this is an agreement that users who upload media to Wikipedia must also agree to, i.e. that they are putting an image into the public domain. Without such an agreement, the data exists in a sort of legal limbo where it’s not clear if the organization can open-source it, and more importantly, it means that no large-scale import of existing high-quality databases can be done (except for the not-so-great public ones) to jump-start such a database. And, without a reasonably-sized database to begin with, it is very hard for most organizations to get users involved enough to contribute more location data.

Many existing local-business directories (e.g. Citysearch, Yelp) have populated their initial databases with data from providers like InfoUSA, but the agreements signed with these providers include provisions that the data cannot be given away or open-sourced. Thus, in many cases, Google cannot simply scrape Citysearch, Yelp cannot scrape Google, Foursquare cannot scrape Yahoo, etc. Each entity, even if it has noble intentions to create an open database of locations, needs to collect the information on its own, which is – one way or another – something that has to be done via a lot of human effort (either via employees or users), much of it unfortunately duplicative – e.g. how often has the location of a particular Starbucks in NYC been entered into Citysearch, Google, Yahoo Local, Foursquare, and Gowalla?

Lastly, the issue with operating a jointly-opened database of location data (let’s say multiple organizations agree to pool their crowdsourced data) is that crowdsourced data is often inaccurate and inconsistent. Merging and collation policies need to be agreed upon and then implemented and overseen efficiently enough that users are able to see the results of their contributions nearly instantaneously. This is not impossible, but from a user-experience and product quality perspective, it is hard to see any large or small players agreeing to such a cumbersome joint arrangement when their alternative is to do it in-house.

And from the MOGfather himself, Dave Hyman, former CEO of Gracenote:
wont happen imho. think gracenote vs freedb. i lived it. exact same parallel.
