What follows is based on a short talk I gave at a workshop for open data startups in Uruguay that was organized by the World Bank’s Open Finances team. The reflections are all personal and don’t necessarily reflect the viewpoints of my employer.
Let’s start with some definitions. We throw around terms like big data and open data without much specification of what they refer to and what they don’t. The best definition I’ve heard of “big data” comes from Google’s Eric Schmidt: any dataset that’s too big for an Excel spreadsheet.
Open data has a more precise definition, though it’s rare that a dataset labeled as “open data” actually complies with the eight principles of open government data, which were established in 2007. More generally, we often use the term “open data” as an all-encompassing label for information that comes from the government. And we tend to use “big data” to describe information about the habits of customers of companies.
For the open data startup, both open data and big data are valuable, and often in complementary ways. Nike+, a platform that permits over 8 million runners worldwide to share information about their runs, depends on open data to provide its users with information about the weather, how fast they run, and their elevation gain/loss. But the Nike+ platform also offers its users value by aggregating the information that they share. As a result, we’re able to see on a map the most popular running routes in every major city around the world. (And Nike gets to see which of their shoes are the fastest.) Here are the most popular running routes in Mexico City:
You can also see the complementarity between open data and big data on Yelp, the largest restaurant directory in the United States. If we take a look at the profile page for Domo, a sushi restaurant in San Francisco, we find a wealth of information to guide our decision about where to eat tonight. Some of the information comes from government records, such as whether or not it has wheelchair access (it does) and whether it has any health inspection violations (not since September 2011, when an inspector found a moderate risk of vermin infestation). But the vast majority of information on Yelp comes from the users themselves. They contribute and verify information about the restaurants’ operating hours, payment methods, noise levels, price range, and more. At Domo the Yelp community has left over 500 reviews. No one can read 500 reviews, but by selecting the terms that are mentioned most often, we learn right away which are the most popular dishes.
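The idea of surfacing the most-mentioned terms can be sketched in a few lines of Python. This is only an illustration, not how Yelp actually does it: the reviews are made up, the stopword list is a tiny placeholder, and `top_terms` is a hypothetical helper name.

```python
import re
from collections import Counter

# Made-up sample reviews (assumption); a real app would load thousands of them.
REVIEWS = [
    "The spicy tuna roll was amazing and the hamachi was fresh.",
    "Loved the hamachi. Small place, long wait.",
    "Get the spicy tuna roll. The hamachi is great too.",
    "Long wait but the spicy tuna roll is worth it.",
]

# Tiny placeholder stopword list (assumption); real lists are much longer.
STOPWORDS = {"the", "was", "and", "is", "a", "it", "but", "too", "get"}


def top_terms(reviews, n=4):
    """Return the n terms that appear in the most reviews."""
    counts = Counter()
    for review in reviews:
        # Count each term once per review so one long review cannot dominate.
        words = set(re.findall(r"[a-z']+", review.lower()))
        counts.update(words - STOPWORDS)
    # Sort by count (descending), then alphabetically so ties are stable.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    for term, count in top_terms(REVIEWS):
        print(f"{term}: mentioned in {count} reviews")
```

Counting each term once per review is a design choice: it measures how many reviewers mention a dish rather than how often a single enthusiastic reviewer repeats it.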
Your users don’t want more information
The most frequent mistake by open data startups is to assume that their readers want access to as much information as possible. I’ve seen many landing pages of open data startups that proudly boast that they have millions of records of information about some particular issue — as if anyone has the time or inclination to read through millions of pages of data.
Data is not knowledge. Knowledge is information with context that helps us make informed decisions. In our contemporary world, an over-abundance of information plus an over-abundance of choices has caused an unprecedented amount of anxiety. This is the basic message of Barry Schwartz’s The Paradox of Choice: Why More Is Less. Your users don’t want more information, but they are willing to pay for the right knowledge at the right time to help inform their decisions in order to reduce their anxiety and gain a competitive advantage.
Let’s see how this framework applies to two very different open data startups, one a billion-dollar public company and the other a modest smartphone application that earned over $50,000 in the first month after its release.
Zillow is a real estate database that was founded in 2005 by two former Microsoft executives. Zillow was built on public data, though it was far from open data as defined by the eight principles. As a Zillow employee wrote in a blog post after Obama announced the Open Data Initiative in May:
At Zillow, we built our business taking public real estate information that was previously only accessible by spending hours in dusty registry of deeds offices or courthouses poring over paper documents, and making it easily accessible to consumers, for free. And since our start in 2006, we’ve been heartened to see just how much the national conversation about real estate has changed since we helped free this data from the shadows. We think people are making smarter decisions based on this abundance of information.
In 2010 Zillow’s revenue was $30 million. By 2011 they had over $66 million in revenue and launched a successful IPO. Their current valuation (market cap) is $2.3 billion. Zillow is one of open data’s biggest and most controversial success stories.
But you don’t have to be a former Microsoft executive to launch a successful, profitable open data startup. In 2010 two young mobile programmers from Mexico City developed a very simple application called “anti-mordidas.” (“Mordida” in Mexican slang refers to a bribe.)
In Mexico, it is difficult to determine the cost of a traffic violation. The schedule of fines is buried in the depths of an obscure government website, and the actual amount of each fine is never listed. Instead, the cost of the ticket is listed as “30 times the cost of an hour’s work according to the current minimum wage.” And the minimum wage tends to change at least once a year. As a consequence, police officers knowingly exaggerate the cost of the ticket, but then offer the driver a “special deal” if he or she pays the officer directly.
The developers of Anti-mordidas simply gathered all the information about traffic infractions in a single app, and created a “fee calculator” so that users could easily calculate the true cost of their infraction. The app, which took less than a week to make, costs $2 and within the first month of its release it was downloaded by more than 37,000 users. Even when you take into account Apple’s 30% cut, they still made over $50,000 in a single month from an application that took just a week to develop and is based on information that is all available on public websites. No, they didn’t make a multi-billion dollar company, but $50,000 for a week of work isn’t bad.
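To make the arithmetic concrete, here is a minimal Python sketch of what such a fee calculator might look like, plus a check of the revenue figure above. It is not the app's actual code: the function names are hypothetical, and the minimum wage value is a placeholder, not a real figure.

```python
def fine_in_pesos(multiplier, hourly_minimum_wage):
    """Convert a fine quoted as 'N times the minimum wage' into pesos."""
    return multiplier * hourly_minimum_wage


def net_revenue_cents(downloads, price_cents, store_cut_percent):
    """Revenue left after the app store takes its cut, in whole cents."""
    gross = downloads * price_cents
    return gross * (100 - store_cut_percent) // 100


if __name__ == "__main__":
    # Placeholder wage (assumption): update it whenever the minimum wage changes.
    hourly_minimum_wage = 7.50
    print("Fine:", fine_in_pesos(30, hourly_minimum_wage), "pesos")

    # Figures from the text: 37,000 downloads at $2 each, minus Apple's 30% cut.
    cents = net_revenue_cents(37_000, 200, 30)
    print(f"Net revenue: ${cents / 100:,.2f}")
```

With the figures from the text, the gross is $74,000 and the developers keep $51,800, which matches the "over $50,000" claim. Keeping the wage as a single input is the whole trick: when the minimum wage changes, every fine updates at once.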
How does the framework above apply to these two examples? Anyone who has purchased a home knows the amount of anxiety that is involved in making such a significant decision. Zillow reduces that anxiety by providing us with the necessary information to compare the asking price with the most recent data about home costs. Similarly, a police siren always causes our hearts to beat anxiously. That anxiety is rooted in the fact that we don’t know what will happen or how much it will cost. The appeal of Anti-mordidas is that it arms us with the necessary information at just the right moment to reduce our anxiety.
These are just two of a growing number of open data applications that use a wide range of government datasets. In their own way, they all reduce their users’ anxieties and help give them a competitive edge. Here is an incomplete list:
- Cartography: Cloudmade, SimpleGeo, Mom Maps
- Financial Information: BrightScope, Duedil, GuideStar
- Legal information: CENDOJ, Anti-mordidas
- Weather: Weather Channel – sold for $3.5 billion in 2008
- Transport: Google Maps, Waze, Transit, Snips, and all the many apps listed on City-Go-Round
- Legislative: Popvox, ElectNext, BillTrack50
- Travel: FlyOnTime, TripIt
- Health: Castlight, BabyCenter, Teladoc, State of the Air
- Safety: Crimemapping.com, Earthquake alert apps
- Energy: Opower
- Zoning: Zonability
Again, all of these apps aim to reduce the anxieties of their users, and provide them with relevant information at the right time to make the best decisions.
They are just the beginning. Think about how many decisions we have to make every day, and how poorly we make most of them. The companies that can provide the most relevant information with the least friction will succeed.