Open and Machine Readable
Posted: July 4, 2013
So-called “big data” is all the rage now; one might even call it a bit of a fad, in that many people write about it as a concept but far fewer about its practical applications. But governments, led somewhat ironically by American cities, states and even the federal government (ironic given the recent disclosures by NSA contractor Snowden), are making much of their data freely available for analysis by individuals and companies.
After a Soviet missile shot down a South Korean airliner that strayed into Russian airspace in 1983, President Ronald Reagan made America’s military satellite-navigation system, GPS, available to the world. Entrepreneurs pounced. Car-navigation, precision farming and 3m American jobs now depend on GPS. Official weather data are also public and avidly used by everyone from insurers to ice-cream sellers.
But this is not enough. On May 9th Barack Obama ordered that all data created or collected by America’s federal government must be made available free to the public, unless this would violate privacy, confidentiality or security. “Open and machine-readable”, the president said, is “the new default for government information.”
This is a big bang for big data, and will spur a frenzy of activity. Pollution numbers will affect property prices. Restaurant reviews will mention official sanitation ratings. Data from tollbooths could be used to determine prices for nearby billboards. Combining data from multiple sources will yield fresh insights. For example, correlating school data with transport information and tax returns may show that academic performance depends less on income than the amount of time parents spend with their brats.
Over the next few months federal agencies must make an inventory of their data and prioritize their release. They must also take steps not to release information that, though innocuous on its own, could be joined with other data to undermine privacy—a difficult hurdle.
Many countries have moved in the same direction. In Europe the information held by governments could be used to generate an estimated €140 billion ($180 billion) a year. Only Britain has gone as far as America in making data available, however. For example, it requires the cost of all government transactions with citizens to be made public. Not all public bodies are keen on transparency. The Royal Mail refuses to publish its database of postal addresses because it makes money licensing it to businesses. On May 15th an independent review decried such practices, arguing that public-sector data belong to the public.
Rufus Pollock of the Open Knowledge Foundation, a think-tank, says most firms will eventually use at least some public-sector information in their business. But no one has a clue what breakthroughs open data will allow, just as Reagan never guessed that future drivers would obey robot voices telling them to turn left. (The Economist, April 27, 2013.)
Cities like Chicago are both crunching numbers to give taxpayers better feedback on how their cities are run and making the data available for others to find uses for it.
As cities also start to look back at historical data, fascinating discoveries are being made. Mike Flowers, the chief analytics officer in New York, says that if a property has a tax lien on it, there is a ninefold increase in the chance of a catastrophic fire there. And businesses that have broken licensing rules are far more likely to be selling cigarettes smuggled into the city in order to avoid paying local taxes. Over in Chicago, the city knows with mathematical precision that when it gets calls complaining about rubbish bins in certain areas, a rat problem will follow a week later.
The next step is to use these predictions to inform policymaking. New York is already doing this, for example by deciding where to send its cigarette-tax inspectors. Chicago is not quite at this point yet, but is ambitiously trying to build an “open-source predictive analytics platform”. This means that it will publish as many data as it can, as close to real time as possible, in a way that will allow anyone to mine them for useful insights into the city.
Moreover, the software Chicago plans to create will be made public, allowing other cities to use it to set up similar systems of their own. (New York keeps its analysis behind closed doors and uses proprietary technology.) It is a big job and means cleaning up 10 billion lines of unstructured data. The hope is that entirely new services will emerge, as well as a great deal of new intelligence about how the city works.
London has used data to help commuters better navigate public transit, crunching numbers from its Oyster smart card and updating passengers in real time on the best (and worst) routes. In Singapore, the aim is to go beyond the city's already sophisticated road-pricing system to create a city control system that will, for example, optimize the number and location of taxis in response to rain patterns.