“I’ll Have a Skinny Grande Data to Go”

Gilad Elbaz

Gilad Elbaz is a 34-year-old multi-millionaire and the founder of Factual (http://www.factual.com/), a company whose mission is to make “data more accessible (i.e. cheaper, higher quality, less encumbered) for machines and developers, to drive and accelerate innovation in an unprecedented way. We take on the dirty work of data management and data curation, letting developers focus on higher value and more productive tasks. We provide clean, structured data with complete source transparency to developers.”

Gilad made his first fortune (I’m fairly certain several more are in the cards) when another company he started, Applied Semantics, was purchased by Google. It is now the basis of Google’s AdSense business, which generates close to $10 billion in revenue annually.

In a recent article in The New York Times, Quentin Hardy wrote:

“The world is one big data problem,” Mr. Elbaz says from his headquarters, a quiet office 14 floors above the Los Angeles Country Club. He is a slim, soft-spoken man who weaves in his chair when an idea excites him. “What if you could spot any error, as soon as you wrote it? Factual is definitely a new thing that will change business, and a valuable new tool for computing.”

In the booming world of Big Data, where once-unimaginably huge amounts of information are scoured for world-changing discoveries, Mr. Elbaz may be the most influential inventor and investor. Besides Factual, he has interests in 30 start-ups, including an incubator in San Francisco dedicated to Big Data. Factual’s headquarters, in a high-rise on the Avenue of the Stars, hosts seminars for a data community he hopes to foster in the Los Angeles area.

Mr. Elbaz also serves on the boards of the California Institute of Technology, his alma mater, and the X Prize Foundation, which offers cash prizes to teams that meet challenges in space flight, medicine and genomics.

The scale and scope of Factual’s data are mind-boggling, at least to mortals. Hardy describes the size of Factual’s database:

Geared to both big companies and smaller software developers, it includes available government data, terabytes of corporate data and information on 60 million places in 50 countries, each described by 17 to 40 attributes. Factual knows more than 800,000 restaurants in 30 different ways, including location, ownership and ratings by diners and health boards. It also contains information on half a billion Web pages, a list of America’s high schools and data on the offices, specialties and insurance preferences of 1.8 million United States health care professionals. There are also listings of 14,000 wine grape varietals, of military aircraft accidents from 1950 to 1974, and of body masses of major celebrities. Odd facts matter too, Mr. Elbaz notes.

He keeps 500 terabytes of storage near Factual’s headquarters. That’s about twice the amount needed to hold the entire Library of Congress. He has more data stored inside Amazon’s giant cloud of computers. His statisticians have cleaned and corrected data to account for things like how different health departments score sanitation, whether the term “middle school” means two years or three in a particular town, and whether there were revisions between an original piece of data and its duplicate.

Factual’s plan, outlined in a big orange room with a few tables and walled with whiteboards, is to build the world’s chief reference point for thousands of interconnected supercomputing clouds. The digital world is expected to hold a collective 2.7 zettabytes of data by year-end, an amount roughly equivalent to 700 billion DVDs. Factual, which now has 50 employees, could prove immensely valuable as this world grows and these databases begin to interact.

How does Factual make money doing this? It sells data to companies and software developers based on how much the information is further processed. Data for things like prototypes are free; contracts with its biggest customers run into the millions. Factual will also trade data with other companies, building its resources. Applications include adding information like restaurant locations to cellphone maps and planning sales campaigns. More broadly, Factual is aiming to use all the cloud-based data and algorithms to find patterns in nature and society, for scientists to observe and businesses to exploit. Some of its products are:

  • Location data for 58 million places (businesses and points of interest) in 50 countries, multiple languages;
  • Location data plus 43 additional attributes for 800,000+ restaurants in the U.S. including cuisines, price, ratings, dress codes, parking availability, and meal types;
  • Location data plus 8 healthcare specific attributes for 1.8 million healthcare providers in the U.S. including Doctors, Dentists, Chiropractors, Physical Therapists and many other types of healthcare providers and information such as medical affiliations, insurance accepted, gender, languages spoken, and education;
  • Data from the World Bank on Global Education Statistics by Country;
  • Data from the World Bank on Health Nutrition and Population Statistics.

In The Times article, Gilad explains:

“Data has always been seen as just a side effect in computing, something you look up while you are doing work. We see it as a whole separate layer that everyone is going to have to tap into, data you want to solve a problem, but that you might not have yourself, and completely reliable.”

A restaurant chain, for example, might use Factual to figure out whether a new location is near the competition, and how the locals have talked about the place on Yelp, the social ratings site. Checking for gas stations near the restaurant can indicate how many cars come off the highway. The chain can also employ Factual to see where it is mentioned on the Web, or to correct what other people are saying about it.

Financed with $27 million by a constellation of Silicon Valley luminaries, Factual remains closely held. But it already has thousands of customers. Facebook, CitySearch, AT&T and others use it for information about places. Newsweek used the database to help rank America’s greenest companies.

Others use Factual data for tasks like product planning and customer care. There are no profits yet, as Mr. Elbaz puts money into more data sets and talent, which already includes advanced mathematicians, data scientists from LinkedIn and Google, and at least one specialist in late Roman archaeology.

Competitors in the new industry include Microsoft, which says its Windows Azure Marketplace has “trillions of data points,” as well as a language translator. People can sell data sets to Azure, too. Infochimps offers geographic and social data, among other kinds, while companies like Gnip and Datasift offer insights from Twitter and other social sites. Wolfram Alpha, founded by another mathematician, has both data and computations that are used by Apple’s Siri, among others.

And a young company called ClearStory, also financed by Andreessen Horowitz, is trying to tie together all of these companies, often called data marts, in a way ordinary people can use. There are also several open-source data repositories, with public and private information that developers plug into their algorithms.

Several other data specialists, mostly from Google, have left their jobs to wrangle lots of information in new ways. David Friedberg, a former product manager at Google, has started the Climate Corporation, which uses government data on weather, soil porosity and the root structures of wheat and soybeans to write crop insurance.

Mr. Elbaz is also an investor in Kaggle, which awards cash for finding data patterns. It was used by NASA, for example, to find a better way to measure the shape of galaxies; in the first week of competition, a Ph.D. student in glacier mapping had outperformed NASA’s algorithms. He has also put money into ZestCash, which makes payday loans that are cheaper than the industry’s average, judging risk via criteria like cellphone bills and how its applicants read the ZestCash Web site. The ZestCash C.E.O., Douglas Merrill, once ran Google’s internal information systems.

“We feel like all data is credit data, we just don’t know how to use it yet,” he says. “This is the math we all learned at Google. A page was important for what was on it, but also for how good the grammar was, what the type font was, when it was created or edited. Everything. What Gil is doing at Factual is the same. Data matters. More data is always better.”

Part of the difficulty, even among employees, is deciding how much data is enough. “For sure, we want the correct name and location of every gas station on the globe,” Mr. Bell says. “Not the price changes at every station.”

“Wait a minute, I’d like to know every gallon of gasoline that flows around the world,” Mr. Chklovski cuts in. “That might take us 20 years, but it would be interesting.”

At most start-ups, talk about doing the same kind of thing, only bigger and better, 20 years from now might seem like a marriage of the delusional and the dull. Mr. Elbaz and his team, however, say they feel that it makes sense. Telling everyone the true facts of the world is at least the work of a lifetime.

“Lately, I’ve been thinking that we need to get more personal data,” Mr. Elbaz says. He doesn’t mean names and addresses, but their genetic information, what they ate, when and where they exercised — ideally, for everyone on the planet, now and forever. “I want to figure out a way,” he says, “to get people to leave their data to science.”