Now, We're All Data Miners

Imagine finding out that your headquarters is sitting on a diamond mine. But you’re an architectural firm, oil company, or a commercial real estate company — what do you know about diamonds?

Data is like that. Simply put, no matter what kind of company you are, you’re in the data business and you’re sitting on a kind of mine, whether or not you’ve tapped it. “You’ve got us mixed up with Twitter,” you might say. But hear me out. Every enterprise collects data. Facebook, Google and Twitter are clearly in the business of data. These organizations have mastered wrangling the complexity of data and turning it into products or services.

Smaller organizations are now looking at their data in new ways. Recruiting firms and temp agencies may find they can sell data about employment trends to others who are looking for more sources of information about this key economic indicator.

For startups, monetizing data is a common strategy. Kaggle runs contests for analyzing big data. In the process, it created one of the largest collections of data about active data scientists in the world, an asset that can be used in dozens of ways.

A Japanese firm found that by looking at elevator activity from a building-automation system they could predict the likelihood of lease renewals. Building-automation data is being used for dozens of other money-saving purposes.

As more and more “things” like elevators become smart, more data will arrive, and the natural next step is to ask what value can be obtained from it. Figuring out how you will profit from data means looking at your data from new angles and determining how the data you have is valuable to you or to your partners.

Mashing up data sources for business insight

You have more data than you think you do. Twitter, for all its servers, is no New York Stock Exchange; it’s also not a chemical plant, which generates more data in a day than Twitter does in a month. The enterprise grabs its data from an astounding number of sources: from Salesforce.com or its CRM system, from back-office systems such as SAP or Oracle, and from its downloads, click-throughs, Facebook “Likes” and Twitter followers. If you’re a manufacturer, you can pile on data from the factory-automation system and remote sensors. If you’re a retailer, you can add data from your point-of-sale and warehouse management systems. The key point here is not just having so many rich data sources, but creating new alloys by mashing up those data sources in different ways to increase their business value.
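
To make the mash-up idea concrete, here is a minimal sketch in Python. It assumes two hypothetical exports, one from a CRM system and one from a point-of-sale system, that share a `customer_id` field; the field names, records and the `revenue_by_segment` helper are all invented for illustration and do not come from any particular product.

```python
from collections import defaultdict

# Hypothetical CRM export: one record per customer.
crm_records = [
    {"customer_id": "c1", "segment": "enterprise"},
    {"customer_id": "c2", "segment": "small business"},
    {"customer_id": "c3", "segment": "enterprise"},
]

# Hypothetical point-of-sale export: one record per sale.
pos_sales = [
    {"customer_id": "c1", "amount": 1200.0},
    {"customer_id": "c2", "amount": 150.0},
    {"customer_id": "c1", "amount": 300.0},
    {"customer_id": "c3", "amount": 900.0},
    {"customer_id": "c9", "amount": 50.0},  # no matching CRM record
]


def revenue_by_segment(crm, sales):
    """Join sales to CRM segments on customer_id and total the revenue."""
    segment_of = {row["customer_id"]: row["segment"] for row in crm}
    totals = defaultdict(float)
    for sale in sales:
        # Sales without a CRM match are kept under "unknown" rather than dropped.
        segment = segment_of.get(sale["customer_id"], "unknown")
        totals[segment] += sale["amount"]
    return dict(totals)


print(revenue_by_segment(crm_records, pos_sales))
```

Neither source alone can answer "which customer segment brings in the most revenue?"; the combined view can, which is the point of the alloy.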

Data-driven decisions = data monetization

Any enterprise that uses data to drive decisions is monetizing its data. In 2011, MIT researchers analyzed 179 publicly traded companies and found that data-driven decision-making (versus relying on a leader’s gut instincts) translated into about five percent higher productivity and profit. I’m surprised those numbers aren’t higher. Those same companies tended to score better in performance measures like asset utilization, return on equity, and market value, which are all forms of monetization.

Hard goods, big data

Manufacturers understand the value data can bring and, according to IDC, 43 percent of manufacturers are actively designing automated, connected factories of the future. Six in 10 manufacturers expect their production processes to be mostly or completely digitized in the next five years. The key in all this is not just the automation of production processes, but the ability to “listen” to all the data the machines in the factories are generating. Manufacturers can then aggregate that machine data, along with other data sources, for everything from optimizing maintenance schedules to saving energy costs to creating a more resilient supply chain. In other words, automation saves money, but it also generates voluminous data that can be used to create fine-grained operational models. Viewing that information in new ways can lead not only to new insights but also to new lines of business.

General Electric is in the data business. The company once known for “Bringing Good Things to Life” with consumer products is now leading with what it calls “The Industrial Internet,” aimed at connecting intelligent machines (like its own gas and steam turbines) with advanced analytics. The payoff is in preventing downtime and eliminating unnecessary labor. GE estimates that it costs the industry 52 million labor hours and more than $7 billion a year to service the gas and steam turbines at work across 56,000 power plants. Much of that labor is wasted on servicing machines that are in perfect working order. Sensor data tells operators which machines are running outside normal ranges (for example, for temperature and vibration) and need servicing. Apply data like that to commercial aircraft, fleet vehicles, conveyors, and medical equipment like MRI scanners, and the savings are enormous, especially if you eliminate a costly breakdown.
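
The idea of servicing only the machines that run outside normal ranges can be sketched in a few lines of Python. This is an illustration under stated assumptions, not GE's method: the `Reading` type, the machine names and the numeric limits in `NORMAL_RANGES` are hypothetical, and real limits would come from the equipment manufacturer.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    machine_id: str
    temperature_c: float
    vibration_mm_s: float


# Hypothetical normal operating ranges, chosen only for this example.
NORMAL_RANGES = {
    "temperature_c": (20.0, 95.0),
    "vibration_mm_s": (0.0, 7.1),
}


def needs_service(reading):
    """Return True if any sensor value falls outside its normal range."""
    for field, (low, high) in NORMAL_RANGES.items():
        value = getattr(reading, field)
        if not (low <= value <= high):
            return True
    return False


def machines_to_service(readings):
    """Return the sorted IDs of machines with at least one out-of-range reading."""
    return sorted({r.machine_id for r in readings if needs_service(r)})


readings = [
    Reading("turbine-1", 80.0, 3.2),   # within both ranges
    Reading("turbine-2", 101.5, 4.0),  # temperature too high
    Reading("turbine-3", 75.0, 9.8),   # vibration too high
]

print(machines_to_service(readings))
```

In this sample, only turbine-2 and turbine-3 are flagged, so a crew would skip turbine-1 instead of servicing all three on a fixed schedule.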

The new competitive landscape is data

Data is the new competitive landscape, and how we can use data in new ways is becoming clearer all the time. If we think of data as a product, then certainly Amazon, Google, Facebook and Twitter come to mind — they aggregate and sell user data for geospatial advertising (among other uses).

But data can be a product for internal use, as well. A bank client of my company, Concurrent, creates numerous products using its data, both customer-facing products and internal products geared toward risk management, compliance and trading policies. That bank assigns product managers to both types of products — which sounds novel for data products, but I predict that it will become the rule, not the exception. These data product managers bridge business requirements, imperatives and processes with their data. As more organizations recognize that they are in the business of data, data products will need to be managed like any other product line.

Trading data for data

Another way that companies are in the business of data is by trading the partial data sets they have for more coherent data sets. One example is Jigsaw, the crowdsourced database of companies and contacts. “No more renting or buying costly company directory lists,” Jigsaw promises (presumably from costly services like Bloomberg). Instead, you share the contact information you have, and receive credits toward a more complete data set. A second example is Factual, with location-based data for mobile personalization and ad targeting. User companies contribute their own consumer location data for access to bigger data sets and to product data on more than 650,000 consumer packaged goods. (That’s data that you historically paid for with a Universal Product Code (UPC) membership.) Factual promises that companies can “share and mash open data on any subject,” and share and mash they do. Both Jigsaw and Factual offer premium services like data cleansing, but the price of admission is your partial data set.

Learning to mine big data

Enterprises of all kinds will only grow more data-rich, meaning there are more riches to be found. If data is a kind of rich repository (like diamonds), then it requires both crude and refined tools. Hadoop is like dynamite, just the first tool in mining diamonds. Yes, it can aggregate data into a whole, but none of those data leaders (like Amazon or GE) installs Hadoop and declares its job done. They are developing testable, reusable data processing applications that turn data into products they can use.

It will take creativity and business acumen to look at the data we have and begin to imagine its uses, either for us or for others. What I can tell you is that you are sitting on that mine. Exactly how that data is valuable to you is the question that I invite you to consider. Since we’re all in the business of data, I guarantee that the answers to that question will be worth the time you take to ponder it.

* * *

Gary Nakamura is the CEO of Concurrent. Follow him @heynaka.

This article was originally published by Re/code.
