There’s more data in the world today than at any point in human history, and the volume keeps growing rapidly, driven by social media, the Internet of Things (IoT), and other sources. By one widely cited estimate, 90% of the world’s data was created in the past two years alone.
It’s fair to say that plenty of economic possibilities (opportunities or threats) are tied to big data and the way it’s used. Big data represents a global shift in business intelligence that companies will need to master if they want to stay competitive.
Big Data: Definitions and Benefits
What exactly do we mean by “big data”? First, consider traditional notions of data: mostly structured information neatly organized in rows and columns in tables, managed by relational database management systems (RDBMS), and accessible through query languages such as Structured Query Language (SQL). These systems have roots dating back to the 1970s and are mostly ill-equipped to handle the huge influx of today’s data.
Big data goes a step further: In addition to traditional tabular data, big data can include unstructured text, audio and video, signal data from physical sensors, and more. These attributes, combined with its inherently large volume, mean that big data cannot be collected, stored, organized, or analyzed using traditional infrastructure, software, data tools and techniques.
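To make the traditional model concrete, here is a minimal sketch of structured data in rows and columns, queried with SQL. It uses Python’s built-in sqlite3 module as a stand-in for an RDBMS; the orders table, its columns, and the top_customers function are hypothetical names chosen for illustration only.

```python
import sqlite3


def top_customers(rows, min_total):
    """Load (customer, amount) rows into a table and query them with SQL."""
    # An in-memory SQLite database stands in for a full RDBMS here.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

    # Structured data lets one declarative query group, filter, and sort.
    cursor = conn.execute(
        "SELECT customer, SUM(amount) AS total FROM orders "
        "GROUP BY customer HAVING SUM(amount) >= ? "
        "ORDER BY total DESC",
        (min_total,),
    )
    result = cursor.fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("alice", 30.0), ("bob", 5.0), ("alice", 20.0), ("carol", 12.5)]
    print(top_customers(sample, 10))
```

This works well as long as the data fits a fixed schema and a single machine. Free text, audio, video, and sensor streams at very large volume do not fit this mold easily.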
So why bother with big data? Because of the immense value locked in those vast data sets—value that can spur innovation, provide competitive advantage, and drive productivity gains.
- Innovation: Big data offers opportunities for innovation in many industries. To take just one example, the vast quantities of data associated with the human genome enable researchers not only to identify causes of various diseases, but also to develop treatments to prevent or cure them. Technologies such as polymerase chain reaction (PCR) analyzers can locate DNA sequences of interest in specific patients, providing a wealth of information regarding that patient’s predisposition to certain diseases or susceptibility to environmental factors.
- Competition: It’s no secret that more customer data generally equates to better decision-making. More advanced market segmentation is an obvious example of this, but the benefits aren’t limited to marketing personalization or more focused brand messaging. When companies slice and dice their usage data, they can tap into valuable insights related to website performance, product development, channel preferences, and even risk mitigation strategies.
- Productivity: Business processes generate data—potentially, lots of it. With appropriate tools to extract, analyze, and present business process data, a company can make tactical and strategic decisions and enhance process efficiencies.
Big Data Challenges
The benefits of big data are clear—the question is, how do we realize these benefits?
As mentioned previously, big data, by definition, has too much volume and variety to manage with traditional tools and techniques. This means that new tools and methodologies must be developed, and indeed, lots of smart people in both academia and industry have been working hard over the last few years doing just that.
The result is a wide variety of solutions, both proprietary and open-source, that can be brought to bear on different parts of the big data problem, such as:
- Data collection: Specialized software for “ingesting” data into a big-data system can perform some level of pre-processing to improve the quality of the data coming in. Popular solutions include Apache Sqoop, for moving data from an RDBMS like Oracle or PostgreSQL into the Hadoop Distributed File System (HDFS), and Apache Kafka, for streaming data like web server logs or tweets from Twitter.
- Data storage: Unlike the data in a traditional database, big data can’t be stored on a single hard drive. It takes an array of distributed commodity storage devices, which must be securely managed to prevent data corruption and loss. Solutions such as HDFS are designed to tackle this problem: data sets are distributed across many nodes in a cluster and replicated to avoid a single point of failure.
- Data processing: Just as storage requires many devices working in parallel, the task of processing and transforming big data requires distributed resources as well. MapReduce is a widely used programming model for dividing an analysis task into smaller subtasks and coordinating them among multiple computing resources.
- Data visualization: For the processed data to be useful, it must be presented in a meaningful way. Data visualization is about presenting your data to the right people, at the right time, so that they can gain insights most effectively. Many solutions exist, such as Tableau, QlikView, Apache Zeppelin, and Prometheus; each is designed for a certain type of visualization task.
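To make the data processing step above more concrete, here is a minimal, single-machine sketch of the MapReduce idea applied to a word count. It is not Hadoop’s API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a real cluster would run the map and reduce calls in parallel on different nodes rather than in one Python process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over the input splits."""
    # Each document plays the role of a split handled by one mapper.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    # Each key could be handed to a different reducer on a real cluster.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["big data is big", "data is everywhere"]
    print(word_count(splits))
```

The point of the split is that each map call depends only on its own input and each reduce call depends only on one key, so both phases can be spread across many machines without the subtasks needing to talk to each other.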
The Future of Big Data
Organizations large and small eventually will be faced with the task of taming the big-data beast. The tools and techniques for doing so continue to evolve and become faster and easier to use. Given the benefits afforded by big-data analytics, now is the time to survey your data landscape, infrastructure, and current systems and adopt a big-data strategy. Your future may depend on it.