Big Data is defined in terms of transformative economics. A Big Data system has four properties:
- It uses local storage to be fast yet inexpensive
- It uses clusters of commodity hardware to be inexpensive
- It uses free software to be inexpensive
- It is open source to avoid expensive vendor lock-in
Cheap storage means logging enormous volumes of data to many disks is easy; processing that data is harder. Distributed systems that have the above four properties are disruptive because they are approximately 100 times cheaper than traditional systems for processing large volumes of data, and because they deliver high I/O performance per dollar.
Apache Hadoop is one such system. Hadoop ties together a cluster of commodity machines with local storage using free and open source software to store and process vast amounts of data at a fraction of the cost of any other system.
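As a concrete illustration of the processing side, here is a minimal word-count sketch in the Hadoop Streaming style, where the mapper and reducer are ordinary Python scripts that read stdin and write tab-separated key/value pairs to stdout. The file names and the word-count task are illustrative assumptions, not anything prescribed by Hadoop itself.

```python
#!/usr/bin/env python3
# mapper.py -- a hypothetical Hadoop Streaming mapper: emits one
# "word<TAB>1" pair per token read from stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- a hypothetical Hadoop Streaming reducer. Hadoop
# sorts mapper output by key before the reduce phase, so all
# counts for a given word arrive on consecutive lines.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)

if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

Because the cluster runs the same pair of scripts against every block of every file in parallel, on the machines that already hold the data locally, throughput scales with the number of commodity nodes rather than with the price of any single machine.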
| SAN Storage | NAS Filers | Local Storage |
|-------------|------------|---------------|
| $2-10/GB    | $1-5/GB    | $0.05/GB      |
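Back-of-the-envelope arithmetic on the table's midpoints shows where the "approximately 100 times cheaper" figure comes from; the 100 TB volume below is an arbitrary illustration:

```python
# Storage cost comparison at the midpoints of the table above.
# The 100 TB data volume is an illustrative assumption.
volume_gb = 100 * 1000                  # 100 TB expressed in GB

san   = volume_gb * 6.00                # midpoint of $2-10/GB
nas   = volume_gb * 3.00                # midpoint of $1-5/GB
local = volume_gb * 0.05                # local disks at $0.05/GB

print(f"SAN:   ${san:>10,.0f}")         # SAN:   $   600,000
print(f"NAS:   ${nas:>10,.0f}")         # NAS:   $   300,000
print(f"Local: ${local:>10,.0f}")       # Local: $     5,000
```

At those midpoints, local storage is 60x cheaper than NAS filers and 120x cheaper than SAN, which is the cost differential the rest of this post is built on.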
It is out of this cost differential that our opportunity arises: to log every shred of data we can in the cheapest place possible. To provide access to this data across the organization. To mine our data for value. This is Big Data.