Every online purchase you make adds to the growing stream of data.
© Monkey Business/Thinkstock
In essence, big data is just what it sounds like: vast amounts of data. Since the rise of the Internet, we've been generating data at an unprecedented rate. By one estimate, only 5 exabytes (5 billion gigabytes) of data had been created up until 2003. From 2003 to 2012, that number surged to approximately 2.7 zettabytes (2,700 exabytes, or 2.7 trillion gigabytes) [sources: Intel, Lund]. Berkeley researchers now claim we're producing about 5 quintillion bytes of data every two days (5 exabytes in decimal units, or roughly 4.3 when counted in binary units) [source: Romanov].
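To see how these figures relate to one another, here is a quick back-of-the-envelope check in Python. It's only a sketch: the variable names are ours, and it assumes decimal (SI) units, in which an exabyte is 10^18 bytes. The last calculation switches to the binary definition (2^60 bytes), which appears to be where the "roughly 4.3" figure comes from.

```python
# Decimal (SI) byte units. This is an assumption: the cited sources
# don't say which convention they use.
GB = 10**9
EB = 10**18
ZB = 10**21

pre_2003_bytes = 5 * EB        # "5 exabytes" created up until 2003
by_2012_bytes = 27 * ZB // 10  # "2.7 zettabytes", kept as an exact integer
two_day_bytes = 5 * 10**18     # "5 quintillion bytes" every two days

pre_2003_gb = pre_2003_bytes // GB          # exabytes expressed in gigabytes
by_2012_eb = by_2012_bytes // EB            # zettabytes expressed in exabytes
by_2012_gb = by_2012_bytes // GB            # zettabytes expressed in gigabytes
growth = by_2012_bytes / pre_2003_bytes     # how many times larger the 2012 figure is
two_day_eib = two_day_bytes / 2**60         # same byte count in binary exabytes

print(f"Up to 2003: {pre_2003_gb:,} GB")
print(f"By 2012:    {by_2012_eb:,} EB, or {by_2012_gb:,} GB")
print(f"Growth:     {growth:.0f}x")
print(f"Two days:   {two_day_eib:.1f} binary exabytes")
```

Under these assumptions, the 2012 figure works out to 540 times the pre-2003 total.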
The phrase 'big data' typically refers to enormous, fast-growing, diverse, and often unstructured digital datasets that are difficult to manage with conventional databases. This can encompass all types of digital data circulating on the Internet, the proprietary information of businesses we've dealt with, official government records, and much more. The term also implies that the data is being analyzed for some purpose.
We've created much of this data ourselves through online purchases and social media activity, but that's just the beginning. Big data can consist of digital documents, photos, videos, audio files, tweets, social media posts, emails, text messages, phone logs, search queries, RFID tag and barcode scans, and financial transaction data, among other things. Every action you take online leaves behind a digital footprint, which can be mined for valuable insights by others.
The variety and number of devices generating data have been expanding rapidly. In addition to personal computers and retail point-of-sale systems, we now have smartphones connected to the Internet, WiFi-enabled scales that post our weight on social media, fitness trackers that monitor and share health data, cameras that upload images and videos automatically, and GPS devices that pinpoint our location on the globe. Then there are the sensors in weather stations, traffic monitors, security cameras, and even vehicles and planes, all constantly generating data. This massive network of interconnected devices has led to the concept of 'the Internet of Things.'
While there are various interpretations of big data, it broadly refers to any data that could potentially be valuable and amenable to computer analysis. These enormous and often unwieldy datasets necessitate new techniques for collection, storage, processing, and analysis.
How Big Data is Analyzed and Used
Data centers like this one in San Jose, California, are handling vast quantities of data, striving to uncover patterns and connections.
© Bob Sacha/Corbis
For big data to be useful, it must be gathered, refined, connected, and analyzed. Organizations must sift through immense amounts of available data to extract what matters most to them. Thankfully, advancements in hardware and software are making it more affordable and efficient to process, store, and analyze large datasets, eliminating the need for prohibitively expensive supercomputers. Some software is even becoming more user-friendly, allowing those without programming or data science expertise to manage the data (though having knowledgeable professionals is always beneficial).
Companies are increasingly turning to cloud computing services, allowing them to avoid purchasing their own equipment for data processing. Data centers, also known as server farms, can distribute data processing across multiple servers, and the number of servers can be adjusted quickly depending on demand. This scalable distributed computing is made possible through technologies like Apache Hadoop, MapReduce, and Massively Parallel Processing (MPP). NoSQL databases have been introduced as more flexible, scalable alternatives to traditional SQL-based systems.
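The MapReduce idea is easier to picture with a toy example. The sketch below runs in a single Python process and is not Hadoop's actual API; the function names and the sample purchase records are made up for illustration. It counts how often each product appears by splitting the work into a map step (emit a count of 1 per item), a shuffle step (group the counts by product), and a reduce step (add them up). In a real cluster, the map and reduce steps would run in parallel on many servers.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (product, 1) pair for every product in one purchase record."""
    for product in record.split(","):
        yield product.strip().lower(), 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse the grouped values for one key into a single total."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence over a list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical purchase records, one comma-separated basket per line.
purchases = ["lotion, vitamins", "vitamins, diapers", "lotion, vitamins"]
counts = map_reduce(purchases)
print(counts)
```

Because each record is mapped independently and each product is reduced independently, the same logic can be spread across as many machines as the job demands, which is what makes the approach scale.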
A major focus of big data processing and analysis is uncovering patterns and correlations that yield valuable insights. Businesses now have the ability to mine enormous datasets to understand consumer behavior, product popularity, and more efficient ways of conducting business. Big data analytics enables targeted advertising, recommending products or services to customers most likely to purchase them, or creating ads that are broadly appealing. Some companies even send real-time ads and coupons to consumers' smartphones when they are near locations where they've recently made credit card purchases.
Big data isn't just about influencing purchasing decisions. Businesses also leverage the data to enhance operational efficiency, like optimizing delivery routes or managing inventory. Government agencies use it to analyze traffic patterns, crime data, utility consumption, and other factors to inform policy decisions and public services. Intelligence agencies may use it for surveillance, with the goal of thwarting criminal and terrorist activities. News organizations can analyze trends to create stories, and of course, write more articles on the topic of big data itself.
In essence, big data enables organizations to make decisions based on nearly real-time information rather than on outdated data, as they had to in the past. However, the ability to observe our current actions and even predict our future behavior can sometimes feel unsettling.
Big Data: Friend or Foe?
Your ATM withdrawals and credit or debit card purchases contribute to the data profile that aids companies in predicting your spending habits.
© Erik Tham/Corbis
The notion of big data makes many people uneasy. It brings to mind Orwell's Big Brother. Between advertisements from companies that seem to track our every move and the recent NSA surveillance revelations, it's easy to see why so many find the vast amount of personal information being collected about us disturbing.
This data reveals a lot about you, such as your age, gender, sexual preferences, relationship status, income, health, interests, hobbies, behaviors, and a range of other personal details that you may or may not want to be public. All it takes is having the necessary tools and the intent to gather and process it. Regardless of whether the motives are good or bad, the consequences can be unforeseen.
We share more personal information with companies than we realize, particularly when we use loyalty programs or pay with credit or debit cards. By simply examining your purchases, businesses can infer a lot about you. Target gained attention when it was revealed that the company could identify pregnant customers and even estimate their due dates based on purchases like supplements and lotions. In one instance, Target sent baby product coupons to a teenage girl. Her father was upset over what he deemed inappropriate ads, until he learned of her pregnancy [sources: Datoo, Duhigg, Economist].
Governments and privacy advocates have tried to regulate how personally identifiable information (PII) is used or disclosed, granting individuals some control over what becomes publicly known. However, predictive analytics often sidesteps these laws, which mainly target specific data such as your financial, medical, or educational records. Companies can draw conclusions about you indirectly, often without your awareness, using fragments of data gathered from online sources. Some businesses use this data to assess creditworthiness through factors beyond the typical credit score, which can either benefit or harm you, depending on the interpretation. A major concern is that this personal data may lead to hidden discrimination in employment, housing, or lending. Even worse, the data may not always be accurate.
Patterns discovered in big data can sometimes be misinterpreted and lead to poor decisions. Like any tool, big data analytics is only as effective as the way it is used. Even though mathematical analysis is involved, it is not a precise science, and human judgment plays a crucial role. When handling large datasets, someone must decide what information to prioritize and what to disregard. Done well, however, big data analytics can give companies a distinct competitive edge.
Big data analytics can also be put to important uses, such as preventing fraud. Financial institutions, credit card companies, and other organizations that handle money increasingly use big data to detect suspicious activity. On an individual account, they can be alerted to red flags like unusual purchases, spending beyond the customer's usual habits, odd locations, or small test purchases followed by large ones. When patterns appear across multiple accounts, such as similar charges in the same area on different cards, they can indicate fraud.
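As a rough illustration of the single-account red flags described above, here is a minimal rule-based sketch in Python. Everything in it is hypothetical: the `Transaction` fields, the `red_flags` function, the thresholds, and the sample data are ours, and real fraud systems rely on far more data and far more sophisticated models.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Transaction:
    amount: float
    city: str


def red_flags(history, recent, spike_factor=3.0, test_amount=2.0):
    """Return (index, reason) pairs for suspicious transactions in `recent`.

    `history` is the account's past activity and must not be empty.
    `spike_factor` and `test_amount` are made-up thresholds.
    """
    usual = mean(t.amount for t in history)   # typical spend on this account
    known_cities = {t.city for t in history}  # places the card is normally used
    flags = []
    for i, tx in enumerate(recent):
        is_spike = tx.amount > spike_factor * usual
        if is_spike:
            flags.append((i, "unusually large purchase"))
        if tx.city not in known_cities:
            flags.append((i, "unfamiliar location"))
        # A tiny charge immediately followed by a big one suggests card testing.
        if i > 0 and recent[i - 1].amount <= test_amount and is_spike:
            flags.append((i, "small test purchase followed by a large one"))
    return flags


# Hypothetical account: modest purchases in one city, then a sudden change.
history = [Transaction(20.0, "Austin"), Transaction(35.0, "Austin"), Transaction(50.0, "Austin")]
recent = [Transaction(1.0, "Austin"), Transaction(400.0, "Miami")]

flags = red_flags(history, recent)
for index, reason in flags:
    print(f"transaction {index}: {reason}")
```

In this example the $1 charge raises no flag on its own, but the $400 charge that follows trips all three rules. Spotting patterns across many accounts would need a different approach, such as grouping charges by merchant or location.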
Massive datasets have the potential to support research in various fields like science, sociology, election forecasting, weather predictions, and other valuable endeavors. Social media updates and Google search trends have even been utilized to rapidly track the locations of disease outbreaks. So, it's not all doom and gloom. It will take time to address the challenges and to establish laws that can safeguard us from potential risks. In the meantime, if you're concerned, you might consider returning to cash transactions and being more cautious about the personal information you share online. However, we’re likely already too far along in this digital age to fully escape the radar.
