Basta, Big Data: It’s Time to Say Arrivederci

One of my favorite Italian words is “basta,” followed by an exclamation point. No, basta does not mean “bastard”; it means “enough,” as in “I’ve had ENOUGH of you!” We’ve definitely had enough of Big Data. As a term, Big Data has been an utter failure. It has never managed to mean anything in particular. A term that means nothing in particular means nothing at all. The term can legitimately claim two outcomes that some consider useful:

  1. It has sold a great many products and services. Those who have collected the revenues love the term.
  2. It has awakened some people to the power of data to inform decisions. The usefulness of this outcome, however, is tainted by the deceit that some data today is substantially different from data of the past. As a result, Big Data encourages organizations to waste time and money seeking an illusion.

If you’ve thought much about Big Data, you’ve noticed the confusion that plagues the term. What is Big Data? This question lacks a clear answer for the following reasons:

  1. There are almost as many definitions of Big Data as there are people with opinions.
  2. None of the many definitions that have been proposed describe anything about data and its use that is substantially different from the past.
  3. Most of the definitions are so vague or ambiguous that they cannot be used to determine, one way or the other, if a particular set of data or use of data qualifies as Big Data.

The term remains what it was when it first became popular—a marketing campaign, and as such, a source of fabricated need and endless confusion. Nevertheless, like spam, it refuses to go away. Why does this matter? Because chasing illusions is a waste of time and money that also carries a high cost of lost opportunity. It makes no sense to chase Big Data, whatever you think it is, if you continue to derive little or no value from the data that you already have.

Ill-defined terms that capture minds and hearts, as Big Data has, often exert influence in irresponsible and harmful ways. Big Data has been the basis for several audacious claims, such as, “Now that we have Big Data…”

  • “…we no longer need to be concerned with data quality”
  • “…we no longer need to understand the nature of causality”
  • “…science has become a thing of the past”
  • “…we can’t survive without it”

People who make such claims either don’t understand data and its use or they are trying to sell you something. Even more disturbing in some respects are the ways in which the seemingly innocuous term Big Data has been used to justify unethical practices, such as gleaning information from our private emails to support targeted ads—a practice that Google is only now abandoning.

Data has always been big and getting bigger. Data has always been potentially valuable for informing better evidence-based decisions. On the other hand, data has always been useless unless it can inform us about something that matters. Even potentially informative data remains useless until we manage to make sense of it. How we make sense of data involves skills and methods that have, with few exceptions, been around for a long time. Skilled data sensemakers have always made good use of data. Those who don’t understand data and its use mask their ignorance and ineffectiveness by introducing new terms every few years as a bit of clever sleight of hand.

The definitions of Big Data that I’ve encountered fall into a few categories. Big Data is…

  1. …data sets that are extremely large (i.e., an exclusive emphasis on volume)
  2. …data from various sources and of various types, some of which are relatively new (i.e., an exclusive emphasis on variety)
  3. …data that is both large in volume and derived from various sources (and is sometimes also produced and acquired at fast speeds, to complete the three Vs of volume, velocity, and variety)
  4. …data that is especially complex
  5. …data that provides insights and informs decisions
  6. …data that is processed using advanced analytical methods
  7. …any data at all that is associated with a current fad

Let’s consider the problems that are associated with the definitions in each of these categories.

Data Sets That Are Extremely Large

According to the statistical software company SAS:

Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis.

SAS.com

This definition fails in several respects, not the least of which is its limitation to business data. The fundamental problem with definitions such as this, which focus primarily on the size of data as the defining factor, is their failure to specify how large data must be to qualify as Big Data rather than just plain data. Large data sets have always existed. What threshold must be crossed to transition from data to Big Data? This definition doesn’t say.

Here’s a definition that attempts to identify the threshold:

Big Data is a phrase used to mean a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques.

Vangie Beal, Webopedia.com

Do you recognize the problem of defining the threshold in this manner? What are “traditional database and software techniques”? The following definition is slightly less vague:

Big data means data that cannot fit easily into a standard relational database.

Hal Varian, Chief Economist, Google

(Source Note: All of the definitions that I quote that are attributed to an individual, independent of a particular publication, appeared in an article written by Jennifer Dutcher of the U.C. Berkeley School of Information titled “What is Big Data?” on September 3, 2014.)

In theory, there are no limits to the amount of data that can be stored in a relational database. Databases of all types have practical limits. People have suggested technology-based volume thresholds of various types, including anything that cannot fit into an Excel spreadsheet. All of these definitions establish arbitrary limits. Some are based on arbitrary measures.

Big data is data that even when efficiently compressed still contains 5-10 times more information (measured in entropy or predictive power, per unit of time) than what you are used to right now.

Vincent Granville, Co-Founder, Data Science Central

So, if you are accustomed to 1,000-row Excel tables, a simple SQL Server database consisting of 5,000 to 10,000 rows qualifies as Big Data. Such definitions highlight the uselessness of arbitrary limits on data volume. Here’s another definition that acknowledges its arbitrary nature:

Big data is when…the standard, simple methods (maybe it’s SQL, maybe it’s k-means, maybe it’s a single server with a cron job) break down on the size of the data set, causing time, effort, creativity, and money to be spent crafting a solution to the problem that leverages the data without simply sampling or tossing out records.

John Foreman, Chief Data Scientist, MailChimp

Some definitions acknowledge the arbitrariness of the threshold without recognizing it as a definitional failure:

The term big data is really only useful if it describes a quantity of data that’s so large that traditional approaches to data analysis are doomed to failure. That can mean that you’re doing complex analytics on data that’s too large to fit into memory or it can mean that you’re dealing with a data storage system that doesn’t offer the full functionality of a standard relational database. What’s essential is that your old way of doing things doesn’t apply anymore and can’t just be scaled out.

John Myles White

What good is a definition that is based on a subjective threshold in data volume?

The following definition acknowledges that, when based on data volume, what qualifies as Big Data not only varies from organization to organization, but over time as well:

Big data is data that contains enough observations to demand unusual handling because of its sheer size, though what is unusual changes over time and varies from one discipline to another. Scientific computing is accustomed to pushing the envelope, constantly developing techniques to address relentless growth in dataset size, but many other disciplines are now just discovering the value — and hence the challenges — of working with data at the unwieldy end of the scale.

Annette Greiner, Lecturer, UC Berkeley School of Information

Not only do these definitions identify Big Data in a manner that lacks objective boundaries, they also acknowledge (perhaps inadvertently) that so-called Big Data has always been with us, for data has always been on the increase in a manner that leads to processing challenges. In other words, Big Data is just data.

There is a special breed of volume-based definitions that advocate “Collect and store everything.” Here is the most thorough definition of this sort that I’ve encountered:

The rising accessibility of platforms for the storage and analysis of large amounts of data (and the falling price per TB of doing so) has made it possible for a wide variety of organizations to store nearly all data in their purview — every log line, customer interaction, and event — unaggregated and for a significant period of time. The associated ethos of “store everything now and ask questions later” to me more than anything else characterizes how the world of computational systems looks under the lens of modern “big data” systems.

Josh Schwartz, Chief Data Scientist, Chartbeat

These definitions change the nature of the threshold from a measure of volume to an assumption: you should collect everything at the lowest level of granularity, whether useful or not, for you never know when it might be useful. Definitions of this type are a hardware vendor’s dream, but they are an organization’s nightmare, for the cost of unlimited storage extends well beyond the cost of hardware. The time and resources that are required to do this are enormous and rarely justified. Anyone who actually works with data knows that the vast majority of the data that exists in the world is noise and will always be noise. Don’t line the pockets of hardware vendor executives with gold by buying into this ludicrous assumption.

Data from Various Sources and of Various Types

Independent of a data set’s size, some definitions of Big Data emphasize its variety. Here’s one of the clearest:

What’s “big” in big data isn’t necessarily the size of the databases, it’s the big number of data sources we have, as digital sensors and behavior trackers migrate across the world. As we triangulate information in more ways, we will discover hitherto unknown patterns in nature and society — and pattern-making is the wellspring of new art, science, and commerce.

Quentin Hardy, Deputy Tech Editor, The New York Times

Definitions that emphasize variety suffer from the same problems as those that emphasize volume: where is the threshold? How many data sources are needed to qualify data as Big Data? In what sense does the addition of new data sources—something that is hardly new—change the nature of data or its use? It doesn’t. New data sources are sometimes useful and sometimes not. Collecting and storing every possible source of data is no more productive than collecting and storing every instance of data.

Data That Exhibits High Volume, Velocity, and Variety

I’ll use Gartner’s definition to represent this category in honor of the fact that Doug Laney of Gartner was the first to identify the three Vs (volume, velocity, and variety) as game changers.

Big Data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.

Gartner

Combining volume and variety, plus adding velocity—the speed at which data is generated and acquired—produces definitions that suffer from all of the problems that I’ve already identified. Increases in volume, velocity, and variety have been with us always. They have not fundamentally changed the nature of data or its use.

Data That Is Especially Complex

Some definitions focus on the complexity of data.

While the use of the term is quite nebulous and is often co-opted for other purposes, I’ve understood “big data” to be about analysis for data that’s really messy or where you don’t know the right questions or queries to make — analysis that can help you find patterns, anomalies, or new structures amidst otherwise chaotic or complex data points.

Philip Ashlock, Chief Architect, Data.gov

You can probably anticipate what I’ll say about definitions of this sort: once again they lack a clear threshold and identify a quality that has always been true of data. How complex is complex enough, and at what point in history has data not exhibited complexity?

Data That Provides Insights and Informs Decisions

As you no doubt already anticipate, definitions in this category exhibit the same problems as those in the categories that we’ve already considered. Here’s an example:

Big Data has the potential to help companies improve operations and make faster, more intelligent decisions. This data…can help a company to gain useful insight to increase revenues, get or retain customers, and improve operations.

Vangie Beal, Webopedia.com

Data That Is Processed Using Advanced Analytical Methods

According to definitions in this category, it is not anything about the data itself that determines Big Data, but rather the methods that are used to make sense of it.

The term “big data” often refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data.

Wikipedia

Some of these definitions allow quite a bit of leeway regarding the nature of the advanced methods, while others are more specific, such as the following:

Big data is an umbrella term that means…the possibility of doing extraordinary things using modern machine learning techniques on digital data. Whether it is predicting illness, the weather, the spread of infectious diseases, or what you will buy next, it offers a world of possibilities for improving people’s lives.

Shashi Upadhyay, CEO and Founder, Lattice Engines

What analytical methods qualify as Big Data? The answer usually depends on the methods that the person who is quoted uses or sells. Can you guess what kind of software Lattice Engines sells?

Methods that were considered advanced in their day have always been with us. Even most of the methods that are identified as advanced today when defining Big Data have been around for quite some time. For example, even though computers were not always powerful enough to run machine-learning algorithms on large data sets, these algorithms are fundamentally based on traditional statistical methods.

A few of the definitions in this category have emphasized advanced skills rather than technologies, such as the following:

As computational efficiency continues to increase, “big data” will be less about the actual size of a particular dataset and more about the specific expertise needed to process it. With that in mind, “big data” will ultimately describe any dataset large enough to necessitate high-level programming skill and statistically defensible methodologies in order to transform the data asset into something of value.

Reid Bryant, Data Scientist, Brooks Bell

Once again, however, there is nothing new about these skills.

Any Data at All that Is Associated With a Current Fad

Some definitions of Big Data apply the term to anything data related that is trending. Here’s an example:

I see big data as storytelling — whether it is through information graphics or other visual aids that explain it in a way that allows others to understand across sectors.

Mike Cavaretta, Data Scientist and Manager, Ford Motor Company

This tendency has been directly acknowledged by Ryan Swanstrom in his Data Science 101 blog: “Now big data has become a buzzword to mean anything related to data analytics or visualization.” This is what happens with fuzzy definitions. They can be easily manipulated to mean anything you wish. As such, they are meaningless and useless.

Now What?

The definitional messiness and thus uselessness of the term Big Data is far from unique. Many information technology terms exhibit these dysfunctional traits. I’ve worked in the field that goes by the name “business intelligence” for many years, but this industry has never adhered to or lived up to the definition provided by Howard Dresner, who coined the term: “Concepts and methods to improve business decision making by using fact-based support systems.” Instead, the term has primarily functioned as a name for technologies and processes that are used to collect, store, and produce automated reports of data. Rarely has there been an emphasis on “concepts and methods to improve business decision making,” which features humans rather than technologies. This failure of emphasis has resulted in the failure of most business intelligence efforts, which have produced relatively little intelligence.

All of the popular terms that have emerged during my career to describe the work that I and many others do with data, including decision support, data warehousing, analytics, data science, and of course, Big Data, have been plagued by definitional dysfunction, leading to confusion and bad practices. I prefer the term “data sensemaking” for the concepts, methods, and practices that we engage in to understand data. And to promote the value of data as the raw material from which understanding is woven, healthcare has suggested one of the most useful terms: “evidence-based medicine.” In its generic form, “evidence-based decision making” is simple, straightforward, and clear. If we used these terms to describe the work and its importance, we would stop wasting time chasing illusions and would focus on what’s fundamentally needed: data sensemaking skills, augmented by good technologies, to support evidence-based decision making. Perhaps then, we would make more progress.

Let’s say “goodbye” to the term Big Data. It doesn’t mean anything in particular and all of the many things that people have used it to mean merely refer to data. Do we really need a new term to promote the importance of evidence-based decision making? The only people who are prepared to glean real value from data don’t need a new term or a marketing campaign. Rallying those who don’t understand data or its use will only lead to good outcomes if we begin by helping them understand. Meaningless terms lead in the opposite direction.

Take care,


52 Comments on “Basta, Big Data: It’s Time to Say Arrivederci”


By Alberto Cairo. June 27th, 2017 at 4:19 pm

It’s fanciful to be a nationalist in America nowadays (unfortunately), so I’m going to just intervene to say that “Basta!” is also a word in Spanish, with exactly the same meaning.

By Stephen Few. June 27th, 2017 at 4:27 pm

Good to know, Alberto. It was my Italian friends, however, who entertained me with the frequent and demonstrative use of “basta.” For that reason, I will always think of it as Italian. (Perhaps there was something about my behavior that prompted them to snarl “basta” so often.)

By rjss. June 27th, 2017 at 7:57 pm

I agree to some extent, but maybe part of the problem is that the specifics of the underlying definition are evolving, so it is hard to give a definite answer. Many of the definitions I have found are a variant of the one you posted above: “Big Data is a phrase used to mean a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques. — Webopedia”. The problem appears when we start to talk about specifics, because the methods and techniques are evolving; what is correct today is not necessarily correct tomorrow.

However, I don’t see the term Big Data as a total failure. For example, if you say Big Data today, people will think about some particular technologies/features (millions or billions of records, Hadoop, parallel algorithms, etc.) that, for better or worse, do not have a better term to group them.

By Stephen Few. June 27th, 2017 at 8:26 pm

rjss,

The definition of Big Data isn’t evolving. It’s a chaos of nonsense and always has been. Evolution suggests progress. The term Big Data began as nonsense and has become even more nonsensical over time.

Why not accept the reality of data as data–nothing more, nothing less. It grows and it changes, but it remains a collection of facts. Facts are of no value until we identify those that are relevant, make sense of them, and put them to use for good. Facts don’t need new terms, they need understanding. The term Big Data has bred nothing but confusion and failed promises.

By rjss. June 27th, 2017 at 9:59 pm

I was not implying that the definition is evolving, but that the specific details change over time and are therefore technically not part of the definition. I think the term has not changed much since the 1990s. I agree with what you say about marketing and nonsense, but that does not mean the term is a failure. I think if you ignore the marketing info, you find something technically sound in the term.

By milang. June 27th, 2017 at 10:32 pm

There was a time when a computer was called micro computer…

By Stephen Few. June 28th, 2017 at 9:01 am

rjss,

Despite the fact that someone used the term “big data” back in the ’90s, it only came into common use in the 2000s. The term has never had a commonly-accepted definition. It has always meant whatever people want it to mean, which is typical of marketing terms. Contrary to your statement above, what people think Big Data means has changed dramatically over the years. This is what happens when a term has not been defined. Your preferred definition–the one that appears in Webopedia–is neither sound nor useful. What are the “traditional database and software techniques” that cannot process large data sets? Even if such database and software techniques were identified, what would be the point of giving data sets that could not be handled by them a special name that adorns them as game changing? The sizes of data sets have always been growing beyond the capacity of current technologies, which is why those technologies constantly adapt to handle them. This has been the case throughout my career. For what reason did the need for a new term suddenly develop at some point in the last 10 to 15 years? I have the answer. It developed to sell products and services.

By Stephen Few. June 28th, 2017 at 9:41 am

milang,

I’m not sure what you’re trying to say. The term “microcomputer” had a clear definition that distinguished it in a meaningful way from other computers. A microcomputer, which we today more commonly refer to as a personal computer, was a computer that contained a microprocessor. Unlike “microcomputer,” the term Big Data has never had a clear or meaningful definition, nor has it ever described anything that is substantially different from data in general.

By Nate. June 28th, 2017 at 11:19 am

I like this definition:

Big data is…
“Whatever the labeler wants it to be; data that is not small; a faddish buzz word; a recognition that it’s difficult to store and access massive databases; a false (but with occasional, and temporary, bright truths) hope that if characteristics down to the microsecond are known and stored we can predict everything about that most unpredictable species, human beings.”

Stolen from here… http://wmbriggs.com/post/6465/

By rjss. June 28th, 2017 at 1:26 pm

Hi Stephen; the New York Times ran an article a few years ago about this, which I finally found (https://bits.blogs.nytimes.com/2013/02/01/the-origins-of-big-data-an-etymological-detective-story/). In the end, some claim the honor goes to John Mashey (https://www.slideshare.net/amhey/big-data-yesterday-today-and-tomorrow-by-john-mashey-techviser), although, as with most things, it was probably just the way certain issues related to large datasets were described in different circles. His definition is pretty much in sync with what you see today in many references. That is, broad and a movable target. Sure! As many people point out in their talks, the term big data is not so much about the actual data as about the methods and techniques.

I guess having these discussions is natural, especially when things are somewhat new. I will still say that the term is not a total failure. The same thing can be said about data visualization, high performance computing, fuel efficient car, etc. I agree that the use of Big Data as a buzzword sometimes hurts people who may not have a grasp of all the details, and that it has even been used in unscrupulous ways, but it still helps people group certain ideas/concepts.

By Stephen Few. June 28th, 2017 at 1:35 pm

rjss,

You keep suggesting that “Big Data” describes something new, but it doesn’t. That’s part of the problem with the term. It didn’t identify anything new or different. If it had, and did so in a clearly defined manner, it might have been a useful term. Given your opinion that “the term is not a total failure,” I’m curious to know in what sense you believe it has been of value.

By rjss. June 28th, 2017 at 6:40 pm

Stephen; I agree that the concept underlying “Big Data” is nothing new. I am just saying that the term exists now, and the definitions in presentations, articles, or online dictionaries that I have seen use more or less the same language. There have been people who have tried to define what constitutes a “big dataset” (in terms of the big data concept), but that is just a measure in time. Again, big data is about the methods or procedures used to analyze, store, visualize, etc. these large datasets and not the actual datasets. In the same way, if we try to define the software/hardware used in big data, it will change over time. Therefore I think it is a general concept, but it seems you want a more specific definition. In that case I agree; however, it is still useful at this point in history. Just a few examples where I believe it is useful: 1) if you are looking for cloud services that help you store/analyze “big data,” many companies will assume you are looking for services such as Hadoop, Spark, and low-latency networks, and they package their services around categories such as Big Data; 2) communication materials (abstracts, titles), since it helps summarize or signal that the techniques being presented belong to a somewhat different spectrum; 3) similarly, it is a good tag/keyword for questions and information that circle around the methods and techniques related to “big data”.

I agree with your post in some respects. Companies use it (to their advantage) and sometimes misconstrue what you may accomplish. However, I think that can be the same for many other examples. Many marketing phrases or slogans are just that, and people should also be a little more informed about what they are getting.

PS: As I was reading this post, I got interested in how the Wikipedia article for Big Data changed over time; looking at the evolution of the article was a good complement.

By jlbriggs. June 29th, 2017 at 6:27 am

@rjss – “Companies use it (to their advantage) and sometimes misconstrue what you may accomplish. However, I think that can be the same for many other examples. Many marketing phrases or slogan are just that and people should also be a little more informed at what they are getting.”

Well…yeah. That’s kind of the point. “Big Data” is more of a marketing phrase than anything else. That it’s the same with other marketing phrases is definitely not an argument for its usefulness as a real term.

In fact, your entire post seems like a confirmation of exactly what Stephen is saying. You are arguing for the usefulness of the phrase Big Data, while simultaneously demonstrating its lack of usefulness.

I don’t mean to attack. I am confused by your arguments, frankly. If I could sum up your post, I would say

“I know the term isn’t very useful, and yet, I think it’s useful anyway”

I am at a loss as to where this line of thinking gets us.

By Dale Lehman. June 29th, 2017 at 8:27 am

Definitions have always led to problems such as this. I was trained as an economist, and for years (when I had to teach introductory courses) I defined “economics” for my students as “what economists do.” That seemed to match the constantly changing scope of things economists applied their set of tools to study. That definition seemed better than the alternatives, but never quite satisfactory.

Stephen, I share your sentiments and rather like “data sensemaking.” Indeed, it fits what I do and what I hope to continue teaching people to do better. However, I fear that your post only works in opposition to the many efforts to sell “big data” products, hardware, software, training, degree programs, etc. In opposition to those sales pitches, it is useful to counter them.

However, I fear that the same type of criticisms can be leveled against attempts to declare that nothing has changed, that data sensemaking has the same meaning that it always has had. These claims of stasis, constancy, seem to me to be equally devoid of meaning. Surely there is something qualitatively different about the data analysis environment today than in the past. As much as I like to be the curmudgeon, I don’t believe that hundreds of graduate programs have sprung up solely because of marketing hype. Underneath the hype lies some real sense that the world is changing in some important ways (not to imply any sense that the world is “better,” only that it is “different”).

So, while I can agree with you in criticism of how “big data” is being used, I think that “nothing has changed” is equally assailable. Indeed, many statisticians appear to dismiss machine learning methods as uninteresting or unimportant. I find the whole subject of machine intelligence (self-learning algorithms, etc.) deeply interesting, disturbing, provocative, and somewhat challenging of my own training and abilities. Is it possible to discard “big data” and at the same time recognize that things are somehow different now? Or are you really intending to say that nothing is different (except in the obvious dimensions of size of data sets, etc.)?

By Stephen Few. June 29th, 2017 at 8:30 am

rjss,

I invited you to share the ways in which the term “Big Data” has been of value hoping that you would then see for yourself that it is fruitless to discuss the term’s usefulness given its lack of definition. I’ve identified the only possible benefits that can be ascribed to the term given its ill-defined nature: 1) it has sold products and services, which is only of benefit to those who earn the resulting revenues, and 2) it has encouraged some people to pay more attention to data, which cannot be considered a benefit without also taking into account the great confusion that the term has created. Rather than recognizing the fruitlessness of this discussion, however, you ascribed benefits to the term based on the assumption that it is defined in a particular way: “Big data is about methods or procedures used to analyze, store, visualize, etc. these large datasets and not the actual dataset.” This is not the commonly accepted definition of Big Data. If it were, we could discuss the usefulness of the term based on this particular definition. As it is, however, none of your examples of benefits are valid, for they are based on an invalid assumption. In other words, your argument is illogical.

By Stephen Few. June 29th, 2017 at 8:36 am

Dale,

I am not arguing that nothing has changed. Rather, I am arguing that nothing substantial about data and its use has changed that justifies the term “Big Data” as meaningful and useful.

By Dale Lehman. June 29th, 2017 at 9:19 am

I would be most interested in you expanding on what you think has changed. I think that is more interesting than continuing attacks on the term “big data” (and I am not pointing this at you any more than at myself). But I think that if we can clearly articulate what is different, that may be a more effective way to combat the misuse of the big data terminology (and its various permutations). In other words, people sense that things are different, but in the absence of being able to articulate how, they succumb to the hype.

By Stephen Few. June 29th, 2017 at 10:27 am

Dale,

The changes that I referred to are nothing but gradual improvements to data sensemaking technologies and methods. None of these changes justify Big Data as a meaningful or useful term. The term Big Data has been used to suggest that something substantial has changed in recent years regarding data and its use, which is not the case. The sense that inclines people to believe that something substantial has changed is entirely fabricated. As I’m sure you know, popular beliefs are not necessarily based on anything real. A significant percentage of people still believe that Saddam Hussein was responsible for the events of 9/11, which has no basis in reality. Big Data is nothing more than a marketing campaign.

By maverick. June 29th, 2017 at 11:33 am

Great article… With all the hype and hoopla surrounding ‘Data Analysis’ in the last few years, someone has finally paused to assess things. ‘Data sensemaking skills, augmented by good technologies, to support evidence-based decision making’ simply makes sense and is enough for this world to bring about the changes it needs!

By rjss. June 29th, 2017 at 1:21 pm

Stephen and jlbriggs, you may want to look a little closer at the information that I mentioned (links). I am somewhat confused now by your post, Stephen. You mentioned a few examples and implied that Big Data is a bad term because (and I am summarizing here) it lacks objectivity. What I reinforced is that the original uses of the word never implied the kind of objectivity you are looking for. It seems it started being used in conversations about the problems “with big datasets,” or big data, as John Mashey put it.

I am not sure I can argue this much more. Maybe some weird analogies will help. If you are buying a car and you are comparing “Model A” vs. “Model A Limited Edition,” most people will assume that the limited edition has some extras, so the name helps communicate a concept. I hope nobody decides whether to buy the car because of a few letters, but they definitely help communicate some information. In the same way, “biG dAtA” has become the de facto standard to group certain concepts. I am not sure what the problem with that is.

By Stephen Few. June 29th, 2017 at 1:46 pm

rjss,

Big data has not become “a de facto standard to group certain concepts.” It hasn’t become anything other than a term that means whatever people wish it to mean. As I’ve already mentioned, the earliest uses of the words “big data” have no direct relationship to the term “Big Data” that eventually became popular and remains popular to this day. I have not criticized the term for “lacking objectivity.” I have criticized it for lacking meaning and for deceitfully suggesting that something substantial about data or its use has changed in recent years.

I concur that you cannot and should not “argue this…further.”

By rjss. June 29th, 2017 at 3:57 pm

You mentioned again that the term in the 1990s is not directly related to the popular term used nowadays. I am not familiar with this topic. It is definitely a good topic: the etymology of big data. I think the earliest entry in Wikipedia is from 2010. If you have references about the change in meaning, you may want to contribute to the big data Wikipedia article. I think the meaning differs somewhat depending on one’s background, and it is important for people to know these connotations. I guess we were arguing from two different perspectives, so we are both right and wrong at the same time.

By Martijn van der Munnik. June 30th, 2017 at 3:23 am

Big Data to me is a conceptual (often marketing-driven) description of the holistic approach to data and information management as a whole. It is a way we describe all the processes and tools involved in dealing with data, generating insights into data, and formulating information, regardless of the format, volume, or velocity with which the raw data is collected.

As the article mentions, size, velocity, and variety are all arbitrary and subjective measures. If you are used to dealing with 1,000 lines of Excel, a 1GB database is your ‘big data’. If you are Google, anything under a petabyte might be considered normal data. As with everything in life, all is relative and all is subjective; trying to describe anything in ‘absolutes’ is an exercise in futility. Great for marketers, but useless for the most part…

By Gimo. June 30th, 2017 at 5:02 am

I am confused. The author seems to be saying that there is no such thing as Big Data. But if that’s the case, how do you distinguish the very large data sets that run companies such as Google, Facebook, and Twitter, or that underlie the Hadron Collider, astronomical research, genomic research, etc.? These data sets are by their nature VERY LARGE, orders of magnitude bigger than any data you can import into Excel or any relational database (petabytes, zettabytes, etc.).

How do you distinguish large distributed data processing frameworks such as Spark, Hadoop, streaming data analytics etc.?

Any attempt to group these factors and technologies with the same data that was the standard in the ’80s and ’90s will give you a very different concept. Try explaining this to someone who has been using SQL Server or Teradata, and they will say, “I have always had big data; all of my data is big; it’s 100GB+,” and so on, but this is false. If you are a DBA or an analyst using normal relational data sets, you are doing something very different from Big Data; you are doing normal data analysis.

Only when you integrate genomic data, store images, integrate social media feeds, and so on does your data become truly “Big.” This “Bigness” is not just size, though size, orders of magnitude bigger than whatever your relational data ecosystem is like, can be part of it; it is also the variety of the data. This variety comes in all forms, which cannot be processed the same way as in a normal relational database.

Yes, it is hard to define something that by its nature varies based on the enterprise or the industry you are talking about, but to say Big Data is no different from just “Data” loses a very important distinction, which developed from new technologies such as Hadoop and Spark that process very large amounts of data in a different fashion than had been the practice prior to these technologies.

To ignore Big Data is to ignore the progress we have made in the past 10-15 years in data processing, data analysis, machine learning, research, etc.

By Stephen Few. June 30th, 2017 at 9:01 am

Gimo,

I am not denying the existence of large data sets. I’ve been dealing with large data sets my entire career, which has spanned over 30 years. The fact that what we consider large has continued to grow over time has not fundamentally changed the nature of data or its use. The fact that technologies have continued to improve over time has not fundamentally changed the nature of data or its use.

Your definition of Big Data is not the commonly accepted definition of Big Data. The term has no commonly accepted definition. As such, Big Data means different things to different people. For this reason, it is meaningless.

We have also had large distributed data processing frameworks for many years. The products that you mentioned are just newer incarnations of distributed frameworks. They have not changed the fundamental nature of data or its use.

Data analysis, which I’ve been doing for a very long time, has not fundamentally changed in recent years. Data analysis is always evolving, but it has not fundamentally changed for a very long time.

To dismiss the term Big Data as meaningless and useless is not “to ignore the progress we have made in the past 10-15 years in data processing, data analysis, machine learning, research, etc.” I fully acknowledge and appreciate the progress that we’ve made. The term Big Data does not describe this progress. Progress over the last 10-15 years is much like the progress that we made during the 15 years that preceded it. Progress in the field of data sensemaking plods along continuously. We cannot draw a line in the sand at some point 10-15 years ago and say that it marks something that is so fundamentally different that it deserves a new name. Even if we could, the term Big Data wouldn’t suffice, because it is ill defined. The term is a deceit. It suggests something fundamental, substantial, and significant that hasn’t happened. As such, it serves only as a marketing campaign for product and service providers, and as a term that some data sensemakers like to use to make their work seem mystical and more important than that of their predecessors. Those who are confident in their data sensemaking skills feel no need for “big” names to promote their worth. They are happy to do this work, and to do it well, because it is important and needed.

By rjss. June 30th, 2017 at 10:00 am

Stephen, I am confused by your post. If consensus is what you are looking for, then why do we use “data visualization”? Many of the examples you provided are part of the definition of big data; therefore, there is some consensus. I thought your post was more about the pitfalls of the term, which is why you suggest we “ban” it. Some of us, including myself, have tried to clarify certain things because the term is useful, but we need to keep educating people about what big data is and try to remove its bad connotations.

If we held the term “data visualization” to this same standard, then I feel we should get rid of it too. Shouldn’t we be educating people about the term “data visualization” instead? Maybe you have had the opportunity to work with many companies that had distributed frameworks before the 1990s and can provide some background information, but I am not sure you truly mean that nothing has fundamentally changed in the last 15 years. This is definitely not my area of expertise, but my simple understanding is that Google (I imagine other companies were working independently on the same issues) was one of the originators when they published information about their GFS. That sparked Hadoop and many of the technologies we see today that are associated with “Big Data.” I have no proof of this, but I think the phenomenon and the term also cascaded for the following reasons, among others:
1. Open source technologies (Hadoop, Spark): these were not available before the 2000s
2. Cloud resources: no company in its right mind (unless that was its business, e.g., Google, Amazon, etc.) would have invested the kind of money that was necessary to store/manage big data
3. Smartphones/Internet of Things: the velocity, or rate, of data collection/generation, which you mentioned

I hope the conversation steers into something more instructive. What should we do when a “term” is used incorrectly in some industries and circles? It seems that in your circle that is the case. I don’t see that from my perspective, and when I have seen it, I have seen people correct it. Aside from educating people, is there something else we should do? Maybe a post about misconceptions of “big data” would be more educational.

By Stephen Few. June 30th, 2017 at 10:10 am

rjss,

As you read in the NY Times article about the origins of the term “Big Data,” the etymology isn’t clear. Whether John Mashey’s use of “big data” in the 1990s has a connection to the term that became popular around 10 years ago isn’t obvious. At no point has the term, once it came into popular use, had anything but a loose and malleable definition that has freely morphed to suit the perspective and interests of the individual who uses it. Everyone who defends the term as useful is only defending their own definition of the term. The very fact that their definitions widely and substantially vary demonstrates the meaninglessness and uselessness of the term.

I suspect that the term appeared around 10 to 15 years ago in some vendor’s marketing materials and that other vendors picked it up and began using it as well. It is common practice among vendors to pick up on the claims of others in an effort to compete with them for mindshare. The fact that the term has always lacked definitional integrity strongly suggests that it emerged as vendor marketing, which benefits from a lack of definition.

By Stephen Few. June 30th, 2017 at 10:29 am

rjss,

Many of our terms suffer from a lack of clarity, including “data visualization,” but most are not rendered entirely meaningless by this. Everyone agrees that data visualization involves the visual representation of data. We cannot say that everyone agrees about anything regarding the meaning of Big Data.

Regarding the Cloud, I worked with a version of the model 30 years ago. Prior to the Internet, many companies used remote data processing service bureaus to run their systems. As such, they used that service’s software rather than owning or licensing the software themselves. At the time I worked for a large bank whose computer applications were run on mainframe computers by an outside service bureau in Fresno, California, which the bank connected to via T1 phone lines. Other than the fact that the platform is now the Internet, the arrangement was the same.

Regarding Open Source software, are you suggesting that sharing software freely and allowing multiple people to contribute to it is a fundamental change in data and how we use it? Smartphones are just small computers. They have not fundamentally changed the nature of data and how we use it for decision support purposes. The so-called Internet of Things is nothing more than the proliferation of data from lots of devices that are connected to the Internet. None of these technologies have fundamentally changed the nature of data and how we use it.

It is too late to redeem the term Big Data. If you want to describe something that is fundamentally new and worthwhile about data and its use, you’ll need to begin with a new term and define it clearly.

By rjss. June 30th, 2017 at 11:30 am

Stephen, I am not sure I follow you. You have switched back and forth. You said that the term from the 1990s was disconnected from the way we use it nowadays. I sent some information that shows how in certain circles it has not changed, so it is pretty well defined. In your circle, the definition seems to change, and apparently it has been mutilated (although most examples you provided are part of the definition). I will just refer you to your own post about data visualization http://www.perceptualedge.com/blog/?p=2636 to show you that not everyone agrees on the definition of data visualization. I also want to remind you that the term data visualization has been used far longer than Big Data.

You say that the Internet or “cloud” is the same thing as the system we had in the ’80s; I am not sure I follow you at all. Very confused! Just one comment: I was using “Cloud” in the sense that many companies can now just “rent” the infrastructure. That concept is relatively new. It is not the same thing as paying many millions of dollars and hiring several server administrators to build the infrastructure from the ground up. “Cloud” was definitely a game changer in that respect.

Yes, the availability of open source software such as Hadoop has fundamentally changed how we manage data. I am not sure why you keep mentioning that it has not changed data (of course not; I guess that is the way Big Data is seen in your circle).

That Big Data is used in marketing, and that promises are sometimes made that are simply crazy, I have no doubt. I also have no doubt that big data has a specific meaning in many circles. The term may not be needed in some years, but right now it is how companies, academia, and others differentiate between two different animals. How will starting to use a new term help anybody?

By Alex. June 30th, 2017 at 11:50 am

Stephen,

Big Data is so passé. Why, now it’s Data Lakes ( http://www.9sight.com/pdfs/Data_Lakes_and_Why_Business_Might_Want_One.pdf ). I was wondering what the grandfather of Data Warehousing, Barry Devlin, was doing these days.

Btw, I’ve had a couple of definitions of Big Data:

1. Big Data, n.: the belief that any sufficiently large pile of sh#t contains a pony with a probability approaching 1.

2. Big Data: A gullible customer (who read about Big Data in a magazine on a plane) walks into a bar with IBM, Oracle, and Microsoft …

Btw, thanks for the article.

By Dale Lehman. June 30th, 2017 at 1:12 pm

Since a number of people seem confused, and since I still think your post is less useful than it might be, I’ll try again. I have no quarrel about the term “big data” – let’s lose it once and for all. As for the comments about cloud services, open source, etc., I also think these are somewhat off the point. If the only thing that is different now is that we use different hardware and software, then indeed nothing much has changed. So, I agree with Stephen in most respects (perhaps all) regarding his critique of the term “big data” and the way it is used (abused).

What I think is missing – and what I believe I disagree with – is the statement that “nothing but gradual improvements” have changed. I’d like to believe that (since I am comfortable with what I have learned in my own life), but I do think something is qualitatively different. This relates to other discussions we have had previously on this blog – and I’ve been involved in on other blogs. Many statisticians dismiss the variety of machine learning techniques as nothing much different than what statisticians have always done. This is related, as there is one group of people (whom I have great respect for) that believe that making sense out of data involves the same critical thinking as it always has – only some of the tools have changed. There is another group of people that seem to be fairly successful that do not feel the same need for statistical reasoning (they never say it is irrelevant, only that they often don’t need it). They seem to go about their work in a different way. Their use of deep learning models appears to offer replacement of much of the human critical judgement with machine algorithms. If I had to choose one field necessary for good data sensemaking in the future, it would be computer science and not statistics. Fortunately, there is no need to choose, and both are useful and necessary.

What I am getting at is that the ways that data are now being collected and analyzed are becoming so different that they appear to be something new. While I am leery of fads and bandwagons, I do not believe that hundreds of graduate programs have suddenly emerged just because they buy into the hype. I believe they are onto something.

It is that “something” that I am looking for Stephen and others to discuss. I don’t know what it is – but I am not willing to accept that it simply does not exist (I may be persuaded of that, but not yet).

By Stephen Few. June 30th, 2017 at 1:32 pm

rjss,

You have not provided any information that indicates an etymological connection between John Mashey’s use of the term big data and the usage that arose 10-15 years ago. The New York Times article that you referred to did not claim an etymological connection between them. It merely indicated that Mashey might have been the first person to use this term for something that is somewhat similar to the way some people use the term today. For the current use of the term Big Data to be connected etymologically to Mashey’s use of the term, the current usage must have been derived from Mashey’s use either directly or indirectly. There is no evidence to suggest that this occurred.

If you don’t understand the difference that I’ve explained between the term data visualization’s lack of clarity and the term Big Data’s complete lack of definition, perhaps the problem resides in the fact that you are not a native speaker of English.

Your understanding that the Cloud refers to infrastructures that many companies can rent confirms my point. Thirty years ago many companies, including the bank that I worked for, rented information technology infrastructures and applications from remote sources. Nothing is different today except that connection to the Cloud is handled by the Internet. This concept is not new. The fact that you have not encountered it in your own experience does not mean that it didn’t exist prior to the Cloud.

Regarding open source and Hadoop, I said that they have “not fundamentally changed the nature of data or its use.” If you disagree, please describe in what sense the nature of data and its use has been changed by open source or Hadoop. You are obligated to provide evidence for your position.

The term Big Data cannot be used to “differentiate between different animals.” The reason for this is simple: the term has no commonly accepted meaning. If you want to differentiate between different sets of data or different uses of data for sensemaking purposes, you’ll need to use a different term. If you don’t understand this based on what I’ve already said, there is no point to this discussion.

By Stephen Few. June 30th, 2017 at 1:46 pm

Dale,

With Big Data, there is no there there. The graduate programs that associate themselves with “Big Data” offer nothing that is fundamentally new. If you believe otherwise, please review those programs, as I have, and give me examples of the fundamentally new approaches to data sensemaking that they teach. These academic programs are merely looking for new revenue streams and leveraging a popular fad to do so. This, unfortunately, is not unusual.

In my opinion, machine learning is “nothing much different than what statisticians have always done.” Furthermore, I believe that those who claim that machine learning can operate independently from human interaction are both nuts and dangerous. Data sensemaking involves understanding. Computers have no understanding.

By Stephen Few. June 30th, 2017 at 1:50 pm

Alex,

Does Barry Devlin really claim to be the grandfather of data warehousing? If so, this is news to me.

By Dale Lehman. June 30th, 2017 at 2:13 pm

Machine learning independent from human interaction – dangerous, absolutely. Nuts? I’m not so sure. I’ve had to re-educate myself several times due to the way the data and tools are changing. I will always be more comfortable understanding the models I am building and being able to appreciate the uncertainty that is associated with any model. But I have been impressed by how well some machine learning techniques (such as random forests, neural networks, etc.) work – absent this understanding. Sure, there are problems with overfitting and spurious results, but there are also automated ways to protect against these problems. And, when you have hundreds of potential predictors, it seems to become infeasible to analyze data the same way I used to. So, I guess I feel like there is something different in how I analyze data now.

As for graduate programs, I am well aware of the pressures and trends as I actually direct one of these (without the “Big Data” jargon). While academic institutions can be pretty quick to jump on bandwagons, they do have a built-in conservative bias and I do not think you can so easily dismiss the rapid creation of hundreds of new programs as simply attempts to increase revenue streams.

Rather, what is different is the intense and serious interest that organizations (businesses, government agencies, nonprofits, etc.) have taken in making sense out of data. To an extent this is where I agree with you – much of what I teach is what I have always been teaching, but now people are seriously interested whereas in the past, not so much. The MBA statistics course used to be a dreaded requirement that students put up with (not my course, of course) and now it is one that is anticipated with excitement. This is a difference.

I don’t attribute the increased interest to the hype of big data – I won’t credit the term with any part in that. But somehow the combination of availability of data (often close to real-time), tools for accessing and visualizing it, and potential ways it can be used to improve decision making have created an environment that seems distinctly different than it was even 10 years ago. That is what I am trying to convey: “Something” is different.

By rjss. June 30th, 2017 at 2:55 pm

Stephen, maybe big data actually has a more defined meaning among non-English speakers. To keep it on the humorous side, and in keeping with the Italian theme, let’s get rid of the word gelato, since it is not much different from ice cream and has brought so much confusion to the US frozen dessert market. Also, please don’t get me started on custard, sorbet, etc.

By Stephen Few. June 30th, 2017 at 4:40 pm

Dale,

In my work, I’ve observed that the organizations that are enthused about Big Data today are the same organizations that were excited about analytics 10 to 15 years ago, business intelligence 15 to 20 years ago, and data warehousing 20 to 25 years ago. No matter what they’re calling it at the moment, their enthusiasm is rarely more than skin deep. They invest in the latest technologies, launch a bunch of new initiatives and projects, and then continue to derive almost no value from data whatsoever because they never develop basic data sensemaking skills. I’ve observed this among many of the largest corporations in the world, including many in high tech. As someone who works directly with organizations of all types to help them make good use of data, I can say without reservation that the enthusiasm that Big Data has generated has not translated into tangible outcomes. The good outcomes that have been produced in the last 10 to 15 years are the result of a relatively small group of people who have taken the time to develop actual data sensemaking skills. Big Data has had nothing to do with it. It is nothing but misdirection from the real work that must be done.

Regarding the academic programs that associate themselves with Big Data, I don’t agree that these institutions are conservative about new ventures such as these. If it appears lucrative and potential clients (previously called students) are asking for it, they’ll provide it. To illustrate the point with a tangentially related event, a few days ago I corresponded with a local university that offers a graduate program in “Business Intelligence and Analytics,” because I was potentially interested in contributing to the program in some way. The tag line on the web page that advertises the program is “The Future of Business Combines Big Data with Intuition.” I decided to give it a pass when I discovered that no one who was involved in designing the program and no one who teaches in the program has any experience in business intelligence or direct expertise in the work. You might ask, why then are they offering a graduate program in this field? The answer is because they recognize it as a revenue-generating opportunity.

You continue to say that “something” must be fundamentally different today that is driving these programs, but you haven’t provided any evidence that this is true. People respond to marketing claims whether they are true or not, just as they elect presidents whose campaigns are based entirely on idiocy and lies.

By Nate. July 2nd, 2017 at 7:22 pm

Stephen, regarding your comment about universities and their programs: Years ago, my ‘relational database’ class was taught completely with Microsoft Access, and we learned about the *very relational* concepts like ‘MS Access Forms’ and ‘MS Access Graphical Queries’.

Long story short – In the USA, at least, I view Academia’s MIS programs as basically a joke, and their CS programs as mostly useless for anyone not going to work for Silicon Valley (or similar).

By Gimo. July 3rd, 2017 at 6:01 am

Stephen, you seem to be getting very defensive, and I do not want to argue, because on the Internet arguments are like banging your head against the wall for no reason: everyone is a loser, no matter how many times your head meets the wall.

I think the points you are making are not backed up by actual examples or proof; you are showing no studies or research papers that validate your conclusion. You seem to be saying that because you believe something, it is true regardless of the facts.

Data growth in the past 10-15 years and prior has been orders of magnitude larger than anything that came 20-25 years prior; to deny that is to deny reality. The Cloud industry has become the de facto standard in IT practice, which was not the case 15-20 years ago. This is a reality. If you don’t like the term “Cloud,” simply don’t use it, but for the majority of people, as with “Big Data,” it makes something easier to understand.

Several types of folks use the term “Big Data”: technical folks with the background to know what they are talking about, and non-technical folks to whom it simply represents a new concept in information technology. To a technical person, there is nothing “new” about the overall concept of analytics, data gathering, storage, etc. To a non-technical person, this is all new. This all aligns with the growth of data, and it’s important that more people are now interested in it.

Who are we to say that general terms cannot change? Doesn’t capitalism work because there is a demand for something and companies meet that demand by providing software and services? If Big Data were all hype, how could the industry be worth $150 billion by 2020? Yes, there is always hype and bubbles around new technologies, and we may be witnessing such activity. But to deny that anything new is happening is almost like saying that because humans once lived in caves and had fire, nothing has changed fundamentally now that we all live in houses with electricity.

I understand that you would like to get rid of the term “Big Data,” but you are not offering any actual alternative; you are just saying, “I do not agree with the way it is used, and I am too knowledgeable to agree with using technical terms for non-technical concepts.” As a non-technical term, Big Data succeeds because it increases the focus on data and its importance in transforming our society.

On business intelligence/analytics degrees: how can you teach something that has no accepted definition? By its definition, it is something that can be done in many different ways with different systems, and there is no one way of doing it. So does this mean it cannot be taught, or that only experts can teach this topic? If that were the case, there wouldn’t be so many courses offered by non-profit and for-profit schools. That may be a good thing, but I tend to err on the side of markets: if there is a demand, the product or service must provide some value.

By Mark Pawson. July 4th, 2017 at 3:12 pm

I agree that the term Big Data is meaningless, but Data Lake (?!) is worse. Both speak to the age of social media, and specifically Twitter, where instant gratification, i.e., don’t make me think, is the norm. Data Sense Making is not a nice, short, pithy term that catches interest. I like it; it speaks to me, but that’s because I have put the time into reading two of Stephen’s books and practicing it when I can. In that case, Stephen’s arguments are bang on. We should drop the term Big Data.

However, if a company develops a successful strategy based on data sense making that delivers value to its customers and therefore increases profits for itself, all because the CEO caught the buzz of “Big Data” then great, celebrate and educate later.
Perhaps this is an example of the “something has changed” that other comments were implying “Big Data” was responsible for. But it requires that those who jump on the bandwagon be surrounded by people who understand how to make sense of data so it becomes both usable and useful.

By Stephen Few. July 5th, 2017 at 3:50 pm

Gimo,

None of your arguments are valid. I’ll address each, in turn, to explain how this is so.

To be defensive, I would need to have something to be defensive about. I have exhibited no signs of defensiveness. Rather, I have merely responded to comments with reason and evidence.

You said that my arguments are not backed up by actual examples or proof. This is not the case. I have made two basic claims: 1) Big Data has always lacked definition, which has rendered it meaningless and useless, and 2) the claim by Big Data proponents that the nature of data has fundamentally or substantially changed in recent years is false. The article that I wrote is filled with evidence to back my first claim. Regarding my second claim, the only way to provide evidence to support the claim that something has not happened is to refute claims that it has happened, which I have thoroughly done. I do not believe something regardless of the facts. My position regarding Big Data is similar to my position regarding God. In the absence of evidence that something exists, I choose to believe that it does not exist until shown convincing evidence to the contrary.

Regarding increases in data volume, I have never said that data has not increased dramatically in the last 10-15 years. It indeed has. What I’ve said is 1) these increases in data volume have not fundamentally or substantially changed the nature of data or its use, and 2) data volume did not suddenly begin to increase exponentially in the last 10-15 years. When in your career has data not increased exponentially in volume? Unless you’re ancient, I can answer this question for you: never. Data didn’t suddenly begin to increase by orders of magnitude in the last 10-15 years. If you believe otherwise, you’re welcome to provide evidence that it has. Even if data volumes had been increasing at only a gradual pace and then suddenly began to increase exponentially 10-15 years ago, this would not in and of itself represent a change in the nature of data or its use.

I have no trouble with the term “cloud.” Unlike “Big Data,” it actually has an agreed-upon definition. Someone else injected the term cloud into this discussion as evidence of Big Data. I simply pointed out that the cloud represents nothing new about data or its use.

I have not said that “general terms cannot change.” They can and do. I’ve said that Big Data has never had an agreed-upon definition. A definition must first exist before it can change.

You said that I have not offered an alternative to the term Big Data. An alternative to the term Big Data is not needed. Data is data, no matter how big or small. Also, one cannot propose an alternative for a term that lacks definition.

I have not said that nothing new is happening. I have said that neither the nature of data nor its use has changed in fundamental or substantial ways during the last 10-15 years. If you believe otherwise, please identify specific changes that constitute so-called Big Data and back them up with evidence.

Regarding business intelligence, when the term was originally coined by Howard Dresner, it was defined. Unlike the term Big Data, despite diversity in the definitions that have since been proposed, the term business intelligence has always contained a core of meaning: methods and technologies that support the use of data for decision making. As such, academic programs in business intelligence may be offered. As is true of all academic disciplines, business intelligence can only be effectively taught by people who have expertise in business intelligence. People who lack expertise in the subject matter are not qualified to teach it. You seem to suggest that I have argued that business intelligence cannot be taught. I have not. You are conflating business intelligence and Big Data.

I have now addressed all of your points. Your arguments are logically flawed. If you wish to respond, please respond to all of my points and do so with reason and evidence, as I have done for you.

Similar to most people who work with data, you apparently have not been trained in critical thinking. You cannot make a valid argument without understanding the basics of logic. Furthermore, you cannot be an effective data sensemaker without training in critical thinking.

By Evinton Antonio Cordoba Mosquera. July 6th, 2017 at 9:58 am

I agree with this post: Big Data is a buzzword that needs to be cleaned up. People use it according to their own business interests. It’s clear that data has been increasing lately, but most of it is messy.

We don’t need large data to make sense of data, so there is no direct relation between Big Data and Data Sensemaking.

I only hope the same does not happen with Causality Data.

By David. July 14th, 2017 at 4:08 pm

Stephen,

Great article. I have to say your experience has matched my own. I have met a number of very intelligent people who bristle when I cast aside “big data” as a meaningless term with a variety of definitions depending on who is saying it. Each of them has rebutted me by providing a definition that they believe is “the one.” Of course, that one definition is at least slightly different every time.

I will argue with your assertion that machine learning is not fundamentally different (or “not much different”) from statistics. Deep learning via neural networks is one of the most popular methods in AI right now. It certainly isn’t “new” as a general method (the only thing new is that we actually have the computing power to tackle problems of reasonable size now), but it is fundamentally different from traditional statistical approaches.

None of this changes the uselessness of the term ‘big data’, of course.

By Stephen Few. July 14th, 2017 at 5:03 pm

David,

I believe what I said is that machine learning is “nothing much different than what statisticians have always done.” I wasn’t talking about the mechanism. Over time technologies have altered the mechanisms. These mechanisms, as I understand them, however, are rooted in statistical principles and methods that have been around for a while. Statisticians have always taken advantage of new technologies to support their work as they emerge, but the fundamental nature of the work has changed little. Am I wrong?

By David. July 17th, 2017 at 9:19 am

Stephen,

I did a little research on this to try to figure out whether my bias was clouding my judgment on this. I concede the point. There do seem to be many folks much smarter than I arguing that the overlap between ML and Stats is greater than I would have thought. Or rather, that though there exist methods in ML not based on traditional statistics, those exceptions don’t represent a fundamental difference in, as you would say, the nature of the work.

By Benazir. July 28th, 2017 at 1:33 pm

I’ve had clients just throw this question at me: We want to invest in Big Data. What should we do?
That stumped me a few years ago, and it still does. Companies think Big Data is some kind of juggernaut that’ll end their woes and suddenly make them smart. Most of them associate Big Data with some sort of sentiment analysis scheme, and many others think of it as a big database somewhere that they need to pay to access.

Big Data – the term is a complete pain to address! Thank you Stephen! Now I know I’m not inept at explaining it to others. I always tell them that BI has, since its inception, been about doing exactly this – making informed decisions based on data (really really big data – pun intended). And that Big Data isn’t some radical new thing – it’s just a nice-sounding new term. What has changed, however, is how we visualize data and how great we’re getting at combining/comparing heterogeneous sources/kinds of data. Now that? That you should invest in.

By Stephen Few. July 28th, 2017 at 2:11 pm

Benazir,

Do you really believe that “how we visualize data” has recently changed in a significant way? If so, how so? The principles, best practices, and effective forms of data visualization have changed relatively little. More tools support data visualization today than in the past, but most of them support it poorly.

By Benazir. August 5th, 2017 at 5:21 am

Stephen,

When I started creating reports as a naive developer, I saw executives mull over tables and static charts. “How we visualize” has changed for the better with reports that are laced with interactivity. I hope that explains my point.

Rgds.

By Stephen Few. August 5th, 2017 at 8:18 am

Benazir,

Interactive displays of data are hardly new. I’ve built interactive reports throughout my 30+ year career, originally hosted on a large mainframe computer. It is true that the ease with which we can interact with data has gradually improved as interfaces have improved (e.g., the introduction of dynamic filters), but this isn’t something to crow about. Rather than being impressed with our progress, we should be embarrassed that we’ve progressed so little. Similarly, it’s a vast overstatement to say that we have become “great” at “combining/comparing heterogeneous sources/kinds of data.” Once again, we should be ashamed that we continue to do this so poorly given how long we’ve been at it. Those who have been doing this for as long as I have regularly commiserate about the fact that people and organizations largely struggle with the same things that we were teaching them to handle with greater skill many years ago. Little has changed. BI/analytics vendors have been making the same hollow promises of improved functionality and ease of use with each new release of their software, hoping that we fail to notice that they’ve never delivered on those same promises in the past. The little progress that we’ve made is largely the result, not of the vendors, but of a relatively small group of people who have taken the time to develop their data sensemaking skills rather than spending their time constantly learning new tools. If you were speaking of these skills when you wrote, “That you should invest in,” I agree wholeheartedly.

By Benazir. August 10th, 2017 at 12:51 pm

30yrs! Wow! I can’t argue with that kind of experience. But I’ve certainly felt a difference in the ease of use that BI tools currently offer in bringing interactivity to the table. If you feel that whatever progress BI tools/vendors have made is just overhyped, I wouldn’t object to that. We’re far from perfection and, even today, mostly depend on duct-taped solutions! And that’s precisely why I’m excited about the offering our team is working on – will reveal soon! Wish us luck ;)

By Jim Linnehan. September 2nd, 2017 at 3:43 pm

Stephen, this is off-topic —

In a response to Gimo, you wrote, “Similar to most people who work with data, you apparently have not been trained in critical thinking. You cannot make a valid argument without understanding the basics of logic. Furthermore, you cannot be an effective data sensemaker without training in critical thinking.”

In your opinion, what would constitute some of the elements of a serious curriculum on critical thinking?

By Jim Linnehan. September 2nd, 2017 at 4:58 pm

Apologies: Before posing my question, I should have run the following Google search:
site:perceptualedge.com “critical thinking”
