♪ I love it when you call me Big Data ♪ Welcome to the Dr. Data show! I’m Eric Siegel. Data science, big data, what
the hell do these buzzwords really specifically mean? Are they just cockamamie, intentionally vague jargon that
overhypes and overpromises? Or are these terms actually helpful? Do they somehow designate,
like, the most profound impact of the Information Age? Well, I’ll start with the
vague and overhyping side and then circle back
to why these buzzwords may matter after all. It’s time for the Dr.
Data Buzzword Smackdown. There are a lotta
problems with these words. First, data scientist is redundant. It’s like calling a
librarian a book librarian. If you’re doing science,
it involves data, duh. Furthermore, and don’t
tell anyone I said this, but real sciences like
physics and chemistry don’t have science in their name. Your science is trying too hard if it has to call itself a science. Social science, political
science, data science, and I gotta say, even though
I have three degrees in it and was a professor of it, computer science is an
arbitrarily defined field. It’s just the amalgam of
everything to do with computers, as a concept and as an appliance, from the engineering of how to build them and the deep mathematics about
their theoretical limitations to how to make them more user friendly, and even business strategies for managing a team of programmers. Universities might as well also have a toaster science department, which covers the engineering
of better toasters as well as the culinary arts
on how to best cook with them. But I digress. Okay, next buzzword, big data. First of all, it’s just
grammatically incorrect. It’s like looking at the Pacific
Ocean and going big water. It should be a lotta
data or plenty of data. But the real problem with big data is that it emphasizes the size. ‘Cause what’s exciting about data isn’t how much of it there is per se. It’s about how quickly it’s growing which is amazing, by the way. There’s always so much more data today than there was yesterday. So we’re gonna run out of
adjectives really quickly. Big data, bigger data, even
bigger data, the biggest data. Actually, there’s been a
long-running conference called the International Conference on
Very Large Data Bases since 1975. I’m not joking. That’s before the first
Star Wars movie came out. Now, in some cases, people
use the terms data science and big data just to
refer to machine learning, i.e. when computers
learn from the experience encoded in data. That’s the topic of most
episodes of this program, The Dr. Data Show. It’s a show about machine learning, which is a well-defined field and, by the way, is also often
called predictive analytics, especially when you’re
talking about its deployment in the private or public sector. I would urge folks to use
the well-defined terms machine learning or predictive analytics if in fact that’s what you’re
specifically talking about. But as for data science and big data, in their general usage they suffer from a terrible case of vagueness. They have a wide range of
subjective definitions, which compete and conflict. Basically, they’re often used
to mean nothing more specific than some clever use of data. The terms don’t necessarily refer to any particular technology,
method, or value proposition. They’re just plain subjective. You can use them to mean
whichever technology you’d like. Machine learning, data visualization, or even just basic reporting. But much worse than that,
this vagueness often serves to mislead and misrepresent
by alluding to capabilities that don’t exist. For example, the popular press, as well as certain analytics vendors, sometimes use data science to denote some whole collection of methods that includes machine learning as well as some other advanced methods. The problem is, those other
advanced methods are implied but often actually don’t really exist. They’re vaporware. This confusion is sometimes inadvertent, such as when journalists
aren’t fully knowledgeable of the topic yet want it to
sound as powerful as possible but, either way, the end
result is souped-up hype that overpromises and
circulates misinformation. All these issues, by the way, also apply to the
older-school term data mining, also totally subjective. Besides, calling it data mining is like instead of gold mining,
saying dirt mining. Malfunction, failed analogy. ‘Cause we aren’t searching for data, we’re searching within data. So now you’re probably asking yourself, how could Dr. Data come
down so hard on these words if he loves data so much? Well, no, Dr. Data
doesn’t hate these words, only the misleading ways in
which they’re often used. Dr. Data’s love for data is fully intact. After all, he named himself after it. Anyway, let’s talk data for a moment. These buzzwords are all
data this and data that. So what exactly is all
the fuss about data? I mean, most people couldn’t
be less interested in data. The non-geeks out there
think it’s the driest, most boring word ever. The word data is a deal-killer
at cocktail parties. I know from personal experience. I have the data. And data just grows like a weed anyway. It’s so indiscriminately
collected and warehoused, like some bland, uninteresting residue that companies dump into the cloud as they transactionally
churn away endlessly. But, no, that’s wrong. Actually, let me make a correction. It isn’t indiscriminate. The stuff logged into
all these memory banks are exactly the things that matter. That’s why they’re being recorded. People think data’s boring
because they’re overlooking the fact that data is experience. It’s a long list of prior events from which it’s possible
to analytically learn. In fact, we could say
that data is powerful and all-encompassing
for the very same reason that it’s misconstrued as boring, which is that it’s very abstract. Data can mean anything and everything. In its most abstract, it
means nothing in particular, but in the particular, it always means something valuable and interesting. Every medical diagnosis,
medical procedure, credit application, phone
call, Facebook post, movie viewing, ad click,
fraudulent transaction, spammy e-mail, traffic
camera passed, flight taken, earthquake, purchase,
successful or failed sales call, each positive and negative
outcome of any significance is encoded as data somewhere. There are quintillions
and quintillions of bytes. That’s my Carl Sagan impersonation. Data grows by an estimated
2.5 quintillion bytes per day. A quintillion is a one
with 18 zeros after it. And here’s the big win. We can improve everything with this data. All the main functions and
day-to-day operational decisions of companies and governments are exactly what these data streams are recording. Therefore, data records exactly the right, relevant experience to
apply predictive analytics where it’s needed most. We have just the right
data for this technology to learn how to streamline
the major operations behind financial risk management, fraud detection,
marketing, law enforcement, healthcare, and manufacturing. Boom! This is major. We’re witnessing an
epic, fundamental shift in how technology integrates with, alters, and improves society and its functions. And so data isn’t the
most boring after all. In fact, it’s the most sexy? The Harvard Business Review
declared data scientist the sexiest job of the 21st century. I mean, really? Data people are the most sexy? That’s great news! Geek is the new chic. It’s hip to be square. You know, I had always
assumed the sexiest profession was firefighters, but who knows. Maybe it’s just the hard hat. This is a picture of me dressed up as a data miner for Halloween. Actually, the New York
City Fire Department uses predictive analytics to triage and prioritize the
inspections of buildings with the highest risk of fire. Yet another priceless
application of machine learning. Anyway, we actually
produced a rap music video about predictive analytics
and how being a data geek affects your social life. It’s the best ever
educational predictive analytics rap music video ever created ever, period. And also the only one. Just three and a half minutes long. You can check it out at PredictThis.org. In conclusion, there’s a
lot to be excited about when it comes to the data explosion and what we can do with it. The buzzwords are kinda
inane when viewed up close. Perhaps an equally appropriate
and less misleading buzzword for all this would be datapalooza, but, in any case, the terms
really allude to a culture of smart people doing creative things to make value of all this data. Today’s totally historic
advent of having data about everything and
using data for everything is mind-blowingly profound and important. I’m Eric Siegel, thanks for watching. Hit like and share this video
if you think your friends were also wondering what
the hell data science and big data really mean. And for access to the entire web series, go to TheDoctorDataShow.com.

♪ Who’s your data? ♪
♪ Provide me the data to improve ♪
♪ And I’ll apply the computation ♪
♪ I love it when you call me Big Data ♪
♪ Predictive analytics can help you with decisions ♪
♪ You can call, mail, credit, or hire with precision ♪
♪ On law, love, and life, you can prognosticate ♪
♪ Whom to investigate, incarcerate, ♪
♪ Set up on a date, or medicate ♪
♪ Charlie Brown never gets his kicks ♪
♪ That’s why every old dog needs a brand new trick ♪
♪ If you get sick of chasing sticks ♪
♪ Or clicks with just a quick fix ♪
♪ You need to learn to predict ♪
♪ I can predict your every move ♪
♪ Just gimme all your information ♪
♪ Who’s your data? ♪
♪ Provide me the data to improve ♪
♪ And I’ll apply the computation ♪
♪ I love it when you call me big data ♪

2 thoughts on “What the Heck Does “Data Science” Really Mean? The Dr. Data Show”

  1. Thanks for watching The Dr. Data Show! To sign up for notifications of future episodes and for more info, see: http://www.TheDoctorDataShow.com.

  2. I can't agree more. I started learning data science a couple months ago, and even though the internet is a great source of information, NO ONE agrees on what data science means exactly. I finally realized data science is better understood as just a collection of different concepts, theories, and tools related to interpreting and extracting information out of data.
