– [Interviewer] Hello and welcome to Experian’s Weekly Data Talk, a show featuring some
of the smartest people working in data science. Today we’re excited to talk with Bill Vorhies, the editorial director at Data Science Central and the president and chief data scientist at Data Magnum. Bill, thank you so much for being part of today’s broadcast. Can you share a little bit about your background and what got you started working in data science? – Well hey Mike, thanks
for having me today. Yeah, my own background. In the ’90s I came out of industry and went into management consulting, where
I ran the consulting shop at JD Power and Associates, and then I moved over into
Big Four consulting at Ernst & Young and PricewaterhouseCoopers. And during those experiences
I had a couple of projects that required linear regression as part of the solution, which was pretty magical at the time. So in 2001 I came out with a partner and I started my own company called Predictive Modeling, LLC, PreMo. I must say that it’s possible to be almost too early in a market. Well then, there wasn’t
really much business going on. There might have been 10 or 15 independent consulting companies like my own at the time, and it was really thin, there wasn’t much air to breathe. But we persevered until big data came along in 2007, and since then we’ve been focusing on predictive analytics, big data, streaming, IoT, AI, and then for the
last two and a half years I’ve been very honored to
be the editorial director of DataScienceCentral.com, which really causes me to stay abreast of the whole industry in a way that I really
didn’t have to before. – [Mike] Yeah, and for those that haven’t been reading DataScienceCentral.com, I highly, highly recommend it. Bill does an outstanding job of not only writing and curating articles, but it’s really just a huge resource for anybody who’s interested in the trends and what’s happening in the industry. So we’ll make sure to put the link to DataScienceCentral.com
in the About section of this YouTube video, as well as in the comments
of the Facebook video, because it’s outstanding. And in fact, one of the reasons why we have Bill joining us today is because of an article that he wrote a couple weeks ago about artificial intelligence
and the ethical dilemma that we’re now unexpectedly facing. And there are several things
about your article, Bill, that caught our attention, and first is the issue of ethics associated with AI. Something that not a lot of
us have been thinking about, especially for those of us
who are new to the field or just watching the field like me. And you describe this
problem of ethics in AI as something that’s urgent. And I wonder if you can kinda share a little bit about what prompted you to write this article, because it is one of the most
popular articles right now at Data Science Central. – Right, and believe me, I was kinda caught off guard by this too. You know, there were two studies that caught my attention. One of them’s almost a year old now, November of 2016: two research academics out of universities in Canada and China, cooperating, published a peer-reviewed study that showed they could identify criminals from non-criminals based solely on facial recognition with 89.5% accuracy. Well, that kind of blew me away. – [Interviewer] Yeah
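To make the mechanics behind a claim like that 89.5% figure a little more concrete, here is a minimal, illustrative sketch of the kind of supervised classification pipeline Bill describes later in the conversation: feature vectors derived from ID photos, two labeled classes, and several classifiers compared by cross-validated accuracy, with a small amount of injected noise. The data, feature count, and model settings below are placeholder assumptions for illustration, not the actual setup of either study.

```python
# Illustrative sketch only: synthetic stand-in data, not the studies' real
# features or labels. Compares several supervised classifiers using
# cross-validated accuracy, as described in the conversation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Placeholder feature matrix: one row per ID photo, one column per
# facial-landmark-derived measurement (distances, angles, ratios).
n_photos, n_features = 2982, 40   # roughly 1856 + 1126 photos; 40 features is assumed
X = rng.normal(size=(n_photos, n_features))
y = rng.integers(0, 2, size=n_photos)  # synthetic 0/1 class labels

# Inject a small amount of random noise, analogous to the authors' check
# that results were not an artifact of overly clean inputs.
noise_mask = rng.random(X.shape) < 0.03
X_noisy = X + noise_mask * rng.normal(size=X.shape)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "k_nearest_neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

for name, model in models.items():
    scores = cross_val_score(model, X_noisy, y, cv=5, scoring="accuracy")
    print(f"{name}: mean cross-validated accuracy = {scores.mean():.3f}")
```

In the studies themselves, a convolutional neural network trained directly on the images reportedly gave the strongest results; that step is omitted here to keep the sketch small and self-contained.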
– At the time, I wrote an article that says, “Man, have we gone too far?” And then just about a
month ago, in September, two research academics from Stanford published a peer-reviewed article claiming that they could
tell sexual preference amongst men and women with 91% accuracy for men and 83% accuracy for women. Well you know, holy cow. One of the reasons people
have to be suspicious of AI has a lot to do with
online privacy and bias, but here are two examples that clearly go much beyond that into
much more personal issues than just online bias and privacy. This is an application of AI where you can’t opt out, and where it impacts basic social issues and our relationships with other people, and you know, you really have to wonder what types of abuse uninformed governments or police might try to put this to. So this has some crossover with the folks behind the Right to be Anonymous movement. But specifically with regard to our physical characteristics, these are things that we can’t change, or at least can’t easily change. Now to the issue of ethics. If you look it up, ethics and moral philosophy are pretty much the same thing, and they’re about defining or recommending concepts of right and wrong conduct. Missing from that definition
is the implicit fact that these are human judgments about human behaviors. So you might say that ethics
is a human interpretation of the consequences of
our actions toward others. So this may be the first time in history that we’ve found it necessary to apply restrictions of
right and wrong conduct to non-human entities, because it’s the first time that a mechanical or non-human entity has been able to interact with us in a way that would cause us societal harm. And that obviously is also
the reason why it’s urgent. – [Mike] Yeah, no doubt, and when I first read some of those studies, I was shocked as well. I was really surprised that simply through facial recognition, an AI could be that accurate in determining sexual preference, or criminal activity or motive. It was just shocking, and so, like you said, this is now an ethical dilemma around personal privacy, and it’s something you’re not opting into. Just walking in the streets, a camera picks up your picture, right? – Mmhmm. – [Mike] And now you’re in a database, and it’s now being tracked
for various things, and so are there any other studies that you were looking at that kind of concerned you as well? – Well yes, but you know, in the spirit of fairness, let me tell you a little
bit more detail about the two studies themselves, the criminality study and the sexual orientation study, because I think it’s important that the audience understand that these really were peer-reviewed and basically good data science conducted in an academic environment. Let me use my notes here, and I’ll tell you a
little bit about these, and then I’ll answer your question about other studies. I mean, in this criminality study, these are two professors, one out of McMaster University and the other out of a
university in Shanghai. They looked at 1856 ID photos, and these were Chinese males, 18 to 55, no facial hair, no facial
scars or other markings, who were known to be convicted criminals of both violent and non-violent crimes. They compared those with ID photos of 1126 non-criminals with similar socioeconomic backgrounds. And you know, in facial recognition you use a variety of points on the face to derive a feature set. So once they had that feature set and the two classes designed, they used several different types of supervised classifiers: logistic regression, K-nearest neighbors, support vector machines, and convolutional neural networks. And as you might guess, the CNNs and the SVMs gave them the strongest results. To their credit, you know, since we all know that CNNs can sometimes be confused by random noise, they actually injected three percent random noise into their test, and they still got good results. So you can’t fault them on the data science. Then in the case of the Stanford folks, Professors Wang and Kosinski, who by the way have been
getting a lot of pushback on this study, first of all we should be
clear that the 91% accuracy that they claimed for
men and the 83% for women required them to have five images. If they just had one, the result was much less clear. But you know, as a society we joke about having gaydar, and how we think as a general
public we can spot those folks that have different sexual
preferences from ours, but these two folks actually
ran a controlled study, and showed that only about
61% of the time for men and 54% of the time for women
could people correctly guess someone’s sexual preference from a picture. So that’s not a whole lot
better than a coin toss. Again they used a facial
recognition program that’s really well known, and they had 130,000 images of men and 170,000 images of women, ages 18 to 40. They got these off of dating websites where folks self-reported as heterosexual or homosexual, and they were US-located Caucasians. And once again they used a deep neural net facial recognition program, they got 4,000 attributes, and they used a simple logistic regression. And I must say that while
the Chinese researchers did not make any comment
about the importance of their research, in the case of the Stanford study the authors at least said: look out, because here’s a chance that this might be abused. So those are the two that I
referenced in the article. But you asked about other studies. Two that I’ve come across recently. There was a study in a psychology journal, ELOS, that used the digitized
voice characteristics of couples in marriage counseling to determine whether or
not their relationships would succeed or fail after two years. – [Interviewer] What?
– Yeah, pretty long longitudinal study. And they got strong results. So the question is, boy, are we going to go
to a marriage counselor and he’s gonna record our conversations and say, “Well listen,
it’s not gonna work.” You know, a much less friendly example: in this last year in Russia, there is a Facebook equivalent that lets you use facial characteristics to locate people in social media. Well, apparently these folks, these trolls, used pictures of sex workers to cross-identify them with their actual identities on social media and then outed them. And of course that caused a lot of harm, and also a lot of inaccuracy. But you know, that’s kind of the other end of the spectrum here, that we’re really concerned about. – [Mike] Yeah, no doubt, I mean the cases you’re bringing up are things I have never heard of. The earliest examples I think I remember reading about were about autonomous vehicles and decisions that vehicles have to make. – The trolley problem, yeah. – [Mike] The trolley problem. Can you describe that
for those listening in? – Oh the trolley problem is a classical philosophical problem and it has to do with
who you’re gonna hurt. And the origin of the name comes from an example that was about
a rail-mounted trolley where the operator had the opportunity to either pull a lever and
injure his passengers or not pull the lever and
injure the bystanders. And it’s one that comes
up in philosophy classes over and over again and it doesn’t have any good answer. But it is a reason to wonder
about autonomous vehicles and what rule has been built into them. I mean, am I gonna get into a car that has been programmed to say, “Oh 40% probability, I
better hurt the passenger”? Well, thanks, but anyway. – [Mike] Mmhmm. Yeah, it’s fascinating how, just recently, we’re reading more and more of these stories in the news about AI and, like you’re saying, the ethical implications of this. There’s also that recent example where, was it, that Twitter account that was created, an AI Twitter account, that within 24 hours became racist and prejudiced based on people tweeting back to it, and it was learning from people. – Right, right. That’s the, let’s see, that was Tay and that was Microsoft, and that’s a real good example, and the point that we
really need to make here is that our AI as it exists today is just a machine. It does not understand what it’s saying. It does not have the ability to, to act on its own accord to try and act against us as human beings. So a lot of what we read about bias and prejudice in AI today follows two memes. One of them is oh, the
system’s biased against me, it’s gonna try and do
something that I don’t like. Or these are our robot
overlords on the way, and so we should be concerned. Well you know, that, let’s deconstruct this a little bit. While there are some concerns
that we should probably have over online and geographic
tracking privacy, my own take on that is
that that’s fairly minor. I think that you want to remember that that type of tracking
increases efficiency in time and also reduces the cost of advertising by presenting the stuff that, you know, frankly, most aligns with our interests. So you know, if you
want to protect yourself from that stuff go ahead, if you’re motivated to
withhold your information that’s fine, you’ll be giving up a little
convenience and economy. But the second case is much more close to what your question about Tay is, and that is, are these robot overlords going to be able to do something to us that we really don’t want? And so let’s talk about chatbots, of which Tay was an example. So no matter how sophisticated they are, and by the way it was
just in the news this week that there is now a chatbot
that you could call up and get psychological
counseling from, alright? But that chatbot has no understanding of the conversation that
it’s having with you. It is some very clever natural language processing combined with a very, very large dataset of previously successful conversations, and it’s simply learned what is successful relative to your question. So now, okay, so Tay. Perfect example, 2016. So Microsoft let loose two
versions of its chatbot, one in Japan, and one
in the United States. Well the one in Japan, it got rave reviews and pretty soon there
were young single men in Japan actually pledging
their love to Tay, because she was so
sensitive and so responsive, and oh wow, they actually were. In the United States, unfortunately, some clever but ill-meaning folks figured out that Tay learned from what you told her. So in the space of 16 hours, they got Tay to start spewing very sexual and anti-Semitic and
frankly Nazi language, and Microsoft obviously had to pull it. But it’s a good example, because if you compare it
to example for, with a– (loudspeaker voice interjects) – [Mike] I’ll go on mute. (both laugh) – But, not to get shaken up, but you know, if you
compare this for example to the behavior of, say, a
three- or four-year-old child raised in a home that’s filled with hate speech, you know that child is looking for approval and love from the adults who hold that point of view, and that’s why it continues to say those things. The chatbot’s not the same. It has no human component. And as a matter of fact, it’s important that the audience remember that these AIs are actually what we would call very brittle. If you change their sensors, if you change their actuators, if you let their body of knowledge contain outdated or incorrect information, they’ll just fail outright. Also, you know, these systems can’t take what they’ve learned in one system and apply it to another. And even if they could– (music plays over loudspeaker) Okay, and we circle back, good. (both laugh) And you know, even if they
could learn from one system and apply it to another, it would be we humans who
would have to tell them what the goal of the game is. So in interaction with humans, it’s about answering your
customer service questions, but it’s not about
manipulating your feelings to the benefit of the AI. – [Mike] Yeah I’m glad you mentioned the overlord scenario, because it’s definitely gotten
a lot of buzz in the news; prominent voices like, I think, Elon Musk and others have talked about their concerns about AI in the future, and also just the ability of AI to take control and do devious things. Do you have any concerns about that? Or just tell us your thoughts. – Well, I think that Elon is thinking way out in the future, so first of all, let’s take a breath. (both laugh) Remember to look more at the
doughnut and not at the hole. We’ve got some time. And also, words like ethics and privacy and even artificial intelligence are what Marvin Minsky, whom I think everybody knows, called suitcase words. That is, words that carry so many meanings that each person looking at the problem is going to see them from a totally different point of view. So you know, from the
wizard side of the curtain, if you can say that about data science, I think we need to help
the public and the press not to start with their
default of hyperventilating, which is so much the case these days. But yeah, we should be concerned about who’s gonna use and who’s gonna misuse this stuff, like the trolls in Russia. Also in the press this week, Turkey is having an LGBT crackdown. Are they gonna look to this software and start using public cameras? Of course they have not said that, and I hope they’re not, and they would be wrong if they did that, but a couple of things
to always keep in mind. Data science, okay. This is correlation, not causation. Especially with respect to criminality, we’re not recreating the
Minority Report here. We’re not gonna arrest people because they look like
they might be criminals. Second, even the most
transparent of models have error rates associated with them. There are no models
that are 100% accurate. So you always have to be
aware of false positives, and yeah, even false negatives. But particularly, any of these models has the potential to predict that someone is a criminal, or that someone has a sexual orientation that you don’t like, and to be wrong in a significant percentage of the cases. So what should we be concerned about? Well, I’m concerned about how pervasive physical
tracking has become. Not necessarily the clicks or even the geographic location stuff, but the video cameras that capture our images that are everywhere. And now we know that voice recordings can be used in kind of the same way, or for that matter whatever else can be captured that we can’t change. It could be our DNA. I suppose you could even imagine that they could capture your breath somehow, digitize it, and do some sort of analysis on it. And these are not things
that we can opt out of. So we need to focus, I think, on the type of data that’s
being collected about us that we have no option to modify. – [Mike] Yeah, as you were talking Bill, just about the advances
made in facial recognition, I was reading some studies recently about how, when recruiters are looking to interview candidates through webcams, there’s now some technology available to help them distinguish, you know, how truthful someone is being through the conversation, based on all these different factors. And so it is interesting, where someone may be totally
truthful in the interview, and trying to do their best, but for whatever reason, like you said there could be false positives, there could be some issues with the AI that’s determining that they’re lying because their eyes are going
in a certain direction, right? – Yeah exactly, exactly. – [Mike] So we just have to be very, very mindful of that, because that could hurt someone’s ability to maybe get a job. – Yes, yes. – [Mike] So as you know, we have a community of data scientists, and also people who are
interested in entering the field, and I was just curious, if you could share some advice for those who are maybe students of data science who are looking to pursue a career and just not sure like where to start, what
programming languages to learn, curious your advice for them. – You know I have published a variety of career-oriented articles
on DataScienceCentral.com. And if you go there and
use our search function to look for terms like, so you
want to be a data scientist, or mid-career switching into data science, you’ll find a lot of deeper thought than I can give you here today. But a couple of things to keep in mind. So I often get the question, “Am I going to be able to learn this OJT?” Or, “Am I going to be able to
learn this through a mook?” Or, “Can I, how do I learn
data science in six weeks?” Well, the answer is
probably none of the above. (both laugh) I think that legitimately
you’re looking at somewhere between 12 and 24 months of concentrated study to reach the level of an entry-level data scientist, a competent data scientist. Now MOOCs may be a help for some folks, but you know, when you go to get a job there is a huge amount of bias in favor of degrees from accredited universities, and for the most part these days that’s the Master of Science
with a data science focus. But the fact of the matter is if you don’t have a bachelor’s degree, you’re not left out. Because there are now bachelor’s programs that are teaching exactly the same skills at the bachelor’s level that you would learn
at the master’s level. The thing you need to really
watch out for here though is that the course that you take, the degree that you seek, needs to say specifically data science, and not something more
generic like data analytics or computer science. It needs to say data science and be extremely specific about it. Once you’ve committed to a
comprehensive curriculum, those instructors will
take care of making sure that you have all the basics. You know, whether that’s
Python or R or SAS or SPSS, but more importantly,
how to prep the data, what techniques are most appropriate, how to formulate a data science question out of a business question, how to identify and implement
a data science algorithm to actually produce
benefit to your company. Those are the real important things, quite beyond the issue
of whether or not you can program in Python or R. – [Mike] I like your advice because it goes beyond just learning
a programming language or being somebody who’s an analyst, but somebody who can develop
those really smart questions, being very hyper-curious. Because there is that art
to data science, right? – Mmhmm, mmhmm. – [Mike] So my last question, Bill, before we go is do you have any advice for senior leaders that
are looking to build a great data science team? – Well, okay. We know that data scientists in general, and great data scientists specifically, are still in demand. There’s not enough supply
to fill the demand. What I see in organizations that, say, have more than half a
dozen or more than a dozen data scientists, is an increased focus on efficiency and effectiveness. A lot of times I’ll get a new or about-to-graduate data scientist who comes up to me and talks about what they’ve done, and starts to go into great detail about their R code or Python code, and I have to remind them, you know, that in the real business world you don’t have an infinite amount of time to work on these projects. So one of the things that
everybody needs to keep in mind, and an eye on, and I think this is particularly true of senior business leaders, is what’s rapidly emerging as
automated machine learning. And if you look at folks like DataRobot or PurePredictive, there are probably eight or nine of them now that have largely automated the front end of machine learning. What’s really going on here is that these are packages that work quite well, and they allow just a few data scientists to do the work of a lot of
data scientists in the past. So it gives you two things. First of all, it gives you the
efficiency and effectiveness, but it also gives you a common platform. I think the other thing that senior leaders struggle with is that if everybody is freelancing in R or Python, it’s pretty difficult to control for quality or even to debug. So while I’m not in favor of fully automated ML, I think that these
high-efficiency platforms are indeed the way of the future. – [Mike] Thank you, Bill. And for viewers and listeners who want to be better informed on AI, deep learning, machine learning and also the ethical implications that go along with that, can you let everyone know
about DataScienceCentral.com? – Yes, yes, please. We like to think of ourselves as, and I’m sure in fact we are, the premier aggregator of
all things data science and data engineering. It’s DataScienceCentral.com, no spaces. Membership is free, we have about 800,000 data scientists and data engineers that visit our site every month, which I think testifies to
the quality of the content that we’re able to put out there. And it’s written content as well as quite an in-depth video
and webinar resource that you can tap into, as well as resources for finding jobs or simply posing questions
to the community. So we try and be one-stop shopping for everything that you’d
need in data science. – [Mike] Bill you always have wonderful writers and great content. For the data scientists listening in who may be interested in
possibly trying to write for DataScienceCentral.com, is there a process where they can apply? – Yes, and I think the
fastest way to do that, because that’s part of my job, is if they would just drop me a note on the site or in email. You can find my email all over the site and all over the material that I’ve written; it’s [email protected]. And if you tell me that
you’re interested in writing, I will send you a complete set
of guidelines for doing that. We have a very large number of our members who contribute material so that we are able to turn over, oh, 20 or 30 or 40 new articles a week of high quality, and that’s, I believe, one of our greatest strengths. And if you email me, I’ll
send you the guidelines. We don’t ask for prior approval of topics; your submission comes in and it goes to our editorial board, we take a look at it, and if it’s of good quality, then you have a chance to
be featured on our site. – [Mike] That’s awesome. I’ll make sure to put, again, the URL for DataScienceCentral.com
in the About section of the videos as well as in the comments so that everyone can go check out the tremendous resource that is there. And Bill, it’s been
wonderful talking with you, it’s been an honor, thank
you so much for sharing your insights with us
on the ethical dilemma with AI and what we’re
facing in the future. I want to thank the viewers and also the listeners of the podcast; as always, thank you so much for being part of this and making our community better by your comments and by sharing topics that are interesting to you. And if you’d like to learn more about upcoming podcast episodes as well as upcoming or past videos, you can always go to
experian.com/datatalk. And we do these shows every single week, so we will see you next week and we hope you all take care. Bill, thank you again.
