asploder 6 years ago

I'm glad to have kept reading to the author's conclusion:

> As a hybrid approach, you could produce a large number of inferred sentiments for words, and have a human annotator patiently look through them, making a list of exceptions whose sentiment should be set to 0. The downside of this is that it’s extra work; the upside is that you take the time to actually see what your data is doing. And that’s something that I think should happen more often in machine learning anyway.

Couldn't agree more. Annotating ML data for quality control seems essential, both for making it work and for building human trust.
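For what it's worth, that hybrid workflow is easy to sketch. Below is a rough illustration (the function names and CSV columns are made up for this comment, not taken from the article): dump the most extreme inferred word sentiments for a human to look through, then zero out whatever gets flagged.

```python
import csv

def export_for_review(inferred, path, top_n=500):
    # Write the most extreme inferred word sentiments to a CSV for a human annotator.
    ranked = sorted(inferred.items(), key=lambda kv: abs(kv[1]), reverse=True)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "inferred_sentiment", "zero_out"])
        writer.writerows((word, score, "") for word, score in ranked[:top_n])

def apply_exceptions(inferred, exceptions_path):
    # Zero out every word the annotator marked with "y" in the zero_out column.
    with open(exceptions_path, newline="") as f:
        flagged = {row["word"] for row in csv.DictReader(f)
                   if row["zero_out"].strip().lower() == "y"}
    return {w: (0.0 if w in flagged else s) for w, s in inferred.items()}
```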

  • ma2rten 6 years ago

    This approach only works if you use OP's assumption that a text's sentiment is the average of its words' sentiments. That assumption is obviously flawed (e.g. "The movie was not boring at all" would have negative sentiment).

    Making this assumption is fine in some cases (for example if you don't have training data for your domain), but if you build a classifier based on this assumption, why not just use an off-the-shelf sentiment lexicon? Do you really need to assign a sentiment to every noun known to mankind? I doubt that this improves the classification results, regardless of the bias problem.
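    As a toy illustration of that averaging assumption (the lexicon values below are invented; any real word-level lexicon would behave similarly), the negation example fails exactly as described:

    ```python
    # "Sentence sentiment = mean of word sentiments", in its crudest form.
    word_sentiment = {"the": 0.0, "movie": 0.0, "was": 0.0, "not": -0.3,
                      "boring": -0.8, "at": 0.0, "all": 0.0}

    def naive_sentiment(text):
        words = text.lower().split()
        return sum(word_sentiment.get(w, 0.0) for w in words) / len(words)

    print(naive_sentiment("The movie was not boring at all"))  # about -0.16: negative
    ```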

    • jakelazaroff 6 years ago

      Sure, it's flawed, but that's the point of the post: that assumptions about your dataset can lead to unexpected forms of bias.

      > Do you really need to assign a sentiment to every noun known to mankind?

      No, but it seems like a simple (and seemingly innocuous) mistake that many programmers can and will make.

      • ma2rten 6 years ago

        I was just trying to explain in this comment why I think the human moderation solution is solving the wrong problem.

  • swingline-747 6 years ago

    Heck, it's so important that it needs people with detail orientation and solid judgement, because crowdsourcing (i.e. populism) may not be the best source of ethical mooring against Godwin's law.

    • User23 6 years ago

      The old Wise and Benevolent Philosopher King model of governance applied to machine learning?

  • rhizome 6 years ago

    Another point in favor of having moderators.

gwern 6 years ago

> There is no trade-off. Note that the accuracy of sentiment prediction went up when we switched to ConceptNet Numberbatch. Some people expect that fighting algorithmic racism is going to come with some sort of trade-off. There’s no trade-off here. You can have data that’s better and less racist. You can have data that’s better because it’s less racist. There was never anything “accurate” about the overt racism that word2vec and GloVe learned.

The big conclusion here after all that code buildup does not logically follow. All it shows is that one new word embedding, trained by completely different people for different purposes with different methods on different data using much fancier semantic structures, outperforms (by a small and likely non-statistically-significant degree) an older word embedding (which is not even the best such word embedding from its batch, apparently, given the choice to not use 840B). It is entirely possible that the new word embedding, trained the same minus the anti-bias tweaks, would have had still superior results.

  • ma2rten 6 years ago

    I also disagree with the conclusion, but for a different reason. I think it's unlikely that the word embeddings were just lower quality. That should result in noise, not bias.

    I think there is a real statistical pattern in the training data: names associated with certain ethnicities are more likely to appear close to words with negative sentiment. I just don't think this necessarily means that the news is racist. I think more analysis is needed to see where this pattern comes from.

    However, if it is true that the news is biased and racist in a quantifiable way, that would be a bigger problem than biased word vectors. I would genuinely be interested in seeing that type of analysis.

    • bo1024 6 years ago

      Note though that "the news is racist" is different from "the model we learned (from the news) is racist". Maybe the first can be false while the second is true.

  • skybrian 6 years ago

    I think you're reading this statement as more general than it's meant to be? I interpret it as meaning that there is not necessarily any tradeoff, as there wasn't in this case. "You can have data" -> there exists.

    • gwern 6 years ago

      > I interpret it as meaning that there is not necessarily any tradeoff, as there wasn't in this case.

      They haven't shown that there is no tradeoff, either in general or in this case.

    • guywhocodes 6 years ago

      Is there anyone who thinks that the current level of racism is required for the current accuracy? I can't imagine people that racist being common in the data community.

      • AnthonyMouse 6 years ago

        > Is there anyone who thinks that the current level of racism is required for the current accuracy? I can't imagine people that racist to be common in the data community

        It depends on two things. The first is how you're defining racism. If the algorithm is predicting that 10% of white people and 30% of black people will do X, because that is what actually happens, some people will still call that racism but there is no possible way to change it without reducing accuracy.

        If the algorithm is predicting that 8% of white people and 35% of black people will do X even though the actual numbers are 10% and 30%, then the algorithm has a racial bias and it is possible to both reduce racism and increase accuracy. But it's also still possible to do the opposite.

        One way to get the algorithm to predict closer to 10% and 30% is to get better data, e.g. take into account more factors that represent the actual cause of the disparity and just happen to correlate with race, so factoring them out reduces the bias and improves accuracy in general.

        The other way is to anchor a pivot on race and push on it until you get the results you want, which will significantly harm accuracy in various subtle and not so subtle ways all over the spectrum because what you're really doing is fudging the numbers.

        • nnnnnande 6 years ago

          "If the algorithm is predicting that 10% of white people and 30% of black people will do X, because that is what actually happens, some people will still call that racism but there is no possible way to change it without reducing accuracy."

          What is actually happening? Does it tell you whether they are doing X precisely because they are black or white? The racist part might not be the numbers per se, but the conclusion that the color of their skin has anything to do with their respective choices.

          edit: spelling

          • TeMPOraL 6 years ago

            ML is spitting out correlations, not an explicit causal model. If, in reality, X is only indirectly and accidentally correlated with race, but I look at the ML result and conclude the skin color has something to do with X, then the only racist element in the whole system is me.

            • nnnnnande 6 years ago

              Agreed. That was the point I was trying to get at, albeit I might not have phrased it as clearly.

lalaland1125 6 years ago

> Some people expect that fighting algorithmic racism is going to come with some sort of trade-off.

Um, that's because we know it comes with trade-offs once you have the optimal algorithm. See for instance https://arxiv.org/pdf/1610.02413.pdf. If your best-performing algorithm is "racist" (for some definition of "racist"), you are mathematically forced to make trade-offs if you want to eliminate that "racism".

Of course, defining "racism" itself gets extremely tricky because many definitions of racism are mutually contradictory (https://arxiv.org/pdf/1609.05807.pdf).

  • ma2rten 6 years ago

    Not necessarily. In the case of word vectors we are using unsupervised learning to identify patterns in a large corpus of data to improve the learning. This is a completely different issue than your credit score example, which is supervised learning.

    Not all patterns are equally useful. By removing the less useful ones we might make fewer mistakes (for example, giving negative sentiment to a Mexican restaurant review) and free up capacity in the word vectors to store more useful patterns. I would expect that baking other real-world assumptions unrelated to bias into your word vectors could also be helpful.

  • dan-robertson 6 years ago

    > If your best performing algorithm is racist

    There are two ways to look at this:

    1. Racism makes the algorithm good, so we should either make the algorithm less racist (at a cost to its performance) or decide we want to allow systematic racism.

    2. The metric for how good the algorithm is (i.e. the training data) encourages it to be racist, so correcting the bias may decrease its performance on the training data, but may not affect its performance in the real world, or its performance under a “performance + meets legal requirements” metric.

paradite 6 years ago

To oversimplify, I think the training set is something like:

Italian restaurant is good.

Chinese restaurant is good.

Chinese government is bad.

Mexican restaurant is good.

Mexican drug dealers are bad.

Mexican illegal immigrants are bad.

And hence the word vector works as expected and the sentiment result follows.

Update:

To confirm my suspicion, I tried out an online demo to check distance between words in a trained word embedding model using word2vec:

http://bionlp-www.utu.fi/wv_demo/

Here is an example output I got with Finnish 4B model (probably a bad choice since it is not English):

italian, bad: 0.18492977

chinese, bad: 0.5144626

mexican, bad: 0.3288326

Same pairs with Google News model:

italian, bad: 0.09307841

chinese, bad: 0.19638279

mexican, bad: 0.16298543
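If you'd rather reproduce this kind of check locally than through the demo, here is a minimal sketch with gensim (assuming gensim ≥ 4; "word2vec-google-news-300" is gensim's identifier for the Google News vectors and is a large download). Keep in mind these numbers are raw cosine similarities, i.e. association strengths, not sentiment scores, so they are only a rough proxy for the hypothesis above.

```python
import gensim.downloader as api

# Downloads roughly 1.6 GB on first use; returns a KeyedVectors object.
model = api.load("word2vec-google-news-300")

for word in ["Italian", "Chinese", "Mexican"]:
    if word in model.key_to_index:  # the Google News model is case-sensitive
        print(word, "vs. bad:", model.similarity(word, "bad"))
```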

EB66 6 years ago

Just thinking out loud here...

It seems to me that if you wanted to root out sentiment bias in this type of algorithm, then you would need to adjust your baseline word embeddings dataset until you have sentiment scores for the words "Italian", "British", "Chinese", "Mexican", "African", etc that are roughly equal, without changing the sentiment scores for all other words. That being said, I have no idea how you'd approach such a task...

I don't think you could ever get equal sentiment scores for "black" and "white" without biasing the dataset in such a manner that it would be rendered invalid for other scenarios (e.g., giving a "dark black alley" a higher sentiment than it would otherwise have). "Black" and "white" is a more difficult situation because the words have different meanings outside of race/ethnicity.

  • rossdavidh 6 years ago

    I think I would agree. You otherwise run the risk of having fixed the metric ("Italian" vs. "Mexican", "Chad" vs. "Shaniqua", etc.) without actually fixing the underlying issue.

    Also, regarding black/white etc., there might legitimately be words which have so many different meanings (whether race-related or not) that you should just exclude them from sentiment analysis. "Right" can mean like "human rights", "right thing to do", or "not left". Probably plenty of other words like that. You might do better to have a list of 100-200 words that are just excluded because of issues like that.

    • taneq 6 years ago

      > there might legitimately be words which have so many different meanings

      I haven't studied word embeddings past the pop-sci level but wouldn't such words form multiple clusters in the embedding space? I would have thought it would be relatively easy to get different 'words' for 'right (entitlement)', 'right (direction)', etc?

      Edit: Nibling post answers this question.

    • acpetrov 6 years ago

      Would it be worth trying to think of words with different meanings as entirely new words? So, "white" in one sentence may be a different word than "white" in another?

      • visarga 6 years ago

        There's a long list of papers on that - 'multi-sense word embeddings'. But more recently we have found that passing the raw character embeddings through a two-layer BiLSTM will resolve the ambiguity of meaning from context - 'ELMo'.

        https://arxiv.org/abs/1802.05365 (state of the art)

  • mattkrause 6 years ago

    Does “a dark black alley” have a sentiment at all?

    I would argue that it’s pragmatically associated with bad things (e.g., being mugged, overcrowded areas) but it’s not intrinsically bad (or good) itself.

    • grandmczeb 6 years ago

      > associated with bad things

      Is that not what's meant by sentiment?

      • mattkrause 6 years ago

        My intuition is that word-level sentiment is rather pointless. “The Disaster Artist was not bad” has a positive sentiment overall, but each of the individual words, except possibly ‘artist’, is usually thought to be negative. Moreover, you can totally flip the overall sentiment by adding another neutralish word: “The Disaster Artist was not even bad.”

        Similarly, my guess is that alley is rarely found in a positive context, but the actual sentiment comes from elsewhere in the utterance.

        • TheCoelacanth 6 years ago

          Word-level sentiment is like spherical cows in a vacuum in physics. Everyone knows it's an extremely flawed model, but it produces good results in a lot of scenarios, so it will inevitably be used, because it also has the enormous benefit of simplicity.

        • monochromatic 6 years ago

          This article is about a simple model. Within that model, it absolutely makes sense for “dark black alley” to get a negative score.

          • mattkrause 6 years ago

            It certainly gets a sentiment score, but whether that score is in any way meaningful or corresponds to actual human sentiment is important. Otherwise, you’re just playing stupid games, and winning stupid prizes...though I suppose just stupid is a step up from stupid and racist.

k__ 6 years ago

Does this mean the text examples the AI learns from are biased and as such it learns to be biased too?

So it's not giving us objective decisions, but a mirror. Not so bad either.

  • kibwen 6 years ago

    Yes, and it's pretty scary how many technologists seem to be surprised by this. If we train bots using data derived from humans, the expectation is that they will inherit biases from humans. There's nothing about a silicon brain that automatically bestows perfect objectivity, only perfect obedience.

    • sidr 6 years ago

      I struggle to think of a single person with the faintest understanding of what machine learning algorithms are being surprised by this. Who are these "technologists" you're speaking of?

      • anonthrowaway2 6 years ago

        Almost everyone I know with at least a faint understanding of ML is surprised by models picking up racism etc when there was zero intent to do so, because of systemic racism etc in available data. Or at least surprised by how much can be picked up. You're bubbled if no one you know is surprised.

        • sonnyblarney 6 years ago

          "because of systemic racism"

          Sometimes data might be 'racist' (i.e. human written corpus text)... but sometimes data is just data.

          Are facts racist?

          It would seem the world is rather diverse, i.e. 'people are different', and as we are different, AI is going to pick up on that. That's the whole point.

          Now, there are some bad cases, like in this example, where positive/negative inferences are taken the wrong way. Or actual systemic racism shows up in bad ways, e.g. some groups are more likely to be monitored than others, and so show up more frequently in negative terms, etc.

        • User23 6 years ago

          Why is this surprising? ML models are just recognizers and bias on the basis of ancestry is observable in all human cultures at all times.

          If we nobly insist that the models describe the world as we wish it were and ought to be, then we won't be describing the data accurately. Maybe that trade-off is worthwhile if it somehow reforms human attitudes along lines we find more agreeable?

        • int_19h 6 years ago

          Conversely, almost everyone I know with at least a faint understanding of ML is entirely unsurprised about this.

          Then again, my personal social bubble leans heavily liberal and hard left. And I think that has a lot more to do with it than with how much people understand ML. When you explain this sort of thing to people who have no idea about ML, in very simple terms ("we give the robot the text that humans wrote, so that it can pick up the patterns" etc), they see why it does that very quickly, as well - if their politics makes them aware of bias in general.

        • rossdavidh 6 years ago

          Hmmm...I'm no expert, but my master's thesis topic in the 90's was on neural networks that use R-squared (a measure of correlation), and when I saw the news about Microsoft's chatbot going Nazi, I was not at all surprised. Not saying no one you knew was surprised, but I had "at least a faint understanding of ML", and the primary thing I learned about it was that it learns what's in the data, whether that's the part of the data that you intended it to learn or not.

          • artursapek 6 years ago

            Tay was trolled hard by 4chan; that's why she went hardcore Nazi almost immediately. It was amusing, but not a fair & controlled experiment by any means.

            • pixl97 6 years ago

              The real world is neither fair nor a controlled experiment.

              • TeMPOraL 6 years ago

                Which is why I'm surprised about all this "AI is biased" outrage. A decent algorithm will learn what's in the data. Cast on a wide enough scale, the data is roughly what the world is. If your bot learns from a newspaper corpus, then it learns how the world looks through the lens of news publishing. If news publishing is somewhat racist and your algorithm does not pick up on that, then your algorithm has a bug in it.

                It seems to me like the people writing about how AI is bad because it picks up biases from data are wishing the ML would learn the world as it ought to be. But that's wrong, and that would make such algorithms not useful. ML is meant to learn the world as it is. Which is, as you wrote, neither fair nor a controlled experiment.

                • artursapek 6 years ago

                  Well put. The people complaining about how AI is bad are the same people who push "diversity hires" to try to pretend that the population of software developers is equal parts male/female, and white/black.

      • lukev 6 years ago

        It’s because most tech people have the default position that racism is not really a big deal, an edge case in modern society. That certainly is the message the political center and right are pushing.

        Stuff like this puts the lie to that, though.

        • travisoneill1 6 years ago

          Given that the data showed a massive range by name within the same race and a much smaller skew between different races, couldn't this data be said to support that conclusion?

          Disclaimer: I don't know enough about the data or the algorithm to determine this mathematically but I think worth pointing out. Would have been nice to see some statistical analysis instead of just assuming the charts speak for themselves.

    • colordrops 6 years ago

      This thread reminds me of the nature of political discourse at the moment, for example with regards to political correctness. The loudest and most popular voices are pushing simple fixes to intractable problems, and more sensible voices mentioning the truth of the matter are buried.

    • sonnyblarney 6 years ago

      "There's nothing about a silicon brain that automatically bestows perfect objectivity"

      Systems that are perfectly objective about recognizing patterns will definitely be biased in their predictions; that's the whole point.

    • hannasanarion 6 years ago

      But technologists shouldn't be surprised by this. "Garbage In, Garbage Out" has been a mantra in machine learning since forever.

      • jhbadger 6 years ago

        Indeed. Before there were computers, in fact. In his 1864 autobiography, Charles Babbage wrote:

        "On two occasions I have been asked, — 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' In one case a member of the Upper, and in the other a member of the Lower, House put this question. I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

  • contem 6 years ago

    The data is correct and unbiased. If you ask 100 people around you, they are, on average, more likely to have had a negative burrito experience than a negative pasta experience.

    The learning algorithms are crude and dumb. They will simply fit any data you provide (you choose how many Mexican restaurant reviews you train your sentiment classifier on). Then they count how many times the words "mexican" and "man" and "mexican man" appear with a positive or negative label in the training set, and objectively try to give the best probability for that.

    Current sentiment analyzers are not AI: no common sense, no understanding, no reasoning. We are just rushing to replace looking a job candidate in the eye with running some 1960s logistic regression over their cover letter. Let's hope for their sake they did not manage a Mexican restaurant.
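    As a toy version of that counting (the reviews and labels below are invented purely to show the mechanism, not drawn from any dataset), a word can pick up a negative prior just from how often it co-occurs with negative labels:

    ```python
    from collections import Counter

    # Crude count-based "classifier": P(negative | word) estimated straight from label counts.
    train = [
        ("the mexican place had great tacos", "pos"),
        ("mexican food truck was closed and dirty", "neg"),
        ("terrible service at the mexican restaurant", "neg"),
        ("lovely italian dinner", "pos"),
        ("the italian bistro was wonderful", "pos"),
    ]

    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in train:
        counts[label].update(text.split())

    def p_negative(word, smoothing=1.0):
        neg = counts["neg"][word] + smoothing
        pos = counts["pos"][word] + smoothing
        return neg / (neg + pos)

    print(p_negative("mexican"))  # 0.6: the word itself now carries a negative prior
    print(p_negative("italian"))  # 0.25
    ```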

    • DonHopkins 6 years ago

      The world's largest and most positive burrito experience was so enormous that it had to be photographed from an airplane.

      http://costena.com/famous.html

      On May 3rd, 1997 La Costeña of Mountain View, California created the world's largest burrito. The burrito weighed in at 4,456.3 pounds and was measured at 3,578 feet long. It was created at Rengstorff Park in Mountain View.

  • s73v3r_ 6 years ago

    Well, not so bad if we use the mirror to reflect on ourselves and our biases, and work to negate them. Fairly bad if they're used for recommendations and for rankings.

    • marcus_holmes 6 years ago

      That was my thought too. You can't manage what you can't measure. This is a tool for measuring the amount of racism in our society. It's a good thing, not a bad thing :)

ma2rten 6 years ago

I think that the bias problem they are highlighting is very important. That said, I'm wondering if they really didn't try (like the title suggests) or if they chose this approach on purpose because it highlights the problem.

To explain what happened here: They trained a classifier to predict word sentiment based on a sentiment lexicon. The lexicon would mostly contain words such as adjectives (like awesome, great, ...). They use this to generalize to all words using word vectors.

The way word vectors work is that words that frequently occur together are going to be closer in vector space. So what they have essentially shown is that in common crawl and google news names of people with certain ethnicities are more likely to occur near words with negative sentiment.

However, the sentiment analysis approach they are using amplifies the problem in the worst possible way. They are asking their machine learning model to generalize from training data with emotional words to people's names.
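To make that concrete, here is a minimal sketch of the pipeline as I understand it (not the article's actual code; the ten-word lexicon is a tiny stand-in for a real sentiment lexicon, "glove-wiki-gigaword-100" is just one embedding gensim can download, and gensim ≥ 4 is assumed):

```python
import numpy as np
import gensim.downloader as api
from sklearn.linear_model import LogisticRegression

vectors = api.load("glove-wiki-gigaword-100")

# Stand-in lexicon: 1 = positive, 0 = negative.
lexicon = {"awesome": 1, "great": 1, "wonderful": 1, "excellent": 1, "love": 1,
           "terrible": 0, "awful": 0, "horrible": 0, "worst": 0, "hate": 0}

words = [w for w in lexicon if w in vectors.key_to_index]
X = np.array([vectors[w] for w in words])
y = np.array([lexicon[w] for w in words])

clf = LogisticRegression().fit(X, y)

# The classifier will now happily score *any* word that has a vector,
# including first names it was never meant to judge.
for name in ["emily", "shaniqua"]:
    if name in vectors.key_to_index:
        print(name, clf.predict_proba(vectors[name].reshape(1, -1))[0, 1])
```

The generalization step is exactly where the problem gets amplified: once the classifier is trained on emotional words, nothing stops you from feeding it the vector of a name, and the corpus neighborhood of that name decides the answer.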

  • int_19h 6 years ago

    I think the point is that they did what's commonly done in real world machine learning. It's no surprise that it's flawed - but that flawed stuff is actually being used all over the place.

  • visarga 6 years ago

    They could have tried to have a dataset of bias triples (A in relation to C is like B in relation to C), and minimise the score on that by adding it to the loss function, so the model trains with minimal bias.
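    One way to read that, sketched in PyTorch below (the vocabulary, the single triple, and the 0.1 weighting are purely illustrative; nothing here comes from the article): penalize any gap between how A relates to C and how B relates to C, and add that penalty to the usual training loss.

    ```python
    import torch
    import torch.nn.functional as F

    def bias_penalty(embedding, triples, vocab):
        # For each (a, b, c): penalize the gap between sim(a, c) and sim(b, c).
        penalty = torch.tensor(0.0)
        for a, b, c in triples:
            va, vb, vc = (embedding(torch.tensor(vocab[w])) for w in (a, b, c))
            penalty = penalty + (F.cosine_similarity(va, vc, dim=0)
                                 - F.cosine_similarity(vb, vc, dim=0)) ** 2
        return penalty

    vocab = {"mexican": 0, "italian": 1, "restaurant": 2}
    embedding = torch.nn.Embedding(len(vocab), 50)
    triples = [("mexican", "italian", "restaurant")]

    task_loss = torch.tensor(0.0)  # placeholder for the real word-vector objective
    total_loss = task_loss + 0.1 * bias_penalty(embedding, triples, vocab)
    total_loss.backward()
    ```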

User23 6 years ago

It would be interesting to use the Uber/Lyft dataset of driver and passenger ratings to do an analysis like this.

For any such analysis there are a great many confounds, both blatant and subtle. Finding racism everywhere could be because overt racism is everywhere, or it could be confirmation bias. It could even be both! That's the tricky thing about confirmation bias—one never knows when one is experiencing it, at least not at the time.

travisoneill1 6 years ago

I've heard a lot about racism in AI, but looking at the distributions of sentiment score by name, a member of any race would rationally be more worried about simply having the wrong name. Has there been any work done on that?

  • joatmon-snoo 6 years ago

    This is a pretty well known study: http://www.nber.org/digest/sep03/w9873.html

    • travisoneill1 6 years ago

      I mean name within the same race. The range of the racial averages was about 3, but the range across names within a race was around 10. I don't know how significant the results for individual names are, but I was very surprised by that result.

practice9 6 years ago

> fighting algorithmic racism

Reminds me of how Google Photos couldn't differentiate between a black person & a monkey, so they've excluded that term from search altogether.

While the endeavour itself is good, fixes are sometimes hilariously bad or biased (untrue)

  • ggreer 6 years ago

    > Reminds me of how Google Photos couldn't differentiate between a black person & a monkey, so they've excluded that term from search altogether.

    Technically that is what happened, but it paints an incorrect picture in people's minds. Out of the billions of images that Google Photos had auto-tagged, it tagged one picture of two black people as "gorillas".[1] This was probably the first time this had ever happened. (If it had happened before, it surely would have been spread far and wide by social media & the press.)

    So Google's classifier was inaccurate 0.0000001% of the time, but the PR was so bad that Google "fixed" the issue by blacklisting certain tags (monkey, gorilla, etc). If you take photos of monkeys, you'll have to tag them yourself.

    I'm sure Google could do better, but the standard required to avoid a PR disaster is impossible to meet. If the classifier isn't perfect forever, they're guaranteed to draw outrage.

    1. https://twitter.com/jackyalcine/status/615329515909156865

    • mediumdeviation 6 years ago

      Our expectations of our algorithms are based on human performance. A human would never tag a black person as a gorilla, or vice versa, and if someone did it even once we could pretty safely conclude they're either extraordinarily incompetent or racist, and in either case we wouldn't trust any tagging done by such a human.

    • forapurpose 6 years ago

      > This was probably the first time this had ever happened. (If it had happened before, it surely would have been spread far and wide by social media & the press.)

      That is a very big leap. Social media might be widespread, but almost everything in the world goes unremarked upon. Think of all the news stories that turn up an old tweet or Facebook post that, if anyone had paid attention at the time, would have stopped events from progressing.

  • tedivm 6 years ago

    There's a difference between a short-term hack and a real fix. The real solution was for them to train their model on more pictures of black people.

    • webspiderus 6 years ago

      The research fix may have been to train their data better but the blacklist of bad terms was as real of a product fix as it gets.

      • Consultant32452 6 years ago

        One of the things that amuses me is trying to find racist/sexist google search results. Here's a few:

        I remember a while back Google got flak because the image search for "scientist" was almost entirely famous African American scientists. That's now changed and shows stock images of (mostly white) people in lab coats.

        "Three black teenagers" shows mostly groups of mugshots.

        The word "Brazilian" shows hot, almost nude women. "German" shows the flag. "Portuguese" shows maps, flags, and a lot of normal looking people. "Hispanic" all pictures are normal looking people.

        • TangoTrotFox 6 years ago

          Seeing images that would be 'racist' or 'sexist' is reflective of you, not the results. For instance if you search for 'white man and white woman' you'll find almost exclusively pictures of interracial couples. Is it some conspiracy to push interracial relations onto people? People of a different bias would say so, and it's equally ridiculous. In reality the simple matter is that Google's search is still extremely primitive and the results are mediocre at best. So you can easily break the search when searching for anything that cannot be trivially mapped to a direct text mapping such as e.g. Justin Bieber or Abraham Lincoln.

          For instance search for 'green circle' - okay you get mostly green circles. Now search for 'green circle with red line' and the results are completely nonsensical. The huge leap forward in search engines was being able to avoid returning hardcore porn when searching for Abraham Lincoln. But in spite of tens of thousands of engineers, hundreds of billions of dollars in revenue, and all sorts of fancy declarations of ultra sophisticated AI solving every problem under the sun, we really haven't moved that far beyond that early milestone.

          • Consultant32452 6 years ago

            Yeah, I didn't mean to suggest I think the AI/search results are actually racist/sexist. If I really believed that, I wouldn't find it amusing. As you suggest, it's an amusing anecdote which shows how much farther we have to go with regards to getting ML/AI/search right.

        • justtopost 6 years ago

          'Brazilian' has other meanings you may be unaware of... namely, it being the name for a bikini wax. Almost-nude women are beyond expected in this case. Just another example of how complex these things are, linguistically and culturally.

          • Consultant32452 6 years ago

            I would recommend checking out the google image search for "Brazilian wax" to see what comes up for that. It's not a bunch of hot models in bikinis.

            • int_19h 6 years ago

              I would recommend looking into why people do Brazilian waxing, and particularly how it relates to bikinis.

              • Consultant32452 6 years ago

                I'm not suggesting that models are an illegitimate way of representing the word "Brazilian." There's normal Brazilian people, Brazilian monuments, maybe the flag, the relatively "clinical" pictures that come up with "Brazilian wax", and of course models. The fact that all of the results for "Brazilian" are only in one of those categories shows a bias that I find amusing.

  • webspiderus 6 years ago

    Well, to be fair, they excluded high tens / low hundreds of potentially offensive terms from search before even launching, and when this came out they just extended the list a little. Sometimes having product vision requires recognizing that the products you build come with limitations and the potential for very real emotional reactions from very real human users.

  • on_and_off 6 years ago

    I believe it was gorilla, not monkey, and I understand Google not wanting its product to randomly call people animal names, especially when they are part of a group where that is far too common.

js8 6 years ago

Maybe, you know, humans are simply not Chinese rooms.

Recently there was an article about recognition of bullshit: https://news.ycombinator.com/item?id=17764348

To me the article brought great insight - I realized that humans do not just pattern match. They also seek understanding, which I would define as an ability to give a representative example.

It is possible to give somebody a set described by arbitrarily complex conditions while the set itself is empty. Take any satisfiability problem (SAT) with no solution - this is a set of conditions on variables, yet there is no global solution to these.

So if you were a Chinese room and I trained you on SAT problems by pure pattern matching, you would be willing to give solutions to unsatisfiable instances. It is only when you actually understand the meaning behind the conditions that you can recognize that these arbitrarily complex inputs are in fact just empty sets.

So perhaps that's the flaw with our algorithms: there is no notion of "I understand the input". Perhaps it is understandable, because understanding (per above) might as well be NP-hard.

  • int_19h 6 years ago

    Humans can do more than pattern-match. But they often just pattern-match anyway, because it's far easier and quicker, and doing more than that for all the brief day-to-day interactions is virtually impossible.

    So at some point you need to decide when you pattern-match and accept the result for granted, and when you decide to dig into it further to understand why the pattern matched the way it did, and whether it's relevant. But that is itself a choice, and it's also going to be biased (for example, towards people you personally know, and against random strangers).

  • adrianN 6 years ago

    There is no indication that brains are better at solving NP hard problems than computers.

    • js8 6 years ago

      That is not my argument at all. What I argue is that brains attempt to resolve the problem, while computers (when they pattern match in a typical ML algorithm) do not.

      It is possible that the brain has specialized circuits to solve small instances of SAT, and it just gives up on large enough instances. I am sure you know the feeling that you get when you understand something - it's very much like the pieces of a puzzle suddenly fitting perfectly together.

elihu 6 years ago

This is an interesting result:

> Note that the accuracy of sentiment prediction went up when we switched to ConceptNet Numberbatch.

> Some people expect that fighting algorithmic racism is going to come with some sort of trade-off. There’s no trade-off here. You can have data that’s better and less racist. You can have data that’s better because it’s less racist. There was never anything “accurate” about the overt racism that word2vec and GloVe learned.

I wonder if this could be extended to individual names that have strong connotations with people because of the fame of some particular person, like "Barack", "Hillary", "Donald", "Vladimir", or "Adolf", or if removing that sort of bias is just too much to expect from a sentiment analysis algorithm.

abenedic 6 years ago

Where I grew up, there is a majority group with fair skin, later (possibly incorrectly) attributed to the fact that they worked in the fields less. The minority group is darker skinned. If you train any reasonable machine learning model on any financial data, it will pick up on the discrepancy. If it did not, I would say it is a flawed model. But that is more a sign that people should avoid such models.

gumby 6 years ago

Please add 2017 to title

b6 6 years ago

How to make a program that does what you asked it to do, and then add arbitrary fudge factors as the notion strikes you to "correct" for the bogeyman of bias.

Suppose sentiment for the name Tyrel was better than for Adolf. Would that indicate anti-white bias? Suppose the name Osama has really poor sentiment. What fudge factor do you add there to correct for possible anti-Muslim bias? Suppose Little Richard and Elton John don't have equal sentiment. Is the lower one because Little Richard is black, or because Elton John is gay?

What we have been seeing lately is an effort to take unmeasurable bias that is simply assumed to exist and to be unjust, and replace it with real bias, encoded in our laws and practices, or in this case, in actual code.

whiddershins 6 years ago

I feel like the author is heavily biased to believe society is biased, which is muddling the entire point of the article.

If Mexican restaurants tend to get lower user ratings, perhaps it’s not because people/society are biased against Mexican restaurants. There are so many other possible reasons, I can’t begin to speculate.

  • Semiapies 6 years ago

    RTFA. It has nothing to do with user ratings, but a direct calculation on the phrases "Mexican restaurant", "Italian restaurant", and "Chinese restaurant" based on the corpus of material.

    Go further and follow the links. This example is specifically covered in the linked material. "Mexican" picks up a negative association from the corpus containing frequent mention of "illegal" (listed as a negative term) in close proximity to "Mexican", so the phrase "Mexican restaurant" gets rated less favorably than "Chinese restaurant".

    The underlying problem is that we're throwing text at math and pretending that we're building things that understand anything more than word proximity. Human beings that can actually understand context can be horribly biased; software that doesn't have the slightest inkling of context will produce twisted versions of our own biases.

  • burlesona 6 years ago

    Hmm I didn't get that from the article at all. It felt to me like the author was showing a relatively straightforward explanation of building a system that does not intuitively have racial bias, and then demonstrating that in practice it does anyway, which is kind of surprising.

    • tedivm 6 years ago

      There was a great discussion at Google Next about this, and it's something my company also thinks a lot about. There are so many ways that bias can creep into algorithms, and it's really important to validate your algorithms against this.

      As one non-racial example, there was a pneumonia detection model that got some attention (at least amongst people interested in radiology and deep learning) because the model was detecting the metal badge the hospital was using when taking scans (so every x-ray from this hospital had a similar graphic on it). This hospital had a higher pneumonia rate, as it generally had older people going to it, and because of this the model started giving everyone who went to that hospital a higher likelihood of having pneumonia. So essentially it picked up on the age-based bias of the hospital itself.

      This is an issue in other types of medical data as well. There have been multiple studies that show doctors are less likely to successfully diagnose a heart attack in women than in men, and that doctors will also underestimate how much pain a black man is in compared to any other category. Those biases end up in the data itself, so if you naively train your models against that data they too will make similar mistakes.

    • rundell1x 6 years ago

      It's only surprising if you think racism does not have a basis in reality.

      • swingline-747 6 years ago

        An unfortunate fact: there are infinitely many kinds and degrees of bias hard-coded in human nature. They cannot be suddenly wished, willed, educated or technologied away, only imperfectly, temporarily overcome with conscious, uphill vigilance.

        Anyone who says they're impartial or unbiased, or that justice is blind or racism is over, is lying to you and themselves, because such statements may be laudable goals but are unobtainium virtues.

      • tedivm 6 years ago

        The fact that this comment was [dead] but someone felt the need to unbury it says a hell of a lot about the state of this site.

        • dang 6 years ago

          I looked in the logs. That comment was never [dead] and then unkilled, so I'm not sure what you were seeing?

          In any case, you posted this 5 minutes after the comment appeared. That's not enough time to conclude anything about HN, other than that trolls sometimes post here, like they do everywhere on the public internet. It takes time for the community systems (voting, flagging, and moderation) to take effect. It would be nice if we could make them work instantly, but we don't know how to do that. As you can see, the comment is now flag-killed. No single comment can say "a hell of a lot" about a site where 8000 comments appear daily, but since the community functioned as intended in the end, we can count this data point in its favor.

          But there's a more important point. Since you're participating here, you're part of the state of the site, and we need you to help take care of it. In the present case that means flagging an egregious comment, instead of feeding it by replying, which only makes the site worse. This is in the guidelines: https://news.ycombinator.com/newsguidelines.html. Had you contributed a flag, you would have helped take care of the problem sooner. Fortunately other users came along and did, thereby improving the state of the site. If you'd be willing to help out like that in the future, we'd appreciate it. But in any case, please don't feed the trolls on HN. The more you do, the more responsibility for the flamewar accrues to you, regardless of how right you are. This is ancient internet wisdom for good reason.

          To flag a comment, click on its timestamp to go to its page, then click 'flag' at the top. There's a small karma threshold, currently 30, before flag links appear. This is in the FAQ: https://news.ycombinator.com/newsfaq.html

          • tedivm 6 years ago

            I had no idea that I could flag comments- I assumed that since there was no link to do so I couldn't, not realizing that I had to go to the comment page itself to do so.

            Would it be possible to clean this UI up a bit? I bet I'm not the only user who didn't realize they could do this.

        • rundell1x 6 years ago

          If you feel like I'm wrong how about you explain further instead of complaining that my comment wasn't removed? I wish this site would force you to comment before you downvote or flag.

          • tedivm 6 years ago

            All you said is that racism has a basis in reality - you didn't give anything specific to argue against. Why don't you let us all know what issues you think get labelled as "racism" that have a basis in reality? Tell us what racist things you believe if you're looking for people to argue against it.

            • rundell1x 6 years ago

              I didn't want to explain and I won't bother with it because I know this site is full of people like you and I'm not going to write a few paragraphs of text just to see it all removed in a matter of minutes.

              But if you feel I am wrong, or if you feel I should be more specific, you should've said that instead of just wishing my comment never existed.

              EDIT: aaaand a mod collapsed the entire thread. So predictable :-)

              • tedivm 6 years ago

                I don't think there's any issue with wishing that racism - and the racists who push it - didn't exist.

              • striking 6 years ago

                >I won't bother with it because I know this site is full of people like you

                >a mod collapsed the entire thread. So predictable

                Imagine that, a mod collapsing a thread that nothing good will come of. Maybe you had a point somewhere, but you definitely don't now.

  • tedivm 6 years ago

    If you actually read the whole article there are other examples as well, such as black names being considered more negative.

    • 706f6f70 6 years ago

      Commenting specifically on the black names example: I am skeptical because they replicated the results of a study on humans based on the Implicit Association Test. I would be generous in calling that the Myers-Briggs of this generation, but it's probably even less valuable, other than that it's free to take online, so a huge data set is easily available.

      • Ar-Curunir 6 years ago

        Just because humans are biased doesn't mean we have to make our algorithms biased too. This is especially important now that AI and ML are starting to be deployed in non-CS cases.

        • swingline-747 6 years ago

          Sorry to disappoint, but that's impossible.

          Socialization is an ambiguous, imprecise activity that demands infinitely many judgements, inferences, and preconceived rules & goals.

          What can be interpreted as "jerk" behavior can also be caused by "dork" behavior, and vice-versa.

          A joke at someone's expense might be expressing affinity for them or dissing them.

          Adopting ebonics speech and tone may be in-group acceptable in one context, but insulting in another.

          Ultimately, the algorithms need enough trial-and-error socializing experience to become, for lack of a better term, cool.

      • disgruntledphd2 6 years ago

        Like, the IAT has problems, but it has been shown to have some predictive validity in behaviour. I am entirely unsurprised that the IAT finding replicated in this data, and it actually supports the structural racism explanation propounded by one of the measure's harshest critics.

        To be fair to the IAT, while it has very low test-retest reliability and lots and lots of state variance, there's definitely something in the reaction time differences that is meaningful. How meaningful it is, and whether or not it makes any practical difference remains to be determined.

        Anyone who hasn't done one, try this link and see how you feel during the process: https://implicit.harvard.edu/implicit/takeatest.html

        And at least there wasn't a massive profit motive behind the IAT, at the beginning anyway :)

      • microcolonel 6 years ago

        I fear you are too generous about the IAT. The results are like noise, even when taken by the same person repeatedly with different data.

  • incompatible 6 years ago

    Judging an individual by membership of categories always introduces bias. Even if Mexican restaurants are for some reason worse on average, say 9/10 are bad, then assuming that a particular restaurant is bad because it's Mexican is biased. Sure, there's a 9/10 chance that it's bad, but it's unfair to treat it as bad without any other evidence.

    Insurance companies do this sort of thing all the time.

    • thecabinet 6 years ago

      No, it’s not unfair to treat it as bad without any other evidence. When all you know is “Mexican restaurant” you judge by that. You can’t live your life only making judgements once you have “all” the facts, as if that’s even possible. There seems to be this unspoken assumption that the thought process must be stereotypes leading to death camps. It is possible to say, “Based on my life experiences thus far I do not enjoy the company of black people/Mexican food/whatever” without thinking “and therefore we should kill all those people”.

      • incompatible 6 years ago

        The bias may be justified to you (since you just want to get a good meal somewhere and don't want a 9/10 chance that it's bad), but it's still unfair to the perfectly good Mexican restaurants that you won't eat at.

        • thecabinet 6 years ago

          Fair is a word for children. It’s unfair to the Italian restaurant that you’re eating Mexican. It’s unfair to the grocer that you’re eating at a restaurant. It’s unfair that you’re buying groceries instead of seeds. As adults have told children for thousands of years, life isn’t fair.

          • int_19h 6 years ago

            Ethologists disagree - adults very much care about fairness, as well. Arguably, most of our politics is about that.

            • thecabinet 6 years ago

              Sure, but much of the fairness we argue about as adults is due to fundamental disagreements about morality or how the world should work. “It’s not fair that tax cuts benefit the wealthy.” Republicans think it’s not fair to take money from people who earned it. Democrats think it’s not fair for some people to have more wealth than they’ll ever use while others have so little. But the concept of “fairness” doesn’t do anything to help us resolve that disagreement.

              • int_19h 6 years ago

                But at the same time, there are some aspects of fairness that do appear to be innate - as in, they're observed in very small children regardless of the culture they're from, in experiments where they're asked to share (or not share) something that they have, or assess how someone else shared theirs. Extreme "wealth inequality" - as defined, say, through the amount of candy each child has - is universally seen as unfair, for example, although you also have to account for parochial altruism. Bonobos also demonstrate similar attitudes.

                So it appears that our evolution as social species has set some hard boundaries. Abstract ideologies can go beyond them, of course, but their real-world success seems to correlate to some extent with how much they do or not - I would argue that ancap definition of "fair" is so unpopular precisely because it's so out-of-bounds wrt our biology.

              • int_19h 6 years ago

                It doesn't need to help us resolve the disagreement to be useful to explain why the disagreement is there in the first place, though.

                And, in practice, there are large groups of people who do share the broad definitions of "fairness", and therefore saying that something is fair or unfair is useful to communicate the idea within those groups. This is not something that people really like to see put quite so explicitly, but when we say something "this should not be so because it is unfair", it carries an implicit "... and I don't care what those who disagree with me about what 'unfair' means think".

          • incompatible 6 years ago

            So racist AIs may be unfair to some people, but it doesn't matter because life is unfair?

  • crooked-v 6 years ago

    Society is biased. Even assuming purely a statistical averaging of outliers, there are, for example, enough open racists in the US that a crowd of them decided that marching through Charlottesville with tiki torches was a good idea.

  • Ar-Curunir 6 years ago

    There are as many shitty pizza/burger joints and diners as there are shitty restaurants of any other kind.

    • Mountain_Skies 6 years ago

      Cilantro is a prominent ingredient in Mexican cuisine. It is also notoriously difficult to clean compared to other produce. Add in that it is prepared and served uncooked and you have a food with a higher likelihood of harboring things that will make you ill. Compare that to something like BBQ that is cooked for extended periods of time and the result isn't bias, it's microbiological reality.

      If Mexican restaurants are cited more for having ill patrons, it might simply be due to the cuisine using preparation methods that lend themselves more easily to distributing contamination than many other cuisines. No bias necessary.

      • forapurpose 6 years ago

        Really? Mexican food is inherently a health risk because of cilantro?

      • thrower123 6 years ago

        Also, for a significant portion of the population, cilantro tastes just like harsh lye soap, and will render anything contaminated with it edible but extremely unappetizing.

    • Z8064k 6 years ago

      A statement like this has no backing at all. And it's exactly what the parent comment was talking about when they said: "a bias toward thinking society is biased."

      We can't just conclude that x is true because it seems more fair that it would be true. The universe isn't necessarily fair and equal.

      We can lie to ourselves and our machine learning models about things like: "there must be an equal number of bad restaurants within each genre of restaurant." But that won't make it true. It will just make our models work less well.

    • manfredo 6 years ago

    Are there? It's not entirely inconceivable that certain themed restaurants are worse than average in a given area. It wouldn't be surprising if seafood restaurants aren't as good in areas far from the coast, since ingredients need to travel a longer distance. Even if we assume that there is no degradation of quality, the added logistical cost would mean that savings need to be made elsewhere in order to provide a meal at the same price point.

    • microcolonel 6 years ago

      Other subjective, anecdotal justifications which come to mind:

      I'm actually pretty sure that, in general, it is easier to make passable burgers, and pizza which is passable to the people who end up consuming it (often intoxicated), than it is to make passable Mexican food of the sort which is worth buying from a professional kitchen.

      The cost, variety, and selection of ingredients, the number of error-prone steps, and general common knowledge all stack up in favour of it being more likely that a pizza or burger joint will satisfy whomever it attracts.

      Now, these points could fail to pass further scrutiny, but I think the point holds: even if there are "as many shitty pizza/burger joints and diners" as there are "shitty restaurants of any other kind", it could be that people are nonetheless more satisfied with "shitty" burger joints than "shitty" taco stands. Another thing which could easily be a fair source of "bias" may be that people have more positive (or fewer negative) social interactions at pizzerias, burger shops, and diners; maybe they're just more familiar to people who are literate in English (the cohort analyzed here).

      The other day I went to a well operated burrito shop with an open kitchen, and was minorly poisoned (thankfully I work from home, so the consequences were more humorous than life-altering). That will surely affect my sentiment toward burrito shops, and I think it is fair that it does. Edit: another reply mentions that handling cilantro is error prone and that it contributes to a higher rate of minor poisoning from Mexican food prepared in commercial kitchens.

      If the goal is to maximize positive sentiment, it is very hard to justify throwing away any signal in that domain, especially one this consistent.

swingline-747 6 years ago

Setting aside blatant shock behaviors... If the other side, the audience, were less sensitive and not looking for the next micro-outrage, wouldn't ML chatbots evolve more pro-social values by positive reinforcement?

It takes two to tango... the average audience's behavior isn't blameless for the impact of its response. Also, how an AI decides to interpret an ambiguous response as desirable or not is really interesting.

  • s73v3r_ 6 years ago

    Blaming the victim for being discriminated against isn't going to help anything.