Machine Learning Fairness


When data teaches a machine learning (ML) model the wrong lessons, developers call it data bias. Data bias is one of the most common causes of ML bias, which can damage brand perception and make years of thoughtful, inclusive messaging seem disingenuous or even manipulative. Bias is machine learning’s original sin. It’s embedded in machine learning’s essence: the system learns from data, and thus is prone to picking up the human biases that the data represents.

Most of the time, machine learning allows brands to be more helpful to customers. The good news is that marketers and cross-functional teams have more power to combat data bias than they think, from implementing routine audits to launching creative crowdsourcing solutions.

When data mirrors societal bias, it imparts a distorted view.

An AI system won’t work unless it can teach its underlying ML model what it needs to know. Typically, an ML pipeline starts with developers collecting and annotating training data for this purpose. They use that data to train an ML model to make educated guesses about the world. The better the data, the better the guess. Problems arise when that data is flawed. Rather than ask for more information, the model will simply accept those flaws as fact, leading to flawed outcomes. “Garbage in, garbage out,” as computer scientists say.

An ML model trained on faulty, incomplete, or homogenous data may exclude potential customers you didn’t know you had, amplify offensive stereotypes, or simply not work as well for some users. Once deployed, a model can also confirm its own biases. Imagine that an ML model is trained on a dataset that doesn’t represent as many women as men, so it learns to associate a certain product with men. As more people see and click on the ads, the ML model learns that men are far more likely to buy the product, creating a feedback loop of unfair bias. The model can’t think critically, so it doesn’t suspect there’s anything wrong. It’s up to the team to observe the model’s behavior and catch the problem.
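To make the feedback loop concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function name, the starting click counts, and the premise that men and women actually click at the same rate. It is a toy expected-value simulation, not a description of how any real ad system works.

```python
def simulate_feedback_loop(men_clicks, women_clicks, rounds,
                           impressions=1000, true_click_rate=0.1):
    """Toy model: ads are shown in proportion to past clicks per group.

    Both groups click at the same true rate (an assumption), so any skew
    in the result comes only from where the ads were shown.
    Returns the men's share of all recorded clicks after each round.
    """
    history = []
    for _ in range(rounds):
        # The "model" believes interest matches the clicks it has seen so far.
        men_share = men_clicks / (men_clicks + women_clicks)
        men_impressions = impressions * men_share
        women_impressions = impressions - men_impressions
        # Expected new clicks: same rate for both groups.
        men_clicks += men_impressions * true_click_rate
        women_clicks += women_impressions * true_click_rate
        history.append(men_clicks / (men_clicks + women_clicks))
    return history


# Hypothetical starting data that under-represents women: 80 vs. 20 clicks.
history = simulate_feedback_loop(men_clicks=80, women_clicks=20, rounds=5)
for round_number, share in enumerate(history, start=1):
    print(f"round {round_number}: men's share of clicks = {share:.2f}")
```

Under these assumptions the men's share of recorded clicks never moves away from the skew it started with, even though both groups are equally interested. The model only gathers evidence where it already chose to look, which is why someone on the team has to check the behavior from outside.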

Diverse teams are best equipped to explore all the potential pitfalls of an algorithm.

Understanding how different communities may be affected by an algorithm is critical. That’s why diverse teams are best equipped to explore all the potential pitfalls of an ML model and its applications. It’s important to ask the tough questions, including how the algorithm’s performance might vary across demographic groups, in order to give your team the opportunity to make changes well before deployment and continually monitor the product.

People vividly recall their firsthand experiences with ML bias.

As part of the study, we talked to individuals who identified as being from underrepresented groups and socioeconomic backgrounds. When asked specifically about online ad targeting, those interviewed described seeing endless “low-income ads” that weren’t helpful or relevant to their situation. They objected strongly to online ads personalized based on demographic characteristics, or based on the online behavior of users who share them.

Vetting data is an important step in preventing ML bias, but it’s not enough. Products should be consistently and proactively monitored, tested, and tuned, because many unforeseen variables can affect a model’s behavior in the real world. Teams may find that certain use cases are never justified, as Google did in 2016 when it banned payday loan ads from its systems. Several people interviewed said the policy changes increased their trust in the brand. When it comes to improving data fairness in marketing, three things are key: training, testing, and transparency. From ensuring the use of diverse training data to educating your team about unconscious bias, there are many steps you can take to build fairness into your product’s DNA.

Be transparent.

Tell people how your algorithm makes decisions. Knowing how your product works — and how well it works across groups — will make people more comfortable using it.

Test, tune, and test again.

Inspect training datasets for bias using a fairness indicator, visualizer, or other tool. Even a widely used dataset might have flaws, so it’s important to review it carefully. Teams should also continue monitoring algorithms after they are released.

Seek different points of view.

Hire people with diverse backgrounds and areas of expertise. Invite the public to share local knowledge. Collaborate with community groups and advocates. A wide range of input makes data more robust.
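As a rough illustration of what inspecting a training dataset for bias can look like, the Python sketch below reports how much of the data each group makes up and how often each group carries a positive label. The function name, the field names, and the ten-row dataset are all made up for this example. A dedicated fairness tool would go much further.

```python
from collections import Counter, defaultdict


def audit_by_group(rows, group_key, label_key):
    """Return {group: (share_of_rows, positive_label_rate)} for a dataset.

    `rows` is a list of dicts; `group_key` and `label_key` name the
    (hypothetical) columns holding the group and the true/false label.
    """
    counts = Counter(row[group_key] for row in rows)
    positives = defaultdict(int)
    for row in rows:
        if row[label_key]:
            positives[row[group_key]] += 1
    total = len(rows)
    return {
        group: (counts[group] / total, positives[group] / counts[group])
        for group in counts
    }


# Made-up toy dataset: 8 rows about men, only 2 about women.
rows = (
    [{"gender": "men", "clicked": True}] * 6
    + [{"gender": "men", "clicked": False}] * 2
    + [{"gender": "women", "clicked": True}] * 1
    + [{"gender": "women", "clicked": False}] * 1
)

report = audit_by_group(rows, "gender", "clicked")
for group, (share, positive_rate) in report.items():
    print(f"{group}: {share:.0%} of rows, {positive_rate:.0%} positive labels")
```

A lopsided share of rows, or a positive rate resting on only a handful of examples, is a signal to review the data more carefully before training on it.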

Cleaning the data so thoroughly that the system will discover no hidden, pernicious correlations can be extraordinarily difficult. Even with the greatest of care, an ML system might find biased patterns so subtle and complex that they hide from the best-intentioned human attention. Hence the necessary current focus among computer scientists, policy makers, and anyone concerned with social justice on how to keep bias out of AI. Yet machine learning’s very nature may also be bringing us to think about fairness in new and productive ways. Our encounters with machine learning (ML) are beginning to give us concepts, a vocabulary, and tools that enable us to address questions of bias and fairness more directly and precisely than before. We have long taken fairness as a moral primitive.

But what constitutes a “relevant distinction”? The fact is that we agree far more easily about what is unfair than what is fair. We may all agree that racial discrimination is wrong, yet sixty years later we’re still arguing about whether Affirmative Action is a fair remedy.

For example, we can all agree that in the 1970s, it was unfair that women musicians made up as little as 5% of the top five symphony orchestras. In this case, we might agree that the actual remedy orchestras instituted seems far fairer: by having applicants audition behind a curtain to mask their gender, the percentage of women in the five top symphony orchestras rose to 25% in 1997, and to 30% now.

But is a gender-blind process enough to make the outcome actually fair? Perhaps cultural biases confer non-biological advantages on male musicians: if more men were accepted to top conservatories, for example, they may have received better musical education. Perhaps standards of performance in music have been shaped over the centuries around typically male traits or preferences, such as palm sizes or the aggressiveness of performance. And is 30% enough for us to declare that the orchestras are now fair in their treatment of women? Perhaps the gender breakdown of musicians should be 51% to mirror the overall national gender demographics? Or perhaps it should reflect the percentage of male and female applicants for seats in the orchestra? Or perhaps higher than that to partially redress the centuries of historical bias that have led to the overrepresentation of men in orchestras? (Not to mention that this entire discussion assumes that gender is binary, which it isn’t.)

Machine learning can help us with these sorts of discussions because it requires us to instruct it in highly precise ways about what sort of outcomes we’ll find ethically acceptable. It gives us the tools to have these discussions — often arguments — in clearer and more productive ways.

Those tools include a vocabulary that arises from machine learning’s most common task: deciding which bin to put a given input into. If the input is a real-time image of a tomato on a conveyor belt in a spaghetti sauce factory, the bins might be labeled “Acceptable” or “Discard.” Each input will be assigned to a bin with a confidence level attached: a 72% certainty that this tomato is edible, for example.

If sorting tomatoes is your system’s basic task, then you’re going to care how many tomatoes get sorted wrong: how many good tomatoes the ML is putting in the Discard pile (missed opportunities), and how many bad tomatoes it’s putting in the Acceptable bin (mistaken approvals). And because the assignments to bins are always based on a confidence level, ML gives its designers sliders to play with to adjust the outcomes to reflect different definitions of fairness.
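The two kinds of sorting error can be counted directly. The Python sketch below is only an illustration: the function name, the eight tomatoes, and their confidence scores are invented, and "accepted" simply means the model's confidence that the tomato is edible meets the chosen threshold.

```python
def count_sorting_errors(tomatoes, threshold):
    """Count both kinds of mistake for a given confidence threshold.

    `tomatoes` is a list of (confidence_edible, actually_good) pairs.
    Returns (good_discarded, bad_accepted).
    """
    good_discarded = 0  # missed opportunities
    bad_accepted = 0    # mistaken approvals
    for confidence, actually_good in tomatoes:
        accepted = confidence >= threshold
        if actually_good and not accepted:
            good_discarded += 1
        elif accepted and not actually_good:
            bad_accepted += 1
    return good_discarded, bad_accepted


# Invented data: the model's confidence, and whether the tomato was really good.
TOMATOES = [
    (0.95, True), (0.85, True), (0.72, True), (0.55, True),
    (0.90, False), (0.65, False), (0.40, False), (0.30, False),
]

for threshold in (0.60, 0.98):
    good_discarded, bad_accepted = count_sorting_errors(TOMATOES, threshold)
    print(f"threshold {threshold:.2f}: "
          f"{good_discarded} good discarded, {bad_accepted} bad accepted")
```

With this toy data, the lenient threshold wastes one good tomato but lets two bad ones through, while the strict threshold lets no bad tomatoes through at the cost of discarding every good one.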

For example, if it’s your tomato factory, you might care most about the overall accuracy of your new ML tomato sorting app. But a regulator may be more concerned about bad tomatoes making it into the Acceptable bin than good tomatoes getting tossed into the Discard bin. Or, if you’re a sleazy tomato factory owner, you may be more upset by throwing out good tomatoes than by including some rotten tomatoes in your sauce.

ML requires us to be completely clear about what we want. If you’re worried about the bad tomatoes making it into your sauce, you’ll have to decide what percentage of bad tomatoes you (and your customers and probably your lawyers) can live with. You can control this percentage by adjusting the confidence level required to put a tomato into the Acceptable bin: do you want to set the threshold confidence level to 98% or lower it to just 60%? As you move that slider to the left or right, you’ll be consigning more good tomatoes to the Discard bin, or putting more bad tomatoes into the Acceptable bin.
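One way to picture the slider is as a search over candidate thresholds for the most lenient one that keeps bad tomatoes under the share you decided you can live with. The sketch below is an assumption-laden toy: the function names, the candidate thresholds, and the tomato data are all hypothetical.

```python
def bad_share_in_accepted(tomatoes, threshold):
    """Fraction of accepted tomatoes that are actually bad."""
    accepted = [good for confidence, good in tomatoes if confidence >= threshold]
    if not accepted:
        return 0.0  # nothing accepted, so nothing bad got through
    return accepted.count(False) / len(accepted)


def lowest_threshold_within(tomatoes, candidates, max_bad_share):
    """Return the lowest candidate threshold that meets the tolerance."""
    for threshold in sorted(candidates):
        if bad_share_in_accepted(tomatoes, threshold) <= max_bad_share:
            return threshold
    return None  # no candidate is strict enough


# Invented (confidence_edible, actually_good) pairs.
SAMPLE = [
    (0.95, True), (0.85, True), (0.72, True), (0.55, True),
    (0.75, False), (0.65, False), (0.40, False), (0.30, False),
]
CANDIDATES = [0.60, 0.70, 0.80, 0.98]

lenient = lowest_threshold_within(SAMPLE, CANDIDATES, max_bad_share=0.25)
strict = lowest_threshold_within(SAMPLE, CANDIDATES, max_bad_share=0.10)
print("tolerate 25% bad:", lenient)
print("tolerate 10% bad:", strict)
```

The code does not decide what tolerance is right. That number has to come from the people, customers, and lawyers mentioned above. It only shows that once the tolerance is stated, the threshold follows from it.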

The types of fairness we’ve discussed here, and more, have also been given precise definitions by researchers in the ML field, with names like “Demographic Parity,” “Predictive Rate Parity,” and “Counterfactual Fairness.” Having them available when talking through these issues with experts can make those discussions go more easily, with more comprehension on all sides of the argument. They don’t tell us what type of fairness to adopt in any situation, but they make it easier for us to have productive arguments about the question.
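Two of these definitions are simple enough to sketch in a few lines of Python. The readings used here are common ones, stated as assumptions: demographic parity compares the rate at which each group is approved, and predictive parity (taken here as what “Predictive Rate Parity” refers to) compares how often an approval turns out to be correct in each group. The function names and the tiny datasets are invented, and counterfactual fairness is left out because it needs a causal model.

```python
def selection_rate(decisions):
    """Share of people who were approved (decisions are 1 or 0)."""
    return sum(decisions) / len(decisions)


def demographic_parity_gap(decisions_by_group):
    """Largest difference in approval rate between any two groups."""
    rates = [selection_rate(d) for d in decisions_by_group.values()]
    return max(rates) - min(rates)


def precision(decisions, outcomes):
    """Among approved people, the share whose outcome was good.

    Assumes at least one person was approved.
    """
    approved_outcomes = [o for d, o in zip(decisions, outcomes) if d]
    return sum(approved_outcomes) / len(approved_outcomes)


# Invented data: 1 = approved / good outcome, 0 = rejected / bad outcome.
decisions = {"group_a": [1, 1, 1, 0], "group_b": [1, 0, 0, 0]}
outcomes = {"group_a": [1, 1, 0, 1], "group_b": [1, 0, 1, 0]}

gap = demographic_parity_gap(decisions)
precision_a = precision(decisions["group_a"], outcomes["group_a"])
precision_b = precision(decisions["group_b"], outcomes["group_b"])

print(f"demographic parity gap: {gap:.2f}")
print(f"precision, group_a: {precision_a:.2f}; group_b: {precision_b:.2f}")
```

In this toy data the two measures disagree about who is disadvantaged: group_b is approved far less often, yet approvals for group_b are more often correct. That is the kind of disagreement the named definitions make easier to argue about.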

This is true at a higher level of abstraction as well, for we get to decide what counts as success for an ML system. For example, we could train our ML loan application sorter to optimize itself for the highest profit for our business. Or for the highest revenues. Or for the maximum number of customers. We could even decide for reasons of economic justice that we want to provide some loans to poorer people, rather than always going for the richest people around. Our ML system should enable us to judge the risk, to adjust the percentage of lower income people we want in the Approved bin, or to set a minimum profitability level for the loans we make.

ML also makes it clear that we can’t always, or even usually, optimize our outcomes for every value we may hold. For example, the loan company may find — in this hypothetical — that admitting more lower-income applicants into the Approved bin affects the percentage of women in that bin. It’s conceivable that you can’t simultaneously optimize the system for both. In such a case, you may well want to find another value you’re willing to modify in order to create outcomes fairer to both low income folks and women. Perhaps if you increase your company’s risk by an acceptable amount, you can accomplish both goals. Machine learning systems give us the levers to make such adjustments and to anticipate their results.
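The hypothetical above can be played out on a handful of made-up loan applicants. In this Python sketch the scores, the cutoffs, and the applicant records are all invented, and the mechanism for admitting more lower-income applicants (a lower score cutoff for them) is only one assumed way of doing it.

```python
def approved_bin(applicants, base_cutoff, low_income_cutoff=None):
    """Return the applicants whose score meets the cutoff that applies to them."""
    if low_income_cutoff is None:
        low_income_cutoff = base_cutoff
    approved = []
    for applicant in applicants:
        cutoff = low_income_cutoff if applicant["low_income"] else base_cutoff
        if applicant["score"] >= cutoff:
            approved.append(applicant)
    return approved


def share(approved, key):
    """Fraction of the approved bin for which `key` is true."""
    return sum(1 for applicant in approved if applicant[key]) / len(approved)


# Invented applicants.
APPLICANTS = [
    {"score": 0.90, "low_income": False, "woman": True},
    {"score": 0.80, "low_income": False, "woman": True},
    {"score": 0.75, "low_income": False, "woman": False},
    {"score": 0.72, "low_income": True, "woman": False},
    {"score": 0.65, "low_income": True, "woman": False},
    {"score": 0.60, "low_income": True, "woman": False},
    {"score": 0.55, "low_income": True, "woman": True},
    {"score": 0.40, "low_income": False, "woman": True},
]

before = approved_bin(APPLICANTS, base_cutoff=0.70)
after = approved_bin(APPLICANTS, base_cutoff=0.70, low_income_cutoff=0.50)

for name, approved in (("before", before), ("after", after)):
    print(f"{name}: low-income share {share(approved, 'low_income'):.2f}, "
          f"women share {share(approved, 'woman'):.2f}")
```

With these particular numbers, lowering the cutoff for lower-income applicants raises their share of the Approved bin while the share of women falls. Different data could easily behave differently. The point is only that adjusting one lever can move a second measure you also care about.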

As we look at higher levels of abstraction — from using sliders to adjust the mixes in the bins, to questions about optimizing possibly inconsistent values — ML is teaching us that fairness is not simple but complex, and that it is not an absolute but a matter of trade-offs. 

The decisions that ML’s helpless literalness requires from us can naturally lead to discussions that sound less like high-minded arguments over morality — or jargon-laden arguments over technology — and more like political arguments among people with different values: Great tomato sauce, or cheap sauce that maximizes our profit? Increase the percentage of female musicians in the orchestra or maintain the current configuration of instruments? Grant loans to lower income folks but perhaps lower the percentage of women in the mix?

If machine learning raises these questions with a new precision, gives us a vocabulary for talking about them, and lets us try out adjustments to see the best ways to optimize the system for the values we care about, then that is a step forward. And if machine learning leads us to talk about remedies to unfair situations in terms of the values we care about, ready to make realistic compromises, then that too is not a bad model for many moral arguments.


I must confess, I am a little out of my element with this subject. I did the best I could to turn a subject that I find confusing into something more easily understandable. I don’t know if I was successful. I do know that this is a scary technology that is ripe for abuse. I know Google and some of the social media sites are blaming it for the censoring that is occurring. However, I believe that if any censoring is occurring, it is because the programmers wrote the code to do this. I don’t think our software is smart enough to think for itself and to learn. I know it will happen eventually, just not yet.

Resources: “A Guide to Machine Learning Fairness”; “How Machine Learning Pushes Us to Define Fairness,” by David Weinberger; “#DELETED: Big Tech’s Battle to Erase the Trump Movement and Steal the Election,” by Breitbart Tech.


Bokhari explained some of the material his inside sources in Facebook, Google, and Twitter have told him — in particular Silicon Valley’s little-known development of the field known as “Machine Learning Fairness,” which aims to blend computer science with the racist, far-left ideology of Critical Race Theory.

Transcript as follows: 

TUCKER: It’s a free country! If you’re over 40 – remember when people used to say that? No-one says that anymore. Silicon Valley is a big part of the reason. Tech oligarchs do whatever they can to censor and humiliate anyone who challenges the approved position on all kinds of topics, the coronavirus, the coronavirus lockdowns, mail-in balloting, George Soros – you can’t criticize him! You’ve seen all that. But what are you not seeing? What are these companies doing internally to affect the way we think and the way we vote? Allum Bokhari has thought a lot about this, he’s written a new book on it called #DELETED: Big Tech’s Battle to Erase the Trump Movement And Steal The Election, we’re glad to have him on tonight. Thanks for joining us. Congrats on the book.

BOKHARI: Thanks Tucker. You know, I’ve been following the activities of these Silicon Valley tech giants for nearly five years now, and I have no other way to put it, we are in an era of digital totalitarianism. We’ve somehow allowed a handful of unaccountable corporations to seize control of political discourse, and in the process seize control of democracy.

But you don’t have to take [it] from me. Take it from my sources, the people who worked for Google, who worked for Twitter and Facebook. These are the people I’ve interviewed for this book, and let me tell you, they are so alarmed by what they’ve seen inside these Silicon Valley companies that they’ve put their own careers on the line to come forward and warn the American public about what’s going on.

This is not just about people getting banned. We all know people get banned on social media — that is just the tip of the iceberg. The really terrifying stuff is what’s going on behind the scenes, and that’s what these sources have told me about.

I know we’re short on time, so I’ll focus on just one example that more people need to know about. It’s called “Machine Learning Fairness.” Machine Learning Fairness – everyone needs to memorize those three words.

TUCKER: Machine Learning Fairness…

BOKHARI: I’ll tell you what it is, briefly. This is Big Tech’s attempt to merge the fields of computer science on the one hand, and Critical Race Theory on the other. Critical Race theory, Tucker! The same racist ideology that’s being rightly purged from the federal government by President Trump is running rampant in Silicon Valley, where it couldn’t be more dangerous.

Because these people control the algorithms that are going to control almost every aspect of our lives… They control whose messages are allowed to be seen, whose political movements are allowed to go viral and gain momentum, even whose businesses are going to be successful – if you’re on the tenth page of Google search, no-one will ever find you – and the people who have this awesome power, which by the way affects not just America, but so many other countries around the world, the people who have this power are the same people who think that Ibram Kendi and Robin DiAngelo are the leading intellectual figures of our time. These people are crazy – and they’re racists! And they’re running the technologies that are running our world. That’s where we are. That’s digital totalitarianism. That’s what this book is about.

TUCKER: Machine Learning Fairness. I won’t forget it. Allum Bokhari, I hope you’ll come back. It’s a remarkable story, and I appreciate it.

BOKHARI: Thank you, Tucker.
