Racial bias in software

Anti-racism
By: Jasmine Wang
Date: May 6, 2021

How Can Software Be Racist?

Whether or not you have noticed, AI has permeated our daily lives and been adopted across all walks of life. But AI is not perfect, and the technology is not immune to racial bias. Some AI software encodes racial bias, and instead of improving matters it makes them worse.

One way bias enters is during data collection. Police forces have been adopting predictive policing tools to “control” crime more effectively. These tools predict which areas and which groups of people are more likely to commit crimes, or to reoffend after release from jail, based on correlations between demographics and crime rates. The result is heavy police harassment of Black neighborhoods, because the tools flag them as more “dangerous,” and unfair sentencing of Black people, because they are presumed to be repeat offenders.

Bias can also be introduced when engineers design and code the software. Tech company workforces are not known for being racially diverse. Black users have reported having a hard time getting facial recognition software to recognize their faces, while the same software seems to recognize Asian and white faces more easily. That is largely because most software developers are white or Asian and build and tune the programs on faces like their own, often without testing them on people outside their own racial groups.
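One way teams catch this kind of gap is to evaluate accuracy separately for each demographic group rather than reporting a single overall number. Below is a minimal sketch of such a disaggregated check; the `model.predict` call and the test-set fields are hypothetical stand-ins, not any particular vendor's API.

```python
# Minimal sketch: disaggregated accuracy check for a face recognition model.
# `model` and the test-set fields below are hypothetical stand-ins, not a
# real vendor's API.
from collections import defaultdict

def accuracy_by_group(model, test_set):
    """Return recognition accuracy broken down by demographic group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in test_set:
        group = record["group"]  # e.g. self-reported race of the subject
        total[group] += 1
        if model.predict(record["image"]) == record["identity"]:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

# If one group's accuracy lags far behind the others, the training data
# (or the pipeline around it) is a likely culprit and needs attention.
```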

Source: https://medium.com/@Joy.Buolamwini/response-racial-and-gender-bias-in-amazon-rekognition-commercial-ai-system-for-analyzing-faces-a289222eeced

How do we combat racist AI?

The good news is that the world is generally optimistic about technology, especially AI. To enjoy the convenience AI can bring to our lives, we can take some steps to make our software less biased:

1. Feed the machine diverse data. Data is the core of machine learning, so improving the machine starts with improving the data. When curating training data, we should involve people of different races and design questions suited to different demographic groups (see the sketch after this list).

2. Add a social science curriculum to the training of software engineers. Racial equity awareness is as important as coding skill when it comes to creating software. Software should be designed, from the start, to serve people of all races.

3. Increase diversity in the workplace. If an engineering team is composed of people from different racial backgrounds, they will come up with features that make sense across their cultures and work for users of their races. That way the software becomes friendly to all users.
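As a companion to point 1, here is a minimal sketch of how a team might audit a training set's demographic mix against the shares it is aiming for; the target shares and record fields are illustrative assumptions, not real census figures.

```python
# Minimal sketch: compare a training set's demographic mix to target shares.
# The target shares and record fields here are illustrative assumptions.
from collections import Counter

def composition_gap(records, target_shares):
    """Report each group's share of the data next to the share we want."""
    counts = Counter(r["group"] for r in records)
    n = sum(counts.values()) or 1
    return {
        group: {"actual": counts.get(group, 0) / n, "target": share}
        for group, share in target_shares.items()
    }

# Groups falling well short of their target share are the ones to prioritize
# when collecting (or re-weighting) additional training data.
```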

Source: https://9to5mac.com/2020/07/28/apple-racial-equity-and-justice-initiative-education/

References

https://www.technologyreview.com/2020/07/17/1005396/predictive-policing-algorithms-racist-dismantled-machine-learning-bias-criminal-justice/

https://towardsdatascience.com/racial-bias-in-software-772d6e949269

https://www.parkersoftware.com/blog/racism-in-technology-how-our-tools-acquire-bias/

Data Walks – From Research Subjects to Research Partners

Anti-racism

Part of the reason I chose to come to MIIS was that I wanted a school focused on outcomes, not solely on research for research's sake, like many graduate programs.  Academic research is helpful but largely inaccessible to the general public.  Think back to the first time a teacher had you read a scientific paper in school: papers can be overwhelming and confusing if you're not used to the language.  Even in public policy and the social sciences, where much of the research is aimed at helping low-income, BIPOC communities, the results are seldom shared with the communities they are designed to help.

Most research ends up in formal reports, filled with technical jargon and confusing infographics.  Should community members even be exposed to the final product, it can seem intimidating or patronizing.  Unfortunately, withholding these findings is a missed opportunity to gather context for the collected data and to promote self-determination and community-based decision-making.  Building these connections with affected communities is a win-win: researchers gather better context for their data, and the community gains access to information that can aid it in implementing programs.

Thankfully, in recent years, organizations have developed methods to bridge this divide and make data more accessible to communities, which holds important potential for low-income, BIPOC communities.  One such tool, Data Walks, was developed by the Urban Institute to encourage dialogue by having community members view data presentations in small groups and then jointly interpret the results.  This method allows participants to tap into their individual experiences, connect them with the data, and discuss ways to improve policies and programs.

Some perks of the Data Walks method are that it can help researchers improve their analysis and understanding of the data, shape policies to address both the strengths and needs of specific communities, and even inspire action among community members.  This method moves community members from research subjects to active research partners.

Sharing community data on employment, food security, mental health, and more, combined with program-specific data such as participation and engagement, can be very informative, especially when paired with individual experiences.  In some cases, community members who took part in Data Walks were able to give more informed answers in focus groups than is typical, because they could draw on specific data points and a larger context.  It's important to note that providing national or state benchmarks is often necessary to interpret data comprehensively.  Care must be taken when preparing the data and presentations for a data walk to ensure they are accessible to the intended audience.

When used correctly, the Data Walks method has the potential to benefit a range of programs within BIPOC communities.  This could include designing and evaluating school programs, housing opportunity policies, adolescent sexual health and safety programs, and more.  Almost any community program would benefit from informed conversation and personal experiences from residents, and that’s what Data Walks aim to achieve.

Data analysis can help reveal problems and solutions across programs and communities, but it can also perpetuate the issues if not used correctly.  The ability of the Data Walks method to engage BIPOC communities can help address some of the structural racism embedded in data analysis and presentation, while also benefiting the communities directly.


Bibliography:
https://www.urban.org/sites/default/files/publication/72906/2000510-Data-Walks-An-Innovative-Way-to-Share-Data-with-Communities.pdf
https://www.urban.org/sites/default/files/publication/99852/confronting_structural_racism_in_research_and_policy_analysis_0.pdf
https://www.aisp.upenn.edu/wp-content/uploads/2020/08/AISP-Toolkit_5.27.20.pdf
https://weunlockpotential.com/datawalks/
https://west.edtrust.org/data-equity-walk-in-action/

Assessment: Just as Important as the Program

Anti-racism

The problems with policing in America have, for centuries, been a long-unheard outcry from communities of color, dating back to the origins of police practices as slave-catching patrols[1]. In the summer of 2014, the killing of four Latino residents by police in Salinas, CA, the Monterey County seat, brought the issue closer to MIIS than ever before. Although the officers involved were eventually cleared of all charges by the DA, the public unrest sparked by the shootings had already brought the topic of much-needed police reform to the city[2]. The Salinas Police Department's (SPD) first act was to request a review by the Department of Justice's Office of Community Oriented Policing Services. The resulting 2016 report highlighted, among many concerning findings, a weak relationship between the SPD and the communities it was meant to serve and protect[3].

Vox; Data from FBI’s 2012 Supplementary Homicide Report[7]

Police reform over the years can be characterized as two steps forward, one step back, sometimes achieving progressive victories such as Miranda v. Arizona (1966), which instituted the required recitation of Miranda rights by law enforcement upon arrest[4], and just as quickly reverting with Terry v. Ohio (1968), which led to the famously racist policy of stop and frisk[5]. Although discussed, debated, and legislated on for years, awareness of the racism present in policing practices was not brought into mainstream American consciousness until the early 1990s, when the 1992 Los Angeles riots made the issue unavoidable. Conscious or not, racism in policing did not become a priority for mainstream American politics until the 2010s, thanks in large part to the emergence of the Black Lives Matter movement following the acquittal of Trayvon Martin's killer[6]. The movement brought attention to the drastic disparity between the rates at which Black Americans, often men, and white Americans are killed by police[7].

The recommendations made by the DOJ in its report brought about a host of new initiatives at the SPD to strengthen internal policies and practices, as well as several new programs for the much-needed strengthening of community relations[8]. One such program was Why'd You Stop Me? (WYSM), which our very own META Lab was contracted to help assess (as required by the grant that funded WYSM in Salinas). The final report, published in 2018, is a prime example of what happens when qualified, well-intentioned data researchers are given unrealistic requirements for evaluating programmatic success or failure. The META Lab members who created the report would likely agree, as they state in the methodology section that it was "… expected that the time allowed under the grant would be insufficient for substantial changes in perceptions to be formed, much less detected"[8] (p. 16). The researchers specified four long-term goals to evaluate, which would have served as excellent indicators of program success[8] (pp. 12-13) had their timeline allowed for a full assessment of them.

The authors of the report are very clear about the limitations of their findings, none of which result from any shortcomings of the researchers themselves. The limitations can instead be traced to the inadequate timeline the researchers were given to produce a robust evaluation of the program. The final recommendation in the report, and perhaps the most important one, is "to continue to monitor and test whether or how the program is having an effect" and "that a follow-up evaluation be funded and conducted in Salinas to test whether general outreach has been achieving the goals that were behind the grant application that initiated this project and this process in Salinas"[8] (p. 49). Were Salinas decision makers not to heed this final recommendation, they would run great risks basing future decisions on such limited data.

Quantitative evaluation is a vital component of any program, but done inadequately it can cause more harm than good. Decision makers must understand that holistic assessment cannot be an afterthought to new programs and policies; it must be an integral part of any earnest initiative for positive social impact. Even if a program fails to meet its intended goals, a well-designed assessment plan applied from start to finish can inform future progress. Failure is a far better teacher than success, and without assessment we cannot even tell the difference. Unfortunately, the price of failure in police reform is the loss of lives, disproportionately Black and brown, and continued failure (or the failure to assess it) cannot be tolerated.


References
[1] https://lawenforcementmuseum.org/2019/07/10/slave-patrols-an-early-form-of-american-policing/
[2] https://www.cnn.com/2014/05/22/us/california-protest-police-shooting-hispanics/index.html
[3] https://bloximages.newyork1.vip.townnews.com/montereycountyweekly.com/content/tncms/assets/v3/editorial/9/db/9db445fc-f075-11e5-9c0f-cff328653ea9/56f1ba4792594.pdf.pdf
[4] https://www.oyez.org/cases/1965/759
[5] https://www.law.cornell.edu/wex/stop_and_frisk
[6] https://blacklivesmatter.com/herstory/
[7] https://www.vox.com/identities/2016/8/13/17938186/police-shootings-killings-racism-racial-disparities
[8] https://drive.google.com/file/d/1Gk6mZUK-vAfS8sE_Ql58fMJakcquf3fW/view?usp=sharing

Facts Only: Real-life impacts of racism

Anti-racism
By: Daniella Saint-Phard
Date: April 6, 2021

Racism persists all around us, whether in education, housing and infrastructure, or healthcare systems. Communities of color face problems on every front, including heavier police presence and less funding, fewer social interventions, less opportunity, and less credibility. Data is essential to combating the everyday issues BIPOC folx face. It is literally a driver of change, and we must be responsible with it.


“The PRMR [pregnancy-related mortality rate] for black women with at least a college degree was 5.2 times that of their white counterparts.”

“Cardiomyopathy, thrombotic pulmonary embolism, and hypertensive disorders of pregnancy contributed more to pregnancy-related deaths among black women than among white women.”


Have you ever wondered why pregnancy-related deaths are higher among Black women than among women of any other race? Or why women of color in general have higher pregnancy-related mortality rates than white women? These terrifying statistics come from the CDC. They serve as a tool to illuminate the lived experiences of the BIPOC (Black, Indigenous, People of Color) community. How, then, can data impact BIPOC experiences? It reflects realities and provides insight into the areas (variables) where change is possible. Throughout data analysis, it is important to be mindful of implicit racism when navigating method planning, data framing, and historical context.

Based on the CDC's findings, the following recommendations were made to hospitals and healthcare providers: provide higher-quality care, pay closer attention when diagnosing, and learn more about warning signs across different races. Implementing these recommendations could prevent roughly 60% of these deaths and lower the PRMR.

“A glaringly important aspect of data collection, analysis, and presentation is utilizing ethical, responsible, and unbiased language.”

These recommendations do not address the implicit racial bias faced by Black women and other minority groups. Data science and analysis should be for the good of people. As the PRMR statistics demonstrate, you can have all the numbers, but clear and accurate presentation is equally important. A glaringly important aspect of data collection, analysis, and presentation is utilizing ethical, responsible, and unbiased language. There are three important takeaways from this mini case study of real-life data to consider when doing analytic work: methods, framing, and context.



Takeaways

Methods

The PRMR statistics above reflect one aspect of the population, but many other aspects (variables) of life that affect outcomes are not included. It is important to be mindful of reflecting the analyzed population at every stage: planning, methodology, implementation, analysis, and reporting. Tailor data collection methods to your population, including, but not limited to, survey design, population sampling, administration and implementation, and monitoring and evaluation.
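One concrete way to tailor sampling to a population is proportional stratified sampling, so that each demographic group shows up in the sample in proportion to its size. The sketch below is a generic illustration under that assumption, not a prescription for any particular study.

```python
# Minimal sketch: proportional stratified sampling, so each demographic group
# appears in the sample roughly in proportion to its size in the population.
import random

def stratified_sample(population, group_key, sample_size, seed=0):
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(group_key(person), []).append(person)
    total = len(population)
    sample = []
    for group, members in strata.items():
        k = max(1, round(sample_size * len(members) / total))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

# Oversampling small groups (e.g. doubling their k) is a common variant when
# a strictly proportional sample would leave too few respondents to analyze.
```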

Framing

The wording of the data reported above can read as accusatory, placing the blame on Black women and other minority groups, when in reality the most effective interventions must come from healthcare providers. The wording also assumes a high-level understanding of medical conditions that the average reader may not have, which is why it's important to know your audience. Language is a powerful tool for advocating and presenting data. Check your biases. Peer-review. Communicate openly with the population the data reflects. Report clear and concise information. Make raw data and findings accessible to affected communities. These are simple habits to develop while dealing with data collection, analysis, and reporting.


Context

Historical and background context is essential to understanding figures and statistics and applying them in a beneficial and ethical manner. The goal should be to provide a full picture of the situation and the “why” behind data results, rather than reporting data for open interpretation. These reports influence expenditure, planning, and policy decisions that significantly shape life trajectories for many people. When context is not taken into consideration, racial bias and discrimination persist for the BIPOC community.

In conclusion, data is a powerful undercurrent of lived experiences and a gateway to change. Be an ethical and responsible driver of change in data analysis, not only for BIPOC communities but for everyone. That's it. That's the message of this blog post.

For more in-depth information on data analysis, visit the META Lab bootcamp course. For more information on the insane reality for BIPOC pregnant women, visit the CDC.gov website.


References

[1] https://www.cdc.gov/media/releases/2019/p0905-racial-ethnic-disparities-pregnancy-deaths.html

[2] https://ocrdata.ed.gov/assets/downloads/FAQ.pdf

[3] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4194634/

[4] https://www.loyola.edu/academics/data-science/blog/2018/why-ethics-are-important-in-data-science

[5] https://www.cssny.org/news/entry/New-Neighbors

How Big Data Impacts Black Lives

Anti-racism
Written by: Madeleine Smith

March 2, 2021

What is Big Data?

We’ve all spent time on datasets—big, medium, and small. But imagine a dataset so big that you cannot conceptually fit it on any number of screens that you might have in your office. ‘Big data’ refers to datasets that are so colossal they need many machines to process and store their information. These armies of machines are usually linked to cloud platforms that manage them as a single unit. 

Big data is everywhere. It is constantly being collected by companies through our daily actions—hospital trips, energy bills, cell phone and mobile app usage, advertising, and even job applications. One of the attractions of big data is the low cost of collecting it. Many companies and decision-makers use proxy variables from big data sets to make decisions at scale. A proxy variable is a measurable stand-in for something that cannot be measured directly (e.g. trustworthiness), such as a credit score based on previous financial behavior. This is where the danger of big data comes in—proxies are prone to losing qualitative context in the ocean of quantitative numbers. This is what author Cathy O'Neil addresses in her book Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy.
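As a toy illustration (not any real lender's rule), here is what a proxy-based decision looks like in code: the attribute we actually care about never appears, and everything the proxy fails to capture silently drops out of the decision.

```python
# Toy illustration of a proxy-variable decision rule (not any real lender's).
# "Trustworthiness" is never observed; a credit score stands in for it, and
# whatever the score misses (thin credit history, medical debt, past
# discrimination in lending) silently drops out of the decision.
def approve_loan(applicant, score_cutoff=650):
    return applicant["credit_score"] >= score_cutoff  # the proxy, not the trait

applicant = {"credit_score": 640}
print(approve_loan(applicant))  # False - rejected on the proxy alone
```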

“This is where the danger of big data comes in—proxies are prone to losing qualitative context in the ocean of quantitative numbers.”


What is a ‘Weapon of Math Destruction’?

Besides being a symbolic pun, a “weapon of math destruction”, or a “WMD”, is a harmful algorithmic model applied to these big data sets. Data models are created by mathematicians and statisticians, and they can encode the implicit biases of the people who build them. That may not seem like a problem case by case, but when applied at scale to millions, or even billions, of people, these models produce what O'Neil calls “toxic feedback loops” that reinforce the original bias. These toxic feedback loops disproportionately impact the poor and marginalized while simultaneously reinforcing systemic issues like racism, xenophobia, and poverty.

WMDs have three major characteristics that O'Neil outlines in her book: opacity, scale, and damage. O'Neil explains how the algorithms used by large companies are shrouded in mystery and heavily protected, sometimes being sold for billions of dollars (e.g. Google's search engine algorithms). This opaqueness makes the algorithms dangerous and lets the companies that use them avoid accountability for the consequences their products create. As mentioned earlier, the scale at which WMDs are used allows them to harm millions of people. One example is the case of Michelle Rhee, who was hired to improve educational outcomes in Washington, D.C. and used test scores to judge teacher efficacy with no consideration of the context in which teachers were working. Her approach caused dozens of quality teachers to lose their jobs.

Some advocates of these secretive algorithms claim that the number of people who benefit from their product or service outweighs the number who are harmed. O'Neil rebuts this, writing that “…the point is not whether some people benefit. It's that so many suffer. These models, powered by algorithms, slam doors in the face of millions of people, often for the flimsiest of reasons, and offer no appeal. They're unfair.” It is this level of scale that creates the third characteristic of WMDs—damage. Toxic feedback loops act as self-fulfilling prophecies for the people who fall on the wrong end of the algorithm.

“…These models, powered by algorithms, slam doors in the face of millions of people, often for the flimsiest of reasons, and offer no appeal. They’re unfair.”

There are many demographic groups around the world that WMDs negatively impact. This article focuses on the inequitable experience of Black Americans in the United States. Let's take a look, shall we?

Predictive Policing – PredPol

If you haven't heard of predictive policing, let me enlighten you. Predictive policing is the use of a police department's historical crime data to deploy officers' time and energy more efficiently. The theory behind this type of policing is that studying trends in previous crimes will inform the department about potential future crimes. O'Neil points out the unfortunate reality of predictive policing by analyzing the impact of a software product called PredPol. She explains how the software works: “The program processes historical crime data and calculates, hour by hour, where crimes were most likely to occur.” The software's predictions are presented as squares on a grid. When a square lights up, police are encouraged to patrol that area of the grid.
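O'Neil describes the output only at that high level, so the sketch below is just an illustrative guess at the general shape of such a system (counting historical reports per grid cell and hour and ranking the cells), not PredPol's actual model.

```python
# Illustrative sketch of a grid-and-hour "risk" score built from historical
# incident reports. This is NOT PredPol's actual model, just the general idea:
# past report density in a grid cell becomes that cell's predicted risk.
from collections import Counter

def risk_scores(incidents, hour):
    """incidents: dicts with 'cell' (grid square id) and 'hour' (0-23)."""
    counts = Counter(i["cell"] for i in incidents if i["hour"] == hour)
    total = sum(counts.values()) or 1
    return {cell: n / total for cell, n in counts.items()}

def cells_to_patrol(incidents, hour, top_k=3):
    """Return the top_k grid squares to 'light up' for a given hour."""
    scores = risk_scores(incidents, hour)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```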

This approach to reducing crime and increasing police efficiency has led to the over-policing of low-income and historically Black neighborhoods. It has also opened the door to more invasive police behaviors—standardized stop and frisk (often criticized as a racist practice) and even taking pictures of civilians to upload to facial recognition software. O'Neil tells us that in New York City alone, 85% of stop-and-frisk encounters involved young Black or Latino men, and only 0.1% of those men were linked to any type of violent crime. The reason this becomes dangerous with programs like PredPol is that the model behind the algorithm is built on historical police department data—which reflects the biases of previous policing. That data goes back generations, to times when policies like segregation were legal. So PredPol's algorithm is a prime engine of the toxic feedback loop that leads to over-policing of Black and brown neighborhoods, more arrests of Black and brown men and women, and more hurdles to overcome.
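To make the feedback loop concrete, here is a hedged toy simulation with made-up numbers: two neighborhoods have identical underlying behavior, but the one with the larger historical record keeps getting flagged, keeps getting patrolled more, and so keeps generating more records.

```python
# Toy simulation of the feedback loop (made-up numbers, illustration only).
# Both neighborhoods have the same true offense rate; only the historical
# record differs, yet patrols keep concentrating where the data points.
import random

def simulate(rounds=20, seed=1):
    rng = random.Random(seed)
    recorded = {"A": 30, "B": 10}   # past reports, not true behavior
    true_rate = 0.05                # identical in both neighborhoods
    for _ in range(rounds):
        flagged = max(recorded, key=recorded.get)
        for hood in recorded:
            patrols = 80 if hood == flagged else 20  # patrols follow the model
            # more patrols mean more chances to record an incident
            recorded[hood] += sum(rng.random() < true_rate for _ in range(patrols))
    return recorded

print(simulate())  # the initially flagged neighborhood pulls further ahead
```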

For-Profit Colleges

Another example of how big data disproportionately impacts Black people in the United States is the predatory advertising used by for-profit colleges. Have you ever seen an advertisement in your Google search results for DeVry University or the University of Phoenix? These for-profit schools use algorithmic models that specifically target vulnerable populations. According to O'Neil, a similar institution, Vatterott College, “directs recruiters to target ‘Welfare Mom w/Kids. Pregnant Ladies. Recent Divorce. Low Self-Esteem. Low Income Jobs. Experienced a Recent Death. Physically/Mentally Abused. Recent Incarceration. Drug Rehabilitation. Dead-End Jobs—No Future.’” While O'Neil doesn't explicitly discuss the racial component of these targets, it's easy to find if you do a little research. According to the National Center for Education Statistics, Black Americans consistently make up between 12% and 25% of individuals on welfare. According to the Bureau of Justice Statistics, 38% of state prison inmates are Black. And finally, according to the United States Census Bureau, 18.8% of Black Americans live in poverty. These percentages may not sound alarming, but when we apply the second characteristic of WMDs (scale), they equate to millions of people being targeted by for-profit colleges. These colleges are notorious for inflating their program costs to astronomical levels, and their credentials lack credibility in the job market, leaving graduates with high-interest student loans and little ability to earn a higher income.
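To see how a single percentage becomes millions of people, here is a rough back-of-the-envelope calculation; the population figure is an approximation drawn from recent Census estimates, not a number from O'Neil's book.

```python
# Rough back-of-the-envelope scale calculation (approximate figures).
# The population estimate is an assumption from recent Census figures,
# not a number taken from O'Neil's book.
black_population = 47_000_000   # approx. U.S. Black population
poverty_rate = 0.188            # the 18.8% poverty figure cited above

print(f"{black_population * poverty_rate:,.0f} people")  # roughly 8.8 million
```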

Conclusion

The inequalities of big data are present in many areas beyond what Cathy O'Neil covers in her book. Racial discrimination is rampant in areas like our healthcare system, facial recognition software, and online hiring platforms, to name a few. The good news is that people are beginning to highlight the inequalities caused by these once-unmonitored algorithms. Better yet, they're working to fix the algorithms that cause such blatant inequity in the first place. Not only do we need more diversity and representation in data science, we also need those who are not affected by poverty and marginalization to educate themselves on the needs of others and to work to build algorithms that help lift people out of poverty and dismantle systems like racism.


2020 METALab Anti-racism commitments

Anti-racism, Creative

November 6, 2020

Dear MIIS community, 

In response to the chain of events detailed in Jasmine Sturdifen's presentation to Student Council, the METALab Graduate Assistants stand by the Black student body at MIIS. We also express great disappointment with the MIIS administration's lack of transparency in its response. As a student-run research center, we know from our coursework and first-hand experience that inclusivity and transparency are pillars of accountability and stakeholder engagement. In the virtual space that we now call our MIIS community, we, as students, need to know that we are heard and that our voices are welcomed. There is still much more to be done to address anti-racism, and the only way forward is to move collectively.

In addition, METALab is committed to establishing a series of blog posts and dedicating a section of our website to the topic of anti-racism and ethics. The remainder of Fall 2020 will be dedicated to creating a strategy for curating resources, literature, and other relevant sources. This will be a permanent tab on the site addressing race, power, and social research, in an effort to shine a light on data collection methods and the importance of incorporating anti-racist work and principles into social research design, data collection, and analysis. Curated with open input from students, faculty, and alumni, it will become part of routine staff training at METALab and will be accessible to the public.

In solidarity, 

The METALab Graduate Assistants