K. Hankes, S. Shanmugasundaram
Southern Poverty Law Center (SPLC)
The year 2018 marked a new era for the hate movement. Hate is no longer organized in physical, concrete hierarchies; it is online. This tutorial will discuss spikes and trends in hate groups across the United States, online propaganda and its violent consequences, and why tech companies need to fix their enforcement measures.
B. Hutchinson, M. Mitchell, S. Mitchell
Quantitative definitions of fairness have been proposed for over half a century, with different concrete mathematical formulations emerging in disciplines such as education, hiring, economics, and most recently data science and machine learning. However, the formal relationships between these formulations have not been widely explored, and learning from prior work in adjacent fields is hindered by differing terminology and notation. This tutorial focuses on the quantitative definitions of fairness that arose in education and hiring in the wake of the U.S. Civil Rights Act of 1964, including the social, political and legal motivations behind them. It translates this quantitative research from 1966-1976 into modern-day ML notation, showing that researchers of this period were the original inventors of notions of group fairness such as equality of opportunity, equalized odds, sufficiency and predictive parity. In doing so, we also identify gaps and mismatches between the two literatures that present opportunities for further research. Zooming out to consider the long-term obstacles faced by the discipline of fairness in testing, we point out challenges that machine learning should heed.
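The group fairness notions named above can be made concrete with a small sketch. The Python below is an illustration only, not material from the tutorial: the function name `group_rates` and the toy data are hypothetical. It assumes binary labels and predictions and computes the per-group true positive rate (TPR), false positive rate (FPR) and positive predictive value (PPV). Under the usual modern definitions, equality of opportunity asks for equal TPR across groups, equalized odds for equal TPR and FPR, and predictive parity for equal PPV; sufficiency asks for the same calibration at every prediction value, so for a binary predictor it also constrains the negative predictive value.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return per-group TPR, FPR and PPV for binary labels and predictions.

    Hypothetical helper for illustration; a rate is None when its
    denominator is zero for that group.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for y, p, g in zip(y_true, y_pred, groups):
        # "t"/"f": was the prediction correct; "p"/"n": what was predicted.
        key = ("t" if y == p else "f") + ("p" if p == 1 else "n")
        counts[g][key] += 1

    rates = {}
    for g, c in counts.items():
        actual_pos = c["tp"] + c["fn"]
        actual_neg = c["fp"] + c["tn"]
        predicted_pos = c["tp"] + c["fp"]
        rates[g] = {
            "tpr": c["tp"] / actual_pos if actual_pos else None,
            "fpr": c["fp"] / actual_neg if actual_neg else None,
            "ppv": c["tp"] / predicted_pos if predicted_pos else None,
        }
    return rates


# Toy data (made up): two groups of four individuals each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["a"] * 4 + ["b"] * 4

rates = group_rates(y_true, y_pred, groups)
# Group "a": TPR 0.5, FPR 0.0, PPV 1.0
# Group "b": TPR 1.0, FPR 0.5, PPV 2/3
# The TPRs differ, so this toy predictor violates equality of opportunity.
```

Comparing the entries of `rates` across groups shows which criteria hold; on real data one would compare them up to a tolerance rather than exactly.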
H. Sassaman, R. Jones, D. Robinson
Impact Statement: We will build shared understanding of the experiences, needs and goals of individuals and communities affected by pretrial risk assessment algorithms, as reflected in the recent national Statement of Concerns, which we helped draft and which reflects the considered perspective of more than 120 advocacy organizations. We will then engage attendees to begin to collaboratively co-develop the feasible specifics of a community-driven, socially informed and scientifically rigorous independent auditing and validation process for pretrial risk assessment instruments.
H. Cramer, K. Holstein, J. Wortman Vaughan, H. Daumé III, M. Dudík, H. Wallach, S. Reddy, J. Garcia-Gathright
Despite mounting public pressure to design machine learning (ML) systems that treat all users fairly, industry practitioners face considerable challenges when translating research in algorithmic fairness into practice. This tutorial aims to reduce the gap between fairness research and industrial practice. We will provide an overview of organizational and technical challenges encountered in practice, covering stakeholder involvement, data gathering, resourcing, and prioritization of tradeoffs. The insights discussed are drawn from direct practical experience as well as conversations, formal interviews, and surveys with industry practitioners. Attendees will gain useful insight into approaches taken by other practitioners, as well as potential pitfalls. Academic researchers and educators will gain insight into a number of understudied practical challenges that may present barriers to real-world research impact. Both researchers and practitioners will explore possibilities for mutually productive researcher-practitioner partnerships, and gain insight into challenges that may arise in forming and maintaining such partnerships.
This translation tutorial will demonstrate that making fairness a tractable engineering goal requires demarcating a clear conceptual difference between bias and unfairness. Making this distinction demonstrates that bias is a property of technical judgments, whereas fairness is a property of value judgments. With that distinction in place, it is easier to articulate how engineering for fairness is an organizational or social function not reducible to a mathematical description. Using hypothetical and real-life examples I will show how product design practices would benefit from an explicit focus on value-driven decision processes. The upshot of these claims is that designing for fairness (and other ethical commitments) is more tractable if organizations build out capacity for the “soft” aspects of engineering practice.
C. Conti-Cook, G. Rodriguez
The "Correctional Offender Management Profiling for Alternative Sanctions" or "COMPAS" risk assessment instruments used in Florida criminal sentencing hearings sparked a national debate in 2016 when ProPublica published "Machine Bias". That debate centered on whether the algorithm was biased against people of color, who were more likely to be identified by the instrument as "high risk". COMPAS is also used in prisons across the country to determine whether someone should be eligible for release. In New York State, one of the questions a prison counselor must answer on the COMPAS risk assessment is whether the person appears to have "notable disciplinary issues: yes, no or unsure." In prisons where this instrument is used, Question 19 has become notorious among the men and women subject to it as an example of how risk assessment instruments in the criminal justice system get it wrong. Hear directly from Glenn Rodriguez who, even after decades without any disciplinary tickets, still found Question 19 answered "yes" by his counselor. He wasn’t alone. Hear how Mr. Rodriguez and other people (some of whom are still in prison but will share their experiences in writing) took on COMPAS from inside, and how the grievance process they worked through did and didn’t work. Hear the strange and inconsistent answers they received from the prison counselors who answer Question 19. Mr. Rodriguez will talk about how he and others witnessed that changing the answer to one question made the difference between a score of medium and low risk, and between parole release and parole denial. Ms. Conti-Cook will walk through the obstacles to eliminating Question 19 through litigation, including how Northpointe invokes trade secret privileges around the COMPAS instrument, and will provide documents, obtained through open records requests, that purport to guide counselors’ discretion in answering Question 19.
We will conclude with a discussion of what lessons this experience holds for relying on algorithms in the criminal justice system generally.
S. Saria, A. Subbaswamy
Data-driven decision support tools are increasingly being deployed in a variety of applications such as predicting crime, assessing recidivism risk, and automated medical screening. However, common assumptions used in training such models often do not hold in practice, yielding models that make dangerous predictions (as we will demonstrate in our case studies). Specifically, modelers typically assume that training data is representative of the target population or environment where the model will be deployed. Yet there is commonly bias specific to the training dataset, which causes learned models to be unreliable: they do not generalize beyond the training population and, more subtly, are not robust to shifts in practice or policy in the training environment. This bias can arise due to the method of data collection, frequently due to some form of selection bias. The bias may also be caused by differences between the policy or population in the training data and that of the deployment environment. In some instances, the very deployment of the decision support tool can change practice and lead to future shifts in policy in the training environment. The features causing such biases can be difficult to detect compared to the often prespecified protected attributes (e.g., race or gender) typically considered in works concerned with bias as it relates to fairness. In this tutorial, we will show the audience real examples of the challenges associated with deploying machine-learning-driven decision aids to demonstrate how common they are. We will also introduce concepts and terminology to help attendees frame issues related to dataset shift and think about how it may occur in their own applications. Finally, we will give an overview of the types of solutions currently available, their applicability, and their respective pros and cons.
While computer scientists working on questions of fairness have diligently produced algorithmic approaches that seek to minimize disparate impacts across racial categories, the concept of race itself remains either unexamined, or constrained by definitions arising in legal and policy domains. While this may be appropriate for some applications, it is not altogether obvious that the FAT community benefits from refraining from developing a theory of race to guide its own practices. This tutorial will translate concepts from critical race theory and social scientific discourses into concepts legible to a community of machine learning practitioners through a discussion of these theories and small-group activities that illustrate the salience of these theories for problems of fairness in machine learning.
R. Dobbe, M. Ames
This tutorial provides tools for identifying the values that influence the construction, modeling, implementation and tuning of automated decision-making systems. We will learn about how a model reflects the context it comes from, with all of its biases, and how it can also shape the context in which it is applied. We will pay particular attention to methods from the field of Science & Technology Studies, such as value-sensitive design, that allow for dealing with questions of values and social implications, and how we might apply these techniques in constructive ways.
C. Jolley, A. Anthony
Applications of machine learning (ML) and artificial intelligence (AI) in international development are inherently interdisciplinary. In general, development experts and technologists bring complementary skills, knowledge and experience to the table. They also bring differing priorities, institutional cultures, and definitions of success. While partners from both camps are generally eager to work together, there is always potential for misunderstanding and miscommunication.
D. Borkan, L. Dixon, J. Sorensen, N. Thain, L. Vasserman
Unintended bias is a major challenge for machine learning systems. In this tutorial, we will demonstrate a way to measure unintended bias in a text classification model using a large set of online comments which have been labeled for toxicity and identity references. We will provide participants with starter code that builds and evaluates a machine learning model, written using open source Python libraries. Using this code they can experiment with bias measurement and mitigation. At the end of this tutorial, participants should walk away with new techniques for bias measurement.
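One way to make this kind of measurement concrete is to compare a classifier's ranking quality on all comments with its ranking quality on the comments that reference a given identity. The sketch below is an assumption-laden illustration, not the tutorial's starter code: the function names, the choice of AUC as the metric, and the toy scores are all hypothetical. It computes AUC as the fraction of (toxic, non-toxic) pairs that the model orders correctly, counting ties as half.

```python
def auc(labels, scores):
    """Pairwise AUC: probability that a positive outscores a negative.

    Returns None when either class is absent, since AUC is undefined then.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def subgroup_auc(labels, scores, in_subgroup):
    """AUC restricted to the examples flagged as belonging to the subgroup."""
    rows = [(y, s) for y, s, m in zip(labels, scores, in_subgroup) if m]
    return auc([y for y, _ in rows], [s for _, s in rows])


# Toy data (made up): 1 = labeled toxic, scores are model outputs in [0, 1].
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.7, 0.6, 0.1]
mentions_identity = [False, False, True, True, True, False]

overall = auc(labels, scores)                              # 8/9
subgroup = subgroup_auc(labels, scores, mentions_identity)  # 0.5
```

A subgroup AUC well below the overall AUC, as in this toy example, suggests that the model separates toxic from non-toxic comments less reliably when the identity is mentioned. The pairwise computation is quadratic in the number of examples, so a sort-based AUC would be preferable on a large comment set.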
R. Bellamy, K. Dey, M. Hind, S. Hoffman, S. Houde, K. Kannan, P. Lohia, J. Martino, S. Mehta, A. Mojsilović, S. Nagar, K. Ramamurthy, J. Richards, D. Saha, P. Sattigeri, M. Singh, K. Varshney, D. Wang, Y. Zhang
Machine learning models are increasingly used to inform high-stakes decisions about people. Discrimination in machine learning becomes objectionable when it places certain privileged groups at systematic advantage and certain unprivileged groups at systematic disadvantage. We have developed AI Fairness 360 (AIF360), a comprehensive Python package (https://github.com/ibm/aif360) that contains nine different algorithms, developed by the broader algorithmic fairness research community, to mitigate that unwanted bias. AIF360 also provides an interactive experience (http://aif360.mybluemix.net/data) as a gentle introduction to the capabilities of the toolkit for people unfamiliar with Python programming. Compared to existing open source efforts on AI fairness, AIF360 takes a step forward in that it focuses on bias mitigation (as well as bias checking), industrial usability, and software engineering. In our proposed hands-on tutorial, we will teach participants to use and contribute to AIF360, enabling them to become some of the first members of the community. Toward this goal, all participants in this tutorial will get to experience first-hand: 1) how to use the metrics provided in the toolkit to check the fairness of an AI application, and 2) how to mitigate bias they discover. Our goal in creating a vibrant community, centered around the toolkit and its application, is to contribute to efforts to engender trust in AI and make the world more equitable for all.
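To illustrate what checking fairness with a metric can look like, the sketch below computes two widely used group metrics in plain Python: the statistical parity difference and the disparate impact ratio between an unprivileged and a privileged group. It deliberately does not call AIF360; the function names and the toy data are hypothetical, and the toolkit's own interface should be taken from its documentation.

```python
def favorable_rate(outcomes, privileged, want_privileged):
    """Share of favorable outcomes (1s) within one of the two groups."""
    selected = [o for o, p in zip(outcomes, privileged) if p == want_privileged]
    if not selected:
        raise ValueError("group has no members")
    return sum(selected) / len(selected)


def statistical_parity_difference(outcomes, privileged):
    """Unprivileged favorable rate minus privileged favorable rate (0 = parity)."""
    return (favorable_rate(outcomes, privileged, False)
            - favorable_rate(outcomes, privileged, True))


def disparate_impact(outcomes, privileged):
    """Unprivileged favorable rate divided by privileged favorable rate (1 = parity)."""
    privileged_rate = favorable_rate(outcomes, privileged, True)
    if privileged_rate == 0:
        raise ValueError("privileged group has no favorable outcomes")
    return favorable_rate(outcomes, privileged, False) / privileged_rate


# Toy data (made up): 1 = favorable decision, e.g. a loan approved.
outcomes = [1, 1, 1, 0, 1, 0, 0, 0]
privileged = [True] * 4 + [False] * 4

spd = statistical_parity_difference(outcomes, privileged)  # 0.25 - 0.75 = -0.5
di = disparate_impact(outcomes, privileged)                # 0.25 / 0.75 = 1/3
```

A negative difference, or a ratio below 1, indicates that the unprivileged group receives the favorable outcome less often; what threshold counts as unacceptable depends on the application.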
S. Friedler, C. Scheidegger, S. Venkatasubramanian
The field of fairness-aware classification is awash in algorithms and fairness measures, all evaluated on different data sets with different preprocessing of features and different evaluation methodologies. We have developed an extensible Python package that lets users systematically explore the behavior of algorithms, measures and preprocessing methods in a common framework, enabling a rigorous comparison of methods. This tutorial introduces this framework to the FAT* community with the hope that users will find it useful as a way to benchmark their own tools, and also that users will add to the resource for the benefit of all.