Algorithmic Discrimination and Avoiding Data Bias
I like to think of myself as a technology geek. I marvel at futuristic technology in science fiction shows like Star Trek and think: someday this will be our reality. Wouldn’t it be wonderful to have a machine that materializes fully cooked food, non-invasively scans your body for disease, and transports you instantly across space? It is with this curiosity that I tend to regard AI as an advancement that should be allowed to grow and mature. However, as much as we want to rely on AI, there are hidden dangers that we need to be aware of. AI is really a way for computer algorithms to make decisions based on pattern detection and coded rules, and it turns out that both the way algorithms are coded and the data they learn from can be biased.
Today, the conveniences we get from technology are everywhere. Especially during COVID, we’ve come to rely on technology to do everything online, from shopping to exercise, and even doctors moved to online conferencing for some medical visits.
Earlier this year, I participated in a panel for the EDUCAUSE ELI conference. The session, titled “Data Discrimination in a Sea of Diversity,” included members of the EDUCAUSE Student Success Analytics Community Group Steering Committee. We had a great discussion about the topic of data discrimination, highlighting reasons why now is the time to discuss it, establishing definitions, and sharing resources and emerging frameworks in the space.
When it comes to learning analytics and using data for Student Success, we are seeing an uptick in emphasis on the use of AI and ML. There are 13 mentions of AI and ML in the 2021 EDUCAUSE Horizon Report, Teaching and Learning Edition. EDUCAUSE also recently published results from a QuickPoll on AI use in Higher Education. One surprising observation from the QuickPoll results was that up to 30% of respondents didn’t know the status of AI usage at their institutions across a variety of categories, from plagiarism-detection software and tutoring to assessing financial need. This seems to indicate that universities may already have products with AI capabilities but not realize how those capabilities are deployed and used.
From our panel, Maureen Guarcello of San Diego State University observed that “as institutions of learning, we have an opportunity to use data for the good of our students - rather than potentially perpetuating bias that exists within our institutional data.” As part of the conversation, we polled participants, and most characterized their institution as opportunistic when it comes to using data for Student Success. This means that there are data quality and insight efforts, but these efforts are still in silos.
So, how do we define data discrimination? According to TechTarget.com, “Data discrimination, also called discrimination by algorithm, is bias that occurs when predefined data types or data sources are intentionally or unintentionally treated differently than others.” There are many examples of data discrimination in the media. Recently, a Netflix documentary called “Coded Bias” elevated algorithmic bias into our consciousness and introduced me to amazing role models in Joy Buolamwini and Cathy O’Neil.
In the interest of avoiding data discrimination, what can we do? First, we can examine machine learning algorithms for fairness, which means checking who a given model might harm. Fairness asks questions such as: who is neglected, and who is misrepresented? There is also the concept of model explainability: the ability to interpret and explain the behavior of a trained model.
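To make the "who is neglected" question concrete, here is a minimal sketch of a group-level fairness check in Python. It assumes a hypothetical model that flags students for extra support, and the records, group labels, and function names are invented for illustration. It compares how often each group is flagged and how often students who needed support were missed. It is one simple way to start looking, not a complete fairness audit.

```python
from collections import defaultdict


def group_rates(records):
    """Return per-group selection rate and false negative rate.

    Each record is a (group, actual, predicted) tuple with 0/1 values:
    actual=1 means the student needed support, predicted=1 means the
    model flagged the student for support.
    """
    counts = defaultdict(lambda: {"n": 0, "flagged": 0, "pos": 0, "missed": 0})
    for group, actual, predicted in records:
        c = counts[group]
        c["n"] += 1
        c["flagged"] += predicted
        if actual == 1:
            c["pos"] += 1
            if predicted == 0:
                c["missed"] += 1  # needed support but was not flagged

    rates = {}
    for group, c in counts.items():
        rates[group] = {
            "selection_rate": c["flagged"] / c["n"],
            # Undefined when a group has no students who needed support.
            "false_negative_rate": c["missed"] / c["pos"] if c["pos"] else None,
        }
    return rates


def largest_gap(rates, metric):
    """Difference between the highest and lowest value of a metric across groups."""
    values = [r[metric] for r in rates.values() if r[metric] is not None]
    return max(values) - min(values)


# Hypothetical records: (group, actual, predicted)
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 0), ("A", 1, 0),
    ("B", 1, 0), ("B", 1, 0), ("B", 0, 0), ("B", 1, 1),
]

rates = group_rates(records)
for group, r in sorted(rates.items()):
    print(group, r)
print("false negative rate gap:", largest_gap(rates, "false_negative_rate"))
```

A large gap in the false negative rate would suggest that the model overlooks students in one group more often than in another, which is a prompt for further investigation rather than a verdict.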
Below are some resources that I’ve come across that can assist in evaluating algorithms for discrimination:
- Microsoft has published a set of Responsible AI principles, with six guiding principles for development and use. There is also a fairness checklist.
- IBM Research offers a toolkit called AI Fairness 360.
But not all institutions will have AI models of their own that they need to evaluate. Products that serve students are moving fast to adopt AI technology in their underlying platforms. This is why it’s becoming especially critical that institutions have trusted ways to evaluate products built using AI. Along those lines, the Institute for Ethical AI & Machine Learning has created a Procurement Framework to help practitioners evaluate AI systems at procurement time. There is an RFP Template, and they developed a Machine Learning Maturity Model as well.
Specific to Education, here are examples of how principles of responsible AI can be applied to student success analytics.
- Ben Motz from Indiana University wrote a great article a while ago for EDUCAUSE, where he outlines a set of questions to ask when designing responsible automation for Student Support Tools. This came out of work on a mobile app called Boost, which deploys automated real-time student support.
- Another example is the work of Josh Gardner, Chris Brooks, and Ryan Baker on ways to evaluate the fairness of predictive student models through slicing analysis.
- A gathering referred to as “Asilomar II” produced a document several years ago called Responsible Use of Student Data in Higher Education.
- In the UK, JISC has a code of practice focused on issues of responsibility, transparency and consent, privacy, validity, access, enabling positive interventions, minimizing adverse impacts, and data stewardship. LACE has developed a checklist containing eight action points that managers and decision makers should consider when implementing learning analytics projects.
- There is also a Learning Analytics Strategy Toolkit with some useful templates (https://solve.everylearnereverywhere.org/kit/8S6kbfaTtXefIQmP4iqq).
- Recently, a new initiative called the EdSAFE AI Alliance was announced at ASU-GSV in August 2021. This initiative is led by DXtera, a consortium of Higher Education schools, along with an Education AI company called Riiid. The goal of this initiative is to foster public confidence in the use of AI in education through voluntary benchmarks and standards. Organizations that have indicated commitment to this effort include Carnegie Learning, InnovateEDU, and the Federation of American Scientists.
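The slicing analysis mentioned in the list above can be illustrated with a small sketch. The idea is to evaluate a predictive model separately on each slice of students (for example, by demographic group) and compare the results, instead of reporting a single overall number. The code below is a simplified illustration and not the metric used in the cited work: it computes a plain AUC per slice, and the slice names, scores, and function names are hypothetical.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Returns None when a slice lacks either class.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_by_slice(rows):
    """Group (slice_name, label, score) rows by slice and compute AUC for each."""
    slices = {}
    for slice_name, label, score in rows:
        labels, scores = slices.setdefault(slice_name, ([], []))
        labels.append(label)
        scores.append(score)
    return {name: auc(labels, scores) for name, (labels, scores) in slices.items()}


# Hypothetical model outputs: (slice, actual outcome, predicted score)
rows = [
    ("slice_a", 1, 0.9), ("slice_a", 0, 0.8), ("slice_a", 1, 0.4), ("slice_a", 0, 0.3),
    ("slice_b", 1, 0.9), ("slice_b", 1, 0.7), ("slice_b", 0, 0.4), ("slice_b", 0, 0.2),
]

per_slice = auc_by_slice(rows)
for name, value in sorted(per_slice.items()):
    print(name, value)
```

If the model ranks students noticeably better in one slice than in another, its predictions are less reliable for the weaker slice, even when the overall accuracy looks acceptable.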
Throughout these examples, there is a common theme related to assessing AI algorithms and ML practice, using values such as transparency, agency or consent, and privacy. While these resources look promising, I’d love to hear more from others who are trying to use them. What do you like about them? Do you find them easy or hard to use? I’m especially interested in experiences with using tools and/or rubric criteria to assist with detecting and avoiding bias.
If you are interested in sharing ideas on emerging resources and how to take next steps, please reach out to us here at Unicon. It would be great to engage in a conversation on this as tools and best practices emerge.