Algorithmic Bias in AI-Assisted Conversations

What does the introduction of AI-generated reply suggestions do to human communication? Whose “voice” is represented in these machine-generated responses, and whose voice is diminished by them?

Google, LinkedIn, and Facebook now offer automated reply suggestions on their platforms, and these suggestions are used by millions of people every day. The suggestions are generally short text snippets generated by machine learning models trained on massive amounts of data. Google’s Smart Reply, for example, provides reply suggestions to Gmail users and is used in 10% of all mobile replies (Kannan et al., 2016). These assistive-AI systems belong to a broader class of tools that aim to help people carry out everyday online tasks.

This project examines the semantic and stylistic similarity of Google’s Smart Reply suggestions to replies written by a diverse set of people. The leading hypothesis is that populations underrepresented in Google’s training data (e.g., older adults, low-income individuals, less educated individuals) produce responses that are less similar, in both content and style, to those offered by Smart Reply. This hypothesis is motivated by algorithmic biases shown in Google’s image recognition, Apple’s face recognition products, and the criminality risk-assessment software used by the Chicago police (Seaver, 2013). Underlying all of these is the same fundamental issue: machine learning models are trained on whatever data is available rather than on a representative sample of the target population, which introduces bias.
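
To make the comparison concrete, the sketch below shows one way such a similarity measurement could be operationalized. It is only an illustration: the use of TF-IDF cosine similarity for content, the hand-picked stylistic features, and the function names (`content_similarity`, `style_distance`) are assumptions for this example, not the project’s actual measurement pipeline.

```python
# Illustrative sketch (not the project's actual pipeline): compare a
# participant's free-form reply to the Smart Reply suggestions shown for
# the same message, on both content and style.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def content_similarity(participant_reply: str, suggestions: list[str]) -> float:
    """Maximum TF-IDF cosine similarity between the reply and any suggestion."""
    vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
    vectors = vectorizer.fit_transform([participant_reply] + suggestions)
    sims = cosine_similarity(vectors[0], vectors[1:])
    return float(sims.max())


def style_features(text: str) -> np.ndarray:
    """Crude stylistic profile: word count, mean word length, punctuation density."""
    words = text.split()
    n_words = max(len(words), 1)
    return np.array([
        len(words),                                             # reply length in words
        sum(len(w) for w in words) / n_words,                   # mean word length
        sum(c in ".,!?;:" for c in text) / max(len(text), 1),   # punctuation density
    ])


def style_distance(participant_reply: str, suggestions: list[str]) -> float:
    """Mean Euclidean distance between style profiles (lower = more similar)."""
    reply_vec = style_features(participant_reply)
    return float(np.mean([np.linalg.norm(reply_vec - style_features(s))
                          for s in suggestions]))


if __name__ == "__main__":
    suggestions = ["Sounds good!", "Thanks, got it.", "See you then."]
    reply = "Thank you kindly, I will see you at the meeting on Tuesday."
    print("content similarity:", content_similarity(reply, suggestions))
    print("style distance:   ", style_distance(reply, suggestions))
```

Under the hypothesis above, replies from underrepresented groups would tend to score lower on a content-similarity measure of this kind and farther away on a style-distance measure, relative to replies from groups well represented in the training data.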