As artificial intelligence becomes a regular part of our lives, understanding its strengths and limitations is essential to using it safely and effectively. Trust is central to a successful partnership between humans and AI, and it is built over time through education and experience. The confidence we place in AI should be deliberately measured and calibrated, taking into account the AI’s capabilities and its performance under normal, stressed, and adversarial conditions. We must also consider how it works with different types of users, each with their own level of AI knowledge and sophistication.
AI is advancing fast, as evidenced by rapidly improving language skills and object recognition in images, and when we misjudge how much to trust it, things can go wrong, especially in high-stakes situations. Recently, we have seen instances where people trusted AI too much, like the lawyers who misused an AI tool in a federal court case, and times when AI helped to trick people, as when a sophisticated AI fooled someone into solving a CAPTCHA for it, or when an AI program was allegedly implicated in a tragic suicide in Belgium. When we trust our ability to work with it too little, we start throwing photographs out of art contests prematurely, and corporations lose competitive advantage by failing to benefit from productivity gains. Not knowing when to trust and when to doubt AI is already causing problems, and even dangers, and this will likely get worse until we humans learn how to handle AI and figure out how to put that trust (and mistrust) into action.
Understanding AI’s Strengths and Limitations
To learn this, we need to change the way we build, test, and understand AI. AI labs and tech firms should be encouraged to stop rushing new AI systems to release without carefully evaluating how humans and machines work together and whether unexpected problems might result. The agile "launch now, fix later" approach to AI development has generated plenty of new ideas and products, but the next generation of AI poses new challenges, both because it reaches into more areas of our lives and because of its new natural language interaction capabilities. Unlike traditional software, which we can understand by examining each line of code, AI systems are sophisticated pattern recognizers that operate through millions or even billions of calculations, so we cannot fully grasp how they work. They also undergo regular updates, requiring continuous rather than one-time testing. The job of evaluation never ends, and the way we ensure these systems function as intended must adapt along with them.
It is worth considering how to apply aspects of the way we rigorously test new medicines to the way we build and use AI systems. We need a more careful, systematic approach to evaluating AI-human teaming, one that considers both immediate and long-term impacts on human decision making. How well AI systems perform often varies dramatically across use cases and settings, so users must be able to recognize when an AI system is likely to perform well or poorly and adjust how they interact with it. This requires clear testing questions, the right test subjects, precise measurement criteria, and robust data analysis to ascertain the test’s success before widespread use. Crucially, these tests should not merely assess the AI system but also scrutinize the human-machine trust relationship, because it is their collaborative success that matters in real-world scenarios.
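To make the medicine analogy a little more concrete, here is a minimal sketch, in Python, of what a pre-deployment test plan for human-AI teaming could capture. The class, field names, and example values are hypothetical illustrations, not a standard protocol.

```python
from dataclasses import dataclass

@dataclass
class TeamingTestPlan:
    """A pre-deployment test plan, loosely modeled on a trial protocol (hypothetical schema)."""
    question: str             # the specific claim being tested
    participants: int         # how many representative users take part
    metric: str               # what is measured, with units
    success_threshold: float  # the value the metric must reach, agreed before the test
    conditions: list[str]     # normal, stressed, and adversarial settings to cover

# Hypothetical example of a plan written down before any rollout.
plan = TeamingTestPlan(
    question="Do clinicians reach correct diagnoses faster with the AI assistant?",
    participants=30,
    metric="median minutes per confirmed diagnosis",
    success_threshold=20.0,
    conditions=["routine cases", "ambiguous cases", "deliberately degraded inputs"],
)
```

The point of such a plan is not the code itself but the discipline it encodes: the question, the users, the measure, and the bar for success are all fixed before the system is put in front of a wider audience.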
Designing for AI-Human Teaming
Instead of focusing only on how well the AI performs against some standalone benchmark, let’s start asking questions about how well people and AI work together before we start using the AI. We should ask things like:
- Do people feel like they can count on the AI to do what it’s supposed to do?
- Do they trust it more as they see it work well over time and across similar tasks in varied situations?
- Does AI provide explanations and uncertainty estimates that humans can effectively use?
- Has the AI system performed its intended function as part of the human-machine team?
AI-Human Teaming Questions in Practice
Let’s show how these questions might improve human-AI interactions and yield better results. First, consider a radiologist who uses an AI tool to interpret and explain test results and develop treatment plans. Each time the AI makes a recommendation, it provides a clear explanation and indicates its level of uncertainty. Over several months, the radiologist finds that the AI tool’s diagnoses consistently align with her own, and she feels her work quality has improved thanks to faster decision-making. She also observes that the AI system’s explanations have become more meaningful to her as she has learned how to interpret them. She is asked regularly whether she understands the AI’s recommendations, whether she believes its uncertainty estimates are accurate, whether her critical insights were incorporated, and whether her productivity has increased. Her responses to these questions, combined with a systematic analysis of the accuracy of the AI system’s recommendations, help evaluate both the AI system and the effectiveness of her collaboration with it.
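As one hypothetical illustration of that systematic analysis, the sketch below (in Python, with an invented record schema) checks whether the AI’s stated confidence matches its observed accuracy, alongside how often the radiologist agreed with it.

```python
from dataclasses import dataclass

@dataclass
class CaseRecord:
    """One AI recommendation and its eventual outcome (hypothetical schema)."""
    ai_confidence: float    # confidence the AI reported with its recommendation, 0.0-1.0
    ai_correct: bool        # was the recommendation confirmed by the final diagnosis?
    clinician_agreed: bool  # did the radiologist accept the recommendation at the time?

def calibration_report(records: list[CaseRecord], bins: int = 5) -> None:
    """Compare the AI's stated confidence with its observed accuracy in each confidence bin."""
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        bucket = [r for r in records
                  if lo <= r.ai_confidence < hi or (b == bins - 1 and r.ai_confidence == hi)]
        if not bucket:
            continue
        accuracy = sum(r.ai_correct for r in bucket) / len(bucket)
        agreement = sum(r.clinician_agreed for r in bucket) / len(bucket)
        print(f"stated confidence {lo:.1f}-{hi:.1f}: n={len(bucket)}, "
              f"observed accuracy {accuracy:.2f}, clinician agreement {agreement:.2f}")
```

If the stated confidence tracks the observed accuracy, her trust in the tool’s uncertainty estimates is well founded; a persistent gap is a signal to recalibrate either the model or her reliance on it.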
Next, imagine an electrical lineman using an AI tool to manage potential hazards while executing a safe repair of the power grid. The risks that need to be managed are numerous and include optimal tool usage in the rain, the psychological stress of working high above the ground in strong winds, and even finding the safest travel routes after a severe storm. By providing real-time advice and safety checks, the tool not only increases efficiency but significantly reduces the inherent risks of the job, such as electrical shocks, falls, tool-related injuries, extreme weather, and vehicular accidents. It is not just an aid but a vital partner in the lineman’s safety and productivity. Over time, the lineman sees that the AI’s predictions are accurate, which builds trust in the tool. He is regularly asked whether the AI’s explanations are clear, whether he agrees with its assessment of risk levels, and whether his work quality or speed has improved since he began using it. His answers help gauge the success of the AI tool in real-world conditions and the quality of their teamwork.
In both of these scenarios, ongoing feedback is crucial to understand how well the AI system is being integrated and trusted in real-world tasks with changing circumstances and challenges, and whether it is truly enhancing productivity and work quality.
Eventually these types of questions could become measures and metrics, perhaps as part of the design of hybrid intelligence systems. The usefulness of an AI system lies in its ability to enhance productivity, be fit for purpose, and enable capabilities beyond human reach. An integrated framework of effectiveness, acceptability, and security allows for a comprehensive blend of qualitative and quantitative measures, helping to ensure the AI system performs as intended in real-world conditions.
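To suggest what such measures might look like, here is a minimal sketch in Python that pairs periodic answers to the teaming questions above with objective outcomes for the same period. Every class, field, and scale is a hypothetical illustration rather than an established framework, and the security dimension (for example, behavior under adversarial inputs) would need its own measures.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class TeamingSurvey:
    """Periodic responses to the teaming questions, on a 1-5 scale (hypothetical schema)."""
    reliability: int     # can I count on the AI to do what it is supposed to do?
    earned_trust: int    # do I trust it more as I see it work over time?
    explainability: int  # are its explanations and uncertainty estimates usable?
    team_outcome: int    # did it perform its intended function as part of the team?

@dataclass
class OutcomeSample:
    """Objective outcomes for the same review period (hypothetical schema)."""
    task_minutes: float  # time to complete a representative task
    error_rate: float    # fraction of outputs later found to be wrong

def summarize(surveys: list[TeamingSurvey], outcomes: list[OutcomeSample]) -> dict:
    """Blend the qualitative and quantitative signals into one periodic summary."""
    return {
        "acceptability": mean(
            mean([s.reliability, s.earned_trust, s.explainability, s.team_outcome])
            for s in surveys
        ),
        "effectiveness": {
            "avg_task_minutes": mean(o.task_minutes for o in outcomes),
            "avg_error_rate": mean(o.error_rate for o in outcomes),
        },
    }
```

The design choice worth noting is that the subjective survey scores and the objective outcomes are reported side by side for the same period, so a drop in trust can be compared directly against what the system actually did during that time.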
Conclusion
Trust in AI isn’t only about how well the AI does its job; it is also about how it interacts with its users: how clearly it communicates what it is doing, why it is doing it, and how certain it is about the results, and how well it is updated based on user feedback. This understanding of trust is not limited to the AI and the user alone but extends to how well they work as a team, together with other systems and processes. Returning to the examples above, outcomes can also be evaluated based on the end results of the team-executed tasks, which helps determine whether the AI has achieved the desired goals. Such an evaluation can assess whether the AI-assisted group (compared with a group using the current method) has increased productivity, reduced time spent searching for information, and improved learning efficiency, and whether trust in the AI is well founded and appropriately calibrated. Based on the results, adjustments can be made to the AI system and the trial repeated as necessary.
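As a rough illustration of that comparison, the sketch below computes the relative change in one outcome metric between an AI-assisted group and a group using the current method. The function, the metric, and the figures are all hypothetical, and a real trial would rely on pre-registered criteria and proper statistical tests rather than a single average.

```python
from statistics import mean

def compare_groups(ai_assisted: list[float], current_method: list[float]) -> float:
    """Relative change in one outcome metric (e.g. tasks completed per day)
    for the AI-assisted group versus the group using the current method."""
    baseline = mean(current_method)
    return (mean(ai_assisted) - baseline) / baseline

# Hypothetical per-person productivity figures from one evaluation cycle.
uplift = compare_groups(ai_assisted=[14.0, 12.5, 15.2], current_method=[11.0, 12.1, 10.4])
print(f"Relative productivity change: {uplift:+.1%}")
```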
With the rapid integration of AI into our daily lives, we must all learn when and whether to trust the technology, understand its capabilities and limitations, and adapt as these systems, and our functional relationships with them, evolve. For AI system designers, developing such metrics and exploring ways to ensure users are appropriately qualified to use these systems are crucial to achieving results aligned with well-calibrated trust. Looking forward, a design and evaluation approach that interweaves trust and performance metrics has implications beyond the immediate and could pave the way for new technical and non-technical advances. I think we will find that this human-machine teaming approach will help ensure that humans can work effectively with AI both now and long into the future.