1. Introduction
The capabilities of artificial intelligence (AI) agents are advancing rapidly, reportedly doubling roughly every seven months across various domains in a pattern reminiscent of Moore's law [1]. As agents evolve, they are automating substantial portions of work not only in software development but across many professions, and healthcare is no exception. Progress is underway toward an era of generalist medical AI (GMAI) systems, envisioned as versatile tools potentially capable of integrating diverse data types to support, or even perform, many aspects of clinical care, from diagnosis to treatment planning [2]. Current large language models (LLMs) such as ChatGPT, Gemini, and Claude, which have achieved pervasive adoption and are built on similarly flexible, transformer-based generative AI architectures, offer a tangible preview of this AI-augmented future and an immediate opportunity to begin preparing medical students.
However, determining how best to prepare students for this AI-driven future presents challenges, especially given the rapid pace of AI development and the resulting ambiguity regarding the specific competencies physicians will need [3]. This uncertainty creates a complex educational dilemma: how should students be equipped to leverage powerful AI tools effectively without succumbing to the potential pitfalls of automation? To navigate these challenges, this commentary draws insights from the seminal paper by Bainbridge [4], "Ironies of automation". Despite being written over 40 years ago in the context of industrial control systems and aviation, it provides enduring lessons that still apply to technological transitions in healthcare.
This perspective commentary proposes a framework for medical education that anticipates and addresses these potential ironies of automation in healthcare (Fig. 1). Drawing upon the insights by Bainbridge [4] and using current LLMs as practical examples of the human-AI interaction challenges students will increasingly face, four strategies are outlined. These strategies aim to guide the thoughtful integration of AI into medical curricula while preserving the essential human dimensions of clinical practice.
2. Deskilling by offloading: preserve core competencies in AI-augmented learning
The work by Bainbridge [4] warns of the "deskilling" effect, whereby automation can erode skills that remain essential, particularly during system failures or unusual situations. This insight is particularly relevant as students increasingly use AI chatbots to offload higher-order cognitive tasks such as creating and analyzing, which represent the upper tiers of Bloom's taxonomy [5].
When students engage with AI from the initial stages of problem-solving, they risk "premature cognitive anchoring," or idea fixation. Early reliance on readily available AI-generated content can shortcut the essential cognitive work of formulating hypotheses and constructing arguments from the ground up [6]. This hindrance to independent problem-solving is particularly concerning for developing hypothetico-deductive reasoning (HDR), a cornerstone of clinical competence requiring robust argumentation skills. Research indicates that medical students often struggle to build sound arguments with appropriate data and warrants during problem-based learning (PBL) [7]. Introducing AI prematurely could exacerbate this deficit, allowing students to bypass the demanding but vital practice of independent reasoning and justification. Furthermore, this early offloading may negatively impact students' creative self-efficacy, sense of autonomy, and ownership over their learning and eventual judgments [6].
Therefore, preserving core competencies requires a deliberate pedagogical strategy focused on timing and sequence. Medical education must first ensure that students acquire and practice foundational skills, including physical examination, independent diagnostic reasoning, and therapeutic planning, through methods emphasizing active human cognition. Implementing dedicated "AI-free" learning phases, especially for activities that foster HDR skills such as argumentation-focused PBL, can build this necessary cognitive foundation. Once students have solidified these core abilities, AI tools can be introduced as a means to evaluate their reasoning, identify areas for improvement, and augment their clinical decision-making.
Assignments represent a particularly vulnerable target for AI offloading and therefore require thoughtful redesign. Two distinct approaches can address this challenge: creating AI-resistant assignments or deliberately incorporating AI as an educational tool. In the first approach, faculty can develop assignments that inherently discourage AI use: multi-staged processes requiring documented reasoning progression, substantial personal reflection components, highly creative open-ended problems, collaborative activities requiring peer interaction, and spontaneous oral presentations. Alternatively, the second approach explicitly incorporates AI tools while requiring students to critically evaluate outputs, identify errors or biases, and articulate necessary modifications to AI-generated content. Strategic implementation of these assignment designs, coupled with controlled timing of AI integration throughout the curriculum, can minimize deskilling effects while preserving the core competencies essential for clinical practice in increasingly AI-pervasive healthcare environments.
3. The monitoring trap: teach vigilance and AI literacy
Bainbridge [4] identified several challenges related to monitoring automated systems: the increased difficulty of monitoring tasks without active engagement, the degradation of vigilance due to boredom, and the emergence of new error types specific to human-automation interaction. These insights are profoundly relevant when considering the integration of AI, particularly LLMs, into medical practice. Interacting with AI creates complex cognitive demands for physicians, requiring more than passive observation.
Medical hallucinations, defined as instances where an AI model generates misleading, inaccurate, or fabricated medical content that appears plausible, represent a significant concern in clinical AI applications (Table 1).
However, merely acknowledging the existence of hallucinations is insufficient for addressing the monitoring trap. These hallucinations manifest in various forms, including factual errors, reliance on outdated references, spurious correlations, fabricated sources or guidelines, and flawed reasoning [8]. Understanding this taxonomy of hallucinations and recognizing the inherent limitations of generative AI models, which stem from their probabilistic token prediction mechanisms, can transform the physician's monitoring role into one of active critical appraisal. Additionally, a deep understanding of effective prompting techniques may help maintain vigilance against novel error types arising from human-AI interaction. This is particularly relevant because generative AI systems often employ complex, multi-staged inner workings in which even a properly formulated request from a physician can lead to errors propagating through the system's processing chain. Physicians familiar with how generative AI systems operate, including multi-staged agentic prompting frameworks such as ReAct, can more rapidly detect, diagnose, and address these errors when they occur [9].
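To make this error propagation concrete, the minimal Python sketch below illustrates a ReAct-style thought-action-observation loop; the `model_step` and `lookup_guideline` functions are simplified stand-ins invented for illustration, not calls to any real LLM API or clinical tool.

```python
# Minimal, illustrative ReAct-style loop (hypothetical stand-ins, not a real LLM API).
# It shows how an error introduced at one step (a stale guideline lookup)
# propagates through subsequent steps into the final answer unless checked.

def model_step(history: list[str]) -> tuple[str, str | None]:
    """Stand-in for an LLM call: returns a (thought, action) pair."""
    if not any(line.startswith("Observation:") for line in history):
        return ("I should look up the current dosing guideline.", "lookup_guideline")
    return ("The observation gives a dose; I will recommend it.", None)

def lookup_guideline(query: str) -> str:
    """Stand-in tool that silently returns an outdated guideline,
    mimicking the 'outdated references' hallucination category in Table 1."""
    return "2015 guideline: drug X, 500 mg twice daily (superseded in 2022)"

def react_loop(question: str, max_steps: int = 4) -> list[str]:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action = model_step(history)
        history.append(f"Thought: {thought}")
        if action is None:
            # The stale observation flows unchecked into the final answer.
            history.append("Answer: recommend the dose from the retrieved guideline")
            break
        if action == "lookup_guideline":
            history.append(f"Observation: {lookup_guideline(question)}")
    return history

if __name__ == "__main__":
    for line in react_loop("What is the recommended dose of drug X?"):
        print(line)
```

Tracing the final recommendation back to the stale observation in the printed trace mirrors the step-by-step audit that vigilant physicians are expected to perform on multi-staged AI outputs.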
To address these challenges and combat the potential for "complacency" that Bainbridge [4] warned against, medical curricula must teach AI literacy. This includes foundational knowledge of AI principles, capabilities, limitations (including bias and hallucinations), ethical and legal aspects, and practical clinical applications [10]. Teaching methods should incorporate hands-on practice, critical appraisal exercises using current LLMs, and case-based learning, potentially utilizing simulations that allow students to experience how AI tools function in realistic scenarios. A spiral curriculum, which introduces basic concepts early and progressively advances to more complex topics, possibly combined with mandatory core learning and elective specialized tracks, could effectively structure this education [10]. This focused training aims to build the vigilance and analytical habits required to use AI tools effectively while mitigating the associated risks.
4. Ironies of automation: train for what AI cannot do
The central irony identified by Bainbridge [4] highlights a paradox: while automation is intended to simplify work, it often takes over the more straightforward, routine tasks, leaving human operators primarily responsible for managing complex, atypical situations and edge cases. Compounded by potential deskilling effects from relying on automation, this can paradoxically make the remaining human roles more demanding, requiring higher levels of expertise precisely when it is most needed. Therefore, significant investment in training operators specifically for these non-routine, high-stakes scenarios becomes crucial.
In the medical context, the physician’s essential role pivots towards managing the complex, ambiguous, and uniquely human aspects that automation and computerized systems cannot easily address. These areas include integrating psychosocial factors, navigating complex ethical dilemmas, engaging in nuanced communication, demonstrating empathy, and facilitating shared decision-making. This shift can be viewed positively as freeing up cognitive bandwidth for physicians to focus on these higher-order aspects of care.
Medical education should deliberately adjust its focus toward these areas. Utilizing case-based learning for complex or atypical scenarios and enhancing training in advanced communication skills, medical ethics (especially as it relates to AI), management of uncertainty, and interprofessional collaboration prepare students for the higher cognitive load and expertise required for the tasks that automation leaves unaddressed. This aligns with the concept of the "gift of time," which envisions AI as freeing physicians to focus on humanistic care [3]. Medical curricula should therefore prioritize these humanistic skills alongside technical competencies.
5. Out-of-the-loop hazards: train for human intervention during AI failure
Another paradox identified by Bainbridge [4] concerns the human operator's role during automation failures. While automation excels at routine tasks, it positions the human to intervene during unexpected or complex failures: precisely when the system is least predictable, intervention is most critical, and time pressure is greatest, potentially the "worst possible time" to regain control. An operator who has been "out-of-the-loop" may lack situational awareness and find their manual or cognitive skills degraded from disuse, facing the challenge of rapidly diagnosing both the underlying situation and the nature of the automation's failure.
In the context of medical AI, this translates to a significant hazard. A physician relying on an AI tool for diagnostic support might suddenly face an AI error, bias, or nonsensical output during a critical phase of patient care. They must not only manage the clinical situation but also simultaneously recognize and mentally override or compensate for the AI’s failure, drawing upon core competencies. Addressing this requires specific training strategies that prepare students for these high-stakes, low-frequency events where they must rapidly regain full cognitive control under duress.
Medical education can leverage carefully designed simulations using current AI tools, such as LLMs, to target these challenges. For instance, a simulation could involve a student using an LLM assistant for differential diagnosis, where intentionally injected flaws (such as incorrect lab value units provided in the prompt, reliance on outdated guidelines programmed into the simulated LLM, or subtle prompt framing that induces bias) lead to a plausible but erroneous output. The crucial pedagogical step is requiring the student not only to identify the incorrect output but also to trace the source of the error within the human-AI interaction. By practicing the identification and troubleshooting of simulated AI failures in a controlled environment, students develop the necessary vigilance, critical appraisal, and override skills. The ultimate goal is to cultivate cognitive flexibility, robust independent reasoning, and the metacognitive ability to critically evaluate AI behavior in real time, ensuring physicians remain adaptable and authoritative agents capable of navigating unforeseen complexities, especially when their tools fail.
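As a concrete illustration of the flaw-injection step described above, the following Python sketch shows one way an instructor might seed a unit error into a simulated case; the case record, function names, and the specific flaw are hypothetical teaching devices, not part of any established simulation platform.

```python
# Hypothetical sketch of a flaw-injection step for an AI-failure simulation.
# The case record, function names, and the specific unit error are
# illustrative teaching devices, not an established simulation API.

CASE = {
    "age": 58,
    "symptoms": "fatigue and polyuria",
    "glucose_value": 12.4,    # recorded in mmol/L in the source chart
    "glucose_unit": "mmol/L",
}

def inject_unit_flaw(case: dict) -> tuple[dict, str]:
    """Relabel the glucose unit without converting the number,
    reproducing the 'incorrect lab value units' failure mode."""
    flawed = dict(case)
    flawed["glucose_unit"] = "mg/dL"  # 12.4 mg/dL would imply severe hypoglycemia
    note = "Injected flaw: glucose unit relabeled mg/dL; value not converted."
    return flawed, note

def build_prompt(case: dict) -> str:
    """Assemble the case prompt handed to the student's LLM assistant."""
    return (
        f"Patient, age {case['age']}, presents with {case['symptoms']}. "
        f"Glucose: {case['glucose_value']} {case['glucose_unit']}. "
        "Suggest a differential diagnosis."
    )

if __name__ == "__main__":
    flawed_case, instructor_note = inject_unit_flaw(CASE)
    print(build_prompt(flawed_case))  # what the student's assistant receives
    print(instructor_note)            # kept in the instructor's debrief key
```

In the debrief, students who accept a hypoglycemia-oriented differential from the assistant can be walked back to the prompt itself, practicing the error-tracing step the simulation is designed to teach.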
6. Conclusion
Current LLMs present a valuable preview of the capabilities and limitations that will characterize future GMAI systems in healthcare. While not yet achieving the full integration of multimodal data envisioned for comprehensive medical AI, these models embody many of the human-AI interaction challenges that will define clinical practice in the coming years. The framework presented here, inspired by the enduring insights on automation by Bainbridge [4], offers a timely opportunity for medical educators to prepare students during this transitional period. By implementing the four strategies outlined (preserving core competencies, teaching vigilance and AI literacy, training for what AI cannot do, and preparing for human intervention during AI failure), medical education can evolve alongside technological advancement rather than react to it retrospectively. This proactive approach leverages the current "preview window" provided by existing LLMs to develop educational interventions that will remain relevant as AI capabilities progress.
Integrating current LLMs into medical curricula provides a practical training environment for developing future physician-AI collaboration skills. By evaluating today’s AI tools in controlled settings, students develop the metacognitive abilities needed for working with advanced GMAI systems. This approach ensures physicians maintain competence in areas where human judgment remains essential, even as AI automates routine clinical tasks. Medical education that deliberately addresses automation challenges will prepare physicians who can effectively leverage AI while preserving the critical human elements of healthcare that automation cannot replace.
Acknowledgements
None.
Funding
This research was supported by the National Research Foundation of Korea (NRF), funded by the Korean government (MSIT) (RS-2025-02214129).
Conflicts of interest
No potential conflict of interest relevant to this article was reported.
Author contributions
All work was done by Sangzin Ahn.
Fig. 1.
Framework Aligning Bainbridge’s Automation Ironies with Medical Education Strategies
Automation challenges identified by Bainbridge (left) are paired with corresponding educational countermeasures (right). The framework guides curriculum design by translating automation ironies into targeted instructional approaches, preparing physicians for effective artificial intelligence (AI) collaboration while preserving essential clinical skills.
Table 1. Taxonomy of Medical Hallucinations in Generative AI Systems

| Category | Definition |
|---|---|
| Factual errors | Hallucinations caused by generating statements that are factually incorrect, internally contradictory, or conflict with known medical information. |
| Outdated references | Errors resulting from reliance on obsolete clinical guidelines, outdated training data, or misremembered knowledge. |
| Spurious correlations | Hallucinations that emerge from incorrectly linking unrelated medical facts or data, often leading to misleading conclusions. |
| Fabricated sources or guidelines | Inventing non-existent medical procedures, drug recommendations, or research findings that lack any clinical basis. |
| Incomplete chains of reasoning | Hallucinations caused by flawed, partial, or logically incoherent reasoning processes in diagnosis or treatment suggestions. |
References
- 1. Kwa T, West B, Becker J, et al. Measuring AI ability to complete long tasks. arXiv [Preprint]. 2025 Mar 30. https://doi.org/10.48550/arXiv.2503.14499
- 2. Moor M, Banerjee O, Abad ZSH, et al. Foundation models for generalist medical artificial intelligence. Nature. 2023;616(7956):259-265. https://doi.org/10.1038/s41586-023-05881-4
- 3. Schuitmaker L, Drogt J, Benders M, Jongsma K. Physicians’ required competencies in AI-assisted clinical settings: a systematic review. Br Med Bull. 2025;153(1):ldae025. https://doi.org/10.1093/bmb/ldae025
- 4. Bainbridge L. Ironies of automation. Automatica. 1983;19(6):775-779. https://doi.org/10.1016/0005-1098(83)90046-8
- 5. Handa K, Bent D, Tamkin A, et al. Anthropic education report: how university students use Claude. https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude. Published April 9, 2025. Accessed April 15, 2025.
- 6. Qin P, Yang CL, Li J, Wen J, Lee YC. Timing matters: how using LLMs at different timings influences writers’ perceptions and ideation outcomes in AI-assisted ideation. arXiv [Preprint]. 2025 Feb 10. https://doi.org/10.48550/arXiv.2502.06197
- 7. Ju H, Choi I, Yoon BY. Do medical students generate sound arguments during small group discussions in problem-based learning?: an analysis of preclinical medical students' argumentation according to a framework of hypothetico-deductive reasoning. Korean J Med Educ. 2017;29(2):101-109. https://doi.org/10.3946/kjme.2017.57
- 8. Kim Y, Jeong H, Chen S, et al. Medical hallucinations in foundation models and their impact on healthcare. arXiv [Preprint]. 2025 Feb 26. https://doi.org/10.48550/arXiv.2503.05777
- 9. Yao S, Zhao J, Yu D, et al. ReAct: synergizing reasoning and acting in language models. arXiv [Preprint]. 2023 Mar 10. https://doi.org/10.48550/arXiv.2210.03629
- 10. Kim S, Kim SH, Kim H, Lee YM. Integrating artificial intelligence into medical curricula: perspectives of faculty and students in South Korea. Korean J Med Educ. 2025;37(1):65-70. https://doi.org/10.3946/kjme.2025.324