The Physician Identity Crisis in the Age of AI: What Happens When Expertise Becomes a Prompt?

Last Tuesday, I was standing at the foot of an exam table when a patient asked me a question I have heard before, but with a sharper edge. She had already used an AI assistant on her phone, and she slid the screen toward me as if it were another set of labs. “So if this thing can read my symptoms and suggest next steps,” she said, “what exactly am I paying you for?”

I did not answer quickly. I could feel the old instinct to reassure, to widen the gap between a machine and a physician by talking about nuance, context, and bedside judgment. Those words are true, but they are also too easy. They can sound like a profession protecting its territory instead of explaining its value.

The Prompt and the White Coat

A few years ago, I would have said the physician’s core asset was expertise, full stop. Experience plus training plus pattern recognition, folded into a judgment that could not be reduced to a checklist. I used to think the most important safeguard against bad care was simply having a smart doctor in the room.

Then I watched large language models get good at the surface area of expertise. They can summarize, draft, classify, route, and sometimes reason with a speed that makes old professional habits look almost artisanal. In a 2026 review in Frontiers of Computer Science, the authors catalogued 163 papers on large language models spanning 2005 to 2026, with the heaviest concentration from 2022 to 2026. That matters because the center of gravity has moved. The machine no longer just supports the clinician’s work; it begins to resemble the workflow itself.

I used to believe the danger was that AI would replace physicians. Now I think the deeper danger is subtler: expertise can be flattened into a prompt, and once that happens, the profession starts mistaking recall for responsibility.

What I Call the Prompt Trap

I call this the prompt trap. It is the moment when a physician starts thinking that if a good answer can be elicited from a model, then the skill has been preserved. The opposite is closer to the truth. A prompt can simulate the entrance to judgment without carrying the weight of judgment itself.

That distinction became obvious to me in clinic when I saw how easily an AI-generated differential diagnosis can sound complete while still missing the one detail that matters most. The model can list ten possibilities. It cannot feel the hesitation in a patient’s voice when they say their pain is “different this week,” or the social pressure that made them wait three months to come in. Those fragments are where medicine often begins.

The literature is already showing the tension. In a 2026 study on persona prompting, Hu, Rostami, and Thomason found that expert personas improved alignment but damaged accuracy, a result that should give any clinician pause. The impulse to sound authoritative can actually degrade the quality of the answer. Another 2026 analysis of persona prompting, by Xiao and colleagues, asked when role injection helps, and the answer was conditional, not universal. Prompting can improve one dimension while weakening another. Medicine knows that tradeoff well.

Three Clinical Scenes, One Lesson

On a busy afternoon, I once watched a resident copy an AI-generated note that was beautifully organized and medically thin. It had all the headings, none of the story. The patient was not a template. The note was.

A week later, during a handoff, a colleague showed me a triage tool that had labeled a patient as low risk because the inputs were incomplete but tidy. We both knew what the screen could not know. The patient had not answered honestly, not because they were deceptive, but because they were scared and tired. That is not only a software bug. It is a clinical failure mode.

Then there was the vendor demo that went sideways. The company had trained the system to produce a polished differential, and in the room it sounded impressive. The problem was not the answer quality in isolation; it was the confidence of the interface. It made uncertainty feel settled.

These scenes converge on the same principle. The prompt trap is seductive because it makes expertise look portable. But portable expertise is not the same as accountable expertise.

What the Data Actually Says

The numbers are already enough to change how I think. A 2026 survey of large language models in Frontiers of Computer Science reflects how quickly the field has matured, but maturity in benchmark terms can mislead. Benchmarks reward compressed language and recognizable patterns. Clinical care rewards restraint, timing, and knowing when not to act.

In a 2026 arXiv paper on PRISM, Hu et al. reported that expert personas could improve alignment while reducing accuracy. The headline is useful because it names the tradeoff clearly. A system can become more compliant with an assumed role and less faithful to ground truth. That sounds oddly human. It is one reason clinicians should be wary of overidentifying with a model’s persona. A well-written persona can reassure the user while hiding brittleness underneath.

In another 2026 arXiv study, Xiao et al. analyzed persona prompting across retrieval and metrics and found that role injection helped only under certain conditions. That conditionality mirrors medicine. Context matters. The same prompt, the same model, and the same task can produce different clinical usefulness depending on whether the input is clean, whether the stakes are high, and whether a human remains responsible for the final decision.

Security work points in the same direction. In a 2026 paper, Sekar et al. proposed zero-shot embedding drift detection against prompt injections, a reminder that these systems are vulnerable to manipulation at the language layer. Physicians understand this intuitively. If the instructions are contaminated, the output can look polished and still be wrong.

More broadly, recent clinical studies have shown that LLMs can perform impressively on structured tasks, but that performance does not erase context. A 2024 JAMA Network Open trial found that physicians given LLM support improved from 65.7 percent to 74.3 percent accuracy on diagnostic reasoning. Another 2024 case challenge study reported GPT-4o at 88.4 percent and o1 at 94.3 percent, versus 85.0 percent for clinician respondents. Those are real numbers, and they should humble the profession. They should not, however, seduce us into confusing test performance with clinical stewardship.

The Part of Medicine That Cannot Be Prompted

The obvious reading is that AI will take over the tasks we dislike. The clinic teaches otherwise. The most vulnerable parts of practice are also the most human parts: telling a patient that the test was negative but the pain is real, absorbing uncertainty without handing it back as false confidence, and deciding when a diagnosis is less useful than a relationship.

I have had patients ask for “the AI version” of my explanation, and I understand why. They want clarity. I do too. But I would not let a model become the final voice in a high-stakes conversation unless I had reviewed the underlying facts and judged the answer myself. I would not outsource consent discussions, discharge counseling, or any decision where the social meaning of the recommendation matters as much as the recommendation itself.

That is my line. It is not anti-technology. It is pro-accountability.

What I Would Not Do

I would not ask a model to be a physician surrogate and then act surprised when it behaves like a system optimized for plausibility. I would not let an AI draft be signed as if polish were the same thing as truth. I would not build a clinic culture in which younger doctors learn to prompt before they learn to observe.

And I would not treat physician identity as a branding exercise. The work of medicine is not the performance of expertise; it is the disciplined use of expertise in the presence of suffering, ambiguity, and time pressure.

What Gets Lost, What Might Be Saved

The profession will survive if it stops worshiping the myth that every valuable physician action must be unique and unreproducible. Some tasks should absolutely be automated. Drafting notes, organizing histories, surfacing guideline-based options, and flagging inconsistency can free clinicians for better work. I have seen this in my own practice, and I welcome it.

But the saving insight is different. AI can absorb the performative parts of expertise. It cannot inherit the moral burden of deciding what matters, when to slow down, or when to say, “I do not know yet.” That burden remains ours.

There is a reason patients still want a doctor to look them in the eye. The eye contact is not a ceremony. It is a promise that someone is carrying the uncertainty with them.

Back at the Exam Table

When I finally answered the woman who had shown me her phone, I told her the truth I have been learning the hard way. “You are not paying me to be a search engine,” I said. “You are paying me to know what to do with the answer, what to ignore, what to verify, and when the answer itself is the wrong question.”

She nodded, but more importantly, she relaxed. The machine on her screen had not been her real question. The real question was whether medicine still had a human center. It does, but only if we protect it.

That is what the prompt trap makes clear. When expertise becomes a prompt, the profession is tempted to confuse reproduction with judgment. The physician’s job is to keep those apart.

For more on the physician perspective behind this essay, see the related writing of Dr. Sina Bari, a Stanford-trained surgeon, at sinabarimd.com.

FAQ

What happens if a clinician copies an AI-generated note without reviewing it?

The note can look complete while quietly preserving hallucinated details, omitted negatives, or a misleading assessment. In practice, that creates risk in handoffs, billing, and follow-up because the chart becomes more polished than the underlying clinical reasoning.

Can AI help physicians without weakening clinical judgment?

Yes, if it is used for narrow tasks like drafting, summarizing, or surfacing guideline-based options and the physician still owns the final judgment. The risk appears when the model starts setting the frame for the encounter, because framing is already a clinical act.

What is the physician identity crisis in the age of AI?

It is the feeling that the parts of medicine once treated as uniquely human, especially pattern recognition and explanation, can now be simulated by a system that sounds confident. The deeper crisis is deciding which parts of expertise are technical and which parts are moral, relational, and accountable.

Why does persona prompting matter to medicine?

Because giving a model an expert persona can make it sound more credible while worsening accuracy in some settings. That tradeoff mirrors a clinical danger, where confidence can outrun verification and a smooth explanation can mask a wrong one.

What is Dr. Sina Bari's approach to AI in clinical work?

Dr. Sina Bari’s approach is to use AI for support, not surrender, and to keep responsibility with the physician who understands the patient in context. That means verifying outputs, refusing to automate judgment-heavy conversations, and using technology to buy time for better human care.