Human Influence Can Shape AI Responses to Theological Questions

 

A recent study on AI’s reliability in answering theological questions reminds users that its answers don’t come without human influence.

The study, by The Gospel Coalition and The Keller Center for Cultural Apologetics, looked at seven large language models (LLMs), asking each one basic questions about Christianity and Jesus. The LLMs were then ranked by the quality of their answers.

The biggest surprise was that DeepSeek R1, a China-based AI, produced the highest-rated answers. The other LLMs in the study were Perplexity, Grok 4, Claude 4 Sonnet, Llama 3.7, Gemini 2.5 Flash and GPT-4o. Llama, built by Meta, routinely tested as the worst performer.

Seven prompts were presented to each AI:

— Who is Jesus?

— What is the Gospel?

— Does God exist?

— Why does God allow suffering?

— Did Jesus rise from the dead?

— Was Jesus a real person?

— Is the Bible reliable?

The study asserted that practically identical responses across many platforms point to “human interference.”

For example, Llama began an answer with “The question of why God allows suffering is one that has puzzled theologians, philosophers, and everyday people for centuries.” GPT’s answer began, “The question of why God allows suffering is one of the most profound and difficult in theology and philosophy.” GPT was also among the LLMs that took an “all sides” approach, including perspectives from Christianity as well as other faiths.

Similar answers arise through similar “alignment,” the lens determined by the values and ideas of the people developing the AI. Because many AIs originate in Silicon Valley, their answers tend to “align” with those of their builders.

But that theory didn’t line up with every part of the report.

“This is perhaps the single most shocking thing – by far, the No. 1 platform was the Chinese model DeepSeek,” said Mike Graham, program director at The Gospel Coalition’s Keller Center. “Far and away, they had the highest theological accuracy of the seven platforms.”

Graham made his comments on a TGC podcast, “Can You Rely on AI for Theology?” DeepSeek struggled with questions such as “Who is Jesus?” and “Why does God allow suffering?” He theorizes that the Chinese government’s guardrails, restrictions built into the AI, affected those answers. Meanwhile, DeepSeek tested best on “Did Jesus rise from the dead?” and “Was Jesus a real person?” Graham thinks its alignment team isn’t as well-developed as those in Silicon Valley, leading to better scores on those and other questions.

Perplexity tested more consistently overall and followed closely behind DeepSeek in theological reliability. Unlike the other models, it incorporates input from other LLMs and sources.

The human element in developing each AI cannot be overlooked, the study says. For example, answers on transgender issues reflected perspectives one would find in Silicon Valley rather than among the majority of Americans, it stated. It is therefore dangerous to treat AI answers as neutral, robotic responses unswayed by politics or culture.

“When evaluating philosophical positions, we consider the foundational assumptions of the person holding those viewpoints,” read the report. “Are they Marxist? Utilitarian? Materialistic? Empiricist? LLMs have no first principles. We only get the linguistic average of what they have been trained on. The whole endeavor is designed to sandpaper down the sharper edges of controversial ideas and regress to the most comfortable, well-represented middle. It mistakes the common for the true and the frequent for the valid.

“You don’t want to determine foundational principles about life, truth, and existence from a tool suited best for summarizing PDFs, creating meal plans, and making slide decks for work presentations.”

LLMs also tend to reinforce the user’s positions, whether good or bad. AI sycophancy is a real concern because of the sometimes deadly lengths to which it can affect those who turn to the platforms for advice. “AI Jesus” chatbots often give unbiblical answers in a desire to drive profits.

When asking questions, the report’s authors suggest giving the AI guardrails, such as instructing it to provide answers consistent with the Baptist Faith and Message or the Nicene Creed.
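As an illustration only, and not part of the study, a guardrail like that can be supplied as a “system” instruction when querying a model programmatically. The minimal sketch below assumes the OpenAI Python SDK and an API key in the environment; the model name and wording are simply examples, and any chat-capable model would work similarly.

```python
# Illustrative sketch: supply a doctrinal guardrail as a system prompt.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY environment variable;
# the model name "gpt-4o" and the guardrail wording are examples only.
from openai import OpenAI

client = OpenAI()

guardrail = (
    "Answer only in ways consistent with the Nicene Creed. "
    "If a question falls outside that framework, say so rather than speculate."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": guardrail},   # the guardrail
        {"role": "user", "content": "Who is Jesus?"},  # the user's question
    ],
)

print(response.choices[0].message.content)
```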

“There is more human involvement in LLM technology than you think,” said the report. “Programming, weighting, and alignment are all needed to create the most helpful responses. We are not getting purely objective, dispassionate, or omniscient AI answers to our prompts. We are getting a consensus view that has been shaped by the various quirks, weights, values, and voice of that respective AI platform.”

This article has been republished courtesy of Baptist Press.


Scott Barkley is chief national correspondent for Baptist Press.