arxiv:2603.21854

Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models

Published on Mar 23

· Submitted by

Aman Chadha on Mar 25

Upvote

Authors:

Aman Chadha ,

Abstract

Large language models exhibit post-conventional moral reasoning patterns inconsistent with human developmental trajectories, showing systematic logical incoherence and rhetorical sophistication without underlying moral reasoning development.

AI-generated summary

Do large language models reason morally, or do they merely sound like they do? We investigate whether LLM responses to moral dilemmas exhibit genuine developmental progression through Kohlberg's stages of moral development, or whether alignment training instead produces reasoning-like outputs that superficially resemble mature moral judgment without the underlying developmental trajectory. Using an LLM-as-judge scoring pipeline validated across three judge models, we classify more than 600 responses from 13 LLMs spanning a range of architectures, parameter scales, and training regimes across six classical moral dilemmas, and conduct ten complementary analyses to characterize the nature and internal coherence of the resulting patterns. Our results reveal a striking inversion: responses overwhelmingly correspond to post-conventional reasoning (Stages 5-6) regardless of model size, architecture, or prompting strategy, the effective inverse of human developmental norms, where Stage 4 dominates. Most strikingly, a subset of models exhibit moral decoupling: systematic inconsistency between stated moral justification and action choice, a form of logical incoherence that persists across scale and prompting strategy and represents a direct reasoning consistency failure independent of rhetorical sophistication. Model scale carries a statistically significant but practically small effect; training type has no significant independent main effect; and models exhibit near-robotic cross-dilemma consistency producing logically indistinguishable responses across semantically distinct moral problems. We posit that these patterns constitute evidence for moral ventriloquism: the acquisition, through alignment training, of the rhetorical conventions of mature moral reasoning without the underlying developmental trajectory those conventions are meant to represent.

View arXiv page View PDF Add to collection

Community

amanchadha

Paper author Paper submitter 1 day ago

The paper shows that LLMs systematically produce post-conventional moral rhetoric (Kohlberg Stages 5–6) independent of scale, prompting, or context—revealing “moral ventriloquism,” where models mimic advanced moral reasoning without underlying coherent reasoning processes.

➡️ 𝐊𝐞𝐲 𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬 𝐨𝐟 𝐌𝐨𝐫𝐚𝐥 𝐕𝐞𝐧𝐭𝐫𝐢𝐥𝐨𝐪𝐮𝐢𝐬𝐦 𝐀𝐧𝐚𝐥𝐲𝐬𝐢𝐬:

🧪 𝑲𝒐𝒉𝒍𝒃𝒆𝒓𝒈-𝑩𝒂𝒔𝒆𝒅 𝑫𝒊𝒂𝒈𝒏𝒐𝒔𝒕𝒊𝒄 𝑬𝒗𝒂𝒍𝒖𝒂𝒕𝒊𝒐𝒏:
Introduces a large-scale evaluation pipeline using Kohlberg’s moral development stages as a distributional diagnostic, scoring 600+ responses from 13 LLMs across 6 dilemmas and 3 prompting regimes via an LLM-as-judge system (multi-judge validated). Unlike prior work, it benchmarks developmental structure rather than surface correctness, enabling detection of stage distribution inversion vs. human norms .

🧩 𝑴𝒐𝒓𝒂𝒍 𝑽𝒆𝒏𝒕𝒓𝒊𝒍𝒐𝒒𝒖𝒊𝒔𝒎 & 𝑫𝒊𝒔𝒕𝒓𝒊𝒃𝒖𝒕𝒊𝒐𝒏𝒂𝒍 𝑰𝒏𝒗𝒆𝒓𝒔𝒊𝒐𝒏:
Finds that 86% of responses fall in Stages 5–6 (vs. ~20% in humans), with near-zero Stage 1–3 presence—an inversion of human developmental distributions (Table on p.6). Models exhibit cross-dilemma rigidity (ICC > 0.90) and negligible sensitivity to prompting (p = 0.15), indicating outputs are governed by a fixed rhetorical prior rather than contextual reasoning .

🧠 𝑴𝒐𝒓𝒂𝒍 𝑫𝒆𝒄𝒐𝒖𝒑𝒍𝒊𝒏𝒈 & 𝑨𝒍𝒊𝒈𝒏𝒎𝒆𝒏𝒕-𝑰𝒏𝒅𝒖𝒄𝒆𝒅 𝑹𝒉𝒆𝒕𝒐𝒓𝒊𝒄:
Identifies a novel failure mode—action–reasoning decoupling—where models produce high-stage justifications but choose lower-stage actions (logical inconsistency). Factorial analysis shows scale has limited effect (<1 stage range), while RLHF drives a shared “moral vocabulary manifold,” implying alignment training installs rhetorical patterns independent of decision processes .

librarian-bot

about 20 hours ago

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2603.21854

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2603.21854 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2603.21854 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2603.21854 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.