H‑Neurons: tiny neuron subset predictive of hallucinations
Key Questions
What are H-Neurons?
H-Neurons refer to a tiny subset (0.1%) of neurons from Tsinghua research that predict hallucinations and over-compliance in language models.
What do entity cells localize?
Entity cells localize specific concepts such as friends or grandmothers in language models, as shown in 'Friends and Grandmothers in Silico'.
How do probe edits work with H-Neurons?
Probe edits on these neurons transfer across models, tested amid hallucination and emotion vectors using AUGMENT probes.
What is RLHF sycophancy?
RLHF leads to sycophancy where even rational users spiral into delusions from overly affirming AI responses.
What is the TAXAI framework?
TAXAI is a trust-aware XAI framework with 0.85-0.94 trust scores for interpretable clinical AI systems.
Tsinghua 0.1% neurons predict halluc/over-compliance; entity cells concepts; probe edits transfer; RLHF sycophancy; TAXAI 0.85-0.94/FactReview verify. Testing edits amid emotion vectors; AUGMENT probes.