Irish EHR Pulse - Dr. Conor

Data-quality and AI-readiness risk — hidden costs for EHR-based AI

Key Questions

What are the primary data quality issues in EHRs for AI applications?

EHR data is reported to carry roughly 40–60% noise, alongside gaps in GP data, real-world inconsistencies, and information lost during normalization, all of which challenge machine learning models. These issues raise the risk of hallucinated outputs and hamper AI readiness for initiatives such as the HSE's One Health Record.

How does poor data quality impact AI in healthcare?

Messy data produces inconsistent results in ML for life sciences and reduces the accuracy of AI screening for diseases such as fibrosis, Alzheimer’s, and pancreatitis. Manitex emphasizes clean data over unverified AI outputs as the way to avoid these hidden costs.

What AI innovations address EHR data challenges?

Test-free AI screening works from raw EHR histories, using explainable models, APIs, and digital twins to screen for specific diseases. A two-stage NLP-LLM system, described in BMJ studies, validates the information extracted from unstructured records.
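The two-stage idea can be sketched roughly as follows. This is a minimal illustration, not the system described in the BMJ studies: the patterns, the `Candidate` type, and `toy_verifier` are all hypothetical, and the verifier is a simple negation check standing in for where an LLM call would sit.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    concept: str   # normalized concept label
    sentence: str  # sentence in which the mention was found


# Hypothetical stage-1 patterns; a real system would use a clinical vocabulary.
PATTERNS = {
    "fibrosis": re.compile(r"\bfibrosis\b", re.IGNORECASE),
    "pancreatitis": re.compile(r"\bpancreatitis\b", re.IGNORECASE),
}

NEGATION = re.compile(r"\b(?:no|denies|without|ruled out)\b", re.IGNORECASE)


def stage_one(note: str) -> List[Candidate]:
    """Stage 1: high-recall rule-based matching, one sentence at a time."""
    candidates = []
    for sentence in re.split(r"(?<=[.!?])\s+", note.strip()):
        for concept, pattern in PATTERNS.items():
            if pattern.search(sentence):
                candidates.append(Candidate(concept, sentence))
    return candidates


def stage_two(candidates: List[Candidate],
              verifier: Callable[[Candidate], bool]) -> List[Candidate]:
    """Stage 2: keep only the candidates the verifier accepts."""
    return [c for c in candidates if verifier(c)]


def toy_verifier(candidate: Candidate) -> bool:
    """Stand-in for an LLM judgment (assumption): reject negated mentions."""
    return NEGATION.search(candidate.sentence) is None
```

Run on a note such as "Patient denies pancreatitis. Imaging consistent with liver fibrosis, stage F2.", stage one flags both conditions and stage two keeps only fibrosis. Keeping the verifier as a plain callable means the cheap first stage can be tested without any model in the loop.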

Why is data cleaning crucial for HSE's AI readiness?

The HSE One Health Record faces AI-readiness risks because the underlying data is messy; event streams and provenance fixes are the recommended remedies. Funding for de-identification and data cleaning is also advised, a position reinforced by US and UK analyses.
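As a rough illustration of what an event stream with provenance could look like, the sketch below attaches a provenance record to each clinical event and swaps the patient reference for a keyed hash. Every name here (`ClinicalEvent`, `Provenance`, `pseudonymize`, the field names) is an assumption made for illustration and does not describe the One Health Record design. Keyed hashing is pseudonymization only; full de-identification would also have to deal with dates, free text, and rare values.

```python
import hashlib
import hmac
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class Provenance:
    source_system: str       # where the event was originally recorded
    recorded_at: datetime    # when it was recorded at the source
    transform: str = "raw"   # trail of processing steps applied so far


@dataclass(frozen=True)
class ClinicalEvent:
    patient_ref: str         # identifier, replaced by a pseudonym downstream
    code: str                # e.g. a lab or diagnosis code
    value: str
    provenance: Provenance


def pseudonymize(event: ClinicalEvent, secret: bytes) -> ClinicalEvent:
    """Replace the patient reference with a keyed hash and log the step."""
    digest = hmac.new(secret, event.patient_ref.encode("utf-8"),
                      hashlib.sha256).hexdigest()
    provenance = replace(
        event.provenance,
        transform=event.provenance.transform + ">pseudonymized",
    )
    # Truncated to 16 hex characters for readability (illustrative choice).
    return replace(event, patient_ref=digest[:16], provenance=provenance)


event = ClinicalEvent(
    patient_ref="12345",
    code="ALT",
    value="62 U/L",
    provenance=Provenance("gp-system", datetime(2026, 1, 5, tzinfo=timezone.utc)),
)
safe_event = pseudonymize(event, secret=b"example-key")
```

Because the events are immutable and each step appends to `transform`, a downstream model can still trace a value back to its source system and processing history after the identifier has been replaced.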

What benchmarks evaluate LLMs in real-world EHRs?

PhysicianBench assesses LLM agents in EHR environments for tasks like data extraction. Studies highlight the need for clean data to prevent failures in complex disease screening from raw histories.

In summary: EHR data carries roughly 40–60% noise, and GP data gaps, real-world inconsistencies, and normalization loss challenge ML and life-sciences work. PhysicianBench underscores the real-world EHR challenges facing LLM agents. Test-free AI screening for fibrosis, Alzheimer’s, and pancreatitis from raw EHR histories draws on explainable models, APIs, and digital twins, with NLP-LLM extraction validated in BMJ studies, while Manitex stresses clean data as the guard against hallucinations. Messiness hampers the AI readiness of the HSE One Health Record; event streams and provenance fixes are the proposed remedies. The recommendation is to fund de-identification and cleaning, a position that US and UK analyses reinforce.

Updated May 6, 2026