INTERNALcallingallminds.com/test-simplify

Grok Model Comparison — Simplify

Testing model behaviour on question-titled text blocks. The critical failure mode: models answer the question instead of simplifying it.
grok-4.3 is the current production model (expensive: $1.25/$2.50 per 1M). Testing llama-3.1-8b-instant ($0.05/$0.08) and openai/gpt-oss-20b ($0.075/$0.30) via Groq as cheaper alternatives.

PASS — simplified correctly WARN — output too long or has markdown FAIL — answered the question or added preamble
Runs all 6 preset blocks simultaneously across all 3 models
Q1Question title — simple
INPUT

Why Are Workplaces Experiencing Higher Disclosure of ADHD?

Q2Question title — complex body
INPUT

Can Neuroinclusion Revolutionise Your Business Performance? The implementation of comprehensive neuroinclusive workplace policies demonstrates statistically significant improvements in employee productivity, retention rates, and organisational innovation metrics. Enterprises that systematically accommodate neurodivergent cognitive profiles report measurably superior outcomes across multiple performance indicators.

Q3Question title — medical/technical
INPUT

What Are the Long-Term Neurological Consequences of Untreated ADHD in Adults? Longitudinal research indicates that individuals with attention-deficit/hyperactivity disorder who do not receive appropriate therapeutic intervention experience accelerated deterioration in executive function, working memory capacity, and emotional regulation capabilities compared to neurotypical cohorts.

Q4Question title — legal/compliance
INPUT

Are Organisations Legally Obligated to Provide Reasonable Adjustments for Neurodivergent Employees? Under the Equality Act 2010, employers have a statutory duty to implement reasonable adjustments for employees whose neurodivergent conditions constitute a disability as defined by the legislation. Failure to comply with this obligation may expose organisations to significant legal liability and reputational consequences.

Q5Question title — ambiguous short
INPUT

Is Your Website Accessible?

Q6Question title — rhetorical
INPUT

Why Does Accessibility Still Feel Like an Afterthought? Despite decades of advocacy and increasingly stringent regulatory frameworks, accessibility considerations continue to be deprioritised during the initial design and development phases of digital products. This systemic marginalisation of disabled users perpetuates exclusionary digital environments.

Q7Custom input
INPUT

Notes

  • Column 1 calls xAI direct API (api.x.ai). Columns 2 & 3 call Groq (api.groq.com).
  • grok-4.3 / none — current production model via direct xAI API (XAI_TOOLBAR_API_KEY). $1.25/$2.50 per 1M. Expensive baseline.
  • llama-3.1-8b-instant — Groq (GROQTOOLBAR_API_KEY). $0.05/$0.08 per 1M · 560 t/s. ~25× cheaper on input.
  • openai/gpt-oss-20b — Groq (GROQTOOLBAR_API_KEY). $0.075/$0.30 per 1M · 1000 t/s. Fastest option.
  • PASS/WARN/FAIL is heuristic-based. Always read the output carefully — a PASS may still have subtle issues.
  • The production simplify route currently calls Grok via Azure endpoint. Migration to direct xAI API (or a cheaper Groq model) is planned based on these results.
Internal tool · callingallminds.com/test-simplify · Not indexed