Real-time scientific quality scores updated after every analysis run. All metrics are computed from live data — no cherry-picking.
Four core quality signals measured continuously across every analysis. Green = meets bar, amber = marginal, red = below threshold.
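The green/amber/red bands above amount to threshold checks on each signal's score. As a minimal sketch, here is one way such banding could be implemented; the signal names and cutoff values are illustrative assumptions, not GaiaLab's actual thresholds:

```javascript
// Hypothetical banding logic. The cutoffs (0.9 / 0.7) and the signal names
// below are assumed for illustration only.
const THRESHOLDS = { meets: 0.9, marginal: 0.7 };

function band(score) {
  if (score >= THRESHOLDS.meets) return "green";    // meets bar
  if (score >= THRESHOLDS.marginal) return "amber"; // marginal
  return "red";                                     // below threshold
}

// Example: classify four (hypothetical) signals for one analysis run.
const signals = { accuracy: 0.95, reproducibility: 0.82, coverage: 0.65, calibration: 0.91 };
for (const [name, score] of Object.entries(signals)) {
  console.log(`${name}: ${band(score)}`);
}
```

In a real dashboard the thresholds would likely be configured per signal rather than shared, but the shape of the check is the same.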
Last 30 analysis runs. Each point represents one analysis session.
Cumulative counts across all GaiaLab analysis sessions.
All benchmarks are reproducible. Clone the repo and run the gold standard eval in under 5 minutes.
# Clone and install
git clone https://github.com/gaialab/gaialab-app
cd gaialab-app && npm install

# Run gold standard benchmark (requires ANTHROPIC_API_KEY or OPENAI_API_KEY)
npm run eval:gold

# Run trust & reliability benchmarks
npm run eval:trust

# Generate HTML dashboard of results
npm run eval:dashboard
# Or run a single eval file directly:
node scripts/gaialab-eval.js --benchmark=gold --bypassCache=true