Live Evaluation

Benchmark Dashboard

Real-time scientific quality scores updated after every analysis run. All metrics are computed from live data — no cherry-picking.

Live Scores

Scientific Quality Metrics

Four core quality signals measured continuously across every analysis. Green = meets bar, amber = marginal, red = below threshold.
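As a sketch of how this traffic-light banding can work (the threshold values below are illustrative assumptions, not the dashboard's actual cutoffs):

```javascript
// Map a 0-1 metric score to a traffic-light band.
// Thresholds are hypothetical examples, not GaiaLab's real cutoffs.
// `higherIsBetter: false` covers metrics like hallucination rate,
// where a low raw score is the good outcome.
function band(score, { meets = 0.9, marginal = 0.75, higherIsBetter = true } = {}) {
  const s = higherIsBetter ? score : 1 - score;
  if (s >= meets) return "green";
  if (s >= marginal) return "amber";
  return "red";
}

console.log(band(0.93));                            // green
console.log(band(0.8));                             // amber
console.log(band(0.04, { higherIsBetter: false })); // green (0.96 after inversion)
```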

Citation Coverage: % of insights with ≥1 PMID
Grounded Ratio: % of claims verified by NLI
Cite-F1: harmonic mean of citation precision and recall
PMID Hallucination: % of fabricated citations (lower is better)
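To make the four definitions concrete, here is a toy computation over a handful of insights. The data shapes, PMIDs, and the assumed recall value are illustrative only, not GaiaLab's internal format:

```javascript
// Toy computation of the four quality metrics.
// Data shapes below are hypothetical, not GaiaLab's real schema.
const insights = [
  { claim: "A", pmids: ["12345678"], verifiedByNLI: true },
  { claim: "B", pmids: [], verifiedByNLI: false },
  { claim: "C", pmids: ["23456789", "99999999"], verifiedByNLI: true },
];
const knownPmids = new Set(["12345678", "23456789"]); // "99999999" is fabricated

// Citation coverage: share of insights carrying at least one PMID.
const coverage = insights.filter(i => i.pmids.length > 0).length / insights.length;

// Grounded ratio: share of claims the NLI checker verified.
const grounded = insights.filter(i => i.verifiedByNLI).length / insights.length;

// Cite-F1: harmonic mean of citation precision and recall.
// Precision here = cited PMIDs that resolve to real records; recall would
// normally be measured against a gold citation set -- assumed 2/3 for this sketch.
const cited = insights.flatMap(i => i.pmids);
const precision = cited.filter(p => knownPmids.has(p)).length / cited.length;
const recall = 2 / 3; // assumed for the sketch
const citeF1 = (2 * precision * recall) / (precision + recall);

// PMID hallucination rate: share of cited PMIDs that don't resolve.
const hallucination = cited.filter(p => !knownPmids.has(p)).length / cited.length;

console.log({ coverage, grounded, citeF1, hallucination });
```

With this toy data, coverage and grounded ratio both come out to 2/3 and one of three cited PMIDs is fabricated.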
Historical Trend

Quality Over Time

Last 30 analysis runs. Each point represents one analysis session.

Score Trend

Citation coverage, grounded ratio, and confidence index across recent runs
Analysis Volume

Platform Activity

Cumulative counts across all GaiaLab analysis sessions.

Total Analyses
Saved Snapshots
35+ Data Sources
8 MCP Tools
Reproducibility

Run Your Own Evaluation

All benchmarks are reproducible. Clone the repo and run the gold standard eval in under 5 minutes.

# Clone and install
git clone https://github.com/gaialab/gaialab-app
cd gaialab-app && npm install

# Run gold standard benchmark (requires ANTHROPIC_API_KEY or OPENAI_API_KEY)
npm run eval:gold

# Run trust & reliability benchmarks
npm run eval:trust

# Generate HTML dashboard of results
npm run eval:dashboard

# Or run a single eval file directly:
node scripts/gaialab-eval.js --benchmark=gold --bypassCache=true