GaiaLab AI — Technical Platform White Paper

Version 0.1.1 · May 2026 · Platform: https://www.gailabai.com

Abstract

GaiaLab is an AI-powered biological intelligence platform that transforms gene panels into structured, citation-linked research insights in under 60 seconds. The platform aggregates live data from 75+ biological databases,¹ applies a six-agent AI debate framework for hypothesis generation and critique, and produces PMID-grounded therapeutic hypotheses ranked by a six-factor scoring model. An MCP (Model Context Protocol) server interface allows direct integration into AI assistant workflows. All outputs are computational research hypotheses requiring independent experimental validation.

1. Introduction

Translational genomics research faces a reproducibility and synthesis bottleneck: the information required to reason about a gene panel is distributed across dozens of databases, literature corpora, and clinical trial registries, none of which share a common schema. A researcher submitting a panel of 3–10 genes to understand their disease biology must manually cross-reference PubMed, ClinicalTrials.gov, ChEMBL, gnomAD, UniProt, AlphaFold, KEGG, Reactome, COSMIC, and dozens of other sources — a process that takes days and produces results that are difficult to cite, compare, or reproduce.

GaiaLab automates this synthesis layer. It does not replace experimental biology; it accelerates the hypothesis-generation and literature-triangulation steps that precede it. Every output carries PMID citations, confidence labels, and an explicit evidence-quality flag so researchers can distinguish AI-generated inference from database-backed assertion.

2. Architecture Overview

2.1 Runtime Stack

| Component | Technology |
|---|---|
| Runtime | Node.js 22, ESM modules throughout |
| Deployment | Railway (Nixpacks, auto-deploy from GitHub main) |
| Database | PostgreSQL (Railway / Neon) via pg |
| AI providers | DeepSeek → OpenAI → Google Gemini → Anthropic Claude (failover chain) |
| MCP interface | @modelcontextprotocol/sdk v1.24 |
| Python inference | Python 3.11 in py/ for SAE and ESM-2 protein embeddings |
| Caching | L1 in-memory LRU (1 h TTL) + optional L2 disk tier |
| 3D visualization | Three.js + 3d-force-graph |

2.2 Request Lifecycle

```
User submits gene panel + disease context
        │
        ▼
Rate / quota gate (tier-based: free / researcher / enterprise)
        │
        ▼
Gene normalisation ── HGNC alias resolution, symbol canonicalisation
        │
        ▼
Parallel data fetch ── 75+ sources via Promise.allSettled() [~15–40 s cold]
        │
        ▼
Literature cache check ── 5-min in-memory keyed by genes + disease
        │
        ▼
Evidence ledger build ── PMID validation, polarity classification, grounding attribution, knownPmid backfill
        │
        ▼
Drug repurposing engine ── Six-factor scoring, tier assignment, AlphaFold structural bonus, DepMap essentiality
        │
        ▼
Six-agent AI debate ── Parallel: Hypothesis · Critic · Evidence · Innovation · Risk · Synthesis agents
        │
        ▼
Convergence scoring ── Cross-source family validation (PubMed, ClinicalTrials, FDA, ChEMBL, structural, network)
        │
        ▼
Insight assembly + grounding gate ── PMID linkage, confidence labelling, pathway FDR
        │
        ▼
Snapshot persistence + KG write ── PostgreSQL, shareable URL
        │
        ▼
Structured JSON response ── Rendered in analyze.html widget
```

Typical latency: 30–60 s cold start; < 1 s from L1 cache.
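
The caching behaviour mentioned in the stack table and lifecycle (an in-memory LRU with a TTL, keyed by genes + disease) can be illustrated with a minimal sketch. The class name `TtlLruCache`, the helper `cacheKey`, and the key format are hypothetical and are not taken from the GaiaLab source; only the LRU + TTL idea comes from the text above.

```javascript
// Minimal in-memory LRU cache with a per-entry TTL (illustrative sketch).
// A Map preserves insertion order, so the first key is the least recently used.
class TtlLruCache {
  constructor(maxEntries, ttlMs, now = Date.now) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, useful for testing
    this.map = new Map();
  }

  get(key) {
    const entry = this.map.get(key);
    if (entry === undefined) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.map.delete(key); // expired
      return undefined;
    }
    // Re-insert to mark the entry as most recently used.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, { value, storedAt: this.now() });
    if (this.map.size > this.maxEntries) {
      // Evict the least recently used entry.
      this.map.delete(this.map.keys().next().value);
    }
  }
}

// Hypothetical key format: order-insensitive gene list plus normalised disease string.
function cacheKey(genes, disease) {
  const geneKey = [...genes].map((g) => g.trim().toUpperCase()).sort().join(',');
  return `${geneKey}|${disease.trim().toLowerCase()}`;
}

// Usage: a 1 h TTL cache holding up to 500 analyses.
const analysisCache = new TtlLruCache(500, 60 * 60 * 1000);
analysisCache.set(cacheKey(['TP53', 'BRCA1'], 'Breast Cancer'), { cached: true });
```

Sorting the gene symbols before building the key means the same panel submitted in a different order hits the same cache entry.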

3. Data Layer

3.1 Source Coverage

GaiaLab fetches from 75+ biological databases¹ across the eight domains listed below. All clients follow the same defensive pattern: they never throw; they return partial results with an error field so Promise.allSettled() can continue regardless of upstream failures.
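
A minimal sketch of this defensive pattern follows. The names `makeSafeClient` and `fetchAllSources`, and the `{ source, data, error }` result shape, are assumptions made for illustration; only the "never throw, return an error field, fan out with Promise.allSettled()" behaviour comes from the text.

```javascript
// Hypothetical wrapper: turns any async fetcher into a client that never throws.
function makeSafeClient(name, fetcher) {
  return async function safeClient(genes) {
    try {
      const data = await fetcher(genes);
      return { source: name, data, error: null };
    } catch (err) {
      // Return a partial (empty) result with an error field instead of rejecting.
      const message = err && err.message ? err.message : String(err);
      return { source: name, data: [], error: message };
    }
  };
}

// Fan out to every client in parallel and keep whatever came back.
async function fetchAllSources(clients, genes) {
  const settled = await Promise.allSettled(clients.map((client) => client(genes)));
  return settled.map((outcome, i) =>
    outcome.status === 'fulfilled'
      ? outcome.value
      : { source: `client-${i}`, data: [], error: String(outcome.reason) }
  );
}

// Usage with two stand-in fetchers: one healthy, one failing.
const demoClients = [
  makeSafeClient('healthy', async (genes) => genes.map((g) => ({ gene: g }))),
  makeSafeClient('failing', async () => {
    throw new Error('upstream 503');
  }),
];
const demoResults = fetchAllSources(demoClients, ['TP53', 'BRCA1']);
```

The `rejected` branch in `fetchAllSources` is a second line of defence: it only triggers if a client breaks the never-throw contract.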

Gene annotation & variation: AlphaFold EBI · ClinVar · Ensembl · gnomAD (variant + ancestry + constraint) · GWAS Catalog · HGNC · OMIM · UniProt · VEP · NCBI Gene · ClinGen · Monarch Initiative

Pathway & functional: Enrichr · GO (Gene Ontology) · JASPAR (transcription factor binding) · KEGG · MSigDB · PathwayCommons · Reactome · ChEA3

Interaction & network: BioGRID · IntAct · STRING · STRING-DB partners · ComplexPortal · SynLethDB (synthetic lethality)

Literature: PubMed (NCBI Entrez) · Europe PMC · PubMed Central full-text (JATS XML → quantitative extraction: IC50, p-values, HR, OR, fold-changes) · bioRxiv · Semantic Scholar · OpenAlex · preprint monitor

Drug & clinical: ChEMBL · CIViC · ClinicalTrials.gov API v2 · DGIDb · DrugBank · DrugCentral · FDA Regulatory · OncoKB · OpenFDA adverse events · OpenTargets (genetics + disease–gene association) · OpenTargets Genetics · PubChem (compound + bioassay) · RxNorm DDI · TTD · PharmGKB · HMDB (metabolomics)

Omics & cancer: cBioPortal · COSMIC Signatures · CPTAC (proteomics) · DepMap (cancer dependency + co-essentiality) · GDSC (drug sensitivity) · GTEx (expression + eQTL) · HPA (protein atlas) · MetaboLights · PRIDE (proteomics) · ProteomicsDB · scRNA (CELLxGENE) · TCGA (mutation + survival) · AGR (Alliance of Genome Resources)

Structural: AlphaFold (pLDDT → druggability score) · PDB (experimental structures) · NGL molecular viewer

Regulatory / patent: Drug resistance intelligence · FDA regulatory intelligence · Patent status + expiry · Regulatory intelligence client · LINCS (perturbation signatures)

3.2 Aggregator Architecture

Sources are grouped into seven aggregators (src/data/aggregators/):

- Gene aggregator: symbol resolution, aliases, disease associations
- Pathway aggregator: enrichment with Benjamini–Hochberg FDR (genome size 21,000; three-tier labels: significant / suggestive / exploratory at 0.05 / 0.20 / 1.0)
- Literature aggregator: PubMed search + PMC full-text + secondary sources; polarity classification per PMID
- Interaction aggregator: PPI networks, co-essentiality, synthetic lethality
- Clinical aggregator: trial matching, survival stratification, ClinVar pathogenicity
- Drug aggregator: ChEMBL bioactivity, OpenFDA, trial–drug linkage
- Interpretability aggregator: SAE / ESM-2 feature extraction (optional; requires `GAIALAB_INTERPRETABILITY_ENABLED=1`)
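
The Benjamini–Hochberg step used by the pathway aggregator can be sketched as below. This is a generic implementation of the standard procedure, not the platform's code. The function names are hypothetical, the enrichment p-values are taken as given (the 21,000-gene background enters when those p-values are computed, which is not shown), and treating the thresholds as inclusive upper bounds is an assumption.

```javascript
// Benjamini-Hochberg adjusted p-values (q-values), returned in input order.
function benjaminiHochberg(pValues) {
  const m = pValues.length;
  const order = pValues.map((p, i) => ({ p, i })).sort((a, b) => a.p - b.p);
  const adjusted = new Array(m);
  let runningMin = 1;
  // Walk from the largest p-value down so the q-values stay monotone.
  for (let rank = m; rank >= 1; rank--) {
    const { p, i } = order[rank - 1];
    runningMin = Math.min(runningMin, (p * m) / rank);
    adjusted[i] = runningMin;
  }
  return adjusted;
}

// Three-tier label at the 0.05 / 0.20 / 1.0 thresholds quoted above
// (inclusive bounds are an assumption).
function fdrTier(q) {
  if (q <= 0.05) return 'significant';
  if (q <= 0.20) return 'suggestive';
  return 'exploratory';
}

// Usage: four pathway enrichment p-values.
const pathwayP = [0.001, 0.03, 0.04, 0.5];
const pathwayQ = benjaminiHochberg(pathwayP);
const pathwayTiers = pathwayQ.map(fdrTier);
```

In this example the second and third pathways have raw p-values below 0.05 but adjusted values just above it, so they are labelled suggestive and not significant.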

4. Drug Repurposing Engine

4.1 Six-Factor Scoring Model

Every candidate drug receives a score from 0–100 derived from six independently calibrated factors:

| Factor | Weight | Signal captured |
|---|---|---|
| Target match (`targetMatch`) | 0.30 | Direct binding evidence against panel genes (ChEMBL pChEMBL ≥ 6, confirmed binding targets, DGIDb) |
| Clinical evidence (`clinicalEvidence`) | 0.25 | FDA approval status, phase, ClinicalTrials.gov trial outcomes, OncoKB actionability |
| Mechanism relevance (`mechanismRelevance`) | 0.20 | MOA alignment to disease pathway (kinase inhibitor × kinase pathway, etc.) |
| Context relevance (`contextRelevance`) | 0.12 | Disease co-mention in literature and clinical data; OpenTargets association score |
| Pathway overlap (`pathwayOverlap`) | 0.08 | Enriched pathway membership overlap with drug targets |
| Safety profile (`safetyProfile`) | 0.05 | OpenFDA adverse event burden; Lipinski / ADMET flags |

Weights were calibrated against 10 known disease–drug pairs from OpenTargets and clinical guidelines.
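
As a rough illustration of how the six weights combine, the sketch below assumes the base score is a plain weighted sum and that every factor is already scaled to 0–100. Both points are assumptions; the text gives the weights and the 0–100 output range but not the exact formula. The function name `baseScore` is hypothetical.

```javascript
// Weights from the six-factor table (they sum to 1.00).
const FACTOR_WEIGHTS = {
  targetMatch: 0.30,
  clinicalEvidence: 0.25,
  mechanismRelevance: 0.20,
  contextRelevance: 0.12,
  pathwayOverlap: 0.08,
  safetyProfile: 0.05,
};

// Assumption: each factor is on a 0-100 scale; missing factors count as 0.
function baseScore(factors) {
  let total = 0;
  for (const [name, weight] of Object.entries(FACTOR_WEIGHTS)) {
    const value = Math.min(100, Math.max(0, Number(factors[name]) || 0));
    total += weight * value;
  }
  return Math.round(total * 10) / 10; // one decimal place
}

// Usage: 27 + 20 + 14 + 6 + 3.2 + 3 = 73.2
const exampleDrug = {
  targetMatch: 90,
  clinicalEvidence: 80,
  mechanismRelevance: 70,
  contextRelevance: 50,
  pathwayOverlap: 40,
  safetyProfile: 60,
};
const exampleScore = baseScore(exampleDrug);
```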

4.2 Bonus Signals

Beyond the six factors, three bonus signals can increase the final score:

- AlphaFold structural bonus (+0 to +10): derived from mean pLDDT of the target protein (pLDDT ≥ 80 → +10; ≥ 70 → +6; ≥ 60 → +3; < 60 → 0). Fetched from the AlphaFold EBI API.
- DepMap essentiality bonus (+0 to +8): drugs targeting genes with high cancer dependency scores receive a signal reflecting essentiality in disease cell lines.
- Network proximity bonus (+0 to +5): drugs whose targets are within two hops of panel genes in the STRING / BioGRID interaction network.
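
The pLDDT thresholds above map directly onto a small step function. In the sketch below, `alphaFoldBonus` follows the stated thresholds, while `applyBonuses`, its parameter names, and the cap at 100 are assumptions added for illustration (the text does not say whether the final score is capped).

```javascript
// AlphaFold structural bonus from mean pLDDT, using the thresholds listed above.
function alphaFoldBonus(meanPlddt) {
  if (!Number.isFinite(meanPlddt)) return 0; // no structure available
  if (meanPlddt >= 80) return 10;
  if (meanPlddt >= 70) return 6;
  if (meanPlddt >= 60) return 3;
  return 0;
}

function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical combination step: DepMap and network bonuses are assumed to be
// computed elsewhere and are only clamped to their documented ranges here.
function applyBonuses(score, { meanPlddt, depMapBonus = 0, networkBonus = 0 }) {
  const bonus =
    alphaFoldBonus(meanPlddt) + clamp(depMapBonus, 0, 8) + clamp(networkBonus, 0, 5);
  return Math.min(100, score + bonus); // cap at 100 is an assumption
}

// Usage: 60 + 6 (pLDDT 75) + 4 (DepMap) + 5 (network, clamped from 9) = 75
const boosted = applyBonuses(60, { meanPlddt: 75, depMapBonus: 4, networkBonus: 9 });
```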

4.3 Tier Assignment

| Tier | Score range | Interpretation |
|---|---|---|
| I | ≥ 70 | Strong mechanistic + clinical evidence; primary recommendation |
| II | 50–69 | Moderate evidence; warranting deeper review |
| III | < 50 | Exploratory; context-specific interest only |

FDA-approved on-panel drugs receive a score floor of 70 (Tier I guaranteed) regardless of context relevance, because the approval represents validated clinical evidence. Off-label FDA-approved drugs with non-zero context relevance receive a floor of 35.

Context penalty: Off-label drugs with contextRelevance < 20 receive a ×0.3 score multiplier; those with contextRelevance 20–34 receive ×0.45. These are mutually exclusive — only the most severe penalty applies.
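
The interaction between floors, penalties, and tiers can be sketched as follows. Several details here are assumptions and not statements about the real engine: the field names (`score`, `fdaApproved`, `offLabel`, `contextRelevance`), treating "on-panel" as "not off-label", and applying the penalty before the floor so that the floor always holds.

```javascript
// Tier boundaries from the tier table.
function tierFor(score) {
  if (score >= 70) return 'I';
  if (score >= 50) return 'II';
  return 'III';
}

// Hypothetical adjustment step (ordering is an assumption: penalty, then floor).
function adjustScore(drug) {
  let score = drug.score;

  // Context penalty for off-label drugs; only the most severe tier applies.
  if (drug.offLabel) {
    if (drug.contextRelevance < 20) score *= 0.3;
    else if (drug.contextRelevance < 35) score *= 0.45;
  }

  // Score floors for FDA-approved drugs.
  if (drug.fdaApproved) {
    if (!drug.offLabel) score = Math.max(score, 70);
    else if (drug.contextRelevance > 0) score = Math.max(score, 35);
  }
  return score;
}

// Usage: an approved on-panel drug with a raw score of 60 is lifted to Tier I.
const onPanel = { score: 60, fdaApproved: true, offLabel: false, contextRelevance: 10 };
const onPanelTier = tierFor(adjustScore(onPanel));
```

Under these assumptions an off-label, FDA-approved drug with a raw score of 80 and contextRelevance of 10 is first scaled to 24 and then lifted to the floor of 35, which still places it in Tier III.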

4.4 Convergence Scoring

Each drug is evaluated against six independent evidence families. A drug scoring 4/6 or higher is considered "convergent" — supported by multiple orthogonal source types rather than a single strong signal:

| Family | Passes when |
|---|---|
| `pubmed` | ≥ 1 PMID linked to drug–disease combination |
| `clinicaltrials` | ≥ 1 trial record in ClinicalTrials.gov |
| `fda` | FDA approved, phase ≥ 3, OR phase label matches "approved/phase 3/4" |
| `chembl` | Confirmed binding targets or bioactivity records exist |
| `structural` | AlphaFold pLDDT ≥ 50, PDB structures present, or hasAlphaFold=true |
| `network` | ≥ 3 interaction neighbours or hasNetworkProximity=true |
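
The six family checks can be expressed as a table of predicates. In this sketch only `hasAlphaFold` and `hasNetworkProximity` are field names taken from the table; every other property on the drug object (`pmids`, `trials`, `maxPhase`, `phaseLabel`, and so on) is a hypothetical stand-in.

```javascript
// One predicate per evidence family; each mirrors a "passes when" rule.
const FAMILY_CHECKS = {
  pubmed: (d) => (d.pmids || []).length >= 1,
  clinicaltrials: (d) => (d.trials || []).length >= 1,
  fda: (d) =>
    d.fdaApproved === true ||
    Number(d.maxPhase) >= 3 ||
    /approved|phase\s*[34]/i.test(d.phaseLabel || ''),
  chembl: (d) => (d.bindingTargets || []).length > 0 || (d.bioactivityCount || 0) > 0,
  structural: (d) =>
    Number(d.plddt) >= 50 || (d.pdbStructures || []).length > 0 || d.hasAlphaFold === true,
  network: (d) => (d.interactionNeighbours || 0) >= 3 || d.hasNetworkProximity === true,
};

// A drug is "convergent" when at least 4 of the 6 families pass.
function convergence(drug) {
  const passed = Object.keys(FAMILY_CHECKS).filter((family) => FAMILY_CHECKS[family](drug));
  return { passed, score: passed.length, convergent: passed.length >= 4 };
}

// Usage with placeholder identifiers (not real PMIDs or trial IDs).
const convergentDrug = convergence({
  pmids: ['pmid-placeholder'],
  trials: [{ id: 'trial-placeholder' }],
  maxPhase: 3,
  hasAlphaFold: true,
});
```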

5. Evidence Grounding System

5.1 The Grounding Problem

AI-generated insights are only as credible as their citations. GaiaLab's grounding pipeline ensures that every strategy card and pathway insight either carries validated PMIDs or is explicitly labelled as a hypothesis.

5.2 PMID Attribution Pipeline

Attribution occurs in two passes:

Pass 1 — _preAttributePmids: After data fetch, each insight item is matched against the full literature pool. Matching uses token overlap between the item's text (label + mechanism + gene symbols) and each paper's title + abstract. A gene-symbol match requires only 1 overlapping token; non-gene content requires 2. Up to 2 PMIDs are collected per item (collecting two is important — items with ≥ 2 PMIDs plus a disease/gene text match reach grounded status, the highest evidence tier).

Pass 2 — buildInsightLinks: Inside the polarity assignment loop, secondary attribution runs the same gene-aware 1-token / 2-token rule against the known-PMID set, collecting up to 2 additional papers.
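
The gene-aware 1-token / 2-token rule can be sketched as below. This is a simplified illustration, not the code of `_preAttributePmids` or `buildInsightLinks`: the tokeniser, the small stop-word list, and the object shapes are all assumptions.

```javascript
// Assumed stop-word list so that trivial words do not count as overlap.
const STOP_WORDS = new Set(['the', 'of', 'and', 'in', 'a', 'an', 'to', 'for', 'with', 'on']);

function tokenize(text) {
  const tokens = String(text).toLowerCase().match(/[a-z0-9]+/g) || [];
  return new Set(tokens.filter((t) => !STOP_WORDS.has(t)));
}

// Returns up to `limit` PMIDs whose title + abstract overlap the item text.
// A single gene-symbol hit is enough; non-gene content needs two shared tokens.
function attributePmids(item, papers, limit = 2) {
  const itemTokens = tokenize(`${item.label} ${item.mechanism}`);
  const geneTokens = new Set(item.genes.map((g) => g.toLowerCase()));
  const matches = [];
  for (const paper of papers) {
    const paperTokens = tokenize(`${paper.title} ${paper.abstract}`);
    let geneHits = 0;
    let otherHits = 0;
    for (const g of geneTokens) if (paperTokens.has(g)) geneHits++;
    for (const t of itemTokens) {
      if (!geneTokens.has(t) && paperTokens.has(t)) otherHits++;
    }
    if (geneHits >= 1 || otherHits >= 2) matches.push(paper.pmid);
    if (matches.length === limit) break;
  }
  return matches;
}

// Usage with placeholder paper IDs.
const demoItem = { label: 'EGFR inhibition', mechanism: 'blocks kinase signalling', genes: ['EGFR'] };
const demoPapers = [
  { pmid: 'A', title: 'EGFR mutations in lung cancer', abstract: '' },
  { pmid: 'B', title: 'Unrelated cardiology study', abstract: 'heart outcomes' },
  { pmid: 'C', title: 'Kinase signalling review', abstract: '' },
];
const demoMatches = attributePmids(demoItem, demoPapers);
```

Paper A matches on the gene symbol alone, paper C matches on two non-gene tokens, and paper B shares nothing with the item.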

5.3 Evidence Status Classification

Each insight item receives an evidenceStatus:

| Status | Criteria | Grounding weight |
|---|---|---|
| `grounded` | ≥ 1 PMID + supporting polarity, OR ≥ 2 PMIDs + disease/gene text match | 1.0 |
| `supported` | 1 PMID + disease/gene text match | 0.6 |
| `hypothesis` | 0 PMIDs | 0.0 |

The grounded ratio — the proportion of pathway + strategy items reaching grounded or supported — is reported on every analysis and monitored by the canary suite. Observed range: 28% (cold start, PubMed rate-limited) to 70%+ (warm cache, full paper pool).
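
The classification rules and the grounded ratio can be sketched as below. The item fields (`pmids`, `hasSupportingPolarity`, `textMatch`) are hypothetical names. The table does not say what happens to an item that has PMIDs but neither supporting polarity nor a text match; this sketch falls back to `hypothesis` for that case, which is an assumption.

```javascript
const GROUNDING_WEIGHT = { grounded: 1.0, supported: 0.6, hypothesis: 0.0 };

function evidenceStatus(item) {
  const pmidCount = item.pmids.length;
  if (pmidCount === 0) return 'hypothesis';
  if (item.hasSupportingPolarity || (pmidCount >= 2 && item.textMatch)) return 'grounded';
  if (item.textMatch) return 'supported';
  return 'hypothesis'; // assumption: case not covered by the table
}

// Share of items that reach grounded or supported status.
function groundedRatio(items) {
  if (items.length === 0) return 0;
  const backed = items.filter((item) => evidenceStatus(item) !== 'hypothesis').length;
  return backed / items.length;
}

// Usage with placeholder PMIDs.
const demoItems = [
  { pmids: ['p1'], hasSupportingPolarity: true, textMatch: false }, // grounded
  { pmids: ['p2'], hasSupportingPolarity: false, textMatch: true }, // supported
  { pmids: [], hasSupportingPolarity: false, textMatch: false }, // hypothesis
  { pmids: ['p3', 'p4'], hasSupportingPolarity: false, textMatch: true }, // grounded
];
const demoRatio = groundedRatio(demoItems);
```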

5.4 Polarity Classification

Each linked PMID is classified as support, neutral, or contradict relative to the insight claim. Items in the knownPmids set with neutral polarity are promoted to support (papers confirmed as relevant by prior analysis are treated as supporting unless explicitly classified as contradictions).
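
The promotion rule is small enough to show in full. The function name `effectivePolarity` and its signature are hypothetical; the rule itself (neutral + known PMID becomes support, contradictions are never promoted) is the one described above.

```javascript
// Promote neutral papers to "support" when the PMID is in the known set.
// Papers classified as "contradict" are left untouched.
function effectivePolarity(pmid, classified, knownPmids) {
  if (classified === 'neutral' && knownPmids.has(pmid)) return 'support';
  return classified;
}

// Usage with placeholder PMIDs.
const knownSet = new Set(['known-1']);
const promoted = effectivePolarity('known-1', 'neutral', knownSet);
const untouched = effectivePolarity('known-1', 'contradict', knownSet);
```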

6. Multi-Agent Reasoning Framework

6.1 Six-Agent Debate

GaiaLab uses a structured six-agent debate (enabled by default via GAIALAB_MULTI_AGENT_ENABLED=true) where each agent has a distinct epistemic role:

| Agent | Role |
|---|---|
| HypothesisAgent | Generates primary mechanistic hypotheses from pathway and gene evidence |
| CriticAgent | Challenges hypothesis validity; flags unsupported claims, contradictions |
| EvidenceAgent | Grounds claims in database facts (ChEMBL bioactivity, OpenTargets scores, clinical trial outcomes) |
| InnovationAgent | Identifies non-obvious therapeutic angles; queries active recruiting trials |
| RiskAgent | Evaluates adverse event burden, drug–drug interactions, safety flags |
| SynthesisAgent | Integrates the five prior perspectives into a consensus output |

The five specialist agents operate independently and in parallel. The SynthesisAgent then receives all five outputs and produces the final structured response.

6.2 Live Database Enrichment per Agent

Before the debate, src/ai/agent-data-retrieval.js pre-fetches four live APIs and formats the results per agent:

- OpenTargets disease–gene associations → EvidenceAgent + CriticAgent
- ClinicalTrials.gov v2 active recruiting trials → InnovationAgent
- OpenFDA adverse event counts per drug → RiskAgent
- ChEMBL bioactivity + pChEMBL values → EvidenceAgent + CriticAgent

This grounds agent reasoning in current database state rather than training-data recall.

6.3 AI Provider Failover

The platform supports four AI providers in priority order:

DeepSeek → OpenAI GPT-4o → Google Gemini Pro → Anthropic Claude


Each analysis attempt tries the primary provider; if it times out or returns an error, the next provider is tried. Only after all providers fail does the system return an error. This ensures high availability during provider-side outages.
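
A failover chain of this kind can be sketched as a sequential loop with a per-provider timeout. The function `withFailover`, the `{ name, call }` provider shape, and the 30-second default are assumptions for illustration; real provider SDK calls are replaced by stand-in functions.

```javascript
// Try each provider in priority order; move on when one errors or times out.
async function withFailover(providers, prompt, timeoutMs = 30000) {
  const errors = [];
  for (const provider of providers) {
    let timer;
    try {
      const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error('timeout')), timeoutMs);
      });
      const text = await Promise.race([provider.call(prompt), timeout]);
      return { provider: provider.name, text };
    } catch (err) {
      const message = err && err.message ? err.message : String(err);
      errors.push(`${provider.name}: ${message}`);
    } finally {
      clearTimeout(timer); // avoid leaving a pending timer behind
    }
  }
  // Only reached when every provider has failed.
  throw new Error(`All providers failed (${errors.join('; ')})`);
}

// Usage with two stand-in providers: the first fails, the second answers.
const demoProviders = [
  { name: 'primary', call: async () => { throw new Error('503'); } },
  { name: 'secondary', call: async (prompt) => `echo: ${prompt}` },
];
const failoverDemo = withFailover(demoProviders, 'hello', 1000);
```

Collecting the per-provider error messages makes the final error more useful for diagnosing a full outage.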

7. Knowledge Graph & Population Insights

7.1 Knowledge Graph

Every completed analysis writes to a PostgreSQL knowledge graph (kg_nodes + kg_edges + kg_cooccurrence tables). Nodes represent genes, drugs, pathways, and disease contexts. Edges represent:

- Drug → target (gene) bindings
- Gene → pathway memberships
- Gene → disease associations
- Drug → pathway links (derived from target memberships)

The KG accumulates cross-analysis signal over time. Endpoints:

- `GET /api/knowledge-graph/stats`: node/edge counts
- `GET /api/knowledge-graph/drugs?disease=`: top drugs by disease context
- `GET /api/knowledge-graph/gene-neighbors?gene=`: PPI neighbourhood

7.2 Population-Level Insights

scripts/aggregate-insights.js reads all stored snapshots and computes gene, pathway, and drug frequency + co-occurrence across all analyses run on the platform. This surfaces cross-analysis signal that single-analysis views cannot show. The Research Intelligence Panel on the homepage shows calibration charts and frequency pills derived from this aggregate.

7.3 Prediction Tracking

src/utils/prediction-tracker.js records every drug-disease prediction at analysis time, then polls ClinicalTrials.gov v2 periodically for outcome updates. This enables prospective calibration: the fraction of predictions that are eventually validated by trial completion is tracked as a calibration curve and reported on the platform's validation page.

8. Intelligence Boards

GaiaLab maintains 10 active disease intelligence boards — living summaries updated on a 24-hour refresh cycle. Each board aggregates evidence across the KG, surfaces emerging contradictions, and sends email alerts when new contradictions are detected against prior conclusions.

Current boards:


Breast Cancer · Triple-Negative Breast Cancer · Non-Small Cell Lung Cancer · Colorectal Cancer · Glioblastoma · Prostate Cancer · Ovarian Cancer · Melanoma · Pancreatic Ductal Adenocarcinoma · Alzheimer's Disease

Board data is stored in PostgreSQL (disease_boards, board_evidence_items, board_alerts tables). Case study pages for selected boards (lecanemab/AD, adagrasib/NSCLC, adagrasib/PDAC) provide focused mechanistic analysis with open research questions.

9. MCP Server Interface

GaiaLab exposes a Model Context Protocol server at POST /mcp, allowing AI assistants (Claude Desktop, custom agents built with the Anthropic Agent SDK) to call the platform as a tool.

Tool: gaialab_generate_insights

Input schema (Zod-validated):

```json
{ "genes": ["string"], "diseaseContext": "string", "audience": "researcher | clinician | general" }
```

Each POST creates a fresh McpServer + StreamableHTTPServerTransport instance. Responses carry Access-Control-Allow-Origin: * for cross-origin use. The MCP interface is the primary integration surface for research workflow automation.
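
For orientation, the sketch below builds the JSON-RPC 2.0 request body that an MCP client would POST to /mcp to invoke the tool. The `tools/call` method with `params.name` and `params.arguments` follows the Model Context Protocol specification; the helper `buildInsightsCall`, the hand-written checks (a stand-in for the server-side Zod schema), and treating `audience` as required are assumptions.

```javascript
const AUDIENCES = ['researcher', 'clinician', 'general'];

// Builds a tools/call request for gaialab_generate_insights.
function buildInsightsCall(id, args) {
  const { genes, diseaseContext, audience } = args;
  if (!Array.isArray(genes) || !genes.every((g) => typeof g === 'string')) {
    throw new TypeError('genes must be an array of strings');
  }
  if (typeof diseaseContext !== 'string') {
    throw new TypeError('diseaseContext must be a string');
  }
  if (!AUDIENCES.includes(audience)) {
    throw new TypeError(`audience must be one of: ${AUDIENCES.join(', ')}`);
  }
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: {
      name: 'gaialab_generate_insights',
      arguments: { genes, diseaseContext, audience },
    },
  };
}

// Usage: the resulting object is what gets JSON-serialised into the POST body.
const exampleCall = buildInsightsCall(1, {
  genes: ['KRAS', 'TP53'],
  diseaseContext: 'pancreatic ductal adenocarcinoma',
  audience: 'researcher',
});
```

In practice an MCP client library would construct this envelope; the sketch only makes the wire format visible.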

10. Workspace & Collaboration

Authenticated users (Stripe-backed tiers) receive persistent workspaces with:

- Saved analyses: shareable snapshot URLs, reproducible replay
- Analysis history: timeline of past gene panels and results
- Workspace memory: cross-session context for the chat assistant
- Report export: PDF export with PMID-gated trust score (blocked if valid-PMID rate < 70%)
- Weekly digests: emailed summaries of board updates and new contradictions relevant to saved analyses
- Row-level security: per-workspace PostgreSQL isolation (when `GAIALAB_RLS_ENABLED=true`)

Subscription tiers:

| Tier | Daily analyses | Drug repurposing | Export |
|---|---|---|---|
| Free | Limited (IP-gated) | Basic | No |
| Researcher | Unlimited | Full (all tiers) | PDF + CSV |
| Enterprise | Unlimited | Full + matrix | Full + API |

11. SAE / ESM-2 Interpretability (Optional)

When GAIALAB_INTERPRETABILITY_ENABLED=1, GaiaLab spawns Python 3.11 subprocesses from py/ to run sparse autoencoder (SAE) inference over ESM-2 protein language model embeddings. This surfaces learned biological features from the ESM-2 representation that are not explicitly encoded in database annotations — an experimental interpretability layer for protein function.

Requirements: PostgreSQL, Python 3.11, py/requirements.txt, ESM-2 model cache. This feature is disabled by default on the public deployment.

12. Evaluation & Benchmarking

12.1 AUROC (Retrospective)

A retrospective AUROC of 0.545 (95% CI bootstrap: 0.526–0.562) was computed in March 2026 against 529 predictions across 22 disease areas, using ClinicalTrials.gov completed trial matches as the gold standard. The random baseline is 0.50. This represents a modest but consistent signal above random. This is not a clinically validated predictor.
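
For readers unfamiliar with the metric, AUROC can be computed as the probability that a randomly chosen positive outranks a randomly chosen negative, with ties counting as half. The sketch below uses that pairwise (Mann–Whitney) formulation on toy data. It is not the platform's evaluation code, and the bootstrap confidence interval is not shown.

```javascript
// AUROC as the fraction of (positive, negative) pairs ranked correctly.
// `labels[i]` is truthy when prediction i was validated.
function auroc(scores, labels) {
  const positives = [];
  const negatives = [];
  scores.forEach((score, i) => (labels[i] ? positives : negatives).push(score));
  if (positives.length === 0 || negatives.length === 0) return NaN;
  let wins = 0;
  for (const p of positives) {
    for (const n of negatives) {
      if (p > n) wins += 1;
      else if (p === n) wins += 0.5;
    }
  }
  return wins / (positives.length * negatives.length);
}

// Usage on toy data: 3 of the 4 positive/negative pairs are ordered correctly.
const toyAuroc = auroc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]);
```

A value of 0.50 corresponds to random ordering, which is why 0.545 is described as a modest signal.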

12.2 Temporal Holdout

A separate temporal holdout benchmark using 22 known drug approvals (held out by year) showed an AUROC of 0.90, with 8/8 negative controls correctly rejected. Because this benchmark uses a smaller, curated dataset, the figure should be treated as indicative rather than as a general performance estimate.

12.3 Grounding Rate

The primary quality signal for day-to-day health is the grounding ratio — fraction of insight items with at least one validated PMID. The canary suite monitors this on every push. The npm run verify:engineering gate includes a grounding check (test:grounding-gate).

12.4 Continuous Evaluation

The evaluation suite (scripts/gaialab-eval.js) supports:

- NDCG@10 for drug ranking quality
- Paired t-test vs. baseline for significance
- Gold standard benchmarks (data/benchmarks/gold-standard.json)
- Trust and reliability benchmarks (data/benchmarks/trust-benchmarks.json)
- Snapshot replay for regression detection
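
As an illustration of the first metric, the sketch below computes NDCG@k from a list of graded relevance values in ranked order. The exponential gain (2^rel − 1) and the choice to derive the ideal ordering from the same list are assumptions; scripts/gaialab-eval.js may use a different variant.

```javascript
// Discounted cumulative gain over the top-k positions.
function dcg(relevances, k) {
  return relevances
    .slice(0, k)
    .reduce((sum, rel, i) => sum + (Math.pow(2, rel) - 1) / Math.log2(i + 2), 0);
}

// NDCG@k: DCG of the actual ranking divided by DCG of the ideal ranking.
function ndcgAtK(rankedRelevances, k = 10) {
  const ideal = [...rankedRelevances].sort((a, b) => b - a);
  const idealDcg = dcg(ideal, k);
  return idealDcg === 0 ? 0 : dcg(rankedRelevances, k) / idealDcg;
}

// Usage: relevance grades for drugs in the order the engine ranked them.
const perfectRanking = ndcgAtK([3, 2, 1, 0]);
const swappedRanking = ndcgAtK([0, 1]);
```

A perfectly ordered list scores 1; placing the only relevant drug second instead of first drops the score to 1 / log2(3), roughly 0.63.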

13. Engineering Quality Gates

All deployments must pass npm run verify:engineering, which runs in sequence:

1. `security:scan`: secret detection (no API keys, connection strings, or credentials in committed code)
2. `test:critical`: critical flow tests
3. `test:grounding-gate`: grounding ratio threshold check
4. `test:biomedical-trust`: biomedical claim credibility checks
5. `test:biomedical-trust:standard`: golden test suite for trust surfaces
6. `test:week2` through `test:week6`: regression suites per weekly milestone
7. `test:drug-scoring`: drug scoring unit tests
8. `test:ad-regression`: Alzheimer's disease regression
9. `test:contracts`: 60+ API contract tests
10. `test:export-surfaces`, `test:provenance-surfaces`, `test:critical-flows`, `test:ui-trust-surfaces`
11. `test:smoke`, `test:lineage`, `test:nav-e2e`
12. `test:concordance-integrity`: PMID annotation concordance
13. `test:sse-stream`, `test:kg-explorer`, `test:drug-paywall`, `test:prod-smoke`

The gate is enforced by a canary runner (scripts/canary.js) that executes a full live analysis against https://www.gailabai.com and validates: completion time, drug candidate count, grounded ratio, job failure rate, and trust page availability.

14. Security Model

- Secret detection: scripts/check_secrets.js runs as a git pre-commit hook and in every CI job. It matches known patterns for API keys, connection strings, and credentials.
- Stripe webhook verification: All payment webhook events are verified using stripe.webhooks.constructEvent() with a required STRIPE_WEBHOOK_SECRET. Unsigned webhooks are rejected with HTTP 400.
- Rate limiting: IP-based daily quota for free-tier users; API key tier for researcher/enterprise.
- No SSRF exposure: All outbound API calls are to fixed, known biological database URLs with 30-second timeouts. User input does not influence outbound URL construction.
- PMID validity gate: PDF export is blocked when the valid-PMID rate in an analysis falls below a configurable threshold (default 70%), preventing export of poorly-grounded reports.
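
The PMID validity gate reduces to a ratio check against a threshold. In the sketch below the function names are hypothetical, and `looksLikePmid` is only a format check (1 to 8 digits) standing in for whatever validation the platform actually performs. Blocking export for an analysis with no PMIDs at all is also an assumption.

```javascript
// Fraction of cited PMIDs that pass the supplied validity check.
function validPmidRate(pmids, isValid) {
  if (pmids.length === 0) return 0; // assumption: nothing cited means nothing validated
  return pmids.filter(isValid).length / pmids.length;
}

// Export is allowed only when the valid-PMID rate meets the threshold (default 70%).
function canExport(pmids, isValid, threshold = 0.7) {
  return validPmidRate(pmids, isValid) >= threshold;
}

// Stand-in validity check: a PMID is a positive integer of up to 8 digits.
const looksLikePmid = (id) => /^[1-9]\d{0,7}$/.test(String(id));

// Usage with placeholder identifiers: 3 of 4 pass, so the rate is 75%.
const exportAllowed = canExport(['101', '102', '103', 'not-a-pmid'], looksLikePmid);
```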

15. Deployment

GaiaLab is deployed on Railway via Nixpacks with automatic deploys from the main branch of the GitHub repository. The production URL is https://www.gailabai.com.

Environment variables required for full functionality:

| Variable | Purpose |
|---|---|
| `DEEPSEEK_API_KEY` | Primary AI provider |
| `OPENAI_API_KEY` | AI failover #1 |
| `GOOGLE_API_KEY` | AI failover #2 |
| `ANTHROPIC_API_KEY` | AI failover #3 |
| `DATABASE_URL` | PostgreSQL connection |
| `STRIPE_SECRET_KEY` | Payments |
| `STRIPE_WEBHOOK_SECRET` | Webhook signature verification |
| `NCBI_API_KEY` | PubMed rate: 3 → 10 req/s |

Optional premium data source keys: `BIOGRID_API_KEY`, `DISGENET_API_KEY`, `DRUGBANK_API_KEY`, `SEMANTIC_SCHOLAR_API_KEY`.

16. Limitations & Responsible Use

GaiaLab is a computational research tool. Its outputs are hypotheses, not clinical recommendations. Specific limitations:

- All therapeutic suggestions require independent experimental validation before any clinical application.
- Drug rankings are scoring artifacts, not clinical priority lists. The six-factor model was calibrated on known drug–disease pairs and may not generalise to novel disease contexts.
- AUROC 0.545 is a modest signal above random (0.50). The platform cannot reliably predict which specific drug will succeed in a given trial.
- Coverage is 19–22 sources without paid API keys (DisGeNET, DrugBank require subscriptions). Coverage metrics shown in the UI reflect available data only.
- Grounding varies with PubMed rate limits and cache state. Cold-start analyses (first run after server restart) may show grounding ratios of 28–35%; subsequent cached runs reach 65–70%.
- SAE interpretability is experimental and disabled by default. ESM-2 features have not been independently validated for clinical relevance.
- The platform does not ingest or process patient-level data. All analyses operate on gene symbols and disease context strings.

17. Citation

If you use GaiaLab in published research, please cite:

*This document reflects the platform as of version 0.1.1, May 2026. For the latest implementation details, refer to the source repository.*

¹ Active source count varies with API key configuration. Full source list in Section 3.1. Without optional paid keys (DisGeNET, DrugBank), active coverage is approximately 60 sources. The count of 75+ reflects the full set of integrated clients shipped with the platform.