# GaiaLab AI: Technical Platform White Paper

**Version 0.1.1 · May 2026**

**Platform:** https://www.gailabai.com

---

## Abstract

GaiaLab is an AI-powered biological intelligence platform that transforms gene panels into structured, citation-linked research insights in under 60 seconds. The platform aggregates live data from 75+ biological databases,¹ applies a six-agent AI debate framework for hypothesis generation and critique, and produces PMID-grounded therapeutic hypotheses ranked by a six-factor scoring model. An MCP (Model Context Protocol) server interface allows direct integration into AI assistant workflows. All outputs are computational research hypotheses requiring independent experimental validation.

---

## 1. Introduction

Translational genomics research faces a reproducibility and synthesis bottleneck: the information required to reason about a gene panel is distributed across dozens of databases, literature corpora, and clinical trial registries, none of which share a common schema. A researcher who submits a panel of 3–10 genes to understand their disease biology must manually cross-reference PubMed, ClinicalTrials.gov, ChEMBL, gnomAD, UniProt, AlphaFold, KEGG, Reactome, COSMIC, and dozens of other sources. That process takes days and produces results that are difficult to cite, compare, or reproduce.

GaiaLab automates this synthesis layer. It does not replace experimental biology; it accelerates the hypothesis-generation and literature-triangulation steps that precede it. Every output carries PMID citations, confidence labels, and an explicit evidence-quality flag so researchers can distinguish AI-generated inference from database-backed assertion.

---

## 2. Architecture Overview

### 2.1 Runtime Stack

| Component | Technology |
|---|---|
| Runtime | Node.js 22, ESM modules throughout |
| Deployment | Railway (Nixpacks, auto-deploy from GitHub main) |
| Database | PostgreSQL (Railway / Neon) via `pg` |
| AI providers | DeepSeek → OpenAI → Google Gemini → Anthropic Claude (failover chain) |
| MCP interface | `@modelcontextprotocol/sdk` v1.24 |
| Python inference | Python 3.11 in `py/` for SAE and ESM-2 protein embeddings |
| Caching | L1 in-memory LRU (1 h TTL) + optional L2 disk tier |
| 3D visualization | Three.js + 3d-force-graph |

### 2.2 Request Lifecycle

```
User submits gene panel + disease context
        │
        ▼
Rate / quota gate (tier-based: free / researcher / enterprise)
        │
        ▼
Gene normalisation ── HGNC alias resolution, symbol canonicalisation
        │
        ▼
Parallel data fetch ── 75+ sources via Promise.allSettled() [~15–40 s cold]
        │
        ▼
Literature cache check ── 5-min in-memory keyed by genes + disease
        │
        ▼
Evidence ledger build ── PMID validation, polarity classification,
                         grounding attribution, knownPmid backfill
        │
        ▼
Drug repurposing engine ── Six-factor scoring, tier assignment,
                           AlphaFold structural bonus, DepMap essentiality
        │
        ▼
Six-agent AI debate ── Parallel: Hypothesis · Critic · Evidence ·
                       Innovation · Risk · Synthesis agents
        │
        ▼
Convergence scoring ── Cross-source family validation (PubMed,
                       ClinicalTrials, FDA, ChEMBL, structural, network)
        │
        ▼
Insight assembly + grounding gate ── PMID linkage, confidence labelling, pathway FDR
        │
        ▼
Snapshot persistence + KG write ── PostgreSQL, shareable URL
        │
        ▼
Structured JSON response ── Rendered in analyze.html widget
```

Typical latency: 30–60 s cold start; < 1 s from L1 cache.

---

## 3. Data Layer

### 3.1 Source Coverage

GaiaLab fetches from **75+ biological databases**¹ across eight domains. All clients follow the same defensive pattern: they never throw; instead they return partial results with an `error` field, so the `Promise.allSettled()` fan-out can continue regardless of upstream failures.
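The defensive client pattern described above can be sketched as follows. This is a minimal illustration, not code from the GaiaLab repository: `safeClient`, `fetchAllSources`, the `{ source, data, error }` result shape, and the two stub sources are all hypothetical names introduced here.

```javascript
// Hypothetical wrapper: turns any async fetcher into a client that never rejects.
function safeClient(name, fetcher) {
  return async (genes) => {
    try {
      const data = await fetcher(genes);
      return { source: name, data, error: null };
    } catch (err) {
      // Partial result: empty data plus an `error` field instead of a throw.
      const message = err && err.message ? err.message : String(err);
      return { source: name, data: [], error: message };
    }
  };
}

// Fan out to every client in parallel. allSettled acts as a second safety net
// in case a client rejects despite the wrapper.
async function fetchAllSources(clients, genes) {
  const settled = await Promise.allSettled(clients.map((client) => client(genes)));
  return settled.map((outcome) =>
    outcome.status === "fulfilled"
      ? outcome.value
      : { source: "unknown", data: [], error: String(outcome.reason) }
  );
}

// Stub sources for illustration only (no network access).
const demoClients = [
  safeClient("stubA", async (genes) => genes.map((gene) => ({ gene, hit: true }))),
  safeClient("stubB", async () => {
    throw new Error("upstream 503");
  }),
];
```

Calling `fetchAllSources(demoClients, ["TP53"])` resolves to two entries: one carrying data and one whose `error` field holds the upstream message, so downstream aggregation can proceed with whatever succeeded.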

**Gene annotation & variation**
AlphaFold EBI · ClinVar · Ensembl · gnomAD (variant + ancestry + constraint) · GWAS Catalog · HGNC · OMIM · UniProt · VEP · NCBI Gene · ClinGen · Monarch Initiative

**Pathway & functional**
Enrichr · GO (Gene Ontology) · JASPAR (transcription factor binding) · KEGG · MSigDB · PathwayCommons · Reactome · ChEA3

**Interaction & network**
BioGRID · IntAct · STRING · STRING-DB partners · ComplexPortal · SynLethDB (synthetic lethality)

**Literature**
PubMed (NCBI Entrez) · Europe PMC · PubMed Central full-text (JATS XML → quantitative extraction: IC50, p-values, HR, OR, fold-changes) · bioRxiv · Semantic Scholar · OpenAlex · preprint monitor

**Drug & clinical**
ChEMBL · CIViC · ClinicalTrials.gov API v2 · DGIdb · DrugBank · DrugCentral · FDA Regulatory · OncoKB · OpenFDA adverse events · OpenTargets (genetics + disease–gene association) · OpenTargets Genetics · PubChem (compound + bioassay) · RxNorm DDI · TTD · PharmGKB · HMDB (metabolomics)

**Omics & cancer**
cBioPortal · COSMIC Signatures · CPTAC (proteomics) · DepMap (cancer dependency + co-essentiality) · GDSC (drug sensitivity) · GTEx (expression + eQTL) · HPA (protein atlas) · MetaboLights · PRIDE (proteomics) · ProteomicsDB · scRNA (CELLxGENE) · TCGA (mutation + survival) · AGR (Alliance of Genome Resources)

**Structural**
AlphaFold (pLDDT → druggability score) · PDB (experimental structures) · NGL molecular viewer

**Regulatory / patent**
Drug resistance intelligence · FDA regulatory intelligence · Patent status + expiry · Regulatory intelligence client · LINCS (perturbation signatures)

### 3.2 Aggregator Architecture

Sources are grouped into seven aggregators (`src/data/aggregators/`):

- **Gene aggregator**: symbol resolution, aliases, disease associations
- **Pathway aggregator**: enrichment with Benjamini–Hochberg FDR (genome size 21,000; three-tier labels: significant / suggestive / exploratory at FDR thresholds of 0.05 / 0.20 / 1.0)
- **Literature aggregator**: PubMed search + PMC full-text + secondary sources; polarity classification per PMID
- **Interaction aggregator**: PPI networks, co-essentiality, synthetic lethality
- **Clinical aggregator**: trial matching, survival stratification, ClinVar pathogenicity
- **Drug aggregator**: ChEMBL bioactivity, OpenFDA, trial–drug linkage
- **Interpretability aggregator**: SAE / ESM-2 feature extraction (optional; requires `GAIALAB_INTERPRETABILITY_ENABLED=1`)

---

## 4. Drug Repurposing Engine

### 4.1 Six-Factor Scoring Model

Every candidate drug receives a score from 0 to 100 derived from six independently calibrated factors:

| Factor | Weight | Signal captured |
|---|---|---|
| Target match (`targetMatch`) | 0.30 | Direct binding evidence against panel genes (ChEMBL pChEMBL ≥ 6, confirmed binding targets, DGIdb) |
| Clinical evidence (`clinicalEvidence`) | 0.25 | FDA approval status, phase, ClinicalTrials.gov trial outcomes, OncoKB actionability |
| Mechanism relevance (`mechanismRelevance`) | 0.20 | MOA alignment to disease pathway (kinase inhibitor × kinase pathway, etc.) |
| Context relevance (`contextRelevance`) | 0.12 | Disease co-mention in literature and clinical data; OpenTargets association score |
| Pathway overlap (`pathwayOverlap`) | 0.08 | Enriched pathway membership overlap with drug targets |
| Safety profile (`safetyProfile`) | 0.05 | OpenFDA adverse event burden; Lipinski / ADMET flags |

Weights were calibrated against 10 known disease–drug pairs from OpenTargets and clinical guidelines.

### 4.2 Bonus Signals

Beyond the six factors, three bonus signals can increase the final score:

- **AlphaFold structural bonus** (+0 to +10): derived from mean pLDDT of the target protein (pLDDT ≥ 80 → +10; ≥ 70 → +6; ≥ 60 → +3; < 60 → 0). Fetched from the AlphaFold EBI API.
- **DepMap essentiality bonus** (+0 to +8): drugs targeting genes with high cancer dependency scores receive a signal reflecting essentiality in disease cell lines.
- **Network proximity bonus** (+0 to +5): drugs whose targets are within two hops of panel genes in the STRING / BioGRID interaction network.

### 4.3 Tier Assignment

| Tier | Score range | Interpretation |
|---|---|---|
| I | ≥ 70 | Strong mechanistic + clinical evidence; primary recommendation |
| II | 50–69 | Moderate evidence; warranting deeper review |
| III | < 50 | Exploratory; context-specific interest only |

**FDA-approved on-panel drugs** receive a score floor of 70 (Tier I guaranteed) regardless of context relevance, because the approval represents validated clinical evidence. Off-label FDA-approved drugs with non-zero context relevance receive a floor of 35.

**Context penalty:** Off-label drugs with `contextRelevance < 20` receive a ×0.3 score multiplier; those with `contextRelevance 20–34` receive ×0.45. These are mutually exclusive: only the most severe penalty applies.

### 4.4 Convergence Scoring

Each drug is evaluated against six independent evidence families. A drug scoring 4/6 or higher is considered "convergent", that is, supported by multiple orthogonal source types rather than a single strong signal:

| Family | Passes when |
|---|---|
| `pubmed` | ≥ 1 PMID linked to drug–disease combination |
| `clinicaltrials` | ≥ 1 trial record in ClinicalTrials.gov |
| `fda` | FDA approved, phase ≥ 3, OR phase label matches "approved/phase 3/4" |
| `chembl` | Confirmed binding targets or bioactivity records exist |
| `structural` | AlphaFold pLDDT ≥ 50, PDB structures present, or `hasAlphaFold=true` |
| `network` | ≥ 3 interaction neighbours or `hasNetworkProximity=true` |

---

## 5. Evidence Grounding System

### 5.1 The Grounding Problem

AI-generated insights are only as credible as their citations. GaiaLab's grounding pipeline ensures that every strategy card and pathway insight either carries validated PMIDs or is explicitly labelled as a hypothesis.
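As a minimal sketch of that either/or rule, the function below keeps only PMIDs found in a validated set and labels any item left without one as a hypothesis. It is an illustration under stated assumptions, not the platform's pipeline: `applyGroundingGate`, `citedRatio`, the item shape, and the interim `cited` label are hypothetical, and the finer-grained statuses the platform actually assigns are those defined in Section 5.3.

```javascript
// Hypothetical grounding gate: an item either keeps validated PMIDs
// or is explicitly labelled as a hypothesis.
function applyGroundingGate(items, validatedPmids) {
  return items.map((item) => {
    const pmids = (item.pmids || []).filter((pmid) => validatedPmids.has(String(pmid)));
    return {
      ...item,
      pmids,
      // `cited` is a placeholder label used only in this sketch.
      label: pmids.length > 0 ? "cited" : "hypothesis",
    };
  });
}

// Share of items that retained at least one validated PMID.
function citedRatio(gatedItems) {
  if (gatedItems.length === 0) return 0;
  const cited = gatedItems.filter((item) => item.label === "cited").length;
  return cited / gatedItems.length;
}
```

The gate never drops an item; it only strips unvalidated citations and makes the resulting lack of support visible, which is the behaviour the paragraph above describes.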
### 5.2 PMID Attribution Pipeline

Attribution occurs in two passes:

**Pass 1 (`_preAttributePmids`):** After data fetch, each insight item is matched against the full literature pool. Matching uses token overlap between the item's text (label + mechanism + gene symbols) and each paper's title + abstract. A gene-symbol match requires only 1 overlapping token; non-gene content requires 2. Up to 2 PMIDs are collected per item. Collecting two matters because items with ≥ 2 PMIDs plus a disease/gene text match reach `grounded` status, the highest evidence tier.

**Pass 2 (`buildInsightLinks`):** Inside the polarity assignment loop, secondary attribution runs the same gene-aware 1-token / 2-token rule against the known-PMID set, collecting up to 2 additional papers.

### 5.3 Evidence Status Classification

Each insight item receives an `evidenceStatus`:

| Status | Criteria | Grounding weight |
|---|---|---|
| `grounded` | ≥ 1 PMID + supporting polarity, OR ≥ 2 PMIDs + disease/gene text match | 1.0 |
| `supported` | 1 PMID + disease/gene text match | 0.6 |
| `hypothesis` | 0 PMIDs | 0.0 |

The **grounded ratio**, the proportion of pathway and strategy items reaching `grounded` or `supported`, is reported on every analysis and monitored by the canary suite. Observed range: 28% (cold start, PubMed rate-limited) to 70%+ (warm cache, full paper pool).

### 5.4 Polarity Classification

Each linked PMID is classified as `support`, `neutral`, or `contradict` relative to the insight claim. PMIDs in the `knownPmids` set with `neutral` polarity are promoted to `support`: papers confirmed as relevant by prior analysis are treated as supporting unless explicitly classified as contradictions.

---

## 6. Multi-Agent Reasoning Framework

### 6.1 Six-Agent Debate

GaiaLab uses a structured six-agent debate (enabled by default via `GAIALAB_MULTI_AGENT_ENABLED=true`) where each agent has a distinct epistemic role:

| Agent | Role |
|---|---|
| **HypothesisAgent** | Generates primary mechanistic hypotheses from pathway and gene evidence |
| **CriticAgent** | Challenges hypothesis validity; flags unsupported claims, contradictions |
| **EvidenceAgent** | Grounds claims in database facts (ChEMBL bioactivity, OpenTargets scores, clinical trial outcomes) |
| **InnovationAgent** | Identifies non-obvious therapeutic angles; queries active recruiting trials |
| **RiskAgent** | Evaluates adverse event burden, drug–drug interactions, safety flags |
| **SynthesisAgent** | Integrates the five prior perspectives into a consensus output |

The first five agents operate independently and in parallel. The SynthesisAgent then receives all five outputs and produces the final structured response.

### 6.2 Live Database Enrichment per Agent

Before the debate, `src/ai/agent-data-retrieval.js` pre-fetches four live APIs and formats the results per agent:

- **OpenTargets** disease–gene associations → EvidenceAgent + CriticAgent
- **ClinicalTrials.gov v2** active recruiting trials → InnovationAgent
- **OpenFDA** adverse event counts per drug → RiskAgent
- **ChEMBL** bioactivity + pChEMBL values → EvidenceAgent + CriticAgent

This grounds agent reasoning in current database state rather than training-data recall.

### 6.3 AI Provider Failover

The platform supports four AI providers in priority order:

```
DeepSeek → OpenAI GPT-4o → Google Gemini Pro → Anthropic Claude
```

Each analysis attempt tries the primary provider; if it times out or returns an error, the next provider is tried. Only after all providers fail does the system return an error. This keeps the platform available during outages at any single provider.

---

## 7. Knowledge Graph & Population Insights

### 7.1 Knowledge Graph

Every completed analysis writes to a PostgreSQL knowledge graph (`kg_nodes` + `kg_edges` + `kg_cooccurrence` tables). Nodes represent genes, drugs, pathways, and disease contexts. Edges represent:

- Drug → target (gene) bindings
- Gene → pathway memberships
- Gene → disease associations
- Drug → pathway links (derived from target memberships)

The KG accumulates cross-analysis signal over time. Endpoints:

- `GET /api/knowledge-graph/stats`: node/edge counts
- `GET /api/knowledge-graph/drugs?disease=`: top drugs by disease context
- `GET /api/knowledge-graph/gene-neighbors?gene=`: PPI neighbourhood

### 7.2 Population-Level Insights

`scripts/aggregate-insights.js` reads all stored snapshots and computes gene, pathway, and drug frequency + co-occurrence across all analyses run on the platform. This surfaces cross-analysis, cross-study signal that single-analysis views cannot show. The Research Intelligence Panel on the homepage shows calibration charts and frequency pills derived from this aggregate.

### 7.3 Prediction Tracking

`src/utils/prediction-tracker.js` records every drug–disease prediction at analysis time, then polls ClinicalTrials.gov v2 periodically for outcome updates. This enables prospective calibration: the fraction of predictions that are eventually validated by trial completion is tracked as a calibration curve and reported on the platform's validation page.

---

## 8. Intelligence Boards

GaiaLab maintains **10 active disease intelligence boards**, living summaries updated on a 24-hour refresh cycle. Each board aggregates evidence across the KG, surfaces emerging contradictions, and sends email alerts when new contradictions are detected against prior conclusions.

**Current boards:** Breast Cancer · Triple-Negative Breast Cancer · Non-Small Cell Lung Cancer · Colorectal Cancer · Glioblastoma · Prostate Cancer · Ovarian Cancer · Melanoma · Pancreatic Ductal Adenocarcinoma · Alzheimer's Disease

Board data is stored in PostgreSQL (`disease_boards`, `board_evidence_items`, `board_alerts` tables). Case study pages for selected boards (lecanemab/AD, adagrasib/NSCLC, adagrasib/PDAC) provide focused mechanistic analysis with open research questions.

---

## 9. MCP Server Interface

GaiaLab exposes a Model Context Protocol server at `POST /mcp`, allowing AI assistants (Claude Desktop, custom agents built with the Anthropic Agent SDK) to call the platform as a tool.

**Tool:** `gaialab_generate_insights`

**Input schema (Zod-validated):**

```json
{
  "genes": ["string"],
  "diseaseContext": "string",
  "audience": "researcher | clinician | general"
}
```

Each POST creates a fresh `McpServer` + `StreamableHTTPServerTransport` instance. Responses carry `Access-Control-Allow-Origin: *` for cross-origin use. The MCP interface is the primary integration surface for research workflow automation.

---

## 10. Workspace & Collaboration

Authenticated users (Stripe-backed tiers) receive persistent workspaces with:

- **Saved analyses**: shareable snapshot URLs, reproducible replay
- **Analysis history**: timeline of past gene panels and results
- **Workspace memory**: cross-session context for the chat assistant
- **Report export**: PDF export with PMID-gated trust score (blocked if valid-PMID rate < 70%)
- **Weekly digests**: emailed summaries of board updates and new contradictions relevant to saved analyses
- **Row-level security**: per-workspace PostgreSQL isolation (when `GAIALAB_RLS_ENABLED=true`)

**Subscription tiers:**

| Tier | Daily analyses | Drug repurposing | Export |
|---|---|---|---|
| Free | Limited (IP-gated) | Basic | No |
| Researcher | Unlimited | Full (all tiers) | PDF + CSV |
| Enterprise | Unlimited | Full + matrix | Full + API |

---

## 11. SAE / ESM-2 Interpretability (Optional)

When `GAIALAB_INTERPRETABILITY_ENABLED=1`, GaiaLab spawns Python 3.11 subprocesses from `py/` to run sparse autoencoder (SAE) inference over ESM-2 protein language model embeddings. This surfaces learned biological features from the ESM-2 representation that are not explicitly encoded in database annotations, providing an experimental interpretability layer for protein function.

Requirements: PostgreSQL, Python 3.11, `py/requirements.txt`, ESM-2 model cache. This feature is disabled by default on the public deployment.

---

## 12. Evaluation & Benchmarking

### 12.1 AUROC (Retrospective)

A retrospective AUROC of **0.545** (95% bootstrap CI: 0.526–0.562) was computed in March 2026 against 529 predictions across 22 disease areas, using ClinicalTrials.gov completed trial matches as the gold standard. The random baseline is 0.50. This represents a modest but consistent signal above random.
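For readers unfamiliar with the metric, the sketch below shows one common way to compute an AUROC together with a percentile bootstrap interval. It is not the platform's evaluation code: the function names, the default resample count, and the seeded generator are assumptions made for illustration, and this document does not state which bootstrap variant produced the interval above.

```javascript
// AUROC via pairwise comparison: the probability that a random positive
// outscores a random negative, counting ties as one half.
function auroc(scores, labels) {
  const pos = scores.filter((_, i) => labels[i] === 1);
  const neg = scores.filter((_, i) => labels[i] === 0);
  if (pos.length === 0 || neg.length === 0) return NaN;
  let wins = 0;
  for (const p of pos) {
    for (const n of neg) {
      if (p > n) wins += 1;
      else if (p === n) wins += 0.5;
    }
  }
  return wins / (pos.length * neg.length);
}

// Small seeded generator (mulberry32-style) so resampling is reproducible.
function seededRandom(seed) {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Percentile bootstrap: resample (score, label) pairs with replacement.
function bootstrapAurocCI(scores, labels, { resamples = 1000, alpha = 0.05, seed = 42 } = {}) {
  const rand = seededRandom(seed);
  const n = scores.length;
  const estimates = [];
  for (let b = 0; b < resamples; b += 1) {
    const s = [];
    const l = [];
    for (let i = 0; i < n; i += 1) {
      const j = Math.floor(rand() * n);
      s.push(scores[j]);
      l.push(labels[j]);
    }
    const value = auroc(s, l);
    if (!Number.isNaN(value)) estimates.push(value); // skip single-class resamples
  }
  estimates.sort((a, b) => a - b);
  const last = estimates.length - 1;
  const lower = estimates[Math.floor((alpha / 2) * last)];
  const upper = estimates[Math.ceil((1 - alpha / 2) * last)];
  return { auroc: auroc(scores, labels), lower, upper };
}
```

An interval whose lower bound sits just above 0.50, as reported above, is what a weak but non-random ranking looks like under this kind of resampling.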

**This is not a clinically validated predictor.**

### 12.2 Temporal Holdout

A separate temporal holdout benchmark using 22 known drug approvals (held out by year) showed AUROC 0.90 with 8/8 negative controls correctly rejected. This benchmark is on a smaller, curated dataset and should be interpreted accordingly.

### 12.3 Grounding Rate

The primary quality signal for day-to-day health is the **grounded ratio**: the fraction of insight items with at least one validated PMID. The canary suite monitors this on every push. The `npm run verify:engineering` gate includes a grounding check (`test:grounding-gate`).

### 12.4 Continuous Evaluation

The evaluation suite (`scripts/gaialab-eval.js`) supports:

- NDCG@10 for drug ranking quality
- Paired t-test vs. baseline for significance
- Gold standard benchmarks (`data/benchmarks/gold-standard.json`)
- Trust and reliability benchmarks (`data/benchmarks/trust-benchmarks.json`)
- Snapshot replay for regression detection

---

## 13. Engineering Quality Gates

All deployments must pass `npm run verify:engineering`, which runs in sequence:

1. `security:scan`: secret detection (no API keys, connection strings, or credentials in committed code)
2. `test:critical`: critical flow tests
3. `test:grounding-gate`: grounding ratio threshold check
4. `test:biomedical-trust`: biomedical claim credibility checks
5. `test:biomedical-trust:standard`: golden test suite for trust surfaces
6. `test:week2` through `test:week6`: regression suites per weekly milestone
7. `test:drug-scoring`: drug scoring unit tests
8. `test:ad-regression`: Alzheimer's disease regression
9. `test:contracts`: 60+ API contract tests
10. `test:export-surfaces`, `test:provenance-surfaces`, `test:critical-flows`, `test:ui-trust-surfaces`
11. `test:smoke`, `test:lineage`, `test:nav-e2e`
12. `test:concordance-integrity`: PMID annotation concordance
13. `test:sse-stream`, `test:kg-explorer`, `test:drug-paywall`, `test:prod-smoke`

The gate is enforced by a canary runner (`scripts/canary.js`) that executes a full live analysis against `https://www.gailabai.com` and validates: completion time, drug candidate count, grounded ratio, job failure rate, and trust page availability.

---

## 14. Security Model

- **Secret detection**: `scripts/check_secrets.js` runs as a git pre-commit hook and in every CI job. It matches known patterns for API keys, connection strings, and credentials.
- **Stripe webhook verification**: All payment webhook events are verified using `stripe.webhooks.constructEvent()` with a required `STRIPE_WEBHOOK_SECRET`. Unsigned webhooks are rejected with HTTP 400.
- **Rate limiting**: IP-based daily quota for free-tier users; API key tier for researcher/enterprise.
- **No SSRF exposure**: All outbound API calls are to fixed, known biological database URLs with 30-second timeouts. User input does not influence outbound URL construction.
- **PMID validity gate**: PDF export is blocked when the valid-PMID rate in an analysis falls below a configurable threshold (default 70%), preventing export of poorly grounded reports.

---

## 15. Deployment

GaiaLab is deployed on **Railway** via Nixpacks with automatic deploys from the `main` branch of the GitHub repository. The production URL is `https://www.gailabai.com`.

Environment variables required for full functionality:

| Variable | Purpose |
|---|---|
| `DEEPSEEK_API_KEY` | Primary AI provider |
| `OPENAI_API_KEY` | AI failover #1 |
| `GOOGLE_API_KEY` | AI failover #2 |
| `ANTHROPIC_API_KEY` | AI failover #3 |
| `DATABASE_URL` | PostgreSQL connection |
| `STRIPE_SECRET_KEY` | Payments |
| `STRIPE_WEBHOOK_SECRET` | Webhook signature verification |
| `NCBI_API_KEY` | PubMed rate: 3 → 10 req/s |

Optional premium data source keys: `BIOGRID_API_KEY`, `DISGENET_API_KEY`, `DRUGBANK_API_KEY`, `SEMANTIC_SCHOLAR_API_KEY`.

---

## 16. Limitations & Responsible Use

GaiaLab is a **computational research tool**. Its outputs are hypotheses, not clinical recommendations. Specific limitations:

- **All therapeutic suggestions require independent experimental validation** before any clinical application.
- **Drug rankings are scoring artifacts**, not clinical priority lists. The six-factor model was calibrated on known drug–disease pairs and may not generalise to novel disease contexts.
- **AUROC 0.545** is a modest signal above random (0.50). The platform cannot reliably predict which specific drug will succeed in a given trial.
- **Coverage is 19–22 sources** without paid API keys (DisGeNET, DrugBank require subscriptions). Coverage metrics shown in the UI reflect available data only.
- **Grounding varies** with PubMed rate limits and cache state. Cold-start analyses (first run after server restart) may show grounding ratios of 28–35%; subsequent cached runs reach 65–70%.
- **SAE interpretability** is experimental and disabled by default. ESM-2 features have not been independently validated for clinical relevance.
- The platform does not ingest or process patient-level data. All analyses operate on gene symbols and disease context strings.

---

## 17. Citation

If you use GaiaLab in published research, please cite:

> GaiaLab AI Platform (2026). Computational drug repurposing and pathway analysis via multi-source biological database integration and six-agent AI debate. https://www.gailabai.com

---

*This document reflects the platform as of version 0.1.1, May 2026. For the latest implementation details, refer to the source repository.*

---

¹ Active source count varies with API key configuration. Full source list in Section 3.1. Without optional paid keys (DisGeNET, DrugBank), active coverage is approximately 60 sources. The count of 75+ reflects the full set of integrated clients shipped with the platform.