# GaiaLab AI — Technical Platform White Paper

**Version 0.1.1 · May 2026**
**Platform:** https://www.gailabai.com

---

## Abstract

GaiaLab is an AI-powered biological intelligence platform that transforms gene panels into structured, citation-linked research insights in under 60 seconds. The platform aggregates live data from 75+ biological databases,¹ applies a six-agent AI debate framework for hypothesis generation and critique, and produces PMID-grounded therapeutic hypotheses ranked by a six-factor scoring model. An MCP (Model Context Protocol) server interface allows direct integration into AI assistant workflows. All outputs are computational research hypotheses requiring independent experimental validation.

---

## 1. Introduction

Translational genomics research faces a reproducibility and synthesis bottleneck: the information required to reason about a gene panel is distributed across dozens of databases, literature corpora, and clinical trial registries, none of which share a common schema. A researcher investigating the disease biology of a 3–10 gene panel must manually cross-reference PubMed, ClinicalTrials.gov, ChEMBL, gnomAD, UniProt, AlphaFold, KEGG, Reactome, COSMIC, and dozens of other sources. The process takes days and produces results that are difficult to cite, compare, or reproduce.

GaiaLab automates this synthesis layer. It does not replace experimental biology; it accelerates the hypothesis-generation and literature-triangulation steps that precede it. Every output carries PMID citations, confidence labels, and an explicit evidence-quality flag so researchers can distinguish AI-generated inference from database-backed assertion.

---

## 2. Architecture Overview

### 2.1 Runtime Stack

| Component | Technology |
|---|---|
| Runtime | Node.js 22, ESM modules throughout |
| Deployment | Railway (Nixpacks, auto-deploy from GitHub main) |
| Database | PostgreSQL (Railway / Neon) via `pg` |
| AI providers | DeepSeek → OpenAI → Google Gemini → Anthropic Claude (failover chain) |
| MCP interface | `@modelcontextprotocol/sdk` v1.24 |
| Python inference | Python 3.11 in `py/` for SAE and ESM-2 protein embeddings |
| Caching | L1 in-memory LRU (1 h TTL) + optional L2 disk tier |
| 3D visualization | Three.js + 3d-force-graph |

### 2.2 Request Lifecycle

```
User submits gene panel + disease context
         │
         ▼
  Rate / quota gate (tier-based: free / researcher / enterprise)
         │
         ▼
  Gene normalisation  ──  HGNC alias resolution, symbol canonicalisation
         │
         ▼
  Parallel data fetch  ──  75+ sources via Promise.allSettled() [~15–40 s cold]
         │
         ▼
  Literature cache check  ──  5-min in-memory keyed by genes + disease
         │
         ▼
  Evidence ledger build  ──  PMID validation, polarity classification,
                             grounding attribution, knownPmid backfill
         │
         ▼
  Drug repurposing engine  ──  Six-factor scoring, tier assignment,
                               AlphaFold structural bonus, DepMap essentiality
         │
         ▼
  Six-agent AI debate  ──  Parallel: Hypothesis · Critic · Evidence ·
                           Innovation · Risk; then Synthesis
         │
         ▼
  Convergence scoring  ──  Cross-source family validation (PubMed,
                           ClinicalTrials, FDA, ChEMBL, structural, network)
         │
         ▼
  Insight assembly + grounding gate  ──  PMID linkage, confidence
                                        labelling, pathway FDR
         │
         ▼
  Snapshot persistence + KG write  ──  PostgreSQL, shareable URL
         │
         ▼
  Structured JSON response  ──  Rendered in analyze.html widget
```

Typical latency: 30–60 s cold start; < 1 s from L1 cache.

---

## 3. Data Layer

### 3.1 Source Coverage

GaiaLab fetches from **75+ biological databases**¹ across eight domains. All clients follow the same defensive pattern: they never throw, and instead return partial results with an `error` field, so the `Promise.allSettled()` fan-out always completes and a single upstream failure cannot abort an analysis.
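
The defensive client pattern can be sketched as follows. This is a minimal illustration rather than the platform's client code: `makeSafeClient`, `fetchAllSources`, and the `{ source, data, error }` result shape are hypothetical names chosen for the example.

```javascript
// Wrap a fetcher so that it never rejects: failures become an `error` field.
// `source` is a label for the upstream database; `fetcher` is any async function.
function makeSafeClient(source, fetcher) {
  return async function safeFetch(gene) {
    try {
      const data = await fetcher(gene);
      return { source, data, error: null };
    } catch (err) {
      const message = err && err.message ? err.message : String(err);
      return { source, data: null, error: message };
    }
  };
}

// Fan out to every client. allSettled acts as a second safety net in case
// an unwrapped client rejects anyway.
async function fetchAllSources(clients, gene) {
  const settled = await Promise.allSettled(clients.map((client) => client(gene)));
  return settled.map((s) =>
    s.status === "fulfilled"
      ? s.value
      : { source: "unknown", data: null, error: String(s.reason) }
  );
}
```

With this shape, downstream aggregators can filter on `error === null` instead of wrapping each source in its own `try`/`catch`.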

**Gene annotation & variation**
AlphaFold EBI · ClinVar · Ensembl · gnomAD (variant + ancestry + constraint) · GWAS Catalog · HGNC · OMIM · UniProt · VEP · NCBI Gene · ClinGen · Monarch Initiative

**Pathway & functional**
Enrichr · GO (Gene Ontology) · JASPAR (transcription factor binding) · KEGG · MSigDB · PathwayCommons · Reactome · ChEA3

**Interaction & network**
BioGRID · IntAct · STRING · STRING-DB partners · ComplexPortal · SynLethDB (synthetic lethality)

**Literature**
PubMed (NCBI Entrez) · Europe PMC · PubMed Central full-text (JATS XML → quantitative extraction: IC50, p-values, HR, OR, fold-changes) · bioRxiv · Semantic Scholar · OpenAlex · preprint monitor

**Drug & clinical**
ChEMBL · CIViC · ClinicalTrials.gov API v2 · DGIDb · DrugBank · DrugCentral · FDA Regulatory · OncoKB · OpenFDA adverse events · OpenTargets (genetics + disease–gene association) · OpenTargets Genetics · PubChem (compound + bioassay) · RxNorm DDI · TTD · PharmGKB · HMDB (metabolomics)

**Omics & cancer**
cBioPortal · COSMIC Signatures · CPTAC (proteomics) · DepMap (cancer dependency + co-essentiality) · GDSC (drug sensitivity) · GTEx (expression + eQTL) · HPA (protein atlas) · MetaboLights · PRIDE (proteomics) · ProteomicsDB · scRNA (CELLxGENE) · TCGA (mutation + survival) · AGR (Alliance of Genome Resources)

**Structural**
AlphaFold (pLDDT → druggability score) · PDB (experimental structures) · NGL molecular viewer

**Regulatory / patent**
Drug resistance intelligence · FDA regulatory intelligence · Patent status + expiry · Regulatory intelligence client · LINCS (perturbation signatures)

### 3.2 Aggregator Architecture

Sources are grouped into seven aggregators (`src/data/aggregators/`):

- **Gene aggregator** — symbol resolution, aliases, disease associations
- **Pathway aggregator** — enrichment with Benjamini–Hochberg FDR (genome size 21,000; three-tier labels: significant / suggestive / exploratory at 0.05 / 0.20 / 1.0)
- **Literature aggregator** — PubMed search + PMC full-text + secondary sources; polarity classification per PMID
- **Interaction aggregator** — PPI networks, co-essentiality, synthetic lethality
- **Clinical aggregator** — trial matching, survival stratification, ClinVar pathogenicity
- **Drug aggregator** — ChEMBL bioactivity, OpenFDA, trial–drug linkage
- **Interpretability aggregator** — SAE / ESM-2 feature extraction (optional; requires `GAIALAB_INTERPRETABILITY_ENABLED=1`)
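
The Benjamini–Hochberg step and the three-tier labels used by the pathway aggregator can be sketched as below. This is a generic implementation of the procedure, not the aggregator's own code. The function names are hypothetical, and treating the 0.05 and 0.20 cut-offs as inclusive is an assumption. How the genome size of 21,000 enters the upstream enrichment test is not shown.

```javascript
// Benjamini-Hochberg adjusted p-values (q-values), returned in input order.
function benjaminiHochberg(pValues) {
  const m = pValues.length;
  const order = pValues.map((p, i) => ({ p, i })).sort((a, b) => a.p - b.p);
  const q = new Array(m);
  let running = 1;
  // Walk from the largest p-value to the smallest, enforcing monotonicity.
  for (let rank = m; rank >= 1; rank--) {
    const { p, i } = order[rank - 1];
    running = Math.min(running, (p * m) / rank);
    q[i] = running;
  }
  return q;
}

// Map a q-value to the three-tier label (cut-offs assumed inclusive).
function fdrLabel(q) {
  if (q <= 0.05) return "significant";
  if (q <= 0.2) return "suggestive";
  return "exploratory";
}

const qValues = benjaminiHochberg([0.001, 0.01, 0.04, 0.5]);
const labels = qValues.map(fdrLabel);
// labels: ["significant", "significant", "suggestive", "exploratory"]
```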

---

## 4. Drug Repurposing Engine

### 4.1 Six-Factor Scoring Model

Every candidate drug receives a score from 0–100 derived from six independently calibrated factors:

| Factor | Weight | Signal captured |
|---|---|---|
| Target match (`targetMatch`) | 0.30 | Direct binding evidence against panel genes (ChEMBL pChEMBL ≥ 6, confirmed binding targets, DGIDb) |
| Clinical evidence (`clinicalEvidence`) | 0.25 | FDA approval status, phase, ClinicalTrials.gov trial outcomes, OncoKB actionability |
| Mechanism relevance (`mechanismRelevance`) | 0.20 | MOA alignment to disease pathway (kinase inhibitor × kinase pathway, etc.) |
| Context relevance (`contextRelevance`) | 0.12 | Disease co-mention in literature and clinical data; OpenTargets association score |
| Pathway overlap (`pathwayOverlap`) | 0.08 | Enriched pathway membership overlap with drug targets |
| Safety profile (`safetyProfile`) | 0.05 | OpenFDA adverse event burden; Lipinski / ADMET flags |

Weights were calibrated against 10 known disease–drug pairs from OpenTargets and clinical guidelines.
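
The weights sum to 1.00, so the base score is a weighted average of the six factors. The sketch below assumes each factor has already been normalised to 0–100 and that missing factors count as zero; both are assumptions of this example, and `baseScore` is a hypothetical name.

```javascript
// Weights from the six-factor table above.
const WEIGHTS = {
  targetMatch: 0.30,
  clinicalEvidence: 0.25,
  mechanismRelevance: 0.20,
  contextRelevance: 0.12,
  pathwayOverlap: 0.08,
  safetyProfile: 0.05,
};

// Weighted sum of factor scores, each clamped to 0-100 (assumed scale).
function baseScore(factors) {
  let total = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    const value = Math.min(100, Math.max(0, factors[name] ?? 0));
    total += weight * value;
  }
  return Math.round(total * 10) / 10; // round to one decimal place
}

const example = baseScore({
  targetMatch: 90,
  clinicalEvidence: 80,
  mechanismRelevance: 70,
  contextRelevance: 50,
  pathwayOverlap: 40,
  safetyProfile: 60,
});
// 27 + 20 + 14 + 6 + 3.2 + 3 = 73.2
```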

### 4.2 Bonus Signals

Beyond the six factors, three bonus signals can increase the final score:

- **AlphaFold structural bonus** (+0 to +10): derived from mean pLDDT of the target protein (pLDDT ≥ 80 → +10; ≥ 70 → +6; ≥ 60 → +3; < 60 → 0). Fetched from the AlphaFold EBI API.
- **DepMap essentiality bonus** (+0 to +8): drugs targeting genes with high cancer dependency scores receive a signal reflecting essentiality in disease cell lines.
- **Network proximity bonus** (+0 to +5): drugs whose targets are within two hops of panel genes in the STRING / BioGRID interaction network.
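
Of the three bonuses, only the AlphaFold mapping is fully specified above, so it is the only one sketched here. `alphaFoldBonus` is a hypothetical name, and returning 0 for a missing or non-numeric pLDDT is an assumption.

```javascript
// Map a mean pLDDT value to the structural bonus described above.
function alphaFoldBonus(meanPlddt) {
  if (!Number.isFinite(meanPlddt)) return 0; // assumption: no structure, no bonus
  if (meanPlddt >= 80) return 10;
  if (meanPlddt >= 70) return 6;
  if (meanPlddt >= 60) return 3;
  return 0;
}
```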

### 4.3 Tier Assignment

| Tier | Score range | Interpretation |
|---|---|---|
| I | ≥ 70 | Strong mechanistic + clinical evidence; primary recommendation |
| II | 50–69 | Moderate evidence; warranting deeper review |
| III | < 50 | Exploratory; context-specific interest only |

**FDA-approved on-panel drugs** receive a score floor of 70 (Tier I guaranteed) regardless of context relevance, because the approval represents validated clinical evidence. Off-label FDA-approved drugs with non-zero context relevance receive a floor of 35.

**Context penalty:** Off-label drugs with `contextRelevance < 20` receive a ×0.3 score multiplier; those with `contextRelevance 20–34` receive ×0.45. These are mutually exclusive — only the most severe penalty applies.
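
The tier, floor, and penalty rules can be combined as in the sketch below. The text above does not state the order of operations, so applying the context penalty first and the floors second is an assumption of this example, as are the input field names `fdaApproved`, `onPanel`, and `offLabel`.

```javascript
// Tier boundaries from the table above.
function tierFor(score) {
  if (score >= 70) return "I";
  if (score >= 50) return "II";
  return "III";
}

// Apply the context penalty, then the FDA floors (order is an assumption).
function adjustScore({ score, fdaApproved, onPanel, offLabel, contextRelevance }) {
  let adjusted = score;
  if (offLabel) {
    if (contextRelevance < 20) adjusted *= 0.3;
    else if (contextRelevance < 35) adjusted *= 0.45;
  }
  if (fdaApproved && onPanel) {
    adjusted = Math.max(adjusted, 70); // Tier I guaranteed
  } else if (fdaApproved && offLabel && contextRelevance > 0) {
    adjusted = Math.max(adjusted, 35);
  }
  return adjusted;
}
```

Under these assumptions, an off-label FDA-approved drug with a raw score of 60 and `contextRelevance` of 25 is first reduced to 27 and then lifted to the floor of 35, landing in Tier III.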

### 4.4 Convergence Scoring

Each drug is evaluated against six independent evidence families. A drug scoring 4/6 or higher is considered "convergent" — supported by multiple orthogonal source types rather than a single strong signal:

| Family | Passes when |
|---|---|
| `pubmed` | ≥ 1 PMID linked to drug–disease combination |
| `clinicaltrials` | ≥ 1 trial record in ClinicalTrials.gov |
| `fda` | FDA approved, phase ≥ 3, OR phase label matches "approved/phase 3/4" |
| `chembl` | Confirmed binding targets or bioactivity records exist |
| `structural` | AlphaFold pLDDT ≥ 50, PDB structures present, or `hasAlphaFold=true` |
| `network` | ≥ 3 interaction neighbours or `hasNetworkProximity=true` |
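
A minimal sketch of the convergence check follows. Apart from `hasAlphaFold` and `hasNetworkProximity`, which appear in the table, every field name on the drug record (`pmids`, `trialCount`, `maxPhase`, and so on) is hypothetical.

```javascript
// One predicate per evidence family, mirroring the table above.
const FAMILY_CHECKS = {
  pubmed: (d) => (d.pmids?.length ?? 0) >= 1,
  clinicaltrials: (d) => (d.trialCount ?? 0) >= 1,
  fda: (d) =>
    d.fdaApproved === true ||
    (d.maxPhase ?? 0) >= 3 ||
    /approved|phase\s*[34]/i.test(d.phaseLabel ?? ""),
  chembl: (d) => (d.bindingTargets?.length ?? 0) > 0 || (d.bioactivityCount ?? 0) > 0,
  structural: (d) => (d.plddt ?? 0) >= 50 || (d.pdbCount ?? 0) > 0 || d.hasAlphaFold === true,
  network: (d) => (d.neighbourCount ?? 0) >= 3 || d.hasNetworkProximity === true,
};

// Count passing families; 4 of 6 or more is "convergent".
function convergence(drug) {
  const passed = Object.keys(FAMILY_CHECKS).filter((family) => FAMILY_CHECKS[family](drug));
  return { passed, score: passed.length, convergent: passed.length >= 4 };
}
```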

---

## 5. Evidence Grounding System

### 5.1 The Grounding Problem

AI-generated insights are only as credible as their citations. GaiaLab's grounding pipeline ensures that every strategy card and pathway insight either carries validated PMIDs or is explicitly labelled as a hypothesis.

### 5.2 PMID Attribution Pipeline

Attribution occurs in two passes:

**Pass 1 — `_preAttributePmids`:** After data fetch, each insight item is matched against the full literature pool. Matching uses token overlap between the item's text (label + mechanism + gene symbols) and each paper's title + abstract. A gene-symbol match requires only 1 overlapping token; non-gene content requires 2. Up to 2 PMIDs are collected per item (collecting two is important — items with ≥ 2 PMIDs plus a disease/gene text match reach `grounded` status, the highest evidence tier).

**Pass 2 — `buildInsightLinks`:** Inside the polarity assignment loop, secondary attribution runs the same gene-aware 1-token / 2-token rule against the known-PMID set, collecting up to 2 additional papers.
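
The gene-aware 1-token / 2-token rule can be illustrated as below. This is a simplified sketch, not the code of `_preAttributePmids` or `buildInsightLinks`: the tokenizer, the small stop-word list, and the `item` / `paper` field names are assumptions of the example.

```javascript
// Illustrative stop-word list so that filler words do not count as overlap.
const STOP_WORDS = new Set(["a", "an", "and", "for", "in", "of", "the", "to", "with"]);

// Lowercase alphanumeric tokens; symbols with hyphens are split (simplification).
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Collect up to `maxPmids` papers that share 1 gene token or 2 other tokens.
function attributePmids(item, papers, maxPmids = 2) {
  const geneTokens = tokenize(item.genes.join(" "));
  const itemTokens = tokenize(`${item.label} ${item.mechanism ?? ""} ${item.genes.join(" ")}`);
  const hits = [];
  for (const paper of papers) {
    if (hits.length >= maxPmids) break;
    const paperTokens = tokenize(`${paper.title} ${paper.abstract ?? ""}`);
    let geneOverlap = 0;
    let otherOverlap = 0;
    for (const token of itemTokens) {
      if (!paperTokens.has(token)) continue;
      if (geneTokens.has(token)) geneOverlap++;
      else if (!STOP_WORDS.has(token)) otherOverlap++;
    }
    if (geneOverlap >= 1 || otherOverlap >= 2) hits.push(paper.pmid);
  }
  return hits;
}
```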

### 5.3 Evidence Status Classification

Each insight item receives an `evidenceStatus`:

| Status | Criteria | Grounding weight |
|---|---|---|
| `grounded` | ≥ 1 PMID + supporting polarity, OR ≥ 2 PMIDs + disease/gene text match | 1.0 |
| `supported` | 1 PMID + disease/gene text match | 0.6 |
| `hypothesis` | 0 PMIDs | 0.0 |

The **grounded ratio** — the proportion of pathway + strategy items reaching `grounded` or `supported` — is reported on every analysis and monitored by the canary suite. Observed range: 28% (cold start, PubMed rate-limited) to 70%+ (warm cache, full paper pool).
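
The classification table and the grounded ratio can be expressed as a short sketch. The input field names are hypothetical, and the table does not cover every combination (for example, one PMID with neither supporting polarity nor a text match); this example assumes such cases fall through to `hypothesis`.

```javascript
// Grounding weights from the table above.
const GROUNDING_WEIGHT = { grounded: 1.0, supported: 0.6, hypothesis: 0.0 };

// Classify one insight item according to the criteria table.
function evidenceStatus({ pmids, hasSupportingPolarity, hasTextMatch }) {
  const n = pmids.length;
  if (n >= 1 && hasSupportingPolarity) return "grounded";
  if (n >= 2 && hasTextMatch) return "grounded";
  if (n === 1 && hasTextMatch) return "supported";
  return "hypothesis"; // assumption: uncovered combinations are hypotheses
}

// Proportion of items reaching `grounded` or `supported`.
function groundedRatio(items) {
  if (items.length === 0) return 0;
  const reached = items.filter((item) => evidenceStatus(item) !== "hypothesis").length;
  return reached / items.length;
}
```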

### 5.4 Polarity Classification

Each linked PMID is classified as `support`, `neutral`, or `contradict` relative to the insight claim. Items in the `knownPmids` set with `neutral` polarity are promoted to `support` (papers confirmed as relevant by prior analysis are treated as supporting unless explicitly classified as contradictions).

---

## 6. Multi-Agent Reasoning Framework

### 6.1 Six-Agent Debate

GaiaLab uses a structured six-agent debate (enabled by default via `GAIALAB_MULTI_AGENT_ENABLED=true`) where each agent has a distinct epistemic role:

| Agent | Role |
|---|---|
| **HypothesisAgent** | Generates primary mechanistic hypotheses from pathway and gene evidence |
| **CriticAgent** | Challenges hypothesis validity; flags unsupported claims, contradictions |
| **EvidenceAgent** | Grounds claims in database facts (ChEMBL bioactivity, OpenTargets scores, clinical trial outcomes) |
| **InnovationAgent** | Identifies non-obvious therapeutic angles; queries active recruiting trials |
| **RiskAgent** | Evaluates adverse event burden, drug–drug interactions, safety flags |
| **SynthesisAgent** | Integrates the five prior perspectives into a consensus output |

The five specialist agents (Hypothesis, Critic, Evidence, Innovation, Risk) operate independently in parallel. The SynthesisAgent then receives all five outputs and produces the final structured response.
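
The fan-out and synthesis steps can be sketched as below. This is an illustration under stated assumptions: `runDebate` is a hypothetical name, agents are modelled as plain async functions, and passing a failed agent's error to the synthesis step (rather than aborting) is a design choice of this example.

```javascript
// Run the specialist agents in parallel, then hand all outputs to synthesis.
// `agents` maps an agent name to an async function of the shared context.
async function runDebate(agents, synthesize, context) {
  const entries = Object.entries(agents);
  const settled = await Promise.allSettled(entries.map(([, run]) => run(context)));
  const outputs = {};
  settled.forEach((result, index) => {
    const name = entries[index][0];
    outputs[name] =
      result.status === "fulfilled" ? result.value : { error: String(result.reason) };
  });
  // The synthesis step only starts once every specialist has settled.
  return synthesize(outputs, context);
}
```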

### 6.2 Live Database Enrichment per Agent

Before the debate, `src/ai/agent-data-retrieval.js` pre-fetches four live APIs and formats the results per agent:

- **OpenTargets** disease–gene associations → EvidenceAgent + CriticAgent
- **ClinicalTrials.gov v2** active recruiting trials → InnovationAgent
- **OpenFDA** adverse event counts per drug → RiskAgent
- **ChEMBL** bioactivity + pChEMBL values → EvidenceAgent + CriticAgent

This grounds agent reasoning in current database state rather than training-data recall.

### 6.3 AI Provider Failover

The platform supports four AI providers in priority order:

```
DeepSeek → OpenAI GPT-4o → Google Gemini Pro → Anthropic Claude
```

Each analysis attempt tries the primary provider; if it times out or returns an error, the next provider is tried. Only after all providers fail does the system return an error. This ensures high availability during provider-side outages.
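
A failover chain of this kind can be sketched as follows. The provider objects, the `completeWithFailover` name, and the default timeout value are hypothetical; real provider SDK calls would replace the `call` functions.

```javascript
// Try each provider in order; move on when one errors or times out.
// `providers` is an ordered array of { name, call } where call(prompt) is async.
async function completeWithFailover(providers, prompt, timeoutMs = 30000) {
  const errors = [];
  for (const { name, call } of providers) {
    let timer;
    try {
      const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error("timed out")), timeoutMs);
      });
      const text = await Promise.race([call(prompt), timeout]);
      return { provider: name, text };
    } catch (err) {
      errors.push(`${name}: ${err.message}`);
    } finally {
      clearTimeout(timer); // avoid leaving a pending timer behind
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

Recording which provider answered, as the `provider` field does here, makes it possible to trace an analysis back to the model that produced it.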

---

## 7. Knowledge Graph & Population Insights

### 7.1 Knowledge Graph

Every completed analysis writes to a PostgreSQL knowledge graph (`kg_nodes` + `kg_edges` + `kg_cooccurrence` tables). Nodes represent genes, drugs, pathways, and disease contexts. Edges represent:

- Drug → target (gene) bindings
- Gene → pathway memberships
- Gene → disease associations
- Drug → pathway links (derived from target memberships)

The KG accumulates cross-analysis signal over time. Endpoints:
- `GET /api/knowledge-graph/stats` — node/edge counts
- `GET /api/knowledge-graph/drugs?disease=` — top drugs by disease context
- `GET /api/knowledge-graph/gene-neighbors?gene=` — PPI neighbourhood

### 7.2 Population-Level Insights

`scripts/aggregate-insights.js` reads all stored snapshots and computes gene, pathway, and drug frequency + co-occurrence across all analyses run on the platform. This surfaces cross-analysis signal that single-analysis views cannot show. The Research Intelligence Panel on the homepage shows calibration charts and frequency pills derived from this aggregate.

### 7.3 Prediction Tracking

`src/utils/prediction-tracker.js` records every drug-disease prediction at analysis time, then polls ClinicalTrials.gov v2 periodically for outcome updates. This enables prospective calibration: the fraction of predictions that are eventually validated by trial completion is tracked as a calibration curve and reported on the platform's validation page.

---

## 8. Intelligence Boards

GaiaLab maintains **10 active disease intelligence boards** — living summaries updated on a 24-hour refresh cycle. Each board aggregates evidence across the KG, surfaces emerging contradictions, and sends email alerts when new contradictions are detected against prior conclusions.

**Current boards:**
Breast Cancer · Triple-Negative Breast Cancer · Non-Small Cell Lung Cancer · Colorectal Cancer · Glioblastoma · Prostate Cancer · Ovarian Cancer · Melanoma · Pancreatic Ductal Adenocarcinoma · Alzheimer's Disease

Board data is stored in PostgreSQL (`disease_boards`, `board_evidence_items`, `board_alerts` tables). Case study pages for selected boards (lecanemab/AD, adagrasib/NSCLC, adagrasib/PDAC) provide focused mechanistic analysis with open research questions.

---

## 9. MCP Server Interface

GaiaLab exposes a Model Context Protocol server at `POST /mcp`, allowing AI assistants (Claude Desktop, custom agents built with the Anthropic Agent SDK) to call the platform as a tool.

**Tool:** `gaialab_generate_insights`

**Input schema (Zod-validated):**
```json
{
  "genes": ["string"],
  "diseaseContext": "string",
  "audience": "researcher | clinician | general"
}
```

Each POST creates a fresh `McpServer` + `StreamableHTTPServerTransport` instance. Responses carry `Access-Control-Allow-Origin: *` for cross-origin use. The MCP interface is the primary integration surface for research workflow automation.
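
For orientation, the sketch below builds the JSON-RPC `tools/call` payload that an MCP client would send for this tool. The `buildInsightsRequest` helper and its validation rules (non-empty gene list, default audience) are assumptions of this example; in practice an MCP client library handles initialization, headers, and transport.

```javascript
// Audience values taken from the input schema above.
const AUDIENCES = ["researcher", "clinician", "general"];

// Build a JSON-RPC 2.0 `tools/call` request body for gaialab_generate_insights.
function buildInsightsRequest(genes, diseaseContext, audience = "researcher", id = 1) {
  if (!Array.isArray(genes) || genes.length === 0 || !genes.every((g) => typeof g === "string")) {
    throw new TypeError("genes must be a non-empty array of strings");
  }
  if (typeof diseaseContext !== "string") {
    throw new TypeError("diseaseContext must be a string");
  }
  if (!AUDIENCES.includes(audience)) {
    throw new TypeError(`audience must be one of: ${AUDIENCES.join(", ")}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "gaialab_generate_insights",
      arguments: { genes, diseaseContext, audience },
    },
  };
}

const request = buildInsightsRequest(["KRAS", "TP53"], "pancreatic ductal adenocarcinoma");
```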

---

## 10. Workspace & Collaboration

Authenticated users (Stripe-backed tiers) receive persistent workspaces with:

- **Saved analyses** — shareable snapshot URLs, reproducible replay
- **Analysis history** — timeline of past gene panels and results
- **Workspace memory** — cross-session context for the chat assistant
- **Report export** — PDF export with PMID-gated trust score (blocked if valid-PMID rate < 70%)
- **Weekly digests** — emailed summaries of board updates and new contradictions relevant to saved analyses
- **Row-level security** — per-workspace PostgreSQL isolation (when `GAIALAB_RLS_ENABLED=true`)

**Subscription tiers:**

| Tier | Daily analyses | Drug repurposing | Export |
|---|---|---|---|
| Free | Limited (IP-gated) | Basic | No |
| Researcher | Unlimited | Full (all tiers) | PDF + CSV |
| Enterprise | Unlimited | Full + matrix | Full + API |

---

## 11. SAE / ESM-2 Interpretability (Optional)

When `GAIALAB_INTERPRETABILITY_ENABLED=1`, GaiaLab spawns Python 3.11 subprocesses from `py/` to run sparse autoencoder (SAE) inference over ESM-2 protein language model embeddings. This surfaces learned biological features from the ESM-2 representation that are not explicitly encoded in database annotations — an experimental interpretability layer for protein function.

Requirements: PostgreSQL, Python 3.11, `py/requirements.txt`, ESM-2 model cache. This feature is disabled by default on the public deployment.

---

## 12. Evaluation & Benchmarking

### 12.1 AUROC (Retrospective)

A retrospective AUROC of **0.545** (95% CI bootstrap: 0.526–0.562) was computed in March 2026 against 529 predictions across 22 disease areas, using ClinicalTrials.gov completed trial matches as the gold standard. The random baseline is 0.50. This represents a modest but consistent signal above random. **This is not a clinically validated predictor.**
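
For readers less familiar with the metric, AUROC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The sketch below computes it by direct pairwise comparison; it is a generic illustration, not the evaluation script, and the `{ score, label }` record shape is an assumption.

```javascript
// AUROC by pairwise comparison of positives (label 1) and negatives (label 0).
function auroc(examples) {
  const positives = examples.filter((e) => e.label === 1).map((e) => e.score);
  const negatives = examples.filter((e) => e.label === 0).map((e) => e.score);
  if (positives.length === 0 || negatives.length === 0) return NaN;
  let wins = 0;
  for (const p of positives) {
    for (const n of negatives) {
      if (p > n) wins += 1;
      else if (p === n) wins += 0.5;
    }
  }
  return wins / (positives.length * negatives.length);
}

const value = auroc([
  { score: 0.9, label: 1 },
  { score: 0.8, label: 0 },
  { score: 0.7, label: 1 },
  { score: 0.6, label: 0 },
]);
// 3 of the 4 positive-negative pairs are ordered correctly: 0.75
```

A value of 0.50 corresponds to random ordering, which is why 0.545 is described as a modest signal.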

### 12.2 Temporal Holdout

A separate temporal holdout benchmark using 22 known drug approvals (held out by year) showed AUROC 0.90 with 8/8 negative controls correctly rejected. The dataset is small and curated, so this figure should not be read as an estimate of general predictive performance.

### 12.3 Grounding Rate

The primary quality signal for day-to-day health is the **grounding ratio** — fraction of insight items with at least one validated PMID. The canary suite monitors this on every push. The `npm run verify:engineering` gate includes a grounding check (`test:grounding-gate`).

### 12.4 Continuous Evaluation

The evaluation suite (`scripts/gaialab-eval.js`) supports:
- NDCG@10 for drug ranking quality
- Paired t-test vs. baseline for significance
- Gold standard benchmarks (`data/benchmarks/gold-standard.json`)
- Trust and reliability benchmarks (`data/benchmarks/trust-benchmarks.json`)
- Snapshot replay for regression detection
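
As a reference for the ranking metric, the sketch below computes NDCG@k using the linear-gain form of DCG. It is a generic illustration rather than the code in `scripts/gaialab-eval.js`, and it assumes the ideal ordering is derived from the relevance grades of the ranked list itself.

```javascript
// Discounted cumulative gain over the first k positions (linear gain).
function dcg(relevances, k) {
  return relevances
    .slice(0, k)
    .reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

// NDCG@k: DCG of the given ranking divided by DCG of the ideal ranking.
function ndcgAtK(rankedRelevances, k = 10) {
  const ideal = [...rankedRelevances].sort((a, b) => b - a);
  const idealDcg = dcg(ideal, k);
  return idealDcg === 0 ? 0 : dcg(rankedRelevances, k) / idealDcg;
}
```

A ranking already sorted by relevance, such as `[3, 2, 1, 0]`, scores 1, and any swap that moves a relevant drug down the list lowers the score.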

---

## 13. Engineering Quality Gates

All deployments must pass `npm run verify:engineering`, which runs in sequence:

1. `security:scan` — secret detection (no API keys, connection strings, or credentials in committed code)
2. `test:critical` — critical flow tests
3. `test:grounding-gate` — grounding ratio threshold check
4. `test:biomedical-trust` — biomedical claim credibility checks
5. `test:biomedical-trust:standard` — golden test suite for trust surfaces
6. `test:week2` through `test:week6` — regression suites per weekly milestone
7. `test:drug-scoring` — drug scoring unit tests
8. `test:ad-regression` — Alzheimer's disease regression
9. `test:contracts` — 60+ API contract tests
10. `test:export-surfaces`, `test:provenance-surfaces`, `test:critical-flows`, `test:ui-trust-surfaces`
11. `test:smoke`, `test:lineage`, `test:nav-e2e`
12. `test:concordance-integrity` — PMID annotation concordance
13. `test:sse-stream`, `test:kg-explorer`, `test:drug-paywall`, `test:prod-smoke`

The gate is enforced by a canary runner (`scripts/canary.js`) that executes a full live analysis against `https://www.gailabai.com` and validates: completion time, drug candidate count, grounded ratio, job failure rate, and trust page availability.

---

## 14. Security Model

- **Secret detection**: `scripts/check_secrets.js` runs as a git pre-commit hook and in every CI job. Matches known patterns for API keys, connection strings, and credentials.
- **Stripe webhook verification**: All payment webhook events are verified using `stripe.webhooks.constructEvent()` with a required `STRIPE_WEBHOOK_SECRET`. Unsigned webhooks are rejected with HTTP 400.
- **Rate limiting**: IP-based daily quota for free-tier users; API key tier for researcher/enterprise.
- **No SSRF exposure**: All outbound API calls are to fixed, known biological database URLs with 30-second timeouts. User input does not influence outbound URL construction.
- **PMID validity gate**: PDF export is blocked when the valid-PMID rate in an analysis falls below a configurable threshold (default 70%), preventing export of poorly-grounded reports.
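
The PMID validity gate reduces to a single ratio check, sketched below. `canExportPdf` and the `{ pmid, valid }` record shape are hypothetical, and refusing export when an analysis has no PMIDs at all is an assumption of this example.

```javascript
// Allow export only when the share of valid PMIDs meets the threshold.
function canExportPdf(pmidChecks, threshold = 0.7) {
  if (pmidChecks.length === 0) return false; // assumption: nothing to ground the report
  const valid = pmidChecks.filter((check) => check.valid).length;
  return valid / pmidChecks.length >= threshold;
}
```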

---

## 15. Deployment

GaiaLab is deployed on **Railway** via Nixpacks with automatic deploys from the `main` branch of the GitHub repository. The production URL is `https://www.gailabai.com`.

Environment variables required for full functionality:

| Variable | Purpose |
|---|---|
| `DEEPSEEK_API_KEY` | Primary AI provider |
| `OPENAI_API_KEY` | AI failover #1 |
| `GOOGLE_API_KEY` | AI failover #2 |
| `ANTHROPIC_API_KEY` | AI failover #3 |
| `DATABASE_URL` | PostgreSQL connection |
| `STRIPE_SECRET_KEY` | Payments |
| `STRIPE_WEBHOOK_SECRET` | Webhook signature verification |
| `NCBI_API_KEY` | Raises the PubMed (NCBI E-utilities) rate limit from 3 to 10 req/s |

Optional premium data source keys: `BIOGRID_API_KEY`, `DISGENET_API_KEY`, `DRUGBANK_API_KEY`, `SEMANTIC_SCHOLAR_API_KEY`.

---

## 16. Limitations & Responsible Use

GaiaLab is a **computational research tool**. Its outputs are hypotheses, not clinical recommendations. Specific limitations:

- **All therapeutic suggestions require independent experimental validation** before any clinical application.
- **Drug rankings are scoring artifacts**, not clinical priority lists. The six-factor model was calibrated on known drug–disease pairs and may not generalise to novel disease contexts.
- **AUROC 0.545** is a modest signal above random (0.50). The platform cannot reliably predict which specific drug will succeed in a given trial.
- **Coverage is 19–22 sources** without paid API keys (DisGeNET, DrugBank require subscriptions). Coverage metrics shown in the UI reflect available data only.
- **Grounding varies** with PubMed rate limits and cache state. Cold-start analyses (first run after server restart) may show grounding ratios of 28–35%; subsequent cached runs reach 65–70%.
- **SAE interpretability** is experimental and disabled by default. ESM-2 features have not been independently validated for clinical relevance.
- The platform does not ingest or process patient-level data. All analyses operate on gene symbols and disease context strings.

---

## 17. Citation

If you use GaiaLab in published research, please cite:

> GaiaLab AI Platform (2026). Computational drug repurposing and pathway analysis via multi-source biological database integration and six-agent AI debate. https://www.gailabai.com

---

*This document reflects the platform as of version 0.1.1, May 2026. For the latest implementation details, refer to the source repository.*

---

¹ Active source count varies with API key configuration. Full source list in Section 3.1. Without optional paid keys (DisGeNET, DrugBank), active coverage is approximately 60 sources. The count of 75+ reflects the full set of integrated clients shipped with the platform.