Computational Discovery Hypotheses
Twelve independent discovery algorithms run across the BiohacksAI compound–target–pathway knowledge graph. Each type identifies a distinct class of understudied biological signal. All hypotheses are deterministic, corpus-bound, and cryptographically timestamped.
Molecular Candidate
Single compound candidates with X-Vault cryptographic proof-of-timestamp. Each represents a deterministic hypothesis derived from compound–target–pathway graph analysis.
Graph traversal across compound–target–pathway triples. Scored on novelty, target evidence strength, pathway relevance, and literature gap.
A substance with documented targets, low literature coverage, and high pathway relevance — understudied relative to its biological footprint.
Similarity Discovery
Compounds with highly similar target profiles that have not been studied together. Identifies pairs where shared biology implies shared mechanism.
Jaccard similarity over binary target vectors. Pairs filtered by minimum overlap, divergent study counts, and distinct chemical identity.
Two or more compounds converging on the same target set — one well-studied, one understudied — implying transferable mechanism hypotheses.
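The pair filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the BiohacksAI implementation: the function names, the thresholds (`min_overlap`, `min_similarity`, `min_study_ratio`), and the toy data are all assumptions, and the chemical-identity check is left out.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two target sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(targets, study_counts, min_overlap=2,
                  min_similarity=0.5, min_study_ratio=5.0):
    """Return (compound_x, compound_y, similarity) for pairs passing all filters.

    All thresholds are illustrative assumptions, not documented values.
    """
    pairs = []
    for x, y in combinations(sorted(targets), 2):
        if len(targets[x] & targets[y]) < min_overlap:
            continue  # too few shared targets
        sim = jaccard(targets[x], targets[y])
        if sim < min_similarity:
            continue
        low, high = sorted((study_counts[x], study_counts[y]))
        if high < min_study_ratio * max(low, 1):
            continue  # study counts are not divergent enough
        pairs.append((x, y, sim))
    return sorted(pairs, key=lambda p: (-p[2], p[0], p[1]))


# Toy data (hypothetical compounds and targets).
targets = {
    "compound_a": {"T1", "T2", "T3"},
    "compound_b": {"T1", "T2", "T3", "T4"},
    "compound_c": {"T9"},
}
study_counts = {"compound_a": 500, "compound_b": 3, "compound_c": 10}
print(similar_pairs(targets, study_counts))
```

In the toy data, only the well-studied `compound_a` and the understudied `compound_b` survive the filters, since they share three of four targets and differ widely in study count.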
Scaffold Family
Groups of understudied compounds sharing high target-profile similarity. Each family represents a potential novel compound series for experimental exploration.
IDF-weighted Jaccard clustering over target profiles. Families require minimum size, high-value target presence, and low median study count.
A cluster of compounds with convergent biology and minimal literature — a coherent research space that has been overlooked.
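One way to picture IDF-weighted Jaccard clustering is the sketch below. It is written under stated assumptions: the smoothed weight `log(1 + N/df)`, the single-linkage grouping via union-find, and the thresholds are illustrative choices, and the high-value target requirement is omitted. None of these names come from the BiohacksAI engine.

```python
import math
from itertools import combinations
from statistics import median


def idf_weights(targets):
    """Rare targets get larger weights; log(1 + N/df) is an assumed smoothing."""
    n = len(targets)
    df = {}
    for target_set in targets.values():
        for t in target_set:
            df[t] = df.get(t, 0) + 1
    return {t: math.log(1 + n / count) for t, count in df.items()}


def weighted_jaccard(a, b, w):
    """Sum of weights over the intersection divided by the sum over the union."""
    union = sum(w[t] for t in sorted(a | b))
    return sum(w[t] for t in sorted(a & b)) / union if union else 0.0


def families(targets, study_counts, threshold=0.5, min_size=3,
             max_median_studies=10):
    """Single-linkage families of compounds (thresholds are assumptions)."""
    w = idf_weights(targets)
    parent = {c: c for c in targets}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for x, y in combinations(sorted(targets), 2):
        if weighted_jaccard(targets[x], targets[y], w) >= threshold:
            parent[find(x)] = find(y)

    groups = {}
    for c in sorted(targets):
        groups.setdefault(find(c), []).append(c)
    return [g for g in groups.values()
            if len(g) >= min_size
            and median(study_counts[c] for c in g) <= max_median_studies]


# Toy data (hypothetical).
targets = {
    "a": {"T1", "T2", "T3"},
    "b": {"T1", "T2", "T3"},
    "c": {"T1", "T2", "T3", "T4"},
    "d": {"T9"},
}
study_counts = {"a": 2, "b": 4, "c": 1, "d": 0}
print(families(targets, study_counts))
```

Weighting by IDF means that sharing a rarely hit target counts for more than sharing a promiscuous one, which is the usual motivation for preferring it over plain Jaccard.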
Pathway Whitespace
Biological pathways with documented compound activity but no clinical or pharmacological follow-through. Maps the gap between known biology and research investment.
Pathway coverage analysis against compound–target binding data. Pathways ranked by the ratio of compound density to study volume.
A pathway with multiple active compounds and near-zero literature coverage — a research blank spot with existing biological rationale.
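The density-to-volume ranking can be illustrated as follows. This is a sketch only: the `+ 1` in the denominator (to avoid division by zero), the `min_compounds` cutoff, and the input shapes are assumptions rather than documented behavior.

```python
def whitespace_ranking(pathway_compounds, pathway_studies, min_compounds=2):
    """Rank pathways by active-compound count relative to study volume.

    pathway_compounds: pathway -> set of compounds with binding activity
    pathway_studies:   pathway -> number of published studies (0 if missing)
    """
    rows = []
    for pathway, compounds in pathway_compounds.items():
        if len(compounds) < min_compounds:
            continue  # not enough biological rationale
        studies = pathway_studies.get(pathway, 0)
        rows.append((pathway, len(compounds) / (1 + studies)))
    return sorted(rows, key=lambda r: (-r[1], r[0]))


# Toy data (hypothetical pathways).
pathway_compounds = {
    "P1": {"a", "b", "c", "d"},
    "P2": {"a", "b"},
    "P3": {"a"},
}
pathway_studies = {"P1": 1, "P2": 199}
print(whitespace_ranking(pathway_compounds, pathway_studies))
```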
Convergence Signal
Targets hit by multiple structurally unrelated compounds from the BiohacksAI corpus. Independent chemical convergence implies strong biological relevance.
Target frequency analysis weighted by chemical diversity of binders. High-value targets with low clinical follow-through are prioritized.
A molecular target hit by chemically diverse compounds — convergent evidence of functional importance without proportional research investment.
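A diversity-weighted target frequency could look like the sketch below. It assumes each compound carries a chemical class label and uses the Gini-Simpson index (one minus the sum of squared class shares) as the diversity term; both the index and the `frequency * diversity` product are illustrative stand-ins, not the engine's documented formula.

```python
from collections import Counter


def simpson_diversity(labels):
    """Gini-Simpson index: 0 when all labels match, approaching 1 when all differ."""
    counts = Counter(labels)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values()) if n else 0.0


def convergence_scores(binders, chem_class):
    """Score each target as binder count times chemical diversity of its binders."""
    scores = {}
    for target, compounds in binders.items():
        labels = [chem_class[c] for c in compounds]
        scores[target] = len(labels) * simpson_diversity(labels)
    return scores


# Toy data (hypothetical compounds, classes, and targets).
chem_class = {"c1": "alkaloid", "c2": "flavonoid", "c3": "terpenoid", "c4": "alkaloid"}
binders = {"T1": ["c1", "c2", "c3", "c4"], "T2": ["c1", "c4"]}
print(convergence_scores(binders, chem_class))
```

A target bound only by compounds from one class scores zero here, so many hits from a single chemical family do not register as convergence.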
Historical Convergence
Plants independently documented in two or more ancient medical traditions (TCM, Ayurveda, and the Greco-Roman tradition recorded by Dioscorides) with modern pharmacological support. The strongest cross-civilizational validation signal.
Tradition co-occurrence scoring with bonus weighting for independent geographic origin. Filtered by minimum modern study count and target evidence.
A plant used across independent ancient traditions for the same indication, with matching modern target evidence — convergent empirical and molecular signals.
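As a rough illustration of co-occurrence scoring with an independence bonus, consider the sketch below. Everything numeric is assumed: one point per tradition, a `bonus` of 0.5 per pair of traditions from different regions, and the `min_studies` and `min_targets` filters. The region labels are illustrative as well.

```python
from itertools import combinations

# Assumed mapping from tradition to region of origin (illustrative labels).
REGION = {"TCM": "East Asia", "Ayurveda": "South Asia", "Dioscorides": "Mediterranean"}


def tradition_score(traditions, bonus=0.5):
    """One point per tradition plus a bonus per geographically independent pair."""
    present = sorted(set(traditions))
    if len(present) < 2:
        return 0.0  # a single tradition is not a convergence signal
    independent = sum(1 for a, b in combinations(present, 2)
                      if REGION[a] != REGION[b])
    return len(present) + bonus * independent


def historical_convergence(plants, min_studies=5, min_targets=1):
    """plants: name -> dict with 'traditions', 'studies', and 'targets' keys."""
    ranked = []
    for name, rec in plants.items():
        score = tradition_score(rec["traditions"])
        if score and rec["studies"] >= min_studies and len(rec["targets"]) >= min_targets:
            ranked.append((name, score))
    return sorted(ranked, key=lambda r: (-r[1], r[0]))


# Toy data (hypothetical plants).
plants = {
    "plant_x": {"traditions": ["TCM", "Ayurveda", "Dioscorides"], "studies": 40, "targets": {"T1"}},
    "plant_y": {"traditions": ["TCM", "Ayurveda"], "studies": 2, "targets": {"T1"}},
    "plant_z": {"traditions": ["TCM"], "studies": 100, "targets": {"T1"}},
}
print(historical_convergence(plants))
```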
Compound Layer
Plants with the richest identified bioactive compound libraries. Substances with 100+ mapped compounds from LOTUS and NPASS experimental binding data.
Compound count from LOTUS (420k occurrences) + NPASS binding data. Ranked by compound diversity, target coverage, and pathway reach.
A plant with an exceptionally rich compound profile — high probability of undiscovered bioactive constituents relative to current research attention.
Compound Synergy
Pairs or groups of compounds with complementary target profiles that may exhibit additive or synergistic effects when combined.
Complementarity scoring over non-overlapping target sets. Filtered by pathway co-occurrence and absence of known adverse interaction signals.
A compound pair covering adjacent nodes in the same pathway network — a candidate for combination hypothesis testing.
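Complementarity over non-overlapping target sets can be sketched as the share of a pair's combined targets that only one compound hits, combined with a shared-pathway filter. The function names, the `min_score` threshold, and the data layout are assumptions, and the adverse-interaction filter is not modeled.

```python
from itertools import combinations


def complementarity(a, b):
    """Share of the combined target set hit by only one of the two compounds."""
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0


def pathways_of(compound_targets, target_pathways):
    """Union of the pathways reached through a compound's targets."""
    return set().union(*(target_pathways.get(t, set()) for t in compound_targets))


def synergy_candidates(targets, target_pathways, min_score=0.8):
    """Pairs with mostly disjoint targets that still meet in at least one pathway."""
    out = []
    for x, y in combinations(sorted(targets), 2):
        score = complementarity(targets[x], targets[y])
        shared = (pathways_of(targets[x], target_pathways)
                  & pathways_of(targets[y], target_pathways))
        if score >= min_score and shared:
            out.append((x, y, score, sorted(shared)))
    return out


# Toy data (hypothetical compounds, targets, and pathways).
targets = {"a": {"T1", "T2"}, "b": {"T3"}, "c": {"T1", "T2"}, "d": {"T4"}}
target_pathways = {"T1": {"P1"}, "T2": {"P1"}, "T3": {"P1"}, "T4": {"P2"}}
print(synergy_candidates(targets, target_pathways))
```

Compound `d` is fully complementary to the others but shares no pathway with them, so it is excluded by the co-occurrence filter.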
Target Expansion
Known compounds with strong evidence on primary targets that also bind secondary targets with minimal literature coverage.
Secondary target analysis on well-studied compounds. Secondary targets filtered by binding affinity threshold and low study count.
A studied compound with a poorly characterized secondary binding site, giving a repurposing signal that builds on existing safety and efficacy data.
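The secondary-target filter might be sketched as below. Affinity is assumed to be reported in nanomolar (lower means tighter binding), and the cutoffs `max_affinity_nm` and `max_studies`, like the input shapes, are hypothetical values chosen for illustration.

```python
def expansion_candidates(bindings, primary_targets, target_studies,
                         max_affinity_nm=1000.0, max_studies=5):
    """Return secondary (compound, target, affinity_nM) hits with little literature.

    bindings:        iterable of (compound, target, affinity_nM)
    primary_targets: compound -> set of its well-characterized targets
    target_studies:  (compound, target) -> study count (0 if missing)
    """
    hits = []
    for compound, target, affinity_nm in bindings:
        if target in primary_targets.get(compound, set()):
            continue  # primary targets are already well studied
        if affinity_nm > max_affinity_nm:
            continue  # binding too weak to be of interest
        if target_studies.get((compound, target), 0) > max_studies:
            continue  # secondary target already covered in the literature
        hits.append((compound, target, affinity_nm))
    return sorted(hits, key=lambda h: h[2])


# Toy data (hypothetical compound and targets).
bindings = [
    ("c1", "T1", 10.0),
    ("c1", "T2", 250.0),
    ("c1", "T3", 50000.0),
    ("c1", "T4", 80.0),
]
primary_targets = {"c1": {"T1"}}
target_studies = {("c1", "T2"): 1, ("c1", "T4"): 40}
print(expansion_candidates(bindings, primary_targets, target_studies))
```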
Cross-Species Analog
Natural compounds with documented activity in animal models or non-human species that lack human pharmacological study.
Species annotation of the PubMed study corpus. Compounds with animal-only evidence ranked by target overlap with human disease pathways.
A compound with established animal pharmacology and human-relevant targets — a translational hypothesis candidate.
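A simple way to express the animal-only filter is shown below. The sketch assumes each study has already been annotated with a species label and that human-relevant targets are available as a flat set; the function name and data shapes are hypothetical.

```python
def animal_only_candidates(studies, targets, human_disease_targets):
    """Rank compounds with no human studies by overlap with human disease targets.

    studies: iterable of (compound, species) annotations from the literature
    targets: compound -> set of documented targets
    """
    species_seen = {}
    for compound, species in studies:
        species_seen.setdefault(compound, set()).add(species.lower())

    ranked = []
    for compound, species in species_seen.items():
        if "human" in species:
            continue  # human evidence already exists
        overlap = targets.get(compound, set()) & human_disease_targets
        if overlap:
            ranked.append((compound, len(overlap)))
    return sorted(ranked, key=lambda r: (-r[1], r[0]))


# Toy data (hypothetical compounds, species labels, and targets).
studies = [
    ("c1", "mouse"), ("c1", "rat"),
    ("c2", "mouse"), ("c2", "Human"),
    ("c3", "zebrafish"),
]
targets = {"c1": {"T1", "T2", "T5"}, "c2": {"T1"}, "c3": {"T1"}}
human_disease_targets = {"T1", "T2"}
print(animal_only_candidates(studies, targets, human_disease_targets))
```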
Ethnobotanical Gap
Plants with documented traditional use in non-indexed ethnobotanical sources that lack formal pharmacological investigation.
Cross-reference of ethnobotanical databases against PubMed coverage. Plants scored on traditional use frequency vs. modern study deficit.
A medicinal plant with rich traditional documentation and near-zero pharmacological literature — a high-prior, low-evidence hypothesis target.
Meta Discovery
Substances scoring across multiple independent discovery signals simultaneously. The highest-confidence computational hypotheses in the BiohacksAI corpus.
Cross-type signal aggregation. A substance must appear in 3+ independent discovery types to qualify. Scores are additive, not averaged.
A compound that is novel, historically validated, pathway-relevant, and target-convergent — multi-signal evidence without a single point of failure.
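The aggregation rule stated above (appear in three or more discovery types, then add the scores) can be sketched directly. The input layout, the function name, and the toy scores are assumptions; only the 3+ threshold and the additive rule come from the description.

```python
from collections import defaultdict


def meta_discovery(signals, min_types=3):
    """Sum per-type scores for substances present in at least min_types types.

    signals: discovery type -> {substance: score}
    """
    by_substance = defaultdict(dict)
    for discovery_type, scores in signals.items():
        for substance, score in scores.items():
            by_substance[substance][discovery_type] = score

    qualified = {
        substance: sum(per_type.values())  # additive, not averaged
        for substance, per_type in by_substance.items()
        if len(per_type) >= min_types
    }
    return sorted(qualified.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data (hypothetical scores).
signals = {
    "similarity": {"x": 0.5, "y": 0.25},
    "convergence": {"x": 1.5},
    "whitespace": {"x": 1.0, "y": 2.0},
}
print(meta_discovery(signals))
```

Because scores are summed rather than averaged, a substance gains from every additional discovery type it appears in, which rewards breadth of evidence.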
All outputs are computational hypothesis candidates, not experimentally validated discoveries. The BiohacksAI Discovery Engine applies deterministic graph analysis across compound–target–pathway networks. X-Vault seals provide cryptographic proof of discovery timestamp and corpus version.
Cryptographic timestamping by X-Vault · Organiq Sweden AB · Patent pending: EVE-PAT-2026-001