METASCIENCE EXPERIMENTS

I build and release open-source prototypes that explore the infrastructure layer of autonomous research. As AI systems begin to run hundreds of experiments per night, new problems emerge that traditional scientific workflows were never designed to handle: How do we curate results at machine speed? How do we prevent false positives from cascading through experiment chains? How should knowledge be represented when the primary consumers are AI agents, not humans?

Each experiment below is a working prototype published as an open-source repository. They are not papers; they are tools and simulations, each testing a specific hypothesis about how science should work in the age of autoresearch.


  • Can prediction markets identify novel research claims? Five LLM predictors evaluate 35 ARA-generated claims; the market ensemble achieves a Brier score of 0.177 and identifies one genuinely surprising verified claim that the market had predicted would fail.

    2026-04-23 · prototype

    GitHub prediction-markets metascience research-evaluation
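
For context on the headline metric: the Brier score is the mean squared error between probability forecasts and binary outcomes, so lower is better and a constant 0.5 forecast scores 0.25. A minimal sketch (the claim data here is hypothetical, not from the repository):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and binary outcomes.
    0.0 is a perfect forecaster; a constant 0.5 forecast scores 0.25."""
    assert len(forecasts) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical example: three claims, two verified (1) and one refuted (0).
score = brier_score([0.9, 0.7, 0.2], [1, 1, 0])  # (0.01 + 0.09 + 0.04) / 3 ≈ 0.047
```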

  • Structural pathology detection for AI-generated research. Seven detectors find 47 anomalies in ARA output, including confidence inflation (0.98 confidence backed by a single evidence item) and an 82% precision-oscillation rate.

    2026-04-23 · prototype

    GitHub anomaly-detection metascience quality-assurance
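
The confidence-inflation check can be illustrated with a toy detector that flags claims whose stated confidence exceeds a simple evidence-based bound; the threshold heuristic below is hypothetical, not the repository's actual rule:

```python
def confidence_inflation(claims, base_bound=0.80, step=0.05, cap=0.95):
    """Flag claims whose stated confidence exceeds an evidence-based bound.
    Hypothetical heuristic: allow 0.80 for one evidence item, +0.05 per
    additional item, capped at 0.95."""
    flagged = []
    for claim in claims:
        bound = min(cap, base_bound + step * (claim["evidence"] - 1))
        if claim["confidence"] > bound:
            flagged.append(claim["id"])
    return flagged

# Hypothetical: 0.98 confidence on a single evidence item gets flagged.
hits = confidence_inflation([
    {"id": "c1", "confidence": 0.98, "evidence": 1},
    {"id": "c2", "confidence": 0.85, "evidence": 3},
])  # ["c1"]
```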

  • Can economic staking mechanisms rank research claims better than confidence scores? Negative result: a two-agent LMSR market (Kendall tau = 0.242) loses to raw confidence scores (tau = 0.435), suggesting a minimum diversity threshold for prediction markets.

    2026-04-23 · prototype

    GitHub prediction-markets metascience negative-result
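
The LMSR mechanism behind this market is standard; a minimal sketch of its cost function and instantaneous price follows (the liquidity parameter b and share counts are illustrative, not the repository's settings):

```python
import math

def lmsr_price(q_yes, q_no, b=10.0):
    """Instantaneous LMSR price of the YES share:
    exp(q_yes/b) / (exp(q_yes/b) + exp(q_no/b))."""
    ey, en = math.exp(q_yes / b), math.exp(q_no / b)
    return ey / (ey + en)

def lmsr_cost(q_yes, q_no, b=10.0):
    """LMSR cost function C(q) = b * ln(exp(q_yes/b) + exp(q_no/b));
    a trade from q to q' costs the trader C(q') - C(q)."""
    return b * math.log(math.exp(q_yes / b) + math.exp(q_no / b))

# With no shares outstanding the YES price is 0.5; buying 5 YES shares moves it up.
p0 = lmsr_price(0, 0)                     # 0.5
cost = lmsr_cost(5, 0) - lmsr_cost(0, 0)  # cost of the 5-share trade
p1 = lmsr_price(5, 0)                     # > 0.5
```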

  • Trust propagation in bipolar argumentation graphs. A signed PageRank variant over supports/contradicts claim networks, tested on 12 AI research claims; it passed 4 of 5 intuition checks.

    2026-04-18 · prototype

    GitHub trust-propagation argumentation metascience
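
One way to sketch a signed PageRank over supports/contradicts edges, where support edges propagate a source's trust positively and contradiction edges negatively (an illustrative update rule, not necessarily the repository's exact variant):

```python
def signed_pagerank(nodes, edges, damping=0.85, iters=50):
    """Toy signed-PageRank sketch. Every claim starts at trust 1.0; each
    iteration resets trust to the baseline (1 - damping) and adds signed
    contributions from incoming edges. `edges` is a list of
    (src, dst, sign) with sign in {+1, -1}."""
    trust = {n: 1.0 for n in nodes}
    out_deg = {n: 0 for n in nodes}
    for src, _, _ in edges:
        out_deg[src] += 1
    for _ in range(iters):
        nxt = {n: 1.0 - damping for n in nodes}
        for src, dst, sign in edges:
            nxt[dst] += damping * sign * trust[src] / out_deg[src]
        trust = nxt
    return trust

# Hypothetical 3-claim graph: a supports b and contradicts c.
t = signed_pagerank(["a", "b", "c"], [("a", "b", +1), ("a", "c", -1)])
# The supported claim b ends above the neutral baseline, the contradicted c below it.
```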

  • Structure, search, and learn from failed autoresearch experiments. Converts discarded or crashed experiments into searchable knowledge with similarity search, pattern aggregation, and autoresearch-loop integration.

    2026-04-16 · prototype

    GitHub autoresearch negative-results metascience

  • Automated science policy monitoring for Japanese funding agencies (MEXT, JST, JSPS). Scrapes, normalizes, and detects diffs across 143+ policy items daily via GitHub Actions.

    2026-04-16 · prototype

    GitHub research-administration metascience automation
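
Diff detection over scraped policy items can be as simple as hashing normalized content and comparing snapshots between runs; a sketch with hypothetical item ids:

```python
import hashlib

def detect_diffs(previous, current):
    """Compare two snapshots of policy items (id -> normalized content string)
    by content hash. Returns (added, changed, removed) sets of item ids.
    Item ids and snapshot shape here are hypothetical."""
    def digest(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()
    prev_h = {k: digest(v) for k, v in previous.items()}
    curr_h = {k: digest(v) for k, v in current.items()}
    added = set(curr_h) - set(prev_h)
    removed = set(prev_h) - set(curr_h)
    changed = {k for k in set(prev_h) & set(curr_h) if prev_h[k] != curr_h[k]}
    return added, changed, removed

added, changed, removed = detect_diffs(
    {"mext-001": "Call for proposals v1", "jst-002": "Grant guidelines"},
    {"mext-001": "Call for proposals v2", "jsps-003": "New fellowship"},
)
```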

  • Community-driven research administration expertise as reusable AI skills. 8 skills covering policy monitoring, grant review, budget planning, and a meta-skill for RA knowledge capture.

    2026-04-16 · prototype

    GitHub research-administration metascience knowledge-management

  • Lightweight Karpathy-compatible autoresearch loop for Apple Silicon. An LLM proposes hyperparameter changes; the loop trains CNNs on CIFAR-10 and logs each run as keep, discard, or crash.

    2026-04-14 · prototype

    GitHub autoresearch validation apple-silicon
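
The keep/discard/crash bookkeeping in such a loop can be sketched in a few lines (function and field names here are hypothetical, not the repository's API):

```python
def run_trial(train_fn, config):
    """One autoresearch iteration: run the training function on a proposed
    config, then label the result keep / discard / crash. A run is kept if
    its accuracy beats the config's baseline (hypothetical field)."""
    try:
        accuracy = train_fn(config)
    except Exception as err:
        return {"config": config, "outcome": "crash", "error": str(err)}
    outcome = "keep" if accuracy > config.get("baseline", 0.0) else "discard"
    return {"config": config, "outcome": outcome, "accuracy": accuracy}

# Hypothetical runs: one beating the baseline, one raising an exception.
kept = run_trial(lambda cfg: 0.72, {"lr": 3e-4, "baseline": 0.70})
crashed = run_trial(lambda cfg: 1 / 0, {"lr": 1e-1})
```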

  • Confidence scoring system that detects and prevents epistemic cascade contamination in autonomous research pipelines.

    2026-04-14 · prototype

    GitHub autoresearch reproducibility bayesian
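
One simple cascade rule, shown as an illustrative sketch rather than the system's actual scoring: a claim's effective confidence is discounted by the effective confidence of everything it depends on, so uncertainty compounds along experiment chains instead of silently resetting at each step.

```python
def propagated_confidence(claims, deps):
    """Hypothetical cascade rule: a claim's effective confidence is its own
    confidence multiplied by the effective confidence of each parent claim.
    `claims` maps claim id -> confidence; `deps` maps claim id -> list of
    parent ids and must be acyclic."""
    memo = {}
    def eff(cid):
        if cid not in memo:
            memo[cid] = claims[cid]
            for parent in deps.get(cid, []):
                memo[cid] *= eff(parent)
        return memo[cid]
    return {cid: eff(cid) for cid in claims}

# A chain of three 0.9-confidence claims decays to 0.9 ** 3 = 0.729 at the tip.
out = propagated_confidence(
    {"a": 0.9, "b": 0.9, "c": 0.9},
    {"b": ["a"], "c": ["b"]},
)
```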

  • Machine-readable formats for scientific knowledge that AI agents can directly parse, manipulate, and reason over.

    2026-04-14 · prototype

    GitHub knowledge-representation ai-to-ai research-artifacts

  • Manages autonomous research experiment results as evolutionary populations with fitness selection and genealogy tracking.

    2026-04-14 · prototype

    GitHub autoresearch evolutionary-computation knowledge-management
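
Fitness selection with genealogy tracking can be sketched as truncation selection over result records that carry a parent pointer (the record schema and ids below are hypothetical):

```python
def select_survivors(population, capacity):
    """Truncation selection: keep the `capacity` highest-fitness experiment
    results. Each record carries a `parent` field, so genealogy survives
    selection even after low-fitness relatives are culled."""
    ranked = sorted(population, key=lambda r: r["fitness"], reverse=True)
    return ranked[:capacity]

def lineage(record, by_id):
    """Walk a surviving result's parent chain back to its root experiment."""
    chain = [record["id"]]
    while record["parent"] is not None:
        record = by_id[record["parent"]]
        chain.append(record["id"])
    return chain

# Hypothetical population of four experiment results.
pop = [
    {"id": "e1", "fitness": 0.61, "parent": None},
    {"id": "e2", "fitness": 0.74, "parent": "e1"},
    {"id": "e3", "fitness": 0.58, "parent": "e1"},
    {"id": "e4", "fitness": 0.79, "parent": "e2"},
]
survivors = select_survivors(pop, capacity=2)     # e4 and e2 survive
by_id = {r["id"]: r for r in pop}
ancestry = lineage(survivors[0], by_id)           # e4 traces back through e2 to e1
```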


BLOG POSTS

Longer write-ups on the motivation and design behind these experiments.