METASCIENCE EXPERIMENTS

I build and release open-source prototypes that explore the infrastructure layer of autonomous research. As AI systems begin to run hundreds of experiments per night, new problems emerge that traditional scientific workflows were never designed to handle: How do we curate results at machine speed? How do we prevent false positives from cascading through experiment chains? How should knowledge be represented when the primary consumers are AI agents, not humans?

Each experiment below is a working prototype published as an open-source repository. They are not papers; they are tools and simulations, each testing a specific hypothesis about how science should work in the age of autoresearch.


  • Can prediction markets identify novel research claims? Five LLM predictors evaluate 35 ARA-generated claims; the market ensemble achieves a Brier score of 0.177 and identifies one genuinely surprising verified claim that the market had predicted would fail.

    2026-04-23 · prototype

    GitHub prediction-markets metascience research-evaluation
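
For context on the headline metric: the Brier score is the mean squared error between probability forecasts and binary outcomes, so lower is better and a constant 0.5 forecast scores 0.25. A minimal sketch (the claim data here is hypothetical, not from the repository):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and binary outcomes.
    0.0 is a perfect forecaster; a constant 0.5 forecast scores 0.25."""
    assert len(forecasts) == len(outcomes)
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical example: three claims, two verified (1) and one refuted (0).
score = brier_score([0.9, 0.7, 0.2], [1, 1, 0])  # (0.01 + 0.09 + 0.04) / 3 ≈ 0.047
```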

  • Structural pathology detection for AI-generated research. Seven detectors find 47 anomalies in ARA output, including confidence inflation (0.98 confidence backed by a single evidence item) and an 82% precision-oscillation rate.

    2026-04-23 · prototype

    GitHub anomaly-detection metascience quality-assurance
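
The confidence-inflation check can be illustrated with a toy detector that flags claims whose stated confidence exceeds a simple evidence-based bound; the threshold heuristic below is hypothetical, not the repository's actual rule:

```python
def confidence_inflation(claims, base_bound=0.80, step=0.05, cap=0.95):
    """Flag claims whose stated confidence exceeds an evidence-based bound.
    Hypothetical heuristic: allow 0.80 for one evidence item, +0.05 per
    additional item, capped at 0.95."""
    flagged = []
    for claim in claims:
        bound = min(cap, base_bound + step * (claim["evidence"] - 1))
        if claim["confidence"] > bound:
            flagged.append(claim["id"])
    return flagged

# Hypothetical: 0.98 confidence on a single evidence item gets flagged.
hits = confidence_inflation([
    {"id": "c1", "confidence": 0.98, "evidence": 1},
    {"id": "c2", "confidence": 0.85, "evidence": 3},
])  # ["c1"]
```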

  • Can economic staking mechanisms rank research claims better than confidence scores? Negative result: a two-agent LMSR market (Kendall tau = 0.242) loses to raw confidence scores (tau = 0.435), suggesting a minimum diversity threshold for prediction markets.

    2026-04-23 · prototype

    GitHub prediction-markets metascience negative-result
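
The LMSR mechanism behind this market is standard; a minimal sketch of its cost function and instantaneous price follows (the liquidity parameter b and share counts are illustrative, not the repository's settings):

```python
import math

def lmsr_price(q_yes, q_no, b=10.0):
    """Instantaneous LMSR price of the YES share:
    exp(q_yes/b) / (exp(q_yes/b) + exp(q_no/b))."""
    ey, en = math.exp(q_yes / b), math.exp(q_no / b)
    return ey / (ey + en)

def lmsr_cost(q_yes, q_no, b=10.0):
    """LMSR cost function C(q) = b * ln(exp(q_yes/b) + exp(q_no/b));
    a trade from q to q' costs the trader C(q') - C(q)."""
    return b * math.log(math.exp(q_yes / b) + math.exp(q_no / b))

# With no shares outstanding the YES price is 0.5; buying 5 YES shares moves it up.
p0 = lmsr_price(0, 0)                     # 0.5
cost = lmsr_cost(5, 0) - lmsr_cost(0, 0)  # cost of the 5-share trade
p1 = lmsr_price(5, 0)                     # > 0.5
```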

  • Trust propagation in bipolar argumentation graphs. A signed PageRank variant over supports/contradicts claim networks, tested on 12 AI research claims; it passed 4 of 5 intuition checks.

    2026-04-18 · prototype

    GitHub trust-propagation argumentation metascience
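
One way to sketch a signed PageRank over supports/contradicts edges, where support edges propagate a source's trust positively and contradiction edges negatively (an illustrative update rule, not necessarily the repository's exact variant):

```python
def signed_pagerank(nodes, edges, damping=0.85, iters=50):
    """Toy signed-PageRank sketch. Every claim starts at trust 1.0; each
    iteration resets trust to the baseline (1 - damping) and adds signed
    contributions from incoming edges. `edges` is a list of
    (src, dst, sign) with sign in {+1, -1}."""
    trust = {n: 1.0 for n in nodes}
    out_deg = {n: 0 for n in nodes}
    for src, _, _ in edges:
        out_deg[src] += 1
    for _ in range(iters):
        nxt = {n: 1.0 - damping for n in nodes}
        for src, dst, sign in edges:
            nxt[dst] += damping * sign * trust[src] / out_deg[src]
        trust = nxt
    return trust

# Hypothetical 3-claim graph: a supports b and contradicts c.
t = signed_pagerank(["a", "b", "c"], [("a", "b", +1), ("a", "c", -1)])
# The supported claim b ends above the neutral baseline, the contradicted c below it.
```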

  • Structure, search, and learn from failed autoresearch experiments. Converts discarded or crashed experiments into searchable knowledge with similarity search, pattern aggregation, and autoresearch-loop integration.

    2026-04-16 · prototype

    GitHub autoresearch negative-results metascience

  • Automated science policy monitoring for Japanese funding agencies (MEXT, JST, JSPS). Scrapes, normalizes, and detects diffs across 143+ policy items daily via GitHub Actions.

    2026-04-16 · prototype

    GitHub research-administration metascience automation
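
Diff detection over scraped policy items can be as simple as hashing normalized content and comparing snapshots between runs; a sketch with hypothetical item ids:

```python
import hashlib

def detect_diffs(previous, current):
    """Compare two snapshots of policy items (id -> normalized content string)
    by content hash. Returns (added, changed, removed) sets of item ids.
    Item ids and snapshot shape here are hypothetical."""
    def digest(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()
    prev_h = {k: digest(v) for k, v in previous.items()}
    curr_h = {k: digest(v) for k, v in current.items()}
    added = set(curr_h) - set(prev_h)
    removed = set(prev_h) - set(curr_h)
    changed = {k for k in set(prev_h) & set(curr_h) if prev_h[k] != curr_h[k]}
    return added, changed, removed

added, changed, removed = detect_diffs(
    {"mext-001": "Call for proposals v1", "jst-002": "Grant guidelines"},
    {"mext-001": "Call for proposals v2", "jsps-003": "New fellowship"},
)
```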

  • Community-driven research administration expertise as reusable AI skills. 8 skills covering policy monitoring, grant review, budget planning, and a meta-skill for RA knowledge capture.

    2026-04-16 · prototype

    GitHub research-administration metascience knowledge-management

  • Lightweight Karpathy-compatible autoresearch loop for Apple Silicon. An LLM proposes hyperparameter changes; the loop trains CNNs on CIFAR-10 and logs each run as keep, discard, or crash.

    2026-04-14 · prototype

    GitHub autoresearch validation apple-silicon
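
The keep/discard/crash bookkeeping in such a loop can be sketched in a few lines (function and field names here are hypothetical, not the repository's API):

```python
def run_trial(train_fn, config):
    """One autoresearch iteration: run the training function on a proposed
    config, then label the result keep / discard / crash. A run is kept if
    its accuracy beats the config's baseline (hypothetical field)."""
    try:
        accuracy = train_fn(config)
    except Exception as err:
        return {"config": config, "outcome": "crash", "error": str(err)}
    outcome = "keep" if accuracy > config.get("baseline", 0.0) else "discard"
    return {"config": config, "outcome": outcome, "accuracy": accuracy}

# Hypothetical runs: one beating the baseline, one raising an exception.
kept = run_trial(lambda cfg: 0.72, {"lr": 3e-4, "baseline": 0.70})
crashed = run_trial(lambda cfg: 1 / 0, {"lr": 1e-1})
```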

  • Confidence scoring system that detects and prevents epistemic cascade contamination in autonomous research pipelines.

    2026-04-14 · prototype

    GitHub autoresearch reproducibility bayesian
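
One simple cascade rule, shown as an illustrative sketch rather than the system's actual scoring: a claim's effective confidence is discounted by the effective confidence of everything it depends on, so uncertainty compounds along experiment chains instead of silently resetting at each step.

```python
def propagated_confidence(claims, deps):
    """Hypothetical cascade rule: a claim's effective confidence is its own
    confidence multiplied by the effective confidence of each parent claim.
    `claims` maps claim id -> confidence; `deps` maps claim id -> list of
    parent ids and must be acyclic."""
    memo = {}
    def eff(cid):
        if cid not in memo:
            memo[cid] = claims[cid]
            for parent in deps.get(cid, []):
                memo[cid] *= eff(parent)
        return memo[cid]
    return {cid: eff(cid) for cid in claims}

# A chain of three 0.9-confidence claims decays to 0.9 ** 3 = 0.729 at the tip.
out = propagated_confidence(
    {"a": 0.9, "b": 0.9, "c": 0.9},
    {"b": ["a"], "c": ["b"]},
)
```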

  • Machine-readable formats for scientific knowledge that AI agents can directly parse, manipulate, and reason over.

    2026-04-14 · prototype

    GitHub knowledge-representation ai-to-ai research-artifacts

  • Manages autonomous research experiment results as evolutionary populations with fitness selection and genealogy tracking.

    2026-04-14 · prototype

    GitHub autoresearch evolutionary-computation knowledge-management
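
Fitness selection with genealogy tracking can be sketched as truncation selection over result records that carry a parent pointer (the record schema and ids below are hypothetical):

```python
def select_survivors(population, capacity):
    """Truncation selection: keep the `capacity` highest-fitness experiment
    results. Each record carries a `parent` field, so genealogy survives
    selection even after low-fitness relatives are culled."""
    ranked = sorted(population, key=lambda r: r["fitness"], reverse=True)
    return ranked[:capacity]

def lineage(record, by_id):
    """Walk a surviving result's parent chain back to its root experiment."""
    chain = [record["id"]]
    while record["parent"] is not None:
        record = by_id[record["parent"]]
        chain.append(record["id"])
    return chain

# Hypothetical population of four experiment results.
pop = [
    {"id": "e1", "fitness": 0.61, "parent": None},
    {"id": "e2", "fitness": 0.74, "parent": "e1"},
    {"id": "e3", "fitness": 0.58, "parent": "e1"},
    {"id": "e4", "fitness": 0.79, "parent": "e2"},
]
survivors = select_survivors(pop, capacity=2)     # e4 and e2 survive
by_id = {r["id"]: r for r in pop}
ancestry = lineage(survivors[0], by_id)           # e4 traces back through e2 to e1
```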


BLOG POSTS

Longer write-ups on the motivation and design behind these experiments.