Reproducibility Protocol

End-to-End Reproducibility for Builds, Benchmarks, VTune Profiles, and Formal Proof Artifacts

This page documents a concrete rerun protocol tied to the current repository structure, CI workflows, benchmark harnesses, and proof pipelines. Follow these steps to reproduce the same classes of artifacts used by the public benchmark and verification pages.

Rust

stable toolchain

Matches CI setup via dtolnay/rust-toolchain@stable.

LLVM

18.x

CI installs llvm-18, llvm-18-dev, clang-18, lld-18.

Lean

leanprover/lean-action

Proof project validation runs in dedicated Lean workflow.

Benchmark harness

criterion 0.5.1

Configured in crates/x-bench/Cargo.toml.

01. Baseline

Pin and Validate Core Build State

Reference Branch and CI-Equivalent Checks

Reproduce on branch nightly (current working branch) unless you are intentionally validating master. Run the same test classes the CI uses.

git clone https://github.com/WillKirkmanM/x-lang.git
cd x-lang
git checkout nightly
cargo check -p x-cli
cargo test -p x-parser
cargo test -p x-typechecker
cargo test -p x-borrow-checker
cargo build -p x-cli
./target/debug/x-cli fmt --check examples/hello.x examples/types.x examples/struct.x examples/fibonacci.x
./target/debug/x-cli lint --deny-warnings examples/hello.x examples/types.x examples/struct.x examples/fibonacci.x
./target/debug/x-cli test examples/hello.x examples/types.x examples/struct.x examples/fibonacci.x

Determinism Notes

Prefer clean worktrees when producing final benchmark numbers.
Capture exact commit SHA with git rev-parse HEAD.
On Linux CI parity, use LLVM 18 and keep LLVM_SYS_181_PREFIX aligned.
Persist compiler, OS, and CPU metadata with scripts under scripts/utils.

Environment capture helper: scripts/utils/hardware_environment_stats.py

02. Criterion

Reproduce Keyword vs Non-Keyword Runtime Matrix

Harness Command

cargo bench -p x-bench --bench keyword_benchmarks
# optional overrides
# X_BENCH_XCLI=/abs/path/to/x-cli
# X_BENCH_CPP=/abs/path/to/g++
# X_BENCH_N=400000 X_BENCH_WORK=5000 X_BENCH_NI=120 X_BENCH_NJ=130 X_BENCH_NK=140 X_BENCH_ITERS=200000000

Source of benchmark configuration: crates/x-bench/benches/keyword_benchmarks.rs

Sampling and Outputs

Group: keyword_nonkeyword_runtime_x_vs_cpp
SamplingMode::Flat, sample_size=10, warmup=1s, measurement=8s
Temporary compiled binaries: target/x-bench/criterion-keyword-bench/compiled_bins
Criterion JSON estimates: target/criterion/keyword_nonkeyword_runtime_x_vs_cpp/*

03. Macro + VTune

Reproduce Family-Level Benchmarks and Profiling Exports

Benchmark Family Automation Index

become

Scripts

benchmarks/full/become/run_tco_bench.py
benchmarks/full/become/benchmark/run_become_profile.py

Primary Outputs

benchmarks/full/become/tco_bench_results.csv
benchmarks/full/become/benchmark/vtune_reports_become/*.csv

invariant

Scripts

benchmarks/full/invariant/bench_invariant.py
benchmarks/full/invariant/bench/plot_invariant.py

Primary Outputs

benchmarks/full/invariant/invariant_bench_results.csv

layout

Scripts

benchmarks/full/layout/run_fdtd_layout_bench.py
benchmarks/full/layout/summarize_fdtd_layout.py

Primary Outputs

benchmarks/full/layout/fdtd_layout_results.csv

memoise

Scripts

benchmarks/full/memoise/run_full_bench.py
benchmarks/full/memoise/bench/run_cpp_benchmarks.py

Primary Outputs

benchmarks/full/memoise/memo_50_fib(40).txt
benchmarks/full/memoise/memo_50_10k_fib(40).txt

multi

Scripts

benchmarks/full/multi/run_multi_bench.py
benchmarks/full/multi/vtune/run_all_profile.py

Primary Outputs

benchmarks/full/multi/statistical_test_summary.csv
benchmarks/full/multi/bench_compare.csv

parallel

Scripts

benchmarks/full/parallel/bench_parallel.py
benchmarks/full/parallel/plot_parallel.py

Primary Outputs

benchmarks/full/parallel/parallel_bench_results.csv

pure

Scripts

benchmarks/full/pure/benchmark.py
benchmarks/full/pure/verify_bench.py

Primary Outputs

benchmarks/full/pure/pure_bench_50.txt
benchmarks/full/pure/3x_pure.txt

static

Scripts

benchmarks/full/static/run_regex_benchmark.py
benchmarks/full/static/plot_regex_results.py

Primary Outputs

benchmarks/full/static/regex_results.csv

throws

Scripts

benchmarks/full/throws/bench_throws.py
benchmarks/full/throws/bench/vtune_throws.py

Primary Outputs

benchmarks/full/throws/throws_bench_results.csv

unique

Scripts

benchmarks/full/unique/bench_gemm.py
benchmarks/full/unique/run_bench_unified.py

Primary Outputs

benchmarks/full/unique/summary_results.csv
benchmarks/full/unique/bench/bench_results.csv

VTune Collection Contract (from x-bench)

The x-bench runner uses vtune -collect uarch-exploration, then exports -report summary -format csv to gather counters and timing into CSV-friendly metrics.

cargo run -p x-bench -- --cli-path ./target/debug/x-cli.exe --bench-dir ./x-bench/benchmarks --iterations 5
# optional flags
# --quick
# --file <path-to-single-benchmark.x>
# --subdir pure
# --vtune-path vtune
# --no-hpc

04. Formal Proofs

Reproduce Lean and Alive2-Oriented Outputs

x-alive Pipeline

cargo run -p x-alive -- -k pure
cargo run -p x-alive -- -k memoised
cargo run -p x-alive -- -k pure -k memoised
cargo run -p x-alive -- --print-alive2
cargo run -p x-alive -- --examples-dir ..\examples --out-dir ..\IR -k pure --print-alive2

x-alive emits paired LLVM IR files (with and without selected semantics) and can print Alive2 panes for direct comparison.

Proof Artifact Locations

Lean project root: crates/x-proofs-lean
Lean theorem modules: crates/x-proofs-lean/XProofs
Alive scripts: crates/x-proofs-alive/*.tv
Completed Alive scripts: crates/x-proofs-alive/done/*.tv

Lean CI mirror: .github/workflows/lean_action_ci.yml

05. Artifact Graph

Where Reproducibility Evidence Lands

Canonical Artifact Paths

target/criterion/keyword_nonkeyword_runtime_x_vs_cpp/**
target/x-bench/criterion-keyword-bench/**
benchmarks/full/**
crates/x-proofs-lean/XProofs/*.lean
crates/x-proofs-alive/*.tv
crates/x-proofs-alive/done/*.tv

For direct browsing of collected files, use the Data route: /data

06. Checklist

Publication-Grade Reproducibility Checklist

Record commit SHA, branch, Rust version, LLVM version, CPU, OS.
Run parser/typechecker/borrow-checker tests before performance runs.
Re-run criterion benchmark matrix and archive target/criterion JSON outputs.
Capture benchmark family CSV/TXT outputs under benchmarks/full.
Capture VTune CSV summary exports with the collection configuration noted.
Preserve Lean and Alive proof artifacts and command invocations used for generation.