Matches CI setup via dtolnay/rust-toolchain@stable.
End-to-End Reproducibility for Builds, Benchmarks, VTune Profiles, and Formal Proof Artifacts
This page documents a concrete rerun protocol tied to the current repository structure, CI workflows, benchmark harnesses, and proof pipelines. Follow these steps to reproduce the same classes of artifacts used by the public benchmark and verification pages.
CI installs llvm-18, llvm-18-dev, clang-18, lld-18.
Proof project validation runs in dedicated Lean workflow.
Configured in crates/x-bench/Cargo.toml.
01. Baseline
Pin and Validate Core Build State
Reference Branch and CI-Equivalent Checks
Reproduce on branch nightly (current working branch) unless you are intentionally validating master. Run the same test classes the CI uses.
git clone https://github.com/WillKirkmanM/x-lang.git
cd x-lang
git checkout nightly
cargo check -p x-cli
cargo test -p x-parser
cargo test -p x-typechecker
cargo test -p x-borrow-checker
cargo build -p x-cli
./target/debug/x-cli fmt --check examples/hello.x examples/types.x examples/struct.x examples/fibonacci.x
./target/debug/x-cli lint --deny-warnings examples/hello.x examples/types.x examples/struct.x examples/fibonacci.x
./target/debug/x-cli test examples/hello.x examples/types.x examples/struct.x examples/fibonacci.xDeterminism Notes
- Prefer clean worktrees when producing final benchmark numbers.
- Capture exact commit SHA with git rev-parse HEAD.
- On Linux CI parity, use LLVM 18 and keep LLVM_SYS_181_PREFIX aligned.
- Persist compiler, OS, and CPU metadata with scripts under scripts/utils.
02. Criterion
Reproduce Keyword vs Non-Keyword Runtime Matrix
Harness Command
cargo bench -p x-bench --bench keyword_benchmarks
# optional overrides
# X_BENCH_XCLI=/abs/path/to/x-cli
# X_BENCH_CPP=/abs/path/to/g++
# X_BENCH_N=400000 X_BENCH_WORK=5000 X_BENCH_NI=120 X_BENCH_NJ=130 X_BENCH_NK=140 X_BENCH_ITERS=200000000Source of benchmark configuration: crates/x-bench/benches/keyword_benchmarks.rs
Sampling and Outputs
- Group: keyword_nonkeyword_runtime_x_vs_cpp
- SamplingMode::Flat, sample_size=10, warmup=1s, measurement=8s
- Temporary compiled binaries: target/x-bench/criterion-keyword-bench/compiled_bins
- Criterion JSON estimates: target/criterion/keyword_nonkeyword_runtime_x_vs_cpp/*
03. Macro + VTune
Reproduce Family-Level Benchmarks and Profiling Exports
Benchmark Family Automation Index
become
Scripts
- benchmarks/full/become/run_tco_bench.py
- benchmarks/full/become/benchmark/run_become_profile.py
Primary Outputs
- benchmarks/full/become/tco_bench_results.csv
- benchmarks/full/become/benchmark/vtune_reports_become/*.csv
invariant
Scripts
- benchmarks/full/invariant/bench_invariant.py
- benchmarks/full/invariant/bench/plot_invariant.py
Primary Outputs
- benchmarks/full/invariant/invariant_bench_results.csv
layout
Scripts
- benchmarks/full/layout/run_fdtd_layout_bench.py
- benchmarks/full/layout/summarize_fdtd_layout.py
Primary Outputs
- benchmarks/full/layout/fdtd_layout_results.csv
memoise
Scripts
- benchmarks/full/memoise/run_full_bench.py
- benchmarks/full/memoise/bench/run_cpp_benchmarks.py
Primary Outputs
- benchmarks/full/memoise/memo_50_fib(40).txt
- benchmarks/full/memoise/memo_50_10k_fib(40).txt
multi
Scripts
- benchmarks/full/multi/run_multi_bench.py
- benchmarks/full/multi/vtune/run_all_profile.py
Primary Outputs
- benchmarks/full/multi/statistical_test_summary.csv
- benchmarks/full/multi/bench_compare.csv
parallel
Scripts
- benchmarks/full/parallel/bench_parallel.py
- benchmarks/full/parallel/plot_parallel.py
Primary Outputs
- benchmarks/full/parallel/parallel_bench_results.csv
pure
Scripts
- benchmarks/full/pure/benchmark.py
- benchmarks/full/pure/verify_bench.py
Primary Outputs
- benchmarks/full/pure/pure_bench_50.txt
- benchmarks/full/pure/3x_pure.txt
static
Scripts
- benchmarks/full/static/run_regex_benchmark.py
- benchmarks/full/static/plot_regex_results.py
Primary Outputs
- benchmarks/full/static/regex_results.csv
throws
Scripts
- benchmarks/full/throws/bench_throws.py
- benchmarks/full/throws/bench/vtune_throws.py
Primary Outputs
- benchmarks/full/throws/throws_bench_results.csv
unique
Scripts
- benchmarks/full/unique/bench_gemm.py
- benchmarks/full/unique/run_bench_unified.py
Primary Outputs
- benchmarks/full/unique/summary_results.csv
- benchmarks/full/unique/bench/bench_results.csv
VTune Collection Contract (from x-bench)
The x-bench runner uses vtune -collect uarch-exploration, then exports -report summary -format csv to gather counters and timing into CSV-friendly metrics.
cargo run -p x-bench -- --cli-path ./target/debug/x-cli.exe --bench-dir ./x-bench/benchmarks --iterations 5
# optional flags
# --quick
# --file <path-to-single-benchmark.x>
# --subdir pure
# --vtune-path vtune
# --no-hpc04. Formal Proofs
Reproduce Lean and Alive2-Oriented Outputs
x-alive Pipeline
cargo run -p x-alive -- -k pure
cargo run -p x-alive -- -k memoised
cargo run -p x-alive -- -k pure -k memoised
cargo run -p x-alive -- --print-alive2
cargo run -p x-alive -- --examples-dir ..\examples --out-dir ..\IR -k pure --print-alive2x-alive emits paired LLVM IR files (with and without selected semantics) and can print Alive2 panes for direct comparison.
Proof Artifact Locations
- Lean project root: crates/x-proofs-lean
- Lean theorem modules: crates/x-proofs-lean/XProofs
- Alive scripts: crates/x-proofs-alive/*.tv
- Completed Alive scripts: crates/x-proofs-alive/done/*.tv
05. Artifact Graph
Where Reproducibility Evidence Lands
Canonical Artifact Paths
- target/criterion/keyword_nonkeyword_runtime_x_vs_cpp/**
- target/x-bench/criterion-keyword-bench/**
- benchmarks/full/**
- crates/x-proofs-lean/XProofs/*.lean
- crates/x-proofs-alive/*.tv
- crates/x-proofs-alive/done/*.tv
06. Checklist
Publication-Grade Reproducibility Checklist
- Record commit SHA, branch, Rust version, LLVM version, CPU, OS.
- Run parser/typechecker/borrow-checker tests before performance runs.
- Re-run criterion benchmark matrix and archive target/criterion JSON outputs.
- Capture benchmark family CSV/TXT outputs under benchmarks/full.
- Capture VTune CSV summary exports with the collection configuration noted.
- Preserve Lean and Alive proof artifacts and command invocations used for generation.
