X-Lang logo
GitHub
Reproducibility Protocol

End-to-End Reproducibility for Builds, Benchmarks, VTune Profiles, and Formal Proof Artifacts

This page documents a concrete rerun protocol tied to the current repository structure, CI workflows, benchmark harnesses, and proof pipelines. Follow these steps to reproduce the same classes of artifacts used by the public benchmark and verification pages.

Rust
stable toolchain

Matches CI setup via dtolnay/rust-toolchain@stable.

LLVM
18.x

CI installs llvm-18, llvm-18-dev, clang-18, lld-18.

Lean
leanprover/lean-action

Proof project validation runs in dedicated Lean workflow.

Benchmark harness
criterion 0.5.1

Configured in crates/x-bench/Cargo.toml.

01. Baseline

Pin and Validate Core Build State

Reference Branch and CI-Equivalent Checks

Reproduce on branch nightly (current working branch) unless you are intentionally validating master. Run the same test classes the CI uses.

git clone https://github.com/WillKirkmanM/x-lang.git
cd x-lang
git checkout nightly
cargo check -p x-cli
cargo test -p x-parser
cargo test -p x-typechecker
cargo test -p x-borrow-checker
cargo build -p x-cli
./target/debug/x-cli fmt --check examples/hello.x examples/types.x examples/struct.x examples/fibonacci.x
./target/debug/x-cli lint --deny-warnings examples/hello.x examples/types.x examples/struct.x examples/fibonacci.x
./target/debug/x-cli test examples/hello.x examples/types.x examples/struct.x examples/fibonacci.x

Determinism Notes

  • Prefer clean worktrees when producing final benchmark numbers.
  • Capture exact commit SHA with git rev-parse HEAD.
  • On Linux CI parity, use LLVM 18 and keep LLVM_SYS_181_PREFIX aligned.
  • Persist compiler, OS, and CPU metadata with scripts under scripts/utils.
Environment capture helper: scripts/utils/hardware_environment_stats.py

02. Criterion

Reproduce Keyword vs Non-Keyword Runtime Matrix

Harness Command

cargo bench -p x-bench --bench keyword_benchmarks
# optional overrides
# X_BENCH_XCLI=/abs/path/to/x-cli
# X_BENCH_CPP=/abs/path/to/g++
# X_BENCH_N=400000 X_BENCH_WORK=5000 X_BENCH_NI=120 X_BENCH_NJ=130 X_BENCH_NK=140 X_BENCH_ITERS=200000000

Source of benchmark configuration: crates/x-bench/benches/keyword_benchmarks.rs

Sampling and Outputs

  • Group: keyword_nonkeyword_runtime_x_vs_cpp
  • SamplingMode::Flat, sample_size=10, warmup=1s, measurement=8s
  • Temporary compiled binaries: target/x-bench/criterion-keyword-bench/compiled_bins
  • Criterion JSON estimates: target/criterion/keyword_nonkeyword_runtime_x_vs_cpp/*

03. Macro + VTune

Reproduce Family-Level Benchmarks and Profiling Exports

Benchmark Family Automation Index

become

Scripts

  • benchmarks/full/become/run_tco_bench.py
  • benchmarks/full/become/benchmark/run_become_profile.py

Primary Outputs

  • benchmarks/full/become/tco_bench_results.csv
  • benchmarks/full/become/benchmark/vtune_reports_become/*.csv

invariant

Scripts

  • benchmarks/full/invariant/bench_invariant.py
  • benchmarks/full/invariant/bench/plot_invariant.py

Primary Outputs

  • benchmarks/full/invariant/invariant_bench_results.csv

layout

Scripts

  • benchmarks/full/layout/run_fdtd_layout_bench.py
  • benchmarks/full/layout/summarize_fdtd_layout.py

Primary Outputs

  • benchmarks/full/layout/fdtd_layout_results.csv

memoise

Scripts

  • benchmarks/full/memoise/run_full_bench.py
  • benchmarks/full/memoise/bench/run_cpp_benchmarks.py

Primary Outputs

  • benchmarks/full/memoise/memo_50_fib(40).txt
  • benchmarks/full/memoise/memo_50_10k_fib(40).txt

multi

Scripts

  • benchmarks/full/multi/run_multi_bench.py
  • benchmarks/full/multi/vtune/run_all_profile.py

Primary Outputs

  • benchmarks/full/multi/statistical_test_summary.csv
  • benchmarks/full/multi/bench_compare.csv

parallel

Scripts

  • benchmarks/full/parallel/bench_parallel.py
  • benchmarks/full/parallel/plot_parallel.py

Primary Outputs

  • benchmarks/full/parallel/parallel_bench_results.csv

pure

Scripts

  • benchmarks/full/pure/benchmark.py
  • benchmarks/full/pure/verify_bench.py

Primary Outputs

  • benchmarks/full/pure/pure_bench_50.txt
  • benchmarks/full/pure/3x_pure.txt

static

Scripts

  • benchmarks/full/static/run_regex_benchmark.py
  • benchmarks/full/static/plot_regex_results.py

Primary Outputs

  • benchmarks/full/static/regex_results.csv

throws

Scripts

  • benchmarks/full/throws/bench_throws.py
  • benchmarks/full/throws/bench/vtune_throws.py

Primary Outputs

  • benchmarks/full/throws/throws_bench_results.csv

unique

Scripts

  • benchmarks/full/unique/bench_gemm.py
  • benchmarks/full/unique/run_bench_unified.py

Primary Outputs

  • benchmarks/full/unique/summary_results.csv
  • benchmarks/full/unique/bench/bench_results.csv

VTune Collection Contract (from x-bench)

The x-bench runner uses vtune -collect uarch-exploration, then exports -report summary -format csv to gather counters and timing into CSV-friendly metrics.

cargo run -p x-bench -- --cli-path ./target/debug/x-cli.exe --bench-dir ./x-bench/benchmarks --iterations 5
# optional flags
# --quick
# --file <path-to-single-benchmark.x>
# --subdir pure
# --vtune-path vtune
# --no-hpc

04. Formal Proofs

Reproduce Lean and Alive2-Oriented Outputs

x-alive Pipeline

cargo run -p x-alive -- -k pure
cargo run -p x-alive -- -k memoised
cargo run -p x-alive -- -k pure -k memoised
cargo run -p x-alive -- --print-alive2
cargo run -p x-alive -- --examples-dir ..\examples --out-dir ..\IR -k pure --print-alive2

x-alive emits paired LLVM IR files (with and without selected semantics) and can print Alive2 panes for direct comparison.

Proof Artifact Locations

  • Lean project root: crates/x-proofs-lean
  • Lean theorem modules: crates/x-proofs-lean/XProofs
  • Alive scripts: crates/x-proofs-alive/*.tv
  • Completed Alive scripts: crates/x-proofs-alive/done/*.tv
Lean CI mirror: .github/workflows/lean_action_ci.yml

05. Artifact Graph

Where Reproducibility Evidence Lands

Canonical Artifact Paths

  • target/criterion/keyword_nonkeyword_runtime_x_vs_cpp/**
  • target/x-bench/criterion-keyword-bench/**
  • benchmarks/full/**
  • crates/x-proofs-lean/XProofs/*.lean
  • crates/x-proofs-alive/*.tv
  • crates/x-proofs-alive/done/*.tv
For direct browsing of collected files, use the Data route: /data

06. Checklist

Publication-Grade Reproducibility Checklist

  • Record commit SHA, branch, Rust version, LLVM version, CPU, OS.
  • Run parser/typechecker/borrow-checker tests before performance runs.
  • Re-run criterion benchmark matrix and archive target/criterion JSON outputs.
  • Capture benchmark family CSV/TXT outputs under benchmarks/full.
  • Capture VTune CSV summary exports with the collection configuration noted.
  • Preserve Lean and Alive proof artifacts and command invocations used for generation.