Evaluation Methodology
Evidence-based verdicts for MPLP lifecycle guarantees under versioned, deterministic rulesets.
Methodology Narrative: governance/METHOD-VLAB-01_EVALUATION_METHOD.md
Substrate Registry (Live SSOT): data/curated-runs/substrate-index.yaml
Substrate status data loaded from SSOT at build time. See /coverage for real-time matrix.
1. Four Boundaries
Non-certification
No badges, no ranking, and no compliance certificates
Non-endorsement
Verdict ≠ recommendation or quality assessment
No execution hosting
Lab does not host execution; you provide evidence packs
Deterministic ruleset
Same evidence + same ruleset = same verdict
See /about for full boundary statement.
2. What We Evaluate
Evidence Pack + Ruleset → Verdict (PASS/FAIL) + verdict_hash
We Evaluate
- Evidence packs against Lifecycle Guarantees
- Structural completeness and integrity
- Claim satisfaction under frozen rulesets
We Do NOT Evaluate
- Runtime performance or latency
- Agent quality or intelligence
- Code correctness or security
3. Evidence Pack (Input)
pack/
├── manifest.json          # Pack metadata
├── integrity/
│   ├── sha256sums.txt     # File checksums
│   └── pack.sha256        # Pack root hash
├── timeline/
│   └── events.ndjson      # Execution timeline
└── artifacts/
    ├── context.json       # Agent context
    ├── plan.json          # Agent plan
    └── trace.json         # Execution trace
See /policies/contract for full specification.
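A pack author may want to check the file checksums locally before submitting. The sketch below is one way to do that, assuming integrity/sha256sums.txt uses the conventional `sha256sum` line format (`<hex digest>  <relative path>`) with paths relative to the pack root. That format is an assumption; /policies/contract is the authority. `sha256_file` and `verify_checksums` are hypothetical helper names, not part of any MPLP tooling.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(pack_root: Path) -> list[str]:
    """Return relative paths that are missing or whose digest does not match.

    Assumption: each non-empty line is '<hex digest>  <relative path>',
    as written by the sha256sum tool.
    """
    mismatches = []
    sums_file = pack_root / "integrity" / "sha256sums.txt"
    for line in sums_file.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.lstrip("*")  # sha256sum marks binary mode with '*'
        target = pack_root / rel_path
        if not target.is_file() or sha256_file(target) != expected.lower():
            mismatches.append(rel_path)
    return mismatches
```

An empty list from `verify_checksums(Path("pack"))` means every listed file matched its recorded digest. It does not check pack.sha256, whose construction is not described here.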
4. Case Lifecycle
REGISTERED
Pack admitted, awaiting evaluation
ADJUDICATED
Evaluation complete, verdict issued
NOT_ADMISSIBLE
Pack rejected, no verdict
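The three states can be modelled as a small enumeration. This is an illustrative sketch only: the state names and their meanings come from the list above, while `CaseStatus`, `has_verdict`, and `awaiting_evaluation` are hypothetical names.

```python
from enum import Enum


class CaseStatus(Enum):
    REGISTERED = "REGISTERED"          # pack admitted, awaiting evaluation
    ADJUDICATED = "ADJUDICATED"        # evaluation complete, verdict issued
    NOT_ADMISSIBLE = "NOT_ADMISSIBLE"  # pack rejected, no verdict


def has_verdict(status: CaseStatus) -> bool:
    """Only ADJUDICATED cases carry a verdict."""
    return status is CaseStatus.ADJUDICATED


def awaiting_evaluation(status: CaseStatus) -> bool:
    """REGISTERED cases are admitted but not yet evaluated."""
    return status is CaseStatus.REGISTERED
```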
5. Rulesets
A Ruleset is a versioned, immutable set of decision rules. Once frozen, it never changes.
ruleset-1.0
GoldenFlow (LG-01~05)
pack-v0.2
ruleset-1.1
Four-Domain (D1~D4)
pack-v0.3
ruleset-1.2
Semantic Invariant (12 clauses)
pack-v0.4
See /rulesets for all versions.
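The "once frozen, it never changes" property can be mirrored in client code with immutable records. The sketch below is hypothetical (`Ruleset` and `RULESETS` are illustrative names) and treats the pack version listed next to each ruleset as its associated pack format, which is an assumption about how the list above should be read.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Ruleset:
    version: str      # e.g. "ruleset-1.0"
    name: str         # e.g. "GoldenFlow (LG-01~05)"
    pack_format: str  # pack format listed alongside the ruleset (assumed pairing)


# Read-only registry keyed by ruleset version.
RULESETS = MappingProxyType({
    r.version: r
    for r in (
        Ruleset("ruleset-1.0", "GoldenFlow (LG-01~05)", "pack-v0.2"),
        Ruleset("ruleset-1.1", "Four-Domain (D1~D4)", "pack-v0.3"),
        Ruleset("ruleset-1.2", "Semantic Invariant (12 clauses)", "pack-v0.4"),
    )
})
```

Both layers are read-only: assigning to a field of a `Ruleset` raises `dataclasses.FrozenInstanceError`, and assigning a new key to `RULESETS` raises `TypeError`.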
6. Verdicts & Recheck
Determinism Guarantee
Same pack + same ruleset = same verdict_hash
Third-Party Recheck
npx @mplp/recompute <pack_path> --ruleset 1.0
Anyone can verify a verdict independently without trusting Lab infrastructure.
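To see why a hash over the verdict can be reproducible, consider the following sketch. It is not the Lab's actual verdict_hash construction, which is not specified here; the record fields and the `verdict_hash` function are hypothetical. It only illustrates the general technique: serialize the inputs and outcome canonically (sorted keys, fixed separators), then hash the bytes.

```python
import hashlib
import json


def verdict_hash(pack_root_hash: str, ruleset: str, verdict: str) -> str:
    """Hash a canonical JSON record of the inputs and outcome.

    Hypothetical construction for illustration only.
    """
    record = {"pack": pack_root_hash, "ruleset": ruleset, "verdict": verdict}
    # sort_keys and fixed separators remove formatting variation,
    # so equal records always serialize to identical bytes.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this scheme, two parties that evaluate the same pack under the same ruleset and reach the same verdict obtain the same digest, and any change to pack, ruleset, or verdict changes it.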
7. Substrate Model
A Substrate is an execution environment (framework, protocol, runtime) that produces evidence packs.
Current Status: 6 Tier-0 substrates, 2 ADJUDICATED, 26 REGISTERED
| Substrate | Type | Status |
|---|---|---|
| LangChain | framework | ADJUDICATED |
| MCP | protocol | ADJUDICATED |
| LangGraph | framework | REGISTERED |
| AutoGen | framework | REGISTERED |
| Semantic Kernel | framework | REGISTERED |
| A2A | protocol | REGISTERED |
See /policies/substrate-scope for admission tiers. Data from substrate-index.yaml.
8. Version Taxonomy
| Type | Prefix | Current | Scope |
|---|---|---|---|
| Site Freeze | site-v* | site-v0.5 | Website IA |
| Pack Format | pack-v* | pack-v0.2~0.4 | Evidence structure |
| Ruleset | ruleset-* | ruleset-1.0~1.2 | Decision rules |
| Release Seal | rel-lab-* | rel-lab-0.5 | Governance seal |
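Because each version type has a distinct prefix, a tag can be classified by prefix alone. A minimal sketch follows, with prefixes taken from the table above; `PREFIXES` and `classify` are hypothetical names.

```python
# Prefix -> version type, from the Version Taxonomy table.
PREFIXES = {
    "site-v": "Site Freeze",
    "pack-v": "Pack Format",
    "ruleset-": "Ruleset",
    "rel-lab-": "Release Seal",
}


def classify(tag: str) -> tuple[str, str]:
    """Return (version type, version number) for a tag such as 'pack-v0.3'."""
    for prefix, kind in PREFIXES.items():
        if tag.startswith(prefix):
            return kind, tag[len(prefix):]
    raise ValueError(f"unrecognized version tag: {tag!r}")
```

For example, `classify("ruleset-1.2")` returns `("Ruleset", "1.2")`, and an unknown prefix raises `ValueError` rather than guessing.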