
Evaluation Methodology

Evidence-based verdicts for MPLP lifecycle guarantees under versioned, deterministic rulesets.

METHOD-VLAB-01 • Version 1.0 • site-v0.5

📖 Methodology Narrative: governance/METHOD-VLAB-01_EVALUATION_METHOD.md

📊 Substrate Registry (Live SSOT): data/curated-runs/substrate-index.yaml

Substrate status data is loaded from the SSOT at build time. See /coverage for the real-time matrix.

1. Four Boundaries

Non-certification

No badges, no ranking, and no compliance certificates

Non-endorsement

Verdict ≠ recommendation or quality assessment

No execution hosting

Lab does not host execution; you provide evidence packs

Deterministic ruleset

Same evidence + same ruleset = same verdict

See /about for full boundary statement.

2. What We Evaluate

Evidence Pack + Ruleset → Verdict (PASS/FAIL) + verdict_hash
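
The mapping above can be read as a pure function from (pack, ruleset) to a verdict. What follows is a minimal sketch of that reading, not the Lab's evaluator: the Verdict type, the evaluate function, and both toy rules are hypothetical names introduced for illustration, and a real pack is a directory rather than a dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

# Hypothetical types: the real pack and ruleset schemas are defined elsewhere
# (/policies/contract and /rulesets), not in this sketch.
Pack = Mapping[str, object]
Rule = Callable[[Pack], bool]


@dataclass(frozen=True)
class Verdict:
    ruleset: str
    outcome: str                   # "PASS" or "FAIL"
    failed_rules: tuple[str, ...]  # names of the rules that did not hold


def evaluate(pack: Pack, ruleset_id: str, rules: Mapping[str, Rule]) -> Verdict:
    """Pure function: no clock, network, or randomness, so the result
    depends only on the pack and the rules passed in."""
    failed = tuple(sorted(name for name, rule in rules.items() if not rule(pack)))
    return Verdict(ruleset_id, "PASS" if not failed else "FAIL", failed)


# Two toy rules, invented for illustration only.
toy_rules = {
    "has_plan": lambda p: "plan" in p,
    "has_trace": lambda p: "trace" in p,
}

complete = {"context": {}, "plan": {}, "trace": {}}
partial = {"context": {}}

print(evaluate(complete, "toy-ruleset", toy_rules).outcome)      # PASS
print(evaluate(partial, "toy-ruleset", toy_rules).failed_rules)  # ('has_plan', 'has_trace')
```

Sorting the failed rule names keeps the result independent of the order in which rules were registered, which is one simple way to keep a verdict reproducible.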

✅ We Evaluate

  • Evidence packs against Lifecycle Guarantees
  • Structural completeness and integrity
  • Claim satisfaction under frozen rulesets

āŒ We Do NOT Evaluate

  • • Runtime performance or latency
  • • Agent quality or intelligence
  • • Code correctness or security

3. Evidence Pack (Input)

pack/
├── manifest.json            # Pack metadata
├── integrity/
│   ├── sha256sums.txt       # File checksums
│   └── pack.sha256          # Pack root hash
├── timeline/
│   └── events.ndjson        # Execution timeline
└── artifacts/
    ├── context.json         # Agent context
    ├── plan.json            # Agent plan
    └── trace.json           # Execution trace
pack-v0.2 • pack-v0.3 • pack-v0.4

See /policies/contract for full specification.
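
A structural completeness and integrity check over the layout above might look like the sketch below. It assumes that sha256sums.txt uses the format written by the sha256sum tool (hex digest, whitespace, relative path), which this page does not confirm. It also leaves pack.sha256 unverified, because how the pack root hash is derived is not stated here. The helper names check_pack and build_demo_pack are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

# Required files, taken from the layout above.
REQUIRED = [
    "manifest.json",
    "integrity/sha256sums.txt",
    "integrity/pack.sha256",
    "timeline/events.ndjson",
    "artifacts/context.json",
    "artifacts/plan.json",
    "artifacts/trace.json",
]


def check_pack(root: Path) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = [f"missing: {rel}" for rel in REQUIRED if not (root / rel).is_file()]
    sums = root / "integrity" / "sha256sums.txt"
    if not sums.is_file():
        return problems
    for line in sums.read_text().splitlines():
        if not line.strip():
            continue
        # ASSUMPTION: "<hex digest>  <relative path>", as written by sha256sum.
        expected, rel = line.split(maxsplit=1)
        rel = rel.strip().lstrip("*")  # sha256sum marks binary mode with "*"
        target = root / rel
        if not target.is_file():
            problems.append(f"listed but absent: {rel}")
        elif hashlib.sha256(target.read_bytes()).hexdigest() != expected:
            problems.append(f"checksum mismatch: {rel}")
    return problems


def build_demo_pack(root: Path) -> None:
    """Write a tiny pack with placeholder contents (not a real MPLP pack)."""
    payload = [rel for rel in REQUIRED if not rel.startswith("integrity/")]
    for rel in payload:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("{}\n")
    lines = [
        f"{hashlib.sha256((root / rel).read_bytes()).hexdigest()}  {rel}"
        for rel in payload
    ]
    integrity = root / "integrity"
    integrity.mkdir()
    (integrity / "sha256sums.txt").write_text("\n".join(lines) + "\n")
    # Placeholder only: the derivation of the pack root hash is not specified here.
    (integrity / "pack.sha256").write_text("placeholder\n")


with tempfile.TemporaryDirectory() as tmp:
    pack = Path(tmp)
    build_demo_pack(pack)
    clean = check_pack(pack)
    (pack / "artifacts" / "plan.json").write_text('{"tampered": true}\n')
    tampered = check_pack(pack)

print(clean)     # []
print(tampered)  # ['checksum mismatch: artifacts/plan.json']
```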

4. Case Lifecycle

Submitted → Admission Check → REGISTERED → ADJUDICATED
✗ NOT_ADMISSIBLE (if admission fails)

REGISTERED

Pack admitted, awaiting evaluation

ADJUDICATED

Evaluation complete, verdict issued

NOT_ADMISSIBLE

Pack rejected, no verdict
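
The lifecycle above can be modeled as a small state machine. This is a sketch under one assumption: the admission check is treated as a step that moves a submitted case to REGISTERED or NOT_ADMISSIBLE, not as a state of its own. CaseState, advance, and admit are hypothetical names.

```python
from enum import Enum


class CaseState(Enum):
    SUBMITTED = "Submitted"
    REGISTERED = "REGISTERED"
    ADJUDICATED = "ADJUDICATED"
    NOT_ADMISSIBLE = "NOT_ADMISSIBLE"


# Allowed transitions, read off the flow above. Terminal states map to an empty set.
TRANSITIONS = {
    CaseState.SUBMITTED: {CaseState.REGISTERED, CaseState.NOT_ADMISSIBLE},
    CaseState.REGISTERED: {CaseState.ADJUDICATED},
    CaseState.ADJUDICATED: set(),
    CaseState.NOT_ADMISSIBLE: set(),
}


def advance(current: CaseState, target: CaseState) -> CaseState:
    """Move to the target state, or raise if the flow does not allow it."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target


def admit(admission_ok: bool) -> CaseState:
    """Admission check: a submitted pack is either registered or rejected."""
    target = CaseState.REGISTERED if admission_ok else CaseState.NOT_ADMISSIBLE
    return advance(CaseState.SUBMITTED, target)


state = admit(True)
state = advance(state, CaseState.ADJUDICATED)
print(state.name)  # ADJUDICATED
```

Because NOT_ADMISSIBLE has no outgoing transitions, a rejected pack can never reach ADJUDICATED, which matches "Pack rejected, no verdict".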

5. Rulesets

A Ruleset is a versioned, immutable set of decision rules. Once frozen, it never changes.

ruleset-1.0 • GoldenFlow (LG-01~05) • pack-v0.2

ruleset-1.1 • Four-Domain (D1~D4) • pack-v0.3

ruleset-1.2 • Semantic Invariant (12 clauses) • pack-v0.4

See /rulesets for all versions.
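
One way to mirror "versioned and immutable" in code is a frozen record per ruleset, keyed by version. The entries below are copied from the list above. The Ruleset class and the ruleset_for helper are hypothetical. The rule that a ruleset accepts only the pack format listed beside it is an assumption made for this sketch, not something this page states.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Ruleset:
    version: str
    name: str
    pack_format: str


# Entries copied from the list above.
RULESETS = {
    r.version: r
    for r in (
        Ruleset("ruleset-1.0", "GoldenFlow (LG-01~05)", "pack-v0.2"),
        Ruleset("ruleset-1.1", "Four-Domain (D1~D4)", "pack-v0.3"),
        Ruleset("ruleset-1.2", "Semantic Invariant (12 clauses)", "pack-v0.4"),
    )
}


def ruleset_for(version: str, pack_format: str) -> Ruleset:
    """Look up a frozen ruleset and reject a pack format it is not paired with.

    ASSUMPTION: the one-to-one pairing is inferred from the list, not confirmed.
    """
    ruleset = RULESETS[version]
    if ruleset.pack_format != pack_format:
        raise ValueError(f"{version} expects {ruleset.pack_format}, got {pack_format}")
    return ruleset


print(ruleset_for("ruleset-1.1", "pack-v0.3").name)  # Four-Domain (D1~D4)

try:
    RULESETS["ruleset-1.0"].name = "edited"  # frozen dataclasses refuse mutation
except FrozenInstanceError:
    print("ruleset is immutable")
```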

6. Verdicts & Recheck

Determinism Guarantee

Same pack + same ruleset = same verdict_hash

Third-Party Recheck

npx @mplp/recompute <pack_path> --ruleset 1.0

Anyone can verify a verdict independently without trusting Lab infrastructure.
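
The idea behind a recheck can be sketched as: recompute a hash from the inputs and compare it with the published one. The code below is illustrative only. The fields that go into the real verdict_hash and their encoding are not described on this page, so the record layout, the canonical JSON encoding, and the function names are all assumptions, and the output will not match hashes produced by the recompute tool.

```python
import hashlib
import json


def verdict_hash(pack_root_hash: str, ruleset: str, outcome: str) -> str:
    """Hash a canonical encoding of the verdict inputs.

    ASSUMPTION: these three fields and this encoding are illustrative,
    not the Lab's actual scheme.
    """
    record = {"pack": pack_root_hash, "ruleset": ruleset, "verdict": outcome}
    # Sorted keys and fixed separators give one byte string per record.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def recheck(published: str, pack_root_hash: str, ruleset: str, outcome: str) -> bool:
    """Return True when an independent recomputation reproduces the published hash."""
    return verdict_hash(pack_root_hash, ruleset, outcome) == published


demo_pack_hash = "0" * 64  # placeholder, not a real pack root hash
published = verdict_hash(demo_pack_hash, "ruleset-1.0", "PASS")

print(recheck(published, demo_pack_hash, "ruleset-1.0", "PASS"))  # True
print(recheck(published, demo_pack_hash, "ruleset-1.1", "PASS"))  # False
```

Any change to the pack, the ruleset, or the outcome changes the canonical bytes and therefore the hash, which is what lets a third party detect a mismatch without access to Lab infrastructure.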

7. Substrate Model

A Substrate is an execution environment (framework, protocol, runtime) that produces evidence packs.

Current Status: 6 Tier-0 substrates, 2 ADJUDICATED, 26 REGISTERED

LangChain • framework • ✅ ADJUDICATED

MCP • protocol • ✅ ADJUDICATED

LangGraph • framework • ⚪ REGISTERED

AutoGen • framework • ⚪ REGISTERED

Semantic Kernel • framework • ⚪ REGISTERED

A2A • protocol • ⚪ REGISTERED

See /policies/substrate-scope for admission tiers. Data from substrate-index.yaml
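
As a rough illustration of how substrate records could be tallied, the sketch below holds only the six substrates listed above in memory and groups them by status and by kind. The schema of substrate-index.yaml is not shown on this page, so the field names (name, kind, status) and both helper functions are hypothetical.

```python
from collections import Counter

# The six entries listed above. Field names are hypothetical: the real
# schema of substrate-index.yaml is not shown on this page.
SUBSTRATES = [
    {"name": "LangChain", "kind": "framework", "status": "ADJUDICATED"},
    {"name": "MCP", "kind": "protocol", "status": "ADJUDICATED"},
    {"name": "LangGraph", "kind": "framework", "status": "REGISTERED"},
    {"name": "AutoGen", "kind": "framework", "status": "REGISTERED"},
    {"name": "Semantic Kernel", "kind": "framework", "status": "REGISTERED"},
    {"name": "A2A", "kind": "protocol", "status": "REGISTERED"},
]


def status_summary(entries: list[dict]) -> Counter:
    """Count entries per lifecycle status."""
    return Counter(entry["status"] for entry in entries)


def names_by_kind(entries: list[dict], kind: str) -> list[str]:
    """List substrate names of one kind, in registry order."""
    return [entry["name"] for entry in entries if entry["kind"] == kind]


summary = status_summary(SUBSTRATES)
print(summary["ADJUDICATED"], summary["REGISTERED"])  # 2 4
print(names_by_kind(SUBSTRATES, "protocol"))          # ['MCP', 'A2A']
```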

8. Version Taxonomy

Type         | Prefix    | Current         | Scope
Site Freeze  | site-v*   | site-v0.5       | Website IA
Pack Format  | pack-v*   | pack-v0.2~0.4   | Evidence structure
Ruleset      | ruleset-* | ruleset-1.0~1.2 | Decision rules
Release Seal | rel-lab-* | rel-lab-0.5     | Governance seal
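
Because each version type has its own prefix, a tag can be classified mechanically. The sketch below uses the four prefixes from the table. It assumes that the part after the prefix is a dot-separated list of integers, which holds for every tag shown on this page but is not stated as a rule. The classify function is a hypothetical helper.

```python
import re

# Prefixes from the table above.
# ASSUMPTION: the numeric part is dot-separated integers.
TAG_PATTERNS = {
    "Site Freeze": re.compile(r"site-v(\d+(?:\.\d+)*)"),
    "Pack Format": re.compile(r"pack-v(\d+(?:\.\d+)*)"),
    "Ruleset": re.compile(r"ruleset-(\d+(?:\.\d+)*)"),
    "Release Seal": re.compile(r"rel-lab-(\d+(?:\.\d+)*)"),
}


def classify(tag: str) -> tuple[str, tuple[int, ...]]:
    """Return the version type and the numeric version parsed from a tag."""
    for kind, pattern in TAG_PATTERNS.items():
        match = pattern.fullmatch(tag)
        if match:
            return kind, tuple(int(part) for part in match.group(1).split("."))
    raise ValueError(f"unrecognized version tag: {tag}")


print(classify("pack-v0.4"))    # ('Pack Format', (0, 4))
print(classify("ruleset-1.2"))  # ('Ruleset', (1, 2))
```

Parsing the version into a tuple of integers makes comparisons numeric, so (0, 10) sorts after (0, 9), where a plain string comparison would not.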

Quick Reference

SSOT: governance/METHOD-VLAB-01_EVALUATION_METHOD.md

Validation Lab • site-v0.5 Frozen