CLI Reference

Ghostlab installs two equivalent console scripts, ghostlab and rehearsal. The examples in this reference use ghostlab.

inspect

Introspect a target MCP server.

ghostlab inspect --target targets/cortex-local.json

profile

Create a capability profile from the inspect.json written by inspect.

ghostlab profile --inspect runs/<id>-inspect/inspect.json
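The two steps can be chained in a script. Below is a minimal bash sketch. It assumes inspect creates a fresh runs/<id>-inspect directory and that the most recently modified runs/*-inspect directory is the one it just wrote. inspect_and_profile is a hypothetical helper, not a Ghostlab command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: inspect a target, then profile the inspect run it produced.
# Assumption: the newest runs/*-inspect directory (by mtime) is the one just written.
inspect_and_profile() {
  local target="$1"
  local latest

  # Stop early if inspect fails (non-zero exit).
  ghostlab inspect --target "$target" || return 1

  # ls -t sorts newest first; -d lists the directories themselves.
  latest="$(ls -td runs/*-inspect | head -n 1)"
  ghostlab profile --inspect "$latest/inspect.json" || return 1

  # Print the directory so the caller can reuse it.
  printf '%s\n' "$latest"
}

# Example: dir="$(inspect_and_profile targets/cortex-local.json)"
```

The helper prints the inspect directory so a script can reuse it. In the examples here, the capabilities.json passed to --profile lives in that same directory.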

generate-scenarios

Generate grounded scenarios from a capability profile.

ghostlab generate-scenarios \
  --profile runs/<id>-inspect/capabilities.json \
  --n 3 \
  --output-dir scenarios

generate-personas

Generate reusable domain personas from a capability profile.

ghostlab generate-personas \
  --profile runs/<id>-inspect/capabilities.json \
  --n 4 \
  --output-dir personas

generate-dataset

Generate a persona × scenario dataset.

ghostlab generate-dataset \
  --profile runs/<id>-inspect/capabilities.json \
  --personas 3 \
  --scenarios-per-persona 3 \
  --seed 7 \
  --name cortex

review-dataset

Review, flag, approve, or reject dataset cases before spending agent credits.

ghostlab review-dataset \
  --dataset datasets/cortex \
  --profile runs/<id>-inspect/capabilities.json
ghostlab review-dataset --dataset datasets/cortex \
  --approve case-a case-b --reject case-c

run

Run one scenario.

ghostlab run \
  --target targets/example-stdio.json \
  --scenario scenarios/basic-discovery.json \
  --aut-runner runners/mock-aut.json \
  --user-runner runners/mock-user.json

run-dataset

Run every case in a dataset. Use --limit for small development runs and --approved-only to skip unreviewed cases.

ghostlab run-dataset \
  --dataset datasets/cortex \
  --target targets/cortex-local.json \
  --aut-runner runners/codex-cortex-aut.json \
  --user-runner runners/codex-user-emulator.json \
  --limit 2
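Combining review-dataset and run-dataset gives a review-then-run workflow. Below is a minimal bash sketch. approve_and_run is a hypothetical helper, and the target and runner files are this section's example paths, so substitute your own. It approves the case IDs you pass, then runs the dataset with --approved-only so that unreviewed cases are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approve some cases, then run the dataset, skipping unreviewed cases.
# The target and runner paths are this section's example files; adjust as needed.
approve_and_run() {
  local dataset="$1"
  shift  # remaining arguments are case IDs to approve

  ghostlab review-dataset --dataset "$dataset" --approve "$@" || return 1

  ghostlab run-dataset \
    --dataset "$dataset" \
    --target targets/cortex-local.json \
    --aut-runner runners/codex-cortex-aut.json \
    --user-runner runners/codex-user-emulator.json \
    --approved-only
}

# Example: approve_and_run datasets/cortex case-a case-b
```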

evaluate

Score a completed run and assign it a pass, partial, or fail verdict.

ghostlab evaluate --run runs/<id> --capabilities runs/<id>-inspect/capabilities.json

critique

Critique the MCP server's tool usability from a completed run. Where evaluate asks "did the scenario pass?", critique asks "how do I improve this MCP?": it grades the naming, descriptions, parameter clarity, and error quality of the tools the agent actually exercised, with concrete suggestions. Pass --inspect so the judge can see the real tool definitions.

ghostlab critique --run runs/<id> --inspect runs/<id>-inspect/inspect.json

Writes critique.json and critique.md into the run directory.
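Because evaluate and critique both read a completed run, a script can call them back to back. Below is a minimal bash sketch. grade_run is a hypothetical helper. It takes the run directory and the inspect directory as separate arguments because their IDs may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: score a completed run, then critique the tools it used.
# run_dir is runs/<id>; inspect_dir is the matching runs/<id>-inspect directory.
grade_run() {
  local run_dir="$1"
  local inspect_dir="$2"

  # Pass/partial/fail verdict, using the capability profile.
  ghostlab evaluate --run "$run_dir" \
    --capabilities "$inspect_dir/capabilities.json" || return 1

  # Tool-usability critique; writes critique.json and critique.md into run_dir.
  ghostlab critique --run "$run_dir" --inspect "$inspect_dir/inspect.json"
}

# Example: grade_run runs/<id> runs/<id>-inspect
```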

compare

Diff the summaries of two dataset runs, a base and a candidate, to find regressions.

ghostlab compare --base runs/<base>-summary --candidate runs/<candidate>-summary \
  --output comparison.md

scorecard

Aggregate a whole dataset run into one MCP validation report: pass rate, per-tool reliability, hallucination and golden-mismatch counts, efficiency, and recurring tool-design recommendations. It makes no model calls; it reads the per-case artifacts.

ghostlab scorecard --results runs/<id>-summary

Writes scorecard.json and scorecard.md into the summary directory.
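For a regression check, a script can build a scorecard for each of two summaries and then diff them with compare. Below is a minimal bash sketch. score_and_compare is a hypothetical helper, and comparison.md is the output name from the compare example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scorecard a base and a candidate dataset run, then diff them.
# Both arguments are summary directories such as runs/<base>-summary.
score_and_compare() {
  local base="$1"
  local candidate="$2"
  local summary

  # scorecard makes no model calls; it writes scorecard.json/.md into each summary.
  for summary in "$base" "$candidate"; do
    ghostlab scorecard --results "$summary" || return 1
  done

  ghostlab compare --base "$base" --candidate "$candidate" --output comparison.md
}

# Example: score_and_compare runs/<base>-summary runs/<candidate>-summary
```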

doctor

Validate local agent and runner setup.

ghostlab doctor
ghostlab doctor --runners runners/codex-cortex-local-session.json
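Before a long dataset run, you might check each runner configuration individually. Below is a minimal bash sketch. It assumes --runners accepts a single runner file, as in the example above, and that doctor exits non-zero when a check fails. check_runners is a hypothetical helper that runs doctor once per runners/*.json file and returns non-zero if any check fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run doctor once per runner file and report every failure.
# Assumes doctor exits non-zero when a check fails.
check_runners() {
  local status=0
  local runner

  for runner in runners/*.json; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$runner" ] || continue
    if ! ghostlab doctor --runners "$runner"; then
      echo "doctor failed for $runner" >&2
      status=1
    fi
  done

  return "$status"
}

# Example: ghostlab doctor && check_runners
```

The loop keeps going after a failure, so one pass reports every broken runner file.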