How runs work
Archal tests your agent by routing its API calls to an Archal-provisioned twin instead of the real service (for example, api.github.com). A TLS proxy intercepts those calls and forwards them to the twin. The twin maintains full state, returns realistic responses, and records every tool call in a trace. Building your own MCP server and testing against it does not exercise Archal: your agent would never touch the twin.
A working harness is env-driven: it reads the task and the twin’s base URL from environment variables, then calls the service SDK normally. See examples/agents/google-workspace-local-tools/ for a complete working pattern.
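As a minimal sketch of that env-driven pattern, the TypeScript below resolves a twin endpoint from the environment and then makes an ordinary HTTP call against it using Node’s built-in fetch. It assumes a GitHub twin exposed as ARCHAL_GITHUB_BASE_URL or ARCHAL_GITHUB_URL that mirrors the GitHub REST path /repos/{owner}/{repo}/issues; the helper names (resolveTwinBaseUrl, issuesUrl, listOpenIssues) are hypothetical and not part of the archal package.

```typescript
// Minimal sketch of an env-driven harness call. Assumptions, not documented behavior:
// the GitHub twin is exposed as ARCHAL_GITHUB_BASE_URL / ARCHAL_GITHUB_URL, it mirrors the
// GitHub REST path /repos/{owner}/{repo}/issues, and it accepts unauthenticated requests.

type Env = Record<string, string | undefined>;

interface Issue {
  number: number;
  title: string;
}

// Prefer ARCHAL_<TWIN>_BASE_URL, fall back to ARCHAL_<TWIN>_URL, and drop trailing slashes
// so paths can be appended safely.
function resolveTwinBaseUrl(env: Env, twin: string): string {
  const prefix = `ARCHAL_${twin.toUpperCase()}`;
  const url = env[`${prefix}_BASE_URL`] || env[`${prefix}_URL`];
  if (!url) {
    throw new Error(`Neither ${prefix}_BASE_URL nor ${prefix}_URL is set (is this running under archal run?)`);
  }
  return url.replace(/\/+$/, "");
}

// Build the same URL the real GitHub REST API would receive, just on the twin's host.
function issuesUrl(baseUrl: string, owner: string, repo: string): string {
  return `${baseUrl}/repos/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}/issues?state=open`;
}

// An ordinary HTTP call: the only Archal-specific input is baseUrl.
async function listOpenIssues(baseUrl: string, owner: string, repo: string): Promise<Issue[]> {
  const res = await fetch(issuesUrl(baseUrl, owner, repo), {
    headers: { Accept: "application/vnd.github+json" },
  });
  if (!res.ok) {
    throw new Error(`GET ${res.url} failed with HTTP ${res.status}`);
  }
  return (await res.json()) as Issue[];
}

// Usage inside a harness (needs a running twin; "acme"/"webapp" are placeholder names):
// const issues = await listOpenIssues(resolveTwinBaseUrl(process.env, "github"), "acme", "webapp");
```

Keeping endpoint resolution separate from the request code means the same agent code runs unchanged against a twin or, with a different base URL, against the real service.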
Install
Run the init command inside your agent’s repository. It detects your agent platform(s) (Claude Code, Codex, Cursor, Windsurf), copies four agent skills into the project, and adds archal as a devDependency. Requires Node.js 20 or later.
The skills walk your agent through the full workflow. Most users only invoke onboard, which handles the rest:
| Skill | What it does |
|---|---|
| onboard | Inspects the repo, writes ./.archal/harness.ts, creates .archal.json, runs a smoke test. Start here. |
| scenario | Authors new scenario files (Setup / Prompt / Success Criteria markdown). |
| eval | Runs scenarios, interprets results, debugs failing criteria. |
| vitest | Wires hosted twins into an existing Vitest suite via archal/vitest. |
Log in
Run your first test
Archal runs against your own agent; there are no bundled harnesses. Start by adding a headless entrypoint to your repo, then point archal run at it.
Create ./.archal/harness.ts (this is the conventional path; other layouts also work, see archal run --harness). It reads the task from ARCHAL_ENGINE_TASK, invokes your agent, and prints the final result to stdout. A minimal template:
./.archal/harness.ts
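A minimal sketch of such a template is shown below. runAgent is a hypothetical placeholder for your own agent, and two details are assumptions rather than documented Archal requirements: a non-zero exit code is used to signal a harness failure, and diagnostics go to stderr so stdout carries only the final result.

```typescript
// Sketch of ./.archal/harness.ts. `runAgent` is a hypothetical stand-in for your agent.

type Env = Record<string, string | undefined>;

// Hypothetical placeholder: swap in your real agent (LLM loop, tool calls against the twins, ...).
async function runAgent(task: string): Promise<string> {
  return `Done: ${task}`;
}

// Read the task Archal passes in; an empty value is treated as a misconfiguration.
function readTask(env: Env): string {
  const task = env.ARCHAL_ENGINE_TASK?.trim();
  if (!task) {
    throw new Error("ARCHAL_ENGINE_TASK is empty or unset");
  }
  return task;
}

async function main(env: Env): Promise<void> {
  const task = readTask(env);
  const result = await runAgent(task);
  // Only the final result goes to stdout; use console.error for diagnostics.
  process.stdout.write(`${result}\n`);
}

// Auto-run only when a task variable is present, so the file can also be imported
// (for example by unit tests) without side effects.
if (process.env.ARCHAL_ENGINE_TASK !== undefined) {
  main(process.env).catch((err: unknown) => {
    console.error(err);
    process.exitCode = 1; // assumption: non-zero exit marks the run as a harness failure
  });
}
```

Setting process.exitCode instead of calling process.exit lets pending stdout writes flush before the process ends.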
archal run fail fast on typos before spending ~30s on a cold twin start. Your agent gets each twin’s endpoint as ARCHAL_<TWIN>_URL / ARCHAL_<TWIN>_BASE_URL env vars; see Test your agent for the full list.
With the harness in place, go straight to an inline task with archal run --harness ./.archal/harness.ts --task "...".
Make it a default
Once you’ve confirmed the harness works, create .archal.json in your project root so you don’t have to pass --harness every time. The agent field tells Archal how to run your code, and the twins field sets the default twins.
archal run (or archal run --task "...") then picks up the config automatically. Run results are saved to .archal/cache/last-run.json and .archal/cache/runs/*.json. Use --output json only when you need machine-readable stdout.
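If you want to post-process results in a script instead of parsing stdout, a sketch like the one below reads those cache files. It assumes last-run.json holds the most recent run and that runs/ keeps one JSON file per run; because the file schema isn’t described on this page, the parsed contents are typed as unknown. The helper names are hypothetical.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Sketch: read Archal's cached run records. File schema is NOT assumed; contents stay `unknown`.

function readJson(file: string): unknown {
  return JSON.parse(fs.readFileSync(file, "utf8"));
}

// Assumption: last-run.json is the most recent run.
function loadLastRun(projectRoot: string): unknown {
  return readJson(path.join(projectRoot, ".archal", "cache", "last-run.json"));
}

// List cached run files newest first (by modification time); empty if no runs exist yet.
function listRunFiles(projectRoot: string): string[] {
  const dir = path.join(projectRoot, ".archal", "cache", "runs");
  if (!fs.existsSync(dir)) return [];
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith(".json"))
    .map((name) => {
      const file = path.join(dir, name);
      return { file, mtimeMs: fs.statSync(file).mtimeMs };
    })
    .sort((a, b) => b.mtimeMs - a.mtimeMs)
    .map((entry) => entry.file);
}
```

Each file is stat-ed once before sorting, rather than inside the comparator, so large run histories don’t trigger repeated filesystem calls.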
Write a scenario
Inline tasks are good for quick smoke tests, but for anything you want to run repeatedly you should write a scenario file. Scenarios are markdown files that describe the starting state (Setup), the task (Prompt), and what success looks like (Success Criteria). Criteria tagged [D] are checked deterministically against the twin’s final state. Criteria tagged [P] are assessed by an LLM that reviews the trace and state. Run the scenario once to see if it works, then run it multiple times for a statistical satisfaction score.
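To make the two criterion tags concrete, here is a sketch that extracts tagged criteria from a scenario file. It assumes a "## Success Criteria" heading followed by "- [D] ..." / "- [P] ..." bullets; the exact layout Archal expects is described in Writing scenarios, and parseSuccessCriteria is a hypothetical helper, not part of the archal package.

```typescript
// Sketch: collect [D]/[P] criteria from a scenario's Success Criteria section.
// Assumed layout (not a spec): a "Success Criteria" heading, then "- [D] ..." / "- [P] ..." bullets.

interface Criterion {
  tag: "D" | "P"; // D: checked against the twin's final state; P: judged by an LLM from trace + state
  text: string;
}

function parseSuccessCriteria(markdown: string): Criterion[] {
  const criteria: Criterion[] = [];
  let inSection = false;
  for (const line of markdown.split(/\r?\n/)) {
    // Any heading starts a new section; only "Success Criteria" turns collection on.
    const heading = /^#{1,6}\s+(.*?)\s*$/.exec(line);
    if (heading) {
      inSection = (heading[1] ?? "").toLowerCase() === "success criteria";
      continue;
    }
    if (!inSection) continue;
    // Keep only bullets that carry a [D] or [P] tag.
    const item = /^\s*[-*]\s+\[(D|P)\]\s+(.+?)\s*$/.exec(line);
    if (item) {
      criteria.push({ tag: item[1] as "D" | "P", text: item[2] ?? "" });
    }
  }
  return criteria;
}
```

Splitting criteria by tag this way shows at a glance which checks are state assertions and which depend on an LLM’s judgment.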
What to read next
- Test your agent covers the full headless harness flow, when to use .archal.json, how optional proxy mode works, and the environment variables your agent receives
- Use an existing agent repo walks through adapting an Exo-style app or any existing harnessed repo
- Writing scenarios explains the scenario format in detail, including how to write good success criteria and how evaluation works
- Twin sessions is for when you want persistent twins you can interact with manually during development
- Twins overview lists every available twin and what it covers
