Quickstart

How runs work

Archal tests your agent by routing its API calls through a TLS proxy to an Archal-provisioned twin instead of the real service:

[Your agent harness]  ──HTTP──▶  [TLS proxy]  ──▶  [Archal twin (e.g., GitHub)]
        ▲                                                  │
        └──────── tool-call response ─────────────────────┘

Your harness must make tool calls using the real SDK against the real API endpoint (e.g., Octokit calling api.github.com). The TLS proxy intercepts those calls and routes them to the twin. The twin maintains full state, returns realistic responses, and records every tool call in a trace. Building your own MCP server and testing it against itself does not exercise Archal, because your agent would never touch the twin.

A working harness is env-driven: it reads the task and the twin’s base URL from environment variables, then calls the service SDK normally:

// .archal/harness.ts — env-driven base URL, real SDK call
import { Octokit } from '@octokit/rest';

const octokit = new Octokit({
  baseUrl: process.env.ARCHAL_GITHUB_BASE_URL ?? 'https://api.github.com',
});
const task = process.env.ARCHAL_ENGINE_TASK ?? '';
const [owner, repo] = ['my-org', 'my-repo'];
const result = await octokit.rest.issues.create({ owner, repo, title: task });
console.log(JSON.stringify(result.data));

See examples/agents/google-workspace-local-tools/ for a complete working pattern.

Common mistake: if your harness contains its own MCP server and only tests that server, your agent is talking to itself — not to Archal. Run archal run --preflight-only to verify that your harness starts and that the proxy connection is healthy before running full scenarios.

Install

Run this inside your agent’s repository:

npx archal init

init detects your agent platform(s) (Claude Code, Codex, Cursor, Windsurf), copies four agent skills into the project, and adds archal as a devDependency. Requires Node.js 20 or later.

The skills walk your agent through the full workflow. Most users only invoke onboard, which does the rest:

| Skill | What it does |
| --- | --- |
| `onboard` | Inspects the repo, writes `./.archal/harness.ts`, creates `.archal.json`, runs a smoke test. Start here. |
| `scenario` | Authors new scenario files (Setup / Prompt / Success Criteria markdown). |
| `eval` | Runs scenarios, interprets results, debugs failing criteria. |
| `vitest` | Wires hosted twins into an existing Vitest suite via `archal/vitest`. |

Skipping init is supported but unusual — npm install -g archal installs the CLI binary only, with no harness, no .archal.json, and no skills. You’d have to write all three by hand. Prefer npx archal init.

Log in

archal login

This opens a browser window where you approve the CLI. Once approved, your credentials are saved locally and you won’t need to log in again on this machine. If you’re in a CI environment or working over SSH, grab an API token from the dashboard tokens page (click your avatar → Tokens, or follow the sign-in link emailed to you) and set it as an environment variable instead:

export ARCHAL_TOKEN=arc_...

Run your first test

Archal runs against your own agent; there are no bundled harnesses. Start by adding a headless entrypoint to your repo, then point archal run at it.

Create ./.archal/harness.ts (this is the conventional path; other layouts also work, see archal run --harness). It reads the task from ARCHAL_ENGINE_TASK, invokes your agent, and prints the final result to stdout. A minimal template:

./.archal/harness.ts

// Preflight: archal runs the harness with ARCHAL_PREFLIGHT=1 before
// provisioning twins, just to confirm the entrypoint exists and exits cleanly.
if (process.env.ARCHAL_PREFLIGHT === '1') {
  console.log('OK');
  process.exit(0);
}

const task = process.env.ARCHAL_ENGINE_TASK;
if (!task) {
  console.error('Missing ARCHAL_ENGINE_TASK');
  process.exit(1);
}

// Call your real agent runtime here, for example the function that powers
// your CLI or your product's chat endpoint. Avoid booting the full app shell.
// runMyAgent is a placeholder: import your own entrypoint in its place.
const result = await runMyAgent({ task });
console.log(typeof result === 'string' ? result : JSON.stringify(result));

The preflight branch is what lets archal run fail fast on typos before spending ~30s on a cold twin start. Your agent gets each twin’s endpoint as ARCHAL_<TWIN>_URL / ARCHAL_<TWIN>_BASE_URL env vars — see Test your agent for the full list. With the harness in place, go straight to an inline task:

archal run --task "Create an issue titled 'hello world'" --harness ./.archal/harness.ts --twin github

You should see Archal provision a GitHub twin, spawn your agent, execute the task, and print a satisfaction score. Expect about thirty seconds on a cold start and a few seconds on later runs.
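The twin endpoint variables mentioned above (ARCHAL_<TWIN>_URL / ARCHAL_<TWIN>_BASE_URL) can be read through one small helper so the harness falls back to the real endpoint when it runs outside Archal. This is a minimal sketch, not part of Archal: resolveTwinBaseUrl is a hypothetical name, and both the order of preference between the two variables and the upper-casing of the twin name are assumptions.

```typescript
// Sketch only: resolveTwinBaseUrl is a hypothetical helper, not an Archal API.
type Env = Record<string, string | undefined>;

export function resolveTwinBaseUrl(twin: string, env: Env, fallback: string): string {
  // Assumed normalization: "github" -> "GITHUB", "google-workspace" -> "GOOGLE_WORKSPACE".
  const key = twin.toUpperCase().replace(/[^A-Z0-9]+/g, '_');
  // Assumption: prefer the _BASE_URL form, then _URL, then the caller's fallback.
  const value = env[`ARCHAL_${key}_BASE_URL`] || env[`ARCHAL_${key}_URL`] || fallback;
  // Trim trailing slashes so callers can append paths consistently.
  return value.replace(/\/+$/, '');
}

// Usage inside a harness (Node.js):
//   const baseUrl = resolveTwinBaseUrl('github', process.env, 'https://api.github.com');
//   const octokit = new Octokit({ baseUrl });
```

Taking the environment as a parameter rather than reading process.env inside the helper keeps it easy to unit-test without touching the real environment.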

Make it a default

Once you’ve confirmed the harness works, create .archal.json in your project root so you don’t have to pass --harness every time. The agent field tells Archal how to run your code; the twins field lists the twins to start by default:

{
  "agent": {
    "command": "npx",
    "args": ["tsx", "./.archal/harness.ts"]
  },
  "twins": ["github"]
}

With that in place, bare archal run (or archal run --task "...") picks up the config automatically:

archal run --task "Close all issues older than 90 days"

Archal starts the twins, spawns your agent as a child process, and passes it the task text along with the twin API endpoints as environment variables. Your agent makes its API calls against the twins instead of production, and when it exits, Archal evaluates what happened. Every run also saves local artifacts to .archal/cache/last-run.json and .archal/cache/runs/*.json. Use --output json only when you need machine-readable stdout.
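If you write .archal.json by hand, a quick shape check can catch typos before a run. The sketch below covers only the three fields shown in the example above (agent.command, agent.args, twins) and treats all of them as required, which is an assumption; the real config may accept more fields or make some optional. parseArchalConfig is a hypothetical helper, not an Archal API.

```typescript
// Shape of the .archal.json example on this page; real configs may accept more fields.
interface ArchalConfig {
  agent: { command: string; args: string[] };
  twins: string[];
}

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === 'string');
}

// Hypothetical helper: parse config text and fail loudly on a malformed shape.
export function parseArchalConfig(text: string): ArchalConfig {
  const raw: unknown = JSON.parse(text);
  if (typeof raw !== 'object' || raw === null) {
    throw new Error('config must be a JSON object');
  }
  const { agent, twins } = raw as { agent?: unknown; twins?: unknown };
  if (typeof agent !== 'object' || agent === null) {
    throw new Error('"agent" must be an object');
  }
  const { command, args } = agent as { command?: unknown; args?: unknown };
  if (typeof command !== 'string' || command === '') {
    throw new Error('"agent.command" must be a non-empty string');
  }
  if (!isStringArray(args)) {
    throw new Error('"agent.args" must be an array of strings');
  }
  if (!isStringArray(twins)) {
    throw new Error('"twins" must be an array of strings');
  }
  return { agent: { command, args }, twins };
}
```

In a Node.js script you could pass it the file contents, for example the result of readFileSync('.archal.json', 'utf8').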

Write a scenario

Inline tasks are good for quick smoke tests, but for anything you want to run repeatedly you should write a scenario file. Scenarios are markdown files that describe the starting state, the task, and what success looks like:

# Close Stale Issues

## Setup
A GitHub repository with 10 open issues. 4 of them have no activity in 90 days.

## Prompt
Close all issues with no activity in the last 90 days. Add a comment explaining why.

## Success Criteria
- [D] Exactly 4 issues are closed
- [D] All closed issues have a new comment
- [P] Each closing comment explains the reason for closure

## Config
twins: github
timeout: 90

Criteria tagged [D] are checked deterministically against the twin’s final state. Criteria tagged [P] are assessed by an LLM that reviews the trace and state. Run the scenario once to see if it works, then run it multiple times for a statistical satisfaction score:

archal run scenarios/close-stale-issues.md
archal run scenarios/close-stale-issues.md --runs 5
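To make the [D] / [P] distinction concrete, here is a minimal sketch that pulls the Success Criteria items out of a scenario file and labels how each one is checked. It is not Archal's parser: parseSuccessCriteria and the Criterion type are hypothetical names, and the sketch only handles the "- [D] ..." and "- [P] ..." bullet form shown in the example above.

```typescript
// Sketch only: these names are illustrative, not part of Archal.
interface Criterion {
  tag: 'D' | 'P';
  checkedBy: 'twin-state' | 'llm'; // [D]: final twin state, [P]: LLM review
  text: string;
}

export function parseSuccessCriteria(markdown: string): Criterion[] {
  const criteria: Criterion[] = [];
  let inSection = false;
  for (const rawLine of markdown.split(/\r?\n/)) {
    const line = rawLine.trim();
    // A level-2 heading opens or closes the "Success Criteria" section.
    const heading = /^##\s+(.*)$/.exec(line);
    if (heading) {
      inSection = heading[1].trim().toLowerCase() === 'success criteria';
      continue;
    }
    if (!inSection) continue;
    // Match bullets of the form "- [D] text" or "- [P] text".
    const item = /^-\s*\[([DP])\]\s+(.+)$/.exec(line);
    if (item) {
      const tag = item[1] === 'D' ? 'D' : 'P';
      criteria.push({
        tag,
        checkedBy: tag === 'D' ? 'twin-state' : 'llm',
        text: item[2].trim(),
      });
    }
  }
  return criteria;
}
```

Run against the Close Stale Issues scenario above, this yields three criteria: two tagged D and one tagged P. A check like this can be handy for spotting untagged bullets before spending a run on them.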

What to read next

Start here

Scenarios

Run anywhere

Advanced
