How runs work

Archal tests your agent by routing its API calls through a TLS proxy to an Archal-provisioned twin instead of the real service:
[Your agent harness]  ──HTTP──▶  [TLS proxy]  ──▶  [Archal twin (e.g., GitHub)]
        ▲                                                  │
        └──────── tool-call response ─────────────────────┘
Your harness must make tool calls with the real service SDK against the real API endpoint (for example, Octokit calling api.github.com). The TLS proxy intercepts those calls and routes them to the twin, which maintains full state, returns realistic responses, and records every tool call in a trace. Building your own MCP server and testing against it does not exercise Archal, because your agent never touches the twin.

A working harness is env-driven: it reads the task and the twin's base URL from environment variables (falling back to the real endpoint), then calls the service SDK normally:
// .archal/harness.ts — env-driven base URL, real SDK call
import { Octokit } from '@octokit/rest';

const octokit = new Octokit({
  baseUrl: process.env.ARCHAL_GITHUB_BASE_URL ?? 'https://api.github.com',
});
const task = process.env.ARCHAL_ENGINE_TASK ?? '';
const [owner, repo] = ['my-org', 'my-repo'];
const result = await octokit.issues.create({ owner, repo, title: task });
console.log(JSON.stringify(result.data));
See examples/agents/google-workspace-local-tools/ for a complete working pattern.
Common mistake: if your harness contains its own MCP server and only tests that server, your agent is talking to itself — not to Archal. Run archal run --preflight-only to verify that your harness starts and that the proxy connection is healthy before running full scenarios.
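If your harness talks to more than one twin, you can generalize the `ARCHAL_GITHUB_BASE_URL` lookup from the Octokit example into a small helper. The sketch below is hypothetical. The name mangling (upper-case, runs of non-alphanumerics become `_`) and checking `ARCHAL_<TWIN>_BASE_URL` before `ARCHAL_<TWIN>_URL` are assumptions, so check the variable list in Test your agent before relying on them.

```typescript
// Sketch: resolve a twin's endpoint from environment variables, mirroring the
// ARCHAL_GITHUB_BASE_URL lookup in the Octokit example. The name mangling and
// the BASE_URL-before-URL preference are assumptions, not documented behavior.
type Env = Record<string, string | undefined>;

function twinEnvPrefix(twin: string): string {
  // "github" -> "ARCHAL_GITHUB", "google-workspace" -> "ARCHAL_GOOGLE_WORKSPACE"
  return `ARCHAL_${twin.toUpperCase().replace(/[^A-Z0-9]+/g, "_")}`;
}

function resolveTwinBaseUrl(env: Env, twin: string, realEndpoint: string): string {
  const prefix = twinEnvPrefix(twin);
  // Empty strings are treated as unset, hence || rather than ??.
  // The final fallback is the real endpoint, which the TLS proxy intercepts.
  return env[`${prefix}_BASE_URL`] || env[`${prefix}_URL`] || realEndpoint;
}

// Usage inside a harness (Node.js):
// const baseUrl = resolveTwinBaseUrl(process.env, "github", "https://api.github.com");
// const octokit = new Octokit({ baseUrl });
```

Passing the environment in as a parameter, rather than reading `process.env` directly, keeps the helper easy to unit-test.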

Install

Run this inside your agent’s repository:
npx archal init
init detects your agent platform(s) (Claude Code, Codex, Cursor, Windsurf), copies four agent skills into the project, and adds archal as a devDependency. Requires Node.js 20 or later. The skills walk your agent through the full workflow. Most users only invoke onboard — it does the rest:
| Skill | What it does |
|---|---|
| onboard | Inspects the repo, writes ./.archal/harness.ts, creates .archal.json, runs a smoke test. Start here. |
| scenario | Authors new scenario files (Setup / Prompt / Success Criteria markdown). |
| eval | Runs scenarios, interprets results, debugs failing criteria. |
| vitest | Wires hosted twins into an existing Vitest suite via archal/vitest. |
Skipping init is supported but unusual — npm install -g archal installs the CLI binary only, with no harness, no .archal.json, and no skills. You’d have to write all three by hand. Prefer npx archal init.
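Because the CLI requires Node.js 20 or later, a CI script can fail early with a clear message instead of erroring partway through a run. The helper name below is hypothetical; it only parses the version string Node already exposes.

```typescript
// Sketch: check that a Node.js version string meets the minimum major version
// the archal CLI requires (Node.js 20 or later, per the install notes).
function meetsMinimumNode(version: string, minMajor = 20): boolean {
  // Accepts both "v20.11.1" (process.version) and "20.11.1" (process.versions.node).
  const match = /^v?(\d+)\./.exec(version);
  return match !== null && Number(match[1]) >= minMajor;
}

// Usage, e.g. at the top of a CI script:
// if (!meetsMinimumNode(process.versions.node)) {
//   console.error(`archal needs Node.js 20+, found ${process.versions.node}`);
//   process.exit(1);
// }
```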

Log in

archal login
This opens a browser window where you approve the CLI. Once approved, your credentials are saved locally and you won’t need to log in again on this machine. If you’re in a CI environment or working over SSH, grab an API token from the dashboard tokens page (click your avatar → Tokens, or follow the sign-in link emailed to you) and set it as an environment variable instead:
export ARCHAL_TOKEN=arc_...
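In CI, a missing token usually surfaces as an authentication error deep inside a run. A small pre-check like the hypothetical one below reports it up front. The `arc_` prefix test only mirrors the example token on this page and is an assumption, so it produces a warning rather than a failure.

```typescript
// Sketch: verify ARCHAL_TOKEN before invoking `archal run` in CI.
// The "arc_" prefix comes from the example token in these docs; treating a
// different prefix as a soft warning is a deliberate, assumption-driven choice.
type Env = Record<string, string | undefined>;

interface TokenCheck {
  ok: boolean;
  message: string;
}

function checkArchalToken(env: Env): TokenCheck {
  const token = env.ARCHAL_TOKEN?.trim();
  if (!token) {
    return { ok: false, message: "ARCHAL_TOKEN is not set" };
  }
  if (!token.startsWith("arc_")) {
    return { ok: true, message: "ARCHAL_TOKEN is set but lacks the arc_ prefix; double-check it" };
  }
  return { ok: true, message: "ARCHAL_TOKEN is set" };
}

// Usage in a CI step (Node.js):
// const check = checkArchalToken(process.env);
// console.log(check.message);
// if (!check.ok) process.exit(1);
```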

Run your first test

Archal runs against your own agent — there are no bundled harnesses. Start by adding a headless entrypoint to your repo, then point archal run at it. Create ./.archal/harness.ts (this is the conventional path — other layouts also work, see archal run --harness). It reads the task from ARCHAL_ENGINE_TASK, invokes your agent, and prints the final result to stdout. A minimal template:
./.archal/harness.ts
// Preflight: archal runs the harness with ARCHAL_PREFLIGHT=1 before
// provisioning twins, just to confirm the entrypoint exists and exits cleanly.
if (process.env.ARCHAL_PREFLIGHT === '1') {
  console.log('OK');
  process.exit(0);
}

const task = process.env.ARCHAL_ENGINE_TASK;
if (!task) {
  console.error('Missing ARCHAL_ENGINE_TASK');
  process.exit(1);
}

// Call your real agent runtime here, for example the function that powers
// your CLI or your product's chat endpoint. Avoid booting the full app shell.
// runMyAgent is a placeholder: import your own agent entrypoint in its place.
const result = await runMyAgent({ task });
console.log(typeof result === 'string' ? result : JSON.stringify(result));
The preflight branch is what lets archal run fail fast on typos before spending ~30s on a cold twin start. Your agent gets each twin’s endpoint as ARCHAL_<TWIN>_URL / ARCHAL_<TWIN>_BASE_URL env vars — see Test your agent for the full list. With the harness in place, go straight to an inline task:
archal run --task "Create an issue titled 'hello world'" --harness ./.archal/harness.ts --twin github
You should see Archal provision a GitHub twin, spawn your agent, execute the task, and print a satisfaction score. Expect about thirty seconds on a cold start and a few seconds on later runs.
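If you want to unit-test the harness contract (preflight, missing task, result on stdout) without spawning a process, one option is to factor the template's control flow into a pure function. This is a sketch, not part of Archal: `runHarness`, `AgentFn`, and `HarnessOutcome` are hypothetical names, and `AgentFn` stands in for your real agent entrypoint.

```typescript
// Sketch: the harness template's control flow as a pure, testable function.
type Env = Record<string, string | undefined>;
type AgentFn = (input: { task: string }) => Promise<unknown>;

interface HarnessOutcome {
  exitCode: number;
  stdout: string;
  stderr: string;
}

async function runHarness(env: Env, agent: AgentFn): Promise<HarnessOutcome> {
  // Same preflight contract as the template: answer immediately, never call the agent.
  if (env.ARCHAL_PREFLIGHT === "1") {
    return { exitCode: 0, stdout: "OK", stderr: "" };
  }
  const task = env.ARCHAL_ENGINE_TASK;
  if (!task) {
    return { exitCode: 1, stdout: "", stderr: "Missing ARCHAL_ENGINE_TASK" };
  }
  const result = await agent({ task });
  // Strings are printed as-is; anything else is serialized to JSON.
  // JSON.stringify(undefined) yields undefined at runtime, so default to "".
  const stdout = typeof result === "string" ? result : JSON.stringify(result) ?? "";
  return { exitCode: 0, stdout, stderr: "" };
}

// Usage in ./.archal/harness.ts (runMyAgent is your own, hypothetical entrypoint):
// const outcome = await runHarness(process.env, runMyAgent);
// if (outcome.stdout) console.log(outcome.stdout);
// if (outcome.stderr) console.error(outcome.stderr);
// process.exit(outcome.exitCode);
```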

Make it a default

Once you’ve confirmed the harness works, create .archal.json in your project root so you don’t have to pass --harness every time. The agent field tells Archal how to run your code; twins sets default twins:
{
  "agent": {
    "command": "npx",
    "args": ["tsx", "./.archal/harness.ts"]
  },
  "twins": ["github"]
}
With that in place, bare archal run (or archal run --task "...") picks up the config automatically:
archal run --task "Close all issues older than 90 days"
Archal starts the twins, spawns your agent as a child process, and passes it the task text along with the twin API endpoints as environment variables. Your agent makes its API calls against the twins instead of production, and when it exits, Archal evaluates what happened. Every run also saves local artifacts to .archal/cache/last-run.json and .archal/cache/runs/*.json. Use --output json only when you need machine-readable stdout.
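A typo in .archal.json (say, `args` given as a single string) is easier to catch in a quick check than in a failed run. The validator below is a hypothetical sketch that models only the fields shown on this page (`agent.command`, `agent.args`, `twins`). The real schema may accept more, so treat it as a starting point, not a definitive schema.

```typescript
// Sketch: validate the .archal.json shape shown above. Only the fields used
// on this page are modeled; the actual config schema may allow more.
interface ArchalConfig {
  agent: { command: string; args?: string[] };
  twins?: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

function parseArchalConfig(text: string): ArchalConfig {
  const raw: unknown = JSON.parse(text);
  if (typeof raw !== "object" || raw === null) {
    throw new Error(".archal.json must contain a JSON object");
  }
  const { agent, twins } = raw as { agent?: unknown; twins?: unknown };
  if (typeof agent !== "object" || agent === null) {
    throw new Error('missing "agent" object');
  }
  const { command, args } = agent as { command?: unknown; args?: unknown };
  if (typeof command !== "string" || command.length === 0) {
    throw new Error('"agent.command" must be a non-empty string');
  }
  if (args !== undefined && !isStringArray(args)) {
    throw new Error('"agent.args" must be an array of strings');
  }
  if (twins !== undefined && !isStringArray(twins)) {
    throw new Error('"twins" must be an array of strings');
  }
  return { agent: { command, args }, twins };
}

// Usage (Node.js):
// import { readFileSync } from "node:fs";
// const config = parseArchalConfig(readFileSync(".archal.json", "utf8"));
```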

Write a scenario

Inline tasks are good for quick smoke tests, but for anything you want to run repeatedly you should write a scenario file. Scenarios are markdown files that describe the starting state, the task, and what success looks like:
# Close Stale Issues

## Setup
A GitHub repository with 10 open issues. 4 of them have no activity in 90 days.

## Prompt
Close all issues with no activity in the last 90 days. Add a comment explaining why.

## Success Criteria
- [D] Exactly 4 issues are closed
- [D] All closed issues have a new comment
- [P] Each closing comment explains the reason for closure

## Config
twins: github
timeout: 90
Criteria tagged [D] are checked deterministically against the twin’s final state. Criteria tagged [P] are assessed by an LLM that reviews the trace and state. Run the scenario once to see if it works, then run it multiple times for a statistical satisfaction score:
archal run scenarios/close-stale-issues.md
archal run scenarios/close-stale-issues.md --runs 5
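To lint scenario files in your own tooling (for example, flagging a Success Criteria section with no [D] criteria), a tiny reader for the format above can help. This is an illustrative parser under simple assumptions (one `# ` title, `## ` section headings, `- [D]` / `- [P]` bullets, `key: value` lines under Config); it is not Archal's own parser and ignores edge cases.

```typescript
// Sketch: read the scenario markdown format shown above. Not Archal's parser.
interface Criterion {
  tag: "D" | "P"; // D: checked against final twin state; P: assessed by an LLM
  text: string;
}

interface ScenarioSummary {
  title: string;
  sections: Record<string, string>;
  criteria: Criterion[];
  config: Record<string, string>;
}

function parseScenario(markdown: string): ScenarioSummary {
  let title = "";
  let current: string | null = null;
  const sections: Record<string, string[]> = {};

  for (const line of markdown.split(/\r?\n/)) {
    if (line.startsWith("# ")) {
      title = line.slice(2).trim();
    } else if (line.startsWith("## ")) {
      current = line.slice(3).trim();
      sections[current] = [];
    } else if (current !== null) {
      sections[current].push(line);
    }
  }

  // Criteria lines look like "- [D] Exactly 4 issues are closed".
  const criteria: Criterion[] = [];
  for (const line of sections["Success Criteria"] ?? []) {
    const match = /^\s*-\s*\[(D|P)\]\s*(.+)$/.exec(line);
    if (match) criteria.push({ tag: match[1] as "D" | "P", text: match[2].trim() });
  }

  // Config lines look like "timeout: 90"; values are kept as strings.
  const config: Record<string, string> = {};
  for (const line of sections["Config"] ?? []) {
    const idx = line.indexOf(":");
    if (idx > 0) config[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }

  const joined: Record<string, string> = {};
  for (const [name, body] of Object.entries(sections)) {
    joined[name] = body.join("\n").trim();
  }
  return { title, sections: joined, criteria, config };
}
```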
  • Test your agent covers the full headless harness flow, when to use .archal.json, how optional proxy mode works, and the environment variables your agent receives
  • Use an existing agent repo walks through adapting an Exo-style app or any existing harnessed repo
  • Writing scenarios explains the scenario format in detail, including how to write good success criteria and how evaluation works
  • Twin sessions is for when you want persistent twins you can interact with manually during development
  • Twins overview lists every available twin and what it covers