Documentation

Complete reference for @isoldex/sentinel — up and running in 5 minutes.

Installation

terminal

npm install @isoldex/sentinel playwright
npx playwright install chromium

Create a .env file with your API key:

.env

GEMINI_API_KEY=your_key_here
GEMINI_VERSION=gemini-3-flash-preview   # optional

Get a free key at aistudio.google.com. The free tier covers thousands of runs. For OpenAI, Claude, or Ollama, see Providers.

Quickstart

act() performs natural-language actions. extract() returns typed structured data.

index.ts

import { Sentinel, z } from '@isoldex/sentinel';

const sentinel = new Sentinel({ apiKey: process.env.GEMINI_API_KEY });
await sentinel.init();
await sentinel.goto('https://news.ycombinator.com');

// Extract structured data
const data = await sentinel.extract('Get the top 3 stories', z.object({
  stories: z.array(z.object({
    title: z.string(),
    points: z.number(),
  }))
}));

// Natural language actions
await sentinel.act('Click on the "new" link in the header');
await sentinel.act('Fill "hello@example.com" into the email field');

await sentinel.close();

act(instruction, options?)

Performs a natural-language action on the current page. Sentinel automatically verifies the action and retries when confidence is low.

actions.ts

// Basic click / fill / hover
await sentinel.act('Click the login button');
await sentinel.act('Fill "user@example.com" into the email field');
await sentinel.act('Hover over the profile menu');

// All supported action types
await sentinel.act('Select "Germany" from the country dropdown');
await sentinel.act('Press Enter');
await sentinel.act('Double-click the product image');
await sentinel.act('Right-click the file');
await sentinel.act('Scroll down');
await sentinel.act('Scroll up');
await sentinel.act('Scroll to the footer');
await sentinel.act('Append " (urgent)" to the subject line');

// Variable interpolation
await sentinel.act('Fill %email% into the email field', {
  variables: { email: 'user@example.com' },
});

// Custom retry count
await sentinel.act('Click the submit button', { retries: 5 });

ActOptions

variables — Record<string, string>
retries — number (default: 2)
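To illustrate what the variables option does conceptually, here is a minimal sketch of %name% placeholder substitution. The interpolate helper is hypothetical and not part of the @isoldex/sentinel API; how Sentinel resolves placeholders internally (for example, whether values are quoted) is not documented here.

```typescript
// Hypothetical helper, not part of the library: replaces each %name%
// placeholder with the matching entry in `variables`.
function interpolate(
  instruction: string,
  variables: Record<string, string>,
): string {
  return instruction.replace(/%(\w+)%/g, (match: string, name: string): string => {
    // Leave unknown placeholders untouched so that typos stay visible.
    if (!Object.prototype.hasOwnProperty.call(variables, name)) return match;
    return variables[name] ?? match;
  });
}

const resolved = interpolate('Fill %email% into the email field', {
  email: 'user@example.com',
});
console.log(resolved);
```

Leaving unknown placeholders in place is a design choice of this sketch: a misspelled variable name then shows up literally in the instruction instead of silently disappearing.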

Supported actions

click · fill · append · hover · press · select · double-click · right-click · scroll-down · scroll-up · scroll-to

ActionResult — returned by every act() call, never throws:

act-result.ts

const result = await sentinel.act('Click the checkout button');

console.log(result.success);   // boolean
console.log(result.message);   // "Clicked Checkout button"
console.log(result.action);    // "click"
console.log(result.selector);  // '[data-testid="checkout-btn"]'

// On failure — full diagnostic
if (!result.success) {
  console.log(result.message);
  // Action failed: "Click checkout button" on "Checkout"
  // 3 paths tried:
  //   • coordinate-click: Element outside viewport at (640, 950)
  //   • vision-grounding: Element not found in screenshot
  //   • locator-fallback: strict mode violation: 3 elements matched
  // Tip: element may be off-screen. Try: sentinel.act('scroll to "Checkout"')

  console.log(result.attempts);
  // [{ path: 'coordinate-click', error: '...' }, ...]
}
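The fields above lend themselves to a small logging helper. The sketch below is one possible shape, assuming only the fields shown in this section (success, message, action, selector, attempts). The ActionResultLike and AttemptLike types are local stand-ins, and summarize is a hypothetical helper; the library's exported types may differ.

```typescript
// Local stand-in types mirroring the fields shown above (assumptions).
interface AttemptLike {
  path: string;
  error: string;
}

interface ActionResultLike {
  success: boolean;
  message: string;
  action?: string;
  selector?: string;
  attempts?: AttemptLike[];
}

// Hypothetical helper: collapses a result into a single log line.
function summarize(result: ActionResultLike): string {
  if (result.success) {
    return `ok: ${result.action ?? 'unknown'} -> ${result.selector ?? 'n/a'}`;
  }
  const attempts = result.attempts ?? [];
  const paths = attempts.map((a) => a.path).join(', ');
  return `failed after ${attempts.length} path(s): ${paths}`;
}

// Hand-built sample object; a real one would come from sentinel.act().
console.log(
  summarize({
    success: false,
    message: 'Action failed',
    attempts: [
      { path: 'coordinate-click', error: 'Element outside viewport' },
      { path: 'vision-grounding', error: 'Element not found in screenshot' },
    ],
  }),
);
```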

extract<T>(instruction, schema)

Extracts structured data from the current page. Accepts a Zod schema or raw JSON Schema. TypeScript generics are inferred automatically.

extract.ts

import { Sentinel, z } from '@isoldex/sentinel';

// Zod schema — TypeScript type is inferred automatically
const result = await sentinel.extract(
  'Get all product names and prices',
  z.object({
    products: z.array(z.object({
      name:  z.string(),
      price: z.number(),
    }))
  })
);
// result.products is typed as { name: string; price: number }[]

// Raw JSON Schema also works
const result2 = await sentinel.extract('Get the page title', {
  type: 'object',
  properties: { title: { type: 'string' } },
});

observe(instruction?)

Returns interactive elements visible on the page, optionally filtered by a natural language hint. Useful for debugging or building dynamic workflows.

observe.ts

// All interactive elements on the page
const elements = await sentinel.observe();

// Filtered by natural language hint
const loginElements = await sentinel.observe('Find login-related elements');

// Returns ObserveResult[]
// [{ description: 'Login button', role: 'button', ... }, ...]

run(goal, options?) — Agent Loop

Runs a fully autonomous multi-step agent in a Plan → Execute → Verify → Reflect cycle until the goal is met, the step limit is reached, or an abort condition triggers.

agent.ts

const result = await sentinel.run(
  'Go to amazon.de, search for "mechanical keyboard under 100 euros", extract top 5',
  {
    maxSteps: 20,
    onStep: (event) => {
      console.log(`Step ${event.stepNumber} [${event.type}]: ${event.instruction}`);
      console.log(`  Reasoning: ${event.reasoning}`);
    },
  }
);

console.log(result.success);       // boolean
console.log(result.goalAchieved);  // boolean — final LLM reflection check
console.log(result.totalSteps);    // number of steps executed
console.log(result.message);       // human-readable summary
console.log(result.data);          // structured data extracted during the run
console.log(result.selectors);     // { searchField: '#twotabsearchtextbox', ... }
console.log(result.history);       // AgentStepEvent[] — full step-by-step log

AgentRunOptions

maxSteps — number (default: 15)
onStep — (event: AgentStepEvent) => void

Abort conditions

3 consecutive step failures
Same instruction repeated 3× without progress
maxSteps reached
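The three abort conditions can be sketched as a pure function over the step history. This is an illustration only, not Sentinel's implementation: StepLike is a hypothetical, reduced step record, and "without progress" is approximated here as three identical instructions in a row.

```typescript
// Hypothetical, reduced step record; the real AgentStepEvent has more fields.
interface StepLike {
  instruction: string;
  success: boolean;
}

type AbortReason =
  | 'consecutive-failures'
  | 'repeated-instruction'
  | 'max-steps'
  | null;

function abortReason(history: StepLike[], maxSteps: number): AbortReason {
  const [a, b, c] = history.slice(-3);
  if (a && b && c) {
    // Condition 1: the last three steps all failed.
    if (!a.success && !b.success && !c.success) return 'consecutive-failures';
    // Condition 2 (assumption): same instruction three times in a row.
    if (a.instruction === b.instruction && b.instruction === c.instruction) {
      return 'repeated-instruction';
    }
  }
  // Condition 3: the step budget is used up.
  if (history.length >= maxSteps) return 'max-steps';
  return null;
}

console.log(
  abortReason(
    [
      { instruction: 'Click search', success: false },
      { instruction: 'Type query', success: false },
      { instruction: 'Press Enter', success: false },
    ],
    15,
  ),
);
```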

Sentinel.parallel(tasks, options)

Runs multiple independent tasks in parallel. Each task gets its own browser session. A worker pool limits the number of simultaneous sessions to the value of concurrency. Errors in one task never affect others.

parallel.ts

const results = await Sentinel.parallel(
  [
    { url: 'https://amazon.de', goal: 'Find cheapest laptop' },
    { url: 'https://ebay.de',   goal: 'Find cheapest laptop' },
    { url: 'https://otto.de',   goal: 'Find cheapest laptop' },
  ],
  {
    apiKey: process.env.GEMINI_API_KEY,
    concurrency: 3,
    onProgress: (done, total, result) => {
      console.log(`${done}/${total}: ${result.url} — ${result.message}`);
    },
  }
);

// Results in input order regardless of completion order
// Error in one task never affects the others
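The behaviour described above (bounded concurrency, results in input order, isolated errors) can be sketched with a generic worker pool. runPool and Outcome are hypothetical names for illustration; this is not the library's internal implementation, and the worker here is a plain async function standing in for a browser session.

```typescript
// Hypothetical outcome type; not the library's result shape.
type Outcome<R> = { ok: true; value: R } | { ok: false; error: string };

async function runPool<T, R>(
  tasks: T[],
  concurrency: number,
  worker: (task: T) => Promise<R>,
): Promise<Outcome<R>[]> {
  const results: Outcome<R>[] = new Array(tasks.length);
  let next = 0;

  // Each drain loop claims the next unclaimed index until none are left.
  async function drain(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await worker(tasks[index] as T) };
      } catch (err) {
        // A failing task is recorded in its own slot and never rejects the pool.
        results[index] = {
          ok: false,
          error: err instanceof Error ? err.message : String(err),
        };
      }
    }
  }

  const lanes = Math.max(1, Math.min(concurrency, tasks.length));
  await Promise.all(Array.from({ length: lanes }, () => drain()));
  return results; // slot i always belongs to tasks[i]
}

runPool(['a', 'b', 'c'], 2, async (name) => name.toUpperCase()).then((out) => {
  console.log(out);
});
```

Writing each result into the slot of its input index is what keeps the output in input order, regardless of which task finishes first.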

Tab management

Open, switch, and close browser tabs programmatically. AOM-based state parsing requires Chromium (CDP). Firefox and WebKit fall back to DOM parsing.

tabs.ts

// Open a new tab
const tabIndex = await sentinel.newTab('https://google.com');

// Switch the active tab
await sentinel.switchTab(0);
await sentinel.switchTab(tabIndex);

// Close a tab
await sentinel.closeTab(tabIndex);

// Number of open tabs
console.log(sentinel.tabCount);

Session persistence

Save and restore authenticated sessions across runs — cookies and localStorage included. For apps that use IndexedDB (WhatsApp Web, PWAs), use userDataDir instead.

session.ts

// First run: log in, then save the session
await sentinel.goto('https://github.com/login');
await sentinel.act('Fill "myuser" into the username field');
await sentinel.act('Fill "mypassword" into the password field');
await sentinel.act('Click the sign in button');
await sentinel.saveSession('./sessions/github.json');

// Subsequent runs: a new instance restores the saved session automatically
const restored = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY,
  sessionPath: './sessions/github.json', // loaded on init()
});
await restored.init();
await restored.goto('https://github.com'); // already authenticated

userDataDir persists the full browser profile including IndexedDB and ServiceWorkers:

persistent-profile.ts

// Persists the full browser profile — including IndexedDB.
// Required for services like WhatsApp Web, PWAs, and SPA-based apps.
const sentinel = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY,
  userDataDir: './profiles/whatsapp',  // created automatically if missing
});
await sentinel.init();
// First run: complete login (scan QR code).
// All subsequent runs: session restored automatically — no re-auth needed.

Record & Replay

Capture any automation session as a replayable workflow. Export as TypeScript source or JSON for storage and version control.

record-replay.ts

// Start recording
sentinel.startRecording('checkout-flow');

await sentinel.goto('https://shop.example.com');
await sentinel.act('Click the first product');
await sentinel.act('Click Add to Cart');
await sentinel.act('Proceed to checkout');

// Stop and get the workflow
const workflow = sentinel.stopRecording();

// Export as TypeScript source code
const code = sentinel.exportWorkflowAsCode(workflow);
console.log(code); // ready-to-run TypeScript

// Export as JSON
const json = sentinel.exportWorkflowAsJSON(workflow);

// Replay the recorded workflow
await sentinel.replay(workflow);

Vision grounding

Vision-model fallback for canvas elements, shadow DOMs, and custom components that aren't exposed through the accessibility tree. Supported by all four built-in providers (Gemini, OpenAI, Claude, Ollama vision models).

vision.ts

const sentinel = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY,
  visionFallback: true, // activates vision grounding
});
await sentinel.init();

// Takes a PNG screenshot → Buffer
const png = await sentinel.screenshot();

// Natural language description of the current page
const description = await sentinel.describeScreen();
console.log(description);
// "The page shows an Amazon product listing with a laptop card..."

// Vision grounding also activates automatically inside act()
// when AOM cannot locate the target element — no extra code needed.

Self-healing & caching

Two independent caching layers dramatically reduce LLM usage on repeated runs. Enable both for production — they stack on top of Gemini's already 30× cheaper baseline.

caching.ts

const sentinel = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY,

  // Self-healing locators — cache successful element → selector mappings
  locatorCache: './sentinel-locators.json', // file-persisted (or: true for in-memory)

  // Prompt cache — cache LLM responses by prompt hash
  promptCache: './sentinel-prompts.json',   // file-persisted (or: true for in-memory)
});

// Flush the prompt cache programmatically (e.g. between test runs)
sentinel.clearPromptCache();

// Custom cache backends (e.g. Redis for distributed test runs)
import type { ILocatorCache, CachedLocator } from '@isoldex/sentinel';

class RedisLocatorCache implements ILocatorCache {
  get(url: string, instruction: string): CachedLocator | undefined { /* ... */ }
  set(url: string, instruction: string, entry: CachedLocator): void { /* ... */ }
  invalidate(url: string, instruction: string): void { /* ... */ }
}

locatorCache

Caches successful element → selector mappings. On repeated calls, the cached Playwright locator is tried first; the LLM is called only if that locator breaks. Supports custom backends via ILocatorCache.
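As a sketch of what a custom backend involves, the class below provides the same three methods as ILocatorCache (get, set, invalidate) on top of a Map. CachedLocatorLike is a hypothetical stand-in, because the real CachedLocator shape is not shown in this reference, and MapLocatorCache is an illustrative name.

```typescript
// Hypothetical entry shape; the real CachedLocator type is not shown here.
interface CachedLocatorLike {
  selector: string;
}

// Same three methods as ILocatorCache, backed by an in-memory Map.
class MapLocatorCache {
  private readonly entries = new Map<string, CachedLocatorLike>();

  private key(url: string, instruction: string): string {
    // JSON-encode the pair so that neither part can bleed into the other.
    return JSON.stringify([url, instruction]);
  }

  get(url: string, instruction: string): CachedLocatorLike | undefined {
    return this.entries.get(this.key(url, instruction));
  }

  set(url: string, instruction: string, entry: CachedLocatorLike): void {
    this.entries.set(this.key(url, instruction), entry);
  }

  invalidate(url: string, instruction: string): void {
    this.entries.delete(this.key(url, instruction));
  }
}

const cache = new MapLocatorCache();
cache.set('https://example.com', 'Click the login button', { selector: '#login' });
console.log(cache.get('https://example.com', 'Click the login button'));
```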

promptCache

Caches LLM responses by a hash of prompt + schema. Identical (prompt, schema) pairs return instantly at zero token cost. The URL and page title are also part of the hash, so the cache misses automatically when either of them changes.
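A cache key of this kind could be derived roughly as follows. This is a sketch under assumptions: the hash function Sentinel actually uses is not documented here, so a small FNV-1a variant over UTF-16 code units stands in to keep the example dependency-free. promptCacheKey and PromptKeyParts are hypothetical names.

```typescript
// FNV-1a (32-bit) over UTF-16 code units. Non-cryptographic; used here
// only as a stand-in for whatever hash the library really applies.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ('00000000' + hash.toString(16)).slice(-8);
}

// Hypothetical grouping of the inputs named in the text above.
interface PromptKeyParts {
  prompt: string;
  schema: unknown;
  url: string;
  title: string;
}

function promptCacheKey(parts: PromptKeyParts): string {
  // Serialising as an array keeps the four parts unambiguous.
  // Note: JSON.stringify is sensitive to property order inside `schema`.
  return fnv1a(JSON.stringify([parts.prompt, parts.schema, parts.url, parts.title]));
}

console.log(
  promptCacheKey({
    prompt: 'Get the page title',
    schema: { type: 'object', properties: { title: { type: 'string' } } },
    url: 'https://example.com',
    title: 'Example Domain',
  }),
);
```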

Stealth & proxy

Human-like delays, User-Agent rotation (automatic), and proxy support for bot-detection evasion and geo-restricted content.

stealth.ts

import { Sentinel, RoundRobinProxyProvider, WebshareProxyProvider } from '@isoldex/sentinel';

const sentinel = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY,

  // Bézier mouse curves + per-action delays (80–200 ms) + human keystroke timing
  humanLike: true,

  // Static proxy
  proxy: { server: 'http://proxy.example.com:8080', username: 'u', password: 'p' },

  // Or round-robin through a list (use instead of the static proxy above):
  // proxy: new RoundRobinProxyProvider([
  //   { server: 'http://p1:8080' },
  //   { server: 'http://p2:8080' },
  // ]),

  // Or the Webshare API with automatic rotation:
  // proxy: new WebshareProxyProvider({ apiKey: process.env.WEBSHARE_KEY! }),
});
// User-Agent rotation is automatic — no config needed.
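Round-robin rotation itself is simple to picture. The sketch below shows the general technique with a hypothetical RoundRobin class and ProxyEntry type; it is not the source of RoundRobinProxyProvider, whose internals are not documented here.

```typescript
// Hypothetical stand-in mirroring the { server } shape used above.
interface ProxyEntry {
  server: string;
}

// Hands out items in order and wraps around at the end of the list.
class RoundRobin<T> {
  private readonly items: readonly T[];
  private index = 0;

  constructor(items: readonly T[]) {
    if (items.length === 0) {
      throw new Error('RoundRobin needs at least one item');
    }
    this.items = items;
  }

  next(): T {
    const item = this.items[this.index] as T;
    this.index = (this.index + 1) % this.items.length;
    return item;
  }
}

const proxies = new RoundRobin<ProxyEntry>([
  { server: 'http://p1:8080' },
  { server: 'http://p2:8080' },
]);
console.log(proxies.next().server, proxies.next().server, proxies.next().server);
```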

Page extension — sentinel.extend(page)

Attaches act(), extract(), and observe() directly to any existing Playwright Page object. Drop-in for existing Playwright projects — no test restructuring needed.

extend.ts

import { chromium } from 'playwright';
import { Sentinel, z } from '@isoldex/sentinel';

const browser = await chromium.launch();
const page    = await browser.newPage();

const sentinel = new Sentinel({ apiKey: process.env.GEMINI_API_KEY });

// Attach Sentinel capabilities to any existing Playwright Page
await sentinel.extend(page);

// Now use act/extract/observe directly on the page object
await page.goto('https://example.com');
await page.act('Click the login button');

const data = await page.extract('Get the page title', z.object({
  title: z.string(),
}));

Selector export

After every run() call, result.selectors contains stable CSS selectors for every element that was acted on — ready to paste into Playwright tests. Priority: data-testid → #id → [name] → [placeholder] → [aria-label] → role:has-text.

selectors.ts

// After sentinel.run() — selectors for all act() steps
const result = await sentinel.run('Login with test@example.com');

console.log(result.selectors);
// {
//   clickLoginButton:  '[data-testid="login-btn"]',
//   fillEmailField:    '#email',
//   fillPasswordField: '[name="password"]',
// }

// Copy directly into Playwright tests — no DevTools digging
import { test, expect } from '@playwright/test';
test('login', async ({ page }) => {
  await page.click('[data-testid="login-btn"]');
});

// Single act() also exposes the selector
const r = await sentinel.act('Click the search field');
console.log(r.selector); // 'input[aria-label="Search"]'
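The selector priority listed at the top of this section (data-testid → #id → [name] → [placeholder] → [aria-label] → role:has-text) can be sketched as a chain of fallbacks. ElementAttrs and pickSelector are hypothetical names, and the quoting is simplified, so treat this as an illustration of the ordering and not as the library's selector generator.

```typescript
// Hypothetical description of an element; field names are assumptions.
interface ElementAttrs {
  role: string;
  text: string;
  testId?: string;
  id?: string;
  name?: string;
  placeholder?: string;
  ariaLabel?: string;
}

function pickSelector(el: ElementAttrs): string {
  // JSON.stringify gives a double-quoted, escaped value; adequate for
  // simple attribute values, not a full CSS string escaper.
  const q = (value: string): string => JSON.stringify(value);

  if (el.testId) return `[data-testid=${q(el.testId)}]`;
  if (el.id) return `#${el.id}`; // assumes the id needs no CSS escaping
  if (el.name) return `[name=${q(el.name)}]`;
  if (el.placeholder) return `[placeholder=${q(el.placeholder)}]`;
  if (el.ariaLabel) return `[aria-label=${q(el.ariaLabel)}]`;
  return `${el.role}:has-text(${q(el.text)})`;
}

console.log(pickSelector({ role: 'button', text: 'Sign in', testId: 'login-btn' }));
```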

Events & token tracking

Sentinel extends Node.js EventEmitter. Use events for logging, dashboards, or integration with external monitoring tools.

events.ts

// Event system (Sentinel extends EventEmitter)
sentinel.on('action', (event) => {
  console.log('Action:', event.instruction, event.result);
});

sentinel.on('navigate', (event) => {
  console.log('Navigated to:', event.url);
});

sentinel.on('close', () => {
  console.log('Browser closed');
});

// Direct Playwright access
const page    = sentinel.page;
const context = sentinel.context;

// Token tracking
const usage = sentinel.getTokenUsage();
console.log(usage);
// {
//   totalInputTokens: 9800,
//   totalOutputTokens: 2600,
//   totalTokens: 12400,
//   estimatedCostUsd: 0.00093,
//   entries: [...],
// }

// Export full log as JSON
sentinel.exportLogs('./logs/session.json');

OpenTelemetry

Every call emits traces and metrics automatically. Zero overhead when no OTel SDK is configured (no-op API). Drop into Datadog, Grafana, Jaeger, or any OTLP backend.

instrumentation.ts

import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { PrometheusExporter } from '@opentelemetry/exporter-prometheus';
import { Sentinel } from '@isoldex/sentinel';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4318/v1/traces' }),
  metricReader: new PrometheusExporter({ port: 9464 }),
});
sdk.start(); // must be called BEFORE new Sentinel(...)

const sentinel = new Sentinel({ apiKey: process.env.GEMINI_API_KEY });
// All act() / extract() / run() calls now emit spans and metrics automatically

Emitted spans

sentinel.agent
└─ sentinel.agent.step
   └─ sentinel.act / sentinel.extract / sentinel.observe
      └─ sentinel.llm

Emitted metrics

sentinel.act.requests · sentinel.act.duration_ms
sentinel.llm.requests · sentinel.llm.tokens · sentinel.llm.duration_ms
sentinel.agent.steps

Playwright Test integration

Drop-in integration for existing Playwright Test suites. The ai fixture auto-initializes before each test and auto-closes after, regardless of outcome.

checkout.spec.ts

import { test, expect } from '@isoldex/sentinel/test';
import { z } from 'zod';

test('completes checkout flow', async ({ ai, page }) => {
  await ai.goto('https://shop.example.com');
  await ai.act('Click the first product');
  await ai.act('Click Add to Cart');
  await ai.act('Proceed to checkout');

  const order = await ai.extract<{ total: string; items: number }>(
    'Get the order total and item count',
    z.object({ total: z.string(), items: z.number() })
  );

  expect(order.items).toBeGreaterThan(0);
  console.log('Token cost:', ai.getTokenUsage().estimatedCostUsd);
});

Configure Sentinel options globally in playwright.config.ts:

playwright.config.ts

// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    sentinelOptions: {
      headless: false,
      verbose: 1,
      locatorCache: '.sentinel-cache.json',
    },
  },
});

CLI

Run browser automation without writing any code — paste a URL and a goal, get results. The API key is read from GEMINI_API_KEY in the environment or passed via --api-key.

terminal

# Run an autonomous agent
npx @isoldex/sentinel run "Search for mechanical keyboards" \
  --url https://amazon.de \
  --output result.json

# Perform a single action
npx @isoldex/sentinel act "Click the login button" \
  --url https://example.com

# Extract structured data
npx @isoldex/sentinel extract "Get all product names and prices" \
  --url https://shop.example.com \
  --schema '{"type":"object","properties":{"products":{"type":"array"}}}'

# Take a screenshot
npx @isoldex/sentinel screenshot \
  --url https://example.com \
  --output page.png

Command	Description	Key flags
run	Autonomous agent — achieves a natural language goal	--url, --output, --max-steps
act	Single natural language action on the page	--url, --headless
extract	Extract structured data from the page as JSON	--url, --schema, --output
screenshot	Take a PNG screenshot of the page	--url, --output

Flag	Default	Description
--url	required	URL to navigate to before running the command
--api-key	GEMINI_API_KEY env	Gemini API key
--model	gemini-3-flash	Gemini model (GEMINI_VERSION env also works)
--headless	false	Run browser headlessly (no visible window)
--output	stdout	Write JSON / PNG result to a file path
--max-steps	15	Maximum agent steps (run command only)
--schema	—	JSON Schema string for extract command
--verbose	1	Log verbosity 0–3

Error handling

All Sentinel errors extend SentinelError, which carries a code string and optional context. Most workflows prefer the non-throwing pattern via result.success.

errors.ts

import {
  SentinelError,
  ActionError,
  ExtractionError,
  NavigationError,
  AgentError,
  NotInitializedError,
} from '@isoldex/sentinel';

try {
  await sentinel.act('Click the submit button');
} catch (err) {
  if (err instanceof ActionError) {
    console.error(err.message, err.code, err.context);
    // code: "ACTION_FAILED"
  }
}

// Non-throwing alternative — check result.success
const result = await sentinel.act('Click checkout');
if (!result.success) {
  // result.message has the full diagnostic
  // result.attempts has each tried path
}

Class	Code	When thrown
ActionError	ACTION_FAILED	act() fails after all retries
ExtractionError	EXTRACTION_FAILED	extract() fails
NavigationError	NAVIGATION_FAILED	goto() fails
AgentError	AGENT_ERROR	run() exceeds maxSteps or gets stuck
NotInitializedError	NOT_INITIALIZED	any method called before init()
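When instanceof checks are inconvenient (for example, in a generic logging layer), the code strings in the table can be used directly. The sketch below is a hypothetical helper that reads the code property off an unknown error value. It assumes only that thrown errors expose code as described above.

```typescript
// The five codes listed in the table above.
const SENTINEL_CODES = [
  'ACTION_FAILED',
  'EXTRACTION_FAILED',
  'NAVIGATION_FAILED',
  'AGENT_ERROR',
  'NOT_INITIALIZED',
] as const;

type SentinelCode = (typeof SENTINEL_CODES)[number];

// Hypothetical helper: returns the Sentinel code of `err`, or undefined
// if `err` is not an object carrying one of the known codes.
function sentinelCodeOf(err: unknown): SentinelCode | undefined {
  if (typeof err !== 'object' || err === null) return undefined;
  const code = (err as { code?: unknown }).code;
  return SENTINEL_CODES.find((known) => known === code);
}

// Hand-built sample; a real value would come from a catch block.
const sample = Object.assign(new Error('act() failed'), { code: 'ACTION_FAILED' });
console.log(sentinelCodeOf(sample));
```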

SentinelOptions

All options passed to new Sentinel(options).

Option	Type	Default	Description
apiKey	string	—	Gemini API key. Pass '' when using a custom provider.
headless	boolean	false	Run browser in headless mode.
browser	'chromium'|'firefox'|'webkit'	'chromium'	Browser engine. CDP/AOM requires Chromium.
viewport	{ width, height }	1280×720	Viewport dimensions.
verbose	0|1|2|3	1	Log verbosity. 0=silent, 3=full debug with LLM JSON.
enableCaching	boolean	true	Cache AOM state between calls (500ms TTL).
visionFallback	boolean	false	Enable vision-model fallback in act().
provider	LLMProvider	Gemini	Custom LLM provider (OpenAI, Claude, Ollama…).
sessionPath	string	—	Path to session file. Loaded on init() if it exists.
userDataDir	string	—	Persistent browser profile (IndexedDB, ServiceWorkers).
proxy	ProxyOptions	—	Proxy server config ({ server, username, password }).
humanLike	boolean	false	Add random human-like delays between actions.
domSettleTimeoutMs	number	3000	Max ms to wait for DOM to settle after an action.
locatorCache	boolean|string	false	Cache successful selectors. String = file path.
promptCache	boolean|string	false	Cache LLM responses by prompt hash. String = file path.
maxElements	number	50	Max elements sent to LLM per act() call (chunk-processing).
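To make the maxElements limit concrete, here is a minimal sketch of splitting a list of page elements into batches of at most maxElements items. The chunk helper is hypothetical; how Sentinel actually orders or processes its chunks is not documented here.

```typescript
// Hypothetical helper: splits `items` into consecutive batches of at most `size`.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 120 placeholder elements with the default limit of 50.
const batches = chunk(Array.from({ length: 120 }, (_, i) => i), 50);
console.log(batches.map((b) => b.length)); // [ 50, 50, 20 ]
```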