Documentation
Complete reference for @isoldex/sentinel — up and running in 5 minutes.
Installation
npm install @isoldex/sentinel playwright
npx playwright install chromium
Create a .env file with your API key:
GEMINI_API_KEY=your_key_here
GEMINI_VERSION=gemini-3-flash-preview # optional
Get a free key at aistudio.google.com. The free tier covers thousands of runs. For OpenAI, Claude, or Ollama see Providers.
Quickstart
act() performs natural-language actions. extract() returns typed structured data.
import { Sentinel, z } from '@isoldex/sentinel';
const sentinel = new Sentinel({ apiKey: process.env.GEMINI_API_KEY });
await sentinel.init();
await sentinel.goto('https://news.ycombinator.com');
// Extract structured data
const data = await sentinel.extract('Get the top 3 stories', z.object({
stories: z.array(z.object({
title: z.string(),
points: z.number(),
}))
}));
// Natural language actions
await sentinel.act('Click on the "new" link in the header');
await sentinel.act('Fill "hello@example.com" into the email field');
await sentinel.close();
act(instruction, options?)
Performs a natural-language action on the current page. Sentinel automatically verifies the action and retries when confidence is low (2 retries by default).
// Basic click / fill / hover
await sentinel.act('Click the login button');
await sentinel.act('Fill "user@example.com" into the email field');
await sentinel.act('Hover over the profile menu');
// All supported action types
await sentinel.act('Select "Germany" from the country dropdown');
await sentinel.act('Press Enter');
await sentinel.act('Double-click the product image');
await sentinel.act('Right-click the file');
await sentinel.act('Scroll down');
await sentinel.act('Scroll up');
await sentinel.act('Scroll to the footer');
await sentinel.act('Append " (urgent)" to the subject line');
// Variable interpolation
await sentinel.act('Fill %email% into the email field', {
variables: { email: 'user@example.com' },
});
// Custom retry count
await sentinel.act('Click the submit button', { retries: 5 });
ActOptions
- variables — Record<string, string>
- retries — number (default: 2)
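The variables option can be pictured as a plain placeholder substitution that runs before the instruction reaches the model. The sketch below is illustrative only: interpolate is a hypothetical helper, not a Sentinel export, and leaving unknown placeholders untouched is an assumption rather than documented behaviour.

```typescript
// Hypothetical helper, not part of @isoldex/sentinel.
// Replaces %name% placeholders with the matching entry from `variables`.
function interpolate(
  instruction: string,
  variables: Record<string, string> = {},
): string {
  return instruction.replace(/%(\w+)%/g, (placeholder: string, name: string) =>
    // Assumption: placeholders without a matching variable are left as-is.
    Object.prototype.hasOwnProperty.call(variables, name)
      ? String(variables[name])
      : placeholder,
  );
}

console.log(
  interpolate('Fill %email% into the email field', { email: 'user@example.com' }),
);
```

Keeping values out of the instruction text this way also means the same instruction string can be reused across test data sets.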
Supported actions
click · fill · append · hover · press · select · double-click · right-click · scroll-down · scroll-up · scroll-to
ActionResult — returned by every act() call, never throws:
const result = await sentinel.act('Click the checkout button');
console.log(result.success); // boolean
console.log(result.message); // "Clicked Checkout button"
console.log(result.action); // "click"
console.log(result.selector); // '[data-testid="checkout-btn"]'
// On failure — full diagnostic
if (!result.success) {
console.log(result.message);
// Action failed: "Click checkout button" on "Checkout"
// 3 paths tried:
// • coordinate-click: Element outside viewport at (640, 950)
// • vision-grounding: Element not found in screenshot
// • locator-fallback: strict mode violation: 3 elements matched
// Tip: element may be off-screen. Try: sentinel.act('scroll to "Checkout"')
console.log(result.attempts);
// [{ path: 'coordinate-click', error: '...' }, ...]
}
extract<T>(instruction, schema)
Extracts structured data from the current page. Accepts a Zod schema or raw JSON Schema. TypeScript generics are inferred automatically.
import { Sentinel, z } from '@isoldex/sentinel';
// Zod schema — TypeScript type is inferred automatically
const result = await sentinel.extract(
'Get all product names and prices',
z.object({
products: z.array(z.object({
name: z.string(),
price: z.number(),
}))
})
);
// result.products is typed as { name: string; price: number }[]
// Raw JSON Schema also works
const result2 = await sentinel.extract('Get the page title', {
type: 'object',
properties: { title: { type: 'string' } },
});
observe(instruction?)
Returns interactive elements visible on the page, optionally filtered by a natural language hint. Useful for debugging or building dynamic workflows.
// All interactive elements on the page
const elements = await sentinel.observe();
// Filtered by natural language hint
const loginElements = await sentinel.observe('Find login-related elements');
// Returns ObserveResult[]
// [{ description: 'Login button', role: 'button', ... }, ...]
run(goal, options?): Agent Loop
Runs a fully autonomous multi-step agent in a Plan → Execute → Verify → Reflect cycle until the goal is met, the step limit is reached, or an abort condition triggers.
const result = await sentinel.run(
'Go to amazon.de, search for "mechanical keyboard under 100 euros", extract top 5',
{
maxSteps: 20,
onStep: (event) => {
console.log(`Step ${event.stepNumber} [${event.type}]: ${event.instruction}`);
console.log(` Reasoning: ${event.reasoning}`);
},
}
);
console.log(result.success); // boolean
console.log(result.goalAchieved); // boolean — final LLM reflection check
console.log(result.totalSteps); // number of steps executed
console.log(result.message); // human-readable summary
console.log(result.data); // structured data extracted during the run
console.log(result.selectors); // { searchField: '#twotabsearchtextbox', ... }
console.log(result.history); // AgentStepEvent[]: full step-by-step log
AgentRunOptions
- maxSteps — number (default: 15)
- onStep — (event: AgentStepEvent) => void
Abort conditions
- 3 consecutive step failures
- Same instruction repeated 3× without progress
- maxSteps reached
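The three abort conditions above can be sketched as a single check over the step history. This is not Sentinel's implementation: StepRecord and checkAbort are hypothetical names, and "without progress" is approximated here as the same instruction text appearing three times in a row.

```typescript
// Hypothetical, reduced view of a step; the real AgentStepEvent has more fields.
interface StepRecord {
  instruction: string;
  success: boolean;
}

type AbortReason =
  | 'max-steps'
  | 'consecutive-failures'
  | 'repeated-instruction'
  | null;

function checkAbort(history: StepRecord[], maxSteps = 15): AbortReason {
  if (history.length >= maxSteps) return 'max-steps';

  const lastThree = history.slice(-3);
  if (lastThree.length < 3) return null;

  // Three failed steps in a row.
  if (lastThree.every((step) => !step.success)) return 'consecutive-failures';

  // Assumption: "no progress" means the last three instructions are identical.
  const distinct = new Set(lastThree.map((step) => step.instruction));
  if (distinct.size === 1) return 'repeated-instruction';

  return null;
}
```

A loop built on this would call checkAbort after every step and stop as soon as it returns a non-null reason.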
Sentinel.parallel(tasks, options)
Runs multiple independent tasks in parallel. Each task gets its own browser session. A worker pool limits the number of simultaneous sessions to the concurrency option. An error in one task never affects the others.
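The worker-pool behaviour described above (bounded concurrency, isolated failures, results in input order) can be sketched generically. runPool and Settled are hypothetical names for illustration; how Sentinel schedules sessions internally is not documented here.

```typescript
// Generic worker-pool sketch; not Sentinel's actual implementation.
type Settled<R> = { ok: true; value: R } | { ok: false; error: unknown };

async function runPool<T, R>(
  tasks: T[],
  concurrency: number,
  worker: (task: T, index: number) => Promise<R>,
): Promise<Settled<R>[]> {
  const results: Settled<R>[] = new Array(tasks.length);
  let next = 0; // index of the next unclaimed task

  async function lane(): Promise<void> {
    while (next < tasks.length) {
      const index = next++; // claim a task; safe because JS runs one callback at a time
      try {
        const value = await worker(tasks[index] as T, index);
        results[index] = { ok: true, value };
      } catch (error) {
        // A failing task is recorded and does not stop the other lanes.
        results[index] = { ok: false, error };
      }
    }
  }

  // Start at most `concurrency` lanes; each lane pulls tasks until none remain.
  const lanes = Math.max(1, Math.min(concurrency, tasks.length));
  await Promise.all(Array.from({ length: lanes }, () => lane()));
  return results; // indexed by input position, not completion order
}
```

Because each lane writes to results[index], the output order matches the input order regardless of which task finishes first. The library's own call looks like this: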
const results = await Sentinel.parallel(
[
{ url: 'https://amazon.de', goal: 'Find cheapest laptop' },
{ url: 'https://ebay.de', goal: 'Find cheapest laptop' },
{ url: 'https://otto.de', goal: 'Find cheapest laptop' },
],
{
apiKey: process.env.GEMINI_API_KEY,
concurrency: 3,
onProgress: (done, total, result) => {
console.log(`${done}/${total}: ${result.url} — ${result.message}`);
},
}
);
// Results in input order regardless of completion order
// Error in one task never affects the others
Tab management
Open, switch, and close browser tabs programmatically. State parsing based on the accessibility object model (AOM) requires Chromium, because it relies on the Chrome DevTools Protocol (CDP). Firefox and WebKit fall back to DOM parsing.
// Open a new tab
const tabIndex = await sentinel.newTab('https://google.com');
// Switch the active tab
await sentinel.switchTab(0);
await sentinel.switchTab(tabIndex);
// Close a tab
await sentinel.closeTab(tabIndex);
// Number of open tabs
console.log(sentinel.tabCount);
Session persistence
Save and restore authenticated sessions across runs — cookies and localStorage included. For apps that use IndexedDB (WhatsApp Web, PWAs), use userDataDir instead.
// First run: log in, then save the session
await sentinel.goto('https://github.com/login');
await sentinel.act('Fill "myuser" into the username field');
await sentinel.act('Fill "mypassword" into the password field');
await sentinel.act('Click the sign in button');
await sentinel.saveSession('./sessions/github.json');
// Subsequent runs: session is restored automatically
const sentinel = new Sentinel({
apiKey: process.env.GEMINI_API_KEY,
sessionPath: './sessions/github.json', // loaded on init()
});
await sentinel.init();
await sentinel.goto('https://github.com'); // already authenticated
userDataDir persists the full browser profile, including IndexedDB and ServiceWorkers:
// Persists the full browser profile — including IndexedDB.
// Required for services like WhatsApp Web, PWAs, and SPA-based apps.
const sentinel = new Sentinel({
apiKey: process.env.GEMINI_API_KEY,
userDataDir: './profiles/whatsapp', // created automatically if missing
});
await sentinel.init();
// First run: complete login (scan QR code).
// All subsequent runs: session restored automatically, no re-auth needed.
Record & Replay
Capture any automation session as a replayable workflow. Export as TypeScript source or JSON for storage and version control.
// Start recording
sentinel.startRecording('checkout-flow');
await sentinel.goto('https://shop.example.com');
await sentinel.act('Click the first product');
await sentinel.act('Click Add to Cart');
await sentinel.act('Proceed to checkout');
// Stop and get the workflow
const workflow = sentinel.stopRecording();
// Export as TypeScript source code
const code = sentinel.exportWorkflowAsCode(workflow);
console.log(code); // ready-to-run TypeScript
// Export as JSON
const json = sentinel.exportWorkflowAsJSON(workflow);
// Replay the recorded workflow
await sentinel.replay(workflow);
Vision grounding
Vision-model fallback for canvas elements, shadow DOMs, and custom components that aren't exposed through the accessibility tree. Supported by all four built-in providers (Gemini, OpenAI, Claude, Ollama vision models).
const sentinel = new Sentinel({
apiKey: process.env.GEMINI_API_KEY,
visionFallback: true, // activates vision grounding
});
// Takes a PNG screenshot → Buffer
const png = await sentinel.screenshot();
// Natural language description of the current page
const description = await sentinel.describeScreen();
console.log(description);
// "The page shows an Amazon product listing with a laptop card..."
// Vision grounding also activates automatically inside act()
// when AOM cannot locate the target element, so no extra code is needed.
Self-healing & caching
Two independent caching layers dramatically reduce LLM usage on repeated runs. Enable both for production — they stack on top of Gemini's already 30× cheaper baseline.
const sentinel = new Sentinel({
apiKey: process.env.GEMINI_API_KEY,
// Self-healing locators — cache successful element → selector mappings
locatorCache: './sentinel-locators.json', // file-persisted (or: true for in-memory)
// Prompt cache — cache LLM responses by prompt hash
promptCache: './sentinel-prompts.json', // file-persisted (or: true for in-memory)
});
// Flush the prompt cache programmatically (e.g. between test runs)
sentinel.clearPromptCache();
// Custom cache backends (e.g. Redis for distributed test runs)
import type { ILocatorCache, CachedLocator } from '@isoldex/sentinel';
class RedisLocatorCache implements ILocatorCache {
get(url: string, instruction: string): CachedLocator | undefined { /* ... */ }
set(url: string, instruction: string, entry: CachedLocator): void { /* ... */ }
invalidate(url: string, instruction: string): void { /* ... */ }
}
The locator cache stores successful element → selector mappings. On repeated calls, the cached Playwright locator is tried first, and the LLM is called only if that locator breaks. Custom backends are supported via ILocatorCache.
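As a minimal sketch of a custom backend, the class below keeps entries in a Map. The two interfaces are local stand-ins so the example runs without the library: the method signatures follow the ILocatorCache example above, while the selector field on CachedLocator and the resolveSelector helper are assumptions made for illustration.

```typescript
// Local stand-ins; in a real project these would be imported from the library.
interface CachedLocator {
  selector: string; // assumed field, the real type may carry more
}

interface ILocatorCache {
  get(url: string, instruction: string): CachedLocator | undefined;
  set(url: string, instruction: string, entry: CachedLocator): void;
  invalidate(url: string, instruction: string): void;
}

class MapLocatorCache implements ILocatorCache {
  private readonly entries = new Map<string, CachedLocator>();

  // JSON-encode the pair so the two parts cannot run into each other.
  private key(url: string, instruction: string): string {
    return JSON.stringify([url, instruction]);
  }

  get(url: string, instruction: string): CachedLocator | undefined {
    return this.entries.get(this.key(url, instruction));
  }

  set(url: string, instruction: string, entry: CachedLocator): void {
    this.entries.set(this.key(url, instruction), entry);
  }

  invalidate(url: string, instruction: string): void {
    this.entries.delete(this.key(url, instruction));
  }
}

// Hypothetical cache-first lookup: only call the expensive resolver on a miss.
function resolveSelector(
  cache: ILocatorCache,
  url: string,
  instruction: string,
  askModel: () => string,
): string {
  const hit = cache.get(url, instruction);
  if (hit) return hit.selector;
  const selector = askModel();
  cache.set(url, instruction, { selector });
  return selector;
}
```

A Redis-backed variant would keep the same key scheme and swap the Map for network calls, provided the interface really is synchronous as the example suggests.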
The prompt cache stores LLM responses under a hash of prompt + schema. Identical (prompt, schema) pairs return instantly at zero token cost. The URL and page title are part of the hash, and the cache misses automatically on DOM changes.
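One way to picture a hash-keyed prompt cache is shown below. It is a sketch under stated assumptions, not the library's code: the choice of SHA-256, the exact fields that go into the key, and the names promptCacheKey and PromptCache are all hypothetical.

```typescript
import { createHash } from 'node:crypto';

// Assumed inputs to the cache key.
interface PromptKeyParts {
  url: string;
  title: string;
  prompt: string;
  schema: unknown;
}

function promptCacheKey(parts: PromptKeyParts): string {
  // Note: JSON.stringify is sensitive to property order inside `schema`.
  const payload = JSON.stringify([parts.url, parts.title, parts.prompt, parts.schema]);
  return createHash('sha256').update(payload).digest('hex');
}

class PromptCache {
  private readonly store = new Map<string, string>();

  // Returns the cached response, or computes and stores a fresh one.
  lookup(parts: PromptKeyParts, compute: () => string): string {
    const key = promptCacheKey(parts);
    const cached = this.store.get(key);
    if (cached !== undefined) return cached;
    const fresh = compute();
    this.store.set(key, fresh);
    return fresh;
  }

  clear(): void {
    this.store.clear();
  }
}
```

Any change to one of the key parts produces a different hash, which is what turns a page change into a cache miss.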
Stealth & proxy
Human-like delays, User-Agent rotation (automatic), and proxy support for bot-detection evasion and geo-restricted content.
import { Sentinel, RoundRobinProxyProvider, WebshareProxyProvider } from '@isoldex/sentinel';
const sentinel = new Sentinel({
apiKey: process.env.GEMINI_API_KEY,
// Bézier mouse curves + per-action delays (80–200 ms) + human keystroke timing
humanLike: true,
// Static proxy
proxy: { server: 'http://proxy.example.com:8080', username: 'u', password: 'p' },
// Alternative: round-robin through a list (use instead of the static proxy above)
// proxy: new RoundRobinProxyProvider([
//   { server: 'http://p1:8080' },
//   { server: 'http://p2:8080' },
// ]),
// Alternative: Webshare API with automatic rotation
// proxy: new WebshareProxyProvider({ apiKey: process.env.WEBSHARE_KEY! }),
});
// User-Agent rotation is automatic; no config needed.
Page extension: sentinel.extend(page)
Attaches act(), extract(), and observe() directly to any existing Playwright Page object. Drop-in for existing Playwright projects — no test restructuring needed.
import { chromium } from 'playwright';
import { Sentinel, z } from '@isoldex/sentinel';
const browser = await chromium.launch();
const page = await browser.newPage();
const sentinel = new Sentinel({ apiKey: process.env.GEMINI_API_KEY });
// Attach Sentinel capabilities to any existing Playwright Page
await sentinel.extend(page);
// Now use act/extract/observe directly on the page object
await page.goto('https://example.com');
await page.act('Click the login button');
const data = await page.extract('Get the page title', z.object({
title: z.string(),
}));
Selector export
After every run() call, result.selectors contains stable CSS selectors for every element that was acted on — ready to paste into Playwright tests. Priority: data-testid → #id → [name] → [placeholder] → [aria-label] → role:has-text.
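The priority order above can be read as a simple first-match chain. The sketch below is one possible reading, not the library's code: ElementInfo and pickSelector are hypothetical, ids are assumed to be plain identifiers that need no escaping, and the last rule treats "role" as the tag-like prefix in front of Playwright's :has-text pseudo-class.

```typescript
// Hypothetical description of an element; field names are assumptions.
interface ElementInfo {
  testId?: string;
  id?: string;
  name?: string;
  placeholder?: string;
  ariaLabel?: string;
  role?: string;
  text?: string;
}

// Double-quote a value for use inside a selector (simplified escaping).
function quote(value: string): string {
  return JSON.stringify(value);
}

function pickSelector(el: ElementInfo): string | null {
  if (el.testId) return `[data-testid=${quote(el.testId)}]`;
  if (el.id) return `#${el.id}`;
  if (el.name) return `[name=${quote(el.name)}]`;
  if (el.placeholder) return `[placeholder=${quote(el.placeholder)}]`;
  if (el.ariaLabel) return `[aria-label=${quote(el.ariaLabel)}]`;
  if (el.role && el.text) return `${el.role}:has-text(${quote(el.text)})`;
  return null; // nothing stable enough to export
}
```

Attributes that survive refactors come first, which is why a data-testid wins even when an id is also present. The library's own output looks like this: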
// After sentinel.run() — selectors for all act() steps
const result = await sentinel.run('Login with test@example.com');
console.log(result.selectors);
// {
// clickLoginButton: '[data-testid="login-btn"]',
// fillEmailField: '#email',
// fillPasswordField: '[name="password"]',
// }
// Copy directly into Playwright tests — no DevTools digging
import { test, expect } from '@playwright/test';
test('login', async ({ page }) => {
await page.click('[data-testid="login-btn"]');
});
// Single act() also exposes the selector
const r = await sentinel.act('Click the search field');
console.log(r.selector); // 'input[aria-label="Search"]'
Events & token tracking
Sentinel extends Node.js EventEmitter. Use events for logging, dashboards, or integration with external monitoring tools.
// Event system (Sentinel extends EventEmitter)
sentinel.on('action', (event) => {
console.log('Action:', event.instruction, event.result);
});
sentinel.on('navigate', (event) => {
console.log('Navigated to:', event.url);
});
sentinel.on('close', () => {
console.log('Browser closed');
});
// Direct Playwright access
const page = sentinel.page;
const context = sentinel.context;
// Token tracking
const usage = sentinel.getTokenUsage();
console.log(usage);
// {
// totalInputTokens: 9800,
// totalOutputTokens: 2600,
// totalTokens: 12400,
// estimatedCostUsd: 0.00093,
// entries: [...],
// }
// Export full log as JSON
sentinel.exportLogs('./logs/session.json');
OpenTelemetry
Every call emits traces and metrics automatically. Zero overhead when no OTel SDK is configured (no-op API). Drop into Datadog, Grafana, Jaeger, or any OTLP backend.
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { PrometheusExporter } from '@opentelemetry/exporter-prometheus';
const sdk = new NodeSDK({
traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4318/v1/traces' }),
metricReader: new PrometheusExporter({ port: 9464 }),
});
sdk.start(); // must be called BEFORE new Sentinel(...)
const sentinel = new Sentinel({ apiKey: process.env.GEMINI_API_KEY });
// All act() / extract() / run() calls now emit spans and metrics automatically
Emitted spans
sentinel.agent
└─ sentinel.agent.step
   └─ sentinel.act / sentinel.extract / sentinel.observe
      └─ sentinel.llm
Emitted metrics
- sentinel.act.requests · sentinel.act.duration_ms
- sentinel.llm.requests · sentinel.llm.tokens · sentinel.llm.duration_ms
- sentinel.agent.steps
Playwright Test integration
Drop-in integration for existing Playwright Test suites. The ai fixture auto-initializes before each test and auto-closes after, regardless of outcome.
import { test, expect } from '@isoldex/sentinel/test';
import { z } from 'zod';
test('completes checkout flow', async ({ ai, page }) => {
await ai.goto('https://shop.example.com');
await ai.act('Click the first product');
await ai.act('Click Add to Cart');
await ai.act('Proceed to checkout');
const order = await ai.extract<{ total: string; items: number }>(
'Get the order total and item count',
z.object({ total: z.string(), items: z.number() })
);
expect(order.items).toBeGreaterThan(0);
console.log('Token cost:', ai.getTokenUsage().estimatedCostUsd);
});
Configure Sentinel options globally in playwright.config.ts:
// playwright.config.ts
import { defineConfig } from '@playwright/test';
export default defineConfig({
use: {
sentinelOptions: {
headless: false,
verbose: 1,
locatorCache: '.sentinel-cache.json',
},
},
});
CLI
Run browser automation without writing any code — paste a URL and a goal, get results. The API key is read from GEMINI_API_KEY in the environment or passed via --api-key.
# Run an autonomous agent
npx @isoldex/sentinel run "Search for mechanical keyboards" \
--url https://amazon.de \
--output result.json
# Perform a single action
npx @isoldex/sentinel act "Click the login button" \
--url https://example.com
# Extract structured data
npx @isoldex/sentinel extract "Get all product names and prices" \
--url https://shop.example.com \
--schema '{"type":"object","properties":{"products":{"type":"array"}}}'
# Take a screenshot
npx @isoldex/sentinel screenshot \
--url https://example.com \
--output page.png
| Command | Description | Key flags |
|---|---|---|
| run | Autonomous agent — achieves a natural language goal | --url, --output, --max-steps |
| act | Single natural language action on the page | --url, --headless |
| extract | Extract structured data from the page as JSON | --url, --schema, --output |
| screenshot | Take a PNG screenshot of the page | --url, --output |
| Flag | Default | Description |
|---|---|---|
| --url | required | URL to navigate to before running the command |
| --api-key | GEMINI_API_KEY env | Gemini API key |
| --model | gemini-3-flash | Gemini model (GEMINI_VERSION env also works) |
| --headless | false | Run browser headlessly (no visible window) |
| --output | stdout | Write JSON / PNG result to a file path |
| --max-steps | 15 | Maximum agent steps (run command only) |
| --schema | — | JSON Schema string for extract command |
| --verbose | 1 | Log verbosity 0–3 |
Error handling
All Sentinel errors extend SentinelError, which carries a code string and optional context. Most workflows prefer the non-throwing pattern via result.success.
import {
SentinelError,
ActionError,
ExtractionError,
NavigationError,
AgentError,
NotInitializedError,
} from '@isoldex/sentinel';
try {
await sentinel.act('Click the submit button');
} catch (err) {
if (err instanceof ActionError) {
console.error(err.message, err.code, err.context);
// code: "ACTION_FAILED"
}
}
// Non-throwing alternative — check result.success
const result = await sentinel.act('Click checkout');
if (!result.success) {
// result.message has the full diagnostic
// result.attempts has each tried path
}
| Class | Code | When thrown |
|---|---|---|
| ActionError | ACTION_FAILED | act() fails after all retries |
| ExtractionError | EXTRACTION_FAILED | extract() fails |
| NavigationError | NAVIGATION_FAILED | goto() fails |
| AgentError | AGENT_ERROR | run() exceeds maxSteps or gets stuck |
| NotInitializedError | NOT_INITIALIZED | any method called before init() |
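Because every class carries a code string, callers can branch on the code instead of chaining instanceof checks. The sketch below mirrors part of the hierarchy locally so it runs standalone; the constructor shapes, the type of context, and the explainError helper are assumptions, and only the class names and codes come from the table above.

```typescript
// Local mirror of the documented hierarchy; constructor shapes are assumed.
class SentinelError extends Error {
  readonly code: string;
  readonly context?: Record<string, unknown>;

  constructor(message: string, code: string, context?: Record<string, unknown>) {
    super(message);
    this.name = new.target.name;
    this.code = code;
    this.context = context;
  }
}

class ActionError extends SentinelError {
  constructor(message: string, context?: Record<string, unknown>) {
    super(message, 'ACTION_FAILED', context);
  }
}

class NavigationError extends SentinelError {
  constructor(message: string, context?: Record<string, unknown>) {
    super(message, 'NAVIGATION_FAILED', context);
  }
}

// Hypothetical helper: map an unknown thrown value to a short log line.
function explainError(err: unknown): string {
  if (err instanceof SentinelError) {
    switch (err.code) {
      case 'ACTION_FAILED':
        return `action failed: ${err.message}`;
      case 'NAVIGATION_FAILED':
        return `navigation failed: ${err.message}`;
      default:
        return `sentinel error ${err.code}: ${err.message}`;
    }
  }
  return `unexpected error: ${String(err)}`;
}
```

Switching on code keeps the handler stable if more subclasses are added later, since unknown codes fall through to the default branch.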
SentinelOptions
All options passed to new Sentinel(options).
| Option | Type | Default | Description |
|---|---|---|---|
| apiKey | string | — | Gemini API key. Pass '' when using a custom provider. |
| headless | boolean | false | Run browser in headless mode. |
| browser | 'chromium'|'firefox'|'webkit' | 'chromium' | Browser engine. CDP/AOM requires Chromium. |
| viewport | { width, height } | 1280×720 | Viewport dimensions. |
| verbose | 0|1|2|3 | 1 | Log verbosity. 0=silent, 3=full debug with LLM JSON. |
| enableCaching | boolean | true | Cache AOM state between calls (500ms TTL). |
| visionFallback | boolean | false | Enable vision-model fallback in act(). |
| provider | LLMProvider | Gemini | Custom LLM provider (OpenAI, Claude, Ollama…). |
| sessionPath | string | — | Path to session file. Loaded on init() if it exists. |
| userDataDir | string | — | Persistent browser profile (IndexedDB, ServiceWorkers). |
| proxy | ProxyOptions | — | Proxy server config ({ server, username, password }). |
| humanLike | boolean | false | Add random human-like delays between actions. |
| domSettleTimeoutMs | number | 3000 | Max ms to wait for DOM to settle after an action. |
| locatorCache | boolean|string | false | Cache successful selectors. String = file path. |
| promptCache | boolean|string | false | Cache LLM responses by prompt hash. String = file path. |
| maxElements | number | 50 | Max elements sent to LLM per act() call (chunk-processing). |
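For quick reference, the defaults from the table can be written down as a typed object and merged with overrides. This is an illustrative sketch only: OptionsSubset, DEFAULTS, and withDefaults are hypothetical names, the subset omits options that have no default, and the library's real merging logic is not shown in this document.

```typescript
// Hypothetical subset of SentinelOptions; default values copied from the table.
interface OptionsSubset {
  headless: boolean;
  browser: 'chromium' | 'firefox' | 'webkit';
  viewport: { width: number; height: number };
  verbose: 0 | 1 | 2 | 3;
  enableCaching: boolean;
  visionFallback: boolean;
  humanLike: boolean;
  domSettleTimeoutMs: number;
  locatorCache: boolean | string;
  promptCache: boolean | string;
  maxElements: number;
}

const DEFAULTS: OptionsSubset = {
  headless: false,
  browser: 'chromium',
  viewport: { width: 1280, height: 720 },
  verbose: 1,
  enableCaching: true,
  visionFallback: false,
  humanLike: false,
  domSettleTimeoutMs: 3000,
  locatorCache: false,
  promptCache: false,
  maxElements: 50,
};

// Shallow merge, with the nested viewport merged separately so that
// overriding only the width keeps the default height.
function withDefaults(overrides: Partial<OptionsSubset> = {}): OptionsSubset {
  return {
    ...DEFAULTS,
    ...overrides,
    viewport: { ...DEFAULTS.viewport, ...overrides.viewport },
  };
}

console.log(withDefaults({ headless: true, locatorCache: './sentinel-locators.json' }));
```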