API Reference

Complete reference for @isoldex/sentinel. For a guided introduction, see the Getting Started docs.

Sentinel class

The main class. Wraps a Playwright browser instance, manages the LLM provider, and exposes all automation methods.

new Sentinel(options)

init.ts

import { Sentinel } from '@isoldex/sentinel';

const sentinel = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY,
  headless: true,
  browser: 'chromium',
  verbose: 1,
  enableCaching: true,
});

await sentinel.init();
// ...
await sentinel.close();

sentinel.init()

Launches the browser and creates a browser context. Must be called before any other method. Returns Promise<void>.

sentinel.goto(url, options?)

Navigates to a URL. Waits for networkidle by default. Accepts all Playwright GotoOptions as the second argument.

sentinel.close()

Closes all pages, the browser context, and the browser. Always call it in a finally block so the browser is released even when an earlier step throws.
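The finally-block advice can be wrapped in a small helper. The sketch below is illustrative only: withSentinel and SentinelLifecycle are hypothetical names, not part of @isoldex/sentinel, and the helper relies only on the documented init() and close() methods.

```typescript
// Structural stand-in for the two lifecycle methods documented above.
interface SentinelLifecycle {
  init(): Promise<void>;
  close(): Promise<void>;
}

// Hypothetical helper (not part of the library): initializes the instance,
// runs the callback, and closes the browser whether the callback succeeds
// or throws.
async function withSentinel<S extends SentinelLifecycle, T>(
  sentinel: S,
  work: (sentinel: S) => Promise<T>,
): Promise<T> {
  // close() is only attempted once init() has succeeded.
  await sentinel.init();
  try {
    return await work(sentinel);
  } finally {
    await sentinel.close();
  }
}
```

With a real instance, the callback would contain the goto(), act(), and extract() calls, and any error thrown inside it still propagates to the caller after close() has run.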

Static methods

Sentinel.parallel(tasks, options)

parallel.ts

const results = await Sentinel.parallel(
  [
    { url: 'https://site-a.com', goal: 'Extract product name and price' },
    { url: 'https://site-b.com', goal: 'Extract product name and price' },
  ],
  {
    apiKey: process.env.GEMINI_API_KEY,
    concurrency: 2,
    headless: true,
  }
);

// results: ParallelResult[]
results.forEach((r) => {
  if (r.success) console.log(r.data);
  else           console.error(r.error);
});

Sentinel.extend(page, options)

Adds act(), extract(), and observe() to an existing Playwright Page object. See the Page extension section.

act()

Executes a single natural-language action. Returns an ActionResult.

sentinel.act(instruction, options?)

act.ts

// Basic
const result = await sentinel.act('click the login button');

// With variables (separate name: `result` is already declared above)
const filled = await sentinel.act('fill email with %email%', {
  variables: { email: 'user@example.com' },
  retries: 3,
});

console.log(result.success);   // true
console.log(result.action);    // 'click'
console.log(result.selector);  // '#login-btn'
console.log(result.attempts);  // 1

All action types

action-types.ts

// All supported action types:
await sentinel.act('click the submit button');
await sentinel.act('fill the email field with user@example.com');
await sentinel.act('append " (edited)" to the title field');
await sentinel.act('hover over the user avatar');
await sentinel.act('press Enter');
await sentinel.act('select "Germany" from the country dropdown');
await sentinel.act('double-click the image');
await sentinel.act('right-click the file row');
await sentinel.act('scroll down 300 pixels');
await sentinel.act('scroll up');
await sentinel.act('scroll to the footer');

extract()

Extracts structured data from the current page. Accepts a Zod schema or a JSON Schema object. Returns the typed extracted data directly (not wrapped in a result object).

sentinel.extract(instruction, schema)

extract.ts

import { z } from 'zod';

// With Zod schema (recommended — gives full TypeScript inference)
const product = await sentinel.extract(
  'Extract product name, price, and rating',
  z.object({
    name:   z.string(),
    price:  z.string(),
    rating: z.number(),
  })
);

// product is typed as { name: string; price: string; rating: number }
console.log(product.name);

// With plain JSON schema
const raw = await sentinel.extract(
  'Extract all review titles',
  { type: 'object', properties: { titles: { type: 'array', items: { type: 'string' } } } }
);

observe()

Returns a natural-language description of the interactive elements currently visible on the page. Useful for debugging, building dynamic agents, or deciding what action to take next.

sentinel.observe(instruction?)

observe.ts

const elements = await sentinel.observe();
// Returns a description of all interactive elements currently visible

const specific = await sentinel.observe('find all buttons in the checkout section');
console.log(specific);
// "checkout-btn: [button] 'Proceed to checkout' at #checkout-form > button.primary
//  back-btn: [button] 'Back to cart' at #checkout-form > button.secondary"

run() — Agent

Runs an autonomous agent loop. The agent plans, executes, verifies, and reflects until the goal is achieved or maxSteps is exceeded. Includes built-in loop detection.

sentinel.run(goal, options?)

agent.ts

const result = await sentinel.run(
  'Find the cheapest laptop under €500 and add it to the cart',
  {
    maxSteps: 25,
    onStep: (event) => {
      console.log(`[${event.step}/${event.totalSteps}] ${event.action.instruction}`);
      console.log(`  → ${event.action.action} on ${event.action.selector}`);
    },
  }
);

console.log(result.goalAchieved);  // true
console.log(result.totalSteps);    // 14
console.log(result.data);          // { name: '...', price: '€349' }
console.log(result.selectors);     // { addToCartBtn: '#add-to-cart', ... }
console.log(result.message);       // "Goal achieved: ..."

Session & Navigation

Session persistence (cookies + localStorage)

session.ts

// Save session (cookies + localStorage) for reuse
const sentinel = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY,
  sessionPath: './sessions/github.json',
});

await sentinel.init();
await sentinel.goto('https://github.com/login');

// Log in only if a login form is shown, i.e. no saved session was restored
if (await sentinel.hasLoginForm()) {
  await sentinel.act('fill username with myuser');
  await sentinel.act('fill password with %pw%', { variables: { pw: process.env.GH_PASS! } });
  await sentinel.act('click sign in');
  await sentinel.saveSession();
}
// On the next run, the saved cookies and localStorage are restored from sessionPath

userDataDir — IndexedDB / Service Workers

userdata.ts

// Persist IndexedDB, WebSQL, Service Workers (e.g. WhatsApp Web, Notion)
const sentinel = new Sentinel({
  apiKey:      process.env.GEMINI_API_KEY,
  userDataDir: './profiles/whatsapp',
});

await sentinel.init();
await sentinel.goto('https://web.whatsapp.com');
// QR scan only needed on first run

Property	Type	Default	Description
sentinel.saveSession()	Promise<void>	—	Saves current cookies + localStorage to sessionPath.
sentinel.hasLoginForm()	Promise<boolean>	—	Returns true if a login/sign-in form is detected on the current page.

Tab management

sentinel.newTab / switchTab / closeTab / tabCount

tabs.ts

// Open a new tab
const tabId = await sentinel.newTab('https://example.com');

// Switch to it
await sentinel.switchTab(tabId);

// How many tabs are open?
const count = await sentinel.tabCount();

// Close current tab
await sentinel.closeTab();

Property	Type	Default	Description
newTab(url?)	Promise<string>	—	Opens a new browser tab, optionally navigating to url. Returns a tabId.
switchTab(tabId)	Promise<void>	—	Switches focus to the tab with the given tabId.
closeTab(tabId?)	Promise<void>	—	Closes the specified tab, or the current tab if no tabId is provided.
tabCount()	Promise<number>	—	Returns the number of currently open tabs.

Record & Replay

Record actions, export as TypeScript or JSON

record.ts

// Start recording
await sentinel.startRecording();

// Perform actions normally
await sentinel.goto('https://github.com');
await sentinel.act('click Sign in');
await sentinel.act('fill username');

// Stop recording, then export the workflow as TypeScript and as JSON
const recorded = await sentinel.stopRecording(); // workflow as a JSON string
const code     = await sentinel.exportWorkflowAsCode();
const json     = await sentinel.exportWorkflowAsJSON();

// Replay later
await sentinel.replay(JSON.parse(json));

Property	Type	Default	Description
startRecording()	Promise<void>	—	Starts capturing all act() calls into a workflow.
stopRecording()	Promise<string>	—	Stops recording and returns the workflow as a JSON string.
exportWorkflowAsCode()	Promise<string>	—	Returns the recorded workflow as executable TypeScript source code.
exportWorkflowAsJSON()	Promise<string>	—	Returns the recorded workflow as a JSON string for storage or replay.
replay(workflow)	Promise<void>	—	Replays a recorded workflow (parsed JSON object).

Vision

When visionFallback: true is set, Sentinel automatically falls back to screenshot-based grounding when the accessibility tree fails to locate an element (Canvas, complex Shadow DOM, non-standard widgets).

sentinel.screenshot() / describeScreen()

vision.ts

// Enable vision grounding (screenshot fallback for Canvas / Shadow DOM)
const sentinel = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY,
  visionFallback: true,
});

// Take a screenshot (returns Buffer)
const buf = await sentinel.screenshot();

// Describe current screen state
const description = await sentinel.describeScreen();
console.log(description);
// "A checkout page with a total of €349 visible in the top right..."

Property	Type	Default	Description
screenshot()	Promise<Buffer>	—	Takes a full-page screenshot and returns the PNG buffer.
describeScreen()	Promise<string>	—	Returns an LLM-generated natural-language description of the current viewport.

Page extension

Sentinel.extend(page, options) enriches an existing Playwright Page object with AI methods without creating a new browser instance. Ideal for incremental adoption in existing Playwright test suites.

Sentinel.extend(page, options)

extend.ts

import { chromium } from 'playwright';
import { expect } from '@playwright/test';
import { Sentinel } from '@isoldex/sentinel';

const browser = await chromium.launch();
const page    = await browser.newPage();

// Extend an existing page object
const ai = await Sentinel.extend(page, { apiKey: process.env.GEMINI_API_KEY });

await page.goto('https://example.com');

// Now use act/extract/observe alongside existing playwright APIs
await ai.act('click the login button');
const data = await ai.extract('get the page title', { type: 'object', properties: { title: { type: 'string' } } });

// Standard Playwright still works
await expect(page.locator('h1')).toBeVisible();

Events & token tracking

sentinel.on() / getTokenUsage() / exportLogs()

events.ts

// Listen to events
sentinel.on('action',   (e) => console.log('action:', e));
sentinel.on('navigate', (e) => console.log('navigated to:', e.url));
sentinel.on('close',    ()  => console.log('browser closed'));

// Token usage
const usage = sentinel.getTokenUsage();
console.log(usage);
// { promptTokens: 12400, completionTokens: 840, totalTokens: 13240, estimatedCost: 0.0018 }

// Export all logs
const logs = sentinel.exportLogs();

Property	Type	Default	Description
on('action', cb)	void	—	Emitted after every act() call. cb receives ActionResult.
on('navigate', cb)	void	—	Emitted on every page navigation. cb receives { url }.
on('close', cb)	void	—	Emitted when the browser is closed.
getTokenUsage()	TokenUsage	—	Returns cumulative prompt/completion/total tokens and estimated cost.
exportLogs()	LogEntry[]	—	Returns all recorded log entries for the current session.
sentinel.page	Page	—	Direct access to the underlying Playwright Page object.
sentinel.context	BrowserContext	—	Direct access to the underlying Playwright BrowserContext.
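Because getTokenUsage() is cumulative, the cost of a single step can be estimated by diffing two snapshots. The sketch below assumes the field names shown in the example output above; TokenUsageSnapshot and usageDelta are hypothetical names, not library exports.

```typescript
// Local mirror of the shape shown in the getTokenUsage() example output.
interface TokenUsageSnapshot {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  estimatedCost: number;
}

// Hypothetical helper: returns what was consumed between two cumulative snapshots.
function usageDelta(
  before: TokenUsageSnapshot,
  after: TokenUsageSnapshot,
): TokenUsageSnapshot {
  return {
    promptTokens: after.promptTokens - before.promptTokens,
    completionTokens: after.completionTokens - before.completionTokens,
    totalTokens: after.totalTokens - before.totalTokens,
    estimatedCost: after.estimatedCost - before.estimatedCost,
  };
}
```

In practice one snapshot would be taken before an act() or run() call and one after it. The cost difference is a floating-point value, so compare it with a tolerance rather than for exact equality.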

SentinelOptions

Property	Type	Default	Description
apiKey	string	—	API key for the selected LLM provider. Required for Gemini and OpenAI.
headless	boolean	true	Run browser in headless mode.
browser	'chromium' | 'firefox' | 'webkit'	'chromium'	Playwright browser engine.
viewport	{ width, height }	1280×720	Browser viewport dimensions.
verbose	0 | 1 | 2 | 3	0	Log level. 3 = full LLM reasoning JSON.
enableCaching	boolean	false	Enable locator + prompt cache to cut repeated costs to near-zero.
visionFallback	boolean	false	Enable screenshot-based grounding as fallback for inaccessible elements.
provider	LLMProvider	—	Custom LLM provider instance. Overrides apiKey + default Gemini.
sessionPath	string	—	File path to save/restore cookies and localStorage.
userDataDir	string	—	Persistent browser profile directory (preserves IndexedDB, Service Workers).
proxy	ProxyOptions	—	Proxy configuration. See ProxyOptions.
humanLike	boolean	false	Add random delays between actions to simulate human behaviour.
domSettleTimeoutMs	number	500	Milliseconds to wait for DOM mutations to settle after each action.
locatorCache	ILocatorCache	—	Custom locator cache implementation (default: in-memory Map).
promptCache	IPromptCache	—	Custom prompt cache implementation (default: in-memory Map).
maxElements	number	150	Max interactive elements included in each LLM context snapshot.

ActOptions

Property	Type	Default	Description
variables	Record<string, string>	—	Variable interpolation map. Keys are used as %key% in instruction strings.
retries	number	3	Number of action attempts before throwing ActionError.
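To make the %key% convention concrete, here is a minimal sketch of how such a substitution could work. It is an illustration under assumptions, not Sentinel's actual implementation: interpolate is a hypothetical function, and how the library treats unknown keys is not documented here (this sketch leaves them untouched).

```typescript
// Illustrative only: replaces %key% tokens with values from the map.
function interpolate(
  instruction: string,
  variables: Record<string, string>,
): string {
  return instruction.replace(/%(\w+)%/g, (token: string, key: string) =>
    // Only own keys of the map are substituted; unknown tokens stay as written.
    Object.prototype.hasOwnProperty.call(variables, key)
      ? String(variables[key])
      : token,
  );
}

const rendered = interpolate('fill email with %email%', {
  email: 'user@example.com',
});
console.log(rendered); // fill email with user@example.com
```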

ActionResult

Property	Type	Default	Description
success	boolean	—	Whether the action completed successfully.
action	string	—	The action type that was executed (e.g. 'click', 'fill').
instruction	string	—	The original instruction string.
selector	string | null	—	The CSS selector used for the action, if applicable.
message	string	—	Human-readable outcome message.
attempts	number	—	Number of attempts made before success or final failure.

AgentRunOptions

Property	Type	Default	Description
maxSteps	number	20	Maximum number of agent steps before aborting.
onStep	(event: AgentStepEvent) => void	—	Callback invoked after each step with real-time progress.

AgentStepEvent

Property	Type	Default	Description
step	number	—	Current step index (1-based).
totalSteps	number	—	Maximum steps configured.
action	ActionResult	—	Result of the action taken in this step.
reasoning	string	—	LLM reasoning for choosing this action.
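The event fields above are enough to build a compact progress log. The sketch below mirrors the documented shapes with local types so it runs on its own; StepEvent, StepAction, formatStep, and createStepRecorder are hypothetical names, and a real project would presumably use the types exported by the package.

```typescript
// Local mirrors of the documented ActionResult / AgentStepEvent fields used here.
interface StepAction {
  action: string;
  instruction: string;
  selector: string | null;
  success: boolean;
}

interface StepEvent {
  step: number;
  totalSteps: number;
  action: StepAction;
  reasoning: string;
}

// Formats one step as a single log line.
function formatStep(stepEvent: StepEvent): string {
  const target = stepEvent.action.selector ?? '(no selector)';
  const mark = stepEvent.action.success ? 'ok' : 'failed';
  return `[${stepEvent.step}/${stepEvent.totalSteps}] ${stepEvent.action.action} ${target} (${mark}): ${stepEvent.reasoning}`;
}

// Returns an onStep-compatible callback plus the lines it has collected so far.
function createStepRecorder(): {
  onStep: (stepEvent: StepEvent) => void;
  lines: string[];
} {
  const lines: string[] = [];
  return {
    onStep: (stepEvent) => {
      lines.push(formatStep(stepEvent));
    },
    lines,
  };
}
```

The returned onStep function could be passed as the onStep option of run(), and lines inspected or written to a file once the run finishes.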

AgentResult

Property	Type	Default	Description
success	boolean	—	True if the agent completed without throwing.
goalAchieved	boolean	—	True if the LLM confirmed the goal was met.
totalSteps	number	—	Number of steps executed.
message	string	—	Summary message from the agent.
history	ActionResult[]	—	Full list of actions taken.
data	Record<string, unknown>	—	Structured data extracted during the run.
selectors	Record<string, string>	—	camelCase key → CSS selector map for elements interacted with.

ParallelTask / ParallelResult / ParallelOptions

ParallelTask

Property	Type	Default	Description
url	string	—	URL to navigate to before running the goal.
goal	string	—	Natural-language goal for this browser session.
options	AgentRunOptions	—	Per-task agent options (maxSteps, onStep).

ParallelResult

Property	Type	Default	Description
success	boolean	—	Whether this task completed without error.
data	Record<string, unknown>	—	Extracted data if the task succeeded.
error	string | null	—	Error message if the task failed.
agentResult	AgentResult | null	—	Full agent result for detailed inspection.

ParallelOptions

Property	Type	Default	Description
apiKey	string	—	LLM API key shared across all parallel sessions.
concurrency	number	3	Maximum number of concurrent browser sessions.
headless	boolean	true	Headless mode for all sessions.
...SentinelOptions		—	All other SentinelOptions are forwarded to each session.
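One way to picture the concurrency option is a fixed number of worker lanes pulling tasks from a shared queue. The sketch below is a generic illustration of that pattern, not a description of how Sentinel.parallel is implemented internally; runWithConcurrency is a hypothetical name.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as the input items.
async function runWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0;

  // Each lane repeatedly claims the next unprocessed index until none remain.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

In this sketch a rejected worker promise rejects the whole batch. The ParallelResult shape, with its success and error fields, suggests that per-task failures are reported in the result instead, so a worker built on this pattern would need to catch its own errors.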

ProxyOptions

Property	Type	Default	Description
server	string	—	Proxy server URL, e.g. 'http://proxy.example.com:8080'.
username	string	—	Proxy authentication username.
password	string	—	Proxy authentication password.
bypass	string	—	Comma-separated list of hosts to bypass the proxy for.
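Proxy credentials are often supplied as a single URL. The sketch below splits such a URL into the fields listed above using the standard URL class. ProxyOptionsSketch and proxyFromUrl are hypothetical names, and treating username, password, and bypass as optional is an assumption.

```typescript
// Local mirror of the documented ProxyOptions fields.
interface ProxyOptionsSketch {
  server: string;
  username?: string;
  password?: string;
  bypass?: string;
}

// Hypothetical helper: converts 'http://user:pass@host:port' into separate fields.
function proxyFromUrl(
  proxyUrl: string,
  bypassHosts: readonly string[] = [],
): ProxyOptionsSketch {
  const parsed = new URL(proxyUrl);
  const options: ProxyOptionsSketch = {
    server: `${parsed.protocol}//${parsed.host}`,
  };
  // URL keeps credentials percent-encoded, so decode them before use.
  if (parsed.username) options.username = decodeURIComponent(parsed.username);
  if (parsed.password) options.password = decodeURIComponent(parsed.password);
  // The bypass field is documented as a comma-separated list of hosts.
  if (bypassHosts.length > 0) options.bypass = bypassHosts.join(',');
  return options;
}
```

The result could then be passed as the proxy option when constructing a Sentinel instance.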

LLMProvider interface

Implement this interface to use any LLM with Sentinel. Pass the instance as provider in SentinelOptions.

llm-provider.ts

interface LLMProvider {
  complete(prompt: string, options?: CompletionOptions): Promise<string>;
  completeJSON<T>(prompt: string, schema: ZodSchema<T>): Promise<T>;
}

See the LLM Providers page for built-in implementations (GeminiProvider, OpenAIProvider, AnthropicProvider, OllamaProvider).
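For tests, a provider that returns canned replies can stand in for a real LLM. The sketch below is self-contained, so it declares local stand-ins: SchemaLike approximates the parse() method of a Zod schema, and the CompletionOptions shape is a placeholder because this reference does not specify it. CannedProvider and LLMProviderSketch are hypothetical names.

```typescript
// Placeholder: the real CompletionOptions shape is not given in this reference.
type CompletionOptions = Record<string, unknown>;

// Minimal stand-in for ZodSchema<T>; Zod schemas expose a compatible parse().
interface SchemaLike<T> {
  parse(data: unknown): T;
}

// Mirrors the LLMProvider interface above, using the stand-in types.
interface LLMProviderSketch {
  complete(prompt: string, options?: CompletionOptions): Promise<string>;
  completeJSON<T>(prompt: string, schema: SchemaLike<T>): Promise<T>;
}

// Returns pre-recorded replies in order and remembers every prompt it received.
class CannedProvider implements LLMProviderSketch {
  readonly prompts: string[] = [];
  private readonly replies: string[];

  constructor(replies: readonly string[]) {
    this.replies = [...replies]; // copy so the caller's array is not mutated
  }

  async complete(prompt: string, _options?: CompletionOptions): Promise<string> {
    this.prompts.push(prompt);
    const reply = this.replies.shift();
    if (reply === undefined) {
      throw new Error('CannedProvider: no replies left');
    }
    return reply;
  }

  async completeJSON<T>(prompt: string, schema: SchemaLike<T>): Promise<T> {
    // Parse the canned text as JSON, then validate it against the schema.
    const text = await this.complete(prompt);
    return schema.parse(JSON.parse(text));
  }
}
```

An instance of a class like this, typed against the real interface, would be passed as provider in SentinelOptions.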

Error classes

All errors extend SentinelError, which extends the native Error.

Error hierarchy

errors.ts

import {
  SentinelError,
  ActionError,
  ExtractionError,
  NavigationError,
  AgentError,
  NotInitializedError,
} from '@isoldex/sentinel';

try {
  await sentinel.act('click the non-existent button');
} catch (err) {
  if (err instanceof ActionError) {
    console.error('Action failed after', err.attempts, 'attempts');
    console.error('Instruction:', err.instruction);
    console.error('Selector tried:', err.selector);
  }
}

Property	Type	Default	Description
SentinelError	class	—	Base class. All Sentinel errors extend this.
ActionError	class	—	Thrown when act() fails after all retries. Has .instruction, .selector, .attempts.
ExtractionError	class	—	Thrown when extract() cannot parse the LLM response into the requested schema.
NavigationError	class	—	Thrown on navigation failure (timeout, network error, invalid URL).
AgentError	class	—	Thrown when run() exceeds maxSteps without achieving the goal.
NotInitializedError	class	—	Thrown when any method is called before sentinel.init().
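Because every error shares the SentinelError base, instanceof checks should run from the most specific class to the most general. The sketch below shows that ordering using local stand-in classes (the Sketch-suffixed names are hypothetical) so that it runs without the package installed. Real code would import the classes from '@isoldex/sentinel' as in errors.ts.

```typescript
// Stand-in base class mirroring the documented hierarchy.
class SentinelErrorSketch extends Error {
  constructor(message: string) {
    super(message);
    this.name = new.target.name;
    // Keeps instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class ActionErrorSketch extends SentinelErrorSketch {
  readonly instruction: string;
  readonly selector: string | null;
  readonly attempts: number;

  constructor(instruction: string, selector: string | null, attempts: number) {
    super(`Action failed: ${instruction}`);
    this.instruction = instruction;
    this.selector = selector;
    this.attempts = attempts;
  }
}

class NavigationErrorSketch extends SentinelErrorSketch {}

// Checks subclasses first; the base-class check acts as a catch-all.
function describeFailure(err: unknown): string {
  if (err instanceof ActionErrorSketch) {
    return `action "${err.instruction}" failed after ${err.attempts} attempts`;
  }
  if (err instanceof NavigationErrorSketch) {
    return `navigation failed: ${err.message}`;
  }
  if (err instanceof SentinelErrorSketch) {
    return `sentinel error: ${err.message}`;
  }
  return `unexpected error: ${String(err)}`;
}
```

Testing the base class first would swallow every subclass, which is why the catch-all branch comes last.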