
API Reference

Complete reference for @isoldex/sentinel. For a guided introduction, see the Getting Started docs.

Sentinel class

The main class. Wraps a Playwright browser instance, manages the LLM provider, and exposes all automation methods.

new Sentinel(options)

init.ts
import { Sentinel } from '@isoldex/sentinel';

const sentinel = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY,
  headless: true,
  browser: 'chromium',
  verbose: 1,
  enableCaching: true,
});

await sentinel.init();
// ...
await sentinel.close();

sentinel.init()

Launches the browser and creates a browser context. Must be called before any other method. Returns Promise<void>.

sentinel.goto(url, options?)

Navigates to a URL and waits for networkidle by default. Accepts Playwright's page.goto() options as the second argument.

sentinel.close()

Closes all pages, the browser context, and the browser. Call it in a finally block so the browser is released even when an earlier call throws.
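A small wrapper makes the init/close pairing hard to forget. The sketch below relies only on the init() and close() methods documented above; Lifecycle and withSession are hypothetical names and are not part of @isoldex/sentinel.

```typescript
// Hypothetical structural type for the two lifecycle methods documented above.
// A Sentinel instance has both methods, so it should satisfy this shape.
interface Lifecycle {
  init(): Promise<void>;
  close(): Promise<void>;
}

// Runs `fn` after init() and always calls close(), even if `fn` throws.
// init() stays outside the try block, so a failed launch is not followed by close().
async function withSession<S extends Lifecycle, R>(
  session: S,
  fn: (session: S) => Promise<R>,
): Promise<R> {
  await session.init();
  try {
    return await fn(session);
  } finally {
    await session.close();
  }
}
```

With the library, this could be called as withSession(new Sentinel({ apiKey: process.env.GEMINI_API_KEY }), async (s) => { await s.goto(url); ... }).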

Static methods

Sentinel.parallel(tasks, options)

parallel.ts
const results = await Sentinel.parallel(
  [
    { url: 'https://site-a.com', goal: 'Extract product name and price' },
    { url: 'https://site-b.com', goal: 'Extract product name and price' },
  ],
  {
    apiKey: process.env.GEMINI_API_KEY,
    concurrency: 2,
    headless: true,
  }
);

// results: ParallelResult[]
results.forEach((r) => {
  if (r.success) console.log(r.data);
  else           console.error(r.error);
});
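The reference doesn't describe how Sentinel.parallel() schedules tasks. A common way to cap concurrency is to start a fixed number of worker loops that pull the next task from a shared index; the sketch below does that and collects per-task success/error records shaped loosely like ParallelResult. runWithConcurrency and PoolResult are hypothetical names, and the worker function stands in for one browser session running one goal.

```typescript
// Hypothetical result shape, loosely modelled on ParallelResult.
interface PoolResult<O> {
  success: boolean;
  data: O | null;
  error: string | null;
}

// Runs `worker` over `items` with at most `concurrency` calls in flight.
// Results are returned in input order; one failure does not stop the others.
async function runWithConcurrency<I, O>(
  items: readonly I[],
  concurrency: number,
  worker: (item: I) => Promise<O>,
): Promise<PoolResult<O>[]> {
  const results: PoolResult<O>[] = new Array(items.length);
  let next = 0; // next unclaimed index; safe to share because JS runs one callback at a time

  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      try {
        results[i] = { success: true, data: await worker(items[i]), error: null };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        results[i] = { success: false, data: null, error: message };
      }
    }
  }

  const lanes = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: lanes }, () => lane()));
  return results;
}
```

ParallelOptions documents a default concurrency of 3; with concurrency: 2, as in the example above, at most two workers run at once.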

Sentinel.extend(page, options)

Adds act(), extract(), and observe() to an existing Playwright Page object. See the Page extension section.

act()

Executes a single natural-language action. Returns an ActionResult.

sentinel.act(instruction, options?)

act.ts
// Basic
const result = await sentinel.act('click the login button');

// With variables
const fillResult = await sentinel.act('fill email with %email%', {
  variables: { email: 'user@example.com' },
  retries: 3,
});

console.log(result.success);   // true
console.log(result.action);    // 'click'
console.log(result.selector);  // '#login-btn'
console.log(result.attempts);  // 1
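The %key% placeholders used with the variables option can be illustrated with a plain string substitution. How and when Sentinel performs the substitution internally is not documented here, so treat interpolate() as a hypothetical helper that only demonstrates the placeholder syntax.

```typescript
// Hypothetical helper: replaces %name% placeholders with values from `variables`.
// Unknown placeholders are left untouched so a typo stays visible.
// A replacer function is used, so "$" sequences in values are inserted literally.
function interpolate(instruction: string, variables: Record<string, string>): string {
  return instruction.replace(/%(\w+)%/g, (match, key: string) =>
    Object.prototype.hasOwnProperty.call(variables, key) ? variables[key] : match,
  );
}
```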

All action types

action-types.ts
// All supported action types:
await sentinel.act('click the submit button');
await sentinel.act('fill the email field with user@example.com');
await sentinel.act('append " (edited)" to the title field');
await sentinel.act('hover over the user avatar');
await sentinel.act('press Enter');
await sentinel.act('select "Germany" from the country dropdown');
await sentinel.act('double-click the image');
await sentinel.act('right-click the file row');
await sentinel.act('scroll down 300 pixels');
await sentinel.act('scroll up');
await sentinel.act('scroll to the footer');

extract()

Extracts structured data from the current page. Accepts a Zod schema or a JSON Schema object. Returns the typed extracted data directly (not wrapped in a result object).

sentinel.extract(instruction, schema)

extract.ts
import { z } from 'zod';

// With Zod schema (recommended — gives full TypeScript inference)
const product = await sentinel.extract(
  'Extract product name, price, and rating',
  z.object({
    name:   z.string(),
    price:  z.string(),
    rating: z.number(),
  })
);

// product is typed as { name: string; price: string; rating: number }
console.log(product.name);

// With plain JSON schema
const raw = await sentinel.extract(
  'Extract all review titles',
  { type: 'object', properties: { titles: { type: 'array', items: { type: 'string' } } } }
);
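With a Zod schema the return type is inferred, but TypeScript cannot derive a static type from a plain JSON Schema object. Assuming the JSON Schema form therefore comes back loosely typed (check the package's typings to confirm), a type guard can narrow the result before use. isReviewTitles is a hypothetical helper written for the schema in the example above.

```typescript
interface ReviewTitles {
  titles: string[];
}

// Hypothetical runtime check for the shape described by the JSON Schema above.
// It is stricter than that schema: it also requires `titles`, which the schema leaves optional.
function isReviewTitles(value: unknown): value is ReviewTitles {
  if (typeof value !== "object" || value === null) return false;
  const titles = (value as { titles?: unknown }).titles;
  return Array.isArray(titles) && titles.every((t) => typeof t === "string");
}
```

For example, if (isReviewTitles(raw)) console.log(raw.titles.length); only reads titles after the check passes.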

observe()

Returns a natural-language description of the interactive elements currently visible on the page. Useful for debugging, building dynamic agents, or deciding what action to take next.

sentinel.observe(instruction?)

observe.ts
const elements = await sentinel.observe();
// Returns a description of all interactive elements currently visible

const specific = await sentinel.observe('find all buttons in the checkout section');
console.log(specific);
// "checkout-btn: [button] 'Proceed to checkout' at #checkout-form > button.primary
//  back-btn: [button] 'Back to cart' at #checkout-form > button.secondary"

run() — Agent

Runs an autonomous agent loop. The agent plans, executes, verifies, and reflects until the goal is achieved or maxSteps is exceeded. Includes built-in loop detection.

sentinel.run(goal, options?)

agent.ts
const result = await sentinel.run(
  'Find the cheapest laptop under €500 and add it to the cart',
  {
    maxSteps: 25,
    onStep: (event) => {
      console.log(`[${event.step}/${event.totalSteps}] ${event.action.instruction}`);
      console.log(`  → ${event.action.action} on ${event.action.selector}`);
    },
  }
);

console.log(result.goalAchieved);  // true
console.log(result.totalSteps);    // 14
console.log(result.data);          // { name: '...', price: '€349' }
console.log(result.selectors);     // { addToCartBtn: '#add-to-cart', ... }
console.log(result.message);       // "Goal achieved: ..."
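The reference says run() includes built-in loop detection but does not describe the mechanism. One simple heuristic, shown here as an assumption and not as Sentinel's implementation, flags a loop when the same action hits the same selector several times in a row. StepAction and isLooping are hypothetical names.

```typescript
// Minimal subset of ActionResult that the heuristic looks at.
interface StepAction {
  action: string;
  selector: string | null;
}

// Hypothetical heuristic: report a loop when the last `repeats` steps are
// the same action on the same selector (e.g. clicking "#next" three times in a row).
// It also flags legitimate repetition, such as scrolling a long list,
// so on its own it is only a starting point.
function isLooping(history: readonly StepAction[], repeats = 3): boolean {
  if (repeats < 2 || history.length < repeats) return false;
  const key = (a: StepAction) => `${a.action}@${a.selector ?? ""}`;
  const recent = history.slice(-repeats);
  const first = key(recent[0]);
  return recent.every((a) => key(a) === first);
}
```

Such a check could run inside an onStep callback over the accumulated event.action values, e.g. to log a warning.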

Session & Navigation

Session persistence (cookies + localStorage)

session.ts
// Save session (cookies + localStorage) for reuse
const sentinel = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY,
  sessionPath: './sessions/github.json',
});

await sentinel.init();

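// Open the login page; a login form means the restored session is missing or expired
await sentinel.goto('https://github.com/login');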
if (await sentinel.hasLoginForm()) {
  await sentinel.act('fill username with myuser');
  await sentinel.act('fill password with %pw%', { variables: { pw: process.env.GH_PASS! } });
  await sentinel.act('click sign in');
  await sentinel.saveSession();
}
// On next run, cookies are restored automatically

userDataDir — IndexedDB / Service Workers

userdata.ts
// Persist IndexedDB, WebSQL, Service Workers (e.g. WhatsApp Web, Notion)
const sentinel = new Sentinel({
  apiKey:      process.env.GEMINI_API_KEY,
  userDataDir: './profiles/whatsapp',
});

await sentinel.init();
await sentinel.goto('https://web.whatsapp.com');
// QR scan only needed on first run

| Method | Returns | Description |
| --- | --- | --- |
| `sentinel.saveSession()` | `Promise<void>` | Saves the current cookies and localStorage to sessionPath. |
| `sentinel.hasLoginForm()` | `Promise<boolean>` | Returns true if a login or sign-in form is detected on the current page. |

Tab management

sentinel.newTab / switchTab / closeTab / tabCount

tabs.ts
// Open a new tab
const tabId = await sentinel.newTab('https://example.com');

// Switch to it
await sentinel.switchTab(tabId);

// How many tabs are open?
const count = await sentinel.tabCount();

// Close current tab
await sentinel.closeTab();

| Method | Returns | Description |
| --- | --- | --- |
| `newTab(url?)` | `Promise<string>` | Opens a new browser tab, optionally navigating to url. Returns a tabId. |
| `switchTab(tabId)` | `Promise<void>` | Switches focus to the tab with the given tabId. |
| `closeTab(tabId?)` | `Promise<void>` | Closes the specified tab, or the current tab if no tabId is provided. |
| `tabCount()` | `Promise<number>` | Returns the number of currently open tabs. |

Record & Replay

Record actions, export as TypeScript or JSON

record.ts
// Start recording
await sentinel.startRecording();

// Perform actions normally
await sentinel.goto('https://github.com');
await sentinel.act('click Sign in');
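await sentinel.act('fill the username field with myuser');

// Stop recording, then export the workflow as TypeScript source or JSON
const recorded = await sentinel.stopRecording();       // workflow as a JSON string
const code     = await sentinel.exportWorkflowAsCode(); // executable TypeScript
const json     = await sentinel.exportWorkflowAsJSON(); // JSON for storage or replay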

// Replay later
await sentinel.replay(JSON.parse(json));

| Method | Returns | Description |
| --- | --- | --- |
| `startRecording()` | `Promise<void>` | Starts capturing all act() calls into a workflow. |
| `stopRecording()` | `Promise<string>` | Stops recording and returns the workflow as a JSON string. |
| `exportWorkflowAsCode()` | `Promise<string>` | Returns the recorded workflow as executable TypeScript source code. |
| `exportWorkflowAsJSON()` | `Promise<string>` | Returns the recorded workflow as a JSON string for storage or replay. |
| `replay(workflow)` | `Promise<void>` | Replays a recorded workflow (a parsed JSON object). |
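The JSON format of a recorded workflow isn't documented on this page. Assuming a workflow is essentially an ordered list of act() instructions, replay reduces to running them in sequence and stopping at the first failure. Workflow, WorkflowStep, Actor, and replaySteps are hypothetical; the real format produced by exportWorkflowAsJSON() may differ.

```typescript
// Hypothetical workflow shape (the real exported format may differ).
interface WorkflowStep {
  instruction: string;
}
interface Workflow {
  steps: WorkflowStep[];
}

// Anything with an act() method that reports success. Given the act() signature
// documented above, a Sentinel instance should fit this structurally.
interface Actor {
  act(instruction: string): Promise<{ success: boolean }>;
}

// Replays steps in order and stops at the first unsuccessful step.
// If act() throws (e.g. ActionError after all retries), the error propagates.
// Returns the number of steps that completed successfully.
async function replaySteps(actor: Actor, workflow: Workflow): Promise<number> {
  let done = 0;
  for (const step of workflow.steps) {
    const result = await actor.act(step.instruction);
    if (!result.success) break;
    done++;
  }
  return done;
}
```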

Vision

When visionFallback: true is set, Sentinel automatically falls back to screenshot-based grounding when the accessibility tree fails to locate an element (Canvas, complex Shadow DOM, non-standard widgets).

sentinel.screenshot() / describeScreen()

vision.ts
// Enable vision grounding (screenshot fallback for Canvas / Shadow DOM)
const sentinel = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY,
  visionFallback: true,
});

await sentinel.init();
// ... navigate to the page you want to inspect

// Take a screenshot (returns Buffer)
const buf = await sentinel.screenshot();

// Describe current screen state
const description = await sentinel.describeScreen();
console.log(description);
// "A checkout page with a total of €349 visible in the top right..."

| Method | Returns | Description |
| --- | --- | --- |
| `screenshot()` | `Promise<Buffer>` | Takes a full-page screenshot and returns the PNG buffer. |
| `describeScreen()` | `Promise<string>` | Returns an LLM-generated natural-language description of the current viewport. |

Page extension

Sentinel.extend(page, options) enriches an existing Playwright Page object with AI methods without creating a new browser instance. Ideal for incremental adoption in existing Playwright test suites.

Sentinel.extend(page, options)

extend.ts
import { chromium } from 'playwright';
import { expect } from '@playwright/test';
import { Sentinel } from '@isoldex/sentinel';

const browser = await chromium.launch();
const page    = await browser.newPage();

// Extend an existing page object
const ai = await Sentinel.extend(page, { apiKey: process.env.GEMINI_API_KEY });

await page.goto('https://example.com');

// Now use act/extract/observe alongside existing playwright APIs
await ai.act('click the login button');
const data = await ai.extract('get the page title', {
  type: 'object',
  properties: { title: { type: 'string' } },
});

// Standard Playwright still works
await expect(page.locator('h1')).toBeVisible();

await browser.close();

Events & token tracking

sentinel.on() / getTokenUsage() / exportLogs()

events.ts
// Listen to events
sentinel.on('action',   (e) => console.log('action:', e));
sentinel.on('navigate', (e) => console.log('navigated to:', e.url));
sentinel.on('close',    ()  => console.log('browser closed'));

// Token usage
const usage = sentinel.getTokenUsage();
console.log(usage);
// { promptTokens: 12400, completionTokens: 840, totalTokens: 13240, estimatedCost: 0.0018 }

// Export all logs
const logs = sentinel.exportLogs();

| Member | Type | Description |
| --- | --- | --- |
| `on('action', cb)` | `void` | Emitted after every act() call. cb receives an ActionResult. |
| `on('navigate', cb)` | `void` | Emitted on every page navigation. cb receives { url }. |
| `on('close', cb)` | `void` | Emitted when the browser is closed. |
| `getTokenUsage()` | `TokenUsage` | Returns cumulative prompt, completion, and total tokens plus the estimated cost. |
| `exportLogs()` | `LogEntry[]` | Returns all recorded log entries for the current session. |
| `sentinel.page` | `Page` | Direct access to the underlying Playwright Page object. |
| `sentinel.context` | `BrowserContext` | Direct access to the underlying Playwright BrowserContext. |
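The reference doesn't say which prices getTokenUsage() uses for estimatedCost. The usual arithmetic is prompt tokens times the input rate plus completion tokens times the output rate (totalTokens is simply the sum, e.g. 12400 + 840 = 13240 in the example above). The sketch makes that explicit so usage can be re-priced with your own rates; TokenCounts, Pricing, estimateCost, and the prices used are hypothetical placeholders, not your provider's actual rates.

```typescript
// Subset of the fields returned by getTokenUsage().
interface TokenCounts {
  promptTokens: number;
  completionTokens: number;
}

// Hypothetical per-million-token prices in USD; substitute your provider's current rates.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// cost = promptTokens * inputRate + completionTokens * outputRate
function estimateCost(usage: TokenCounts, pricing: Pricing): number {
  return (
    (usage.promptTokens / 1_000_000) * pricing.inputPerMillion +
    (usage.completionTokens / 1_000_000) * pricing.outputPerMillion
  );
}
```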

SentinelOptions

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `apiKey` | `string` | | API key for the selected LLM provider. Required for Gemini and OpenAI. |
| `headless` | `boolean` | `true` | Run the browser in headless mode. |
| `browser` | `'chromium' \| 'firefox' \| 'webkit'` | `'chromium'` | Playwright browser engine. |
| `viewport` | `{ width, height }` | 1280×720 | Browser viewport dimensions. |
| `verbose` | `0 \| 1 \| 2 \| 3` | `0` | Log level. 3 = full LLM reasoning JSON. |
| `enableCaching` | `boolean` | `false` | Enable the locator and prompt caches to cut the cost of repeated actions to near zero. |
| `visionFallback` | `boolean` | `false` | Enable screenshot-based grounding as a fallback for inaccessible elements. |
| `provider` | `LLMProvider` | | Custom LLM provider instance. Overrides apiKey and the default Gemini provider. |
| `sessionPath` | `string` | | File path used to save and restore cookies and localStorage. |
| `userDataDir` | `string` | | Persistent browser profile directory (preserves IndexedDB, Service Workers). |
| `proxy` | `ProxyOptions` | | Proxy configuration. See ProxyOptions. |
| `humanLike` | `boolean` | `false` | Add random delays between actions to simulate human behaviour. |
| `domSettleTimeoutMs` | `number` | `500` | Milliseconds to wait for DOM mutations to settle after each action. |
| `locatorCache` | `ILocatorCache` | in-memory Map | Custom locator cache implementation. |
| `promptCache` | `IPromptCache` | in-memory Map | Custom prompt cache implementation. |
| `maxElements` | `number` | `150` | Maximum number of interactive elements included in each LLM context snapshot. |
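How the package merges these defaults internally isn't documented; the sketch below just makes the documented defaults concrete, for instance to log the effective configuration. OptionsSketch, DEFAULTS, and resolveOptions are hypothetical names, and OptionsSketch mirrors only part of the table; in real code, use the package's own exported options type.

```typescript
// Hypothetical local mirror of part of SentinelOptions.
interface OptionsSketch {
  apiKey?: string;
  headless?: boolean;
  browser?: "chromium" | "firefox" | "webkit";
  viewport?: { width: number; height: number };
  verbose?: 0 | 1 | 2 | 3;
  enableCaching?: boolean;
  visionFallback?: boolean;
  humanLike?: boolean;
  domSettleTimeoutMs?: number;
  maxElements?: number;
}

type ResolvedOptions = Required<Omit<OptionsSketch, "apiKey">> & { apiKey?: string };

// Default values copied from the SentinelOptions table above.
const DEFAULTS: Required<Omit<OptionsSketch, "apiKey">> = {
  headless: true,
  browser: "chromium",
  viewport: { width: 1280, height: 720 },
  verbose: 0,
  enableCaching: false,
  visionFallback: false,
  humanLike: false,
  domSettleTimeoutMs: 500,
  maxElements: 150,
};

// Fills every omitted (or explicitly undefined) option with its documented default.
// The merge is shallow: a provided viewport replaces the default one entirely.
function resolveOptions(opts: OptionsSketch): ResolvedOptions {
  const provided = Object.fromEntries(
    Object.entries(opts).filter(([, value]) => value !== undefined),
  );
  return { ...DEFAULTS, ...provided };
}
```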

ActOptions

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `variables` | `Record<string, string>` | | Variable interpolation map. Keys are used as %key% in instruction strings. |
| `retries` | `number` | `3` | Number of action attempts before throwing ActionError. |
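The table describes retries as the number of attempts made before ActionError is thrown. A generic retry loop with that reading (retries = total attempts, not extra attempts after the first) looks like the sketch below. withRetries and RetryExhaustedError are hypothetical; they only mirror the documented attempt count, not Sentinel's actual retry logic.

```typescript
// Hypothetical error carrying the instruction and attempt count, similar in spirit to ActionError.
class RetryExhaustedError extends Error {
  constructor(
    readonly instruction: string,
    readonly attempts: number,
    readonly lastError: unknown,
  ) {
    super(`"${instruction}" failed after ${attempts} attempt(s)`);
    this.name = "RetryExhaustedError";
  }
}

// Treats `retries` as the total number of attempts (at least one).
// Returns the value together with the attempt that succeeded, like ActionResult.attempts.
async function withRetries<T>(
  instruction: string,
  retries: number,
  attempt: (n: number) => Promise<T>,
): Promise<{ value: T; attempts: number }> {
  const max = Math.max(1, retries);
  let lastError: unknown;
  for (let n = 1; n <= max; n++) {
    try {
      return { value: await attempt(n), attempts: n };
    } catch (err) {
      lastError = err;
    }
  }
  throw new RetryExhaustedError(instruction, max, lastError);
}
```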

ActionResult

| Property | Type | Description |
| --- | --- | --- |
| `success` | `boolean` | Whether the action completed successfully. |
| `action` | `string` | The action type that was executed (e.g. 'click', 'fill'). |
| `instruction` | `string` | The original instruction string. |
| `selector` | `string \| null` | The CSS selector used for the action, if applicable. |
| `message` | `string` | Human-readable outcome message. |
| `attempts` | `number` | Number of attempts made before success or final failure. |

AgentRunOptions

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `maxSteps` | `number` | `20` | Maximum number of agent steps before aborting. |
| `onStep` | `(event: AgentStepEvent) => void` | | Callback invoked after each step with real-time progress. |

AgentStepEvent

| Property | Type | Description |
| --- | --- | --- |
| `step` | `number` | Current step index (1-based). |
| `totalSteps` | `number` | Maximum steps configured. |
| `action` | `ActionResult` | Result of the action taken in this step. |
| `reasoning` | `string` | LLM reasoning for choosing this action. |

AgentResult

| Property | Type | Description |
| --- | --- | --- |
| `success` | `boolean` | True if the agent completed without throwing. |
| `goalAchieved` | `boolean` | True if the LLM confirmed the goal was met. |
| `totalSteps` | `number` | Number of steps executed. |
| `message` | `string` | Summary message from the agent. |
| `history` | `ActionResult[]` | Full list of actions taken. |
| `data` | `Record<string, unknown>` | Structured data extracted during the run. |
| `selectors` | `Record<string, string>` | Map of camelCase keys to CSS selectors for the elements interacted with. |

ParallelTask / ParallelResult / ParallelOptions

ParallelTask

| Property | Type | Description |
| --- | --- | --- |
| `url` | `string` | URL to navigate to before running the goal. |
| `goal` | `string` | Natural-language goal for this browser session. |
| `options` | `AgentRunOptions` | Per-task agent options (maxSteps, onStep). |

ParallelResult

| Property | Type | Description |
| --- | --- | --- |
| `success` | `boolean` | Whether this task completed without error. |
| `data` | `Record<string, unknown>` | Extracted data if the task succeeded. |
| `error` | `string \| null` | Error message if the task failed. |
| `agentResult` | `AgentResult \| null` | Full agent result for detailed inspection. |

ParallelOptions

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `apiKey` | `string` | | LLM API key shared across all parallel sessions. |
| `concurrency` | `number` | `3` | Maximum number of concurrent browser sessions. |
| `headless` | `boolean` | `true` | Headless mode for all sessions. |
| `...SentinelOptions` | | | All other SentinelOptions are forwarded to each session. |

ProxyOptions

| Property | Type | Description |
| --- | --- | --- |
| `server` | `string` | Proxy server URL, e.g. 'http://proxy.example.com:8080'. |
| `username` | `string` | Proxy authentication username. |
| `password` | `string` | Proxy authentication password. |
| `bypass` | `string` | Comma-separated list of hosts to bypass the proxy for. |
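Proxy credentials sometimes arrive as a single URL with inline user info (for example in a hypothetical PROXY_URL environment variable). The sketch below splits such a URL into the separate server, username, and password fields listed above using the standard WHATWG URL class; ProxyOptionsSketch and proxyFromUrl are hypothetical names, not library exports.

```typescript
// Hypothetical local mirror of ProxyOptions.
interface ProxyOptionsSketch {
  server: string;
  username?: string;
  password?: string;
  bypass?: string;
}

// Splits "http://user:pass@host:port" into ProxyOptions-style fields,
// keeping the credentials out of `server`. URL keeps user info percent-encoded,
// so it is decoded here.
function proxyFromUrl(raw: string, bypass?: string): ProxyOptionsSketch {
  const url = new URL(raw);
  const opts: ProxyOptionsSketch = { server: `${url.protocol}//${url.host}` };
  if (url.username) opts.username = decodeURIComponent(url.username);
  if (url.password) opts.password = decodeURIComponent(url.password);
  if (bypass) opts.bypass = bypass;
  return opts;
}
```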

LLMProvider interface

Implement this interface to use any LLM with Sentinel. Pass the instance as provider in SentinelOptions.

llm-provider.ts
interface LLMProvider {
  complete(prompt: string, options?: CompletionOptions): Promise<string>;
  completeJSON<T>(prompt: string, schema: ZodSchema<T>): Promise<T>;
}

See the LLM Providers page for built-in implementations (GeminiProvider, OpenAIProvider, AnthropicProvider, OllamaProvider).
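Because any object implementing LLMProvider can be passed as provider, a scripted stand-in that returns canned responses is one way to unit-test your own provider plumbing without network calls. Driving Sentinel itself with it would require knowing the response formats Sentinel expects, which this page doesn't document. ParserLike, CompletionOptionsSketch, LLMProviderSketch, and ScriptedProvider are hypothetical local types; ParserLike stands in for ZodSchema<T>, whose parse() returns T or throws.

```typescript
// Local stand-ins for the types in the interface above.
interface ParserLike<T> {
  parse(data: unknown): T;
}
type CompletionOptionsSketch = Record<string, unknown>;

interface LLMProviderSketch {
  complete(prompt: string, options?: CompletionOptionsSketch): Promise<string>;
  completeJSON<T>(prompt: string, schema: ParserLike<T>): Promise<T>;
}

// Hypothetical deterministic provider: returns canned responses in order
// and records every prompt it receives.
class ScriptedProvider implements LLMProviderSketch {
  private readonly queue: string[];
  private readonly prompts: string[] = [];

  constructor(responses: readonly string[]) {
    this.queue = [...responses];
  }

  async complete(prompt: string): Promise<string> {
    this.prompts.push(prompt);
    const next = this.queue.shift();
    if (next === undefined) throw new Error("ScriptedProvider: no responses left");
    return next;
  }

  // Parses the raw completion as JSON, then validates it with the schema.
  async completeJSON<T>(prompt: string, schema: ParserLike<T>): Promise<T> {
    return schema.parse(JSON.parse(await this.complete(prompt)));
  }

  get seenPrompts(): readonly string[] {
    return this.prompts;
  }
}
```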

Error classes

All errors extend SentinelError, which extends the native Error class.

Error hierarchy

errors.ts
import {
  SentinelError,
  ActionError,
  ExtractionError,
  NavigationError,
  AgentError,
  NotInitializedError,
} from '@isoldex/sentinel';

try {
  await sentinel.act('click the non-existent button');
} catch (err) {
  if (err instanceof ActionError) {
    console.error('Action failed after', err.attempts, 'attempts');
    console.error('Instruction:', err.instruction);
    console.error('Selector tried:', err.selector);
  }
}
| Class | Description |
| --- | --- |
| `SentinelError` | Base class. All Sentinel errors extend this. |
| `ActionError` | Thrown when act() fails after all retries. Has .instruction, .selector, .attempts. |
| `ExtractionError` | Thrown when extract() cannot parse the LLM response into the requested schema. |
| `NavigationError` | Thrown on navigation failure (timeout, network error, invalid URL). |
| `AgentError` | Thrown when run() exceeds maxSteps without achieving the goal. |
| `NotInitializedError` | Thrown when any method is called before sentinel.init(). |
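Because every specific error is also a SentinelError, instanceof checks should test subclasses before the base class. The sketch below uses local stand-in classes to show that ordering; the real classes come from @isoldex/sentinel, and their constructor signatures are not documented here, so the constructors below (and the classify helper) are hypothetical.

```typescript
// Local stand-ins for part of the documented hierarchy.
class SentinelError extends Error {
  constructor(message: string) {
    super(message);
    this.name = new.target.name; // reports the subclass name, e.g. "ActionError"
  }
}

class ActionError extends SentinelError {
  constructor(
    message: string,
    readonly instruction: string,
    readonly selector: string | null,
    readonly attempts: number,
  ) {
    super(message);
  }
}

class NavigationError extends SentinelError {}

// Check specific subclasses first: an ActionError would also match SentinelError.
function classify(err: unknown): string {
  if (err instanceof ActionError) return `action failed after ${err.attempts} attempts: ${err.instruction}`;
  if (err instanceof NavigationError) return `navigation failed: ${err.message}`;
  if (err instanceof SentinelError) return `sentinel error: ${err.message}`;
  return "unexpected error";
}
```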