The Simulang Primer

A hands-on tour of writing scripts in Simulang. By the end you’ll know how to drive a real browser, fill in a form, click through a UI, take screenshots, ground elements with a vision model, and compose larger workflows out of small reusable steps, all in plain TypeScript or JavaScript. This primer is meant to be read in order; each chapter builds on the one before it. The complete API reference for the underlying library lives in node_modules/@simular-ai/simulib-js/index.d.ts once the package is installed; keep that open in another tab when you’re ready to go deeper than what we cover here.

Table of contents

  1. What is Simulang?
  2. Your first script
  3. Opening an app
  4. Your first automation
  5. The accessibility tree
  6. Acting on elements
  7. Finding the element you want
  8. Waiting for the UI
  9. When things don’t work
  10. A complete example
  11. When accessibility isn’t enough
  12. Files, env vars, and shelling out
  13. Composing scripts
  14. Common pitfalls
  15. Where to go next
  16. Simulang in Claude Code

1. What is Simulang?

Simulang is a small command-line tool for running automation scripts against real desktop applications — your default browser, your editor, a chat app, a native dialog, whatever happens to be running. A Simulang script is an ordinary ES module (a .ts, .mts, or .mjs file) that imports from @simular-ai/simulib-js and drives the OS through its accessibility APIs. If you’ve used Playwright or Puppeteer, the mental model will feel familiar: open a thing, find an element, interact with it, assert what happened. The differences:
  • Simulang isn’t browser-only. The same APIs drive native apps — a messaging client to grab a 2FA code, a finder window, a desktop installer.
  • There’s no headless mode. You’re automating the actual UI a person would see, on the actual screen.
  • Element discovery uses the OS-level accessibility tree, not CSS selectors. That’s how the same script can find a button in Chrome and a menu item in a native macOS app.
This primer assumes you have simulang installed and a desktop you can run scripts against — see the simulang-cli README for install, authentication, and version-pinning details. From here on, we focus on writing and running workflows.

2. Your first script

Create hello.ts:
// hello.ts
const { platform, version } = process
console.log(`Hello from simulang on ${platform}, Node ${version}.`)
Run it:
$ simulang run hello.ts
Hello from simulang on darwin, Node v22.18.0.
That’s it — no boilerplate, no async function main() wrapper. A Simulang script is just an ES module, and the top of the file is the entry point. Top-level await works, dynamic imports work, and the full Node standard library is available.
// hello-async.ts
// Top-level await is fine.
await new Promise((r) => setTimeout(r, 50))
console.log('50 ms later…')

// And so is dynamic import — useful for branching by platform.
const fs = await import('node:fs/promises')
console.log('cwd =', await fs.realpath(process.cwd()))
If you’ve worked with newer Node tooling this will feel completely ordinary. That’s the point.
Recap. A Simulang script is just an ES module. Save a .ts, .mts, or .mjs file. Run it with simulang run. Use process.env, process.exit, and anything else you’d use in a normal Node script.

3. Opening an app

The @simular-ai/simulib-js package gives you everything else. Let’s open example.com in the default browser:
// open-example.ts
import { App, FocusPolicy, Visibility } from '@simular-ai/simulib-js'

const instance = App.defaultBrowser().open(
  'https://example.com',
  FocusPolicy.Steal,        // ① bring the browser to the front
  Visibility.Show,          // ② don't try to hide it
  true,                     // ③ block briefly for the page to start
)

console.log('opened:', instance.pid)
Three things to call out:
  1. FocusPolicy is Steal or DoNotSteal. “Steal” asks the OS to bring the app to the front; “DoNotSteal” asks it not to. The word “ask” is doing real work in that sentence — see the pitfalls chapter.
  2. Visibility is Show or Hidden. Same caveat: it’s a request, not a guarantee, and Chromium-based apps in particular ignore Hidden.
  3. waitForLoadComplete (the trailing boolean) blocks for a short, fixed delay so the app or URL has a chance to become responsive before open() returns. It’s a sleep, not a full network wait — you still need to give the page time to render before reaching for the DOM.
The call returns an Instance — a handle to the running app:
instance.pid              // process ID
instance.isFocused()      // boolean
instance.focus()          // bring it forward
instance.hide()           // minimise
instance.isAccessible()   // does the OS expose this app's accessibility tree?
instance.enableAccessibility()
If you want to launch a specific app (not the system default), use App.exactName():
App.exactName('Google Chrome').open(
  null,                            // no URL — just raise the app
  FocusPolicy.Steal,
  Visibility.Show,
  false,
)
open(null, …) is the canonical way to raise an already-running app to the foreground without changing its state.

4. Your first automation

Time to make something happen. Save this as wikipedia-stats.ts:
// wikipedia-stats.ts — run with: simulang run wikipedia-stats.ts
import {
  AccessibilityTree,
  App,
  AriaRole,
  FocusPolicy,
  Visibility,
  type AccessibilityNodeJs,
} from '@simular-ai/simulib-js'

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms))

function flattenDFS(
  node: AccessibilityNodeJs,
  out: AccessibilityNodeJs[] = [],
): AccessibilityNodeJs[] {
  out.push(node)
  for (const child of node.children) flattenDFS(child, out)
  return out
}

const instance = App.defaultBrowser().open(
  'https://en.wikipedia.org', FocusPolicy.Steal, Visibility.Show, true,
)
await sleep(2500)
if (!instance.isAccessible()) instance.enableAccessibility()

const tree = AccessibilityTree.fromPid(instance.pid)
const link = flattenDFS(tree.snapshot(true)).find(
  (n) =>
    n.role === AriaRole.Link &&
    /special:statistics/i.test(n.description) &&
    n.refId != null,
)
if (!link) throw new Error('Could not find the Statistics link.')

tree.activate(link.refId)
console.log('Opened the Wikipedia statistics page.')
Run it. Wikipedia opens, Simulang walks the page’s accessibility tree, finds the Statistics link, clicks it, and your browser jumps to Wikipedia’s Special:Statistics page.
Why n.description and not n.name? Web AX on macOS commonly exposes link text in the description field with name left empty. Other platforms differ, which is exactly the kind of cross-platform variability chapter 7 addresses with a labelOf helper that checks every field a label might live in.
macOS first-run prompt. The first time a Simulang script touches another app, the OS will prompt you to grant accessibility permission to the terminal you ran it from (System Settings → Privacy & Security → Accessibility). Grant it once and you’re set.
That’s the whole Simulang loop in twenty-odd lines. The shape is the same for every script you’ll write, from this demo to a multi-step purchase flow:
  1. Open or attach to an app: App.defaultBrowser().open (here), or AccessibilityTree.fromPid to drive something already running.
  2. Snapshot the accessibility tree: tree.snapshot(true) returns the visible UI as a tree of nodes.
  3. Walk the tree to find the node you want — any standard array or object operation works, since the snapshot is plain data.
  4. Act on its refId: tree.activate(refId) for clicks, tree.setValue(refId, text) for typing, and a handful of others.
Don’t worry about every line yet; the next few chapters break it down. The takeaway is that you’ve now seen the four moves that make up every Simulang script.
A note on TypeScript. This primer uses .ts so we can show type annotations where they’re informative; .mjs works just as well.

5. The accessibility tree

You just used the accessibility tree without an introduction. Let’s back up and look at what it is. The accessibility tree is a snapshot of every visible widget in a window — the same data screen readers use to read a UI aloud. Every OS exposes one, and @simular-ai/simulib-js gives you a uniform Node API over it.
import { AccessibilityTree } from '@simular-ai/simulib-js'

// Bind to the foreground window of whichever app is in front right now.
const tree = AccessibilityTree.fromForeground()
console.log('Bound to window:', tree.windowTitle)
You can also bind to a specific window by process ID — handy when you just opened the app yourself and have an Instance.pid (chapter 4 used this form):
const tree = AccessibilityTree.fromPid(instance.pid)
Once you’ve got a tree, take a snapshot:
const root = tree.snapshot(true)   // true = visible-only; almost always what you want
The shape of every node looks like this:
interface AccessibilityNodeJs {
  role: AriaRole            // numeric enum: Button = 7, Link = 32, ...
  name: string              // accessible name
  value: string             // current value (for inputs)
  description: string       // platform-specific extra
  helpText: string          // tooltip-style hint
  refId?: number            // opaque handle for interactions (more below)
  children: AccessibilityNodeJs[]
  boundingBox: { x, y, width, height, ... }
}
You walk this tree like any other object graph:
function flattenDFS(node, out = []) {
  out.push(node)
  for (const child of node.children) flattenDFS(child, out)
  return out
}

for (const node of flattenDFS(root).slice(0, 10)) {
  console.log(node.role, JSON.stringify(node.name).slice(0, 60))
}
The most important property is refId. It’s an opaque integer that identifies a node for this snapshot only. You pass it to interaction methods like tree.activate(refId) to click. Two crucial properties:
  • A refId is valid only as long as the snapshot that produced it. Take a new snapshot, and the old refIds are dead. We’ll see how to cope with that in chapter 8.
  • A node only has a refId when the OS exposed one. Some structural wrappers (groups, panels) don’t have refIds; you can still see them in the tree, but you can’t click them. Always check refId != null before acting.
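To make the second point concrete, here is a minimal sketch that walks a hand-built tree of plain objects and keeps only the nodes that can be acted on. The AxNode type, the sample tree, the placeholder group role, and the actionable helper are assumptions made for illustration, not library exports; a real snapshot carries more fields.

```typescript
// Stand-in for AccessibilityNodeJs, reduced to the fields this sketch uses.
interface AxNode {
  role: number
  name: string
  refId?: number
  children: AxNode[]
}

const ROLE_BUTTON = 7          // AriaRole.Button, as listed in this chapter
const ROLE_LINK = 32           // AriaRole.Link, as listed in this chapter
const ROLE_GROUP_DEMO = 999    // placeholder number, for this demo only

function flattenDFS(node: AxNode, out: AxNode[] = []): AxNode[] {
  out.push(node)
  for (const child of node.children) flattenDFS(child, out)
  return out
}

// Keep only nodes that carry a refId, i.e. nodes you could pass to tree.activate().
function actionable(rootNode: AxNode): AxNode[] {
  return flattenDFS(rootNode).filter((n) => n.refId != null)
}

// A wrapper named "Save" with no refId, containing a real "Save" button.
const sampleTree: AxNode = {
  role: ROLE_GROUP_DEMO, name: 'Save', children: [
    { role: ROLE_BUTTON, name: 'Save', refId: 11, children: [] },
    { role: ROLE_LINK, name: 'Help', refId: 12, children: [] },
  ],
}

const saveHits = actionable(sampleTree).filter((n) => n.name === 'Save')
console.log(saveHits.map((n) => n.refId))   // only the button survives the filter
```

Searching by name alone would have matched the wrapper first; filtering on refId leaves just the button.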
The roles are an enum called AriaRole. The common ones:
import { AriaRole } from '@simular-ai/simulib-js'

AriaRole.Button       // 7
AriaRole.Checkbox     // 10
AriaRole.Dialog       // 18
AriaRole.Document     // 20  — the root of a web page
AriaRole.Heading      // 29
AriaRole.Img          // 30
AriaRole.Link         // 32
AriaRole.MenuItem     // 43
AriaRole.Radio        // 54
AriaRole.Tab          // 72
AriaRole.Textbox      // 78
Heads up. AriaRole is a numeric enum. The TypeScript reverse-mapping trick — AriaRole[someNode.role] — does not work for numeric NAPI enums. Use the exported helper instead:
import { ariaRoleToString } from '@simular-ai/simulib-js'
ariaRoleToString(node.role)  // "button"
Recap. Bind to a window → take a snapshot → walk the tree to find the node you want → act on its refId. The rest of this primer elaborates on that loop.

6. Acting on elements

Once you have a refId, the AccessibilityTree object exposes a small family of action methods:
tree.activate(refId)         // "click" — works for buttons, links, menuitems
tree.setValue(refId, text)   // type into a textbox / combobox
tree.toggle(refId)           // flip a checkbox or switch
tree.select(refId)           // choose a tab, radio button, or list item
tree.expandCollapse(refId)   // open/close a dropdown or tree item
tree.scrollIntoView(refId)   // bring the element on-screen
tree.focusElement(refId)     // focus (also raises the window)
tree.getBounds(refId)        // current screen coordinates
In practice you’ll spend most of your time with activate and setValue. The rest are special-purpose helpers for cases where a plain click doesn’t do the right thing.
Sync, not async. Every action method on AccessibilityTree is synchronous: they don’t return Promises, and you don’t await them. The await you saw in chapter 4 was for sleep, not for tree.activate. tree.snapshot() is also synchronous. The only things you await in a typical script are sleeps, polling helpers like withSnapshot (chapter 8), and Node stdlib promises (fetch, node:fs/promises).
You already saw activate in chapter 4. Filling a form looks the same. Find the input, set its value, find the submit button, and activate it:
const nodes = flattenDFS(tree.snapshot(true))

const searchBox = nodes.find(
  (n) => n.role === AriaRole.Textbox && /search/i.test(n.name) && n.refId != null,
)
const searchBtn = nodes.find(
  (n) => n.role === AriaRole.Button && /^search$/i.test(n.name) && n.refId != null,
)
if (!searchBox || !searchBtn) throw new Error('search controls not found')

tree.setValue(searchBox.refId, 'simulang')   // ① type into the box
tree.activate(searchBtn.refId)               // ② click submit
Two details worth noticing — they apply to every action method:
  1. The predicate checks n.refId != null. Without that, you might match a structural wrapper that has the right name but no way to be acted on.
  2. Both setValue and activate use refIds from the same snapshot. As long as no new snapshot has happened between them, the refs stay valid. After any navigation or UI mutation, you’ll need a fresh snapshot — chapter 8 shows the pattern.
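In TypeScript, the refId check can double as a type guard, so the compiler knows refId is a number by the time you act on it. The sketch below is one way to do that over a flat list of plain objects. AxNode, Actionable, hasRefId, and findActionable are hypothetical names for this example, not part of the library.

```typescript
// Reduced stand-in for a snapshot node; real nodes carry more fields.
interface AxNode { role: number; name: string; refId?: number }
type Actionable = AxNode & { refId: number }

const ROLE_BUTTON = 7     // AriaRole.Button, per chapter 5
const ROLE_TEXTBOX = 78   // AriaRole.Textbox, per chapter 5

// Type guard: narrows refId from `number | undefined` to `number`.
const hasRefId = (n: AxNode): n is Actionable => n.refId != null

// Find the first actionable node with the given role whose name matches,
// or throw an error that says what was being looked for.
function findActionable(nodes: AxNode[], role: number, pattern: RegExp): Actionable {
  const hit = nodes.filter(hasRefId).find((n) => n.role === role && pattern.test(n.name))
  if (!hit) throw new Error(`No actionable node with role ${role} matching ${pattern}`)
  return hit
}

const flatNodes: AxNode[] = [
  { role: ROLE_TEXTBOX, name: 'Search' },                      // wrapper, no refId
  { role: ROLE_TEXTBOX, name: 'Search Wikipedia', refId: 4 },
  { role: ROLE_BUTTON, name: 'Search', refId: 5 },
]

const searchBox = findActionable(flatNodes, ROLE_TEXTBOX, /search/i)
const searchBtn = findActionable(flatNodes, ROLE_BUTTON, /^search$/i)
console.log(searchBox.refId, searchBtn.refId)   // both are typed as number
```

Throwing inside the helper keeps the call site free of undefined checks, at the cost of a less specific message than a hand-written one.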

7. Finding the element you want

Real UIs make element discovery harder than “find by name.” Here are a few practical patterns we use again and again.
A note on the helpers in this chapter. labelOf, stripPUA, flattenDFS, and pageNodes below (and withSnapshot in the next chapter) are not exported from @simular-ai/simulib-js. They’re a handful of lines each, and copying them into each script is faster than reaching for a dependency. The upside of script-local helpers is that you can tweak them when a project needs something different (a labelOf that includes automationId, a pageNodes scoped to a specific app). If you find yourself maintaining a project-wide variant, factor it into a local lib/ directory and import it normally.

7.1 The label can live in any of four fields

Different platforms put a control’s accessible text in different places:
  • Native macOS apps usually put it in name.
  • Text inputs put the current value in value (not name).
  • Web AX on macOS often puts link text in description.
  • Tooltips show up in helpText.
A single label-helper covers all four:
const labelOf = (n) =>
  [n.name, n.value, n.description, n.helpText].filter(Boolean).join(' ').trim()
Now labelOf(node) is the answer to “what would a screen reader say about this element?”, regardless of which field the OS chose.

7.2 Strip Private-Use-Area glyphs

Web pages frequently put icon-font glyphs (Font Awesome, Material Icons) into link labels. These come through as characters in Unicode’s Private Use Area (U+E000 to U+F8FF), and they break anchored regexes:
// node.name might be "\uE000 Logout" (a Private-Use-Area icon glyph, then the text)
/^logout$/i.test(node.name)         // false ☹
Strip them:
const stripPUA = (s) => s.replace(/[\uE000-\uF8FF]/g, '').replace(/\s+/g, ' ').trim()
const labelOf = (n) => stripPUA([n.name, n.value, n.description, n.helpText].filter(Boolean).join(' '))
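As a quick check of how the two helpers behave together, the sketch below runs typed versions of them over three hand-written nodes, each keeping its text in a different field. The Labelled type and the sample nodes are assumptions made for this example; U+F08B simply stands in for an arbitrary icon-font glyph.

```typescript
// Only the four label-bearing fields matter for this sketch.
interface Labelled {
  name: string
  value: string
  description: string
  helpText: string
}

// Remove Private-Use-Area code points (U+E000 to U+F8FF), then tidy whitespace.
const stripPUA = (s: string): string =>
  s.replace(/[\uE000-\uF8FF]/g, '').replace(/\s+/g, ' ').trim()

// Join whichever fields are non-empty and clean the result.
const labelOf = (n: Labelled): string =>
  stripPUA([n.name, n.value, n.description, n.helpText].filter(Boolean).join(' '))

// Three hypothetical nodes, each keeping its text in a different field.
const nativeButton: Labelled = { name: 'Sign in', value: '', description: '', helpText: '' }
const webLink: Labelled = { name: '', value: '', description: '\uF08B Logout', helpText: '' }
const searchInput: Labelled = { name: '', value: 'simulang', description: '', helpText: 'Search' }

console.log(labelOf(nativeButton))                // "Sign in"
console.log(/^logout$/i.test(labelOf(webLink)))   // true, the glyph is gone
console.log(labelOf(searchInput))                 // "simulang Search"
```

Note that labelOf concatenates fields, so an anchored regex can fail when two fields are populated at once; match without anchors, or test a single field, in that case.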

7.3 Scope to the page, not the chrome

A browser snapshot includes the URL bar, tab strip, bookmarks bar, dev-tools panes — none of which you usually want when you’re looking for an “Add to cart” button. The convention is to flatten only the last AriaRole.Document subtree, which is the active tab’s web content:
function pageNodes(root) {
  const all = flattenDFS(root)
  const docs = all.filter((n) => n.role === AriaRole.Document)
  if (!docs.length) return all                       // not a browser — full tree
  return flattenDFS(docs[docs.length - 1])            // last = active tab
}
Then use pageNodes(root) instead of flattenDFS(root) when you’re hunting for something inside the page.
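The difference is easiest to see on a small fake window. The sketch below builds a hand-written tree with a toolbar and one Document, then runs the same name search over the full tree and over the page only. The AxNode type, the placeholder role 0, and the sample tree are assumptions for illustration.

```typescript
interface AxNode { role: number; name: string; refId?: number; children: AxNode[] }

const ROLE_BUTTON = 7      // AriaRole.Button, per chapter 5
const ROLE_DOCUMENT = 20   // AriaRole.Document, per chapter 5
const ROLE_OTHER = 0       // placeholder role, for this demo only

function flattenDFS(node: AxNode, out: AxNode[] = []): AxNode[] {
  out.push(node)
  for (const child of node.children) flattenDFS(child, out)
  return out
}

// Same idea as pageNodes above: the last Document subtree, else the full tree.
function pageNodes(rootNode: AxNode): AxNode[] {
  const all = flattenDFS(rootNode)
  const docs = all.filter((n) => n.role === ROLE_DOCUMENT)
  const last = docs[docs.length - 1]
  return last ? flattenDFS(last) : all
}

const leaf = (role: number, name: string, refId: number): AxNode =>
  ({ role, name, refId, children: [] })

// Hypothetical browser window: a toolbar button first, then the page content.
const browserWindow: AxNode = {
  role: ROLE_OTHER, name: 'window', children: [
    { role: ROLE_OTHER, name: 'toolbar', children: [leaf(ROLE_BUTTON, 'Save bookmark', 1)] },
    { role: ROLE_DOCUMENT, name: 'Cart', children: [leaf(ROLE_BUTTON, 'Save for later', 2)] },
  ],
}

const isSave = (n: AxNode): boolean => /save/i.test(n.name)
console.log(flattenDFS(browserWindow).find(isSave)?.name)   // the toolbar button wins
console.log(pageNodes(browserWindow).find(isSave)?.name)    // the in-page button wins
```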

7.4 Use the built-in search when it fits

AccessibilityTree also exposes a search method that mirrors the common case:
import { TraversalOrder } from '@simular-ai/simulib-js'

const hits = tree.find(
  TraversalOrder.DepthFirst,
  AriaRole.Button,             // role filter (optional)
  'Submit',                    // name contains (optional)
  true,                        // visibleOnly
  1,                           // maxResults
)
if (hits[0]) tree.activate(hits[0].refId)
Use the built-in find for “fast path, one role, one name match.” Drop back to manual DFS when you need spatial reasoning, for example “the second textbox that appears between the heading ‘Sign in’ and the button ‘Continue’.”

8. Waiting for the UI

Pages mutate. Modals appear. The element you want shows up half a second after a click. Because refIds invalidate every time you call snapshot(), the safe pattern is a polling loop that re-snapshots each tick and looks for what you want:
async function withSnapshot(
  tree,
  predicate,
  { timeoutMs = 8000, intervalMs = 250, label = 'node' } = {},
) {
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    const root = tree.snapshot(true)               // ① fresh refIds every tick
    const hit = predicate(flattenDFS(root))        // ② caller decides "found"
    if (hit) return hit                            // ③ first match wins
    await sleep(intervalMs)
  }
  throw new Error(`Timed out waiting for ${label}`)
}
Used at the call site:
const link = await withSnapshot(
  tree,
  (nodes) => nodes.find(
    (n) => n.role === AriaRole.Link && /^logout$/i.test(labelOf(n)) && n.refId != null,
  ),
  { label: '"Logout" link' },
)
tree.activate(link.refId)
Three reasons this pattern shows up in every nontrivial script:
  1. The element might not be there yet — page is still loading, animation is mid-flight, modal hasn’t opened.
  2. The element might be there but its refId is from a now-stale snapshot. Polling re-issues the snapshot for you.
  3. The named label gives you a useful error message when the wait genuinely times out: “Timed out waiting for the Allow button in the blocked-download panel” is much nicer to read at 3am than “Timed out waiting for node.”
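To watch the polling behaviour without a real app, the sketch below runs a typed withSnapshot against a fake tree whose “Continue” button only appears on the third snapshot, with a different refId on every call. SnapshotSource, makeFakeTree, and demo are hypothetical names invented here; only the polling loop mirrors the helper above.

```typescript
interface AxNode { role: number; name: string; refId?: number; children: AxNode[] }
interface SnapshotSource { snapshot(visibleOnly: boolean): AxNode }

const ROLE_BUTTON = 7   // AriaRole.Button, per chapter 5
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

function flattenDFS(node: AxNode, out: AxNode[] = []): AxNode[] {
  out.push(node)
  for (const child of node.children) flattenDFS(child, out)
  return out
}

async function withSnapshot<T>(
  source: SnapshotSource,
  predicate: (nodes: AxNode[]) => T | null | undefined | false,
  { timeoutMs = 8000, intervalMs = 250, label = 'node' } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    const hit = predicate(flattenDFS(source.snapshot(true)))   // fresh refIds every tick
    if (hit) return hit
    await sleep(intervalMs)
  }
  throw new Error(`Timed out waiting for ${label}`)
}

// Fake tree: the button shows up on snapshot number `appearsAt`, and its
// refId is derived from the call count so it changes with every snapshot.
function makeFakeTree(appearsAt: number) {
  let calls = 0
  return {
    callCount: (): number => calls,
    snapshot(_visibleOnly: boolean): AxNode {
      calls += 1
      const rootNode: AxNode = { role: 0, name: 'window', children: [] }
      if (calls >= appearsAt) {
        rootNode.children.push({ role: ROLE_BUTTON, name: 'Continue', refId: calls * 100, children: [] })
      }
      return rootNode
    },
  }
}

async function demo(): Promise<{ refId: number | undefined; calls: number }> {
  const fakeTree = makeFakeTree(3)
  const button = await withSnapshot(
    fakeTree,
    (nodes) => nodes.find((n) => n.role === ROLE_BUTTON && n.name === 'Continue' && n.refId != null),
    { timeoutMs: 2000, intervalMs: 10, label: '"Continue" button' },
  )
  return { refId: button.refId, calls: fakeTree.callCount() }
}

demo().then((result) => console.log(result))   // the refId belongs to the third snapshot
```

The returned node always comes from the most recent snapshot, which is why acting on it immediately after the await is safe.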
A predicate doesn’t have to return a single node. When you need several pieces of state to all be visible at once, return a structured object:
const { emailField, passField, loginBtn } = await withSnapshot(
  tree,
  (nodes) => {
    const headingIdx = nodes.findIndex((n) => /sign in/i.test(labelOf(n)))
    if (headingIdx === -1) return null
    const btnIdx = nodes.findIndex(
      (n, i) => i > headingIdx && n.role === AriaRole.Button && /sign in/i.test(labelOf(n)),
    )
    if (btnIdx === -1) return null
    const inputs = nodes
      .slice(headingIdx, btnIdx)
      .filter((n) => n.role === AriaRole.Textbox && n.refId != null)
    if (inputs.length < 2) return null
    return { emailField: inputs[0], passField: inputs[1], loginBtn: nodes[btnIdx] }
  },
  { label: 'sign-in form (email + password + button)' },
)
This is also how you handle login forms whose inputs have no accessible label at all: locate them positionally, between two well-labelled landmarks.
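If you use positional lookups often, the slice-between-landmarks step can be pulled into a small helper. The sketch below is one possible shape, run over a flat list of plain objects. nodesBetween and the sample nodes are hypothetical and not part of the library.

```typescript
interface AxNode { role: number; name: string; refId?: number }

const ROLE_BUTTON = 7     // AriaRole.Button, per chapter 5
const ROLE_HEADING = 29   // AriaRole.Heading, per chapter 5
const ROLE_TEXTBOX = 78   // AriaRole.Textbox, per chapter 5

// Return the nodes strictly between the first `start` match and the first
// `end` match that follows it, or null when either landmark is missing.
function nodesBetween(
  nodes: AxNode[],
  start: (n: AxNode) => boolean,
  end: (n: AxNode) => boolean,
): AxNode[] | null {
  const startIdx = nodes.findIndex(start)
  if (startIdx === -1) return null
  const endIdx = nodes.findIndex((n, i) => i > startIdx && end(n))
  if (endIdx === -1) return null
  return nodes.slice(startIdx + 1, endIdx)
}

// Flattened page in DFS order: a site-wide search box, then the sign-in form
// whose two inputs have no accessible label at all.
const flatPage: AxNode[] = [
  { role: ROLE_TEXTBOX, name: 'Search', refId: 1 },
  { role: ROLE_HEADING, name: 'Sign in' },
  { role: ROLE_TEXTBOX, name: '', refId: 2 },
  { role: ROLE_TEXTBOX, name: '', refId: 3 },
  { role: ROLE_BUTTON, name: 'Sign in', refId: 4 },
]

const formInputs = (nodesBetween(
  flatPage,
  (n) => n.role === ROLE_HEADING && /sign in/i.test(n.name),
  (n) => n.role === ROLE_BUTTON && /sign in/i.test(n.name),
) ?? []).filter((n) => n.role === ROLE_TEXTBOX && n.refId != null)

console.log(formInputs.map((n) => n.refId))   // the search box is excluded
```

Returning null for a missing landmark fits withSnapshot well: the predicate can pass the null straight through and the loop keeps polling.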

9. When things don’t work

Sooner or later you’ll get Timed out waiting for "Continue" button, or your find will silently return undefined. Here’s the troubleshooting playbook.

9.1 Print the tree

The fastest way to debug a missing element is to dump what is there:
import { ariaRoleToString } from '@simular-ai/simulib-js'

const lines = flattenDFS(tree.snapshot(true))
  .filter((n) => n.refId != null && labelOf(n))
  .map((n) => `${ariaRoleToString(n.role).padEnd(12)} ${JSON.stringify(labelOf(n)).slice(0, 60)}`)
console.log(lines.join('\n'))
Drop that wherever your script is stuck. It prints every actionable element on the page with its role and accessible label. Two outcomes:
  • The element isn’t in the list. The page hasn’t rendered it yet. Sleep longer, or — better — wrap the lookup in withSnapshot so it polls.
  • It is in the list, but your predicate isn’t matching. Almost always: the label is in description or value instead of name, or there’s a stray icon-font glyph you didn’t strip. See chapter 7.
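When the flat listing is not enough, an indented dump that keeps the nesting and marks nodes without a refId can show which wrapper a predicate matched by mistake. The sketch below is one way to write it over plain objects. dumpTree, roleName, and the sample tree are assumptions for this example, and roleName is only a local stand-in for ariaRoleToString.

```typescript
interface AxNode {
  role: number
  name: string
  description: string
  refId?: number
  children: AxNode[]
}

// Local stand-in for ariaRoleToString, covering only the roles in this demo.
const ROLE_NAMES: Record<number, string> = { 7: 'button', 20: 'document', 32: 'link' }
const roleName = (role: number): string => ROLE_NAMES[role] ?? `role#${role}`

// One line per node, indented two spaces per level of depth.
function dumpTree(node: AxNode, depth = 0, lines: string[] = []): string[] {
  const label = [node.name, node.description].filter(Boolean).join(' ')
  const marker = node.refId != null ? `#${node.refId}` : '(no refId)'
  lines.push(`${'  '.repeat(depth)}${roleName(node.role)} ${JSON.stringify(label)} ${marker}`)
  for (const child of node.children) dumpTree(child, depth + 1, lines)
  return lines
}

// Hypothetical page: the link keeps its text in description, not name.
const pageRoot: AxNode = {
  role: 20, name: 'Home', description: '', children: [
    { role: 32, name: '', description: 'Logout', refId: 5, children: [] },
    { role: 7, name: 'Search', description: '', refId: 6, children: [] },
  ],
}

const dumped = dumpTree(pageRoot)
console.log(dumped.join('\n'))
```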

9.2 Common root causes

A handful of issues account for most “where did my element go” problems:
  • The label is in description, value, or helpText, not name. Use labelOf (chapter 7) instead of node.name directly.
  • The element has no refId. Structural wrappers (groups, panes) often share a name with their actionable children but don’t get a refId of their own. Always filter with n.refId != null in your predicate.
  • You’re looking inside the wrong subtree. Browser chrome (URL bar, bookmarks bar) lives at the top of the snapshot; the active page is inside the last AriaRole.Document. Use pageNodes (chapter 7) when you only want page content.
  • The page renders after your snapshot. Wrap the find in withSnapshot rather than bumping a brittle sleep.
  • The element appears, then disappears. SPAs sometimes re-mount nodes mid-update. withSnapshot retries on a fresh snapshot each tick; bare find doesn’t.
  • You’re bound to the wrong window. AccessibilityTree.fromForeground() binds to whichever app is frontmost at the moment of the call. If a notification stole focus during your sleep, you’ll get its tree instead. fromPid(instance.pid) is the safer choice when you have a pid.

9.3 Iterate in the REPL

When you’re stuck on a predicate, simulang run -i is much faster than editing-and-rerunning the whole script. Bring the target app to the front first, then:
$ simulang run -i
> const tree = AccessibilityTree.fromForeground()
> const root = tree.snapshot(true)
> flattenDFS(root).filter((n) => n.role === AriaRole.Button).map((n) => n.name)
[ 'Sign in', 'Cancel', '' ]
fromForeground() binds to whatever’s active at the moment of the call — that’s why the app needs to be focused before you run it.

10. A complete example

Let’s tie everything together. The script below opens automationexercise.com (a public test site), confirms we’re logged out, signs in with credentials from environment variables, and signs out again. We verify each state transition through the accessibility tree.
// login-and-out.ts
import {
  AccessibilityTree,
  App,
  AriaRole,
  FocusPolicy,
  Visibility,
} from '@simular-ai/simulib-js'

const email = process.env.SITE_EMAIL
const password = process.env.SITE_PASSWORD
if (!email || !password) {
  console.error('Set SITE_EMAIL and SITE_PASSWORD before running.')
  process.exit(1)
}

const sleep = (ms) => new Promise((r) => setTimeout(r, ms))
const stripPUA = (s) => s.replace(/[\uE000-\uF8FF]/g, '').replace(/\s+/g, ' ').trim()
const labelOf = (n) => stripPUA([n.name, n.value, n.description, n.helpText].filter(Boolean).join(' '))

function flattenDFS(node, out = []) {
  out.push(node); for (const c of node.children) flattenDFS(c, out); return out
}

async function withSnapshot(tree, predicate, opts = {}) {
  const { timeoutMs = 8000, intervalMs = 250, label = 'node' } = opts
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    const root = tree.snapshot(true)
    const hit = predicate(flattenDFS(root))
    if (hit) return hit
    await sleep(intervalMs)
  }
  throw new Error(`Timed out waiting for ${label}`)
}

const isLink = (n, re) => n.role === AriaRole.Link && re.test(labelOf(n)) && n.refId != null
const findLink = (nodes, re) => nodes.find((n) => isLink(n, re))
const SIGNUP_LOGIN = /^signup \/ login$/i
const LOGOUT = /^logout$/i

try {
  // 1. Open the site.
  const instance = App.defaultBrowser().open(
    'https://automationexercise.com', FocusPolicy.Steal, Visibility.Show, true,
  )
  await sleep(2500)
  if (!instance.isAccessible()) instance.enableAccessibility()
  const tree = AccessibilityTree.fromPid(instance.pid)

  // 2. Make sure we're logged out (self-heal from stale sessions).
  const navState = await withSnapshot(
    tree,
    (nodes) => {
      const logout = findLink(nodes, LOGOUT)
      if (logout) return { kind: 'logged-in', logout }
      if (findLink(nodes, SIGNUP_LOGIN)) return { kind: 'logged-out' }
      return null
    },
    { label: 'nav with Signup/Login or Logout' },
  )
  if (navState.kind === 'logged-in') {
    tree.activate(navState.logout.refId)
    await withSnapshot(
      tree,
      (nodes) => findLink(nodes, SIGNUP_LOGIN) && !nodes.some((n) => isLink(n, LOGOUT)),
      { label: 'logged-out state' },
    )
  }

  // 3. Click "Signup / Login".
  const loginLink = await withSnapshot(tree, (nodes) => findLink(nodes, SIGNUP_LOGIN), {
    label: '"Signup / Login" link',
  })
  tree.activate(loginLink.refId)

  // 4. Find the login form positionally and fill it.
  const { emailField, passField, loginBtn } = await withSnapshot(
    tree,
    (nodes) => {
      const hIdx = nodes.findIndex((n) => /^login to your account$/i.test(labelOf(n)))
      if (hIdx === -1) return null
      const btnIdx = nodes.findIndex(
        (n, i) => i > hIdx && n.role === AriaRole.Button && /^login$/i.test(labelOf(n)),
      )
      if (btnIdx === -1) return null
      const inputs = nodes.slice(hIdx, btnIdx).filter(
        (n) => n.role === AriaRole.Textbox && n.refId != null,
      )
      if (inputs.length < 2) return null
      return { emailField: inputs[0], passField: inputs[1], loginBtn: nodes[btnIdx] }
    },
    { label: 'login form' },
  )
  tree.setValue(emailField.refId, email)
  tree.setValue(passField.refId, password)
  tree.activate(loginBtn.refId)

  // 5. Confirm login by waiting for the Logout link, then click it.
  const logoutLink = await withSnapshot(tree, (nodes) => findLink(nodes, LOGOUT), {
    label: 'Logout link', timeoutMs: 10000,
  })
  tree.activate(logoutLink.refId)

  // 6. Confirm logout.
  await withSnapshot(
    tree,
    (nodes) => findLink(nodes, SIGNUP_LOGIN) && !nodes.some((n) => isLink(n, LOGOUT)),
    { label: 'logged-out confirmation' },
  )
  console.log('done.')
} catch (error) {
  console.error('Failed:', error instanceof Error ? error.message : error)
  process.exit(1)
}
Most real scripts are some variant of this shape:
  • Imports at the top, secrets from process.env.
  • A handful of small helpers (sleep, labelOf, flattenDFS, withSnapshot) — the chapter-7 note about script-local helpers applies here.
  • One big try/catch with process.exit(1) on failure.
  • Numbered phases, each one a withSnapshot that names what it’s waiting for. The names double as breadcrumbs in the logs.

11. When accessibility isn’t enough

The accessibility tree is the right tool nine times out of ten, but not always. Two escape hatches:

11.1 Hardware mouse and keyboard

When a control has no refId — a canvas, an embedded video player, an OS-level dialog the app doesn’t expose — drop to real coordinates and key events:
import {
  MouseController, KeyboardController,
  Button, Direction, Coordinate, Key,
} from '@simular-ai/simulib-js'

const mouse = new MouseController()
mouse.moveMouse(640, 360, Coordinate.Abs)     // absolute screen pixels
mouse.button(Button.Left, Direction.Click)    // press + release

const kb = new KeyboardController()
kb.text('hello world')                        // types via the OS input layer
kb.key(Key.Enter)                             // a single named key
Coordinate.Abs is absolute pixels; Coordinate.Rel is movement relative to the current cursor position. For drag-and-drop, send Direction.Press, move, then Direction.Release.
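The press, move, release sequence can be wrapped in a small helper. The sketch below is written against a structural MouseLike interface and exercised with a recorder, so it runs without touching the real cursor. The constants are local stand-ins that mirror the names above; their values and the parameter types are assumptions, so check index.d.ts before passing a real MouseController. The intermediate moves are a choice of this sketch, not something the library is known to require.

```typescript
// Local stand-ins mirroring the names used above. In a real script, import
// Button, Direction, and Coordinate from '@simular-ai/simulib-js' instead.
const Button = { Left: 'Left' } as const
const Direction = { Press: 'Press', Release: 'Release' } as const
const Coordinate = { Abs: 'Abs' } as const

// The two methods this helper needs, typed loosely for the demo.
interface MouseLike {
  moveMouse(x: number, y: number, mode: string): void
  button(which: string, direction: string): void
}
interface Point { x: number; y: number }

// Press at `from`, glide to `to` in `steps` equal moves, then release.
function drag(mouse: MouseLike, from: Point, to: Point, steps = 4): void {
  mouse.moveMouse(from.x, from.y, Coordinate.Abs)
  mouse.button(Button.Left, Direction.Press)
  for (let i = 1; i <= steps; i++) {
    const t = i / steps
    mouse.moveMouse(
      Math.round(from.x + (to.x - from.x) * t),
      Math.round(from.y + (to.y - from.y) * t),
      Coordinate.Abs,
    )
  }
  mouse.button(Button.Left, Direction.Release)
}

// A recorder that logs calls instead of moving the real cursor.
const callLog: string[] = []
const recorder: MouseLike = {
  moveMouse: (x, y) => { callLog.push(`move ${x},${y}`) },
  button: (_which, direction) => { callLog.push(direction) },
}

drag(recorder, { x: 100, y: 100 }, { x: 300, y: 200 }, 2)
console.log(callLog.join(' -> '))
```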

11.2 Vision grounding

When you can see the element but neither the OS nor your regex can find it, take a screenshot and ask a vision model where it is:
import {
  Screen, screenshotFull, GroundingModel,
  MouseController, Button, Direction, Coordinate,
} from '@simular-ai/simulib-js'

const mouse = new MouseController()
const shot = screenshotFull(true, Screen.mainScreen())   // true = hide the cursor
const [x, y] = shot.ground(GroundingModel.default(), 'the blue Continue button')

mouse.moveMouse(x, y, Coordinate.Abs)
mouse.button(Button.Left, Direction.Click)
shot.ground(model, query) returns absolute screen coordinates for whatever the model thinks best matches your natural-language query. A few practical notes:
  • Pass true to screenshotFull to hide the cursor — otherwise the model may describe the cursor as part of the UI.
  • Grounding is network-bound: it needs OPENROUTER_API_KEY set.
  • Vision grounding is your last resort, not your first move. It’s slower, costs money, and is less reliable than the accessibility tree. Use it for the 1% of cases where nothing else works.

12. Files, env vars, and shelling out

There are no Simulang-specific I/O wrappers. Use Node’s standard library directly:
import { readdirSync, readFileSync, writeFileSync, renameSync, statSync, existsSync } from 'node:fs'
import { spawn, spawnSync } from 'node:child_process'
import { join, dirname } from 'node:path'
import { fileURLToPath } from 'node:url'

// Resolve "the directory of this script."
const HERE = dirname(fileURLToPath(import.meta.url))

// Plain I/O.
writeFileSync(join(HERE, 'out.txt'), 'hello\n')
const text = readFileSync(join(HERE, 'out.txt'), 'utf8')

// Shell out — captured output.
const result = spawnSync('python3', ['transform.py', 'in.csv'], { encoding: 'utf8' })
if (result.status !== 0) throw new Error(result.stderr)

// Fire-and-forget — open a file with the OS default app.
spawn('open', [join(HERE, 'out.txt')], { detached: true, stdio: 'ignore' }).unref()
import.meta.url + fileURLToPath is the ES-module equivalent of the CommonJS __dirname. Use it any time you need to resolve a path relative to the script itself.
The @simular-ai/simulib-js package also ships convenience classes (File, Directory, Clipboard, System) for when you want cross-platform helpers (an OS clipboard read, a temp directory with an explicit lifecycle). They’re worth using for clipboard and audio work; for plain file reads and writes, staying in node:fs keeps the concept count low.

13. Composing scripts

As workflows grow, you’ll want to split them up. There are two natural ways.

13.1 Import helpers from sibling modules

The simplest: extract reusable functions into their own files and import them.
// flow/login.mjs
export async function login(tree, { email, password }) {
  // … positional find + setValue + activate, as in chapter 10 …
}
// flow/checkout.mjs
export async function placeOrder(tree, item) {
  // …
}
// run.mjs
import { App, FocusPolicy, Visibility, AccessibilityTree } from '@simular-ai/simulib-js'
import { login } from './flow/login.mjs'
import { placeOrder } from './flow/checkout.mjs'

const instance = App.defaultBrowser().open('https://example.com', FocusPolicy.Steal, Visibility.Show, true)
const tree = AccessibilityTree.fromPid(instance.pid)   // bind by pid, not by whatever is frontmost

await login(tree, { email: process.env.SITE_EMAIL, password: process.env.SITE_PASSWORD })
await placeOrder(tree, 'blue-shirt')
This is just ES modules — there’s nothing Simulang-specific to learn. Pass shared resources (the tree, the instance) as arguments rather than reaching for module-level singletons.

13.2 Run other scripts as subprocesses

When you want full isolation between phases — fresh process, fresh state, no shared imports — orchestrate from a parent script:
// orchestrator.mjs
import { spawnSync } from 'node:child_process'
import { dirname, join } from 'node:path'
import { fileURLToPath } from 'node:url'

const HERE = dirname(fileURLToPath(import.meta.url))

function runStep(label, script) {
  console.log(`\n===== ${label} =====`)
  const result = spawnSync('simulang', ['run', join(HERE, script)], {
    stdio: 'inherit',   // stream the child's logs live
    env: process.env,   // pass env vars through (SITE_EMAIL, OPENROUTER_API_KEY, …)
  })
  if (result.status !== 0) {
    console.error(`"${label}" failed with exit ${result.status}.`)
    process.exit(result.status ?? 1)
  }
}

runStep('Sign in', 'sign-in.ts')
runStep('Place order', 'place-order.ts')
runStep('Download receipt', 'download-receipt.ts')
The big win: each step is independently runnable. If place-order.ts fails, you can iterate on it directly without re-running sign-in every time.

14. Common pitfalls

A short list of things that bite first-time users (and second- and third-time users, honestly).
// ① refIds are tied to one snapshot. Don't cache them.
const btn = flattenDFS(tree.snapshot(true)).find(...)
await sleep(2000)         // ← the page may have re-rendered
tree.activate(btn.refId)  // ⚠ stale refId

// ✓ Re-snapshot inside withSnapshot and act on a fresh ref.
// ② AriaRole is a numeric enum. AriaRole[role] does NOT give a string.
console.log(AriaRole[node.role])              // ⚠ undefined
console.log(ariaRoleToString(node.role))      // ✓ "button"
// ③ FocusPolicy and Visibility are advisory.
const instance = App.exactName('Google Chrome').open(null, FocusPolicy.DoNotSteal, Visibility.Hidden, false)
// ⚠ Chromium / Electron apps and macOS Notes ignore these flags. The
//    app pops to the foreground and stays visible regardless.

// ✓ If you need certainty the app is focused, check and force it:
if (!instance.isFocused()) instance.focus()

// ✓ If you need it hidden, call .hide() after the launch settles:
await sleep(500)
instance.hide()
// ④ The accessible label can live in any of four fields.
node.name === 'Sign in'           // ⚠ might be in description on web AX
labelOf(node) === 'Sign in'       // ✓ uses name|value|description|helpText
// ⑤ Browser chrome contaminates snapshots.
nodes.find((n) => /save/i.test(labelOf(n)))   // ⚠ might match the URL bar
pageNodes(root).find(/* … */)                  // ✓ scope to the active tab
// ⑥ There is no built-in sleep / wait. Roll your own.
const sleep = (ms) => new Promise((r) => setTimeout(r, ms))
// ⑦ Top-level errors print ugly stack traces. Wrap and exit.
try {
  // … your script …
} catch (e) {
  console.error('Failed:', e instanceof Error ? e.message : e)
  process.exit(1)
}
One more worth mentioning that doesn’t fit a one-liner:
  • Temp directories don’t auto-clean. If you use Directory.temp(), wrap the work in try { … } finally { dir.remove() }.
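One way to make that cleanup hard to forget is a tiny wrapper that owns the try/finally. The sketch below assumes only that the directory object exposes a remove() method, as the bullet above suggests; Removable, withCleanup, and the fake directory are hypothetical names for this example, so check index.d.ts for the real Directory signature.

```typescript
// Anything with a remove() method; a stand-in for what Directory.temp() returns.
interface Removable { remove(): void }

// Run `work`, then remove the directory whether the work succeeded or threw.
async function withCleanup<D extends Removable, R>(
  dir: D,
  work: (dir: D) => Promise<R> | R,
): Promise<R> {
  try {
    return await work(dir)
  } finally {
    dir.remove()
  }
}

// Fake directory that counts how often it was removed.
let removedCount = 0
const fakeDir = { path: '/tmp/demo', remove(): void { removedCount += 1 } }

withCleanup(fakeDir, async (dir) => {
  throw new Error(`simulated failure while using ${dir.path}`)
}).catch((error: Error) => {
  console.log(error.message, '| removed:', removedCount)   // cleanup ran despite the error
})
```

The await inside the try matters: without it, an async work function would return its promise immediately and the directory would be removed while the work was still running.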

15. Where to go next

You now know enough to write real Simulang scripts. From here:
  • Read index.d.ts in the resolved @simular-ai/simulib-js install. It’s the typed surface of the library — roughly 1500 lines, organised by class. Source of truth for signatures and enums, and the entry point to everything we didn’t cover here (audio capture, speech-to-text, screen recording).
  • Read the library’s CLAUDE.md next to index.d.ts. It covers idioms, lifecycle preconditions, and platform quirks — things types can’t express.
  • Browse the example scripts in examples/ inside the install. They’re the best place to see larger patterns: a full purchase flow, file downloads with rename, screenshot-and-ground, module composition.
  • Build your reflexes at the REPL. When you’re not sure how an API behaves, simulang run -i is faster than scripting it. Snapshots are cheap; experiment.
Welcome to Simulang. Have fun automating things that used to need a human in the chair.

16. Simulang in Claude Code

If you use Claude Code in the editor, you can drive the same desktop APIs without hand-writing every script: install the skill and use /simulang so Claude generates and runs automation for you. See Simulang with Claude Code for setup (simulang init-claude), example prompts, permissions, and tips.