If you have spent any serious time automating desktop workflows in Python, you know the usual suspects. Selenium and Playwright dominate browser automation. PyAutoGUI takes screenshots and clicks on pixel coordinates. Then there are the older tools — AutoIt on Windows, AppleScript on macOS — that work but feel like they belong in another era.
The problem with all of these approaches is the same: each one locks you into a single platform, a single app type, or a single brittle strategy. Pixel-based tools break when the resolution changes. Browser tools do not help when you need to automate a desktop spreadsheet application. Platform-specific scripts multiply your maintenance burden.
A newer library called xa11y (pronounced “ex-ally,” the same shorthand as “a11y” for accessibility) offers a different path. Instead of guessing where buttons are on a screen or sending raw key events, it talks to the operating system’s accessibility API directly. Every major desktop OS already maintains a tree of UI elements for screen readers and assistive technology. xa11y simply lets your Python code read that tree and interact with it.
What Is the Accessibility Tree?
When you open any desktop application — a file manager, a text editor, a settings panel — the OS builds a structured representation of every visible control. Windows calls it UI Automation. macOS calls it the Accessibility API (AXUIElement). Linux exposes it through AT-SPI2. These are the same APIs that screen readers like NVDA, VoiceOver, and Orca use to help visually impaired users navigate the desktop.
The tree looks something like this:
Window: "My App"
├── MenuBar
│   ├── MenuItem: "File"
│   └── MenuItem: "Edit"
├── Toolbar
│   ├── Button: "Save"
│   └── Button: "Undo"
├── TextField: "Search..."
└── ListView
    └── ListItem: "document.pdf"
Every element has a role (button, text field, list item), a name, and a set of available actions. The important thing is that the OS exposes this tree regardless of the framework an application was built with (Electron, Qt, GTK, WPF, SwiftUI), as long as that framework implements the platform's accessibility interfaces. Coverage varies in practice: custom-drawn controls and games may expose little or nothing.
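To make the role/name/actions idea concrete, here is a minimal sketch that models a tree like the one above in plain Python. The `Node` class and `render` function are illustrative names invented for this example; they are not xa11y types, and the action names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One element of a toy accessibility tree (illustrative, not an xa11y type)."""
    role: str
    name: str = ""
    actions: tuple = ()
    children: list = field(default_factory=list)


def render(node, depth=0):
    """Return the subtree as indented text, one element per line."""
    label = f'{node.role}: "{node.name}"' if node.name else node.role
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(render(child, depth + 1).splitlines())
    return "\n".join(lines)


# Placeholder action names; real platforms define their own vocabularies.
app = Node("Window", "My App", children=[
    Node("Toolbar", children=[
        Node("Button", "Save", actions=("press",)),
        Node("Button", "Undo", actions=("press",)),
    ]),
    Node("TextField", "Search...", actions=("focus", "set_value")),
])

print(render(app))
```

Anything that walks a real accessibility tree does roughly the same thing: start at a window, recurse through the children, and read the role and name from each element.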
Installing xa11y
The Python package is on PyPI:
pip install xa11y
There are also Rust and Node bindings if you prefer those, but for this walkthrough we will stick with Python. The library is MIT-licensed, so you can use it in both personal and commercial projects without licensing headaches.
Reading the Accessibility Tree
The first thing you will probably want to do is see what the tree actually looks like for a running application. Here is a simple example:
from xa11y import Desktop
desktop = Desktop()
# Find the target window by title
window = desktop.query('window[name="Finder"]')
# Print the full tree
print(window.tree())
The query method accepts CSS-like selectors. You can match by role, by name, by partial name, or by hierarchical relationship:
# Find a button by exact name
save_btn = desktop.query('button[name="Save"]')
# Find any text field whose name starts with "Search"
search_box = desktop.query('textfield[name^="Search"]')
# Find a button inside a specific group
submit = desktop.query('group > button[name="Submit"]')
This selector syntax is one of xa11y’s strongest features. If you have ever used Playwright or Puppeteer for web automation, the pattern will feel immediately familiar.
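If you are curious how a selector of this shape can be evaluated, the sketch below matches `role[name="..."]` and `role[name^="..."]` against a flat list of elements. It is a toy under stated assumptions: the regular expression, `matches`, and `find_all` are made up for illustration, the child combinator (`>`) is not handled, and none of this reflects how xa11y parses selectors internally.

```python
import re

# Hypothetical mini-grammar: role, role[name="x"], or role[name^="x"].
SELECTOR = re.compile(r'^(?P<role>\w+)(?:\[name(?P<op>\^?=)"(?P<value>[^"]*)"\])?$')


def matches(element, selector):
    """Return True if an element dict satisfies a single-step selector."""
    m = SELECTOR.match(selector)
    if m is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    if element["role"] != m["role"]:
        return False
    if m["op"] is None:  # bare role, e.g. "button"
        return True
    if m["op"] == "^=":  # prefix match on the name
        return element["name"].startswith(m["value"])
    return element["name"] == m["value"]  # exact match on the name


def find_all(elements, selector):
    """Filter a flat list of element dicts by selector."""
    return [e for e in elements if matches(e, selector)]


elements = [
    {"role": "button", "name": "Save"},
    {"role": "button", "name": "Save As"},
    {"role": "textfield", "name": "Search..."},
]

print(find_all(elements, 'button[name="Save"]'))
print(find_all(elements, 'button[name^="Save"]'))
print(find_all(elements, 'textfield[name^="Search"]'))
```

The exact-match selector returns only the "Save" button, while the prefix form also picks up "Save As". That difference is worth remembering when a dialog has several similarly named controls.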
Interacting with Elements
Reading the tree is only half the story. The real power comes from performing actions on elements you find:
from xa11y import Desktop
desktop = Desktop()
# Click a button
desktop.query('button[name="New Folder"]').click()
# Type into a text field
field = desktop.query('textfield[name="Folder name"]')
field.type_text("My Project")
# Press Enter
field.press_key("Return")
# Take a screenshot of a specific element (useful for debugging)
element = desktop.query('window[name="Terminal"]')
element.screenshot("terminal_state.png")
The type_text method sends characters at the OS level, so it works in any application — even ones that do not have traditional text input fields. The press_key method handles special keys like Return, Escape, Tab, and function keys.
A Practical Example: Automated File Organization
Let us put this together into something you could actually use. Here is a script that opens Finder (on macOS), navigates to the Downloads folder, and moves files into subdirectories based on their extension:
from xa11y import Desktop
import time
import shutil
from pathlib import Path
desktop = Desktop()
# Define extension-to-folder mapping
EXT_FOLDERS = {
    ".pdf": "Documents",
    ".jpg": "Photos",
    ".png": "Photos",
    ".mp4": "Videos",
    ".zip": "Archives",
    ".dmg": "Installers",
}
downloads = Path.home() / "Downloads"
for file_path in downloads.iterdir():
    if file_path.is_dir():
        continue
    ext = file_path.suffix.lower()
    target_folder = EXT_FOLDERS.get(ext)
    if not target_folder:
        continue
    # Open Finder
    desktop.query('application[name="Finder"]').click_menu_item("File", "New Finder Window")
    time.sleep(0.5)
    # Navigate to Downloads
    sidebar = desktop.query('outline[name="Sidebar"]')
    sidebar.query('static_text[name="Downloads"]').click()
    time.sleep(0.5)
    # Find and right-click the file
    file_item = desktop.query(f'outline[name="Downloads"] > row[description="{file_path.name}"]')
    file_item.right_click()
    # Select "Move to" from the context menu
    desktop.query('menu[name="Context Menu"] > menuitem[name="Move To"]').click()
    time.sleep(0.3)
    # Select the target folder
    folder_item = desktop.query(f'menuitem[name="{target_folder}"]')
    if folder_item:
        folder_item.click()
    else:
        # Fallback: create folder and move
        target_path = downloads / target_folder
        target_path.mkdir(exist_ok=True)
        shutil.move(str(file_path), str(target_path / file_path.name))
    print(f"Moved {file_path.name} → {target_folder}/")
This example mixes xa11y’s UI automation with standard Python file operations. In practice, you might want to add error handling and retry logic — accessibility tree queries can fail if the UI has not finished loading.
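One way to add that retry logic is a small wrapper that re-runs a lookup until it succeeds. This is a generic sketch, not part of xa11y: `retry` is a helper written for this article, and `LookupError` stands in for whatever a failed query actually raises (or returns) in your version of the library, which you should check before relying on it.

```python
import time


def retry(action, attempts=5, delay=0.2, exceptions=(Exception,)):
    """Call action() until it succeeds or attempts run out.

    Re-raises the last exception if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)


# Stand-in for a UI lookup that fails until the window has finished loading.
calls = {"count": 0}


def flaky_lookup():
    calls["count"] += 1
    if calls["count"] < 3:
        raise LookupError("element not found yet")
    return "row: report.pdf"


result = retry(flaky_lookup, attempts=5, delay=0.01, exceptions=(LookupError,))
print(result, "after", calls["count"], "tries")
```

In the Finder script, the fixed `time.sleep(0.5)` calls could then be replaced by wrapping each query in a lambda and passing it to `retry`, so the script waits only as long as the UI actually needs.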
Combining xa11y with AI Agents
Where this gets really interesting is when you combine xa11y with an AI agent. Instead of hardcoding every selector and action, you can let an LLM reason about the accessibility tree and decide what to do next.
Here is the basic pattern:
- Use xa11y to dump the current accessibility tree
- Send the tree to an LLM along with a goal (“click the Settings button”)
- Parse the LLM’s response into an action (selector + operation)
- Execute the action via xa11y
- Repeat until the goal is achieved
Several computer-use projects are already using this approach. The accessibility tree gives the LLM structured, semantic information about the UI — far more useful than a raw screenshot with bounding boxes.
from xa11y import Desktop
import json
desktop = Desktop()
def get_tree_summary():
    """Get a condensed version of the accessibility tree for an LLM."""
    tree = desktop.query('window').tree()
    # Trim to roles, names, and hierarchy; strip verbose attributes
    lines = []
    for line in tree.split("\n"):
        if any(kw in line for kw in ["role:", "name:", "actions:"]):
            lines.append(line.strip())
    return "\n".join(lines[:50])  # Limit to first 50 lines
tree_summary = get_tree_summary()
print(f"Current UI state:\n{tree_summary}")
# Send to LLM, get action, execute...
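This observe, decide, act loop has the same shape as the “computer use” capabilities you see in Claude and other AI assistants, although many of those systems work primarily from screenshots. The accessibility tree offers a structured bridge between natural language commands and concrete UI interactions.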
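The remaining steps of the loop (parse the reply, execute it, repeat) can be sketched without a real model or desktop session. Everything here is an assumption made for illustration: the JSON reply format, `parse_action`, `run_agent`, and the stubbed callbacks are invented for this example. Only the operation names (`click`, `type_text`, `press_key`) come from the snippets earlier in this article.

```python
import json

# Only operations on this list are ever executed.
ALLOWED_OPERATIONS = {"click", "type_text", "press_key"}


def parse_action(reply):
    """Validate an LLM reply and return (selector, operation, argument)."""
    data = json.loads(reply)
    selector = data["selector"]
    operation = data["operation"]
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"operation not allowed: {operation!r}")
    return selector, operation, data.get("argument")


def run_agent(goal, observe, ask_llm, execute, max_steps=10):
    """Observe, ask, act; return the step index at which the model reports done."""
    for step in range(max_steps):
        reply = ask_llm(goal, observe())
        if json.loads(reply).get("done"):
            return step
        execute(*parse_action(reply))
    raise RuntimeError("goal not reached within max_steps")


# Stubs so the sketch runs without a model or a desktop session.
scripted = iter([
    '{"selector": "button[name=\\"Settings\\"]", "operation": "click"}',
    '{"done": true}',
])
log = []
steps = run_agent(
    goal="click the Settings button",
    observe=lambda: 'button: "Settings"',
    ask_llm=lambda goal, tree: next(scripted),
    execute=lambda selector, op, arg: log.append((selector, op, arg)),
)
print(steps, log)
```

In a real script, `observe` would return the tree summary, `ask_llm` would call your model of choice, and `execute` would look up the selector and invoke the named method. Keeping an explicit allowlist of operations and a step limit means a confused model cannot run arbitrary actions or loop forever.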
Platform Differences You Should Know
While xa11y presents a unified API, the underlying OS implementations do have quirks:
macOS (AXUIElement)
- Generally the most reliable. Apple has invested heavily in accessibility.
- Most native Cocoa apps expose rich accessibility information.
- Some Electron apps require enabling accessibility features explicitly.
Windows (UI Automation)
- Modern UWP and WinUI apps have excellent accessibility support.
- Older Win32 apps can be sparse — some custom-drawn controls expose almost nothing.
- Running as administrator may be necessary for some applications.
Linux (AT-SPI2)
- GTK apps work well. Qt apps vary by version.
- Wayland introduces additional complexity — some compositors restrict accessibility API access.
- You may need to set GTK_ACCESSIBILITY=1 in your environment.
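When a script has to account for these quirks (longer waits on one OS, an extra environment check on another), a simple approach is to branch on `sys.platform`. The sketch below only maps platform identifiers to the API names listed above; `BACKENDS` and `backend_name` are hypothetical helpers, not something xa11y provides.

```python
import sys

# API names taken from the list above; the mapping is a plain lookup table.
BACKENDS = {
    "darwin": "AXUIElement",
    "win32": "UI Automation",
    "linux": "AT-SPI2",
}


def backend_name(platform=None):
    """Return the accessibility API name for a sys.platform value."""
    platform = sys.platform if platform is None else platform
    for prefix, name in BACKENDS.items():
        if platform.startswith(prefix):
            return name
    return "unknown"


print(backend_name())
```

Logging this value at the start of a run makes it easier to tell platform-specific failures apart from genuine bugs in your selectors.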
When xa11y Is Not the Right Tool
No tool solves every problem. Here are situations where you should look elsewhere:
- High-speed automation: Reading the accessibility tree has overhead. If you need to click thousands of times per second, look at lower-level input simulation.
- Games: Most games do not populate the accessibility tree. They render directly to the GPU.
- Remote desktop: The accessibility tree is local to the machine. It does not cross RDP or VNC sessions.
- Web apps inside browsers: Playwright or Selenium will give you better control over DOM elements than going through the browser’s accessibility tree.
Security Considerations
Since xa11y can read every UI element on your screen and synthesize input events, it effectively has the same power as a keylogger or screen recorder. On macOS, you must grant accessibility permissions in System Settings. On Windows, some operations require elevated privileges.
If you are building a tool that distributes xa11y scripts to other users, be transparent about what the script can see and do. Users should understand that a script querying desktop.query('window') can potentially read data from any open application.
Looking Ahead
The xa11y project is still in active development. Version 0.9.0 added Python bindings that were not present in earlier releases. The roadmap includes better support for custom UI frameworks and improved error messages when accessibility information is unavailable.
As AI agent frameworks like LangGraph, Claude Agent SDK, and OpenAI Agents SDK mature, the combination of structured UI access (via accessibility trees) and LLM reasoning is becoming one of the most promising paths toward general-purpose desktop automation.
The accessibility tree was built to help people. It turns out it can help your scripts too.
References
- xa11y official site — Documentation and API reference
- Microsoft UI Automation — Windows accessibility API
- Apple Accessibility API — macOS accessibility programming
- AT-SPI2 — Linux accessibility infrastructure