Building on the proven success of AI-driven automation with our API Agent, we set out with an ambitious goal: to automate complex user journeys with minimal manual intervention while generating code that integrates seamlessly into our existing framework.

The Browser Automation Agent (browser-agent) emerged from this initiative.

UI testing challenges

Modern software development demands rigorous UI testing at multiple levels. End-to-end testing validates that user workflows function correctly across the entire application stack, ensuring seamless interactions between frontend, backend, and external services. Regression testing verifies that new code changes haven’t broken existing user experiences, a critical concern as applications grow in complexity.

Traditional approaches to UI testing face significant challenges:

  • Manual test creation is time-consuming and requires deep technical knowledge of test frameworks, page object models, element selectors, and async handling.
  • Test maintenance becomes increasingly difficult as UIs evolve, requiring constant updates to selectors, wait conditions, and test logic.
  • Flaky tests plague automation efforts, with timing issues, dynamic content, and environmental inconsistencies that cause false failures.
  • Coverage gaps emerge as teams struggle to automate every user journey, edge case, and browser interaction.

Our browser automation strategy needed to address these challenges while scaling with our growing platform. We needed an approach that could generate tests rapidly, adapt to UI changes, and provide comprehensive coverage across user journeys and regression scenarios.

Introducing the Browser Automation Agent

The Browser Automation Agent is an AI-driven workflow that automates the entire lifecycle of UI test creation and maintenance. Rather than requiring engineers to manually write test code or use brittle recording tools, the agent takes natural language test specifications from TestRail or JIRA and autonomously generates executable test code that follows our established testing frameworks and coding conventions.

What sets the Browser Automation Agent apart from traditional code generation and recording tools is its end-to-end automation and intelligent code generation. Rather than just recording clicks and generating raw scripts, the agent:

  • Accepts natural-language test cases with detailed steps from TestRail or JIRA
  • Intelligently executes the test flow using Playwright MCP to interact with browsers
  • Records the entire session, capturing video evidence and interaction patterns
  • Generates executable JavaScript test scripts from the recorded browser session
  • Transforms raw scripts into production-ready code using a conversion sub-agent
  • Creates supporting artifacts, including page objects, locators, and reusable components
  • Follows framework conventions, ensuring generated code matches your project’s patterns
  • Provides observability through detailed logs, videos, and execution reports

The agent processes test specifications that include test steps, expected results, and test data. It intelligently determines what code needs to be created or modified, inspects live page structures to identify robust selectors, and generates production-ready test code that integrates seamlessly into our existing test suites.
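To make the input side concrete, here is a minimal TypeScript sketch of how a test case with steps, expected results, and test data might be represented and resolved before execution. The `TestCaseSpec` shape, its field names, and the `{{key}}` placeholder syntax are assumptions for illustration; they are not TestRail's or JIRA's export format, nor the agent's internal schema.

```typescript
// Hypothetical representation of a natural-language test case.
// Field names are illustrative, not TestRail's or JIRA's actual schema.
interface TestStep {
  action: string; // e.g. "Enter {{email}} in the email field"
  expected?: string; // expected result for this step, if any
}

interface TestCaseSpec {
  id: string;
  title: string;
  steps: TestStep[];
  testData: Record<string, string>; // referenced from steps as {{key}}
}

// Replace {{key}} placeholders with test data. Unknown keys are left as-is
// so that missing data stays visible instead of becoming an empty string.
function resolveStep(step: TestStep, data: Record<string, string>): TestStep {
  const fill = (text: string): string =>
    text.replace(/\{\{(\w+)\}\}/g, (match: string, key: string): string => {
      const value = Object.prototype.hasOwnProperty.call(data, key) ? data[key] : undefined;
      return value ?? match;
    });
  const resolved: TestStep = { action: fill(step.action) };
  if (step.expected !== undefined) {
    resolved.expected = fill(step.expected);
  }
  return resolved;
}

function resolveSpec(spec: TestCaseSpec): TestStep[] {
  return spec.steps.map((step) => resolveStep(step, spec.testData));
}

// Example: a two-step login case with one test-data value.
const loginCase: TestCaseSpec = {
  id: "C1234",
  title: "User can log in",
  steps: [
    { action: "Enter {{email}} in the email field" },
    { action: "Click Sign in", expected: "Dashboard greets {{email}}" },
  ],
  testData: { email: "qa@example.com" },
};

const resolvedSteps = resolveSpec(loginCase);
console.log(resolvedSteps.map((s) => s.action).join(" | "));
```

Keeping unresolved placeholders visible, rather than substituting an empty string, makes a missing test-data value show up as an obviously wrong step instead of a silent no-op.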

A high-level flow diagram:

Three modes of operation

Standalone test execution

The Browser Automation Agent can operate as a standalone testing tool that executes test cases and provides immediate pass/fail results. This mode is ideal for:

  • Quick validation of test cases without generating code
  • Smoke testing after deployments to verify critical workflows
  • Test case validation before committing to automation
  • Debugging production issues by replicating user journeys

In standalone mode, the agent:

  • Launches a browser using Playwright MCP
  • Executes test steps based on natural language instructions
  • Captures video evidence of the test execution
  • Validates expected outcomes and assertions
  • Reports clear pass/fail status with detailed logs
  • Saves all artifacts (videos, screenshots, console logs) for debugging

This mode provides immediate feedback without requiring code generation or integration with your test framework.
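The standalone loop can be pictured roughly as follows: run each natural-language step, stop at the first failure, and return a pass/fail status with a per-step log. This is a simplified sketch. `StepExecutor` is a hypothetical stand-in for the browser actions the agent performs through Playwright MCP, and it is synchronous only to keep the example small (real browser calls are asynchronous). Artifact capture (videos, screenshots, console logs) is omitted.

```typescript
// Outcome of one step. StepExecutor is a hypothetical stand-in for the
// browser actions the agent performs through Playwright MCP; it is
// synchronous here only to keep the sketch short.
type StepOutcome = { ok: true } | { ok: false; reason: string };
type StepExecutor = (instruction: string) => StepOutcome;

interface RunReport {
  status: "passed" | "failed";
  log: string[];
  failedStep?: number; // 1-based index of the first failing step
}

// Run steps in order, stop at the first failure, and report pass/fail.
function runStandalone(steps: string[], execute: StepExecutor): RunReport {
  const log: string[] = [];
  let stepNumber = 0;
  for (const instruction of steps) {
    stepNumber += 1;
    const outcome = execute(instruction);
    if (!outcome.ok) {
      log.push(`FAIL step ${stepNumber}: ${instruction} (${outcome.reason})`);
      return { status: "failed", log, failedStep: stepNumber };
    }
    log.push(`PASS step ${stepNumber}: ${instruction}`);
  }
  return { status: "passed", log };
}

// Fake executor for the example: any step mentioning "Checkout" fails.
const fakeExecute: StepExecutor = (instruction) =>
  instruction.indexOf("Checkout") >= 0
    ? { ok: false, reason: "element not found" }
    : { ok: true };

const report = runStandalone(["Open the cart", "Click Checkout", "Confirm order"], fakeExecute);
console.log(report.status, report.failedStep);
console.log(report.log.join("\n"));
```

Stopping at the first failure keeps the log focused on the step that broke, which is usually what matters when replicating a user journey.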

Code generation with the code conversion sub-agent

The most powerful capability of the Browser Automation Agent is its intelligent code generation through a two-stage process:

Stage 1: Browser recording

  • Executes the test using Playwright MCP
  • Captures all browser interactions and generates raw JavaScript test code
  • Records element selectors, actions, and timing
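As an illustration of Stage 1's output, the sketch below turns a log of recorded interactions into a raw Playwright test script. The `RecordedAction` type is a hypothetical representation of what the recorder captures. The emitted calls (`page.goto`, `locator.click`, `locator.fill`, `expect(...).toHaveText`) are standard Playwright Test APIs, but the generator itself is an illustration, not the agent's implementation.

```typescript
// Hypothetical record of one interaction captured while the agent drives the browser.
type RecordedAction =
  | { kind: "goto"; url: string }
  | { kind: "click"; selector: string }
  | { kind: "fill"; selector: string; value: string }
  | { kind: "expectText"; selector: string; text: string };

// JSON.stringify gives a safely quoted JavaScript string literal.
const lit = (s: string): string => JSON.stringify(s);

// Emit one raw Playwright statement per recorded action.
function toRawLine(action: RecordedAction): string {
  switch (action.kind) {
    case "goto":
      return `await page.goto(${lit(action.url)});`;
    case "click":
      return `await page.locator(${lit(action.selector)}).click();`;
    case "fill":
      return `await page.locator(${lit(action.selector)}).fill(${lit(action.value)});`;
    case "expectText":
      return `await expect(page.locator(${lit(action.selector)})).toHaveText(${lit(action.text)});`;
  }
}

// Wrap the statements in a Playwright Test file.
function toRawScript(testName: string, actions: RecordedAction[]): string {
  return [
    'import { test, expect } from "@playwright/test";',
    "",
    `test(${lit(testName)}, async ({ page }) => {`,
    ...actions.map((a) => "  " + toRawLine(a)),
    "});",
  ].join("\n");
}

const recording: RecordedAction[] = [
  { kind: "goto", url: "https://example.com/login" },
  { kind: "fill", selector: "#email", value: "qa@example.com" },
  { kind: "click", selector: "form > button.btn-primary" },
  { kind: "expectText", selector: "h1", text: "Dashboard" },
];

console.log(toRawScript("user can log in", recording));
```

The raw script keeps whatever selectors were recorded (such as `form > button.btn-primary`) and hard-codes test data inline; improving both is the job of Stage 2.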

Stage 2: Sub-agent conversion

  • A specialized conversion sub-agent analyzes the raw JavaScript
  • Understands your project’s automation framework structure
  • Transforms the script into production-ready code that matches your patterns
  • Generates proper page object models with reusable methods
  • Creates robust, maintainable locators using best practices
  • Extracts test data and configurations
  • Structures test flow code following your framework conventions
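In the agent, Stage 2 is performed by an LLM sub-agent, so there is no fixed rule set to show. The deterministic sketch below only illustrates the shape of the transformation: recorded element interactions become a page-object class with named locators and methods, and literal values typed during recording are pulled out as test data. `ElementAction`, `toPageObject`, and the `fillX`/`clickX` naming scheme are hypothetical.

```typescript
// One element interaction from the raw recording, annotated with a readable
// name. In the agent an LLM sub-agent infers such names; here they are given.
interface ElementAction {
  name: string; // e.g. "email"
  locator: string; // Playwright locator expression for the element
  action: "click" | "fill";
  value?: string; // literal typed during recording (fill only)
}

const capitalize = (s: string): string => s.charAt(0).toUpperCase() + s.slice(1);

// Build page-object source (JavaScript) and extract recorded literals as test data.
function toPageObject(className: string, actions: ElementAction[]) {
  const fields: string[] = [];
  const methods: string[] = [];
  const testData: Record<string, string> = {};
  const seen: Record<string, boolean> = {};

  for (const a of actions) {
    if (!seen["field:" + a.name]) {
      seen["field:" + a.name] = true;
      fields.push(`    this.${a.name} = ${a.locator};`);
    }
    const methodName = a.action + capitalize(a.name); // e.g. fillEmail, clickSignIn
    if (!seen["method:" + methodName]) {
      seen["method:" + methodName] = true;
      methods.push(
        a.action === "fill"
          ? `  async ${methodName}(value) {\n    await this.${a.name}.fill(value);\n  }`
          : `  async ${methodName}() {\n    await this.${a.name}.click();\n  }`,
      );
    }
    if (a.action === "fill" && a.value !== undefined) {
      testData[a.name] = a.value; // literal moves out of the page object
    }
  }

  const source = [
    `export class ${className} {`,
    "  constructor(page) {",
    "    this.page = page;",
    ...fields,
    "  }",
    ...methods,
    "}",
  ].join("\n");
  return { source, testData };
}

const loginPage = toPageObject("LoginPage", [
  { name: "email", locator: 'page.getByTestId("email")', action: "fill", value: "qa@example.com" },
  { name: "signIn", locator: 'page.getByRole("button", { name: "Sign in" })', action: "click" },
]);
console.log(loginPage.source);
console.log(loginPage.testData);
```

Moving literal values such as the email address into `testData` is what lets one page object serve many test cases.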

The generated code includes:

  • Page objects: Reusable classes representing application pages
  • Locators: Robust element selectors using best practices (data-testid, semantic selectors)
  • Test flow code: Clean, maintainable test methods that follow your framework
  • Supporting utilities: Helper methods for common operations
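One way to express this locator preference is a simple priority function over attributes read from the live element. The priority order here (data-testid, then role plus accessible name, label, visible text, and finally CSS) is an assumption that mirrors the list above; Playwright's own documentation also recommends user-facing locators such as `getByRole`. `ElementInfo` and `chooseLocator` are hypothetical names, while `getByTestId`, `getByRole`, `getByLabel`, `getByText`, and `locator` are real Playwright `Page` methods.

```typescript
// Attributes the agent might read from a live DOM element (hypothetical shape).
interface ElementInfo {
  testId?: string; // value of data-testid
  role?: string; // ARIA role, e.g. "button"
  accessibleName?: string;
  label?: string; // text of an associated <label>
  text?: string; // visible text
  css: string; // CSS selector, always available as a last resort
}

// Return a Playwright locator expression using an assumed priority order:
// data-testid > role + accessible name > label > visible text > CSS.
function chooseLocator(el: ElementInfo): string {
  const q = (s: string): string => JSON.stringify(s);
  if (el.testId) return `page.getByTestId(${q(el.testId)})`;
  if (el.role && el.accessibleName) {
    return `page.getByRole(${q(el.role)}, { name: ${q(el.accessibleName)} })`;
  }
  if (el.label) return `page.getByLabel(${q(el.label)})`;
  if (el.text) return `page.getByText(${q(el.text)})`;
  return `page.locator(${q(el.css)})`;
}

// A button with both a test id and a role: the test id wins.
console.log(chooseLocator({ testId: "login-submit", role: "button", accessibleName: "Sign in", css: "form > button" }));
// prints page.getByTestId("login-submit")

// No test id: fall back to role + accessible name instead of the brittle CSS path.
console.log(chooseLocator({ role: "button", accessibleName: "Sign in", css: "form > button.btn-primary" }));
// prints page.getByRole("button", { name: "Sign in" })
```

The CSS selector is kept only as a last resort because it encodes layout details (nesting, class names) that change more often than test IDs or accessible names.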

MCP integration for IDE adoption

The Browser Automation Agent exposes its capabilities through the Model Context Protocol (MCP), enabling seamless integration with AI-powered development tools such as Cursor, Claude Code, and other MCP-compatible clients.

MCP integration benefits:

  • Minimal setup: Add browser automation capabilities to your IDE by configuring the MCP server
  • Natural language commands: Describe what you want to test, and the agent handles execution
  • Live feedback: See browser automation happen in real time from your IDE
  • Code generation in context: Generated code appears directly in your project with full context
  • Framework awareness: The agent understands your project structure and conventions
  • Iterative refinement: Modify tests conversationally and regenerate code instantly

How it works:

  1. Configure the MCP server in your IDE (Cursor, Claude Code, etc.)
  2. Describe your test case in natural language
  3. The agent executes the test using Playwright
  4. Watch browser automation happen live
  5. Review generated code and video artifacts
  6. Refine and iterate conversationally
  7. Generated code is ready to commit to your repository
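Under the hood, an MCP client such as an IDE invokes server capabilities with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request and reads the text content of a result. The tool name `run_test_case` and its arguments are hypothetical, since the browser-agent's actual tool names are not described here; the envelope (`jsonrpc`, `id`, `method`, `params.name`, `params.arguments`) and the result's `content`/`isError` fields follow the MCP specification.

```typescript
// JSON-RPC 2.0 envelope that MCP uses to invoke a server tool.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Subset of an MCP tool result: content items plus an optional error flag.
interface ToolCallResult {
  content: { type: string; text?: string }[];
  isError?: boolean;
}

let nextRequestId = 1;

// "run_test_case" and its arguments are hypothetical; they stand in for
// whatever tools the browser-agent MCP server actually exposes.
function buildRunTestCall(description: string, generateCode: boolean): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name: "run_test_case", arguments: { description, generateCode } },
  };
}

// Join the text items of a result; a tool-level error becomes an exception.
function readToolText(result: ToolCallResult): string {
  const text = result.content
    .filter((item) => item.type === "text")
    .map((item) => item.text ?? "")
    .join("\n");
  if (result.isError) {
    throw new Error(`tool reported an error: ${text}`);
  }
  return text;
}

const callRequest = buildRunTestCall("Log in as a standard user and check the dashboard loads", true);
console.log(JSON.stringify(callRequest));

// A made-up result, used only to exercise readToolText.
const toolText = readToolText({ content: [{ type: "text", text: "Test passed" }] });
console.log(toolText);
```

In practice the IDE's MCP client builds and sends these messages itself; the sketch only shows what travels between the IDE and the agent.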

This integration transforms your IDE into a powerful test automation workbench, combining AI-assisted development with production-grade browser automation.

Below is the MCP flow diagram:

The Browser Automation Agent is currently being adopted across our QA teams, and early results show significant improvements in test automation efficiency and coverage.

Initial estimated metrics from our adoption phase:

  • Test creation velocity: 60% reduction in time to create new automated tests compared to manual coding
  • Code quality: Generated tests follow framework conventions, with 40% fewer review comments than manually written tests