Browser Agent Integration Documentation

Overview

The Browser Agent connects to an external API hosted by Prosus that provides web browsing and automation capabilities. This agent can navigate websites, interact with web pages, extract information, and perform automated browser tasks. It's designed for long-running tasks that require web automation.

Unlike the Native Agent, which runs client-side, the Browser Agent communicates with a remote REST API service that handles browser automation using headless browsers or browser automation frameworks.

Architecture

High-Level Architecture

Key Characteristics

External API: Connects to Prosus-hosted API endpoint
Browser Automation: Uses headless browsers or automation frameworks
Polling-Based: Uses polling mode for long-running tasks
Task Management: Implements task initialization and status polling
Timeout Handling: Maximum 10-minute timeout for task completion
Streaming Support: Also supports streaming mode for quick responses

API Endpoints

The Browser Agent uses predefined endpoints hosted by Prosus:

Base URLs

Non-Streaming: https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat
Streaming: https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat/stream
Polling: https://mmwnmruxd7.eu-west-1.awsapprunner.com/ (base URL)

Note: These URLs are hardcoded and cannot be modified by users. They are defined in lib/database/repositories/agent-config.repository.ts.

Communication Modes

The Browser Agent supports three communication modes, with polling being the primary mode for long-running tasks:

1. Polling Mode (Primary)

Endpoint: POST {pollingUrl}chat → GET {pollingUrl}{statusUrl}

Behavior:

Initiates a task and returns a task ID and status URL
Polls the status endpoint every 30 seconds
Continues until task completes, fails, or times out (10 minutes max)
Best for long-running browser automation tasks

Implementation: hooks/use-chat-polling-agent.ts - handlePollingAgentResponse()

2. Streaming Mode

Endpoint: /chat/stream

Behavior:

Returns Server-Sent Events (SSE) format
Content is streamed incrementally
Used for quick browser tasks that complete immediately

Implementation: hooks/use-chat-custom-agent.ts - handleCustomAgentStreamingResponse()
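
To make the SSE behavior concrete, here is a minimal sketch of extracting `data:` payloads from buffered SSE text. It follows the generic Server-Sent Events wire format only; the payload schema used by this endpoint is described in the Multi-Agent documentation and is not assumed here, and `parseSseData` is a hypothetical helper name, not a function from the application.

```typescript
// Sketch only: parses generic SSE text. The payload schema of /chat/stream
// is not specified on this page, so payloads are returned as raw strings.
export function parseSseData(buffer: string): string[] {
  const payloads: string[] = [];
  // Events are separated by a blank line; accept both LF and CRLF endings.
  for (const rawEvent of buffer.split(/\r?\n\r?\n/)) {
    const dataLines: string[] = [];
    for (const line of rawEvent.split(/\r?\n/)) {
      if (line.startsWith(":")) continue; // comment or keep-alive line
      if (line.startsWith("data:")) {
        // Drop the field name and one optional leading space.
        dataLines.push(line.slice(5).replace(/^ /, ""));
      }
    }
    // Multiple data lines in one event are joined with a newline.
    if (dataLines.length > 0) payloads.push(dataLines.join("\n"));
  }
  return payloads;
}
```

When reading from a live stream, the last segment of the buffer may be an incomplete event, so a real consumer would likely hold it back until the terminating blank line arrives.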

3. Non-Streaming Mode

Endpoint: /chat

Behavior:

Returns complete response in single JSON object
Used for quick browser tasks

Implementation: hooks/use-chat-custom-agent.ts - handleCustomAgentNonStreamingResponse()

Polling Mode API

Initialize Task

Endpoint: POST {pollingUrl}chat

Headers:

Header	Description	Required
`Content-Type`	`application/json`	Yes
`x-api-key`	Internal API key for authentication	Yes
`x-prosusai-user-email`	User's email address	Optional

Request Body:

typescript

{
  message: string;           // User's message/task description
  history: HistoryMessage[]; // Chat history
}
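
The `HistoryMessage` type is not defined on this page. Judging only by the curl example further down, each entry carries a `role` and a `content` field, so the following sketch assumes that shape. `ChatMessage` and `toHistory` are hypothetical names used for illustration and may differ from the application's own `Message` type and `messagesToHistory` helper.

```typescript
// Assumed shape: only `role` and `content` appear in this page's curl example.
export interface HistoryMessage {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical app-side message type; the real Message type is not shown here.
export interface ChatMessage {
  id: string;
  role: "user" | "assistant";
  content: string;
}

// Reduce app messages to the fields the request body needs.
export function toHistory(messages: ChatMessage[]): HistoryMessage[] {
  return messages
    .filter((m) => m.content.trim().length > 0) // skip empty placeholders
    .map(({ role, content }) => ({ role, content }));
}
```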

Response:

typescript

{
  taskId: string;           // Unique task identifier
  status: "pending" | "running";
  message?: string;          // Optional status message
  statusUrl: string;        // URL to poll for task status
}

Note: If statusUrl is not present in the response, the task completed immediately and the response contains the final result.
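
One way to handle both outcomes without type casts is a type guard on `statusUrl`. This is a sketch under assumptions: the shape of the immediate result (`content`, `products`) is inferred from Example 1 on this page, and the names `PendingTask`, `ImmediateResult`, and `isPendingTask` are hypothetical.

```typescript
interface PendingTask {
  taskId: string;
  status: "pending" | "running";
  message?: string;
  statusUrl: string;
}

// Assumed shape of an immediate result, inferred from Example 1.
interface ImmediateResult {
  content?: string;
  products?: unknown[];
}

type InitResponse = PendingTask | ImmediateResult;

// True when the server handed back a status URL that still needs polling.
function isPendingTask(res: InitResponse): res is PendingTask {
  return (
    "statusUrl" in res &&
    typeof res.statusUrl === "string" &&
    res.statusUrl.length > 0
  );
}

// Usage: branch once, then each branch sees a precise type.
function summarize(res: InitResponse): string {
  return isPendingTask(res)
    ? `poll ${res.statusUrl}`
    : res.content ?? "Task completed.";
}
```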

Poll Task Status

Endpoint: GET {pollingUrl}{statusUrl}

Headers:

Header	Description	Required
`x-api-key`	Internal API key for authentication	Yes

Response:

typescript

{
  taskId: string;
  status: "pending" | "running" | "complete" | "failed";
  createdAt?: string;       // ISO timestamp
  content?: string;         // Final response content (when complete)
  products?: Product[];     // Optional product items
}
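
Because the status endpoint is formed by concatenating `pollingUrl` and `statusUrl`, a doubled or missing slash is possible depending on whether `statusUrl` starts with `/`. The sketch below joins the two with exactly one slash; `joinUrl` is a hypothetical helper, not part of the application code.

```typescript
// Join a base URL and a relative path with exactly one "/" between them,
// leaving the "//" after the scheme untouched.
export function joinUrl(base: string, path: string): string {
  const trimmedBase = base.replace(/\/+$/, ""); // drop trailing slashes
  const trimmedPath = path.replace(/^\/+/, ""); // drop leading slashes
  return `${trimmedBase}/${trimmedPath}`;
}

// Both spellings of statusUrl resolve to the same endpoint.
const a = joinUrl("https://mmwnmruxd7.eu-west-1.awsapprunner.com/", "task/status/task-abc123");
const b = joinUrl("https://mmwnmruxd7.eu-west-1.awsapprunner.com/", "/task/status/task-abc123");
```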

Polling Configuration

Polling Interval: 30 seconds
Maximum Attempts: 20 attempts
Total Timeout: 10 minutes (20 × 30s)
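
The arithmetic behind these numbers can be written out as a short sketch. The constant values come from this page; `attemptsFor` is a hypothetical helper showing how the attempt count would follow if a different budget were chosen.

```typescript
const POLLING_INTERVAL_MS = 30_000; // 30 seconds between polls
const MAX_ATTEMPTS = 20;

// 20 attempts x 30 s = 600 000 ms = 10 minutes of waiting between polls.
export const totalTimeoutMs = POLLING_INTERVAL_MS * MAX_ATTEMPTS;
export const totalTimeoutMinutes = totalTimeoutMs / 60_000;

// Number of polls needed to cover a given time budget at a given interval.
export function attemptsFor(timeoutMs: number, intervalMs: number): number {
  return Math.ceil(timeoutMs / intervalMs);
}
```

Since the loop in Example 1 sleeps before each poll, the first status check happens about 30 seconds after initialization, and the time spent in the requests themselves is added on top of the 10-minute budget.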

Request Format

Polling Mode Request

bash

curl -X POST https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "x-prosusai-user-email: user@example.com" \
  -d '{
    "message": "Navigate to example.com and extract the main heading",
    "history": [
      {
        "role": "user",
        "content": "Can you help me browse the web?"
      },
      {
        "role": "assistant",
        "content": "Sure! What would you like me to do?"
      }
    ]
  }'

Response:

json

{
  "taskId": "task-abc123",
  "status": "pending",
  "statusUrl": "task/status/task-abc123"
}

Polling Status Request

bash

curl -X GET https://mmwnmruxd7.eu-west-1.awsapprunner.com/task/status/task-abc123 \
  -H "x-api-key: YOUR_API_KEY"

Response (Running):

json

{
  "taskId": "task-abc123",
  "status": "running",
  "createdAt": "2024-01-15T10:30:00Z"
}

Response (Complete):

json

{
  "taskId": "task-abc123",
  "status": "complete",
  "content": "The main heading on example.com is: 'Welcome to Example'",
  "products": []
}

Streaming Mode Request

Same format as Multi-Agent streaming requests. See Multi-Agent for details.

Response Format

Polling Mode Response

Task Initialization Response:

typescript

{
  taskId: string;
  status: "pending" | "running";
  message?: string;
  statusUrl: string;
}

Task Status Response:

typescript

{
  taskId: string;
  status: "pending" | "running" | "complete" | "failed";
  createdAt?: string;
  content?: string;      // Available when status is "complete"
  products?: Product[];  // Optional product items
}

Streaming Mode Response

Same SSE format as Multi-Agent. See Multi-Agent for details.

Integration Flow

Polling Mode Flow

Code Examples

Example 1: Polling Mode Implementation

typescript

// hooks/use-chat-polling-agent.ts
// sleep() is assumed to be a Promise-based delay helper defined elsewhere.
export async function handlePollingAgentResponse(
  content: string,
  pollingUrl: string,
  signal: AbortSignal,
  assistantMessageId: string,
  convId: string,
  updateMessage: UpdateMessageFn,
  handleStreamComplete: HandleStreamCompleteFn,
  messages: Message[]
) {
  try {
    // Show tool call indicator
    updateMessage(assistantMessageId, {
      toolCall: {
        toolName: "browser_polling",
        query: content,
      },
    }, true, convId);

    // Initialize the task
    const initResponse = await initializePollingTask(
      content,
      pollingUrl,
      messages,
      signal
    );

    const { statusUrl } = initResponse;
    
    // Check if response is already complete
    if (!statusUrl) {
      const finalContent = (initResponse as any).content || "Task completed.";
      const products = (initResponse as any).products || [];

      updateMessage(assistantMessageId, {
        content: finalContent,
        toolCall: undefined,
      }, false);

      await handleStreamComplete(assistantMessageId, convId, {
        content: finalContent,
        products,
      });

      return { content: finalContent, products };
    }

    // Poll for completion
    const POLLING_INTERVAL = 30000; // 30 seconds
    const MAX_ATTEMPTS = 20; // 10 minutes total
    let attempts = 0;

    while (attempts < MAX_ATTEMPTS) {
      if (signal.aborted) {
        throw new Error("Request aborted");
      }

      await sleep(POLLING_INTERVAL);
      attempts++;

      const statusResponse = await pollTaskStatus(
        statusUrl,
        pollingUrl,
        signal
      );

      if (statusResponse.status === "complete") {
        const finalContent = statusResponse.content || "Task completed.";
        const products = statusResponse.products || [];

        updateMessage(assistantMessageId, {
          content: finalContent,
          toolCall: undefined,
        }, false);

        await handleStreamComplete(assistantMessageId, convId, {
          content: finalContent,
          products,
        });

        return { content: finalContent, products };
      } else if (statusResponse.status === "failed") {
        throw new Error("Task failed on server");
      }

      // Status is still "pending" or "running", continue polling
    }

    // Timeout reached
    throw new Error("Polling timeout: Task did not complete within 10 minutes");
  } catch (error) {
    updateMessage(assistantMessageId, {
      toolCall: undefined,
    }, false);
    throw error;
  }
}

Example 2: Task Initialization

typescript

async function initializePollingTask(
  content: string,
  pollingUrl: string,
  messages: Message[],
  signal: AbortSignal
): Promise<InitializeTaskResponse> {
  const history = messagesToHistory(messages);
  const userEmail = getUserEmail();

  const response = await fetch(`${pollingUrl}chat`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": process.env.EXPO_PUBLIC_INTERNAL_API_KEY || "",
      ...(userEmail && { "x-prosusai-user-email": userEmail }),
    },
    body: JSON.stringify({
      message: content,
      history,
    }),
    signal,
  });

  if (!response.ok) {
    throw new Error(`Failed to initialize task: ${response.statusText}`);
  }

  return await response.json();
}

Example 3: Status Polling

typescript

async function pollTaskStatus(
  statusUrl: string,
  pollingUrl: string,
  signal: AbortSignal
): Promise<TaskStatusResponse> {
  const response = await fetch(
    // Join the base URL and status path with exactly one slash between them
    `${pollingUrl.replace(/\/+$/, "")}/${statusUrl.replace(/^\/+/, "")}`,
    {
      method: "GET",
      headers: {
        "x-api-key": process.env.EXPO_PUBLIC_INTERNAL_API_KEY || "",
      },
      signal,
    }
  );

  if (!response.ok) {
    throw new Error(`Failed to poll status: ${response.statusText}`);
  }

  return await response.json();
}

Configuration

Agent Type Selection

The Browser Agent is selected in the Agent Settings screen:

File: app/agent-settings.tsx in the main application

Description: "Web browsing agent that can navigate and interact with websites. Capable of performing automated browser tasks."

Communication Mode

By default, getAvailableCommunicationModes() returns only polling mode for the Browser Agent:

File: lib/database/repositories/agent-config.repository.ts

typescript

export function getAvailableCommunicationModes(agentType: AgentType): CommunicationMode[] {
  if (agentType === 'browser-agent') {
    return ['polling'];  // Primary mode
  }
  // ...
}

However, the implementation also supports streaming and non-streaming modes when those endpoints are available.

Environment Variables

Variable	Description	Source
`EXPO_PUBLIC_INTERNAL_API_KEY`	Internal API key for Prosus-hosted endpoints	From .env.local file

Important: Never commit API keys to version control. All keys should be stored in .env.local (which is git-ignored).

URL Configuration

The Browser Agent URLs are hardcoded and cannot be modified by users:

File: lib/database/repositories/agent-config.repository.ts

typescript

export const BROWSER_AGENT_API_URL = 'https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat';
export const BROWSER_AGENT_API_URL_STREAM = 'https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat/stream';
export const BROWSER_AGENT_POLLING_URL = 'https://mmwnmruxd7.eu-west-1.awsapprunner.com/';

Error Handling

Task Initialization Errors

4xx Errors: Bad request, invalid parameters
5xx Errors: Server errors during task initialization

Polling Errors

Task Failed: Server reports task status as "failed"
Polling Timeout: Task does not complete within 10 minutes
Network Errors: Connection failures during polling
Abort Signal: User cancels the request
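
To react differently to each of these cases (for example, to stay silent on user cancellation but show a message on timeout), a caller could classify the thrown error. The sketch below matches on the error messages used in Example 1 and on the `AbortError` name that `fetch` uses when a request is aborted. `classifyPollingError` and the `"other"` bucket are hypothetical; network failures and non-OK HTTP responses fall into `"other"`.

```typescript
type PollingFailure = "aborted" | "failed" | "timeout" | "other";

// Read a string property from an unknown thrown value without assuming its class.
function readField(error: unknown, key: "name" | "message"): string {
  if (typeof error === "object" && error !== null && key in error) {
    const value = (error as Record<string, unknown>)[key];
    return typeof value === "string" ? value : "";
  }
  return "";
}

export function classifyPollingError(error: unknown): PollingFailure {
  const name = readField(error, "name");
  const message = readField(error, "message");
  if (name === "AbortError" || message === "Request aborted") return "aborted";
  if (message === "Task failed on server") return "failed";
  if (message.startsWith("Polling timeout")) return "timeout";
  return "other"; // connection failures, non-OK HTTP responses, anything else
}
```

Matching on message strings is brittle; dedicated error classes or a `kind` field would be sturdier if the handler can be changed.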

Implementation

typescript

// Excerpt from the body of handlePollingAgentResponse (see Example 1)
try {
  // Initialize task
  const initResponse = await initializePollingTask(content, pollingUrl, messages, signal);
  const { statusUrl } = initResponse;

  // Poll for status
  let attempts = 0;
  while (attempts < MAX_ATTEMPTS) {
    if (signal.aborted) {
      throw new Error("Request aborted");
    }

    // Wait one interval and count the attempt so the loop can time out
    await sleep(POLLING_INTERVAL);
    attempts++;

    const statusResponse = await pollTaskStatus(statusUrl, pollingUrl, signal);

    if (statusResponse.status === "failed") {
      throw new Error("Task failed on server");
    }

    if (statusResponse.status === "complete") {
      // Success
      return { content: statusResponse.content, products: statusResponse.products };
    }

    // Status is still "pending" or "running", continue polling
  }

  throw new Error("Polling timeout: Task did not complete within 10 minutes");
} catch (error) {
  // Clear tool call indicator
  updateMessage(assistantMessageId, { toolCall: undefined }, false);
  throw error;
}

Task Status Indicators

UI Feedback

During polling, the UI shows:

Tool Call Indicator: "browser_polling" with the user's query
Status Updates: Polling continues in background
Final Response: Tool indicator is cleared and content is displayed

Status States

Pending: Task is queued, waiting to start
Running: Task is executing browser automation
Complete: Task finished successfully, content available
Failed: Task encountered an error
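
The four states split into two that keep the polling loop going and two that end it. A small sketch of that distinction follows; `TaskStatus` mirrors the union in the status response, while `isTerminal` and `describeStatus` are hypothetical helper names.

```typescript
type TaskStatus = "pending" | "running" | "complete" | "failed";

// "complete" and "failed" end polling; "pending" and "running" continue it.
export function isTerminal(status: TaskStatus): boolean {
  return status === "complete" || status === "failed";
}

// Human-readable labels taken from the state descriptions on this page.
const STATUS_LABELS: Record<TaskStatus, string> = {
  pending: "Task is queued, waiting to start",
  running: "Task is executing browser automation",
  complete: "Task finished successfully, content available",
  failed: "Task encountered an error",
};

export function describeStatus(status: TaskStatus): string {
  return STATUS_LABELS[status];
}
```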

Use Cases

Typical Browser Agent Tasks

Web Scraping: Extract information from websites
Form Filling: Automate form submissions
Navigation: Navigate through multi-page workflows
Data Extraction: Collect data from dynamic web pages
Screenshot Capture: Take screenshots of web pages
Content Analysis: Analyze web page content

Example Queries

"Navigate to example.com and tell me what's on the homepage"
"Search for 'React Native' on Google and summarize the first 3 results"
"Fill out the contact form on example.com with my information"
"Extract all product prices from this e-commerce page"

Limitations

External Dependency: Requires Prosus-hosted API to be available
Long-Running Tasks: Maximum 10-minute timeout
Polling Overhead: 30-second polling interval adds latency
Network Required: Requires internet connection for all requests
Hardcoded URLs: Endpoints cannot be customized by users
Resource Intensive: Browser automation is resource-intensive on the server side

Performance Considerations

Polling Interval

30 seconds: Balance between responsiveness and server load
20 attempts: Maximum 10 minutes total wait time
Early Completion: Task may complete before timeout

Optimization Tips

Use streaming mode for quick tasks when available
Set appropriate timeouts based on expected task duration
Handle abort signals to allow user cancellation
Cache task results if applicable
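
On the abort-signal tip: with a 30-second interval, a cancellation that is only checked at the top of the loop may take up to 30 seconds to take effect. A sketch of a delay that rejects as soon as the signal fires is shown below. `abortableSleep` is a hypothetical helper, not the `sleep` used in Example 1, and it assumes the runtime's `AbortSignal` supports `addEventListener`.

```typescript
// Resolve after `ms`, or reject immediately if the signal is (or becomes) aborted.
export function abortableSleep(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("Request aborted"));
      return;
    }
    const onAbort = () => {
      clearTimeout(timer); // stop the pending delay
      reject(new Error("Request aborted"));
    };
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort); // avoid leaking the listener
      resolve();
    }, ms);
    signal.addEventListener("abort", onAbort, { once: true });
  });
}
```

The rejection message matches the one thrown in Example 1, so existing error handling would treat both paths the same way.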

Future Enhancements

Potential improvements:

WebSocket support for real-time status updates
Configurable polling intervals
Task progress indicators (percentage complete)
Support for task cancellation
Enhanced error recovery and retry logic
Support for multiple concurrent tasks

Related Documentation

Native Agent - Native Agent documentation
Multi-Agent - Multi-Agent documentation
Custom Agent - Custom Agent documentation
Agent Settings UI: app/agent-settings.tsx in the main application
Polling Agent Handler: hooks/use-chat-polling-agent.ts in the main application
Custom Agent Handler: hooks/use-chat-custom-agent.ts in the main application
