
Browser Agent Integration Documentation

Overview

The Browser Agent connects to an external API hosted by Prosus that provides web browsing and automation capabilities. This agent can navigate websites, interact with web pages, extract information, and perform automated browser tasks. It's designed for long-running tasks that require web automation.

Unlike the Native Agent, which runs client-side, the Browser Agent communicates with a remote REST API service that performs the browser automation on the server, using headless browsers or browser automation frameworks.

Architecture

High-Level Architecture

Key Characteristics

  • External API: Connects to Prosus-hosted API endpoint
  • Browser Automation: Uses headless browsers or automation frameworks
  • Polling-Based: Uses polling mode for long-running tasks
  • Task Management: Implements task initialization and status polling
  • Timeout Handling: Maximum 10-minute timeout for task completion
  • Streaming Support: Also supports streaming mode for quick responses

API Endpoints

The Browser Agent uses predefined endpoints hosted by Prosus:

Base URLs

  • Non-Streaming: https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat
  • Streaming: https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat/stream
  • Polling: https://mmwnmruxd7.eu-west-1.awsapprunner.com/ (base URL)

Note: These URLs are hardcoded and cannot be modified by users. They are defined in lib/database/repositories/agent-config.repository.ts.
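The three hardcoded URLs map onto the communication modes as shown below. This is a minimal sketch. The constant values are copied from agent-config.repository.ts. The `CommunicationMode` literals other than `'polling'` and the `endpointFor` helper are hypothetical and do not come from the app.

```typescript
// Values as documented for lib/database/repositories/agent-config.repository.ts
const BROWSER_AGENT_API_URL = "https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat";
const BROWSER_AGENT_API_URL_STREAM = "https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat/stream";
const BROWSER_AGENT_POLLING_URL = "https://mmwnmruxd7.eu-west-1.awsapprunner.com/";

// Hypothetical mode names; only "polling" appears in the documented configuration code
type CommunicationMode = "polling" | "streaming" | "non-streaming";

// Hypothetical helper: the URL each mode sends its first request to
function endpointFor(mode: CommunicationMode): string {
  switch (mode) {
    case "polling":
      // Task initialization: POST {pollingUrl}chat
      return `${BROWSER_AGENT_POLLING_URL}chat`;
    case "streaming":
      return BROWSER_AGENT_API_URL_STREAM;
    case "non-streaming":
      return BROWSER_AGENT_API_URL;
  }
}
```

The polling initialization URL ({pollingUrl}chat) resolves to the same address as the non-streaming endpoint. The curl example under Request Format shows this as well.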

Communication Modes

The Browser Agent supports three communication modes, with polling being the primary mode for long-running tasks:

1. Polling Mode (Primary)

Endpoints: POST {pollingUrl}chat to start the task, then GET {pollingUrl}{statusUrl} to poll its status

Behavior:

  • Initiates a task and returns a task ID and status URL
  • Polls the status endpoint every 30 seconds
  • Continues until task completes, fails, or times out (10 minutes max)
  • Best for long-running browser automation tasks
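The polling behavior above reduces to a small decision function. This sketch is an illustration and is not the app's code. The `decide` name and the `PollDecision` labels are hypothetical. It mirrors the loop in `handlePollingAgentResponse()`, where the attempt counter is incremented before each status check.

```typescript
// Task statuses reported by the status endpoint
type TaskStatus = "pending" | "running" | "complete" | "failed";

// Hypothetical outcome of one polling step
type PollDecision = "finish" | "fail" | "poll-again" | "timeout";

const MAX_ATTEMPTS = 20; // 20 checks x 30 s = 10 minutes

// Decide what to do after the `attempts`-th status check returned `status`
function decide(status: TaskStatus, attempts: number, maxAttempts: number = MAX_ATTEMPTS): PollDecision {
  if (status === "complete") return "finish";
  if (status === "failed") return "fail";
  // Still "pending" or "running": keep polling until the attempt budget is used up
  return attempts < maxAttempts ? "poll-again" : "timeout";
}
```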

Implementation: hooks/use-chat-polling-agent.ts - handlePollingAgentResponse()

2. Streaming Mode

Endpoint: /chat/stream

Behavior:

  • Returns Server-Sent Events (SSE) format
  • Content is streamed incrementally
  • Used for quick browser tasks that complete immediately
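A client reading the SSE stream has to split the incoming text into events and then read their `data:` lines. The sketch below handles only that framing. It follows the SSE format, in which events are separated by a blank line and one optional space follows `data:`. The JSON payload inside each event uses the Multi-Agent format and is left unparsed. `parseSseChunk` is a hypothetical helper, not the implementation of `handleCustomAgentStreamingResponse()`.

```typescript
// Extracts the `data` payload of each complete Server-Sent Event in `buffer`.
// Returns the payloads plus any trailing, incomplete event text to carry over.
function parseSseChunk(buffer: string): { events: string[]; rest: string } {
  const normalized = buffer.replace(/\r\n/g, "\n");
  const blocks = normalized.split("\n\n");
  // The last block may be an event that has not fully arrived yet
  const rest = blocks.pop() ?? "";
  const events: string[] = [];
  for (const block of blocks) {
    const dataLines = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      // A single space after the colon is not part of the value
      .map((line) => line.slice(5).replace(/^ /, ""));
    // Comment-only blocks (lines starting with ":") produce no event
    if (dataLines.length > 0) events.push(dataLines.join("\n"));
  }
  return { events, rest };
}
```

Network chunks can end in the middle of an event, so prepend the returned `rest` to the next chunk before parsing again.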

Implementation: hooks/use-chat-custom-agent.ts - handleCustomAgentStreamingResponse()

3. Non-Streaming Mode

Endpoint: /chat

Behavior:

  • Returns the complete response in a single JSON object
  • Used for quick browser tasks

Implementation: hooks/use-chat-custom-agent.ts - handleCustomAgentNonStreamingResponse()

Polling Mode API

Initialize Task

Endpoint: POST {pollingUrl}chat

Headers:

| Header | Description | Required |
|---|---|---|
| Content-Type | application/json | Yes |
| x-api-key | Internal API key for authentication | Yes |
| x-prosusai-user-email | User's email address | Optional |
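The header table maps onto a small builder. In this sketch the `buildInitHeaders` name is hypothetical. The optional email header is omitted entirely when no email is available, rather than sent as an empty value.

```typescript
// Hypothetical helper: headers for POST {pollingUrl}chat
function buildInitHeaders(apiKey: string, userEmail?: string | null): Record<string, string> {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    "x-api-key": apiKey,
  };
  // Optional header: only attach it when a non-empty email is known
  if (userEmail) {
    headers["x-prosusai-user-email"] = userEmail;
  }
  return headers;
}
```

The status request needs only `x-api-key`.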

Request Body:

typescript
{
  message: string;           // User's message/task description
  history: HistoryMessage[]; // Chat history
}
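The `history` field carries earlier turns as `role`/`content` pairs, as in the curl example under Request Format. Example 2 calls a `messagesToHistory` helper that this page does not show. The version below is a hypothetical sketch built on two assumptions: the app's `Message` type has `role` and `content` fields, and system messages and empty placeholder messages should not be sent.

```typescript
// Shape of one history entry, as seen in the curl example
interface HistoryMessage {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical shape of the app's chat message; the real Message type may differ
interface Message {
  id: string;
  role: "user" | "assistant" | "system";
  content: string;
}

// Hypothetical messagesToHistory: keeps user/assistant turns with non-empty text
function messagesToHistory(messages: Message[]): HistoryMessage[] {
  return messages
    .filter((m): m is Message & { role: "user" | "assistant" } =>
      m.role === "user" || m.role === "assistant")
    .filter((m) => m.content.trim().length > 0)
    // Send only the fields the API expects
    .map(({ role, content }) => ({ role, content }));
}
```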

Response:

typescript
{
  taskId: string;           // Unique task identifier
  status: "pending" | "running";
  message?: string;          // Optional status message
  statusUrl?: string;       // URL to poll for task status; absent if the task completed immediately
}

Note: If statusUrl is not present in the response, the task completed immediately and the response contains the final result.
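The initialization response may or may not contain `statusUrl`, so a type guard can replace the `(initResponse as any)` casts used in Example 1. This sketch rests on several assumptions. The shape of the immediate-completion response (`content` and `products`) is inferred from the fields Example 1 reads. `Product` is a placeholder type. The names `hasStatusUrl` and `immediateResult` are hypothetical.

```typescript
// Placeholder; the real Product type is defined elsewhere in the app
type Product = Record<string, unknown>;

// Initialization response while the task is still in progress
interface PendingTaskResponse {
  taskId: string;
  status: "pending" | "running";
  message?: string;
  statusUrl: string;
}

// Assumed shape when the server finishes immediately and omits statusUrl
interface ImmediateResultResponse {
  content?: string;
  products?: Product[];
}

type InitializeTaskResponse = PendingTaskResponse | ImmediateResultResponse;

// True when there is a non-empty status URL to poll
function hasStatusUrl(res: InitializeTaskResponse): res is PendingTaskResponse {
  return "statusUrl" in res && typeof res.statusUrl === "string" && res.statusUrl.length > 0;
}

// Returns the final result for an immediate completion, or null if polling is needed
function immediateResult(res: InitializeTaskResponse): { content: string; products: Product[] } | null {
  if (hasStatusUrl(res)) return null;
  // Same fallbacks as Example 1
  return { content: res.content || "Task completed.", products: res.products || [] };
}
```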

Poll Task Status

Endpoint: GET {pollingUrl}{statusUrl}

Headers:

| Header | Description | Required |
|---|---|---|
| x-api-key | Internal API key for authentication | Yes |

Response:

typescript
{
  taskId: string;
  status: "pending" | "running" | "complete" | "failed";
  createdAt?: string;       // ISO timestamp
  content?: string;         // Final response content (when complete)
  products?: Product[];     // Optional product items
}
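Example 3 returns `response.json()` without checking it, and Example 1 keeps polling on any status it does not recognize. A malformed body would therefore surface only as a timeout 10 minutes later. A hypothetical validator like the one below rejects unexpected bodies immediately. `parseTaskStatus` is an illustrative name, and `products` is typed loosely because `Product` is not defined on this page.

```typescript
type TaskStatus = "pending" | "running" | "complete" | "failed";

interface TaskStatusResponse {
  taskId: string;
  status: TaskStatus;
  createdAt?: string;
  content?: string;
  products?: unknown[];
}

const STATUSES: readonly TaskStatus[] = ["pending", "running", "complete", "failed"];

// Checks the fields the polling loop depends on before trusting the JSON body
function parseTaskStatus(body: unknown): TaskStatusResponse {
  if (typeof body !== "object" || body === null) {
    throw new Error("Status response is not a JSON object");
  }
  const { taskId, status, createdAt, content, products } = body as Record<string, unknown>;
  if (typeof taskId !== "string") throw new Error("Missing taskId");
  if (typeof status !== "string" || !STATUSES.includes(status as TaskStatus)) {
    throw new Error(`Unknown task status: ${String(status)}`);
  }
  // Optional fields are kept only when they have the expected type
  return {
    taskId,
    status: status as TaskStatus,
    createdAt: typeof createdAt === "string" ? createdAt : undefined,
    content: typeof content === "string" ? content : undefined,
    products: Array.isArray(products) ? products : undefined,
  };
}
```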

Polling Configuration

  • Polling Interval: 30 seconds
  • Maximum Attempts: 20 attempts
  • Total Timeout: 10 minutes (20 × 30s)
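The 10-minute figure counts only the waits between checks. Example 1 sleeps before every status request, so the checks start at the offsets computed below. This sketch ignores request time, which delays each check further. `pollOffsetsSeconds` is a hypothetical name.

```typescript
const POLLING_INTERVAL_MS = 30_000; // 30 seconds, as in Example 1
const MAX_ATTEMPTS = 20;

// Seconds after initialization at which each status check starts, assuming
// every request completes instantly (real request time shifts later checks).
function pollOffsetsSeconds(
  intervalMs: number = POLLING_INTERVAL_MS,
  maxAttempts: number = MAX_ATTEMPTS,
): number[] {
  // Example 1 sleeps first, so check i (1-based) starts after i intervals
  return Array.from({ length: maxAttempts }, (_, i) => ((i + 1) * intervalMs) / 1000);
}

const offsets = pollOffsetsSeconds();
console.log(offsets[0], offsets[offsets.length - 1]); // 30 600
```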

Request Format

Polling Mode Request

bash
curl -X POST https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "x-prosusai-user-email: user@example.com" \
  -d '{
    "message": "Navigate to example.com and extract the main heading",
    "history": [
      {
        "role": "user",
        "content": "Can you help me browse the web?"
      },
      {
        "role": "assistant",
        "content": "Sure! What would you like me to do?"
      }
    ]
  }'

Response:

json
{
  "taskId": "task-abc123",
  "status": "pending",
  "statusUrl": "task/status/task-abc123"
}

Polling Status Request

bash
curl -X GET https://mmwnmruxd7.eu-west-1.awsapprunner.com/task/status/task-abc123 \
  -H "x-api-key: YOUR_API_KEY"

Response (Running):

json
{
  "taskId": "task-abc123",
  "status": "running",
  "createdAt": "2024-01-15T10:30:00Z"
}

Response (Complete):

json
{
  "taskId": "task-abc123",
  "status": "complete",
  "content": "The main heading on example.com is: 'Welcome to Example'",
  "products": []
}

Streaming Mode Request

Same format as Multi-Agent streaming requests. See Multi-Agent for details.

Response Format

Polling Mode Response

Task Initialization Response:

typescript
{
  taskId: string;
  status: "pending" | "running";
  message?: string;
  statusUrl?: string;  // Absent when the task completed immediately
}

Task Status Response:

typescript
{
  taskId: string;
  status: "pending" | "running" | "complete" | "failed";
  createdAt?: string;
  content?: string;      // Available when status is "complete"
  products?: Product[];  // Optional product items
}

Streaming Mode Response

Same SSE format as Multi-Agent. See Multi-Agent for details.

Integration Flow

Polling Mode Flow

Code Examples

Example 1: Polling Mode Implementation

typescript
// hooks/use-chat-polling-agent.ts
export async function handlePollingAgentResponse(
  content: string,
  pollingUrl: string,
  signal: AbortSignal,
  assistantMessageId: string,
  convId: string,
  updateMessage: UpdateMessageFn,
  handleStreamComplete: HandleStreamCompleteFn,
  messages: Message[]
) {
  try {
    // Show tool call indicator
    updateMessage(assistantMessageId, {
      toolCall: {
        toolName: "browser_polling",
        query: content,
      },
    }, true, convId);

    // Initialize the task
    const initResponse = await initializePollingTask(
      content,
      pollingUrl,
      messages,
      signal
    );

    const { statusUrl } = initResponse;
    
    // Check if response is already complete
    if (!statusUrl) {
      const finalContent = (initResponse as any).content || "Task completed.";
      const products = (initResponse as any).products || [];

      updateMessage(assistantMessageId, {
        content: finalContent,
        toolCall: undefined,
      }, false);

      await handleStreamComplete(assistantMessageId, convId, {
        content: finalContent,
        products,
      });

      return { content: finalContent, products };
    }

    // Poll for completion
    const POLLING_INTERVAL = 30000; // 30 seconds
    const MAX_ATTEMPTS = 20; // 10 minutes total
    let attempts = 0;

    while (attempts < MAX_ATTEMPTS) {
      if (signal.aborted) {
        throw new Error("Request aborted");
      }

      await sleep(POLLING_INTERVAL);
      attempts++;

      const statusResponse = await pollTaskStatus(
        statusUrl,
        pollingUrl,
        signal
      );

      if (statusResponse.status === "complete") {
        const finalContent = statusResponse.content || "Task completed.";
        const products = statusResponse.products || [];

        updateMessage(assistantMessageId, {
          content: finalContent,
          toolCall: undefined,
        }, false);

        await handleStreamComplete(assistantMessageId, convId, {
          content: finalContent,
          products,
        });

        return { content: finalContent, products };
      } else if (statusResponse.status === "failed") {
        throw new Error("Task failed on server");
      }

      // Status is still "pending" or "running", continue polling
    }

    // Timeout reached
    throw new Error("Polling timeout: Task did not complete within 10 minutes");
  } catch (error) {
    updateMessage(assistantMessageId, {
      toolCall: undefined,
    }, false);
    throw error;
  }
}
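Example 1 calls a `sleep` helper that is not shown. As written, `sleep(POLLING_INTERVAL)` ignores the abort signal, so a cancelled request can wait out the rest of the 30-second interval before the loop reacts. The sketch below is a hypothetical abort-aware variant, which would be called as `sleep(POLLING_INTERVAL, signal)`. It rejects with the same "Request aborted" message the loop already uses.

```typescript
// Hypothetical abort-aware sleep: resolves after `ms` milliseconds, or rejects
// with "Request aborted" as soon as `signal` fires.
function sleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    // Already cancelled: fail immediately without starting a timer
    if (signal?.aborted) {
      reject(new Error("Request aborted"));
      return;
    }
    const onAbort = (): void => {
      clearTimeout(timer);
      reject(new Error("Request aborted"));
    };
    const timer = setTimeout(() => {
      // Normal completion: stop listening so the listener does not leak
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}
```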

Example 2: Task Initialization

typescript
async function initializePollingTask(
  content: string,
  pollingUrl: string,
  messages: Message[],
  signal: AbortSignal
): Promise<InitializeTaskResponse> {
  const history = messagesToHistory(messages);
  const userEmail = getUserEmail();

  const response = await fetch(`${pollingUrl}chat`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": process.env.EXPO_PUBLIC_INTERNAL_API_KEY || "",
      ...(userEmail ? { "x-prosusai-user-email": userEmail } : {}),
    },
    body: JSON.stringify({
      message: content,
      history,
    }),
    signal,
  });

  if (!response.ok) {
    throw new Error(`Failed to initialize task: ${response.status} ${response.statusText}`);
  }

  return await response.json();
}

Example 3: Status Polling

typescript
async function pollTaskStatus(
  statusUrl: string,
  pollingUrl: string,
  signal: AbortSignal
): Promise<TaskStatusResponse> {
  const response = await fetch(
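    // Join with exactly one slash; replaceAll('//', '/') would also turn "https://" into "https:/"
    `${pollingUrl.replace(/\/+$/, "")}/${statusUrl.replace(/^\/+/, "")}`,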
    {
      method: "GET",
      headers: {
        "x-api-key": process.env.EXPO_PUBLIC_INTERNAL_API_KEY || "",
      },
      signal,
    }
  );

  if (!response.ok) {
    throw new Error(`Failed to poll status: ${response.status} ${response.statusText}`);
  }

  return await response.json();
}

Configuration

Agent Type Selection

The Browser Agent is selected in the Agent Settings screen:

File: app/agent-settings.tsx in the main application

Description: "Web browsing agent that can navigate and interact with websites. Capable of performing automated browser tasks."

Communication Mode

In the agent configuration, polling is the only communication mode exposed for the Browser Agent:

File: lib/database/repositories/agent-config.repository.ts

typescript
export function getAvailableCommunicationModes(agentType: AgentType): CommunicationMode[] {
  if (agentType === 'browser-agent') {
    return ['polling'];  // Primary mode
  }
  // ...
}

However, the implementation also supports streaming and non-streaming modes when those endpoints are available.

Environment Variables

| Variable | Description | Source |
|---|---|---|
| EXPO_PUBLIC_INTERNAL_API_KEY | Internal API key for Prosus-hosted endpoints | Local .env file (.env.local) |

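Important: Never commit API keys to version control. Store them in .env.local, which is git-ignored. Because Expo inlines EXPO_PUBLIC_ variables into the JavaScript bundle at build time, this key can still be read from the shipped app and should not be treated as a secret.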
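Examples 2 and 3 fall back to an empty string when the key is missing. That sends an empty `x-api-key` header, which the server will presumably reject. The sketch below is a hypothetical fail-fast helper. Pass `process.env.EXPO_PUBLIC_INTERNAL_API_KEY` to it directly, because Expo only inlines `EXPO_PUBLIC_` variables that are referenced with static dot notation. Looking the name up dynamically inside the helper would yield `undefined` in the app bundle.

```typescript
// Hypothetical helper: fail fast instead of sending an empty x-api-key header.
// Call it with the statically referenced value so Expo can inline it.
function requireApiKey(value: string | undefined): string {
  if (!value || value.trim() === "") {
    throw new Error("EXPO_PUBLIC_INTERNAL_API_KEY is not set");
  }
  return value;
}
```

Usage: `const apiKey = requireApiKey(process.env.EXPO_PUBLIC_INTERNAL_API_KEY);`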

URL Configuration

The Browser Agent URLs are hardcoded and cannot be modified by users:

File: lib/database/repositories/agent-config.repository.ts

typescript
export const BROWSER_AGENT_API_URL = 'https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat';
export const BROWSER_AGENT_API_URL_STREAM = 'https://mmwnmruxd7.eu-west-1.awsapprunner.com/chat/stream';
export const BROWSER_AGENT_POLLING_URL = 'https://mmwnmruxd7.eu-west-1.awsapprunner.com/';

Error Handling

Task Initialization Errors

  • 4xx Errors: Bad request, invalid parameters
  • 5xx Errors: Server errors during task initialization

Polling Errors

  • Task Failed: Server reports task status as "failed"
  • Polling Timeout: Task does not complete within 10 minutes
  • Network Errors: Connection failures during polling
  • Abort Signal: User cancels the request
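Example 1 reports every failure as a plain `Error`, so callers can tell these cases apart only by message text. The sketch below shows one way to make them distinguishable. The error classes and `classifyPollingError` are hypothetical and keep the messages Example 1 already uses. The classifier matches on `name` rather than `instanceof`, which can be unreliable for `Error` subclasses compiled to ES5. An aborted `fetch` rejects with an error named `AbortError`. The catch-all `request-error` covers network failures as well as non-OK HTTP responses.

```typescript
// Hypothetical error classes; messages match those thrown in Example 1
class TaskFailedError extends Error {
  constructor() {
    super("Task failed on server");
    this.name = "TaskFailedError";
  }
}

class PollingTimeoutError extends Error {
  constructor() {
    super("Polling timeout: Task did not complete within 10 minutes");
    this.name = "PollingTimeoutError";
  }
}

type PollingErrorKind = "task-failed" | "timeout" | "aborted" | "request-error";

// Maps any thrown value onto one of the polling error categories
function classifyPollingError(error: unknown): PollingErrorKind {
  const { name, message } =
    typeof error === "object" && error !== null
      ? (error as { name?: unknown; message?: unknown })
      : { name: undefined, message: undefined };
  if (name === "TaskFailedError") return "task-failed";
  if (name === "PollingTimeoutError") return "timeout";
  // fetch() rejects with "AbortError"; the loop throws "Request aborted" itself
  if (name === "AbortError" || message === "Request aborted") return "aborted";
  return "request-error";
}
```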

Implementation

typescript
try {
  // Initialize task
  const initResponse = await initializePollingTask(...);
  
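  // Poll for status
  let attempts = 0;
  while (attempts < MAX_ATTEMPTS) {
    if (signal.aborted) {
      throw new Error("Request aborted");
    }

    // Wait one polling interval, then count the attempt
    await sleep(POLLING_INTERVAL);
    attempts++;

    const statusResponse = await pollTaskStatus(...);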
    
    if (statusResponse.status === "failed") {
      throw new Error("Task failed on server");
    }
    
    if (statusResponse.status === "complete") {
      // Success
      return { content: statusResponse.content, products: statusResponse.products };
    }
    
    // Continue polling
  }
  
  throw new Error("Polling timeout: Task did not complete within 10 minutes");
} catch (error) {
  // Clear tool call indicator
  updateMessage(assistantMessageId, { toolCall: undefined }, false);
  throw error;
}

Task Status Indicators

UI Feedback

During polling, the UI shows:

  • Tool Call Indicator: "browser_polling" with the user's query
  • Status Updates: No intermediate progress is shown; polling continues in the background until the task completes, fails, or times out
  • Final Response: Tool indicator is cleared and content is displayed

Status States

  1. Pending: Task is queued, waiting to start
  2. Running: Task is executing browser automation
  3. Complete: Task finished successfully, content available
  4. Failed: Task encountered an error

Use Cases

Typical Browser Agent Tasks

  1. Web Scraping: Extract information from websites
  2. Form Filling: Automate form submissions
  3. Navigation: Navigate through multi-page workflows
  4. Data Extraction: Collect data from dynamic web pages
  5. Screenshot Capture: Take screenshots of web pages
  6. Content Analysis: Analyze web page content

Example Queries

  • "Navigate to example.com and tell me what's on the homepage"
  • "Search for 'React Native' on Google and summarize the first 3 results"
  • "Fill out the contact form on example.com with my information"
  • "Extract all product prices from this e-commerce page"

Limitations

  1. External Dependency: Requires Prosus-hosted API to be available
  2. Long-Running Tasks: Maximum 10-minute timeout
  3. Polling Overhead: 30-second polling interval adds latency
  4. Network Required: Requires internet connection for all requests
  5. Hardcoded URLs: Endpoints cannot be customized by users
  6. Resource Intensive: Browser automation is resource-intensive on the server side

Performance Considerations

Polling Interval

  • 30 seconds: Balance between responsiveness and server load
  • 20 attempts: Maximum 10 minutes total wait time
  • Early Completion: Polling stops at the first status check that reports "complete", which can be well before the 10-minute limit
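Status checks run only every 30 seconds, so a result becomes visible at the first check after the server finishes. The hypothetical `observedAtSeconds` function below computes that moment. It assumes instantaneous requests and checks at 30 s, 60 s, and so on up to 600 s, as in Example 1.

```typescript
const POLLING_INTERVAL_S = 30;

// Earliest time (seconds after initialization) at which the client sees a result
// that the server finished at `completedAtS`. Returns null when the result
// lands after the last check, i.e. the client has already timed out.
function observedAtSeconds(
  completedAtS: number,
  intervalS: number = POLLING_INTERVAL_S,
  maxAttempts: number = 20,
): number | null {
  // Round up to the next check; the first check happens after one interval
  const check = Math.max(1, Math.ceil(completedAtS / intervalS)) * intervalS;
  return check <= maxAttempts * intervalS ? check : null;
}
```

For instance, a task that finishes at 31 seconds is reported at 60 seconds.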

Optimization Tips

  1. Use streaming mode for quick tasks when available
  2. Set appropriate timeouts based on expected task duration
  3. Handle abort signals to allow user cancellation
  4. Cache task results if applicable

Future Enhancements

Potential improvements:

  • WebSocket support for real-time status updates
  • Configurable polling intervals
  • Task progress indicators (percentage complete)
  • Support for server-side task cancellation (aborting currently only stops client-side polling)
  • Enhanced error recovery and retry logic
  • Support for multiple concurrent tasks

Related Documentation

  • Native Agent - Native Agent documentation
  • Multi-Agent - Multi-Agent documentation
  • Custom Agent - Custom Agent documentation

Related Code

  • Agent Settings UI: app/agent-settings.tsx in the main application
  • Polling Agent Handler: hooks/use-chat-polling-agent.ts in the main application
  • Custom Agent Handler: hooks/use-chat-custom-agent.ts in the main application

Prosus AI App Documentation