agentlab.agents.visualwebarena.agent

Functions

`image_data_to_uri`(image_data[, output_format])
`parser`(response)

Classes

`VisualWebArenaAgent`(temperature, chat_model, ...)
`VisualWebArenaAgentArgs`([agent_name, ...])

class agentlab.agents.visualwebarena.agent.VisualWebArenaAgent(temperature: float, chat_model: AbstractChatModel, action_set: HighLevelActionSet, observation_type: Literal['axtree', 'axtree_som', 'axtree_screenshot'], with_few_shot_examples: bool)

Bases: Agent

get_action(obs)

Updates the agent with the current observation and returns its next action, optionally together with an info dict.

Parameters:

obs: The current observation of the environment, after it has been processed by obs_preprocessor(). By default, a BrowserGym observation is a dict with the following entries:

“chat_messages”: list[str], messages between the agent and the user.

“goal”: str, the current goal.

“open_pages_urls”: list[str], open pages.

“active_page_index”: int, the index of the active page.

“url”: str, the current URL.

“screenshot”: 3D np.array, the current screenshot.

“dom_object”: dict, the current DOM object. See DOMSnapshot from chrome devtools.

“axtree_object”: dict, the current AXTREE object. See Accessibility Tree from chrome devtools.

“extra_element_properties”: dict[bid, dict[name, value]], extra properties of elements in the DOM.

“focused_element_bid”: str, the bid of the focused element.

“last_action”: str, the last action executed.

“last_action_error”: str, the error of the last action.

“elapsed_time”: float, the time elapsed since the start of the episode.

Returns:

action: str

The action to be processed by action_mapping() (if any), and executed in the environment.

info: AgentInfo

Additional information about the action. The following entries are handled by BrowserGym:

“think”: optional chain of thought

“messages”: list of messages with the LLM

“stats”: dict of extra statistics that will be saved and aggregated.

“markdown_page”: str, string that will be displayed by agentlab’s xray tool.

“extra_info”: dict, additional information that will be saved and aggregated.
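The get_action() contract described above can be illustrated with a minimal sketch. EchoAgent is a hypothetical stand-in, not part of agentlab: it does not call a chat model, it returns a plain dict where the real method returns an AgentInfo, and the action strings are assumed examples of a high-level action set.

```python
from typing import Any


class EchoAgent:
    """Hypothetical stand-in agent; not part of agentlab."""

    def get_action(self, obs: dict[str, Any]) -> tuple[str, dict[str, Any]]:
        # Read two of the observation entries listed above.
        goal = obs.get("goal", "")
        last_error = obs.get("last_action_error", "")

        # A real agent would query its chat model here. This sketch only
        # reacts to whether the previous action reported an error.
        if last_error:
            action = 'send_msg_to_user("The last action failed.")'  # assumed action string
            think = f"Previous action failed: {last_error}"
        else:
            action = "noop()"  # assumed action string
            think = f"Working on goal: {goal}"

        # Plain dict standing in for AgentInfo, using the documented keys.
        info = {"think": think, "messages": [], "stats": {"n_llm_calls": 0}}
        return action, info


obs = {"goal": "Find the cheapest red bike", "last_action": "", "last_action_error": ""}
action, info = EchoAgent().get_action(obs)
```

The caller unpacks the pair: the action string is passed on for execution, while the info entries are saved alongside the step.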

class agentlab.agents.visualwebarena.agent.VisualWebArenaAgentArgs(agent_name: str = 'VisualWebArenaAgent', temperature: float = 0.1, chat_model_args: agentlab.llm.base_api.BaseModelArgs = None, action_set_args: browsergym.experiments.benchmark.base.HighLevelActionSetArgs = None, observation_type: Literal['axtree', 'axtree_som', 'axtree_screenshot'] = 'axtree_som', with_few_shot_examples: bool = True)

Bases: AgentArgs

action_set_args: HighLevelActionSetArgs = None

agent_name: str = 'VisualWebArenaAgent'

chat_model_args: BaseModelArgs = None

close(): Close the agent’s LLM models after running the experiment.

make_agent() → Agent: Complies with the experiments.loop API for instantiating the agent.
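The args-to-agent pattern behind make_agent() can be sketched as follows. FactoryArgs and SketchAgent are hypothetical names used only for illustration; the real class also forwards a chat model and an action set, which this sketch leaves out.

```python
from dataclasses import dataclass


class SketchAgent:
    """Hypothetical agent holding the settings it was built with."""

    def __init__(self, temperature: float, observation_type: str):
        self.temperature = temperature
        self.observation_type = observation_type


@dataclass
class FactoryArgs:
    """Hypothetical args object mirroring two of the documented fields."""

    temperature: float = 0.1
    observation_type: str = "axtree_som"

    def make_agent(self) -> SketchAgent:
        # The args object stays a lightweight, serializable description;
        # the agent itself is only built when the experiment needs it.
        return SketchAgent(self.temperature, self.observation_type)


agent = FactoryArgs(temperature=0.0).make_agent()
```

Keeping configuration and instantiation separate lets an experiment loop store or copy the args freely and build a fresh agent per run.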

observation_type: Literal['axtree', 'axtree_som', 'axtree_screenshot'] = 'axtree_som'

prepare(): Prepare the agent’s LLM models before running the experiment.

set_benchmark(benchmark: Benchmark, demo_mode: bool)

Optional method to set benchmark specific flags.

This allows the agent to make minor adjustments based on the benchmark, e.g. using a benchmark-specific action space, or letting the agent see HTML on MiniWoB since AXTree is not enough there. Users should avoid extensive benchmark-specific prompt engineering.

Parameters:

benchmark – Benchmark. The benchmark being run (the signature annotates this parameter as Benchmark, not str).
demo_mode – bool If True, the agent should adapt to demo mode. E.g. it can set the demo_mode flag in the browsergym action space.

set_reproducibility_mode()

Optional method to set the agent in a reproducibility mode.

This should adjust the agent configuration to make it as deterministic as possible e.g. setting the temperature of the model to 0.

This is only called when reproducibility is requested.

Raises: NotImplementedError – If the agent does not support reproducibility.
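As a rough illustration of the behaviour described for set_reproducibility_mode(), the sketch below uses a hypothetical ReproArgs dataclass that mirrors two of the documented fields. Lowering the temperature to 0 is the example the description itself gives; whether the real implementation changes anything else is not stated here.

```python
from dataclasses import dataclass


@dataclass
class ReproArgs:
    """Hypothetical stand-in mirroring two documented fields."""

    temperature: float = 0.1
    with_few_shot_examples: bool = True

    def set_reproducibility_mode(self) -> None:
        # Make sampling as deterministic as the model allows.
        self.temperature = 0.0


args = ReproArgs()
args.set_reproducibility_mode()
```

An agent that cannot be made deterministic would raise NotImplementedError from this method instead.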

temperature: float = 0.1

with_few_shot_examples: bool = True

agentlab.agents.visualwebarena.agent.image_data_to_uri(image_data: bytes | ndarray, output_format: Literal['png', 'jpeg'] = 'png') → str
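The signature suggests a helper that turns image data into a URI string. A minimal sketch of that idea, assuming the result is a base64 data URI and the input is already-encoded image bytes, is shown below. bytes_to_data_uri is a hypothetical name, and the ndarray branch of the real function is not covered.

```python
import base64
from typing import Literal


def bytes_to_data_uri(
    image_bytes: bytes, output_format: Literal["png", "jpeg"] = "png"
) -> str:
    """Hypothetical helper: wrap encoded image bytes in a base64 data URI.

    Assumes image_bytes is already encoded in output_format; no image
    conversion happens here.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:image/{output_format};base64,{encoded}"


# The 8-byte PNG file signature stands in for real image bytes.
uri = bytes_to_data_uri(b"\x89PNG\r\n\x1a\n")
```

A string of this shape can be embedded wherever an image URL is accepted, which is a common way to pass screenshots to multimodal chat models.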

agentlab.agents.visualwebarena.agent.parser(response: str) → dict