agentlab.agents.generic_agent.generic_agent

GenericAgent implementation for AgentLab

This module defines a GenericAgent class and its associated arguments for use in the AgentLab framework. The GenericAgent class is designed to interact with a chat-based model to determine actions based on observations. It includes methods for preprocessing observations, generating actions, and managing internal state such as plans, memories, and thoughts. The GenericAgentArgs class provides configuration options for the agent, including model arguments and flags for various behaviors.

Functions

get_action_post_hoc(agent, obs, ans_dict)

Get the action post-hoc for the agent.

Classes

GenericAgent(chat_model_args, flags[, max_retry])

GenericAgentArgs([agent_name, ...])

class agentlab.agents.generic_agent.generic_agent.GenericAgent(chat_model_args: BaseModelArgs, flags: GenericPromptFlags, max_retry: int = 4)

Bases: Agent

get_action(obs)

Updates the agent with the current observation, and returns its next action (plus an info dict, optional).

Parameters:

obs:

The current observation of the environment, after it has been processed by obs_preprocessor(). By default, a BrowserGym observation is a dict with the following entries: - “chat_messages”: list[str], messages between the agent and the user. - “goal”: str, the current goal. - “open_pages_urls”: list[str], open pages. - “active_page_index”: int, the index of the active page. - “url”: str, the current URL. - “screenshot”: 3D np.array, the current screenshot. - “dom_object”: dict, the current DOM object. See DOMSnapshot from chrome devtools. - “axtree_object”: dict, the current AXTREE object. See Accessibility Tree from chrome devtools. - “extra_element_properties”: dict[bid, dict[name, value]] extra properties of elements in the DOM. - “focused_element_bid”: str, the bid of the focused element. - “last_action”: str, the last action executed. - “last_action_error”: str, the error of the last action. - “elapsed_time”: float, the time elapsed since the start of the episode.

Returns:

action: str

The action to be processed by action_mapping() (if any), and executed in the environment.

info: AgentInfo

Additional information about the action. with the following entries being handled by BrowserGym:

  • “think”: optional chain of thought

  • “messages”: list of messages with the LLM

  • “stats”: dict of extra statistics that will be saved and aggregated.

  • “markdown_page”: str, string that will be displayed by agentlab’s xray tool.

  • “extra_info”: dict, additional information that will be saved and aggregated.

obs_preprocessor(obs: dict) dict

Function that pre-processes observations before feeding them to get_action(). This property is meant to be overloaded by your agent (optional). By default, the base observation is augmented with text versions of the DOM and AXTREE.

Why this mapping? This mapping will happen within the experiment loop, so that the resulting observation gets recorded in the execution traces, and statistics can be computed from it.

reset(seed=None)
class agentlab.agents.generic_agent.generic_agent.GenericAgentArgs(agent_name: str = None, chat_model_args: agentlab.llm.base_api.BaseModelArgs = None, flags: agentlab.agents.generic_agent.generic_agent_prompt.GenericPromptFlags = None, max_retry: int = 4)

Bases: AgentArgs

chat_model_args: BaseModelArgs = None
close()

Close the agent’s LLM models after running the experiment.

flags: GenericPromptFlags = None
make_agent()

Comply the experiments.loop API for instantiating the agent.

max_retry: int = 4
prepare()

Prepare the agent’s LLM models before running the experiment.

set_benchmark(benchmark: Benchmark, demo_mode)

Override Some flags based on the benchmark.

set_reproducibility_mode()

Optional method to set the agent in a reproducibility mode.

This should adjust the agent configuration to make it as deterministic as possible e.g. setting the temperature of the model to 0.

This is only called when reproducibility is requested.

Raises:

NotImplementedError – If the agent does not support reproducibility.

agentlab.agents.generic_agent.generic_agent.get_action_post_hoc(agent: GenericAgent, obs: dict, ans_dict: dict)

Get the action post-hoc for the agent.

This function is used to get the action after the agent has already been run. Its goal is to recreate the prompt and the output of the agent a posteriori. The purpose is to build datasets for training the agents.

Parameters:
  • agent (GenericAgent) – The agent for which the action is being determined.

  • obs (dict) – The observation dictionary to append to the agent’s history.

  • ans_dict (dict) – The answer dictionary containing the plan, step, memory, think, and action.

Returns:

The complete prompt used for the agent and the reconstructed output based on the answer dictionary.

Return type:

Tuple[str, str]