### Configure Service Environment Variables (.env)

Source: https://github.com/modelscope/agentevolver/blob/main/docs/launcher.md

This example shows the environment variables that must be defined in a `.env` file for a new service. `MYENV_PATH` specifies the working directory, and `MYENV_SCRIPT` defines the command that starts the service.

```dotenv
# MyEnv service
MYENV_PATH=/abs/path/to/myenv
MYENV_SCRIPT=python -m myenv.api --host 0.0.0.0 --port 9009
```

--------------------------------

### Launch full experiment

Source: https://github.com/modelscope/agentevolver/blob/main/docs/launcher.md

Execute the launcher script with flags to clean up existing processes, back up configurations, and start required services.

```bash
python launcher.py --kill --conf examples/self-question-nav-attr.yaml --with-appworld --with-exp-maker
```

--------------------------------

### Setup Appworld Environment Service (Bash)

Source: https://github.com/modelscope/agentevolver/blob/main/docs/index.md

Configures the environment service specifically for Appworld. This script should be run from the specified directory.

```bash
cd env_service/environments/appworld && bash setup.sh
```

--------------------------------

### Launch AgentEvolver Training

Source: https://github.com/modelscope/agentevolver/blob/main/README.md

Starts the training process using the AgentEvolver launcher. Includes options for minimal training or full training with ReMe integration.
```bash
conda activate agentevolver

# option 1: minimal example without ReMe (using built-in datasets within environments)
python launcher.py --conf examples/basic.yaml --with-appworld

# option 2: full example with ReMe (questioning + navigating + attributing)
python launcher.py --conf examples/overall.yaml --with-appworld --with-reme
```

--------------------------------

### Install Project Dependencies

Source: https://github.com/modelscope/agentevolver/blob/main/games/README.md

Installs all required packages for the AgentEvolver project, including game-specific dependencies. This is a prerequisite before training an LLM agent.

```bash
bash install.sh
pip install -r games/requirements_game.txt
```

--------------------------------

### Environment Initialization Request and Response

Source: https://github.com/modelscope/agentevolver/blob/main/docs/guidelines/env_service.md

Example JSON structures for the /create endpoint. The input requires environment identification and task parameters, while the output provides the initial system state and instance metadata.

```json
{
  "env": "appworld",
  "instance_id": "my-instance-id-001",
  "task_id": "task-001",
  "params": {
    "simple": false,
    "prompt": true
  }
}
```

```json
{
  "state": [
    {
      "role": "system",
      "content": "[Structured prompt with task description and tool info]"
    },
    {
      "role": "user",
      "content": "[Task instruction from Appworld]"
    }
  ],
  "info": {
    "instance_id": "my-instance-id-001",
    "task_id": "task-001"
  }
}
```

--------------------------------

### Orchestrate Parallel Rollouts with ParallelEnvManager

Source: https://context7.com/modelscope/agentevolver/llms.txt

Demonstrates how to initialize the ParallelEnvManager for orchestrating parallel rollouts. It covers creating tasks, getting experience configurations, executing rollouts, and converting trajectories to a training data format.
```python
from agentevolver.module.env_manager.env_manager import ParallelEnvManager
from agentevolver.module.exp_manager.exp_manager import ExperienceManager, TaskExpConfig
from agentevolver.schema.task import Task

# Initialize parallel environment manager
env_manager = ParallelEnvManager(
    config=config,
    async_rollout_manager=async_rollout_manager,
    max_parallel=64,  # Max concurrent environment workers
    max_llm_retries=3
)

# Create tasks for rollout
tasks = [
    Task(task_id=f"task_{i}", query=f"Query {i}", env_type="appworld")
    for i in range(32)
]

# Initialize experience manager for self-navigating
exp_manager = ExperienceManager(config=config)

# Get experience configurations for each task
task_exp_configs = exp_manager.get_complete_exp_configs(tasks, mode="sample")

# Execute parallel rollouts
trajectories = env_manager.rollout(
    tasks=tasks,
    task_exp_configs=task_exp_configs,
    mode="sample",  # or "validate"
    epoch="train.1.0"
)
# Returns List[Trajectory] sorted by (data_id, rollout_id)

# Convert trajectories to DataProto for training
dataproto = env_manager.to_dataproto(trajectories)
# dataproto.batch contains: prompts, responses, attention_mask, loss_mask, etc.
# dataproto.non_tensor_batch contains: task_ids, reward_scores, steps, etc.
```

--------------------------------

### Install ReMe Experience Management (Bash)

Source: https://github.com/modelscope/agentevolver/blob/main/docs/index.md

Installs the ReMe component for experience management. This is an optional step for enhanced functionality. Refer to the ReMe GitHub for more details.

```bash
bash external/reme/install_reme.sh
```

--------------------------------

### Launch Environment Service via CLI

Source: https://github.com/modelscope/agentevolver/blob/main/docs/guidelines/env_service.md

Commands to start the environment service manually. Supports configuration of environment type, network binding, and debug modes.
```bash
python -m env_service.env_service --env appworld --portal 127.0.0.1 --port 8080 --debug True
```

--------------------------------

### Install AgentEvolver Dependencies (Bash)

Source: https://github.com/modelscope/agentevolver/blob/main/docs/index.md

Installs the necessary dependencies for AgentEvolver, including setting up the training environment. Requires conda and the CUDA toolkit to be pre-installed.

```bash
bash install.sh
```

--------------------------------

### Install Game Dependencies

Source: https://github.com/modelscope/agentevolver/blob/main/games/README.md

Installs the necessary Python packages for non-training usage of the game features. It requires a requirements file specific to games.

```bash
pip install -r games/requirements_game.txt
```

--------------------------------

### Control Game UI Display Based on State (CSS)

Source: https://github.com/modelscope/agentevolver/blob/main/games/web/static/avalon/participate.html

These CSS rules control the display of game UI elements based on the presence of the 'auto-start' or 'manual-start' class on the HTML element. The 'auto-start' class hides the game setup and shows the messages and input containers, suitable for games that start automatically. The 'manual-start' class does the opposite, showing the game setup and hiding the messages and input containers, for games requiring manual initiation.

```css
html.auto-start #game-setup { display: none !important; }
html.auto-start #messages-container { display: flex !important; }
html.auto-start .input-container { display: flex !important; }
html.manual-start #game-setup { display: block !important; }
html.manual-start #messages-container { display: none !important; }
html.manual-start .input-container { display: none !important; }
```

--------------------------------

### Launch Web Interface Server

Source: https://github.com/modelscope/agentevolver/blob/main/games/README.md

Starts the Python web server for the AgentEvolver project.
Once running, the interface can be accessed via a web browser at http://localhost:8000.

```bash
python games/web/server.py
```

--------------------------------

### Configure Task Manager Grader

Source: https://github.com/modelscope/agentevolver/blob/main/docs/guidelines/task_manager.md

Example configuration for the Task Manager grader, demonstrating how to specify the original environment grader and the synthetic LLM-based fallback grader.

```yaml
task_manager:
  grader:
    original_grader: env
    synthetic_grader: llm
```

--------------------------------

### Configure and Start ReMe-Service

Source: https://github.com/modelscope/agentevolver/blob/main/docs/tutorial/quick_start.md

Sets up the ReMe long-term memory service with specific LLM and embedding configurations. This service listens on a local port to provide memory and reflection capabilities to the agent.

```bash
export FLOW_EMBEDDING_API_KEY=""
export FLOW_EMBEDDING_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
export FLOW_LLM_API_KEY=""
export FLOW_LLM_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1

conda activate reme
cd external/reme
reme \
  config=default \
  backend=http \
  thread_pool_max_workers=256 \
  http.host="127.0.0.1" \
  http.port=8001 \
  http.limit_concurrency=256 \
  llm.default.model_name=qwen-max-2025-01-25 \
  embedding_model.default.model_name=text-embedding-v4 \
  vector_store.default.backend=local \
  op.rerank_memory_op.params.enable_llm_rerank=false
```

--------------------------------

### Configure AgentEvolver Training Parameters

Source: https://context7.com/modelscope/agentevolver/llms.txt

Example YAML configuration for AgentEvolver training, covering environment service endpoints, trainer hyperparameters, data processing limits, and algorithm-specific settings.
```yaml
env_service:
  env_type: "appworld"
  env_url: "http://127.0.0.1:8000"
trainer:
  total_epochs: 10
  save_freq: 100
data:
  train_batch_size: 32
actor_rollout_ref:
  rollout:
    n: 4
    max_steps: 20
algorithm:
  adv_estimator: "grpo"
  gamma: 1.0
```

--------------------------------

### Configure Observer UI Visibility

Source: https://github.com/modelscope/agentevolver/blob/main/games/web/static/avalon/observe.html

CSS rules to manage the visibility of game setup and message containers based on the initialization state class applied to the HTML element.

```css
html.auto-start #game-setup { display: none !important; }
html.auto-start #messages-container { display: flex !important; }
html.manual-start #game-setup { display: block !important; }
html.manual-start #messages-container { display: none !important; }
```

--------------------------------

### Start Task Manager in Standalone Mode (Bash)

Source: https://github.com/modelscope/agentevolver/blob/main/docs/guidelines/task_manager.md

This command starts the Task Manager in standalone mode for simple task synthesis. It runs the module directly, displays synthesis progress, and prints the path to the generated tasks upon completion.

```bash
$ python -m agentevolver.module.task_manager
```

--------------------------------

### Implement Custom Diplomacy Workflow Class

Source: https://github.com/modelscope/agentevolver/blob/main/examples/game/diplomacy/README.md

Shows how to create a custom workflow class by inheriting from BaseAgentscopeWorkflow. It requires implementing the __init__ method for agent setup and the execute method for running the game logic and returning a trajectory.
```python
from agentevolver.utils.agentscope_utils import BaseAgentscopeWorkflow
from agentevolver.schema.trajectory import Trajectory


class DiplomacyWorkflow(BaseAgentscopeWorkflow):
    def __init__(self, task, llm_chat_fn, model_name, **kwargs):
        super().__init__(task, llm_chat_fn, model_name, **kwargs)

    def execute(self) -> Trajectory:
        pass
```

--------------------------------

### POST /create

Source: https://github.com/modelscope/agentevolver/blob/main/docs/guidelines/env_service.md

Initializes a new environment instance and prepares its initial state for a specific task.

```APIDOC
## POST /create

### Description
Initializes a new environment instance and prepares its initial state.

### Method
POST

### Endpoint
/create

### Request Body
- **env** (string) - Required - The environment to use (e.g., "appworld").
- **instance_id** (string) - Required - A user-defined identifier for tracking the environment instance.
- **task_id** (string) - Required - The task assigned to this instance.
- **params** (dict) - Optional - Additional configuration options (simple, prompt).

### Request Example
{
  "env": "appworld",
  "instance_id": "my-instance-id-001",
  "task_id": "task-001",
  "params": {
    "simple": false,
    "prompt": true
  }
}

### Response
#### Success Response (200)
- **state** (array) - The initial state containing system and user messages.
- **info** (object) - Metadata including instance_id and task_id.

#### Response Example
{
  "state": [
    {"role": "system", "content": "[Structured prompt]"},
    {"role": "user", "content": "[Task instruction]"}
  ],
  "info": {
    "instance_id": "my-instance-id-001",
    "task_id": "task-001"
  }
}
```

--------------------------------

### Launch Experiment via CLI

Source: https://github.com/modelscope/agentevolver/blob/main/docs/launcher.md

Demonstrates how to execute the launcher script with a specific configuration file and optional environment services. The command includes the configuration path and flags to enable specific services and process cleanup.
```bash
python launcher.py --conf examples/self-question-nav-attr.yaml --python-killer --with-appworld --with-exp-maker --with-logview
```

--------------------------------

### Initialize and Load Tasks with TaskManager

Source: https://context7.com/modelscope/agentevolver/llms.txt

Shows how to initialize the TaskManager with configuration and load tasks from both an environment service and a JSONL dataset file. It also demonstrates accessing loaded tasks and creating a training dataset.

```python
from agentevolver.module.task_manager.task_manager import TaskManager, FullDataset
from agentevolver.client.env_client import EnvClient

# Initialize task manager with configuration
task_manager = TaskManager(
    config=task_config,
    mixture_strategy="unified",  # unified, stratified, or custom
    reward_config=reward_config
)

# Load tasks from environment service
env_client = EnvClient("http://127.0.0.1:8000")
task_manager.load_tasks_from_environment(
    env_client,
    env_type="appworld",
    split="train"
)

# Load tasks from JSONL dataset file
task_manager.load_tasks_from_dataset(
    dataset=rlhf_dataset,
    env_type="appworld"
)

# Access loaded tasks
seed_tasks = task_manager.seed_tasks
print(f"Loaded {len(seed_tasks)} seed tasks")

# Create training dataset with automatic task mixing
train_dataset = FullDataset(
    task_manager=task_manager,
    mixture_strategy=task_manager._mixture_strategy,
    reward_config=task_manager._reward_config,
    cache_path="./cache/train_tasks.jsonl",
    tokenizer=tokenizer,
    config=data_config,
    processor=None
)

# Update dataset for next epoch
train_dataset.update()
```

--------------------------------

### Manage Rollout Context with Experience - Python

Source: https://github.com/modelscope/agentevolver/blob/main/docs/guidelines/exp_manager.md

Injects relevant past experiences into prompts for context-aware rollouts. It retrieves the top-K experiences using EMClient, formats them, and prepends them to the current rollout message, adding context without altering the core task.
```python
history_experience = self.em_client.call_context_generator(
    trajectory=trajectory,
    retrieve_top_k=reme_config.retrieve_top_k,
    workspace_id=reme_config.workspace_id
)
formatted_experience = self.experience_template.format(history_experience)
new_content = formatted_experience + trajectory.steps[-1]["content"]
trajectory.steps[-1]["content"] = new_content
```

--------------------------------

### GET /healthz

Source: https://github.com/modelscope/agentevolver/blob/main/docs/guidelines/env_service.md

Performs a basic health check on the environment service.

```APIDOC
## GET /healthz

### Description
Basic service status check.

### Method
GET

### Endpoint
/healthz
```

--------------------------------

### Launch AgentEvolver with Configuration

Source: https://github.com/modelscope/agentevolver/blob/main/docs/tutorial/configuration.md

Commands to execute the AgentEvolver training process using either a custom YAML file or direct command-line overrides for specific parameters.

```bash
python launcher.py --conf examples/my_config.yaml
```

```bash
python3 -m agentevolver.main_ppo \
    --config-path="$CONFIG_PATH" \
    --config-name='script_config' \
    trainer.experiment_name=my_experiment \
    data.train_batch_size=64
```

--------------------------------

### Launch AgentEvolver Training (Python)

Source: https://github.com/modelscope/agentevolver/blob/main/docs/index.md

Launches the AgentEvolver training process using a configuration file. Supports basic training and full self-evolving pipelines with options for Appworld integration and ReMe. The commands are shell invocations of the Python launcher.

```bash
# minimal example without ReMe (using built-in datasets within environments).
python launcher.py --conf examples/train-basic.yaml --with-appworld

# full example with ReMe (questioning + navigating + attributing)
python launcher.py --conf examples/self-question-nav-attr.yaml --with-appworld
```

--------------------------------

### Initialize and Run Ray PPO Trainer

Source: https://context7.com/modelscope/agentevolver/llms.txt

Configures and executes the AgentEvolverRayPPOTrainer. It sets up resource pools, maps roles to workers, and initiates the distributed training loop.

```python
from agentevolver.module.trainer.ae_ray_trainer import AgentEvolverRayPPOTrainer
from verl.trainer.ppo.ray_trainer import ResourcePoolManager, Role

resource_pool_manager = ResourcePoolManager(
    resource_pool_spec={
        "actor_rollout": {"gpu": 4, "cpu": 16},
        "critic": {"gpu": 2, "cpu": 8}
    }
)
role_worker_mapping = {
    Role.ActorRollout: ActorRolloutWorker,
    Role.Critic: CriticWorker
}

trainer = AgentEvolverRayPPOTrainer(
    config=config,
    tokenizer=tokenizer,
    role_worker_mapping=role_worker_mapping,
    resource_pool_manager=resource_pool_manager,
    train_task_manager=train_task_manager,
    val_task_manager=val_task_manager,
    shuffle_trainset=True
)
trainer.init_workers()
trainer.fit()
```

--------------------------------

### Launch Workflow via CLI

Source: https://github.com/modelscope/agentevolver/blob/main/examples/game/avalon/README.md

Executes the configured workflow using the launcher script. Requires specifying the directory path to the configuration files.

```bash
python launcher.py --config-path examples/game/avalon --config-name config
```

--------------------------------

### POST /step

Source: https://github.com/modelscope/agentevolver/blob/main/docs/guidelines/env_service.md

Sends an action to the environment instance and retrieves the resulting state, reward, and completion status.

```APIDOC
## POST /step

### Description
Sends an action to the environment instance.
### Method
POST

### Endpoint
/step

### Request Body
- **instance_id** (string) - Required - The ID of the active environment instance.
- **action** (string/object) - Required - The action to perform in the environment.

### Response
#### Success Response (200)
- **state** (object) - The next state of the environment.
- **reward** (float) - The reward received for the action.
- **done** (bool) - Indicates if the task is complete.
```

--------------------------------

### Configure Multi-Node Ray Cluster

Source: https://github.com/modelscope/agentevolver/blob/main/docs/tutorial/quick_start.md

Manual setup for a distributed Ray cluster across multiple nodes to enable multi-node training. Requires all nodes to share the same Conda environment.

```bash
conda activate agentevolver
ray start --head
ray start --address=''
```

--------------------------------

### Configure Game Scene Layout

Source: https://github.com/modelscope/agentevolver/blob/main/games/web/static/loop/index.html

Sets up the main game area container with a grid background overlay and responsive table positioning.

```css
#scene {
  position: relative;
  height: 580px;
  overflow: hidden;
  border-radius: 12px;
  border: 2px solid var(--border);
  background: linear-gradient(180deg, #143458 0%, #0c192c 60%);
}

#scene::before {
  content: "";
  position: absolute;
  inset: 0;
  background-image: linear-gradient(90deg, var(--grid) 1px, transparent 1px), linear-gradient(var(--grid) 1px, transparent 1px);
  background-size: 32px 32px;
  opacity: 0.35;
}
```

--------------------------------

### POST /env/create

Source: https://context7.com/modelscope/agentevolver/llms.txt

Creates a new environment instance for a specific task ID.

```APIDOC
## POST /env/create

### Description
Initializes a new environment instance based on the provided task identifier.

### Method
POST

### Endpoint
/env/create

### Parameters
#### Request Body
- **task_id** (string) - Required - The unique identifier of the task.
- **params** (object) - Optional - Configuration parameters for the instance.

### Request Example
{
  "task_id": "task_1",
  "params": {}
}

### Response
#### Success Response (200)
- **instance_id** (string) - The unique ID of the created environment instance.

#### Response Example
{
  "instance_id": "instance_task_1_0"
}
```

--------------------------------

### Initialize and Execute Agent Flow

Source: https://context7.com/modelscope/agentevolver/llms.txt

Shows the initialization of the AgentFlow class with necessary components like a reward calculator and LLM function, followed by executing the agent-environment interaction loop.

```python
from agentevolver.module.agent_flow.agent_flow import AgentFlow
from agentevolver.module.agent_flow.reward_calculator import RewardCalculator
from agentevolver.module.context_manager.cmt_linear import Linear_CMT

agent_flow = AgentFlow(
    reward_calculator=reward_calculator,
    llm_chat_fn=llm_chat_fn,
    tokenizer=tokenizer,
    config=config
)

context_manager = Linear_CMT(
    tokenizer=tokenizer,
    max_context_length=config.data.max_seq_length,
    metadata={}
)

init_messages = [{"role": "user", "content": "Your task is to..."}]

context_manager = agent_flow.execute(
    context_manager=context_manager,
    init_messages=init_messages,
    env=env_client,
    instance_id="env_instance_001",
    tmux={'step': [0], 'token': [0]},
    stop=[False],
    thread_index=0,
    task_id="task_001",
    traj_exp_config=traj_exp_config,
    data_id="0",
    rollout_id="0",
    query="Find user profile"
)

reward = context_manager.reward
print(f"Outcome: {reward.outcome}, Success: {reward.success_rate}")
trajectory = context_manager.to_trajectory()
```

--------------------------------

### Environment Profile JSON Structure

Source: https://github.com/modelscope/agentevolver/blob/main/docs/guidelines/task_manager.md

Defines the structure for an Environment Profile JSON file. This file specifies entities, their attributes, and operations within an environment, along with task preferences for guiding data collection.
```json
{
  "name": "Alice",
  "background": "A general user working with a file system.",
  "entities": [
    {
      "name": "file",
      "description": "A file in a file system.",
      "attrs": {
        "name": "The name of the file.",
        "size": "The size of the file in bytes.",
        "type": "The type of the file, e.g. text, image, video, etc.",
        "parent": "The parent directory of the file."
      },
      "opts": [
        { "name": "create", "description": "Create a new file." },
        { "name": "delete", "description": "Delete a file." },
        { "name": "read", "description": "Read a file." },
        { "name": "write", "description": "Write to a file." }
      ]
    },
    {
      "name": "directory",
      "description": "A directory in a file system.",
      "attrs": {
        "name": "The name of the directory.",
        "parent": "The parent directory of the directory."
      },
      "opts": [
        { "name": "create", "description": "Create a new directory." },
        { "name": "delete", "description": "Delete a directory." },
        { "name": "list", "description": "List the contents of a directory." }
      ]
    }
  ],
  "task_preference": {
    "num_entities": 2,
    "num_opts": 3,
    "relation_difficulty": 3
  }
}
```

--------------------------------

### Launch Service with pty_launch (Python)

Source: https://github.com/modelscope/agentevolver/blob/main/docs/launcher.md

This code shows how to conditionally call the `pty_launch` helper function within the `main()` function of `launcher.py` when a specific CLI flag is provided. It includes an optional 'ready' string for monitoring service startup.

```python
if args.with_myenv:
    pty_launch("myenv", success_std_string="Uvicorn running on")
```

--------------------------------

### Python: Compute Suffix Sum for Step Advantages

Source: https://github.com/modelscope/agentevolver/blob/main/docs/guidelines/adv_processor.md

Calculates the advantage for each step by computing the cumulative sum of future rewards (suffix sum). This function gives the expected future reward from any given step, which guides policy learning.
```python
import torch


def suffix_sum_on_steps(rewards: torch.Tensor) -> torch.Tensor:
    # Calculates the suffix sum of rewards for each step.
    # ... implementation details ...
    pass
```

--------------------------------

### Initialize Game Configuration and State (JavaScript)

Source: https://github.com/modelscope/agentevolver/blob/main/games/web/static/avalon/participate.html

This JavaScript code initializes early game settings by checking session storage for the game configuration, selected portraits, and game running status. It updates the global `__EARLY_INIT__` object and applies CSS classes to the document element to control the game's display mode (auto-start or manual-start). If parsing fails, it falls back to manual-start.

```javascript
window.__EARLY_INIT__ = {
  hasGameConfig: false,
  isGameRunning: false,
  config: null,
  portraits: null
};

(function() {
  try {
    var gameConfigStr = sessionStorage.getItem('gameConfig');
    var portraitsStr = sessionStorage.getItem('selectedPortraits');
    var gameRunning = sessionStorage.getItem('gameRunning') === 'true';
    if (gameConfigStr) {
      window.__EARLY_INIT__.hasGameConfig = true;
      window.__EARLY_INIT__.config = JSON.parse(gameConfigStr);
      document.documentElement.classList.add('auto-start');
    } else if (gameRunning) {
      window.__EARLY_INIT__.isGameRunning = true;
      document.documentElement.classList.add('auto-start');
    } else {
      document.documentElement.classList.add('manual-start');
    }
    if (portraitsStr) {
      window.__EARLY_INIT__.portraits = JSON.parse(portraitsStr);
    }
  } catch (e) {
    document.documentElement.classList.add('manual-start');
  }
})();
```

--------------------------------

### JavaScript Game Redirection and Mode Setting

Source: https://github.com/modelscope/agentevolver/blob/main/games/web/static/loop/index.html

This code defines a function to compute the redirect URL for a given game and overrides the window.setMode function to enforce an 'observe' mode.
It also includes event listeners to manage the game state, particularly for Avalon, preventing new games from starting if one is already running and redirecting users to the existing session.

```javascript
window.computeRedirectUrl = function(game) {
  if (!game) return '/';
  return `/static/loop/${game}/observe.html`;
};

const originalSetMode = window.setMode;
window.setMode = function() {
  if (originalSetMode) originalSetMode('observe');
};

document.addEventListener('DOMContentLoaded', function() {
  if (window.setMode) window.setMode('observe');
  const modeToggle = document.querySelector('.mode-toggle');
  if (modeToggle) modeToggle.style.display = 'none';
  document.querySelectorAll('.avalon-participate-only, .diplomacy-participate-only').forEach((el) => el.style.display = 'none');

  // Prevent starting a new Avalon game while one is running; redirect to existing session.
  const isGameRunning = sessionStorage.getItem('gameRunning') === 'true';
  let runningGame = null;
  try {
    const cfgStr = sessionStorage.getItem('gameConfig');
    if (cfgStr) {
      const cfg = JSON.parse(cfgStr);
      runningGame = cfg.game || null;
    }
  } catch (e) {
    runningGame = null;
  }

  const avalonCard = document.querySelector('.game-card[data-game="avalon"]');
  const startBtn = document.getElementById('start-btn');

  function redirectToAvalon() {
    window.location.href = '/static/loop/avalon/observe.html';
  }

  if (isGameRunning && runningGame === 'avalon') {
    if (startBtn) {
      startBtn.disabled = true;
      startBtn.textContent = 'Avalon running';
      startBtn.addEventListener('click', function(e) {
        e.preventDefault();
        redirectToAvalon();
      });
    }
    if (avalonCard) {
      avalonCard.classList.add('active');
      avalonCard.addEventListener('click', function(e) {
        e.preventDefault();
        redirectToAvalon();
      });
    }
  }
});
```

--------------------------------

### Launch Simulation Environment Service

Source: https://github.com/modelscope/agentevolver/blob/main/docs/tutorial/quick_start.md

Activates the AppWorld environment and launches the simulation service
required for agent operation. This service must run in the background.

```bash
conda activate appworld
bash env_service/launch_script/appworld.sh
```

--------------------------------

### POST /env/step

Source: https://context7.com/modelscope/agentevolver/llms.txt

Executes an action within a specific environment instance and returns the observation.

```APIDOC
## POST /env/step

### Description
Performs an action in the environment and returns the resulting state, reward, and termination status.

### Method
POST

### Endpoint
/env/step

### Parameters
#### Request Body
- **instance_id** (string) - Required - The ID of the environment instance.
- **action** (object) - Required - The action to be performed.

### Request Example
{
  "instance_id": "instance_task_1_0",
  "action": {"command": "click_button"}
}

### Response
#### Success Response (200)
- **state** (object) - The new state of the environment.
- **reward** (float) - The reward received.
- **is_terminated** (boolean) - Whether the task is finished.

#### Response Example
{
  "state": {"content": "Action executed", "role": "user"},
  "reward": 0.5,
  "is_terminated": false
}
```

--------------------------------

### Manual Execution Scripts (Bash)

Source: https://github.com/modelscope/agentevolver/blob/main/docs/index.md

Provides standalone scripts for manual execution of AgentEvolver pipelines. Includes options for basic RL training and the complete self-evolving pipeline with customizable configurations.

```bash
# Execute basic RL pipeline with GRPO using built-in datasets within environments.
bash examples/run_basic.sh

# Run the complete self-evolving AgentEvolver pipeline with fully customizable configurations.
bash examples/run_overall.sh
```

--------------------------------

### Execute Agent Training

Source: https://github.com/modelscope/agentevolver/blob/main/docs/tutorial/quick_start.md

Runs the training scripts for either basic GRPO or the full AgentEvolver pipeline.
Requires the environment and memory services to be active.

```bash
conda activate agentevolver
bash examples/run_basic.sh
bash examples/run_overall.sh
```

--------------------------------

### Launch Diplomacy Training

Source: https://github.com/modelscope/agentevolver/blob/main/examples/game/diplomacy/README.md

Commands to execute the training process using the launcher script or provided shell scripts for production and debug environments.

```bash
python launcher.py --config-path examples/game/diplomacy --config-name config
./examples/game/diplomacy/run_train.sh
./examples/game/diplomacy/run_train_debug.sh
```

--------------------------------

### POST /step

Source: https://github.com/modelscope/agentevolver/blob/main/docs/guidelines/env_service.md

Advances the environment by one step using a user-generated action and returns the resulting state and reward.

```APIDOC
## POST /step

### Description
Advances the environment one step using a user-generated action. This endpoint is used to execute agent commands and receive feedback.

### Method
POST

### Endpoint
/step

### Parameters
#### Request Body
- **action** (object) - Required - The action object containing the agent's intent.
  - **role** (str) - Required - Typically "assistant".
  - **content** (str) - Required - The agent's natural language or code action.
  - **tool_calls** (list) - Optional - List of structured tool calls.

### Request Example
{
  "action": {
    "role": "assistant",
    "content": "```python\nopen_app(\"Calendar\")\n```",
    "tool_calls": []
  }
}

### Response
#### Success Response (200)
- **state** (list) - Environment’s response including tool output.
- **reward** (float) - Task performance score (1.0 for success).
- **is_terminated** (bool) - Indicates if the task is completed.
- **info** (dict) - Additional metadata.
#### Response Example
{
  "state": [
    {
      "role": "assistant",
      "content": "Output:\n```\nApp opened successfully\n```"
    }
  ],
  "reward": 1.0,
  "is_terminated": true,
  "info": {}
}
```

--------------------------------

### Trajectory Creation and Usage

Source: https://context7.com/modelscope/agentevolver/llms.txt

Demonstrates how to create a Trajectory object, access its properties, and convert it into a training sample format.

```APIDOC
## Trajectory Creation and Usage

### Description
This section shows how to create a `Trajectory` object, which represents a sequence of steps in an agent's interaction with an environment. It also covers accessing properties of the trajectory and converting it into a format suitable for training.

### Method
N/A (Code Example)

### Endpoint
N/A

### Parameters
N/A

### Request Example
```python
from agentevolver.schema.trajectory import Trajectory
from agentevolver.schema.reward import Reward

trajectory = Trajectory(
    task_id="task_001",
    data_id="0",
    rollout_id="0",
    query="Find user profile information",
    steps=[
        {
            "role": "assistant",
            "content": "I'll search for the user profile using the API.\nsearch_user(id='123')"
        },
        {
            "role": "user",
            "content": "[OBSERVATION]\nUser found: John Doe, email: john@example.com"
        },
        {
            "role": "assistant",
            "content": "I found the user profile. The user is John Doe with email john@example.com."
} ], is_terminated=True, reward=Reward( outcome=1.0, success_rate=1.0, madness=0.0, description="Task completed successfully" ), metadata={ "task_train_exp_mode": "discard", "add_exp": True, "experience_list": [] } ) # Access trajectory properties print(f"Task: {trajectory.task_id}") print(f"Success: {trajectory.reward.success_rate}") print(f"Steps: {len(trajectory.steps)}") # Convert trajectory to training sample format samples = trajectory.group_tokenize() # Returns List[Sample] ``` ### Response N/A ### Response Example N/A ``` -------------------------------- ### Create and Access Trajectory Data Source: https://context7.com/modelscope/agentevolver/llms.txt Demonstrates how to create a Trajectory object, populate it with steps and rewards, and access its properties. It also shows how to convert the trajectory into a training sample format. ```python trajectory = Trajectory( task_id="task_001", data_id="0", rollout_id="0", query="Find user profile information", steps=[ { "role": "assistant", "content": "I'll search for the user profile using the API.\nsearch_user(id='123')" }, { "role": "user", "content": "[OBSERVATION]\nUser found: John Doe, email: john@example.com" }, { "role": "assistant", "content": "I found the user profile. The user is John Doe with email john@example.com." 
        }
    ],
    is_terminated=True,
    reward=Reward(
        outcome=1.0,
        success_rate=1.0,
        madness=0.0,
        description="Task completed successfully"
    ),
    metadata={
        "task_train_exp_mode": "discard",
        "add_exp": True,
        "experience_list": []
    }
)

# Access trajectory properties
print(f"Task: {trajectory.task_id}")
print(f"Success: {trajectory.reward.success_rate}")
print(f"Steps: {len(trajectory.steps)}")

# Convert trajectory to training sample format
samples = trajectory.group_tokenize()  # Returns List[Sample]
```

--------------------------------

### Manage Experience with ExperienceManager

Source: https://context7.com/modelscope/agentevolver/llms.txt

Illustrates the use of the ExperienceManager for implementing the Self-Navigating mechanism. It shows how to allocate a training mode to each task and how to decide, per rollout, whether experience is added.

```python
from agentevolver.module.exp_manager.exp_manager import (
    ExperienceManager,
    ExperienceWorker,
    TaskExpConfig,
    TrajExpConfig
)
from agentevolver.schema.trajectory import Trajectory

# `config` and `tasks` are not defined in this snippet; they are assumed
# to come from the surrounding training setup.

# Initialize experience manager
exp_manager = ExperienceManager(config=config)

# Allocate training modes for tasks (keep vs discard experience)
task_exp_configs = exp_manager.allocate_train_mode(tasks)
# Returns List[TaskExpConfig] with train_mode: "keep" or "discard"

# Allocate experience addition settings
task_exp_configs = exp_manager.allocate_add_exp(
    task_exp_configs,
    mode="sample"  # or "validate"
)
# Each config now has add_exp: List[bool] for each rollout
```

--------------------------------

### Experience Worker for Rollout Context Management

Source: https://context7.com/modelscope/agentevolver/llms.txt

Illustrates the use of ExperienceWorker to manage rollout context by adding historical experience. It shows how to configure experience retrieval and augment the initial messages with the retrieved experience.
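```python
from agentevolver.module.exp_manager.exp_manager import ExperienceWorker, TrajExpConfig

# `config` is not defined in this snippet; it is assumed to be the
# experiment configuration loaded elsewhere.
exp_worker = ExperienceWorker(config=config)

# Per-rollout experience settings (train_mode is "keep" or "discard")
traj_exp_config = TrajExpConfig(
    add_exp=True,
    train_mode="discard",
    task_id="task_001",
    query="Find user data"
)

# Returns the augmented message list together with the updated config
augmented_messages, updated_config = exp_worker.manage_rollout_context(
    init_messages=[{"role": "user", "content": "Task query..."}],
    traj_exp_config=traj_exp_config
)
```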
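
--------------------------------

### Sketch: Building and Parsing POST /step Payloads

Returning to the `POST /step` contract documented earlier in this section, the request and response shapes can be sketched as plain Python helpers. This is a minimal sketch and not part of AgentEvolver: `build_step_request`, `StepResult`, and `parse_step_response` are hypothetical names, and the code covers only the fields listed in the endpoint description (`action`, `state`, `reward`, `is_terminated`, `info`). No HTTP call is made; sending the payload is left to whichever client you use, and a real service may expect further fields that are not shown in that description.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


def build_step_request(content: str,
                       tool_calls: Optional[List[Any]] = None) -> Dict[str, Any]:
    """Build a JSON-serializable body for POST /step (hypothetical helper)."""
    return {
        "action": {
            "role": "assistant",          # documented as typically "assistant"
            "content": content,           # natural language or code action
            "tool_calls": list(tool_calls or []),  # optional per the docs
        }
    }


@dataclass
class StepResult:
    """Typed view of the documented /step success response (hypothetical)."""
    state: List[Dict[str, Any]]
    reward: float
    is_terminated: bool
    info: Dict[str, Any] = field(default_factory=dict)


def parse_step_response(body: str) -> StepResult:
    """Parse a /step response body and check the documented fields exist."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("step response must be a JSON object")
    missing = {"state", "reward", "is_terminated"} - set(data)
    if missing:
        raise ValueError(f"step response is missing fields: {sorted(missing)}")
    return StepResult(
        state=list(data["state"]),
        reward=float(data["reward"]),
        is_terminated=bool(data["is_terminated"]),
        info=dict(data.get("info") or {}),
    )


# Usage with a response shaped like the documented example
request_body = json.dumps(build_step_request('open_app("Calendar")'))
raw_response = (
    '{"state": [{"role": "assistant", "content": "App opened successfully"}],'
    ' "reward": 1.0, "is_terminated": true, "info": {}}'
)
result = parse_step_response(raw_response)
print(result.reward, result.is_terminated)
```

Keeping the parsing step separate from the transport makes it easy to fail fast when a response lacks `state`, `reward`, or `is_terminated`, instead of discovering the gap later in a rollout loop.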