# poke-env

`poke-env` (v0.15.0) is a Python library for building scripted bots, self-play experiments, and reinforcement-learning agents on [Pokémon Showdown](https://pokemonshowdown.com/). It provides an async interface for connecting to a Showdown server, receiving battle state, and issuing move or switch decisions, as well as a full PettingZoo/Gymnasium-compatible environment layer so agents can be trained with standard RL libraries such as Stable-Baselines3. The library targets Python 3.10+ and depends on `gymnasium`, `pettingzoo`, `numpy`, `websockets`, and `orjson`.

The library is structured around three concerns: the **player layer** (`Player` and its subclasses), which handles the WebSocket connection, battle-message parsing, and the core `choose_move` / `teampreview` interface; the **battle-state layer** (`Battle`, `DoubleBattle`, `Pokemon`, `Move`), which mirrors every piece of information the server exposes; and the **environment layer** (`SinglesEnv`, `DoublesEnv`, `SingleAgentWrapper`), which wraps two-player self-play or agent-vs-bot scenarios into a standard Gym API suitable for training RL policies.

---

## Installation

```bash
pip install poke-env
```

Start a local Pokémon Showdown server (recommended for training):

```bash
git clone https://github.com/smogon/pokemon-showdown.git
cd pokemon-showdown
npm install
cp config/config-example.js config/config.js
node pokemon-showdown start --no-security
```

---

## Player — subclassing `Player` to build a bot

`Player` is the abstract base class for all bots. Subclass it and implement `choose_move`. The constructor accepts an optional `AccountConfiguration`, `ServerConfiguration`, `battle_format`, `team`, and many other parameters. Once instantiated, the player automatically connects to the Showdown server and listens for incoming battle messages.
```python
import asyncio

from poke_env.battle import AbstractBattle, Battle
from poke_env.player import Player
from poke_env.player.battle_order import BattleOrder


class MaxPowerPlayer(Player):
    """Always uses the highest base-power available move; falls back to a random move."""

    def choose_move(self, battle: AbstractBattle) -> BattleOrder:
        assert isinstance(battle, Battle)
        if battle.available_moves:
            best = max(battle.available_moves, key=lambda m: m.base_power)
            return self.create_order(best)
        # Force-switch or no moves available — pick randomly
        return self.choose_random_move(battle)


async def main():
    p1 = MaxPowerPlayer(battle_format="gen9randombattle", max_concurrent_battles=1)
    p2 = MaxPowerPlayer(battle_format="gen9randombattle", max_concurrent_battles=1)

    await p1.battle_against(p2, n_battles=5)

    print(f"p1 wins: {p1.n_won_battles}/{p1.n_finished_battles}")
    print(f"p2 wins: {p2.n_won_battles}/{p2.n_finished_battles}")
    print(f"p1 win rate: {p1.win_rate:.2%}")

    # Clean up connections
    await p1.ps_client.stop_listening()
    await p2.ps_client.stop_listening()


asyncio.run(main())
# p1 wins: 3/5
# p2 wins: 2/5
# p1 win rate: 60.00%
```

---

## `Player.battle_against` — run N battles between two players

High-level helper that internally coordinates `send_challenges` and `accept_challenges`. Works for both singles and doubles when `format_is_doubles` is set correctly.
```python
import asyncio

from poke_env.player import RandomPlayer


async def main():
    attacker = RandomPlayer(battle_format="gen9randombattle", max_concurrent_battles=3)
    defender = RandomPlayer(battle_format="gen9randombattle", max_concurrent_battles=3)

    # Run 10 concurrent battles
    await attacker.battle_against(defender, n_battles=10)

    print(f"Finished: {attacker.n_finished_battles}")
    print(f"Won: {attacker.n_won_battles}")
    print(f"Lost: {attacker.n_lost_battles}")
    print(f"Tied: {attacker.n_tied_battles}")

    await attacker.ps_client.stop_listening()
    await defender.ps_client.stop_listening()


asyncio.run(main())
```

---

## `Player.ladder` — play ranked ladder games

Submits the player to the matchmaking queue and plays `n_games` ladder matches automatically.

```python
import asyncio

from poke_env.player import SimpleHeuristicsPlayer


async def main():
    player = SimpleHeuristicsPlayer(
        battle_format="gen9randombattle",
        max_concurrent_battles=1,
    )
    await player.ladder(n_games=20)
    print(f"Ladder record: {player.n_won_battles}W / {player.n_lost_battles}L")
    await player.ps_client.stop_listening()


asyncio.run(main())
```

---

## `Player.accept_challenges` — accept human challenges

Waits for incoming challenges from a named opponent (or any opponent if `None`) and accepts them.

```python
import asyncio

from poke_env.player import SimpleHeuristicsPlayer


async def main():
    bot = SimpleHeuristicsPlayer(
        battle_format="gen9randombattle",
        max_concurrent_battles=1,
    )
    # Wait for and accept 3 challenges from any player
    await bot.accept_challenges(opponent=None, n_challenges=3)
    print(f"Finished: {bot.n_finished_battles} battles")
    await bot.ps_client.stop_listening()


asyncio.run(main())
```

---

## `Player.create_order` — build a move/switch order

Static helper that wraps a `Move` or `Pokemon` into a `SingleBattleOrder`, optionally activating gimmicks (mega, z-move, dynamax, terastallize).
```python
from poke_env.battle import Battle
from poke_env.player import Player
from poke_env.player.battle_order import BattleOrder


class TeraPlayer(Player):
    def choose_move(self, battle: Battle) -> BattleOrder:
        if battle.available_moves:
            best = max(battle.available_moves, key=lambda m: m.base_power)
            # Terastallize on the first move of the battle if available
            if battle.can_tera and battle.turn == 1:
                return self.create_order(best, terastallize=True)
            return self.create_order(best)
        return self.choose_random_move(battle)
```

---

## `Player.save_replay` — export a battle replay as HTML

Writes a local HTML replay file for any battle the player has participated in.

```python
import asyncio
from pathlib import Path

from poke_env.player import RandomPlayer


async def main():
    p1 = RandomPlayer(battle_format="gen9randombattle")
    p2 = RandomPlayer(battle_format="gen9randombattle")
    await p1.battle_against(p2, n_battles=1)

    battle_tag = list(p1.battles.keys())[0]
    out = p1.save_replay(battle_tag, Path("replays") / f"{battle_tag}.html")
    print(f"Replay saved to {out}")

    await p1.ps_client.stop_listening()
    await p2.ps_client.stop_listening()


asyncio.run(main())
```

---

## `AccountConfiguration` — configure player credentials

`NamedTuple` holding a Showdown username and optional password. `AccountConfiguration.generate` creates auto-numbered accounts for local/unauthenticated servers.
```python
from poke_env import AccountConfiguration, ShowdownServerConfiguration
from poke_env.player import RandomPlayer

# Local server — no password
local_cfg = AccountConfiguration("MyBot", None)
local_player = RandomPlayer(account_configuration=local_cfg)

# Official Showdown server — password required
auth_cfg = AccountConfiguration("MyBot", "s3cr3t")
online_player = RandomPlayer(
    account_configuration=auth_cfg,
    server_configuration=ShowdownServerConfiguration,
)

# Auto-generated accounts (useful for self-play training)
auto1 = AccountConfiguration.generate("TrainingBot")  # "TrainingBot 1"
auto2 = AccountConfiguration.generate("TrainingBot")  # "TrainingBot 2"
```

---

## `ServerConfiguration` — point a player at a custom server

`NamedTuple` of `(websocket_url, authentication_url)`. Two pre-built constants cover the two most common cases.

```python
from poke_env import AccountConfiguration, ServerConfiguration
from poke_env.player import RandomPlayer
from poke_env.ps_client.server_configuration import (
    LocalhostServerConfiguration,
    ShowdownServerConfiguration,
)

# Built-in: localhost
p_local = RandomPlayer(server_configuration=LocalhostServerConfiguration)

# Built-in: official Showdown server
p_online = RandomPlayer(
    account_configuration=AccountConfiguration("user", "pass"),
    server_configuration=ShowdownServerConfiguration,
)

# Custom server
custom = ServerConfiguration(
    "ws://my.server:5432/showdown/websocket",
    "https://my.server/action.php?",
)
p_custom = RandomPlayer(server_configuration=custom)
```

---

## Built-in baseline players — `RandomPlayer`, `MaxBasePowerPlayer`, `SimpleHeuristicsPlayer`

Three ready-to-use `Player` subclasses that serve as opponents, benchmarks, and baselines for RL training.
```python
import asyncio

from tabulate import tabulate

from poke_env.player import (
    MaxBasePowerPlayer,
    RandomPlayer,
    SimpleHeuristicsPlayer,
    cross_evaluate,
)


async def main():
    players = [
        RandomPlayer(battle_format="gen9randombattle", max_concurrent_battles=5),
        MaxBasePowerPlayer(battle_format="gen9randombattle", max_concurrent_battles=5),
        SimpleHeuristicsPlayer(battle_format="gen9randombattle", max_concurrent_battles=5),
    ]
    results = await cross_evaluate(players, n_challenges=20)

    names = [p.username for p in players]
    # Cells are win rates; "-" marks the self-vs-self diagonal
    table = [
        [n] + ["-" if results[n].get(m) is None else f"{results[n][m]:.0%}" for m in names]
        for n in names
    ]
    print(tabulate(table, headers=[""] + names))

    for p in players:
        await p.ps_client.stop_listening()


asyncio.run(main())
```

---

## `cross_evaluate` — pairwise win-rate matrix

Runs round-robin battles between all provided players and returns a nested dict of win rates.

```python
import asyncio

from poke_env.player import MaxBasePowerPlayer, RandomPlayer, cross_evaluate


async def main():
    p1 = RandomPlayer(battle_format="gen9randombattle", max_concurrent_battles=5)
    p2 = MaxBasePowerPlayer(battle_format="gen9randombattle", max_concurrent_battles=5)

    results = await cross_evaluate([p1, p2], n_challenges=50)

    # results[p1.username][p2.username] -> p1's win rate against p2
    # results[p2.username][p1.username] -> p2's win rate against p1
    print(results)
    # {'RandomPlayer 1': {'MaxBasePowerPlayer 2': 0.28},
    #  'MaxBasePowerPlayer 2': {'RandomPlayer 1': 0.72}}

    await p1.ps_client.stop_listening()
    await p2.ps_client.stop_listening()


asyncio.run(main())
```

---

## `evaluate_player` — estimate bot strength as a scalar rating

Plays placement battles against three calibrated baselines (`RandomPlayer`, `MaxBasePowerPlayer`, `SimpleHeuristicsPlayer`), picks the closest one, then runs the full evaluation. Returns a strength estimate and 95% confidence interval, calibrated so that `RandomPlayer = 1`.
```python
import asyncio

from poke_env.player import RandomPlayer
from poke_env.player.utils import evaluate_player


async def main():
    my_bot = RandomPlayer(
        battle_format="gen8randombattle",  # must be gen8randombattle
        max_concurrent_battles=10,
    )
    strength, (lo, hi) = await evaluate_player(my_bot, n_battles=200)
    print(f"Estimated strength: {strength:.2f} (95% CI: [{lo:.2f}, {hi:.2f}])")
    # Estimated strength: 1.02 (95% CI: [0.78, 1.34])
    await my_bot.ps_client.stop_listening()


asyncio.run(main())
```

---

## `Battle` — single-battle state object

Provides the full current state of a singles battle, including active Pokémon, available moves and switches, field conditions, and gimmick availability. Passed directly to `choose_move`.

```python
from poke_env.battle import Battle
from poke_env.player import Player
from poke_env.player.battle_order import BattleOrder


class InfoPlayer(Player):
    def choose_move(self, battle: Battle) -> BattleOrder:
        active = battle.active_pokemon
        opp = battle.opponent_active_pokemon

        print(f"Turn {battle.turn}")
        print(f" My active: {active.species} {active.current_hp_fraction:.0%} HP")
        print(f" Opponent: {opp.species if opp else 'None'}")
        print(f" Weather: {battle.weather}")
        print(f" Fields: {list(battle.fields.keys())}")
        print(f" My hazards: {list(battle.side_conditions.keys())}")
        print(f" Can tera: {battle.can_tera}")
        print(f" Can dynamax: {battle.can_dynamax}")
        print(f" Moves: {[m.id for m in battle.available_moves]}")
        print(f" Switches: {[p.species for p in battle.available_switches]}")
        print(f" Valid orders: {len(battle.valid_orders)}")
        return self.choose_random_move(battle)
```

---

## `DoubleBattle` — doubles battle state object

Extends `AbstractBattle` for double battles: `active_pokemon`, `available_moves`, `available_switches`, `force_switch`, and `can_*` flags are all length-2 lists indexed by slot.
```python
from poke_env.battle import DoubleBattle
from poke_env.player import Player
from poke_env.player.battle_order import (
    BattleOrder,
    DefaultBattleOrder,
    DoubleBattleOrder,
    PassBattleOrder,
    SingleBattleOrder,
)


class SimpleDoublesPlayer(Player):
    def choose_move(self, battle: DoubleBattle) -> BattleOrder:
        orders = []
        for mon, moves, switches in zip(
            battle.active_pokemon,
            battle.available_moves,
            battle.available_switches,
        ):
            if mon is None or mon.fainted:
                orders.append(PassBattleOrder())
            elif moves:
                best = max(moves, key=lambda m: m.base_power)
                targets = battle.get_possible_showdown_targets(best, mon)
                orders.append(SingleBattleOrder(best, move_target=targets[0]))
            elif switches:
                orders.append(SingleBattleOrder(switches[0]))
            else:
                orders.append(DefaultBattleOrder())

        joined = DoubleBattleOrder.join_orders([orders[0]], [orders[1]])
        return joined[0] if joined else DoubleBattleOrder(orders[0], orders[1])
```

---

## `Pokemon` — Pokémon state within a battle

Tracks species, types, ability, item, HP, status, stat boosts, active effects, moves, and more.
```python
from poke_env.battle import Battle
from poke_env.player import Player
from poke_env.player.battle_order import BattleOrder


class TypeAwarePlayer(Player):
    def choose_move(self, battle: Battle) -> BattleOrder:
        opp = battle.opponent_active_pokemon
        if opp and battle.available_moves:
            # Pick the move that deals the most type-effective damage
            best = max(
                battle.available_moves,
                key=lambda m: opp.damage_multiplier(m) * m.base_power,
            )
            print(f" {opp.species} HP:{opp.current_hp_fraction:.0%} "
                  f"Status:{opp.status} Types:{opp.types}")
            print(f" boosts: {opp.boosts}")
            return self.create_order(best)
        return self.choose_random_move(battle)
```

---

## `Move` — move metadata

Exposes all move properties derived from Showdown's data: `base_power`, `accuracy`, `type`, `category`, `priority`, `expected_hits`, `boosts`, `target`, and many more.

```python
from poke_env.battle import Battle, Move, MoveCategory
from poke_env.player import Player
from poke_env.player.battle_order import BattleOrder


class CategoryAwarePlayer(Player):
    def choose_move(self, battle: Battle) -> BattleOrder:
        active = battle.active_pokemon
        opp = battle.opponent_active_pokemon
        if not active or not opp or not battle.available_moves:
            return self.choose_random_move(battle)

        atk = active.stats.get("atk", 0) or 0
        spa = active.stats.get("spa", 0) or 0
        opp_def = opp.base_stats.get("def", 1)
        opp_spd = opp.base_stats.get("spd", 1)

        def score(m: Move) -> float:
            ratio = atk / opp_def if m.category == MoveCategory.PHYSICAL else spa / opp_spd
            return m.base_power * m.accuracy * m.expected_hits * opp.damage_multiplier(m) * ratio

        best = max(battle.available_moves, key=score)
        return self.create_order(best)
```

---

## `GenData` — generation-specific Pokédex, moves, and type chart

Singleton-per-generation providing the Pokédex, move data, type chart, natures, and learnsets.
Use `GenData.from_gen(gen)` or `GenData.from_format(format_str)`.

```python
from poke_env.data import GenData

# Load gen 9 data
gen9 = GenData.from_gen(9)

# Type chart
fire_vs_grass = gen9.type_chart["fire"]["grass"]  # 2.0
water_vs_fire = gen9.type_chart["water"]["fire"]  # 2.0

# Pokédex entry
charizard = gen9.pokedex["charizard"]
print(charizard["baseStats"])  # {'hp': 78, 'atk': 84, 'def': 78, ...}
print(charizard["types"])      # ['Fire', 'Flying']

# Move data
flamethrower = gen9.moves["flamethrower"]
print(flamethrower["basePower"])  # 90
print(flamethrower["accuracy"])   # 100

# Load from format string
gen8_data = GenData.from_format("gen8ou")
print(gen8_data.gen)  # 8
```

---

## `compute_raw_stats` — calculate a Pokémon's actual battle stats

Computes the six in-battle stats (`hp, atk, def, spa, spd, spe`) from base stats, EVs, IVs, level, and nature using the standard formula.

```python
from poke_env.data import GenData
from poke_env.stats import compute_raw_stats

gen9 = GenData.from_gen(9)

# Modest Dragapult, 252 SpA / 252 Spe / 4 HP, 31 IVs, level 50
evs = [4, 0, 0, 252, 0, 252]  # hp atk def spa spd spe
ivs = [31] * 6

stats = compute_raw_stats("dragapult", evs, ivs, level=50, nature="modest", data=gen9)

# stats -> [hp, atk, def, spa, spd, spe]
print(dict(zip(["hp", "atk", "def", "spa", "spd", "spe"], stats)))
# {'hp': 164, 'atk': 126, 'def': 95, 'spa': 167, 'spd': 95, 'spe': 194}
```

---

## `Teambuilder` — supply custom teams to a player

Abstract base class for team providers. Implement `yield_team` to return a packed team string. `parse_showdown_team`, `parse_packed_team`, and `join_team` are static helpers for format conversion.
```python
import asyncio
import random

from poke_env.player import RandomPlayer
from poke_env.teambuilder import Teambuilder


class RandomTeamFromPool(Teambuilder):
    """Pick a random pre-built team from a pool on every battle."""

    def __init__(self, showdown_teams: list[str]):
        # Convert Showdown text format -> packed format once
        self.packed_teams = [
            self.join_team(self.parse_showdown_team(t)) for t in showdown_teams
        ]

    def yield_team(self) -> str:
        return random.choice(self.packed_teams)


TEAM = """
Garchomp @ Rocky Helmet
Ability: Rough Skin
EVs: 248 HP / 64 Def / 196 Spe
Jolly Nature
- Stealth Rock
- Earthquake
- Dragon Tail
- Fire Fang
"""

builder = RandomTeamFromPool([TEAM])


async def main():
    p1 = RandomPlayer(battle_format="gen9ou", team=builder, max_concurrent_battles=1)
    p2 = RandomPlayer(battle_format="gen9ou", team=builder, max_concurrent_battles=1)
    await p1.battle_against(p2, n_battles=1)
    await p1.ps_client.stop_listening()
    await p2.ps_client.stop_listening()


asyncio.run(main())
```

---

## `SinglesEnv` — PettingZoo parallel env for RL training (singles)

Self-play environment implementing the PettingZoo `ParallelEnv` API. Two internal `_EnvPlayer` instances battle each other; actions are integers from a discrete space whose size depends on the generation. Override `embed_battle` and `calc_reward` to define observations and rewards.
```python
import numpy as np
from gymnasium.spaces import Box
from stable_baselines3 import PPO

from poke_env.battle import AbstractBattle
from poke_env.environment import SingleAgentWrapper, SinglesEnv
from poke_env.player import SimpleHeuristicsPlayer


class MyEnv(SinglesEnv):
    N_OBS = 10

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.observation_spaces = {
            agent: Box(-1.0, 1.0, shape=(self.N_OBS,), dtype=np.float32)
            for agent in self.possible_agents
        }

    def embed_battle(self, battle: AbstractBattle) -> np.ndarray:
        moves_bp = np.array(
            [m.base_power / 100 for m in battle.available_moves[:4]]
            + [0.0] * (4 - len(battle.available_moves))
        )
        fainted_ratio = np.array([
            sum(m.fainted for m in battle.team.values()) / 6,
            sum(m.fainted for m in battle.opponent_team.values()) / 6,
        ])
        hp = np.array([
            battle.active_pokemon.current_hp_fraction if battle.active_pokemon else 0.0,
            battle.opponent_active_pokemon.current_hp_fraction if battle.opponent_active_pokemon else 0.0,
            float(battle.can_dynamax),
            float(battle.can_tera),
        ])
        return np.concatenate([moves_bp, fainted_ratio, hp], dtype=np.float32)

    def calc_reward(self, battle: AbstractBattle) -> float:
        return self.reward_computing_helper(
            battle, fainted_value=2.0, hp_value=1.0, status_value=0.5, victory_value=30.0
        )


env = MyEnv(battle_format="gen9randombattle", log_level=40, open_timeout=None)
opponent = SimpleHeuristicsPlayer(start_listening=False)
wrapped = SingleAgentWrapper(env, opponent)

model = PPO("MlpPolicy", wrapped, verbose=0)
model.learn(total_timesteps=10_000)
wrapped.close()
```

---

## `DoublesEnv` — PettingZoo parallel env for RL training (doubles)

Mirrors `SinglesEnv` but for double battles. The action space is `MultiDiscrete([action_space_size, action_space_size])`, one action per active slot.
```python
import numpy as np
from gymnasium.spaces import Box

from poke_env.battle import AbstractBattle
from poke_env.environment import DoublesEnv


class MyDoublesEnv(DoublesEnv):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.observation_spaces = {
            agent: Box(-1.0, 1.0, shape=(8,), dtype=np.float32)
            for agent in self.possible_agents
        }

    def embed_battle(self, battle: AbstractBattle) -> np.ndarray:
        hp_slots = []
        for mon in battle.active_pokemon:  # type: ignore[attr-defined]
            hp_slots.append(mon.current_hp_fraction if mon else 0.0)
        for mon in battle.opponent_active_pokemon:  # type: ignore[attr-defined]
            hp_slots.append(mon.current_hp_fraction if mon else 0.0)
        fainted = [
            sum(m.fainted for m in battle.team.values()) / 6,
            sum(m.fainted for m in battle.opponent_team.values()) / 6,
        ]
        can_dmax = [
            float(battle.can_dynamax[0]),  # type: ignore[index]
            float(battle.can_dynamax[1]),  # type: ignore[index]
        ]
        return np.array(hp_slots + fainted + can_dmax, dtype=np.float32)

    def calc_reward(self, battle: AbstractBattle) -> float:
        return self.reward_computing_helper(
            battle, fainted_value=2.0, hp_value=1.0, status_value=0.25, victory_value=30.0
        )


env = MyDoublesEnv(battle_format="gen8randomdoublesbattle", log_level=40)
obs, _ = env.reset()
print(obs)  # dict keyed by agent username, values are np.float32 arrays of shape (8,)
env.close()
```

---

## `SingleAgentWrapper` — convert a PettingZoo env to a standard Gymnasium env

Wraps a `PokeEnv` (e.g. `SinglesEnv`) and a fixed `Player` opponent so the result is a single-agent `gymnasium.Env` compatible with Stable-Baselines3's `SubprocVecEnv` and `Monitor`.
```python
import numpy as np
from gymnasium.spaces import Box
from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import SubprocVecEnv

from poke_env.battle import AbstractBattle
from poke_env.environment import SingleAgentWrapper, SinglesEnv
from poke_env.player import SimpleHeuristicsPlayer


class CompactEnv(SinglesEnv):
    N = 4

    def __init__(self, **kw):
        super().__init__(**kw)
        self.observation_spaces = {
            a: Box(-1.0, 4.0, shape=(self.N,), dtype=np.float32)
            for a in self.possible_agents
        }

    def embed_battle(self, battle: AbstractBattle) -> np.ndarray:
        bp = [m.base_power / 100 for m in battle.available_moves[:self.N]]
        bp += [0.0] * (self.N - len(bp))
        return np.array(bp, dtype=np.float32)

    def calc_reward(self, battle: AbstractBattle) -> float:
        return self.reward_computing_helper(battle, victory_value=10.0, fainted_value=1.0)

    @classmethod
    def make(cls) -> Monitor:
        env = cls(battle_format="gen9randombattle", log_level=40, open_timeout=None)
        opp = SimpleHeuristicsPlayer(start_listening=False)
        return Monitor(SingleAgentWrapper(env, opp))


if __name__ == "__main__":
    # The guard is required: SubprocVecEnv workers re-import this module
    vec_env = SubprocVecEnv([CompactEnv.make, CompactEnv.make])
    model = PPO("MlpPolicy", vec_env, n_steps=512, batch_size=64, verbose=0)
    model.learn(total_timesteps=5_000)
    vec_env.close()
```

---

## `SinglesEnv.get_action_mask` — produce a legal-action boolean mask

Returns a list of 0/1 integers indicating which actions in the discrete action space are currently legal, enabling action-masking in neural policies.
```python
import numpy as np

from poke_env.battle import Battle
from poke_env.environment import SinglesEnv
from poke_env.player import Player
from poke_env.player.battle_order import BattleOrder


class MaskedPlayer(Player):
    """Example: apply action mask before sampling with a neural policy."""

    def choose_move(self, battle: Battle) -> BattleOrder:
        mask = np.array(SinglesEnv.get_action_mask(battle), dtype=np.float32)
        # mask has shape (action_space_size,); 1 = legal, 0 = illegal
        logits = np.random.randn(len(mask))
        logits[mask == 0] = -1e9  # mask out illegal actions
        action = np.argmax(logits)
        return SinglesEnv.action_to_order(np.int64(action), battle, strict=False)
```

---

## `SinglesEnv.action_to_order` / `order_to_action` — convert between integer actions and `BattleOrder`

Static methods that map between the integer action space and `BattleOrder` objects, with optional `fake` (skip legality) and `strict` (raise vs. default) modes.

```python
from poke_env.battle import Battle
from poke_env.environment import SinglesEnv
from poke_env.player import Player
from poke_env.player.battle_order import BattleOrder


class RecordingPlayer(Player):
    def choose_move(self, battle: Battle) -> BattleOrder:
        order = self.choose_random_singles_move(battle)
        # Roundtrip: order -> action index -> order
        action = SinglesEnv.order_to_action(order, battle, fake=False, strict=True)
        recovered = SinglesEnv.action_to_order(action, battle, fake=False, strict=True)
        print(f"action={int(action)} order={order} recovered={recovered}")
        return order
```

---

## `calculate_damage` (gen9) — compute damage rolls for Gen 9

Calculates `(min_damage, max_damage)` for a given attacker/defender/move triple, accounting for weather, terrain, stat boosts, STAB, and type effectiveness.
```python
from poke_env.battle import Battle
from poke_env.calc.damage_calc_gen9 import calculate_damage
from poke_env.player import Player
from poke_env.player.battle_order import BattleOrder


class DamageCalcPlayer(Player):
    def choose_move(self, battle: Battle) -> BattleOrder:
        active = battle.active_pokemon
        opp = battle.opponent_active_pokemon
        if not active or not opp or not battle.available_moves:
            return self.choose_random_move(battle)

        best_move = None
        best_min_dmg = -1
        attacker_id = f"{battle.player_role}: {active.species}"
        defender_id = f"{battle.opponent_role}: {opp.species}"

        for move in battle.available_moves:
            try:
                lo, hi = calculate_damage(attacker_id, defender_id, move, battle)
                if lo > best_min_dmg:
                    best_min_dmg = lo
                    best_move = move
            except Exception:
                pass  # calc may fail for some edge-case moves

        if best_move:
            return self.create_order(best_move)
        return self.choose_random_move(battle)
```

---

## Tracking battle observations with `_battle_finished_callback`

Override `_battle_finished_callback` to execute custom logic when a battle ends. Combined with per-turn snapshots inside `choose_move`, this enables detailed logging and dataset collection.
```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Optional

from poke_env.battle import AbstractBattle, Battle
from poke_env.player import Player, RandomPlayer
from poke_env.player.battle_order import BattleOrder


@dataclass
class Snapshot:
    turn: int
    our_active: Optional[str]
    opp_active: Optional[str]
    our_hp: float
    opp_hp: float


class LoggingPlayer(Player):
    def __init__(self, **kw):
        super().__init__(**kw)
        self._logs: Dict[str, List[Snapshot]] = {}

    def choose_move(self, battle: AbstractBattle) -> BattleOrder:
        assert isinstance(battle, Battle)
        snap = Snapshot(
            turn=battle.turn,
            our_active=battle.active_pokemon.species if battle.active_pokemon else None,
            opp_active=battle.opponent_active_pokemon.species if battle.opponent_active_pokemon else None,
            our_hp=battle.active_pokemon.current_hp_fraction if battle.active_pokemon else 0.0,
            opp_hp=battle.opponent_active_pokemon.current_hp_fraction if battle.opponent_active_pokemon else 0.0,
        )
        self._logs.setdefault(battle.battle_tag, []).append(snap)
        return self.choose_random_move(battle)

    def _battle_finished_callback(self, battle: AbstractBattle):
        log = self._logs.pop(battle.battle_tag, [])
        result = "WON" if battle.won else "LOST"
        print(f"{battle.battle_tag}: {result} in {len(log)} turns")


async def main():
    player = LoggingPlayer(battle_format="gen9randombattle")
    opp = RandomPlayer(battle_format="gen9randombattle")
    await player.battle_against(opp, n_battles=3)
    await player.ps_client.stop_listening()
    await opp.ps_client.stop_listening()


asyncio.run(main())
```

---

## Summary

`poke-env` covers three primary use cases. The first is **scripted bot development**: subclass `Player`, implement `choose_move` using the rich `Battle`/`Pokemon`/`Move` state objects, and run battles against humans on the official Showdown server, a local server, or other bots via `battle_against` and `accept_challenges`.
Built-in helpers like `create_order`, `choose_random_move`, `random_teampreview`, and the damage calculator in `poke_env.calc` accelerate heuristic and rule-based agents.

The second is **self-play and benchmarking**: `cross_evaluate` and `evaluate_player` in `poke_env.player.utils` provide a standardised way to measure relative and absolute bot strength, while the three baseline players form a calibrated ladder (`RandomPlayer = 1`, `MaxBasePowerPlayer ≈ 7.7`, `SimpleHeuristicsPlayer ≈ 128.8`).

The third and most involved use case is **reinforcement learning**. `SinglesEnv` and `DoublesEnv` expose a PettingZoo `ParallelEnv` (self-play, both agents learn simultaneously) while `SingleAgentWrapper` converts that into a standard `gymnasium.Env` (agent vs. fixed opponent). Both environments require users to implement `embed_battle` (observation builder returning a NumPy array) and `calc_reward` (float reward signal) while everything else — WebSocket I/O, battle parsing, action-to-order mapping, concurrency, and teampreview — is handled by the library.

Action masking is first-class: `SinglesEnv.get_action_mask` returns a per-step legal-action vector that can be passed to masked policy classes in Stable-Baselines3 or any other framework, preventing the agent from ever selecting an illegal action. The environment integrates directly with SB3's `SubprocVecEnv`, `Monitor`, and the PettingZoo wrapper library `supersuit`.
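Teampreview, part of the core `Player` interface mentioned above, defaults to `random_teampreview` but can be customized by overriding `Player.teampreview` to return a Showdown `/team` order message. A minimal sketch of the ordering logic, written as a plain function so it can back such an override; the function name and the speed-first heuristic are illustrative, not part of the library:

```python
def speed_first_team_order(base_speeds: list[int]) -> str:
    """Build a Showdown teampreview message ordering slots fastest-first.

    base_speeds[i] is the base Speed of the Pokémon in (0-indexed) slot i;
    the returned string uses Showdown's 1-indexed slot numbers.
    """
    order = sorted(range(len(base_speeds)), key=lambda i: -base_speeds[i])
    return "/team " + "".join(str(i + 1) for i in order)


# Inside a Player subclass this could back a teampreview override, e.g.:
#
#     def teampreview(self, battle):
#         speeds = [mon.base_stats["spe"] for mon in battle.team.values()]
#         return speed_first_team_order(speeds)

print(speed_first_team_order([61, 142, 45]))  # "/team 213"
```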