# poke-env

`poke-env` (v0.15.0) is a Python library for building scripted bots, self-play experiments, and reinforcement-learning agents on [Pokémon Showdown](https://pokemonshowdown.com/). It provides an async interface for connecting to a Showdown server, receiving battle state, and issuing move or switch decisions, as well as a full PettingZoo/Gymnasium-compatible environment layer so agents can be trained with standard RL libraries such as Stable-Baselines3. The library targets Python 3.10+ and depends on `gymnasium`, `pettingzoo`, `numpy`, `websockets`, and `orjson`.

The library is structured around three concerns: the **player layer** (`Player` and its subclasses), which handles the WebSocket connection, battle-message parsing, and the core `choose_move` / `teampreview` interface; the **battle-state layer** (`Battle`, `DoubleBattle`, `Pokemon`, `Move`), which mirrors every piece of information the server exposes; and the **environment layer** (`SinglesEnv`, `DoublesEnv`, `SingleAgentWrapper`), which wraps two-player self-play or agent-vs-bot scenarios into a standard Gym API suitable for training RL policies.

---

## Installation

```bash
pip install poke-env
```

Start a local Pokémon Showdown server (recommended for training):

```bash
git clone https://github.com/smogon/pokemon-showdown.git
cd pokemon-showdown
npm install
cp config/config-example.js config/config.js
node pokemon-showdown start --no-security
```

---

## Player — subclassing `Player` to build a bot

`Player` is the abstract base class for all bots. Subclass it and implement `choose_move`. The constructor accepts an optional `AccountConfiguration`, `ServerConfiguration`, `battle_format`, `team`, and many other parameters. Once instantiated, the player automatically connects to the Showdown server and listens for incoming battle messages.
```python
import asyncio

from poke_env.battle import AbstractBattle, Battle
from poke_env.player import Player
from poke_env.player.battle_order import BattleOrder


class MaxPowerPlayer(Player):
    """Always uses the highest base-power available move; falls back to a random move."""

    def choose_move(self, battle: AbstractBattle) -> BattleOrder:
        assert isinstance(battle, Battle)
        if battle.available_moves:
            best = max(battle.available_moves, key=lambda m: m.base_power)
            return self.create_order(best)
        # Force-switch or no moves available — pick randomly
        return self.choose_random_move(battle)


async def main():
    p1 = MaxPowerPlayer(battle_format="gen9randombattle", max_concurrent_battles=1)
    p2 = MaxPowerPlayer(battle_format="gen9randombattle", max_concurrent_battles=1)

    await p1.battle_against(p2, n_battles=5)

    print(f"p1 wins: {p1.n_won_battles}/{p1.n_finished_battles}")
    print(f"p2 wins: {p2.n_won_battles}/{p2.n_finished_battles}")
    print(f"p1 win rate: {p1.win_rate:.2%}")

    # Clean up connections
    await p1.ps_client.stop_listening()
    await p2.ps_client.stop_listening()


asyncio.run(main())
# p1 wins: 3/5
# p2 wins: 2/5
# p1 win rate: 60.00%
```

---

## `Player.battle_against` — run N battles between two players

High-level helper that internally coordinates `send_challenges` and `accept_challenges`. Works for both singles and doubles when `format_is_doubles` is set correctly.
```python
import asyncio

from poke_env.player import RandomPlayer


async def main():
    attacker = RandomPlayer(battle_format="gen9randombattle", max_concurrent_battles=3)
    defender = RandomPlayer(battle_format="gen9randombattle", max_concurrent_battles=3)

    # Run 10 concurrent battles
    await attacker.battle_against(defender, n_battles=10)

    print(f"Finished: {attacker.n_finished_battles}")
    print(f"Won: {attacker.n_won_battles}")
    print(f"Lost: {attacker.n_lost_battles}")
    print(f"Tied: {attacker.n_tied_battles}")

    await attacker.ps_client.stop_listening()
    await defender.ps_client.stop_listening()


asyncio.run(main())
```

---

## `Player.ladder` — play ranked ladder games

Submits the player to the matchmaking queue and plays `n_games` ladder matches automatically.

```python
import asyncio

from poke_env.player import SimpleHeuristicsPlayer


async def main():
    player = SimpleHeuristicsPlayer(
        battle_format="gen9randombattle",
        max_concurrent_battles=1,
    )
    await player.ladder(n_games=20)
    print(f"Ladder record: {player.n_won_battles}W / {player.n_lost_battles}L")
    await player.ps_client.stop_listening()


asyncio.run(main())
```

---

## `Player.accept_challenges` — accept human challenges

Waits for incoming challenges from a named opponent (or any opponent if `None`) and accepts them.

```python
import asyncio

from poke_env.player import SimpleHeuristicsPlayer


async def main():
    bot = SimpleHeuristicsPlayer(
        battle_format="gen9randombattle",
        max_concurrent_battles=1,
    )
    # Wait for and accept 3 challenges from any player
    await bot.accept_challenges(opponent=None, n_challenges=3)
    print(f"Finished: {bot.n_finished_battles} battles")
    await bot.ps_client.stop_listening()


asyncio.run(main())
```

---

## `Player.create_order` — build a move/switch order

Static helper that wraps a `Move` or `Pokemon` into a `SingleBattleOrder`, optionally activating gimmicks (mega, z-move, dynamax, terastallize).
```python
from poke_env.battle import Battle
from poke_env.player import Player
from poke_env.player.battle_order import BattleOrder


class TeraPlayer(Player):
    def choose_move(self, battle: Battle) -> BattleOrder:
        if battle.available_moves:
            best = max(battle.available_moves, key=lambda m: m.base_power)
            # Terastallize on the first move of the battle if available
            if battle.can_tera and battle.turn == 1:
                return self.create_order(best, terastallize=True)
            return self.create_order(best)
        return self.choose_random_move(battle)
```

---

## `Player.save_replay` — export a battle replay as HTML

Writes a local HTML replay file for any battle the player has participated in.

```python
import asyncio
from pathlib import Path

from poke_env.player import RandomPlayer


async def main():
    p1 = RandomPlayer(battle_format="gen9randombattle")
    p2 = RandomPlayer(battle_format="gen9randombattle")
    await p1.battle_against(p2, n_battles=1)

    battle_tag = list(p1.battles.keys())[0]
    out = p1.save_replay(battle_tag, Path("replays") / f"{battle_tag}.html")
    print(f"Replay saved to {out}")

    await p1.ps_client.stop_listening()
    await p2.ps_client.stop_listening()


asyncio.run(main())
```

---

## `AccountConfiguration` — configure player credentials

`NamedTuple` holding a Showdown username and optional password. `AccountConfiguration.generate` creates auto-numbered accounts for local/unauthenticated servers.
```python
from poke_env import AccountConfiguration, ShowdownServerConfiguration
from poke_env.player import RandomPlayer

# Local server — no password
local_cfg = AccountConfiguration("MyBot", None)
local_player = RandomPlayer(account_configuration=local_cfg)

# Official Showdown server — password required
auth_cfg = AccountConfiguration("MyBot", "s3cr3t")
online_player = RandomPlayer(
    account_configuration=auth_cfg,
    server_configuration=ShowdownServerConfiguration,
)

# Auto-generated accounts (useful for self-play training)
auto1 = AccountConfiguration.generate("TrainingBot")  # "TrainingBot 1"
auto2 = AccountConfiguration.generate("TrainingBot")  # "TrainingBot 2"
```

---

## `ServerConfiguration` — point a player at a custom server

`NamedTuple` of `(websocket_url, authentication_url)`. Two pre-built constants cover the two most common cases.

```python
from poke_env import AccountConfiguration, ServerConfiguration
from poke_env.player import RandomPlayer
from poke_env.ps_client.server_configuration import (
    LocalhostServerConfiguration,
    ShowdownServerConfiguration,
)

# Built-in: localhost
p_local = RandomPlayer(server_configuration=LocalhostServerConfiguration)

# Built-in: official Showdown server
p_online = RandomPlayer(
    account_configuration=AccountConfiguration("user", "pass"),
    server_configuration=ShowdownServerConfiguration,
)

# Custom server
custom = ServerConfiguration(
    "ws://my.server:5432/showdown/websocket",
    "https://my.server/action.php?",
)
p_custom = RandomPlayer(server_configuration=custom)
```

---

## Built-in baseline players — `RandomPlayer`, `MaxBasePowerPlayer`, `SimpleHeuristicsPlayer`

Three ready-to-use `Player` subclasses that serve as opponents, benchmarks, and baselines for RL training.
```python
import asyncio

from tabulate import tabulate

from poke_env.player import (
    MaxBasePowerPlayer,
    RandomPlayer,
    SimpleHeuristicsPlayer,
    cross_evaluate,
)


async def main():
    players = [
        RandomPlayer(battle_format="gen9randombattle", max_concurrent_battles=5),
        MaxBasePowerPlayer(battle_format="gen9randombattle", max_concurrent_battles=5),
        SimpleHeuristicsPlayer(battle_format="gen9randombattle", max_concurrent_battles=5),
    ]
    results = await cross_evaluate(players, n_challenges=20)

    names = [p.username for p in players]
    # Cells are win rates; "-" marks the self-vs-self diagonal
    table = [
        [n] + ["-" if results[n].get(m) is None else f"{results[n][m]:.0%}" for m in names]
        for n in names
    ]
    print(tabulate(table, headers=[""] + names))

    for p in players:
        await p.ps_client.stop_listening()


asyncio.run(main())
```

---

## `cross_evaluate` — pairwise win-rate matrix

Runs round-robin battles between all provided players and returns a nested dict of win rates.

```python
import asyncio

from poke_env.player import MaxBasePowerPlayer, RandomPlayer, cross_evaluate


async def main():
    p1 = RandomPlayer(battle_format="gen9randombattle", max_concurrent_battles=5)
    p2 = MaxBasePowerPlayer(battle_format="gen9randombattle", max_concurrent_battles=5)

    results = await cross_evaluate([p1, p2], n_challenges=50)

    # results[p1.username][p2.username] -> p1's win rate against p2
    # results[p2.username][p1.username] -> p2's win rate against p1
    print(results)
    # {'RandomPlayer 1': {'MaxBasePowerPlayer 2': 0.28},
    #  'MaxBasePowerPlayer 2': {'RandomPlayer 1': 0.72}}

    await p1.ps_client.stop_listening()
    await p2.ps_client.stop_listening()


asyncio.run(main())
```

---

## `evaluate_player` — estimate bot strength as a scalar rating

Plays placement battles against three calibrated baselines (`RandomPlayer`, `MaxBasePowerPlayer`, `SimpleHeuristicsPlayer`), picks the closest one, then runs the full evaluation. Returns a strength estimate and 95% confidence interval, calibrated so that `RandomPlayer = 1`.
```python
import asyncio

from poke_env.player import RandomPlayer
from poke_env.player.utils import evaluate_player


async def main():
    my_bot = RandomPlayer(
        battle_format="gen8randombattle",  # must be gen8randombattle
        max_concurrent_battles=10,
    )
    strength, (lo, hi) = await evaluate_player(my_bot, n_battles=200)
    print(f"Estimated strength: {strength:.2f} (95% CI: [{lo:.2f}, {hi:.2f}])")
    # Estimated strength: 1.02 (95% CI: [0.78, 1.34])
    await my_bot.ps_client.stop_listening()


asyncio.run(main())
```

---

## `Battle` — single-battle state object

Provides the full current state of a singles battle, including active Pokémon, available moves and switches, field conditions, and gimmick availability. Passed directly to `choose_move`.

```python
from poke_env.battle import Battle
from poke_env.player import Player
from poke_env.player.battle_order import BattleOrder


class InfoPlayer(Player):
    def choose_move(self, battle: Battle) -> BattleOrder:
        active = battle.active_pokemon
        opp = battle.opponent_active_pokemon

        print(f"Turn {battle.turn}")
        print(f" My active: {active.species} {active.current_hp_fraction:.0%} HP")
        print(f" Opponent: {opp.species if opp else 'None'}")
        print(f" Weather: {battle.weather}")
        print(f" Fields: {list(battle.fields.keys())}")
        print(f" My hazards: {list(battle.side_conditions.keys())}")
        print(f" Can tera: {battle.can_tera}")
        print(f" Can dynamax: {battle.can_dynamax}")
        print(f" Moves: {[m.id for m in battle.available_moves]}")
        print(f" Switches: {[p.species for p in battle.available_switches]}")
        print(f" Valid orders: {len(battle.valid_orders)}")
        return self.choose_random_move(battle)
```

---

## `DoubleBattle` — doubles battle state object

Extends `AbstractBattle` for double battles: `active_pokemon`, `available_moves`, `available_switches`, `force_switch`, and `can_*` flags are all length-2 lists indexed by slot.
```python
from poke_env.battle import DoubleBattle
from poke_env.player import Player
from poke_env.player.battle_order import (
    BattleOrder,
    DefaultBattleOrder,
    DoubleBattleOrder,
    PassBattleOrder,
    SingleBattleOrder,
)


class SimpleDoublesPlayer(Player):
    def choose_move(self, battle: DoubleBattle) -> BattleOrder:
        orders = []
        for mon, moves, switches in zip(
            battle.active_pokemon,
            battle.available_moves,
            battle.available_switches,
        ):
            if mon is None or mon.fainted:
                orders.append(PassBattleOrder())
            elif moves:
                best = max(moves, key=lambda m: m.base_power)
                targets = battle.get_possible_showdown_targets(best, mon)
                orders.append(SingleBattleOrder(best, move_target=targets[0]))
            elif switches:
                orders.append(SingleBattleOrder(switches[0]))
            else:
                orders.append(DefaultBattleOrder())

        joined = DoubleBattleOrder.join_orders([orders[0]], [orders[1]])
        return joined[0] if joined else DoubleBattleOrder(orders[0], orders[1])
```

---

## `Pokemon` — Pokémon state within a battle

Tracks species, types, ability, item, HP, status, stat boosts, active effects, moves, and more.
```python
from poke_env.battle import Battle
from poke_env.player import Player
from poke_env.player.battle_order import BattleOrder


class TypeAwarePlayer(Player):
    def choose_move(self, battle: Battle) -> BattleOrder:
        opp = battle.opponent_active_pokemon
        if opp and battle.available_moves:
            # Pick the move that deals the most type-effective damage
            best = max(
                battle.available_moves,
                key=lambda m: opp.damage_multiplier(m) * m.base_power,
            )
            print(f" {opp.species} HP:{opp.current_hp_fraction:.0%} "
                  f"Status:{opp.status} Types:{opp.types}")
            print(f" boosts: {opp.boosts}")
            return self.create_order(best)
        return self.choose_random_move(battle)
```

---

## `Move` — move metadata

Exposes all move properties derived from Showdown's data: `base_power`, `accuracy`, `type`, `category`, `priority`, `expected_hits`, `boosts`, `target`, and many more.

```python
from poke_env.battle import Battle, Move, MoveCategory
from poke_env.player import Player
from poke_env.player.battle_order import BattleOrder


class CategoryAwarePlayer(Player):
    def choose_move(self, battle: Battle) -> BattleOrder:
        active = battle.active_pokemon
        opp = battle.opponent_active_pokemon
        if not active or not opp or not battle.available_moves:
            return self.choose_random_move(battle)

        atk = active.stats.get("atk", 0) or 0
        spa = active.stats.get("spa", 0) or 0
        opp_def = opp.base_stats.get("def", 1)
        opp_spd = opp.base_stats.get("spd", 1)

        def score(m: Move) -> float:
            ratio = atk / opp_def if m.category == MoveCategory.PHYSICAL else spa / opp_spd
            return m.base_power * m.accuracy * m.expected_hits * opp.damage_multiplier(m) * ratio

        best = max(battle.available_moves, key=score)
        return self.create_order(best)
```

---

## `GenData` — generation-specific Pokédex, moves, and type chart

Singleton-per-generation providing the Pokédex, move data, type chart, natures, and learnsets.
Use `GenData.from_gen(gen)` or `GenData.from_format(format_str)`.

```python
from poke_env.data import GenData

# Load gen 9 data
gen9 = GenData.from_gen(9)

# Type chart
fire_vs_grass = gen9.type_chart["fire"]["grass"]  # 2.0
water_vs_fire = gen9.type_chart["water"]["fire"]  # 2.0

# Pokédex entry
charizard = gen9.pokedex["charizard"]
print(charizard["baseStats"])  # {'hp': 78, 'atk': 84, 'def': 78, ...}
print(charizard["types"])      # ['Fire', 'Flying']

# Move data
flamethrower = gen9.moves["flamethrower"]
print(flamethrower["basePower"])  # 90
print(flamethrower["accuracy"])   # 100

# Load from format string
gen8_data = GenData.from_format("gen8ou")
print(gen8_data.gen)  # 8
```

---

## `compute_raw_stats` — calculate a Pokémon's actual battle stats

Computes the six in-battle stats (`hp, atk, def, spa, spd, spe`) from base stats, EVs, IVs, level, and nature using the standard formula.

```python
from poke_env.data import GenData
from poke_env.stats import compute_raw_stats

gen9 = GenData.from_gen(9)

# Modest Dragapult, 252 SpA / 252 Spe / 4 HP, 31 IVs, level 50
evs = [4, 0, 0, 252, 0, 252]  # hp atk def spa spd spe
ivs = [31] * 6

stats = compute_raw_stats("dragapult", evs, ivs, level=50, nature="modest", data=gen9)

# stats -> [hp, atk, def, spa, spd, spe]
print(dict(zip(["hp", "atk", "def", "spa", "spd", "spe"], stats)))
# {'hp': 164, 'atk': 126, 'def': 95, 'spa': 167, 'spd': 95, 'spe': 194}
```

---

## `Teambuilder` — supply custom teams to a player

Abstract base class for team providers. Implement `yield_team` to return a packed team string. `parse_showdown_team`, `parse_packed_team`, and `join_team` are static helpers for format conversion.
```python
import asyncio
import random

from poke_env.player import RandomPlayer
from poke_env.teambuilder import Teambuilder


class RandomTeamFromPool(Teambuilder):
    """Pick a random pre-built team from a pool on every battle."""

    def __init__(self, showdown_teams: list[str]):
        # Convert Showdown text format -> packed format once
        self.packed_teams = [
            self.join_team(self.parse_showdown_team(t)) for t in showdown_teams
        ]

    def yield_team(self) -> str:
        return random.choice(self.packed_teams)


TEAM = """
Garchomp @ Rocky Helmet
Ability: Rough Skin
EVs: 248 HP / 64 Def / 196 Spe
Jolly Nature
- Stealth Rock
- Earthquake
- Dragon Tail
- Fire Fang
"""

builder = RandomTeamFromPool([TEAM])


async def main():
    p1 = RandomPlayer(battle_format="gen9ou", team=builder, max_concurrent_battles=1)
    p2 = RandomPlayer(battle_format="gen9ou", team=builder, max_concurrent_battles=1)
    await p1.battle_against(p2, n_battles=1)
    await p1.ps_client.stop_listening()
    await p2.ps_client.stop_listening()


asyncio.run(main())
```

---

## `SinglesEnv` — PettingZoo parallel env for RL training (singles)

Self-play environment implementing the PettingZoo `ParallelEnv` API. Two internal `_EnvPlayer` instances battle each other; actions are integers from a discrete space whose size depends on the generation. Override `embed_battle` and `calc_reward` to define observations and rewards.
```python
import numpy as np
from gymnasium.spaces import Box
from stable_baselines3 import PPO

from poke_env.battle import AbstractBattle
from poke_env.environment import SingleAgentWrapper, SinglesEnv
from poke_env.player import SimpleHeuristicsPlayer


class MyEnv(SinglesEnv):
    N_OBS = 10

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.observation_spaces = {
            agent: Box(-1.0, 1.0, shape=(self.N_OBS,), dtype=np.float32)
            for agent in self.possible_agents
        }

    def embed_battle(self, battle: AbstractBattle) -> np.ndarray:
        moves_bp = np.array(
            [m.base_power / 100 for m in battle.available_moves[:4]]
            + [0.0] * (4 - len(battle.available_moves))
        )
        fainted_ratio = np.array([
            sum(m.fainted for m in battle.team.values()) / 6,
            sum(m.fainted for m in battle.opponent_team.values()) / 6,
        ])
        hp = np.array([
            battle.active_pokemon.current_hp_fraction if battle.active_pokemon else 0.0,
            battle.opponent_active_pokemon.current_hp_fraction if battle.opponent_active_pokemon else 0.0,
            float(battle.can_dynamax),
            float(battle.can_tera),
        ])
        return np.concatenate([moves_bp, fainted_ratio, hp], dtype=np.float32)

    def calc_reward(self, battle: AbstractBattle) -> float:
        return self.reward_computing_helper(
            battle, fainted_value=2.0, hp_value=1.0, status_value=0.5, victory_value=30.0
        )


env = MyEnv(battle_format="gen9randombattle", log_level=40, open_timeout=None)
opponent = SimpleHeuristicsPlayer(start_listening=False)
wrapped = SingleAgentWrapper(env, opponent)

model = PPO("MlpPolicy", wrapped, verbose=0)
model.learn(total_timesteps=10_000)
wrapped.close()
```

---

## `DoublesEnv` — PettingZoo parallel env for RL training (doubles)

Mirrors `SinglesEnv` but for double battles. The action space is `MultiDiscrete([action_space_size, action_space_size])`, one action per active slot.
```python
import numpy as np
from gymnasium.spaces import Box

from poke_env.battle import AbstractBattle
from poke_env.environment import DoublesEnv


class MyDoublesEnv(DoublesEnv):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.observation_spaces = {
            agent: Box(-1.0, 1.0, shape=(8,), dtype=np.float32)
            for agent in self.possible_agents
        }

    def embed_battle(self, battle: AbstractBattle) -> np.ndarray:
        hp_slots = []
        for mon in battle.active_pokemon:  # type: ignore[attr-defined]
            hp_slots.append(mon.current_hp_fraction if mon else 0.0)
        for mon in battle.opponent_active_pokemon:  # type: ignore[attr-defined]
            hp_slots.append(mon.current_hp_fraction if mon else 0.0)
        fainted = [
            sum(m.fainted for m in battle.team.values()) / 6,
            sum(m.fainted for m in battle.opponent_team.values()) / 6,
        ]
        can_dmax = [
            float(battle.can_dynamax[0]),  # type: ignore[index]
            float(battle.can_dynamax[1]),  # type: ignore[index]
        ]
        return np.array(hp_slots + fainted + can_dmax, dtype=np.float32)

    def calc_reward(self, battle: AbstractBattle) -> float:
        return self.reward_computing_helper(
            battle, fainted_value=2.0, hp_value=1.0, status_value=0.25, victory_value=30.0
        )


env = MyDoublesEnv(battle_format="gen8randomdoublesbattle", log_level=40)
obs, _ = env.reset()
print(obs)  # dict keyed by agent username, values are np.float32 arrays of shape (8,)
env.close()
```

---

## `SingleAgentWrapper` — convert a PettingZoo env to a standard Gymnasium env

Wraps a `PokeEnv` (e.g. `SinglesEnv`) and a fixed `Player` opponent so the result is a single-agent `gymnasium.Env` compatible with Stable-Baselines3's `SubprocVecEnv` and `Monitor`.
```python
import numpy as np
from gymnasium.spaces import Box
from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import SubprocVecEnv

from poke_env.battle import AbstractBattle
from poke_env.environment import SingleAgentWrapper, SinglesEnv
from poke_env.player import SimpleHeuristicsPlayer


class CompactEnv(SinglesEnv):
    N = 4

    def __init__(self, **kw):
        super().__init__(**kw)
        self.observation_spaces = {
            a: Box(-1.0, 4.0, shape=(self.N,), dtype=np.float32)
            for a in self.possible_agents
        }

    def embed_battle(self, battle: AbstractBattle) -> np.ndarray:
        bp = [m.base_power / 100 for m in battle.available_moves[:self.N]]
        bp += [0.0] * (self.N - len(bp))
        return np.array(bp, dtype=np.float32)

    def calc_reward(self, battle: AbstractBattle) -> float:
        return self.reward_computing_helper(battle, victory_value=10.0, fainted_value=1.0)

    @classmethod
    def make(cls) -> Monitor:
        env = cls(battle_format="gen9randombattle", log_level=40, open_timeout=None)
        opp = SimpleHeuristicsPlayer(start_listening=False)
        return Monitor(SingleAgentWrapper(env, opp))


if __name__ == "__main__":
    # The guard is required: SubprocVecEnv workers re-import this module
    vec_env = SubprocVecEnv([CompactEnv.make, CompactEnv.make])
    model = PPO("MlpPolicy", vec_env, n_steps=512, batch_size=64, verbose=0)
    model.learn(total_timesteps=5_000)
    vec_env.close()
```

---

## `SinglesEnv.get_action_mask` — produce a legal-action boolean mask

Returns a list of 0/1 integers indicating which actions in the discrete action space are currently legal, enabling action-masking in neural policies.
```python
import numpy as np

from poke_env.battle import Battle
from poke_env.environment import SinglesEnv
from poke_env.player import Player
from poke_env.player.battle_order import BattleOrder


class MaskedPlayer(Player):
    """Example: apply action mask before sampling with a neural policy."""

    def choose_move(self, battle: Battle) -> BattleOrder:
        mask = np.array(SinglesEnv.get_action_mask(battle), dtype=np.float32)
        # mask has shape (action_space_size,); 1 = legal, 0 = illegal
        logits = np.random.randn(len(mask))
        logits[mask == 0] = -1e9  # mask out illegal actions
        action = np.argmax(logits)
        return SinglesEnv.action_to_order(np.int64(action), battle, strict=False)
```

---

## `SinglesEnv.action_to_order` / `order_to_action` — convert between integer actions and `BattleOrder`

Static methods that map between the integer action space and `BattleOrder` objects, with optional `fake` (skip legality) and `strict` (raise vs. default) modes.

```python
from poke_env.battle import Battle
from poke_env.environment import SinglesEnv
from poke_env.player import Player
from poke_env.player.battle_order import BattleOrder


class RecordingPlayer(Player):
    def choose_move(self, battle: Battle) -> BattleOrder:
        order = self.choose_random_singles_move(battle)
        # Roundtrip: order -> action index -> order
        action = SinglesEnv.order_to_action(order, battle, fake=False, strict=True)
        recovered = SinglesEnv.action_to_order(action, battle, fake=False, strict=True)
        print(f"action={int(action)} order={order} recovered={recovered}")
        return order
```

---

## `calculate_damage` (gen9) — compute damage rolls for Gen 9

Calculates `(min_damage, max_damage)` for a given attacker/defender/move triple, accounting for weather, terrain, stat boosts, STAB, and type effectiveness.
```python
from poke_env.battle import Battle
from poke_env.calc.damage_calc_gen9 import calculate_damage
from poke_env.player import Player
from poke_env.player.battle_order import BattleOrder


class DamageCalcPlayer(Player):
    def choose_move(self, battle: Battle) -> BattleOrder:
        active = battle.active_pokemon
        opp = battle.opponent_active_pokemon
        if not active or not opp or not battle.available_moves:
            return self.choose_random_move(battle)

        best_move = None
        best_min_dmg = -1
        attacker_id = f"{battle.player_role}: {active.species}"
        defender_id = f"{battle.opponent_role}: {opp.species}"

        for move in battle.available_moves:
            try:
                lo, hi = calculate_damage(attacker_id, defender_id, move, battle)
                if lo > best_min_dmg:
                    best_min_dmg = lo
                    best_move = move
            except Exception:
                pass  # calc may fail for some edge-case moves

        if best_move:
            return self.create_order(best_move)
        return self.choose_random_move(battle)
```

---

## Tracking battle observations with `_battle_finished_callback`

Override `_battle_finished_callback` to execute custom logic when a battle ends. Combined with per-turn snapshots inside `choose_move`, this enables detailed logging and dataset collection.
```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Optional

from poke_env.battle import AbstractBattle, Battle
from poke_env.player import Player, RandomPlayer
from poke_env.player.battle_order import BattleOrder


@dataclass
class Snapshot:
    turn: int
    our_active: Optional[str]
    opp_active: Optional[str]
    our_hp: float
    opp_hp: float


class LoggingPlayer(Player):
    def __init__(self, **kw):
        super().__init__(**kw)
        self._logs: Dict[str, List[Snapshot]] = {}

    def choose_move(self, battle: AbstractBattle) -> BattleOrder:
        assert isinstance(battle, Battle)
        snap = Snapshot(
            turn=battle.turn,
            our_active=battle.active_pokemon.species if battle.active_pokemon else None,
            opp_active=battle.opponent_active_pokemon.species if battle.opponent_active_pokemon else None,
            our_hp=battle.active_pokemon.current_hp_fraction if battle.active_pokemon else 0.0,
            opp_hp=battle.opponent_active_pokemon.current_hp_fraction if battle.opponent_active_pokemon else 0.0,
        )
        self._logs.setdefault(battle.battle_tag, []).append(snap)
        return self.choose_random_move(battle)

    def _battle_finished_callback(self, battle: AbstractBattle):
        log = self._logs.pop(battle.battle_tag, [])
        result = "WON" if battle.won else "LOST"
        print(f"{battle.battle_tag}: {result} in {len(log)} turns")


async def main():
    player = LoggingPlayer(battle_format="gen9randombattle")
    opp = RandomPlayer(battle_format="gen9randombattle")
    await player.battle_against(opp, n_battles=3)
    await player.ps_client.stop_listening()
    await opp.ps_client.stop_listening()


asyncio.run(main())
```

---

## Summary

`poke-env` covers three primary use cases. The first is **scripted bot development**: subclass `Player`, implement `choose_move` using the rich `Battle`/`Pokemon`/`Move` state objects, and run battles against humans on the official Showdown server, a local server, or other bots via `battle_against` and `accept_challenges`.
Built-in helpers like `create_order`, `choose_random_move`, `random_teampreview`, and the damage calculator in `poke_env.calc` accelerate heuristic and rule-based agents.

The second is **self-play and benchmarking**: `cross_evaluate` and `evaluate_player` in `poke_env.player.utils` provide a standardised way to measure relative and absolute bot strength, while the three baseline players form a calibrated ladder (`RandomPlayer = 1`, `MaxBasePowerPlayer ≈ 7.7`, `SimpleHeuristicsPlayer ≈ 128.8`).

The third and most involved use case is **reinforcement learning**. `SinglesEnv` and `DoublesEnv` expose a PettingZoo `ParallelEnv` (self-play, both agents learn simultaneously) while `SingleAgentWrapper` converts that into a standard `gymnasium.Env` (agent vs. fixed opponent). Both environments require users to implement `embed_battle` (observation builder returning a NumPy array) and `calc_reward` (float reward signal) while everything else — WebSocket I/O, battle parsing, action-to-order mapping, concurrency, and teampreview — is handled by the library.

Action masking is first-class: `SinglesEnv.get_action_mask` returns a per-step legal-action vector that can be passed to masked policy classes in Stable-Baselines3 or any other framework, preventing the agent from ever selecting an illegal action. The environment integrates directly with SB3's `SubprocVecEnv`, `Monitor`, and the PettingZoo wrapper library `supersuit`.
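Teampreview, part of the core `Player` interface mentioned above, defaults to `random_teampreview` but can be customized by overriding `Player.teampreview` to return a Showdown `/team` order message. A minimal sketch of the ordering logic, written as a plain function so it can back such an override; the function name and the speed-first heuristic are illustrative, not part of the library:

```python
def speed_first_team_order(base_speeds: list[int]) -> str:
    """Build a Showdown teampreview message ordering slots fastest-first.

    base_speeds[i] is the base Speed of the Pokémon in (0-indexed) slot i;
    the returned string uses Showdown's 1-indexed slot numbers.
    """
    order = sorted(range(len(base_speeds)), key=lambda i: -base_speeds[i])
    return "/team " + "".join(str(i + 1) for i in order)


# Inside a Player subclass this could back a teampreview override, e.g.:
#
#     def teampreview(self, battle):
#         speeds = [mon.base_stats["spe"] for mon in battle.team.values()]
#         return speed_first_team_order(speeds)

print(speed_first_team_order([61, 142, 45]))  # "/team 213"
```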