Add optional per-env state export hook; implement for nmmo3 #583

Open: BeeGass wants to merge 2 commits into PufferAI:4.0 from BeeGass:feat/nmmo3-state-export
BeeGass commented Jun 11, 2026

Why

Downstream consumers (web renderers, telemetry, headless visualization) need
to read an env's world state - for nmmo3, the terrain grid and per-entity
positions - without forking the C env or casting raw pointers from outside
the binding. The per-agent observation buffers intentionally expose only
windowed views, so there is currently no supported way to get at the global
state from Python.

What

  • src/vecenv.h: a new optional my_state hook in the same pattern as
    my_get/my_put. Envs opt in with #define MY_STATE and fill up to
    max_fields StateField descriptors (name, data, numpy dtype string,
    ndim, dims, flags). Default implementation exports nothing.
  • src/bindings.cu + src/bindings_cpu.cpp (kept in parity):
    VecEnv.state(env_id=0) returning
    {name: {"data": bytes | memoryview, "dtype": str, "shape": tuple}}.
    Field buffers are copied into Python-owned bytes under the GIL by
    default (stable snapshots). Fields flagged PUFF_STATE_ZERO_COPY
    (buffer pointer stable for the env's lifetime) are returned as
    read-only memoryviews of the C buffer instead - zero bytes moved for
    large immutable data like map terrain; such views are invalidated by
    close(). Envs without the hook return {}.
  • ocean/nmmo3/binding.c: implements the hook - terrain as
    (height, width) int8 (zero-copy: allocated once in init, rewritten
    only by c_reset), positions as (num_agents + num_enemies, 10)
    int32 rows (kind, r, c, hp, hp_max, comb_lvl, prof_lvl, dir, anim,
    in_combat) with players first then enemies (copied), and tick (the
    tick counter).
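
The positions row layout above can be given names on the consumer side. The following is a minimal sketch, not part of this PR: `POSITION_COLUMNS`, `COL`, and `split_positions` are hypothetical helper names, the enemy row is made up for illustration, and the split relies only on the documented ordering (players first, then enemies).

```python
import numpy as np

# Column order as listed in the PR description.
POSITION_COLUMNS = ("kind", "r", "c", "hp", "hp_max",
                    "comb_lvl", "prof_lvl", "dir", "anim", "in_combat")
COL = {name: i for i, name in enumerate(POSITION_COLUMNS)}


def split_positions(positions, num_agents):
    """Split a (num_agents + num_enemies, 10) array into players and enemies.

    Uses only the documented ordering: the first num_agents rows are players.
    """
    if positions.ndim != 2 or positions.shape[1] != len(POSITION_COLUMNS):
        raise ValueError(f"unexpected positions shape {positions.shape}")
    return positions[:num_agents], positions[num_agents:]


# Stand-in data: the three rows printed in the PR's test output, plus one
# hypothetical enemy row (its values, including kind=1, are assumptions).
rows = np.array([
    [0, 56, 104, 99, 99, 1, 1, 0, 6, 0],
    [0, 82, 54, 99, 99, 1, 1, 1, 6, 0],
    [0, 78, 30, 99, 99, 1, 1, 3, 0, 0],
    [1, 10, 12, 50, 50, 2, 0, 0, 0, 0],
], dtype=np.int32)

players, enemies = split_positions(rows, num_agents=3)
player_rc = players[:, [COL["r"], COL["c"]]]  # (r, c) pair per player
```

Slicing by `num_agents` avoids depending on the numeric encoding of `kind`, which the description does not spell out.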

Consumer side:

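import numpy as np
from pufferlib import _C

# args: the vec/env configuration, built by the caller (not shown here)
vec = _C.create_vec(args, gpu=0)
vec.reset()
st = vec.state(0)  # world state of env 0
terrain = np.frombuffer(st["terrain"]["data"], np.int8).reshape(st["terrain"]["shape"])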
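
A generic version of the decoding step might look like the sketch below. It is an illustration, not code from this PR: `state_to_arrays` is a hypothetical helper, and `fake_state` stands in for the result of `vec.state(0)` so the example runs without the binding. The `(1,)` shape used for `tick` is an assumption; the description does not state its shape.

```python
import numpy as np


def state_to_arrays(state, copy=False):
    """Turn a state() result into {name: ndarray}.

    With copy=False the arrays alias the returned buffers, so zero-copy
    fields stay zero-copy. Pass copy=True for arrays that must outlive
    close(), since zero-copy views are invalidated by it.
    """
    out = {}
    for name, field in state.items():
        arr = np.frombuffer(field["data"], dtype=np.dtype(field["dtype"]))
        arr = arr.reshape(field["shape"])
        out[name] = arr.copy() if copy else arr
    return out


# Stand-in for vec.state(0): a read-only memoryview for a zero-copy field,
# plain bytes for a copied field.
backing = bytearray(np.arange(12, dtype=np.int8).tobytes())
fake_state = {
    "terrain": {"data": memoryview(backing).toreadonly(),
                "dtype": "int8", "shape": (3, 4)},
    "tick": {"data": np.array([7], dtype=np.int32).tobytes(),
             "dtype": "int32", "shape": (1,)},  # shape is an assumption
}

views = state_to_arrays(fake_state)             # aliases the buffers
owned = state_to_arrays(fake_state, copy=True)  # independent snapshots
```

Arrays built from `bytes` or from a read-only `memoryview` are themselves read-only in NumPy, which matches the read-only guarantee described for zero-copy fields.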

Testing

Built with ./build.sh nmmo3 --cpu (clang 21, Linux) and ran a two-env
vec through a 50-step roundtrip asserting, per env: shapes, dtypes, kind
partitioning, in-bounds coordinates, hp <= hp_max with positive hp_max,
tick advancement in lockstep across envs, position movement, and terrain
stability across steps. Also asserted: per-env indexing returns distinct
terrain for differently-seeded envs; state() is a pure read (two calls
without stepping are byte-identical); the zero-copy terrain view is
read-only; state(env_id) raises on out-of-range ids; and a second
reset() zeroes the tick and yields structurally valid state.
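
Some of the per-row invariants listed above can be expressed as the sketch below. This is not the PR's actual test suite: `check_positions` and `check_pure_read` are hypothetical names, and bounding `r` by height and `c` by width is an assumption about how the coordinates map onto the `(height, width)` terrain.

```python
import numpy as np

# Column indices from the PR's row layout (kind, r, c, hp, hp_max, ...).
KIND, R, C, HP, HP_MAX = 0, 1, 2, 3, 4


def check_positions(positions, terrain_shape):
    """Assert shape, dtype, in-bounds coordinates, and hp <= hp_max."""
    height, width = terrain_shape
    assert positions.dtype == np.int32
    assert positions.ndim == 2 and positions.shape[1] == 10
    r, c = positions[:, R], positions[:, C]
    # Assumption: r indexes terrain rows, c indexes terrain columns.
    assert ((0 <= r) & (r < height)).all(), "row out of bounds"
    assert ((0 <= c) & (c < width)).all(), "col out of bounds"
    assert (positions[:, HP_MAX] > 0).all(), "hp_max must be positive"
    assert (positions[:, HP] <= positions[:, HP_MAX]).all(), "hp exceeds hp_max"


def check_pure_read(read_state):
    """Two reads without stepping in between must be byte-identical."""
    a, b = read_state(), read_state()
    assert a.keys() == b.keys()
    for name in a:
        assert bytes(a[name]["data"]) == bytes(b[name]["data"]), name


# The three rows printed in the test output, checked against a 128x128 map.
sample = np.array([
    [0, 56, 104, 99, 99, 1, 1, 0, 6, 0],
    [0, 82, 54, 99, 99, 1, 1, 1, 6, 0],
    [0, 78, 30, 99, 99, 1, 1, 3, 0, 0],
], dtype=np.int32)
check_positions(sample, terrain_shape=(128, 128))
```

Against a real build, `check_pure_read` would be called with something like `lambda: vec.state(0)`.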

envs: 2 x 32 agents
terrain: (128, 128) int8 (zero-copy, distinct per env)
positions: (96, 10) int32
tick: 0 -> 50, reset -> 0
positions head:
[[  0  56 104  99  99   1   1   0   6   0]
 [  0  82  54  99  99   1   1   1   6   0]
 [  0  78  30  99  99   1   1   3   0   0]]
ALL OK

The same suite passes at the config/nmmo3.ini scale (2 envs x 1024
agents, 512x512 maps, positions (3072, 10)).

Also built the CUDA backend (./build.sh nmmo3, CUDA 13.3) and reran the
full suite against it in CPU mode (create_vec(args, gpu=0)) with
identical output, confirming the two binding backends stay in parity.

Rebuilt an env that does not implement the hook
(./build.sh breakout --cpu) and verified that it compiles against the
no-op default and that state() returns {}.

Compat

Purely additive. No changes to env_init/step/reset or any
observation/action/reward buffer semantics. Nothing is added to the step
hot path; state is assembled only on demand inside state() calls.

BeeGass added 2 commits June 10, 2026 22:09
Add a StateField-based my_state hook to vecenv.h, following the existing
my_get/my_put pattern: envs opt in with #define MY_STATE and fill up to
max_fields typed buffer descriptors; the default implementation exports
nothing. Expose it in both binding backends as VecEnv.state(env_id),
returning {name: {data, dtype: str, shape: tuple}}. Field contents are
copied into Python-owned bytes under the GIL by default; fields flagged
PUFF_STATE_ZERO_COPY (pointer stable for the env lifetime) come back as
read-only memoryviews of the C buffer instead, invalidated by close().
Adds zero overhead to step(); state is assembled only when state() is
called.

Implement the my_state hook for nmmo3: terrain as (height, width) int8,
entities as (num_agents + num_enemies, 10) int32 rows of (kind, r, c, hp,
hp_max, comb_lvl, prof_lvl, dir, anim, in_combat) with players first, and
the tick counter. Enables rendering and telemetry consumers to read the
world state through the binding instead of poking at C memory.