Add optional per-env state export hook; implement for nmmo3 by BeeGass · Pull Request #583 · PufferAI/PufferLib

BeeGass · 2026-06-11T02:15:20Z

Why

Downstream consumers (web renderers, telemetry, headless visualization) need
to read an env's world state - for nmmo3, the terrain grid and per-entity
positions - without forking the C env or casting raw pointers from outside
the binding. The per-agent observation buffers intentionally expose only
windowed views, so there is currently no supported way to get at the global
state from Python.

What

src/vecenv.h: a new optional my_state hook in the same pattern as
my_get/my_put. Envs opt in with #define MY_STATE and fill up to
max_fields StateField descriptors (name, data, numpy dtype string,
ndim, dims, flags). The default implementation exports nothing.

src/bindings.cu + src/bindings_cpu.cpp (kept in parity):
VecEnv.state(env_id=0) returns
{name: {"data": bytes | memoryview, "dtype": str, "shape": tuple}}.
By default, field buffers are copied into Python-owned bytes under the
GIL (stable snapshots). Fields flagged PUFF_STATE_ZERO_COPY
(buffer pointer stable for the env's lifetime) are returned as
read-only memoryviews of the C buffer instead, so no bytes are moved for
large, rarely rewritten data like map terrain; such views are invalidated
by close(). Envs without the hook return {}.

ocean/nmmo3/binding.c: implements the hook. It exports terrain as
(height, width) int8 (zero-copy: allocated once in init, rewritten
only by c_reset), positions as (num_agents + num_enemies, 10)
int32 rows (kind, r, c, hp, hp_max, comb_lvl, prof_lvl, dir, anim, in_combat)
with players first, then enemies (copied), and tick.
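
The difference between the two return modes can be illustrated without the binding. The sketch below is only a model: a plain bytearray stands in for an env-owned C buffer (an assumption for illustration, the real buffers live in C), bytes(...) plays the role of the default copied snapshot, and memoryview(...).toreadonly() plays the role of a PUFF_STATE_ZERO_COPY field.

```python
import numpy as np

# Stand-in for an env-owned C buffer (hypothetical; the real one lives in C).
c_buffer = bytearray(np.arange(6, dtype=np.int8).tobytes())

# Default mode: a Python-owned copy, unaffected by later writes to the buffer.
snapshot = bytes(c_buffer)

# Zero-copy mode: a read-only view that aliases the same memory.
view = memoryview(c_buffer).toreadonly()

# Simulate the env rewriting its buffer in place (e.g. terrain on reset).
c_buffer[0] = 42

snap_arr = np.frombuffer(snapshot, dtype=np.int8).reshape(2, 3)
view_arr = np.frombuffer(view, dtype=np.int8).reshape(2, 3)

print(snap_arr[0, 0])            # the value at snapshot time
print(view_arr[0, 0])            # the value after the in-place rewrite
print(view_arr.flags.writeable)  # the view cannot be written through
```

A consumer that needs a stable copy of a zero-copy field can take one explicitly, for example with bytes(view) or np.array(view_arr).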

Consumer side:

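import numpy as np
from pufferlib import _C

vec = _C.create_vec(args, gpu=0)  # args is assumed to be defined elsewhere
vec.reset()
st = vec.state(0)  # env 0: {name: {"data", "dtype", "shape"}}
# np.frombuffer wraps the returned buffer without copying it.
terrain = np.frombuffer(st["terrain"]["data"], np.int8).reshape(st["terrain"]["shape"])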
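
The same decoding can be written generically for any exported field by using the field's own dtype string and shape. This is a minimal sketch, not part of the PR: field_to_array and positions_by_column are hypothetical helper names, and the state dict below is a mock shaped like the documented return value (using two rows from the test output) rather than real env data.

```python
import numpy as np

def field_to_array(field):
    """Decode one state() field dict into a NumPy array without copying."""
    arr = np.frombuffer(field["data"], dtype=np.dtype(field["dtype"]))
    return arr.reshape(field["shape"])

# Column order of the nmmo3 positions rows, as listed in this PR.
POSITION_COLUMNS = ("kind", "r", "c", "hp", "hp_max",
                    "comb_lvl", "prof_lvl", "dir", "anim", "in_combat")

def positions_by_column(positions):
    """Map each column name to a 1-D view of the (N, 10) positions array."""
    return {name: positions[:, i] for i, name in enumerate(POSITION_COLUMNS)}

# Mock state dict shaped like the documented return value (not real env data).
rows = np.array([[0, 56, 104, 99, 99, 1, 1, 0, 6, 0],
                 [0, 82, 54, 99, 99, 1, 1, 1, 6, 0]], dtype=np.int32)
mock_state = {
    "positions": {"data": rows.tobytes(), "dtype": "int32", "shape": (2, 10)},
}

cols = positions_by_column(field_to_array(mock_state["positions"]))
print(cols["r"], cols["hp"])
```

Because np.frombuffer accepts both bytes and memoryview, the helper should work unchanged for copied and zero-copy fields; arrays built this way are read-only in both cases.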

Testing

Built with ./build.sh nmmo3 --cpu (clang 21, Linux) and ran a two-env
vec through a 50-step roundtrip asserting, per env: shapes, dtypes, kind
partitioning, in-bounds coordinates, hp <= hp_max with positive hp_max,
tick advancement in lockstep across envs, position movement, and terrain
stability across steps. Also asserted: per-env indexing returns distinct
terrain for differently-seeded envs; state() is a pure read (two calls
without stepping are byte-identical); the zero-copy terrain view is
read-only; state(env_id) raises on out-of-range ids; and a second
reset() zeroes the tick and yields structurally valid state.
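
A few of these invariants can be sketched as a standalone check over decoded arrays. This is not the suite that was run for this PR; check_state is a hypothetical helper, the column indices follow the positions layout listed above, and the data is a mock for illustration only.

```python
import numpy as np

def check_state(terrain, positions):
    """Assert a few structural invariants on decoded terrain/positions arrays."""
    assert terrain.ndim == 2 and terrain.dtype == np.int8
    assert positions.ndim == 2 and positions.shape[1] == 10
    assert positions.dtype == np.int32

    height, width = terrain.shape
    r, c = positions[:, 1], positions[:, 2]        # row and column coordinates
    hp, hp_max = positions[:, 3], positions[:, 4]  # current and maximum health

    assert ((r >= 0) & (r < height)).all(), "row out of bounds"
    assert ((c >= 0) & (c < width)).all(), "column out of bounds"
    assert (hp_max > 0).all(), "hp_max must be positive"
    assert (hp <= hp_max).all(), "hp exceeds hp_max"

# Mock data for illustration only (one row taken from the output below).
terrain = np.zeros((128, 128), dtype=np.int8)
positions = np.array([[0, 56, 104, 99, 99, 1, 1, 0, 6, 0]], dtype=np.int32)
check_state(terrain, positions)
```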

envs: 2 x 32 agents
terrain: (128, 128) int8 (zero-copy, distinct per env)
positions: (96, 10) int32
tick: 0 -> 50, reset -> 0
positions head:
[[  0  56 104  99  99   1   1   0   6   0]
 [  0  82  54  99  99   1   1   1   6   0]
 [  0  78  30  99  99   1   1   3   0   0]]
ALL OK

The same suite passes at the config/nmmo3.ini scale (2 envs x 1024
agents, 512x512 maps, positions (3072, 10)).

Also built the CUDA backend (./build.sh nmmo3, CUDA 13.3) and reran the
full suite against it in CPU mode (create_vec(args, gpu=0)) with
identical output, confirming the two binding backends stay in parity.

Rebuilt an env that does not implement the hook (./build.sh breakout --cpu) and verified it compiles against the no-op default and state()
returns {}.

Compat

Purely additive. No changes to env_init/step/reset or any
observation/action/reward buffer semantics. Nothing is added to the step
hot path; state is assembled only on demand inside state() calls.

Add a StateField-based my_state hook to vecenv.h, following the existing my_get/my_put pattern: envs opt in with #define MY_STATE and fill up to max_fields typed buffer descriptors; the default implementation exports nothing. Expose it in both binding backends as VecEnv.state(env_id), returning {name: {data, dtype: str, shape: tuple}}. Field contents are copied into Python-owned bytes under the GIL by default; fields flagged PUFF_STATE_ZERO_COPY (pointer stable for the env lifetime) come back as read-only memoryviews of the C buffer instead, invalidated by close(). Adds zero overhead to step(); state is assembled only when state() is called.

Implement the my_state hook for nmmo3: terrain as (height, width) int8, entities as (num_agents + num_enemies, 10) int32 rows of (kind, r, c, hp, hp_max, comb_lvl, prof_lvl, dir, anim, in_combat) with players first, and the tick counter. Enables rendering and telemetry consumers to read the world state through the binding instead of poking at C memory.

BeeGass added 2 commits June 10, 2026 22:09

BeeGass wants to merge 2 commits into PufferAI:4.0 from BeeGass:feat/nmmo3-state-export
