Add optional per-env state export hook; implement for nmmo3 #583
Open
BeeGass wants to merge 2 commits into
Add a StateField-based my_state hook to vecenv.h, following the existing
my_get/my_put pattern: envs opt in with #define MY_STATE and fill up to
max_fields typed buffer descriptors; the default implementation exports
nothing. Expose it in both binding backends as VecEnv.state(env_id),
returning {name: {data, dtype: str, shape: tuple}}. Field contents are
copied into Python-owned bytes under the GIL by default; fields flagged
PUFF_STATE_ZERO_COPY (pointer stable for the env lifetime) come back as
read-only memoryviews of the C buffer instead, invalidated by close().
Adds zero overhead to step(); state is assembled only when state() is
called.
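The difference between the default copy path and the zero-copy path can be illustrated in pure Python. This is only a sketch of the semantics, not the binding code: a `bytearray` stands in for the env-owned C buffer, and the names `c_buffer`, `snapshot`, and `view` are hypothetical.

```python
# A bytearray stands in for an env-owned C buffer (assumption for illustration).
c_buffer = bytearray([1, 2, 3, 4])

snapshot = bytes(c_buffer)                # default path: independent Python-owned copy
view = memoryview(c_buffer).toreadonly()  # zero-copy path: read-only window on the buffer

c_buffer[0] = 9                           # the env later rewrites its buffer

snapshot_first = snapshot[0]              # still 1: the snapshot is unaffected
view_first = view[0]                      # now 9: the view tracks the live buffer

# Writes through the read-only view are rejected with TypeError.
try:
    view[0] = 0
    write_blocked = False
except TypeError:
    write_blocked = True
```

A consumer that needs a stable value from a zero-copy field can take its own copy with `bytes(view)` before the env is reset or closed.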
Implement the my_state hook for nmmo3: terrain as (height, width) int8, entities as (num_agents + num_enemies, 10) int32 rows of (kind, r, c, hp, hp_max, comb_lvl, prof_lvl, dir, anim, in_combat) with players first, and the tick counter. Enables rendering and telemetry consumers to read the world state through the binding instead of poking at C memory.
Why
Downstream consumers (web renderers, telemetry, headless visualization) need
to read an env's world state - for nmmo3, the terrain grid and per-entity
positions - without forking the C env or casting raw pointers from outside
the binding. The per-agent observation buffers intentionally expose only
windowed views, so there is currently no supported way to get at the global
state from Python.
What
src/vecenv.h: a new optional my_state hook in the same pattern as my_get/my_put. Envs opt in with #define MY_STATE and fill up to max_fields StateField descriptors (name, data, numpy dtype string, ndim, dims, flags). Default implementation exports nothing.
src/bindings.cu + src/bindings_cpu.cpp (kept in parity): VecEnv.state(env_id=0) returning {name: {"data": bytes | memoryview, "dtype": str, "shape": tuple}}. Field buffers are copied into Python-owned bytes under the GIL by default (stable snapshots). Fields flagged PUFF_STATE_ZERO_COPY (buffer pointer stable for the env's lifetime) are returned as read-only memoryviews of the C buffer instead - zero bytes moved for large immutable data like map terrain; such views are invalidated by close(). Envs without the hook return {}.
ocean/nmmo3/binding.c: implements the hook - terrain as (height, width) int8 (zero-copy: allocated once in init, rewritten only by c_reset), positions as (num_agents + num_enemies, 10) int32 rows (kind, r, c, hp, hp_max, comb_lvl, prof_lvl, dir, anim, in_combat) with players first then enemies (copied), and tick.
Consumer side:
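A minimal, NumPy-free sketch of how a consumer might decode the returned mapping. It assumes the dtype strings are plain names such as "int8" and "int32" (the exact strings are not shown in this description). `decode_field` and the `FORMATS` table are hypothetical helpers, and the `state` dict is hand-built stand-in data rather than output from a real VecEnv.

```python
import struct

# Assumed mapping from dtype strings to memoryview/struct format codes.
FORMATS = {"int8": "b", "int32": "i"}

def decode_field(field):
    """Return a read-only N-D memoryview over a field's bytes without copying."""
    fmt = FORMATS[field["dtype"]]
    flat = memoryview(field["data"]).cast("B")  # accepts bytes or a memoryview
    return flat.cast(fmt, shape=list(field["shape"]))

# Stand-in for vec.state(0): a 2x3 terrain and two entity rows of 10 int32 values.
rows = [(0, 1, 2, 90, 100, 1, 1, 0, 0, 0), (1, 0, 1, 40, 50, 2, 0, 3, 0, 1)]
state = {
    "terrain": {"data": bytes([0, 1, 2, 3, 4, 5]), "dtype": "int8", "shape": (2, 3)},
    "positions": {
        "data": b"".join(struct.pack("10i", *row) for row in rows),
        "dtype": "int32",
        "shape": (2, 10),
    },
}

terrain = decode_field(state["terrain"])      # indexable as terrain[r, c]
positions = decode_field(state["positions"])  # one row per entity
kind, r, c, hp, hp_max = positions.tolist()[1][:5]
```

With NumPy available, the same decode is usually written as `np.frombuffer(field["data"], dtype=field["dtype"]).reshape(field["shape"])`, which also avoids a copy and yields a read-only array for both bytes and read-only memoryviews.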
Testing
Built with ./build.sh nmmo3 --cpu (clang 21, Linux) and ran a two-env vec through a 50-step roundtrip asserting, per env: shapes, dtypes, kind partitioning, in-bounds coordinates, hp <= hp_max with positive hp_max, tick advancement in lockstep across envs, position movement, and terrain stability across steps. Also asserted: per-env indexing returns distinct terrain for differently-seeded envs; state() is a pure read (two calls without stepping are byte-identical); the zero-copy terrain view is read-only; state(env_id) raises on out-of-range ids; and a second reset() zeroes the tick and yields structurally valid state.
The same suite passes at the config/nmmo3.ini scale (2 envs x 1024 agents, 512x512 maps, positions (3072, 10)).
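Some of the per-env structural checks listed above could look roughly like the following. This is a sketch, not the actual test suite: `find_violations` is a hypothetical helper, the rows are hand-built, and "kind partitioning" is assumed to mean that the player block and the enemy block never share a kind value.

```python
def find_violations(rows, num_agents, height, width):
    """Return a list of human-readable invariant violations (empty if none)."""
    problems = []
    # Assumed meaning of kind partitioning: players (first) and enemies share no kind.
    player_kinds = {row[0] for row in rows[:num_agents]}
    enemy_kinds = {row[0] for row in rows[num_agents:]}
    if player_kinds & enemy_kinds:
        problems.append("player and enemy kinds overlap")
    for i, row in enumerate(rows):
        if len(row) != 10:
            problems.append(f"row {i}: expected 10 columns, got {len(row)}")
            continue
        _kind, r, c, hp, hp_max = row[:5]
        if not (0 <= r < height and 0 <= c < width):
            problems.append(f"row {i}: ({r}, {c}) out of bounds")
        if hp_max <= 0 or hp > hp_max:
            problems.append(f"row {i}: hp {hp} / hp_max {hp_max} invalid")
    return problems

rows = [
    [0, 1, 2, 90, 100, 1, 1, 0, 0, 0],  # player
    [1, 3, 0, 40, 50, 2, 0, 3, 0, 1],   # enemy
]
good = find_violations(rows, num_agents=1, height=4, width=4)
bad = find_violations([[0, 9, 9, 5, 5, 0, 0, 0, 0, 0]], num_agents=1, height=4, width=4)
```

Returning a list of problems instead of asserting on the first failure makes it easier to see every broken invariant in a single run.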
Also built the CUDA backend (./build.sh nmmo3, CUDA 13.3) and reran the full suite against it in CPU mode (create_vec(args, gpu=0)) with identical output, confirming the two binding backends stay in parity.
Rebuilt an env that does not implement the hook (./build.sh breakout --cpu) and verified it compiles against the no-op default and state() returns {}.
Compat
Purely additive. No changes to env_init/step/reset or any observation/action/reward buffer semantics. Nothing is added to the step hot path; state is assembled only on demand inside state() calls.