
Make GSPO loss length-proportional #544

Open
jlamypoirier wants to merge 2 commits into main from jlp_gspo-length-proportional


jlamypoirier commented Jun 17, 2026

Collaborator

Codex GPT-5 note:

Summary

  • make GSPO apply segment loss once per labeled token instead of once per document
  • keep per-document label counts for the geometric-mean ratio and advantage computation
  • update the GSPO reference test to match the length-proportional loss

Rationale

PipelineRL DeepSpeed GSPO uses length-proportional sequence weighting when group_normalization=false: each segment contributes in proportion to its labeled-token count. Fast-LLM was still using mask / num_labels_in_seq as both the geometric-mean normalizer and the loss weight, so each document contributed equally regardless of length. This change keeps num_labels_in_seq as the normalizer for the ratio and advantage means but uses the label mask itself as the loss/gradient weight.
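
To make the difference concrete, here is a minimal pure-Python sketch of the two weightings for a single document. It is not the Fast-LLM implementation: the function name `gspo_segment_losses`, the clipped-surrogate form of the segment loss, and the `epsilon` value are assumptions made for illustration, and the sketch returns unnormalized sums because the final normalization is not described in this PR.

```python
import math


def gspo_segment_losses(new_logprobs, old_logprobs, advantages, mask, epsilon=0.2):
    """Return (uniform, length_proportional) unnormalized losses for one document.

    Hypothetical helper for illustration only; not part of Fast-LLM.
    """
    num_labels_in_seq = sum(mask)
    if num_labels_in_seq == 0:
        raise ValueError("document has no labeled tokens")

    # Geometric-mean ratio: exponentiate the mean log-ratio over labeled tokens.
    mean_log_ratio = sum(
        m * (n - o) for n, o, m in zip(new_logprobs, old_logprobs, mask)
    ) / num_labels_in_seq
    ratio = math.exp(mean_log_ratio)

    # The advantage is also averaged with the per-document label count.
    mean_advantage = sum(m * a for a, m in zip(advantages, mask)) / num_labels_in_seq

    # Assumed clipped surrogate for the segment-level loss.
    clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    segment_loss = -min(ratio * mean_advantage, clipped_ratio * mean_advantage)

    # Previous behavior: weight mask / num_labels_in_seq, so the weights sum to 1
    # and the segment loss is counted once per document.
    uniform = sum(m / num_labels_in_seq * segment_loss for m in mask)

    # New behavior: weight mask, so the segment loss is counted once per labeled token.
    length_proportional = sum(m * segment_loss for m in mask)
    return uniform, length_proportional


# Two documents with identical per-token statistics but 2 vs. 8 labeled tokens.
short_doc = gspo_segment_losses([-1.0] * 4, [-1.1] * 4, [1.0] * 4, [0, 0, 1, 1])
long_doc = gspo_segment_losses([-1.0] * 10, [-1.1] * 10, [1.0] * 10, [0, 0] + [1] * 8)
```

With the uniform weighting both documents yield the same value, whereas with the length-proportional weighting the 8-label document contributes four times as much as the 2-label one. The ratio and mean advantage are identical in both cases because they are still normalized by num_labels_in_seq.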

Test

  • FAST_LLM_TEST_RESULTS_PATH=/tmp/fast_llm_tests/gspo_length_proportional /Users/joel.lamy-poirier/Projects/Fast-LLM/venv/bin/python -m pytest -v -n 4 tests/layers/test_lm_losses.py -k gspo
