Read unsigned bytecode operands in CodeHTML disassembler#511

Merged
garydgregory merged 1 commit into apache:master from rootvector2:codehtml-unsigned-operands
Jul 2, 2026
Conversation

@rootvector2

Contributor

CodeHTML.codeToHTML disassembles bytecode independently of the class-file model. It read wide local-variable indices, field/method/class/LDC_W constant-pool indices, and the multianewarray dimension count with signed readShort()/readByte(). An operand >= 0x8000 (or a dimension count >= 128) was therefore sign-extended to a negative value, and the generated _code.html showed a negative slot or dimension, or did a wrong constant-pool lookup. The reference disassembler, Utility.codeToString, already reads all of these as unsigned, so this change makes codeToHTML match it. Found by diffing the two disassemblers operand by operand.
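
For illustration only, here is a minimal sketch of the sign-extension effect described above. It is not the BCEL code: the class `OperandReadDemo` and its helper names are hypothetical. Only the `java.io.DataInputStream` read methods are real API.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of BCEL.
public class OperandReadDemo {

    // Wraps the given byte values (0..255) in a DataInputStream, like a code array.
    static DataInputStream streamOf(int... bytes) {
        byte[] raw = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            raw[i] = (byte) bytes[i];
        }
        return new DataInputStream(new ByteArrayInputStream(raw));
    }

    // Signed 16-bit read: the kind of read that mis-handles indices >= 0x8000.
    static int signedShort(int hi, int lo) {
        try {
            return streamOf(hi, lo).readShort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Unsigned 16-bit read: appropriate for constant-pool and wide local indices.
    static int unsignedShort(int hi, int lo) {
        try {
            return streamOf(hi, lo).readUnsignedShort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Signed 8-bit read: mis-handles a dimension count >= 128.
    static int signedByte(int b) {
        try {
            return streamOf(b).readByte();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Unsigned 8-bit read: appropriate for the multianewarray dimension count.
    static int unsignedByte(int b) {
        try {
            return streamOf(b).readUnsignedByte();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(signedShort(0x80, 0x00));   // -32768
        System.out.println(unsignedShort(0x80, 0x00)); // 32768
        System.out.println(signedByte(0x80));          // -128
        System.out.println(unsignedByte(0x80));        // 128
    }
}
```

The same two bytes yield a negative index under the signed read and the intended index under the unsigned read. Values below 0x8000 (or below 128 for a single byte) read identically either way, which is presumably why the difference is easy to miss.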

  • Read the contribution guidelines for this project.
  • Read the ASF Generative Tooling Guidance if you use Artificial Intelligence (AI).
  • I used AI to create any part of, or all of, this pull request. Which AI tool was used to create this pull request, and to what extent did it contribute?
  • Run a successful build using the default Maven goal with mvn; that's mvn on the command line by itself.
  • Write unit tests that match behavioral changes, where the tests fail if the changes to the runtime are not applied. This may not always be possible, but it is a best practice.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Each commit in the pull request should have a meaningful subject line and body. Note that a maintainer may squash commits during the merge process.

@garydgregory garydgregory changed the title read unsigned bytecode operands in CodeHTML disassembler Read unsigned bytecode operands in CodeHTML disassembler Jul 2, 2026
@garydgregory garydgregory merged commit 296a4dd into apache:master Jul 2, 2026
34 of 35 checks passed
@garydgregory

Member

@rootvector2

PR merged 🚀 Thank you! I wonder if we have other mis-reading of data using signed reads instead of unsigned. WDYT?

@rootvector2

Contributor Author

good question, i went digging after this one. the class-file model path already reads everything unsigned: the generic/* instruction reads and the classfile/* attribute parsers pull cp indices, local-var indices and counts with readUnsignedShort, and Utility.codeToString reads its operands unsigned too, so CodeHTML was the outlier that had drifted from the reference. the signed reads left in both disassemblers are operands that are genuinely signed: branch offsets, IINC increment, SIPUSH/BIPUSH constants, and the NEWARRAY atype byte. the only borderline one is StackMapType reading its tag with readByte(), but checkType pins it to 0..8 so signed vs unsigned can't change the result there. didn't spot any other real misread.
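
A rough sketch of the two points in the comment above, under stated assumptions: `SignedOperandDemo` and its methods are hypothetical, and `tagAccepted` is only a stand-in for the 0..8 range check attributed to `checkType`, not the actual BCEL implementation. It shows that a branch offset needs a signed read, and that a 0..8 range check gives the same verdict whether the tag byte is read signed or unsigned.

```java
// Hypothetical demo class; not part of BCEL.
public class SignedOperandDemo {

    // Big-endian signed 16-bit decode, as needed for branch offsets.
    static int signed16(int hi, int lo) {
        return (short) (((hi & 0xFF) << 8) | (lo & 0xFF));
    }

    // Big-endian unsigned 16-bit decode, as needed for constant-pool indices.
    static int unsigned16(int hi, int lo) {
        return ((hi & 0xFF) << 8) | (lo & 0xFF);
    }

    // Stand-in for a range check that pins a tag to 0..8 (assumption).
    static boolean tagAccepted(int tag) {
        return tag >= 0 && tag <= 8;
    }

    // True if the range check gives the same verdict for the signed and
    // unsigned interpretation of the same raw byte.
    static boolean sameVerdict(int rawByte) {
        int signed = (byte) rawByte;
        int unsigned = rawByte & 0xFF;
        return tagAccepted(signed) == tagAccepted(unsigned);
    }

    // Checks every possible byte value.
    static boolean sameVerdictForAllBytes() {
        for (int raw = 0; raw < 256; raw++) {
            if (!sameVerdict(raw)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A backward branch of 5 bytes is encoded as 0xFF 0xFB.
        System.out.println(signed16(0xFF, 0xFB));     // -5
        System.out.println(unsigned16(0xFF, 0xFB));   // 65531
        System.out.println(sameVerdictForAllBytes()); // true
    }
}
```

Reading a backward branch offset as unsigned would turn a jump of -5 into a jump of 65531, so those signed reads are correct as they stand. For the tag byte, every value from 128 to 255 is rejected under both interpretations, and values from 0 to 127 are identical in both.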

@garydgregory

Member

@rootvector2
OK great, thank you for digging in.

2 participants