
Escape leading comment marker in printWithEscapes#614

Merged
garydgregory merged 1 commit into apache:master from rootvector2:escape-comment-marker-first-char
Jun 19, 2026

@rootvector2

Contributor

With an escape character set and no quote character (or QuoteMode.NONE), CSVFormat.printWithEscapes escapes the delimiter, CR, LF, the escape character and (since #609) the quote character, but never a comment marker. A value whose first character is the configured comment marker is written verbatim, so CSVPrinter emits a record that its own CSVParser reads back as a comment and silently drops. For example, CSVFormat.DEFAULT.builder().setQuote(null).setEscape('\\').setCommentMarker(';').get() prints ;foo for the value ;foo, and re-parsing that output yields zero records. Found while round-tripping printer output back through the parser.

The escape condition is the right place to fix it, next to where the delimiter, escape char and quote char are already handled. The change escapes the comment marker when it is the first character of the value, in both printWithEscapes overloads (CharSequence and Reader). This is the escape-mode counterpart to #610, which protected the comment marker only in the MINIMAL quoting path and left the escape paths out.
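For readers who want the idea without opening the diff, here is a minimal sketch of the escape condition described above. It is not the Commons CSV code: the class LeadingCommentEscapeSketch and the helper escapeValue are hypothetical names, and the CR/LF handling is a simplified assumption. It only shows a comment marker being escaped when it sits at index 0 of the value.

```java
class LeadingCommentEscapeSketch {

    /**
     * Hypothetical, simplified stand-in for an escape-mode printer.
     * Escapes the delimiter, CR, LF, the escape character, and the
     * comment marker when (and only when) it is the first character.
     */
    static String escapeValue(String value, char delimiter, char escape, Character commentMarker) {
        StringBuilder out = new StringBuilder(value.length() + 4);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            // A comment marker only matters at the very start of the value.
            boolean leadingComment =
                    i == 0 && commentMarker != null && c == commentMarker.charValue();
            if (c == '\r') {
                out.append(escape).append('r');
            } else if (c == '\n') {
                out.append(escape).append('n');
            } else if (c == delimiter || c == escape || leadingComment) {
                out.append(escape).append(c);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeValue(";foo", ',', '\\', ';')); // prints \;foo
        System.out.println(escapeValue("a;b", ',', '\\', ';'));  // prints a;b
    }
}
```

A marker in any later position is left alone, since only a line that starts with the marker is treated as a comment.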

  • Read the contribution guidelines for this project.
  • Read the ASF Generative Tooling Guidance if you use Artificial Intelligence (AI).
  • I used AI to create any part of, or all of, this pull request. Which AI tool was used to create this pull request, and to what extent did it contribute?
  • Run a successful build using the default Maven goal with mvn; that's mvn on the command line by itself.
  • Write unit tests that match behavioral changes, where the tests fail if the changes to the runtime are not applied. This may not always be possible, but it is a best practice.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Each commit in the pull request should have a meaningful subject line and body. Note that a maintainer may squash commits during the merge process.

@garydgregory

Member

@rootvector2

The build is broken.
Please run mvn with no arguments BEFORE you push.
This will avoid wasting time and CI resources.

@rootvector2

Contributor Author

ran mvn here and the only failure is testGetBytePositionMultiCharacterDelimiterWithSupplementaryCharacter, which fails on master too, so it is not caused by this PR. the "Refactor delimiter in test" commit (a1cf4f2) set the expected value to "a" + delimiter + "b\n".getBytes(UTF_8).length. the method call binds before +, so .getBytes(UTF_8).length applies only to "b\n" (giving 2) and the whole expression evaluates to the string "ax😀2" rather than the byte count 8; the prior "ax😀b\n".getBytes(UTF_8).length was correct. my change and its two tests are green once that line is restored. want me to drop the one-line fix into this PR, or will you patch master?
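The precedence problem can be reproduced outside the test suite. This is a small sketch, not the actual test code: the class name ConcatPrecedenceSketch is made up, and the delimiter value "x😀" is inferred from the "ax😀2" string quoted above.

```java
import java.nio.charset.StandardCharsets;

class ConcatPrecedenceSketch {

    // Assumed delimiter: 'x' followed by U+1F600, written as a surrogate pair.
    static final String DELIMITER = "x\uD83D\uDE00";

    // Member access binds tighter than '+', so getBytes(...).length is applied
    // to "b\n" alone and its int result is then concatenated onto the string.
    static Object buggy() {
        return "a" + DELIMITER + "b\n".getBytes(StandardCharsets.UTF_8).length;
    }

    // Parenthesizing the concatenation measures the whole encoded string.
    static int fixed() {
        return ("a" + DELIMITER + "b\n").getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(buggy() instanceof String); // prints true
        System.out.println(fixed());                   // prints 8
    }
}
```

The byte count of 8 is 1 for a, 1 for x, 4 for the supplementary character, 1 for b and 1 for the line feed.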

@garydgregory

garydgregory commented Jun 19, 2026

Member

You're right @rootvector2 ! Please rebase on git master.

@garydgregory garydgregory changed the title from "escape leading comment marker in printWithEscapes" to "Escape leading comment marker in printWithEscapes" Jun 19, 2026
@rootvector2 rootvector2 force-pushed the escape-comment-marker-first-char branch from 9588eda to 61aa055 June 19, 2026 19:14
@rootvector2

Contributor Author

rebased onto master, so the revert of a1cf4f2 is in the branch now and the byte-position test is green again. ran mvn with no args locally, full build passes.

@garydgregory garydgregory merged commit f0a2acd into apache:master Jun 19, 2026
16 checks passed
@garydgregory

Member

TY @rootvector2 , merged 🚀
