CLDSRV-908: CopyObject handle checksums#6176
Conversation
Hello leif-scality,My role is to assist you with the merge of this Available options
Available commands
Status report is not available. |
Incorrect fix versionThe
Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:
Please check the |
|
LGTM — clean implementation of checksum propagation and recompute on CopyObject. Stream handling (jsutil.once guards, Azure per-part passthrough, error propagation) and the _shouldRecomputeChecksum decision logic are solid. One minor test style issue flagged inline. |
❌ 2 Tests Failed:
View the full list of 2 ❄️ flaky test(s)
To view more test analytics, go to the Test Analytics Dashboard |
5ca489a to
57d9894
Compare
|
LGTM |
57d9894 to
5bd2262
Compare
|
5bd2262 to
8b2b196
Compare
|
LGTM |
8b2b196 to
720f686
Compare
|
LGTM |
| if (err) { | ||
| done(err); |
There was a problem hiding this comment.
When this error happens, it could be any error and the perPart is still piped.
So it could theoretically receive data from azure and continue to stream during a short time window until the final callback calls passthrough.destroy and unpipes the source perPart.
Maybe the perPart should be destroyed here to stop immediately the streaming
There was a problem hiding this comment.
This would make every thing more complex for little gain, lets just let error propagate naturally
| // into the master passthrough and use its 'end' as the completion | ||
| // signal — same pattern arsenal's data.copyObject uses. | ||
| const perPart = new PassThrough(); | ||
| perPart.once('error', done); |
There was a problem hiding this comment.
Should we consider encapsulating the error to include some part details in the error for better troubleshooting if the error is ever logged somewhere else in the streaming path.
There was a problem hiding this comment.
added a commit improving theerrors
| // and masterKeyId stored properly in metadata | ||
| if (sourceIsDestination && storeMetadataParams.locationMatch | ||
| && !isVersionedObj && !needsEncryption) { | ||
| && !isVersionedObj && !needsEncryption && !shouldRecomputeChecksum) { |
There was a problem hiding this comment.
If you don't skip, you'll trigger a data GET + PUT + DELETE (for previous location).
This only for checksum recomputation. Should you rather define another path where if it's only recompute checksum, you allow only the GET to stream into the ChecksumTransform stream and then you discard the end of the data stream, and avoid having to do a data PUT + DELETE
There was a problem hiding this comment.
Fixed I now Get and recompute but dont PUT and DELETE. I also removed recompute for external backend cases, so if the data does not go through cloudserver we dont GET, and the dest object receives no checksum if a recompute was required
720f686 to
6385c3e
Compare
|
Well-structured PR with thorough test coverage across all checksum scenarios (propagation, recompute, 0-byte, COMPOSITE, multi-part, Azure, copy-to-self, external backends). |
6385c3e to
8ede498
Compare
|
|
LGTM — solid implementation of all 9 checksum cases from the PR matrix. The propagation/recompute/default logic is clean, the |
df2187c to
8da10f6
Compare
|
LGTM |
8da10f6 to
20b6ba3
Compare
|
LGTM |
20b6ba3 to
4c36bc2
Compare
|
LGTM |
4c36bc2 to
22f8acf
Compare
|
LGTM — well-structured implementation with thorough test coverage across all the checksum scenarios (propagation, recompute, 0-byte, multi-part, Azure, copy-to-self, external backends, orphan cleanup, legacy string locations). |
| it('should lowercase mixed-case input', () => { | ||
| const result = getCopyObjectChecksumAlgorithm({ | ||
| 'x-amz-checksum-algorithm': 'Sha256', | ||
| }); | ||
| assert.strictEqual(result.error, null); | ||
| assert.strictEqual(result.algorithm, 'sha256'); | ||
| }); | ||
|
|
||
| it('should accept algorithms already in lowercase', () => { | ||
| const result = getCopyObjectChecksumAlgorithm({ | ||
| 'x-amz-checksum-algorithm': 'crc32', | ||
| }); | ||
| assert.strictEqual(result.error, null); | ||
| assert.strictEqual(result.algorithm, 'crc32'); | ||
| }); |
There was a problem hiding this comment.
you could include these 2 inside the validAlgorithms array above to run in the loop, the test body is the same
| it('should produce an arsenal InvalidRequest error via the standard mapping', () => { | ||
| const result = getCopyObjectChecksumAlgorithm({ | ||
| 'x-amz-checksum-algorithm': 'GARBAGE', | ||
| }); | ||
| const err = arsenalErrorFromChecksumError(result.error); | ||
| assert.strictEqual(err.is.InvalidRequest, true); | ||
| assert(err.description.includes('Checksum algorithm provided is unsupported'), | ||
| 'expected AWS-shaped error description'); | ||
| }); |
There was a problem hiding this comment.
This doesn't seem to be related to getCopyObjectChecksumAlgorithm but rather arsenalErrorFromChecksumError maybe you should have another describe separate for that function
b866c91 to
ae43793
Compare
|
LGTM |
ae43793 to
154e192
Compare
|
LGTM |
The recompute path writes the destination through plain data.put, which (unlike data.copyObject, used by the propagate path) does not return a dataStoreETag, so the destination's location entry was stored without one. Nothing breaks visibly — partNumber readers fall back to treating the copy as a single part — but it left the recompute path as the only data-writing path producing legacy ETag-less locations and dropped the written-bytes MD5 that integrity/audit tooling relies on. Read the MD5 from data.put's hashedStream callback argument (completedHash, which the arsenal wrapper guarantees is set before the callback fires) and stamp the location as 1:<md5>, matching createAndStoreObject. Falls back to dataRetrievalInfo.dataStoreETag for external backends. Adds a unit test asserting the recomputed destination location carries the 1:-prefixed MD5.
154e192 to
e8311fb
Compare
|
LGTM |
GET+PUT+DELETEGET+PUT+DELETE(same)CopyObject/beginCopyFromURL/copyObject) — no GETGET+PUT+DELETE— streamed through CloudServerGET+PUT+DELETEGET+PUT+DELETE(same)GET+PUT+DELETEGET+PUT+DELETE(same)GETonly — reuse location (no PUT, no DELETE)PUT(data.put(null))PUT(data.put(null)) (same)