Why Checksums Matter: How syncopio Verifies Every File
Hash algorithms explained, why dual checksums beat a single hash, and how syncopio verifies during transfer with zero extra passes.
You’ve just finished a 48-hour migration of 200TB across 50 million files. Everything looks good — file counts match, no errors in the log. But how do you know every file is intact? Silent bit rot, network corruption, storage firmware bugs, and truncated writes can all produce files that exist but contain wrong data. Checksums are the only way to know for sure.
What Is a Checksum?
A checksum (or hash) is a fixed-size value computed from file contents using a mathematical function. With a well-designed hash, changing even one bit of the file changes the checksum completely. By computing checksums on both source and destination and comparing them, you can detect corruption with very high probability; how high depends on the algorithm’s output size and design.
File: quarterly-report.xlsx (4.2 MB)
SHA-256: a8f5f167f44f4964e6c998dee827110c...
Change one byte →
SHA-256: e3b0c44298fc1c149afbf4c8996fb924... (completely different; both values are illustrative)
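The effect is easy to reproduce. The following is a minimal sketch using Python's standard `hashlib`; the sample payload and the position of the flipped bit are arbitrary choices made for illustration.

```python
import hashlib

# Arbitrary sample data standing in for a file's contents.
original = b"Quarterly revenue: 1,204,551 USD\n" * 1000
corrupted = bytearray(original)
corrupted[500] ^= 0x01  # flip a single bit in one byte

digest_a = hashlib.sha256(original).hexdigest()
digest_b = hashlib.sha256(bytes(corrupted)).hexdigest()

# Count how many of the 256 output bits differ between the two digests.
differing_bits = bin(int(digest_a, 16) ^ int(digest_b, 16)).count("1")

print(digest_a)
print(digest_b)
print(f"{differing_bits} of 256 bits differ")
```

With a good hash, roughly half of the output bits are expected to change, even though only one input bit did.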
Hash Algorithms Compared
Not all hash algorithms are equal. The trade-off is between speed and collision resistance (how hard it is to find two files with the same hash).
| Algorithm | Speed (GB/s)* | Output Size | Collision Resistance | Use Case |
|---|---|---|---|---|
| MD5 | ~3.5 | 128 bit | Broken | Legacy systems only |
| SHA-256 | ~1.2 | 256 bit | Strong | Security, compliance |
| SHA-512 | ~1.5 | 512 bit | Strong | High-security environments |
| XXH3 | ~30+ | 64/128 bit | Non-cryptographic | Speed-first integrity |
| BLAKE3 | ~8+ | 256 bit | Strong | Modern replacement for SHA-256 |
*Approximate single-core throughput on modern x86 hardware. Actual speeds depend on CPU, memory, and implementation.
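Because the figures depend so heavily on hardware, it can be worth measuring on the machine that will run the migration. The sketch below times only the algorithms that ship with Python's `hashlib`; XXH3 and BLAKE3 are not in the standard library and would need third-party packages, so they are left out here. The helper name `throughput_mb_s` and the payload size are illustrative choices.

```python
import hashlib
import time

def throughput_mb_s(algorithm: str, payload: bytes, rounds: int = 3) -> float:
    """Return single-threaded hashing throughput in MB/s for one algorithm."""
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(algorithm, payload).digest()
    elapsed = time.perf_counter() - start
    return (len(payload) * rounds) / elapsed / 1e6

payload = bytes(8 * 1024 * 1024)  # 8 MiB of zeros held in memory, so disk speed is excluded
results = {
    name: throughput_mb_s(name, payload)
    for name in ("md5", "sha256", "sha512", "blake2b")
}

# Print fastest first.
for name, speed in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {speed:10.0f} MB/s")
```

Numbers from a short in-memory run like this are only a rough guide; a real migration is also bounded by storage and network throughput.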
MD5: Legacy, Avoid for New Deployments
MD5 was the standard for file integrity for decades, and you’ll still find it in many tools (rsync, md5sum). It’s fast and produces compact 128-bit hashes. However, MD5 is cryptographically broken: practical attacks can craft two different files with the same MD5 hash. It still detects accidental corruption, so it remains usable for migration integrity checking where nobody is deliberately crafting collisions, but newer options are faster, stronger, or both.
SHA-256: The Gold Standard for Compliance
SHA-256 is the default choice when compliance or security matters. It’s part of the SHA-2 family, has no known practical attacks, and is required by many regulatory frameworks. The trade-off is speed: in the table above, SHA-256 is roughly 3x slower than MD5, although CPUs with dedicated SHA instructions narrow that gap considerably.
XXH3: When Speed Is Everything
XXH3 is a non-cryptographic hash that runs at 30+ GB/s — limited by memory bandwidth, not CPU. For migration integrity checking where you need speed and aren’t concerned about adversarial collision attacks, XXH3 is excellent.
BLAKE3: The Modern Choice
BLAKE3 combines SHA-256-class security with several times the throughput of SHA-256, though in the table above it still trails XXH3. It’s parallelizable, produces 256-bit hashes by default, and is quickly becoming the modern default for file integrity. If you’re choosing a single algorithm for integrity verification, BLAKE3 is hard to beat.
Why Traditional Tools Fall Short
rsync: Checksum as an Afterthought
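rsync checks a whole-file checksum of the data stream as each file is transferred, but confirming that what sits on disk at the destination matches the source takes a separate pass with -c: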
# Transfer first
rsync -av /source/ /dest/
# Then verify (re-reads every file on both sides and re-copies any whose checksum differs)
rsync -avc /source/ /dest/
This means:
- Double the I/O — you read every file twice (once to copy, once to verify)
- Double the time — verification takes as long as the transfer
- Race condition — files can change between transfer and verification
For the 200TB migration above, a separate verification pass at the same throughput adds roughly another 48 hours. Many admins skip it.
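The 48-hour figures follow from simple arithmetic. This sketch assumes decimal terabytes and that the verification pass sustains the same read rate as the original copy; both are simplifying assumptions, and the function name `pass_hours` is illustrative.

```python
def pass_hours(total_bytes: float, throughput_bytes_per_s: float) -> float:
    """Hours needed to read total_bytes once at a sustained throughput."""
    return total_bytes / throughput_bytes_per_s / 3600

TOTAL_BYTES = 200e12   # 200 TB, decimal terabytes (assumption)
TRANSFER_HOURS = 48

# Sustained rate implied by finishing the copy in 48 hours.
implied_rate = TOTAL_BYTES / (TRANSFER_HOURS * 3600)
print(f"Implied throughput: {implied_rate / 1e9:.2f} GB/s")

# A checksum pass re-reads the same data, so at the same rate it costs about the same time.
verify_hours = pass_hours(TOTAL_BYTES, implied_rate)
print(f"Separate verify pass: {verify_hours:.0f} h, total: {TRANSFER_HOURS + verify_hours:.0f} h")
```

This works out to about 1.16 GB/s sustained and 96 hours in total. Note that SHA-256 at the single-core speed listed in the table above would barely keep up with that rate, which is one reason the choice of hash matters.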
Robocopy: No Checksums at All
Robocopy has no built-in checksum capability. You can compare file sizes and timestamps, but these don’t detect silent corruption. Adding verification means writing PowerShell scripts with Get-FileHash:
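# Manual verification after Robocopy (\\source\share is a placeholder for the real source path)
$src = '\\source\share'; $dst = '\\dest\share'
Get-ChildItem -Recurse -File $dst | ForEach-Object {
    $rel = $_.FullName.Substring($dst.Length)
    $srcHash = (Get-FileHash -LiteralPath (Join-Path $src $rel) -Algorithm SHA256).Hash
    $dstHash = (Get-FileHash -LiteralPath $_.FullName -Algorithm SHA256).Hash
    # Every file is read again on both sides, one at a time
    if ($srcHash -ne $dstHash) { Write-Warning "Mismatch: $rel" }
}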
This is slow, error-prone, and not scalable.
rclone: Optional but Extra Pass
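rclone can compare by hash with --checksum and offers rclone check for post-transfer verification, but where the storage does not already hold hashes (a local or NAS filesystem, for example), computing them means reading the files again, which doubles I/O just as with rsync.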
The Dual-Checksum Approach
The best approach for migration integrity uses two checksums:
- Fast hash (XXH3 or CRC32) — computed during transfer for real-time integrity
- Strong hash (SHA-256 or BLAKE3) — for compliance evidence and long-term verification
Why both?
- The fast hash catches transfer corruption immediately — if the hash doesn’t match after network transfer, the file is retried before moving on
- The strong hash provides cryptographic evidence that is computationally infeasible to forge, meeting compliance requirements for audit trails
Think of it like shipping
The fast hash is like checking that a package arrived intact (quick visual inspection). The strong hash is like verifying the serial numbers inside match the shipping manifest (takes longer but is legally defensible).
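Both hashes can be fed from the same read, so the second checksum costs CPU time but no extra I/O. Here is a minimal sketch, assuming CRC32 (from Python's `zlib`) as the fast hash and SHA-256 as the strong one; XXH3 or BLAKE3 would slot in the same way but need third-party packages. The function name `dual_checksum` is illustrative.

```python
import hashlib
import io
import zlib
from typing import BinaryIO, Tuple

def dual_checksum(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> Tuple[str, str]:
    """Read the stream once and return (crc32_hex, sha256_hex)."""
    crc = 0
    sha = hashlib.sha256()
    while chunk := stream.read(chunk_size):
        crc = zlib.crc32(chunk, crc)  # fast hash: cheap, catches accidental corruption
        sha.update(chunk)             # strong hash: kept as audit evidence
    return f"{crc:08x}", sha.hexdigest()

# Usage: any binary file object works; an in-memory buffer keeps the example self-contained.
fast, strong = dual_checksum(io.BytesIO(b"123456789"))
print("CRC32  :", fast)
print("SHA-256:", strong)
```

In practice the fast value would drive immediate retry decisions, while the strong value is what gets stored in the audit record.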
How syncopio Verifies Every File
syncopio takes a fundamentally different approach from CLI tools: verification happens during transfer, not after.
During Transfer
- Source read — the worker reads the file from the source filesystem
- Hash computation — as bytes stream through memory, checksums are computed in parallel
- Destination write — bytes are written to the destination
- Immediate comparison — the computed hash is compared against the source hash
No second pass. No double I/O. No race conditions.
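The general hash-while-copying pattern can be sketched as follows. This is not syncopio's implementation, which the article does not detail; it is a generic illustration, and the function names are hypothetical. The optional read-back at the end is an addition for the example and does cost one extra read of the destination.

```python
import hashlib
import os
import tempfile

CHUNK = 1024 * 1024

def copy_with_hash(src_path: str, dst_path: str) -> str:
    """Copy src to dst, hashing the bytes as they stream through; return the hex digest."""
    digest = hashlib.sha256()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while chunk := src.read(CHUNK):
            digest.update(chunk)  # hash the same buffer that is about to be written
            dst.write(chunk)
        dst.flush()
        os.fsync(dst.fileno())    # ask the OS to push the data to stable storage
    return digest.hexdigest()

def file_hash(path: str) -> str:
    """Hash a file by reading it back from disk."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            digest.update(chunk)
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "source.bin")
    dst = os.path.join(tmp, "dest.bin")
    with open(src, "wb") as f:
        f.write(os.urandom(3 * CHUNK + 17))  # random test payload

    streamed = copy_with_hash(src, dst)
    # Optional read-back: confirms what landed on disk matches what was streamed.
    verified = streamed == file_hash(dst)
    print("match" if verified else "MISMATCH, retry the file")
```

The source is read exactly once here; whether a destination read-back is needed depends on how much the storage stack between the write call and the disk is trusted.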
The Result
| Approach | I/O Passes | Total Time (200TB) | Integrity Confidence |
|---|---|---|---|
| rsync (no verify) | 1 | ~48 hours | None |
| rsync + checksum verify | 2 | ~96 hours | High |
| Robocopy (no verify) | 1 | ~48 hours | None |
| syncopio (verify during transfer) | 1 | ~48 hours | High |
syncopio gives you the same integrity guarantee as rsync’s two-pass approach in half the time.
Verification Reports
After transfer, syncopio generates verification reports suitable for compliance audits:
- Per-file checksums — source hash, destination hash, match status
- Transfer metadata — timestamps, byte counts, worker ID
- Summary statistics — total files, verified files, any mismatches
- Export formats — PDF for auditors, CSV for analysis, Excel for stakeholders
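To make the report contents concrete, here is a hedged sketch of a per-file CSV with summary statistics. The column names, the `FileResult` type, and the sample hash values are all hypothetical and do not describe syncopio's actual export schema.

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class FileResult:
    path: str
    source_sha256: str
    dest_sha256: str
    bytes_copied: int

    @property
    def match(self) -> bool:
        return self.source_sha256 == self.dest_sha256

def write_report(results, out) -> dict:
    """Write one CSV row per file and return summary statistics."""
    writer = csv.writer(out)
    writer.writerow(["path", "source_sha256", "dest_sha256", "bytes_copied", "match"])
    for r in results:
        writer.writerow([r.path, r.source_sha256, r.dest_sha256, r.bytes_copied, r.match])
    mismatches = [r.path for r in results if not r.match]
    return {
        "total": len(results),
        "verified": len(results) - len(mismatches),
        "mismatches": mismatches,
    }

# Shortened placeholder digests keep the example readable.
results = [
    FileResult("docs/a.txt", "aa11", "aa11", 120),
    FileResult("docs/b.txt", "bb22", "bb23", 4096),  # deliberately mismatched
]
buffer = io.StringIO()
summary = write_report(results, buffer)
print(buffer.getvalue())
print(summary)
```

Keeping both hashes in each row, instead of only a pass/fail flag, lets an auditor recompute a digest later and compare it independently.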
syncopio advantage
syncopio’s built-in verification eliminates the “did it actually copy correctly?” anxiety that plagues every migration. No second pass, no manual verification scripts, no hoping for the best. See all features or request a demo.
Best Practices for Migration Verification
Regardless of what tool you use:
- Always verify — “no errors in the log” isn’t the same as “data is intact”
- Use checksums, not just size+timestamp — matching sizes don’t catch corruption
- Verify on the destination — don’t trust the transfer tool’s self-report; read back from the destination
- Keep verification evidence — compliance auditors will ask for proof
- Plan for the verification time — if you’re doing a separate pass, budget for it in your migration timeline
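A tool-independent way to apply these practices is to build a hash manifest of each side and compare the two. The sketch below does this for two local directory trees; the function names are illustrative, and a production version would add parallelism and error handling.

```python
import hashlib
import os
import tempfile

def build_manifest(root: str) -> dict:
    """Map each file's path relative to root to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(1024 * 1024), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full, root)] = digest.hexdigest()
    return manifest

def compare(source: dict, dest: dict) -> dict:
    """Classify paths as missing on the destination, unexpected extras, or mismatched."""
    return {
        "missing": sorted(source.keys() - dest.keys()),
        "extra": sorted(dest.keys() - source.keys()),
        "mismatched": sorted(p for p in source.keys() & dest.keys() if source[p] != dest[p]),
    }

with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    # b.txt has the same size on both sides but different content.
    for root, files in ((src, {"a.txt": b"alpha", "b.txt": b"bravo", "c.txt": b"charlie"}),
                        (dst, {"a.txt": b"alpha", "b.txt": b"brav0"})):
        for name, data in files.items():
            with open(os.path.join(root, name), "wb") as f:
                f.write(data)
    report = compare(build_manifest(src), build_manifest(dst))
    print(report)
```

The `b.txt` case is the one a size comparison would miss: both copies are five bytes long, yet the hashes differ. Saving the two manifests alongside the migration records also covers the evidence requirement.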