Hard Links After Migration: Why Your Disk Usage Just Doubled
Most migration tools copy hard-linked files as separate files. Your 500GB dataset becomes 920GB on the destination, and nobody tells you why.
Migration finished. Source: 500GB. Destination: 920GB.
Same number of files. Same file sizes. Every checksum passes. But the destination is using almost twice the disk space. You run du three times because you don't believe it.
This is what happens when your migration tool doesn't understand hard links.
What hard links actually are
A hard link is two filenames pointing to the same data on disk. Not a copy. Not a symlink. The same bytes, with two (or more) directory entries referencing one inode.
# Create a file and a hard link to it
echo "important data" > original.txt
ln original.txt linked.txt
# Same inode, same data, one copy on disk
ls -li original.txt linked.txt
1048577 -rw-r--r-- 2 root root 15 Feb 13 10:00 original.txt
1048577 -rw-r--r-- 2 root root 15 Feb 13 10:00 linked.txt
Notice the inode number (1048577) is identical. The link count is 2. There's only one copy of "important data" on disk, but it has two names.
This isn't some obscure edge case. Hard links show up everywhere:
- Package managers (dpkg, rpm) use them to share identical files across versions
- Backup tools (rsnapshot, Time Machine, Borg) use them to deduplicate across snapshots
- Build systems that hardlink object files instead of copying them
- Media libraries where the same file appears in multiple organizational directories
Why migration breaks them
When a migration tool encounters original.txt and linked.txt, it sees two files. Both report a size of 15 bytes. Both have content to copy. So it copies the data twice.
Source (1 copy on disk):
original.txt → inode 1048577 → [data: 15 bytes]
linked.txt → inode 1048577 ↗
Destination (2 copies on disk):
original.txt → inode 2097153 → [data: 15 bytes]
linked.txt → inode 2097154 → [data: 15 bytes]
Every hard link becomes an independent file. The relationship between them is gone.
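You can reproduce this in a few seconds on any Linux box. A throwaway demo under /tmp (the paths and the per-file cp are purely illustrative, standing in for what a naive migration tool does internally):

```shell
# Fresh demo directory with one hard-linked pair on the "source"
rm -rf /tmp/hl-demo
mkdir -p /tmp/hl-demo/src /tmp/hl-demo/dst
echo "important data" > /tmp/hl-demo/src/original.txt
ln /tmp/hl-demo/src/original.txt /tmp/hl-demo/src/linked.txt

# A naive per-file copy writes the data twice and drops the link
cp /tmp/hl-demo/src/original.txt /tmp/hl-demo/dst/original.txt
cp /tmp/hl-demo/src/linked.txt /tmp/hl-demo/dst/linked.txt

# Source names share one inode; destination names each get their own
ls -i /tmp/hl-demo/src /tmp/hl-demo/dst
```

The source listing shows one inode number twice; the destination listing shows two different inode numbers, which is exactly the doubled disk usage in miniature.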
Disk usage multiplier
If a 1GB file has 4 hard links on the source, it uses 1GB of disk. After migration, it uses 4GB. The multiplier is the link count. Backup directories with heavy hard linking can easily 3x or 4x in size on the destination.
How much space are you losing?
It depends entirely on your link density. A vanilla file server with no hard links loses nothing. A server running rsnapshot with 30 daily snapshots and lots of unchanged files between them? Expect the destination to be 5x to 10x the actual disk usage of the source.
Quick way to estimate before you migrate:
# Count files with link count > 1 (hard-linked files)
find /source -type f -links +1 | wc -l
# Show the biggest hard-linked files (most space impact)
find /source -type f -links +1 -exec stat -c '%s %h %n' {} \; \
| sort -rn | head -20
The second column is the link count. A 500MB file with link count 12 means your tool will copy 6GB instead of 500MB.
How to check after migration
Compare actual disk usage between source and destination:
# Apparent size (sum of file sizes, counts linked files multiple times)
du -sh --apparent-size /source/path
du -sh --apparent-size /dest/path
# Actual disk usage (what's really on disk)
du -sh /source/path
du -sh /dest/path
On the source, if hard links are present, du -sh will be significantly smaller than du -sh --apparent-size. On the destination after a naive copy, they'll be nearly identical, because every link became a real file.
Source:
Apparent size: 920GB
Actual usage: 500GB → hard links save 420GB
Destination (after migration):
Apparent size: 920GB
Actual usage: 920GB → hard links are gone, all data duplicated
Quick sanity check
If du -sh and du -sh --apparent-size return the same number on your source, you don't have significant hard linking. If there's a big gap, that gap is exactly how much extra space your destination will consume after a naive migration.
What rsync -H does
rsync has a flag for this: -H (or --hard-links). It tracks inodes during the transfer and recreates hard link relationships on the destination.
rsync -aH /source/ /dest/
It works. But there's a cost.
To detect that two files share an inode, rsync must remember every multiply-linked inode it has seen during the entire transfer. For a snapshot-style dataset where 10 million files carry multiple links, that's 10 million inode entries held in memory. On large datasets, this can consume several gigabytes of RAM and significantly slow down the initial file-list building phase.
For small datasets, -H is a no-brainer. For datasets with tens of millions of files, you may need to plan around the memory overhead, or segment the transfer into smaller runs.
rsync -H memory usage
rsync stores inode-to-path mappings in memory for the entire run. At roughly 100 bytes per entry, 10 million hard-linked files requires about 1GB of RAM just for hard link tracking. 100 million files pushes that to 10GB. If rsync gets killed by the OOM killer mid-transfer, this is often why.
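You can ballpark this for your own dataset before committing to -H: count the multiply-linked files and apply the ~100-bytes-per-entry figure (an estimate, not an rsync-documented constant; `est_rsync_h_memory` is a made-up name):

```shell
# est_rsync_h_memory DIR: rough memory estimate for rsync -H hard link
# tracking, assuming ~100 bytes of state per multiply-linked file
est_rsync_h_memory() {
    entries=$(find "$1" -type f -links +1 | wc -l)
    echo "$entries hard-linked files -> ~$((entries * 100)) bytes of tracking state"
}
```

Run `est_rsync_h_memory /source` before the transfer; if the estimate rivals your available RAM, plan to segment the run.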
Cross-protocol makes it worse
Hard link preservation is a filesystem operation. You need the destination filesystem to support creating links, and your tool needs a way to instruct it.
On a local or NFS transfer, this is straightforward. The link() syscall works. But when SMB is involved, things fall apart. SMB has no native remote hard link creation in common implementations. You can't tell a remote SMB server "make this file a hard link to that other file" the way you can with a local filesystem call.
This means:
- NFS to NFS: hard link preservation is possible
- NFS to local: hard link preservation is possible
- Anything to SMB: hard link preservation requires workarounds or isnât supported
- SMB source: detecting hard links on the source may not be possible over the protocol
If your migration involves SMB on either end, assume hard links will be broken and plan for the additional disk usage.
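One cheap pre-flight check: try creating a hard link on the mounted destination and see whether it succeeds (`can_hardlink` is a made-up helper for this sketch; it cleans up after itself):

```shell
# can_hardlink DIR: return success if the filesystem at DIR supports
# creating hard links, failure otherwise; leaves no files behind
can_hardlink() {
    t="$1/.hl-probe.$$"
    echo probe > "$t" && ln "$t" "$t.link" 2>/dev/null
    ok=$?
    rm -f "$t" "$t.link"
    return $ok
}
```

Something like `can_hardlink /mnt/dest && echo "links supported"` (with /mnt/dest standing in for your real mount point) tells you up front whether preservation is even possible.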
The scan phase matters
The key insight is that hard link detection needs to happen during scanning, not during transfer. By the time youâre copying files, you need to already know which files share inodes so you can copy the data once and create links for the rest.
This means a two-phase approach:
- Scan: walk the source, record inode numbers, identify link groups
- Transfer: for each link group, copy the first file normally, then create hard links for the rest
Tools that stream files directly (walk and copy simultaneously) have a harder time with this, because they may encounter the second link before they've finished writing the first.
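The two phases above can be sketched in a few lines of shell, assuming GNU find and a same-filesystem source (`migrate_link_groups` is an illustrative name, not a real tool; it handles only the hard-linked files, leaving singly-linked files to the normal copy path, and isn't production-ready):

```shell
# migrate_link_groups SRC DST: copy each hard-link group's data once,
# then recreate the remaining names as hard links on the destination
migrate_link_groups() {
    src=$1; dst=$2; prev_inode=""; first_rel=""
    # Phase 1: scan - list multiply-linked files as "inode path",
    # sorted so that members of the same link group are adjacent
    find "$src" -type f -links +1 -printf '%i %p\n' | sort -n |
    # Phase 2: transfer - copy the first path per inode, link the rest
    while read -r inode path; do
        rel=${path#"$src"/}
        mkdir -p "$dst/$(dirname "$rel")"
        if [ "$inode" = "$prev_inode" ]; then
            ln "$dst/$first_rel" "$dst/$rel"   # same group: link, don't copy
        else
            cp "$path" "$dst/$rel"             # new group: copy the data once
            first_rel=$rel
        fi
        prev_inode=$inode
    done
}
```

The sort is what makes the streaming problem disappear: once link-group members are adjacent, "copy the first, link the rest" is a trivial one-pass loop.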
syncopio advantage
syncopio detects hard links during the scan phase by tracking inode numbers across the entire dataset. During transfer, the first file in each link group is copied normally. Every subsequent file sharing that inode becomes a hard link task instead of a copy task. No duplicate data written, no extra disk usage.
Before you migrate: the checklist
1. Check for hard links on the source
find /source -type f -links +1 | wc -l
If the count is zero, you're fine. If it's nonzero, keep going.
2. Measure the disk savings from hard links
echo "Apparent: $(du -sh --apparent-size /source | cut -f1)"
echo "Actual: $(du -sh /source | cut -f1)"
The gap between these numbers is what you'll lose if links aren't preserved.
3. Check your toolâs hard link support
| Tool | Hard link flag | Memory cost |
|---|---|---|
| rsync | -H | High (tracks all inodes in RAM) |
| cp | -a (GNU; preserves links within the copied tree) | Low |
| tar | Preserves by default | Medium |
| rclone | No support | N/A |
| Robocopy | No support | N/A |
4. Verify after migration
# Same check as before
du -sh /source
du -sh /dest
If the numbers match, links were preserved. If the destination is significantly larger, they werenât.