Hard Links After Migration: Why Your Disk Usage Just Doubled
Most migration tools copy hard-linked files as separate files. Your 500GB dataset becomes 920GB on the destination, and nobody tells you why.
Migration finished. Source: 500GB. Destination: 920GB.
Same number of files. Same file sizes. Every checksum passes. But the destination is using almost twice the disk space. You run du three times because you don't believe it.
This is what happens when your migration tool doesn't understand hard links.
What hard links actually are
A hard link is two filenames pointing to the same data on disk. Not a copy. Not a symlink. The same bytes, with two (or more) directory entries referencing one inode.
# Create a file and a hard link to it
echo "important data" > original.txt
ln original.txt linked.txt
# Same inode, same data, one copy on disk
ls -li original.txt linked.txt
1048577 -rw-r--r-- 2 root root 15 Feb 13 10:00 original.txt
1048577 -rw-r--r-- 2 root root 15 Feb 13 10:00 linked.txt
Notice the inode number (1048577) is identical. The link count is 2. There's only one copy of "important data" on disk, but it has two names.
This isn't some obscure edge case. Hard links show up everywhere:
- Package managers (dpkg, rpm) use them to share identical files across versions
- Backup tools (rsnapshot, Time Machine, Borg) use them to deduplicate across snapshots
- Build systems that hardlink object files instead of copying them
- Media libraries where the same file appears in multiple organizational directories
Why migration breaks them
When a migration tool encounters original.txt and linked.txt, it sees two files. Both report a size of 15 bytes. Both have content to copy. So it copies the data twice.
Source (1 copy on disk):
original.txt → inode 1048577 → [data: 15 bytes]
linked.txt → inode 1048577 ↗
Destination (2 copies on disk):
original.txt → inode 2097153 → [data: 15 bytes]
linked.txt → inode 2097154 → [data: 15 bytes]
Every hard link becomes an independent file. The relationship between them is gone.
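You can reproduce this in a few seconds on any Linux box. A throwaway demo under /tmp (the paths and the per-file cp are purely illustrative, standing in for what a naive migration tool does internally):

```shell
# Fresh demo directory with one hard-linked pair on the "source"
rm -rf /tmp/hl-demo
mkdir -p /tmp/hl-demo/src /tmp/hl-demo/dst
echo "important data" > /tmp/hl-demo/src/original.txt
ln /tmp/hl-demo/src/original.txt /tmp/hl-demo/src/linked.txt

# A naive per-file copy writes the data twice and drops the link
cp /tmp/hl-demo/src/original.txt /tmp/hl-demo/dst/original.txt
cp /tmp/hl-demo/src/linked.txt /tmp/hl-demo/dst/linked.txt

# Source names share one inode; destination names each get their own
ls -i /tmp/hl-demo/src /tmp/hl-demo/dst
```

The source listing shows one inode number twice; the destination listing shows two different inode numbers, which is exactly the doubled disk usage in miniature.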
Disk usage multiplier
If a 1GB file has 4 hard links on the source, it uses 1GB of disk. After migration, it uses 4GB. The multiplier is the link count. Backup directories with heavy hard linking can easily 3x or 4x in size on the destination.
How much space are you losing?
It depends entirely on your link density. A vanilla file server with no hard links loses nothing. A server running rsnapshot with 30 daily snapshots and lots of unchanged files between them? Expect the destination to be 5x to 10x the actual disk usage of the source.
Quick way to estimate before you migrate:
# Count files with link count > 1 (hard-linked files)
find /source -type f -links +1 | wc -l
# Show the biggest hard-linked files (most space impact)
find /source -type f -links +1 -exec stat -c '%s %h %n' {} \; \
| sort -rn | head -20
The second column is the link count. A 500MB file with link count 12 means your tool will copy 6GB instead of 500MB.
How to check after migration
Compare actual disk usage between source and destination:
# Apparent size (sum of file sizes, counts linked files multiple times)
du -sh --apparent-size /source/path
du -sh --apparent-size /dest/path
# Actual disk usage (what's really on disk)
du -sh /source/path
du -sh /dest/path
On the source, if hard links are present, du -sh will be significantly smaller than du -sh --apparent-size. On the destination after a naive copy, they'll be nearly identical, because every link became a real file.
Source:
Apparent size: 920GB
Actual usage: 500GB → hard links save 420GB
Destination (after migration):
Apparent size: 920GB
Actual usage: 920GB → hard links are gone, all data duplicated
Quick sanity check
If du -sh and du -sh --apparent-size return the same number on your source, you don't have significant hard linking. If there's a big gap, that gap is exactly how much extra space your destination will consume after a naive migration.
What rsync -H does
rsync has a flag for this: -H (or --hard-links). It tracks inodes during the transfer and recreates hard link relationships on the destination.
rsync -aH /source/ /dest/
It works. But there's a cost.
To detect that two files share an inode, rsync must remember every multiply-linked inode it has seen during the entire transfer. For a snapshot-style dataset where 10 million files carry multiple links, that's 10 million inode entries held in memory. On large datasets, this can consume several gigabytes of RAM and significantly slow down the initial file-list building phase.
For small datasets, -H is a no-brainer. For datasets with tens of millions of files, you may need to plan around the memory overhead, or segment the transfer into smaller runs.
rsync -H memory usage
rsync stores inode-to-path mappings in memory for the entire run. At roughly 100 bytes per entry, 10 million hard-linked files requires about 1GB of RAM just for hard link tracking. 100 million files pushes that to 10GB. If rsync gets killed by the OOM killer mid-transfer, this is often why.
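You can ballpark this for your own dataset before committing to -H: count the multiply-linked files and apply the ~100-bytes-per-entry figure (an estimate, not an rsync-documented constant; `est_rsync_h_memory` is a made-up name):

```shell
# est_rsync_h_memory DIR: rough memory estimate for rsync -H hard link
# tracking, assuming ~100 bytes of state per multiply-linked file
est_rsync_h_memory() {
    entries=$(find "$1" -type f -links +1 | wc -l)
    echo "$entries hard-linked files -> ~$((entries * 100)) bytes of tracking state"
}
```

Run `est_rsync_h_memory /source` before the transfer; if the estimate rivals your available RAM, plan to segment the run.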
Cross-protocol makes it worse
Hard link preservation is a filesystem operation. You need the destination filesystem to support creating links, and your tool needs a way to instruct it.
On a local or NFS transfer, this is straightforward. The link() syscall works. But when SMB is involved, things fall apart. SMB has no native remote hard link creation in common implementations. You can't tell a remote SMB server "make this file a hard link to that other file" the way you can with a local filesystem call.
This means:
- NFS to NFS: hard link preservation is possible
- NFS to local: hard link preservation is possible
- Anything to SMB: hard link preservation requires workarounds or isnât supported
- SMB source: detecting hard links on the source may not be possible over the protocol
If your migration involves SMB on either end, assume hard links will be broken and plan for the additional disk usage.
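One cheap pre-flight check: try creating a hard link on the mounted destination and see whether it succeeds (`can_hardlink` is a made-up helper for this sketch; it cleans up after itself):

```shell
# can_hardlink DIR: return success if the filesystem at DIR supports
# creating hard links, failure otherwise; leaves no files behind
can_hardlink() {
    t="$1/.hl-probe.$$"
    echo probe > "$t" && ln "$t" "$t.link" 2>/dev/null
    ok=$?
    rm -f "$t" "$t.link"
    return $ok
}
```

Something like `can_hardlink /mnt/dest && echo "links supported"` (with /mnt/dest standing in for your real mount point) tells you up front whether preservation is even possible.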
The scan phase matters
The key insight is that hard link detection needs to happen during scanning, not during transfer. By the time youâre copying files, you need to already know which files share inodes so you can copy the data once and create links for the rest.
This means a two-phase approach:
- Scan: walk the source, record inode numbers, identify link groups
- Transfer: for each link group, copy the first file normally, then create hard links for the rest
Tools that stream files directly (walk and copy simultaneously) have a harder time with this, because they may encounter the second link before they've finished writing the first.
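The two phases above can be sketched in a few lines of shell, assuming GNU find and a same-filesystem source (`migrate_link_groups` is an illustrative name, not a real tool; it handles only the hard-linked files, leaving singly-linked files to the normal copy path, and isn't production-ready):

```shell
# migrate_link_groups SRC DST: copy each hard-link group's data once,
# then recreate the remaining names as hard links on the destination
migrate_link_groups() {
    src=$1; dst=$2; prev_inode=""; first_rel=""
    # Phase 1: scan - list multiply-linked files as "inode path",
    # sorted so that members of the same link group are adjacent
    find "$src" -type f -links +1 -printf '%i %p\n' | sort -n |
    # Phase 2: transfer - copy the first path per inode, link the rest
    while read -r inode path; do
        rel=${path#"$src"/}
        mkdir -p "$dst/$(dirname "$rel")"
        if [ "$inode" = "$prev_inode" ]; then
            ln "$dst/$first_rel" "$dst/$rel"   # same group: link, don't copy
        else
            cp "$path" "$dst/$rel"             # new group: copy the data once
            first_rel=$rel
        fi
        prev_inode=$inode
    done
}
```

The sort is what makes the streaming problem disappear: once link-group members are adjacent, "copy the first, link the rest" is a trivial one-pass loop.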
syncopio advantage
syncopio detects hard links during the scan phase by tracking inode numbers across the entire dataset. During transfer, the first file in each link group is copied normally. Every subsequent file sharing that inode becomes a hard link task instead of a copy task. No duplicate data written, no extra disk usage.
Before you migrate: the checklist
1. Check for hard links on the source
find /source -type f -links +1 | wc -l
If the count is zero, you're fine. If it's nonzero, keep going.
2. Measure the disk savings from hard links
echo "Apparent: $(du -sh --apparent-size /source | cut -f1)"
echo "Actual: $(du -sh /source | cut -f1)"
The gap between these numbers is what you'll lose if links aren't preserved.
3. Check your toolâs hard link support
| Tool | Hard link flag | Memory cost |
|---|---|---|
| rsync | -H | High (tracks all inodes in RAM) |
| cp | -a (GNU; preserves links within the copied tree) | Low |
| tar | Preserves by default | Medium |
| rclone | No support | N/A |
| Robocopy | No support | N/A |
4. Verify after migration
# Same check as before
du -sh /source
du -sh /dest
If the numbers match, links were preserved. If the destination is significantly larger, they werenât.