It’s fast on small datasets, but when you move terabytes across complex network topologies, every pain point shows. Slow sync on millions of tiny files. CPU spikes from checksum calculations. I/O wait times that stack like bricks. Round-trip overhead that balloons on high-latency links. Rsync’s delta-transfer algorithm helps, but not enough when metadata overhead dominates.
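For context, the core of delta-transfer is a weak rolling checksum that can slide a window one byte in O(1), so the receiver can find matching blocks at any offset without rehashing from scratch. Here's a minimal sketch of that idea — the constants and function names are illustrative, not rsync's actual implementation:

```python
# Toy rolling weak checksum in the spirit of rsync's delta-transfer.
# (Illustrative only; rsync pairs a weak rolling sum with a strong hash.)
M = 1 << 16

def weak_checksum(block):
    # a: plain byte sum; b: position-weighted sum over the window.
    a = sum(block) % M
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, block_len):
    # Slide the window one byte right in O(1): drop out_byte, add in_byte.
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b

data = b"hello world, hello rsync"
block_len = 8
a, b = weak_checksum(data[0:block_len])
for i in range(1, len(data) - block_len + 1):
    a, b = roll(a, b, data[i - 1], data[i - 1 + block_len], block_len)
    # The O(1) rolled value matches a full recomputation at every offset.
    assert (a, b) == weak_checksum(data[i:i + block_len])
```

The catch: this machinery only pays off when file *contents* dominate the work. When the tree is millions of tiny files, the cost shifts to metadata, which delta-transfer can't help with.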
The most common rsync pain points?
- Performance degradation with large file counts. Even if total size is modest, per-file filesystem calls kill throughput.
- Checksum computation overhead under heavy load. Rsync reads and hashes entire files, slowing sync cycles.
- Torn updates when other processes write during sync — rsync doesn't snapshot the source, so the destination can end up internally inconsistent.
- Sparse error handling on flaky connections, requiring manual retries.
- Limited parallelism by design—rsync runs single-threaded, leaving CPU cores idle.
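The first bullet is easy to demonstrate: building a file list means at least one stat() per entry, so syscall volume scales with file count, not bytes. A toy illustration (the walk below is a stand-in for what any file-list builder must do, not rsync's code):

```python
# Why file count, not total size, drives metadata overhead: every entry
# needs a stat() for size/mtime, so 1000 one-byte files cost 1000 syscalls.
import os
import tempfile

def count_stat_calls(root):
    """Walk a tree the way a file-list builder must, counting stat()s."""
    calls = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames + dirnames:
            os.stat(os.path.join(dirpath, name))  # size/mtime check
            calls += 1
    return calls

with tempfile.TemporaryDirectory() as root:
    for i in range(1000):
        with open(os.path.join(root, f"f{i:04d}"), "w") as fh:
            fh.write("x")  # 1-byte files: tiny data, large metadata cost
    print(count_stat_calls(root))  # prints 1000 — for ~1 KB of payload
```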
The fixes are partial. Increasing --block-size can speed large-file transfers but makes deltas coarser, hurting efficiency on small edits. --whole-file skips delta computation entirely, trading bandwidth for CPU — a win on fast local networks, a loss over slow links. Wrapping rsync in GNU Parallel or custom scripts adds concurrency, but complexity rises fast.
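The wrapper approach usually means sharding the tree and running one rsync per shard. A hedged sketch of what such a script looks like — the paths, remote target, and directory names here are hypothetical placeholders:

```python
# Sketch of the "wrap rsync for concurrency" workaround: one rsync per
# top-level directory, run by a small worker pool. Placeholder paths.
import os
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor

def build_rsync_cmds(src_root, dest, dirs):
    # One rsync invocation per shard; trailing "/" copies dir contents.
    return [
        ["rsync", "-a", "--partial",
         os.path.join(src_root, d) + "/", f"{dest}/{d}/"]
        for d in sorted(dirs)
    ]

def run_parallel(cmds, workers=4):
    # Threads are enough: each worker just blocks on a child process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: subprocess.run(c).returncode, cmds))

cmds = build_rsync_cmds("/data", "backup:/data", ["logs", "images"])
for c in cmds:
    print(shlex.join(c))
# run_parallel(cmds)  # uncomment on a host with rsync and SSH access
```

The catch the paragraph above alludes to: once you shard, you own retry logic, per-shard error reporting, and load balancing across unevenly sized directories — complexity rsync itself never had to expose.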