rsync vs cp vs sbt.IO

May 26, 2017

Local file synchronization performance test.

I want to do a one-way sync: copy files from a source directory to a destination directory. That raises a question: which readily available tool should I use to keep such mapped directories in sync?

There are several convenient options:

  • sbt.IO
  • rsync -a
  • cp -au
  • commons-io (not tested, since it introduces an extra dependency)

According to its documentation, sbt.IO.copyDirectory provides a fairly smart way to sync, and modern versions of rsync seem adept at it as well.
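To make the update-only idea concrete, here is a minimal Scala sketch using only the JDK. It copies a file when the destination copy is missing or older than the source, which is the rule GNU cp documents for -u. This is an illustration under that assumption, not how sbt.IO, rsync or cp is actually implemented; UpdateSync, needsCopy and sync are hypothetical names. (rsync's default "quick check" compares size as well as modification time, which this sketch ignores.)

```scala
import java.io.File
import java.nio.file.{Files, StandardCopyOption}

// Hypothetical helper illustrating the "copy only if missing or older" rule;
// it is not part of sbt.IO, rsync or cp.
object UpdateSync {
  // True when `target` is missing or older than `source` (the cp -u rule).
  def needsCopy(source: File, target: File): Boolean =
    !target.exists() || source.lastModified() > target.lastModified()

  // Recursively copies the contents of `srcDir` into `dstDir`, skipping files
  // whose copy is already up to date. Returns the number of files copied.
  // Symlinks and deletions in the source are not handled.
  def sync(srcDir: File, dstDir: File): Int = {
    var copied = 0
    def walk(src: File, dst: File): Unit =
      if (src.isDirectory) {
        dst.mkdirs()
        val children = Option(src.listFiles()).getOrElse(Array.empty[File])
        children.foreach(child => walk(child, new File(dst, child.getName)))
      } else if (needsCopy(src, dst)) {
        // COPY_ATTRIBUTES keeps the source's modification time on the copy,
        // so an unchanged file is skipped on the next run.
        Files.copy(src.toPath, dst.toPath,
          StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.COPY_ATTRIBUTES)
        copied += 1
      }
    walk(srcDir, dstDir)
    copied
  }
}
```

Calling UpdateSync.sync(src, dst) twice in a row should copy everything the first time and nothing the second, which is what makes frequent syncs of a rarely changing directory cheap.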

The results should also be useful in a more general context.

Test scheme:

Generate four file trees as sources, simulating the folder a user wants to sync:

Name     Total files   Max file size   Max dir depth
small    100           10 KB           3
medium   1000          100 KB          6
big      10000         100 KB          12
big2     1000          1000 KB         12

(Sample Tree layout)
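A sketch of how such a tree could be generated, assuming "max dir depth" means the number of sub-directory levels below the root and that file sizes are uniform between 1 byte and the maximum. TreeGen.generate and the d0/d1/d2 directory naming are hypothetical; the author's actual generator is in the test source code.

```scala
import java.io.File
import java.nio.file.Files
import scala.util.Random

// Hypothetical generator for a random source tree; not the author's code.
object TreeGen {
  // Creates `totalFiles` files under `root`. Each file goes into a random
  // directory 0..maxDepth levels below `root` (3 possible names per level,
  // so directories are shared) and holds 1..maxSizeBytes random bytes.
  def generate(root: File, totalFiles: Int, maxSizeBytes: Int,
               maxDepth: Int, rnd: Random): Seq[File] =
    (1 to totalFiles).map { i =>
      val depth = rnd.nextInt(maxDepth + 1)
      val dir = (1 to depth).foldLeft(root) { (parent, _) =>
        new File(parent, "d" + rnd.nextInt(3))
      }
      dir.mkdirs()
      val bytes = new Array[Byte](1 + rnd.nextInt(maxSizeBytes))
      rnd.nextBytes(bytes)
      val file = new File(dir, s"f$i.bin")
      Files.write(file.toPath, bytes)
      file
    }
}
```

For example, the "small" tree would be TreeGen.generate(root, 100, 10 * 1024, 3, new Random(42)), taking 1 KB as 1024 bytes. A fixed seed makes the tree reproducible across tool combinations.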

Then simulate updates to the source tree, run the synchronization, and measure how long the synchronization takes.

In my scenario, syncs happen often while the source directory changes rarely, so before each sync only minor modifications are made to the source directory: 5% extra files are added and 5% of the existing files are modified (appended to).
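The modification step could look roughly like this. Mutate.mutate is a hypothetical helper; the 16-byte payload and the placement of new files next to randomly chosen existing ones are assumptions, since the post only specifies the 5% ratios.

```scala
import java.io.File
import java.nio.file.{Files, StandardOpenOption}
import scala.util.Random

// Hypothetical mutation step: only the 5% ratios come from the post;
// payload size and placement of new files are assumptions.
object Mutate {
  // All regular files under `dir`, recursively.
  def listFiles(dir: File): Seq[File] = {
    val children = Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty[File])
    children.flatMap(c => if (c.isDirectory) listFiles(c) else Seq(c))
  }

  private def payload(rnd: Random): Array[Byte] =
    Array.fill[Byte](16)(rnd.nextInt(256).toByte)

  // Appends 16 random bytes to `ratio` of the existing files and creates the
  // same number of new files next to randomly chosen existing ones.
  // Returns (filesAppended, filesAdded).
  def mutate(root: File, ratio: Double, rnd: Random): (Int, Int) = {
    val existing = listFiles(root)
    require(existing.nonEmpty, "source tree is empty")
    val n = math.max(1, math.round(existing.size * ratio).toInt)
    val appended = rnd.shuffle(existing).take(n)
    appended.foreach(f => Files.write(f.toPath, payload(rnd), StandardOpenOption.APPEND))
    val added = (1 to n).map { i =>
      val dir = existing(rnd.nextInt(existing.size)).getParentFile
      val file = new File(dir, s"new-${System.nanoTime()}-$i.bin")
      Files.write(file.toPath, payload(rnd))
      file
    }
    (appended.size, added.size)
  }
}
```

Because the added files are counted against the current tree size, calling this once per round makes the source grow a little every round.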

The modifications are random, so the load they put on each sync varies. To smooth this out, each test runs up to 10 rounds. Each combination starts from a fresh source directory, which grows after every round.
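The round structure can be expressed as a small harness that times only the sync, not the mutation. This is a sketch: Rounds.run is hypothetical, and summing the per-round times into the "total time cost" shown in the results is an assumption about how the totals were computed.

```scala
// Hypothetical harness: each round mutates the source (untimed), then
// runs the sync (timed). Returns per-round sync times in milliseconds.
object Rounds {
  def run(rounds: Int)(mutate: Int => Unit)(sync: () => Unit): Seq[Long] =
    (1 to rounds).map { round =>
      mutate(round)
      val start = System.nanoTime()
      sync()
      (System.nanoTime() - start) / 1000000L
    }
}
```

A total for one combination would then be Rounds.run(10)(...)(...).sum, with the mutation and sync steps passed in as functions.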

The test is a Scala app, and the shell commands are executed through sbt.Process. This may not be entirely fair to rsync and cp, since process start-up overhead counts toward their times, but it is what my scenario looks like.
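For a standalone sketch, the Scala standard library's scala.sys.process package offers a process API comparable to sbt.Process. The code below times one external command. The argument layout (src/ for rsync, src/. for cp, so both copy the directory's contents) is my assumption, not taken from the author's code, and Timing is a hypothetical name.

```scala
import scala.sys.process._

// Hypothetical helpers; the exact command lines used in the original test
// are not shown in the post.
object Timing {
  // Evaluates `body` once and returns (result, elapsed milliseconds).
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  // Command lines that copy the *contents* of `src` into `dst`:
  // rsync needs the trailing "/" on src, cp the trailing "/.".
  def command(tool: String, src: String, dst: String): Seq[String] = tool match {
    case "rsync" => Seq("rsync", "-a", src + "/", dst)
    case "cp"    => Seq("cp", "-au", src + "/.", dst)
    case other   => throw new IllegalArgumentException(s"unknown tool: $other")
  }

  // Runs an external command and returns (exit code, elapsed milliseconds).
  def runTimed(cmd: Seq[String]): (Int, Long) = timed(Process(cmd).!)
}
```

Passing the arguments as a Seq avoids shell quoting issues; the measured time includes launching the process, which is the overhead mentioned above.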

Each run starts only one combination, e.g. sbt.IO + RAM or rsync + SSD.

Do it yourself (you need enough RAM): see the test source code.

Environment:

  • My dev PC: Xeon E3-1230 v2 (3.3-3.7 GHz), 32 GB DDR3-1600, Ubuntu 16.04

  • sbt 0.13.15 / JDK 1.8 / Scala 2.12

  • rsync 3.1.1

  • cp (GNU coreutils) 8.25

Test results:

  1. Source in RAM, total time cost (ms):

Tool       small   medium   big    big2
sbt.IO     20      273      2906   507
rsync -a   454     751      3394   2444
cp -au     35      209      1712   636

  2. Source in RAM, no modification to the source dir, total time cost (ms):

Tool       small   medium   big    big2
sbt.IO     20      225      2552   304
rsync -a   453     539      1441   625
cp -au     18      115      1354   133

  3. Source on SSD, total time cost (ms):

Tool       small   medium
sbt.IO     23      280
rsync -a   455     812
cp -au     22      185

  4. Source on SSD, no modification to the source dir, total time cost (ms):

Tool       small   medium
sbt.IO     21      252
rsync -a   453     615
cp -au     18      106
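One quick way to compare the tools within each table is to sum a tool's times over the tree sizes and take the smallest total. The numbers below are copied from the tables; summing is just one possible summary (it weights the big trees most), not necessarily how the decision below was made, and Summary is a hypothetical name.

```scala
// Times in milliseconds, copied from the four result tables
// (columns in table order: small, medium, big, big2; SSD has small, medium only).
object Summary {
  val scenarios: Map[String, Map[String, Seq[Int]]] = Map(
    "RAM" -> Map(
      "sbt.IO"   -> Seq(20, 273, 2906, 507),
      "rsync -a" -> Seq(454, 751, 3394, 2444),
      "cp -au"   -> Seq(35, 209, 1712, 636)),
    "RAM, unmodified" -> Map(
      "sbt.IO"   -> Seq(20, 225, 2552, 304),
      "rsync -a" -> Seq(453, 539, 1441, 625),
      "cp -au"   -> Seq(18, 115, 1354, 133)),
    "SSD" -> Map(
      "sbt.IO"   -> Seq(23, 280),
      "rsync -a" -> Seq(455, 812),
      "cp -au"   -> Seq(22, 185)),
    "SSD, unmodified" -> Map(
      "sbt.IO"   -> Seq(21, 252),
      "rsync -a" -> Seq(453, 615),
      "cp -au"   -> Seq(18, 106)))

  // The tool with the smallest summed time in one scenario, with that sum.
  def fastest(times: Map[String, Seq[Int]]): (String, Int) =
    times.map { case (tool, ts) => (tool, ts.sum) }.minBy(_._2)
}
```

With these numbers cp -au has the smallest total in all four scenarios (for the source in RAM: 2592 ms, versus 3706 ms for sbt.IO and 7043 ms for rsync -a), although sbt.IO is faster on the small and big2 trees in the first table.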

I chose cp -au to do the sync.