Transferring large amounts of data over the network: scp, tar | ssh, and tar | nc compared
Scp is slow, and that's a known fact. It is annoying enough that someone tried to fix it, producing the hpn-ssh patch:
SCP and the underlying SSH2 protocol implementation in OpenSSH is network performance limited by statically defined internal flow control buffers. These buffers often end up acting as a bottleneck for network throughput of SCP, especially on long and high bandwidth network links.
Nonetheless, especially for small transfers, scp is straightforward, so that's what I use. But transferring 100GB of data between two machines on the same LAN proved to be such a pain that I decided to opt for one of the alternatives, the two most common being tar over ssh and tar over netcat. The whole thing got me curious, so I decided to do some testing/benchmarking.
This is no scientific test. There was background noise, the two boxes ran
different OSes, and more. But it's good enough as a real-life test between two boxes
on the same LAN.
Test bed
Two boxes, referred to as hostA and hostB from now on, with the same specs:
vendor_id : AuthenticAMD
model name : AMD Sempron(tm) Processor 2800+
cpu MHz : 1600.010
MemTotal : 2009992 kB
SATA disks : Timing cached reads: 1243.04 MB/sec
Timing buffered disk reads: 57.97 MB/sec
Network : VIA Technologies, Inc. VT6102 [Rhine-II] (rev 78)
Switch : Netgear 10/100 Mb/s
The boxes were connected via the 10/100 Mb/s switch and lived on the same LAN/subnet.
Given the above setup, it's safe to assume that the
network is the bottleneck, with its theoretical peak transfer rate of about 12MB/s.
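As a quick sanity check on that figure, the conversion from link speed to peak byte rate can be sketched as below. The function name `link_peak_mbytes` is made up for this example, and the result ignores Ethernet, IP and TCP overhead, so real transfers will top out somewhat lower.

```shell
# Convert a link speed in Mb/s to a theoretical peak in MB/s (8 bits per byte).
# link_peak_mbytes is a hypothetical helper name, not a standard tool.
link_peak_mbytes() {
    awk -v mbits="$1" 'BEGIN { printf "%.1f\n", mbits / 8 }'
}

link_peak_mbytes 100   # Fast Ethernet: prints 12.5
```

Since the disks read at roughly 58 MB/s, well above this number, the network remains the limiting factor.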
Test cases and data set
I've created two directories, one containing 2000 files of 100KB each and the other 200 files of 10MB each. All files were created using dd if=/dev/urandom of=file. These are the commands I've compared:
hostA: scp -r dir user@hostB:/tmp/
hostA: tar cf - dir | ssh user@hostB tar xf - -C /tmp/
hostA: tar cf - dir | nc -w1 hostB 6969
hostB: nc -l -p 6969 | tar xf - -C /tmp/
I've also run a set of tests using ssh compression and tar gzip compression. Note that bzip2 compression is generally too CPU-expensive to be worth it.
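The data set described above could be generated along these lines. This is only a sketch: the post gives the command just as `dd if=/dev/urandom of=file`, so the `bs` and `count` values, the `make_dataset` helper, the directory names `small` and `large`, and the reading of 1KB as 1024 bytes are all assumptions made for this example.

```shell
#!/bin/sh
# make_dataset DIR COUNT SIZE_KB
# Fill DIR with COUNT files, each holding SIZE_KB kilobytes of random data.
# (Hypothetical helper; sizes assume 1KB = 1024 bytes.)
make_dataset() {
    dir=$1
    count=$2
    size_kb=$3
    mkdir -p "$dir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # bs and count are assumed values, not taken from the original command
        dd if=/dev/urandom of="$dir/file$i" bs=1024 count="$size_kb" 2>/dev/null || return 1
        i=$((i + 1))
    done
}

# Full-size data set (about 2.2GB in total), left commented out here:
# make_dataset small 2000 100      # 2000 files of 100KB
# make_dataset large 200 10240     # 200 files of 10MB
```

Keeping the sizes as parameters makes it easy to try a scaled-down run first before writing a couple of gigabytes of random data.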
Results
| Command | Compression | Fileset | Time |
| --- | --- | --- | --- |
| scp | No | Small | 0:01:53 |
| scp | No | Large | 0:10:10 |
| scp | Yes ssh | Small | 0:02:46 |
| scp | Yes ssh | Large | 0:14:11 |
| tar \| ssh | No | Small | 0:00:24 |
| tar \| ssh | No | Large | 0:03:18 |
| tar \| ssh | Yes ssh | Small | 0:01:09 |
| tar \| ssh | Yes ssh | Large | 0:11:33 |
| tar \| ssh | Yes tar gz | Small | 0:00:18 |
| tar \| ssh | Yes tar gz | Large | 0:01:57 |
| tar \| nc | No | Small | 0:00:21 |
| tar \| nc | No | Large | 0:03:24 |
| tar \| nc | Yes tar gz | Small | 0:00:20 |
| tar \| nc | Yes tar gz | Large | 0:01:16 |
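For reference, the compressed variants in the Compression column presumably correspond to commands like the ones sketched below. The exact invocations are not shown in this post, so treat this as an assumption: `-C` is the standard OpenSSH switch that enables compression in both scp and ssh, and `z` makes tar filter the archive through gzip. The function names are made up for the example.

```shell
# Each sender takes the directory to send and the destination, e.g.
#   scp_compressed dir user@hostB
# Nothing here starts a transfer until one of the functions is called.

# "Yes ssh" rows: let ssh compress the stream.
scp_compressed()     { scp -C -r "$1" "$2":/tmp/; }
tar_ssh_compressed() { tar cf - "$1" | ssh -C "$2" 'tar xf - -C /tmp/'; }

# "Yes tar gz" rows: compress inside tar and leave the transport alone.
tar_gz_ssh() { tar czf - "$1" | ssh "$2" 'tar xzf - -C /tmp/'; }
tar_gz_nc()  { tar czf - "$1" | nc -w1 "$2" 6969; }   # $2 is a bare host name here

# Receiver for tar_gz_nc; start it on hostB before sending.
# (nc -l -p is the traditional netcat syntax used in this post.)
tar_gz_nc_receive() { nc -l -p 6969 | tar xzf - -C /tmp/; }
```

In bash, prefixing a call with the `time` keyword, as in `time tar_gz_ssh dir user@hostB`, is one way to collect numbers like those in the table.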
This is a summary with totals for the entire dataset transfer, with times in seconds:
| Command | Compression | Time (s) |
| --- | --- | --- |
| scp | No | 723 |
| scp | Yes ssh | 1017 |
| tar \| ssh | No | 222 |
| tar \| ssh | Yes ssh | 762 |
| tar \| ssh | Yes tar gz | 135 |
| tar \| nc | No | 225 |
| tar \| nc | Yes tar gz | 96 |
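To put the totals in perspective, they can be turned into effective throughput. The sketch below converts an h:mm:ss time to seconds (so the per-fileset rows can be checked against the totals) and divides the nominal data set size by each total. The helper names are made up for the example, and the size assumes 1KB = 1024 bytes and 1MB = 1024KB, which is not specified in the test description.

```shell
# Convert h:mm:ss to seconds (awk avoids the shell's octal trap on "08", "09").
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Nominal data set size in MB: 2000 files x 100KB + 200 files x 10MB.
dataset_mb() {
    awk 'BEGIN { printf "%.1f\n", 2000 * 100 / 1024 + 200 * 10 }'
}

# Effective throughput in MB/s for a total transfer time given in seconds.
throughput() {
    awk -v mb="$(dataset_mb)" -v secs="$1" 'BEGIN { printf "%.1f\n", mb / secs }'
}

# Totals copied from the summary table.
while read -r secs label; do
    printf '%-32s %5s s %6s MB/s\n' "$label" "$secs" "$(throughput "$secs")"
done <<'EOF'
723 scp
1017 scp (ssh compression)
222 tar | ssh
762 tar | ssh (ssh compression)
135 tar | ssh (tar gz)
225 tar | nc
96 tar | nc (tar gz)
EOF
```

Under these assumptions plain scp works out to roughly 3 MB/s, while the uncompressed tar pipelines land just under 10 MB/s, close to what a 100 Mb/s link can carry. The tar gz rows come out above the link's theoretical peak, which would mean that fewer bytes crossed the wire than the nominal data set size.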