September 15, 2008

Modular tar pipe tar in Perl

I was trying to write a nice tar pipe tar system in a Perl script, and ended up with this, which I think can be useful:



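#!/usr/bin/perl

use strict;
use warnings;

use Archive::Tar;
use Net::SSH;

my $tar = Archive::Tar->new;

my $dir = "/path/to/dir";
# Add the entries of $dir without recursion (this glob skips dot files)
my @files = glob("$dir/*");
$tar->add_files(@files);

my $user = "root";
my $host = "myhost.example.org";
# Unpack on the remote side, relative to /
my $cmd  = "cd / && tar xf -";

# sshopen2 starts ssh and returns the PID of the child process
my $pid = Net::SSH::sshopen2("$user\@$host", *READER, *WRITER, $cmd) || die "ssh: $!";

# Called without arguments, write() returns the archive as a string
binmode(WRITER);
print WRITER $tar->write;
close(WRITER);
close(READER);

# Reap the ssh child so it does not linger as a zombie
waitpid($pid, 0);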





If you know of a nicer way to do it, I'm open to ideas :)

5 comments:

peck said...

With "$dir/*" you miss the dot files.

Anyway, isn't this called the rsync command?

Raphaël said...

You're entirely right about the dot files... This is why I eventually wrote it as


my $tar = Archive::Tar->new;
my @files = File::Find::Rule->maxdepth(1)
->in( $dir );
$tar->add_files(@files);
...
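
For readers without File::Find::Rule installed, here is a minimal sketch of the same idea using only core Perl. The helper name top_level_entries is made up for this example; it lists what sits directly inside a directory, dot files included, and leaves out the special "." and ".." entries.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not from the original post): return every entry
# directly inside $dir, dot files included, without recursing.
sub top_level_entries {
    my ($dir) = @_;

    opendir(my $dh, $dir) or die "opendir $dir: $!";
    # readdir also returns "." and "..", which we never want to archive
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);

    # Prefix each name with the directory so the paths can be used as-is
    return map { File::Spec->catfile($dir, $_) } sort @names;
}
```

The result could then be handed to $tar->add_files(...) in place of the glob.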


As for the rsync command... no, it's not the same, unless you can tell me how to make rsync create the parent directories on the remote side. I spent quite some time searching, and nobody could tell me. Rsync is able to recurse into child directories, but not to create parent ones, whereas tar does it perfectly.

peck said...

rsync "$dir" "$user@$host:`dirname $dir`"

But be careful with the trailing /: $dir must not end with one.

Tobi said...

Isn't Perl a bit overkill for this? How about a Bash one-liner:

(cd /path/to/dir && tar -cf - *) | ssh myhost.example.org "cd / && tar -xf -"

Raphaël said...

Hi Tobi,

Yes, Perl is obviously overkill for this, and even in Perl, it would be faster to call this same one-liner with system.
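
A rough sketch of what calling such a one-liner through system might look like. The helper name run_shell is hypothetical, and the host and paths in the commented-out call are just the placeholders used in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: run a shell command line and return its exit code.
sub run_shell {
    my ($cmd) = @_;

    my $rc = system('/bin/sh', '-c', $cmd);
    die "could not start /bin/sh: $!" if $rc == -1;
    die sprintf("command killed by signal %d", $? & 127) if $? & 127;

    # The exit code lives in the high byte of $?
    return $? >> 8;
}

# Example call, left commented out because it needs a reachable host:
# my $code = run_shell('(cd /path/to/dir && tar -cf - *) | ssh myhost.example.org "cd / && tar -xf -"');
# die "tar pipe failed with exit code $code" if $code != 0;
```

One thing to keep in mind with this approach: a plain sh pipeline reports the exit status of its last command only, so a failure of the local tar would go unnoticed here.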

Now the reason I came to this Perl code is that this is only a part of a bigger program (whose goal is to replicate CVS repositories synchronously), and the rest of the program justifies writing it in Perl (it uses a DB, SFTP to write to a remote file over SSH, and other stuff that would be tricky in bash).

Also, I am trying to handle errors in this code. It is not very easy because Net::SSH uses IPC::Open2 or IPC::Open3. IPC::Open2 works fine but doesn't let you handle errors, whereas IPC::Open3 lets you handle the stderr of the command, but tends to leak processes. If anyone is familiar with this issue, I'd be happy to know how to deal with this.
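
One possible way to approach this, as a sketch rather than a tested fix for the program described above: call IPC::Open3 directly and always waitpid on the child, since open3 does not reap it for you and unreaped children are what pile up as leftover processes. The helper name feed_command is made up; it sends the child's stderr to a temp file so the parent cannot block on it, and it discards the child's stdout, which is an assumption that suits a quiet "tar xf -".

```perl
use strict;
use warnings;
use IPC::Open3 qw(open3);
use File::Temp qw(tempfile);

# Hypothetical helper: write $data to the stdin of @cmd, then return the
# command's exit code and everything it printed on stderr.
sub feed_command {
    my ($data, @cmd) = @_;

    # The child writes stderr straight into a temp file and stdout into
    # /dev/null, so the parent never has to read from a pipe while writing.
    my ($err_fh, $err_file) = tempfile(UNLINK => 1);
    open(my $null, '>', '/dev/null') or die "open /dev/null: $!";

    my $pid = open3(my $in, '>&' . fileno($null), '>&' . fileno($err_fh), @cmd);

    # If the child dies early, a write should fail quietly, not kill us
    local $SIG{PIPE} = 'IGNORE';
    binmode($in);
    print {$in} $data;
    close($in);    # EOF tells the child there is no more input

    # Reaping the child is the step that prevents leftover processes
    waitpid($pid, 0);
    my $exit_code = $? >> 8;

    open(my $read_err, '<', $err_file) or die "open $err_file: $!";
    my $stderr = do { local $/; <$read_err> };
    close($read_err);
    $stderr = '' unless defined $stderr;

    return ($exit_code, $stderr);
}
```

With something like this, the archive could be sent with feed_command($tar->write, 'ssh', "$user\@$host", $cmd), and a non-zero exit code or a non-empty stderr would signal a problem on the remote side.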


Peck > this rsync command won't create the parent directories on the remote side if they don't exist yet. E.g. 'rsync "/path/to/my/dir" "$user@$host:`dirname /path/to/my/dir`"' will fail if /path/to does not exist yet on the remote machine. It simply yields a mkdir "No such file or directory" error. tar|tar works like a charm in this case though.

Another reason for choosing tar|tar over rsync is that I have to copy tons of little files and I really don't mind overwriting them completely. It is faster for me to just tar all the files, transfer them and untar, than to compare every file (by size and timestamp, or by checksum) and transfer only the differences. Overall, I get pretty good performance with tar|tar :)
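
To illustrate the parent directory point, here is a small local sketch using Archive::Tar on both ends instead of a remote tar. The helper name extract_under and the sample path are made up for the example. The archive holds a single file with a nested path, and extracting it into an empty directory creates the missing parents along the way.

```perl
use strict;
use warnings;
use Archive::Tar;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Hypothetical helper: unpack an in-memory archive below $dest.
sub extract_under {
    my ($tar, $dest) = @_;

    my $old = getcwd();
    chdir($dest) or die "chdir $dest: $!";
    # extract() writes relative to the current directory and creates
    # any parent directories that do not exist yet
    my $count = $tar->extract;
    chdir($old) or die "chdir $old: $!";

    return $count;
}

my $tar = Archive::Tar->new;
$tar->add_data('path/to/my/dir/file.txt', "hello\n");

my $dest = tempdir(CLEANUP => 1);    # starts out empty
extract_under($tar, $dest);
print "parents were created\n" if -f "$dest/path/to/my/dir/file.txt";
```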