Tool: pv / pipe viewer

How to use pv?

Ok, let’s start with some really easy examples and progress to more complicated ones.

Suppose you have a file “access.log” that is a few gigabytes in size and contains web logs. You want to compress it into a smaller file, let’s say a gzip archive (.gz). The obvious way would be:

$ gzip -c access.log > access.log.gz

As the file is so huge (several gigabytes), you have no idea how long to wait. Will it finish soon? Or will it take another 30 mins?

With pv you can see how far along the operation is and get an estimate of how long it will take. Here is the same operation done through pv:

$ pv access.log | gzip > access.log.gz
611MB 0:00:11 [58.3MB/s] [=>      ] 15% ETA 0:00:59

Pipe viewer acts as “cat” here, except it also adds a progress bar. We can see that 611MB of data has been fed to gzip in 11 seconds. That is 15% of all the data, and pv estimates that it will take 59 more seconds to finish.
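As a rough sanity check, the ETA in that sample line can be reproduced by hand. The sketch below is only back-of-the-envelope arithmetic, not pv’s actual algorithm. The function name eta_seconds is made up for illustration, and it assumes the displayed percentage and rate are exact.

```shell
# Rough ETA estimate from the numbers pv prints.
# Usage: eta_seconds DONE_MB PERCENT RATE_MB_PER_S
eta_seconds() {
    awk -v done_mb="$1" -v pct="$2" -v rate="$3" 'BEGIN {
        total = done_mb / (pct / 100)      # estimated total size in MB
        left  = total - done_mb            # MB still to be processed
        printf "%.0f\n", left / rate       # seconds remaining at the current rate
    }'
}

eta_seconds 611 15 58.3   # prints 59
```

The result matches the “ETA 0:00:59” in the sample output, which suggests the figures on the line are at least consistent with each other.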

You may stick several pv processes in between. For example, you can measure how fast the data is being read from the disk and how fast gzip is outputting data:

$ pv -cN source access.log | gzip | pv -cN gzip > access.log.gz
source:  760MB 0:00:15 [37.4MB/s] [=>     ] 19% ETA 0:01:02
  gzip: 34.5MB 0:00:15 [1.74MB/s] [  <=>  ]

Here we specified the “-N” parameter to pv to create a named stream. The “-c” parameter makes sure the output is not garbled by one pv process writing over the other.

This example shows that the “access.log” file is being read at a speed of 37.4MB/s but gzip is writing data at only 1.74MB/s. We can immediately estimate the compression ratio. It’s 37.4/1.74 ≈ 21x!
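That ratio comes from two momentary transfer rates, so it is only an estimate. If you want the ratio for the whole file, one option is to compare byte counts directly. A minimal sketch, where gzip_ratio is a hypothetical helper name and the file is compressed once just to count the output bytes:

```shell
# Print how many times smaller a file becomes after gzip.
# Usage: gzip_ratio FILE
gzip_ratio() {
    orig=$(wc -c < "$1")                 # size of the original in bytes
    packed=$(gzip -c "$1" | wc -c)       # size of the gzip output in bytes
    awk -v o="$orig" -v p="$packed" 'BEGIN { printf "%.1f\n", o / p }'
}
```

Calling “gzip_ratio access.log” prints a single number such as a ratio with one decimal place. Note that it reads and compresses the whole file, so on a multi-gigabyte log it takes as long as the compression itself.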

Notice how the gzip line does not show how much data is left or how soon it will finish. That’s because the pv process after gzip has no idea how much data gzip will produce (it just passes along the compressed data from its input stream). The first pv process, however, knows how much data is left, because it’s reading the file itself.

Another similar example would be to pack the whole directory of files into a compressed tarball:

$ tar -czf - . | pv > out.tgz
 117MB 0:00:55 [2.7MB/s] [>         ]

In this example pv shows just the output rate of the “tar -czf” command. That is not very interesting, and it does not tell us how much data is left. We need to give pv the total size of the data we are tarring. It’s done this way:

$ tar -cf - . | pv -s $(du -sb . | awk '{print $1}') | gzip > out.tgz
 253MB 0:00:05 [46.7MB/s] [>     ]  1% ETA 0:04:49

Here we tell tar to create (“-c”) an archive of all files in the current dir “.” (recursively) and write the data to stdout (“-f -”). Next we pass pv the total size of all files in the current dir with “-s”. The “du -sb . | awk '{print $1}'” command prints the number of bytes in the current dir, and that number is fed to pv as the “-s” parameter. Finally we gzip the whole stream and write the result to the out.tgz file. This way pv knows how much data is still left to be processed and shows us that it will take another 4 minutes 49 seconds to finish.
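The size calculation can be wrapped in small helpers so the pipeline is easier to reuse. This is a sketch under assumptions: dir_bytes and tar_with_progress are made-up names, and “du -sb” relies on the -b option of GNU du, so it may not work with other du implementations. The figure is approximate anyway, since the tar stream also carries headers and padding, so treat the percentage and ETA as estimates.

```shell
# Total apparent size of a directory tree in bytes (GNU du assumed).
dir_bytes() {
    du -sb "$1" | awk '{print $1}'
}

# Pack directory SRC into the gzip-compressed tarball OUT, with a progress bar.
# Usage: tar_with_progress SRC OUT
tar_with_progress() {
    tar -cf - -C "$1" . | pv -s "$(dir_bytes "$1")" | gzip > "$2"
}
```

For example, “tar_with_progress /path/to/dir out.tgz” would behave like the one-liner above, but for an arbitrary directory.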

Another fine example is copying large amounts of data over the network with the help of the “nc” utility, which I will write about some other time.

Suppose you have two computers A and B. You want to transfer a directory from A to B very quickly. The fastest way is to use tar and nc, and time the operation with pv.

# on computer A, with IP address 192.168.1.100
$ tar -cf - /path/to/dir | pv | nc -l -p 6666 -q 5
# on computer B
$ nc 192.168.1.100 6666 | pv | tar -xf -

That’s it. All the files in /path/to/dir on computer A will get transferred to computer B, and you’ll be able to see how fast the operation is going.

If you want the progress bar, you have to do the “pv -s $(…)” trick from the previous example (only on computer A).

Another fun example comes from my blog reader alexandru. He shows how to measure how fast the computer reads from /dev/zero:

$ pv /dev/zero > /dev/null
 157GB 0:00:38 [4,17GB/s]

That’s about it. I hope you enjoyed my examples and learned something new. I love explaining things and teaching! :)

How to install pv?

If you’re on Debian or a Debian-based system such as Ubuntu, do the following:

$ sudo aptitude install pv

If you’re on Fedora or a related RPM-based system such as CentOS, do:

$ sudo yum install pv

If you’re on Slackware, go to the pv homepage [5], download the pv-version.tar.gz archive and do:

$ tar -zxf pv-version.tar.gz
$ cd pv-version
$ ./configure && sudo make install

If you’re a Mac user:

$ sudo port install pv

If you’re an OpenSolaris user:

$ pfexec pkg install pv

If you’re a Windows user on Cygwin, build pv from the unpacked source directory:

$ ./configure
$ export DESTDIR=/cygdrive/c/cygwin
$ make
$ make install

The manual for the utility can be found here: man pv [6].
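Whichever route you took, it’s worth checking that the shell can actually find pv before building pipelines around it. A small sketch using the standard “command -v” lookup; the helper name check_tool is made up for illustration:

```shell
# Report whether a command is available on the PATH.
# Usage: check_tool NAME
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

check_tool pv || echo "install pv with one of the commands above"
```

The same helper works for the other tools used in the examples, such as nc or gzip.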

Have fun measuring your pipes with pv, and until next time!

via: catonmat
