Using pv with md5sum

I used md5sum with pv to check 4 GiB of files that are in the same directory:

md5sum dir/* | pv -s 4g | sort

The command completes successfully in about 28 seconds, but pv's output is all wrong. This is the sort of output that is displayed throughout:

219 B 0:00:07 [ 125 B/s ] [>                                ]  0% ETA 1668:01:09:02

It's like this without the -s 4g and | sort as well. I've also tried it with different files.

I've tried using pv with cat and the output was fine, so the problem seems to be caused by md5sum.

asked Jan 19 at 16:29

EmmaV

1,1581332

1

It's likely a buffering issue. That is, the output from md5sum is not line-buffered and won't arrive at pv until the process is done or has produced enough data to fill the output buffer. I can't see an option in the md5sum manual to make it line-buffered. Or, you are misunderstanding what is happening, which is that the data sent through pv is only the checksums (and filenames). Also pv does not know how much data to expect, so it can't say how much is left.

– Kusalananda
Jan 19 at 16:42

It seems like only the checksums and filenames are going through pv (but this doesn't seem to affect anyone else?). Is there a way to make all of the file data go through pv?

– EmmaV
Jan 19 at 16:49

The issue with that is that you would lose the filename. Think of pv as a "fancy cat". Using cat file | md5sum, you would get the MD5 hash for a single file, but md5sum has no way of tagging the result with a filename.

– Kusalananda
Jan 19 at 16:51

1

You are using pv to rate the output of md5sum (which is a few bytes) and not md5sum's own progress of reading the files themselves. Maybe this answer is related: unix.stackexchange.com/q/16826/30851 (on second thought, maybe not - it's about textfiles...)

– frostschutz
Jan 19 at 17:05

2

Since you are not feeding 4 GiB of data down the pipe, but just the output of md5sum for a number of files, changing the -s 4g option so that it reflects an estimate of the size of md5sum's output, e.g. -s 512, should be a step in the right direction.

– ozzy
Jan 19 at 17:12


5 Answers

pv is a "fancy cat", meaning that you may use pv in most situations where you would use cat.

Using cat with md5sum, you can compute the MD5 checksum of a single file with

cat file | md5sum

or, with pv,

pv file | md5sum

Unfortunately though, this does not allow md5sum to insert the filename into its output properly.
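
To see why the filename is lost, it may help to compare the two invocations side by side. The following is only a small sketch (the temporary directory and the sample file are made up for the illustration): when md5sum reads from standard input, it has no name to report and prints - instead.

```shell
#!/bin/sh
# Illustration only: the temporary directory and file are hypothetical.
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/file"

# Reading the file by name: the name appears in the output.
from_file=$(md5sum "$tmp/file")

# Reading from a pipe (cat here; pv in front of the pipe is the same for md5sum):
# md5sum only sees standard input and prints "-" as the name.
from_stdin=$(cat "$tmp/file" | md5sum)

printf '%s\n%s\n' "$from_file" "$from_stdin"
rm -rf "$tmp"
```

The hash is the same in both lines; only the name field differs.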

Now, fortunately, pv is a really fancy cat, and on some systems (Linux), it's able to watch the data being passed through another process. This is done by using its -d option with the process ID of that other process.

This means that you can do things like

md5sum dir/* | sort >sums &

sleep 1

pv -d "$(pgrep -n md5sum)"

This would allow pv to watch the md5sum process. The sleep is there to allow md5sum, which is running in the background, to properly start. pgrep -n md5sum would return the PID of the most recently started md5sum process that you own. pv will exit as soon as the process that it is watching terminates.

I've tested this particular way of running pv a few times and it seems to generally work well, but sometimes it seems to stop outputting anything as md5sum switches to the next file. Sometimes, it seems to spawn spurious background tasks in the shell.

It would probably be safest to run it as

md5sum dir/* >sums &

sleep 1

pv -W -d "$!"

sort -o sums sums

The -W option will cause pv to wait until there's actual data being transferred, although this also does not always seem to work reliably.

edited Jan 19 at 17:32

answered Jan 19 at 17:14

Kusalananda

126k16239393

The need for sleep is somewhat surprising!

– Stephen Kitt
Jan 19 at 17:43

@StephenKitt That's the only way I could get it to behave somewhat predictably. I don't really know what it does, but whatever it is, it seems to be possible for it to do it "too early", and the progress meter won't show at all.

– Kusalananda
Jan 19 at 17:45


The data that you are feeding through the pipe is not the data of the files that md5sum is processing, but instead the md5sum output, which, for every file, consists of one line comprising: the MD5 hash, two spaces, and the file name. Since we know this in advance, we can inform pv accordingly, so as to enable it to display an accurate progress indicator. There are two ways of doing so.

The first, preferred method (suggested by frostschutz) makes use of the fact that md5sum generates one line per processed file, and the fact that pv has a line mode that counts lines rather than bytes. In this mode pv will only move the progress bar when it encounters a newline in the throughput, i.e. per file finished by md5sum. In Bash, this first method can look like this:

set -- *.iso; md5sum "$@" | pv --line-mode -s $# | sort

The set builtin is used to set the positional parameters to the files to be processed (the *.iso shell pattern is expanded by the shell). md5sum is then told to process these files ($@ expands to the positional parameters), and pv in line mode will move the progress indicator each time a file has been processed / a line is output by md5sum. Notably, pv is informed of the total number of lines it can expect (-s $#), as the special shell parameter $# expands to the number of positional arguments.
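
The line counting can be checked on a small scale. The following is a rough sketch, not a definitive implementation: the three sample files are made up, and the fallback to cat when pv is not installed is an assumption added only so that the example runs anywhere.

```shell
#!/bin/sh
# Sketch: sample files and the fallback to cat are assumptions for illustration.
tmp=$(mktemp -d)
for n in 1 2 3; do
    printf 'data %s\n' "$n" > "$tmp/file$n.iso"
done

# The shell expands the pattern; $# is then the number of files.
set -- "$tmp"/*.iso
nfiles=$#

# Use pv in line mode when it is installed; otherwise just pass the data through.
if command -v pv >/dev/null 2>&1; then
    meter() { pv --line-mode -s "$1"; }
else
    meter() { cat; }
fi

# md5sum prints one line per file, so exactly $# lines go through the pipe.
nlines=$(md5sum "$@" | meter "$nfiles" | wc -l)
echo "files: $nfiles, lines through the pipe: $nlines"
rm -rf "$tmp"
```

Because the number of lines equals the number of files, -s $# gives pv an exact total in line mode.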

The second method is not line-based but byte-based. With md5sum this is unnecessarily complicated, but some other program may not produce lines but, for instance, continuous data, and then this approach may be more practical. I illustrate it with md5sum though. The idea is to calculate the amount of data that md5sum (or some other program) will produce, and use this to inform pv. In Bash, this could look as follows:

os=$(( $( ls -1 | wc -c ) + $( ls -1 | wc -l ) * 34 ))

md5sum * | pv -s $os | sort

The first line calculates the output size (os) estimate: the first term is the number of bytes necessary for encoding the filenames (incl. newline), the second term the number of bytes used for encoding the MD5-hashes (32 bytes each), plus 2 spaces. In the second line, we tell pv that the expected amount of data is os bytes, so that it can show an accurate progress indicator leading up to 100% (which indicator is updated per finished md5summed file).
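
The same estimate can also be computed without parsing the output of ls. This is a minimal sketch under stated assumptions: the sample files are hypothetical, and the names are plain ASCII without backslashes or newlines (md5sum escapes those, which would change the byte count).

```shell
#!/bin/sh
# Sketch: sample files are hypothetical; assumes plain ASCII names
# without backslashes or newlines (md5sum escapes those).
tmp=$(mktemp -d)
printf 'a\n'  > "$tmp/one.iso"
printf 'bb\n' > "$tmp/two.iso"

expected=0
for f in "$tmp"/*; do
    # 32 hex digits + 2 separator characters + name + newline
    expected=$(( expected + 32 + 2 + ${#f} + 1 ))
done

# Actual size of what md5sum writes for the same arguments.
actual=$(( $(md5sum "$tmp"/* | wc -c) ))

echo "expected $expected bytes, actual $actual bytes"
rm -rf "$tmp"
```

The value in expected is what would be passed to pv with -s in the byte-based method.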

Obviously, both methods are only practical when multiple files are to be processed. Also, it should be noted that since the size of md5sum's output is not related to the amount of time md5sum has to spend crunching the underlying data, the progress indicator may be considered somewhat misleading. E.g., in the second method, the file with the shortest name will yield the smallest progress update, even though it may actually be the biggest in size. Then again, if all files have similar sizes and names, this shouldn't matter much.

edited Jan 20 at 9:14

answered Jan 19 at 17:48

ozzy

6955

2

It's a very nice idea to calculate progress based on md5sum output (even though the progress would indicate how many files are still left, and not - how large or how long that would take). However it should not require parsing ls. pv supports --line-mode so set -- *.iso; md5sum "$@" | pv --line-mode -s $# | sort might be equivalent and still work if you replace md5sum with sha512sum or otherwise.

– frostschutz
Jan 19 at 19:39

@frostschutz You are undeniably right :-) This is a nicer, cleaner solution.

– ozzy
Jan 19 at 20:48


Here's a dirty hack to get progress per file:

for f in iso/*

do

    pv "$f" | (

        cat > /dev/null &

        md5sum "$f"

        wait

    )

done

What it looks like:

4.15GiB 0:00:32 [ 130MiB/s] [================================>] 100%            

0db0b36fc7bad7b50835f68c369e854c  iso/KNOPPIX_V7.6.1DVD-2016-01-16-EN.iso

 792MiB 0:00:06 [ 130MiB/s] [================================>] 100%            

97537db63e61d20a5cb71d29145b2937  iso/archlinux-2016.10.01-dual.iso

 843MiB 0:00:06 [ 129MiB/s] [================================>] 100%            

1b5dc31e038499b8409f7d4d720e3eba  iso/lubuntu-16.04-desktop-i386.iso

 259MiB 0:00:02 [ 130MiB/s] [=========>                        ] 30% ETA 0:00:04

...

Now, this makes several assumptions. Firstly, that reading data is slower than hashing it. Secondly, that the OS will cache the I/O so data won't be (physically) read twice even though pv and md5sum are completely independent readers.

The nice thing about such a dirty, dirty hack is that you can easily adapt it to make a progress bar across all the data, not just one file. And still do weird stuff like sort the output afterwards.

pv iso/* | (

    cat > /dev/null &

    md5sum iso/* | sort

    wait

)

What it looks like (ongoing):

15.0GiB 0:01:47 [ 131MiB/s] [===========================>      ] 83% ETA 0:00:21

What it looks like (finished):

18.0GiB 0:02:11 [ 140MiB/s] [================================>] 100%            

0db0b36fc7bad7b50835f68c369e854c  iso/KNOPPIX_V7.6.1DVD-2016-01-16-EN.iso

155603390e65f2a8341328be3cb63875  iso/systemrescuecd-x86-4.2.0.iso

1b5dc31e038499b8409f7d4d720e3eba  iso/lubuntu-16.04-desktop-i386.iso

1b6ed6ff8d399f53adadfafb20fb0d71  iso/systemrescuecd-x86-4.4.1.iso

25715326d7096c50f7ea126ac20eabfd  iso/openSUSE-13.2-KDE-Live-i686.iso

...

Now, that's for the hacks. Check other answers for proper solutions. ;-)

answered Jan 19 at 18:47

frostschutz

26.5k15483

Reading the data is required for hashing it, so hashing the data is almost certainly slower than just reading it. Also, you force reading the data twice for each file (once with pv and once with md5sum).

– Kusalananda
Jan 19 at 18:52

@Kusalananda ...and that's why I called it a hack! pv < /dev/zero | md5sum -> 637MiB/s on my machine. The ISO files I tested with were on a USB3 stick, 140MiB/s is about the max read speed. So md5sum can hash data faster than read it. Yes it reads data twice, but that's pedantic - thanks to OS caching, the USB stick still reads it only once; and running this hack is (in my case) not any slower than running md5sum by itself without progress bar.

– frostschutz
Jan 19 at 19:07

...and yes, it can go wrong. If you were to do this on a slow machine with a very fast storage medium, pv would run way ahead of md5sum and data would probably end up being read twice for real, or in any case, the progress bar would not be in sync with md5sum at all.

– frostschutz
Jan 19 at 19:12


As already pointed out in comments and other answers:

You are piping into pv only md5sum's output: checksums and file names; thus, pv's progress bar is not able to show how much data md5sum is reading.

A size of 4 GB will of course be too much for that. Also, providing pv with the size of the file(s) you are piping into it (manually, with -s) is inconvenient.

Piping the content of your files into pv and then into md5sum will give you a progress bar, but file names would be lost.

This code is a not so elegant way to have both—a meaningful progress bar and file names with checksums:

#!/bin/sh



for file in "$@"; do

    pv -- "$file" |

    md5sum |

    sed 's/-$//' |

    printf '%s%s\n' "$(cat -)" "$file"

done

The script is meant to be invoked as:

./script dir/*

You can of course declare it as a function, to avoid having to type its path to call it (or adding it to your PATH):

function pvsum () {

    for file in "$@"; do

        pv -- "$file" | md5sum | sed 's/-$//' | printf '%s%s\n' "$(cat -)" "$file"

    done

}

This way, the command pvsum dir/* | sort will be equivalent to your md5sum dir/* | pv -s <size> | sort.

Its output:

$ ./testscript testdir/*

4.00GiB 0:00:09 [ 446MiB/s] [==============================>] 100%            

9dab5f8add1f699bca108f99e5fa5342  testdir/file1

1.00GiB 0:00:02 [ 447MiB/s] [==============================>] 100%            

06a738a71e3fd3119922bdac259fe29a  testdir/file2

What it does:

It loops over the given files and, for each:
- Pipes the file from pv into md5sum, showing the default progress bar.
- sed is used to remove the - printed by md5sum (which is reading from standard input); this also attempts to make the output suitable for being consumed by md5sum -c (thanks to frostschutz for pointing this out)¹.
- Prints the checksum followed by the file name on standard output.
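
The effect of the sed and printf steps can be checked in isolation. The following is only a sketch: cat stands in for pv so that it runs without pv installed, and the sample file is made up for the illustration.

```shell
#!/bin/sh
# Sketch: cat stands in for pv so the example runs without pv installed;
# the sample file is hypothetical.
tmp=$(mktemp -d)
file="$tmp/sample"
printf 'some data\n' > "$file"

# md5sum reading standard input prints "<hash>  -";
# sed strips the trailing "-", leaving "<hash>  " (hash plus two spaces).
prefix=$(cat -- "$file" | md5sum | sed 's/-$//')

# Appending the real name gives a line in the usual md5sum format.
line=$(printf '%s%s\n' "$prefix" "$file")
printf '%s\n' "$line"

# md5sum -c reads such lines from standard input and verifies them.
if printf '%s\n' "$line" | md5sum -c >/dev/null 2>&1; then
    verified=yes
else
    verified=no
fi
echo "verified: $verified"
rm -rf "$tmp"
```

For a file name without newlines or backslashes, the reassembled line should be accepted by md5sum -c.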

About sort:

I'm not sure about your expected results, so I have just ignored it. Since pv writes its progress bar to standard error, piping everything into sort will detach pv's output from md5sum's output.

Anyway, you can just append | sort after done in the code above and check if the result is fine to you.

¹ Note that the output from the code shown above will not be suitable for md5sum -c if file names include newlines. Handling newlines is possible, but some versions of md5sum behave differently in this respect (see, for instance, answers to this question), making a general solution not easy (and out of the scope of this answer).

Assuming a recent version of md5sum, an attempt at solving this issue could be:

for file in "$@"; do

    pv -- "$file" |

    md5sum |

    sed 's/-$//' |

    printf '%s%s\n' "$(cat -)" "$file" |

    sed '$! N; s/\\/\\\\/g; s/\n/\\n/g; t x; q; :x s/^/\\/;'

done

Where the only addition, the final sed, will:

Put the whole input (checksum and file name) in pattern space, since it may contain newlines: $! matches any line except for the last one; N; appends a newline and the next line to the pattern space.

Escape with a backslash (\) any backslash found.

Replace with \n any newline found.

Only if at least a backslash or newline has been replaced (t x;: branch to label x), a backslash is added at the beginning of the checksum to signal md5sum -c that something has to be unescaped; otherwise just quit.

edited Jan 20 at 18:15

answered Jan 19 at 18:23

fra-san

1,3971215


I have also enjoyed taming the 'fancy cat', `pv`, for `md5sum` :-)

I think my shellscript is rather stable now

There is a usage output, if you do not enter the pattern correctly.

It works with wild cards, but does not recurse into subdirectories

You can enter more than one pattern, for example ".* *"

There is a verbosity switch that turns on checking the md5sums ... OK

You can redirect the relevant output into a file; the process view output of pv will stay on the screen (terminal window)

There are two pv processes in a for loop, one global and one for each file, the global pv 'only counts the files', and the other one measures the speed and amount of data transferred

ANSI escape sequences are used to keep the process view in a stable position
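
The total size that the per-file pv is told to expect can also be obtained with wc -c alone. This is a sketch under stated assumptions: total_size is a hypothetical helper that is not part of md5summer, and the sample files are made up.

```shell
#!/bin/sh
# Sketch: total_size is a hypothetical helper, not part of md5summer.
total_size() {
    total=0
    for f in "$@"; do
        # Skip directories and anything else that is not a regular file.
        [ -f "$f" ] || continue
        size=$(wc -c < "$f")
        total=$(( total + size ))
    done
    printf '%s\n' "$total"
}

# Demo with two made-up files of 5 and 7 bytes, plus a directory that is skipped.
tmp=$(mktemp -d)
printf '12345'   > "$tmp/a"
printf '1234567' > "$tmp/b"
mkdir "$tmp/subdir"
cnt=$(total_size "$tmp"/*)
echo "total file size = $cnt B"
rm -rf "$tmp"
```

A value computed this way could be given to pv with -s, in the same way the script uses its own byte count.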

I use the name md5summer, make the shellscript executable and put it in a directory in PATH (my ~/bin directory, you may prefer /usr/local/bin).

#!/bin/bash



# date      sign     comment

# 20190119  sudodus  created md5summer version 1.0



if [ "$1" == "-v" ]

then

 verbose=true

 shift

else

 verbose=false

fi

if [ $# -ne 1 ]

then

 echo "Usage:    $0  [-v]  <pattern>"

 echo "Example:  $0  '*.iso'      # notice the quotes"

 echo "          $0  -v  '*.iso'  # verbose"

 exit

fi

tmpstr=$(find $1 -maxdepth 0 -type f 2> /dev/null)

if [ "$tmpstr" == "" ]

then

 echo "No such file '$1'. Try another pattern!"

 exit

fi



tmpdir=$(mktemp -d)

tmpfil="$tmpdir/fil1"

tmpfi2="$tmpdir/fil2"

resetvid="\033[0m"

prev2line="\033[2F"

next2line="\033[2E"



sln=1

cln=0

cnt=0

for i in $1

do

 if test -f "$i"

 then

  cln=$((cln+1))

  tmp=$(find -L "$i" -printf "%s")

  cnt=$((cnt+tmp))

 fi

done

echo "

                    number of files = $cln

                    total file size = $cnt B ~ $(($cnt/2**20)) MiB

"

for i in $1

do

 if test -f "$i"

 then

  tmpnam=$(echo -n "$i")

  tmpsum=$(< "$i" pv -ptrbs "$cnt" | md5sum)

  sleep 0.05

  echo "$sln" | pv -ls "$cln" > /dev/null

  sleep 0.05

  sln="$sln

$i"

  sleep 0.05

  printf '%s%s\n' "${tmpsum/-}" "$tmpnam" >> "$tmpfil"

  echo -ne "$prev2line" > /dev/stderr

 fi

done



sync

sleep 0.1

echo -ne "$next2line" > /dev/stderr



echo "-----"

if $verbose

then

 sort -k2 "$tmpfil" | tee "$tmpfi2" | md5sum -c

 echo "-----"

 cat "$tmpfi2"

else

 sort -k2 "$tmpfil"

fi

sleep 0.5

sync

rm -r "$tmpdir"

Demo example

Usage

$ md5summer 

Usage:    /home/sudodus/bin/md5summer  [-v]  <pattern>

Example:  /home/sudodus/bin/md5summer  '*.iso'      # notice the quotes

          /home/sudodus/bin/md5summer  -v  '*.iso'  # verbose

I tested in this directory

$ ls -1a

.

..

'filename with spaces'

md5summer

md5summer1

md5summer2

subdir

.ttt

zenity-info-message.png

Normal usage plus pattern to see hidden files

$ md5summer ".* *"



                    number of files = 6

                    total file size = 12649 B ~ 0 MiB



8,32KiB 0:00:00 [ 156MiB/s] [=============================>                ] 67%

6,00  0:00:00 [ 133k/s] [====================================>] 100%            

-----

184d0995cc8b6d8070f89f15caee35ce  filename with spaces

28227139997996c7838f07cd4c630ffc  md5summer

3383b86a0753e486215280f0baf94399  md5summer1

28227139997996c7838f07cd4c630ffc  md5summer2

31cd03f64a466e680e9c22fef4bcf14b  .ttt

670b8db45e57723b5f1b8a63399cdfa1  zenity-info-message.png

Verbose output plus pattern to see hidden files

$ md5summer -v ".* *"



                    number of files = 6

                    total file size = 12649 B ~ 0 MiB



8,32KiB 0:00:00 [ 184MiB/s] [=============================>                ] 67%

6,00  0:00:00 [ 133k/s] [====================================>] 100%            

-----

filename with spaces: OK

md5summer: OK

md5summer1: OK

md5summer2: OK

.ttt: OK

zenity-info-message.png: OK

-----

184d0995cc8b6d8070f89f15caee35ce  filename with spaces

28227139997996c7838f07cd4c630ffc  md5summer

3383b86a0753e486215280f0baf94399  md5summer1

28227139997996c7838f07cd4c630ffc  md5summer2

31cd03f64a466e680e9c22fef4bcf14b  .ttt

670b8db45e57723b5f1b8a63399cdfa1  zenity-info-message.png

Redirection to a file, first the screen output

$ md5summer ".* *" > subdir/save

8,32KiB 0:00:00 [ 180MiB/s] [=============================>                ] 67%

6,00  0:00:00 [ 162k/s] [====================================>] 100%

and then the saved output

$ cat subdir/save 



                    number of files = 6

                    total file size = 12649 B ~ 0 MiB



-----

184d0995cc8b6d8070f89f15caee35ce  filename with spaces

28227139997996c7838f07cd4c630ffc  md5summer

3383b86a0753e486215280f0baf94399  md5summer1

28227139997996c7838f07cd4c630ffc  md5summer2

31cd03f64a466e680e9c22fef4bcf14b  .ttt

670b8db45e57723b5f1b8a63399cdfa1  zenity-info-message.png

Checking iso files

$ md5summer "*.iso"



                    number of files = 10

                    total file size = 7112491008 B ~ 6783 MiB



28,0MiB 0:00:00 [ 160MiB/s] [>                                             ]  0%

10,0  0:00:00 [ 204k/s] [====================================>] 100%            

-----

7a27fdd46a63ba4375896891826c1c88  debian-live-8.6.0-amd64-lxde-desktop.iso

d70eec28cdbdee7f7aa95fb53b9bfdac  debian-live-8.7.1-amd64-standard.iso

382cfbe621ca446d12871b8945b50d20  debian-live-8.8.0-amd64-standard.iso

44473dfe2ee1aad0f71506f1d5862457  debian-live-8.8.0-i386-standard.iso

f396b3532fa84059e7738c3c1827bada  debian-live-9.3.0-amd64-cinnamon.iso

8f6def28ae7cbefa0a6e59407c884466  debian-live-9.6.0-amd64-cinnamon.iso

90b1815da0a5bf4ee4b00eec2b5d3587  debian-testing-amd64-netinst_2017-07-28.iso

8f75074ab98e166b7469299d3e459ac6  mini-amd64-2016-01-21-daily.iso

e580266fba58eb34b05bf6e13f51a047  mini-jessie-32.iso

646c109a9a16c0527ce1c7afa922e2ed  mini-jessie-64.iso

edited Jan 20 at 17:04

answered Jan 20 at 3:21

sudodus

1,32016

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "106"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f495477%2fusing-pv-with-md5sum%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

5 Answers
5

active

oldest

votes

5 Answers
5

active

oldest

votes

pv is a "fancy cat", which is that you may use pv in most situations where you would use cat.

Using cat with md5sum, you can compute the MD5 checksum of a single file with

cat file | md5sum

or, with pv,

pv file | md5sum

Unfortunately though, this does not allow md5sum to insert the filename into its output properly.

This means that you can do things like

md5sum dir/* | sort >sums &

sleep 1

pv -d "$(pgrep -n md5sum)"

It would probably be safest to run it as

md5sum dir/* >sums &

sleep 1

pv -W -d "$!"

sort -o sums sums

The -W option will cause pv to wait until there's actual data being transferred, although this does also not always seem to work reliably.

edited Jan 19 at 17:32

answered Jan 19 at 17:14

Kusalananda

126k16239393

The need for sleep is somewhat surprising!

– Stephen Kitt
Jan 19 at 17:43

@StephenKitt That's the only way I could get it to behave somewhat predictably. I don't really know what it does, but whatever it is, it seems to be possible for it to do it "too early", and the progress meter won't show at all.

– Kusalananda
Jan 19 at 17:45


set -- *.iso; md5sum "$@" | pv --line-mode -s $# | sort

os=$(( $( ls -1 | wc -c ) + $( ls -1 | wc -l ) * 34 ))

md5sum * | pv -s $os | sort
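The byte estimate above can also be computed without parsing ls. This is only a sketch: it assumes the usual md5sum output format (32 hex digits, two separator characters, the file name, a newline) and file names without backslashes or newlines; the helper name md5_output_bytes is made up for illustration.

```shell
# Hypothetical helper: estimate how many bytes `md5sum "$@"` will print.
# Each output line is 32 (hash) + 2 (separator) + name length + 1 (newline),
# which matches the "name + newline, plus 34 per file" arithmetic above.
# Note: ${#name} counts characters in some shells, so names with
# multi-byte characters may be slightly under-estimated.
md5_output_bytes() {
    total=0
    for name in "$@"; do
        total=$((total + 32 + 2 + ${#name} + 1))
    done
    printf '%s\n' "$total"
}

# Usage (assumes pv is installed):
#   set -- *.iso
#   md5sum "$@" | pv -s "$(md5_output_bytes "$@")" | sort    # byte-based
#   md5sum "$@" | pv --line-mode -s "$#" | sort              # line-based
```

Either way the bar only advances when md5sum finishes a file, so it shows files completed rather than data read.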

edited Jan 20 at 9:14

answered Jan 19 at 17:48

ozzy


It's a very nice idea to calculate progress based on md5sum output (even though the progress would indicate how many files are still left, and not - how large or how long that would take). However it should not require parsing ls. pv supports --line-mode so set -- *.iso; md5sum "$@" | pv --line-mode -s $# | sort might be equivalent and still work if you replace md5sum with sha512sum or otherwise.

– frostschutz
Jan 19 at 19:39

@frostschutz You are undeniably right :-) This is a nicer, cleaner solution.

– ozzy
Jan 19 at 20:48


Here's a dirty hack to get progress per file:

for f in iso/*
do
    pv "$f" | (
        cat > /dev/null &
        md5sum "$f"
        wait
    )
done

What it looks like:

4.15GiB 0:00:32 [ 130MiB/s] [================================>] 100%
0db0b36fc7bad7b50835f68c369e854c  iso/KNOPPIX_V7.6.1DVD-2016-01-16-EN.iso
 792MiB 0:00:06 [ 130MiB/s] [================================>] 100%
97537db63e61d20a5cb71d29145b2937  iso/archlinux-2016.10.01-dual.iso
 843MiB 0:00:06 [ 129MiB/s] [================================>] 100%
1b5dc31e038499b8409f7d4d720e3eba  iso/lubuntu-16.04-desktop-i386.iso
 259MiB 0:00:02 [ 130MiB/s] [=========>                        ] 30% ETA 0:00:04
...

The nice thing about such a dirty, dirty hack is that you can easily adapt it to make a progress bar across all the data, not just one file. And still do weird stuff like sort the output afterwards.

pv iso/* | (
    cat > /dev/null &
    md5sum iso/* | sort
    wait
)

What it looks like (ongoing):

15.0GiB 0:01:47 [ 131MiB/s] [===========================>      ] 83% ETA 0:00:21

What it looks like (finished):

18.0GiB 0:02:11 [ 140MiB/s] [================================>] 100%
0db0b36fc7bad7b50835f68c369e854c  iso/KNOPPIX_V7.6.1DVD-2016-01-16-EN.iso
155603390e65f2a8341328be3cb63875  iso/systemrescuecd-x86-4.2.0.iso
1b5dc31e038499b8409f7d4d720e3eba  iso/lubuntu-16.04-desktop-i386.iso
1b6ed6ff8d399f53adadfafb20fb0d71  iso/systemrescuecd-x86-4.4.1.iso
25715326d7096c50f7ea126ac20eabfd  iso/openSUSE-13.2-KDE-Live-i686.iso
...

Now, that's for the hacks. Check other answers for proper solutions. ;-)

answered Jan 19 at 18:47

frostschutz


Reading the data is a prerequisite for hashing it, so hashing the data is almost certainly slower than just reading it. Also, you force the data to be read twice for each file (once by pv and once by md5sum).

– Kusalananda
Jan 19 at 18:52

@Kusalananda ...and that's why I called it a hack! pv < /dev/zero | md5sum -> 637MiB/s on my machine. The ISO files I tested with were on a USB3 stick, 140MiB/s is about the max read speed. So md5sum can hash data faster than read it. Yes it reads data twice, but that's pedantic - thanks to OS caching, the USB stick still reads it only once; and running this hack is (in my case) not any slower than running md5sum by itself without progress bar.

– frostschutz
Jan 19 at 19:07

...and yes, it can go wrong. If you were to do this on a slow machine with a very fast storage medium, pv would run way ahead of md5sum and data would probably end up being read twice for real, or in any case, the progress bar would not be in sync with md5sum at all.

– frostschutz
Jan 19 at 19:12


As already pointed out in comments and other answers:

You are piping into pv only md5sum's output: checksums and file names; thus, pv's progress bar is not able to show how much data md5sum is reading.

A size of 4 GB is of course far too large for that. Also, providing pv with the size of the file(s) you are piping into it (manually, with -s) is inconvenient.

Piping the content of your files into pv and then into md5sum will give you a progress bar, but the file names will be lost.

This code is a not-so-elegant way to have both a meaningful progress bar and file names with checksums:

#!/bin/sh

for file in "$@"; do
    pv -- "$file" |
    md5sum |
    sed 's/-$//' |
    printf '%s%s\n' "$(cat -)" "$file"
done

The script is meant to be invoked as:

./script dir/*

You can of course declare it as a function, to avoid having to type its path to call it (or adding it to your PATH):

function pvsum () {
    for file in "$@"; do
        pv -- "$file" | md5sum | sed 's/-$//' | printf '%s%s\n' "$(cat -)" "$file"
    done
}

This way, the command pvsum dir/* | sort will be equivalent to your md5sum dir/* | pv -s <size> | sort.

Its output:

$ ./testscript testdir/*
4.00GiB 0:00:09 [ 446MiB/s] [==============================>] 100%
9dab5f8add1f699bca108f99e5fa5342  testdir/file1
1.00GiB 0:00:02 [ 447MiB/s] [==============================>] 100%
06a738a71e3fd3119922bdac259fe29a  testdir/file2

What it does:

It loops over the given files and, for each:
- Pipes the file from pv into md5sum, showing the default progress bar.
- sed is used to remove the - printed by md5sum (which is reading from standard input); this also attempts to make the output suitable for being consumed by md5sum -c (thanks to frostschutz for pointing this out)¹.
- Prints the checksum followed by the file name on standard output.
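The sed and printf steps listed above can be tried without pv. A small sketch, assuming an md5sum that prints - as the name when it reads standard input; sum_with_name is a made-up name, and cat stands in for pv so the snippet also runs where pv is not installed:

```shell
# Hypothetical helper: hash one file via stdin, then re-attach its name.
sum_with_name() {
    file=$1
    # cat stands in for pv; use `pv -- "$file"` instead to get a progress bar
    cat -- "$file" |
    md5sum |
    sed 's/-$//' |
    printf '%s%s\n' "$(cat -)" "$file"
    # sed turns "HASH  -" into "HASH  "; printf then appends the real name
}
```

For ordinary file names the result should be the same line that md5sum prints when given the file directly, which is what makes it usable with md5sum -c.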

About sort:

Assuming a recent version of md5sum, an attempt at handling file names that contain backslashes or newlines (which md5sum -c expects to be escaped) could be:

for file in "$@"; do
    pv -- "$file" |
    md5sum |
    sed 's/-$//' |
    printf '%s%s\n' "$(cat -)" "$file" |
    sed '$! N; s/\\/\\\\/g; s/\n/\\n/g; t x; q; :x s/^/\\/;'
done

Where the only addition, the final sed, will:

- Put the whole input (checksum and file name) in pattern space, since it may contain newlines: $! matches any line except for the last one; N appends a newline and the next line to the pattern space.
- Escape with a backslash (\) any backslash found.
- Replace any newline found with \n.
- Only if at least one backslash or newline has been replaced (t x: branch to label x) is a backslash added at the beginning of the checksum, to signal to md5sum -c that something has to be unescaped; otherwise just quit.
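The escaping rule can be checked on a plain line of text, without hashing anything. The sketch below handles backslashes only (not newlines) and uses separate -e expressions so the labels parse the same way across sed implementations; escape_md5_line is a hypothetical name, and HASH is a placeholder for a real checksum.

```shell
# Hypothetical filter: double every backslash on a line and, only if a
# backslash was found, prefix the line with one more backslash.
# 't mark' jumps to the label when the substitution matched;
# the bare 'b' ends the script early for lines that needed no change.
escape_md5_line() {
    sed -e 's/\\/\\\\/g' \
        -e 't mark' \
        -e 'b' \
        -e ':mark' \
        -e 's/^/\\/'
}

printf '%s\n' 'HASH  plain.iso' | escape_md5_line
# prints: HASH  plain.iso
printf '%s\n' 'HASH  odd\name.iso' | escape_md5_line
# prints: \HASH  odd\\name.iso
```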

edited Jan 20 at 18:15

answered Jan 19 at 18:23

fra-san



I have also enjoyed taming the 'fancy cat', `pv`, for `md5sum` :-)

I think my shellscript is rather stable now.

- There is a usage output if you do not enter the pattern correctly.
- It works with wildcards, but does not recurse into subdirectories.
- You can enter more than one pattern, for example ".* *".
- There is a verbosity switch (-v) that turns on checking of the md5sums (... OK).
- You can redirect the relevant output into a file; pv's progress output stays on the screen (terminal window).
- There are two pv processes in a for loop, one global and one for each file: the global pv only counts the files, and the other one measures the speed and amount of data transferred.
- ANSI escape sequences are used to keep the progress display in a stable position.
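The counting step (number of files and total size, the two totals that pv is given with -s in the script below) can be sketched on its own. This is not the script's own code: it uses wc -c instead of the GNU-specific find -printf, and count_and_size is a made-up name.

```shell
# Hypothetical helper: print "<number of regular files> <total bytes>".
# Directories and other non-regular files are skipped, as in the script.
count_and_size() {
    files=0
    bytes=0
    for f in "$@"; do
        if [ -f "$f" ]; then
            files=$((files + 1))
            size=$(wc -c < "$f")    # size in bytes of this file
            bytes=$((bytes + size))
        fi
    done
    printf '%s %s\n' "$files" "$bytes"
}

# Usage: count_and_size .* *
```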

I use the name md5summer, make the shellscript executable and put it in a directory in PATH (my ~/bin directory, you may prefer /usr/local/bin).

#!/bin/bash

# date      sign     comment
# 20190119  sudodus  created md5summer version 1.0

# optional -v switch: verify the checksums at the end
if [ "$1" == "-v" ]
then
 verbose=true
 shift
else
 verbose=false
fi
if [ $# -ne 1 ]
then
 echo "Usage:    $0  [-v]  <pattern>"
 echo "Example:  $0  '*.iso'      # notice the quotes"
 echo "          $0  -v  '*.iso'  # verbose"
 exit
fi
# $1 is left unquoted on purpose so that the shell expands the pattern
tmpstr=$(find $1 -maxdepth 0 -type f 2> /dev/null)
if [ "$tmpstr" == "" ]
then
 echo "No such file '$1'. Try another pattern!"
 exit
fi

tmpdir=$(mktemp -d)
tmpfil="$tmpdir/fil1"
tmpfi2="$tmpdir/fil2"
# ANSI escape sequences: reset, cursor up two lines, cursor down two lines
resetvid="\033[0m"
prev2line="\033[2F"
next2line="\033[2E"

# first pass: count the files (cln) and add up their sizes (cnt)
sln=1
cln=0
cnt=0
for i in $1
do
 if test -f "$i"
 then
  cln=$((cln+1))
  tmp=$(find -L "$i" -printf "%s")
  cnt=$((cnt+tmp))
 fi
done
echo "
                    number of files = $cln
                    total file size = $cnt B ~ $(($cnt/2**20)) MiB
"
# second pass: hash each file through pv, then update the file counter
for i in $1
do
 if test -f "$i"
 then
  tmpnam=$(echo -n "$i")
  tmpsum=$(< "$i" pv -ptrbs "$cnt" | md5sum)
  sleep 0.05
  echo "$sln" | pv -ls "$cln" > /dev/null
  sleep 0.05
  sln="$sln
$i"
  sleep 0.05
  # drop the "-" that md5sum prints for stdin and append the file name
  printf '%s%s\n' "${tmpsum/-}" "$tmpnam" >> "$tmpfil"
  echo -ne "$prev2line" > /dev/stderr
 fi
done

sync
sleep 0.1
echo -ne "$next2line" > /dev/stderr

echo "-----"
if $verbose
then
 sort -k2 "$tmpfil" | tee "$tmpfi2" | md5sum -c
 echo "-----"
 cat "$tmpfi2"
else
 sort -k2 "$tmpfil"
fi
sleep 0.5
sync
rm -r "$tmpdir"

Demo example

Usage

$ md5summer
Usage:    /home/sudodus/bin/md5summer  [-v]  <pattern>
Example:  /home/sudodus/bin/md5summer  '*.iso'      # notice the quotes
          /home/sudodus/bin/md5summer  -v  '*.iso'  # verbose

I tested in this directory

$ ls -1a

.

..

'filename with spaces'

md5summer

md5summer1

md5summer2

subdir

.ttt

zenity-info-message.png

Normal usage plus pattern to see hidden files

$ md5summer ".* *"



                    number of files = 6

                    total file size = 12649 B ~ 0 MiB



8,32KiB 0:00:00 [ 156MiB/s] [=============================>                ] 67%

6,00  0:00:00 [ 133k/s] [====================================>] 100%            

-----

184d0995cc8b6d8070f89f15caee35ce  filename with spaces

28227139997996c7838f07cd4c630ffc  md5summer

3383b86a0753e486215280f0baf94399  md5summer1

28227139997996c7838f07cd4c630ffc  md5summer2

31cd03f64a466e680e9c22fef4bcf14b  .ttt

670b8db45e57723b5f1b8a63399cdfa1  zenity-info-message.png

Verbose output plus pattern to see hidden files

$ md5summer -v ".* *"



                    number of files = 6

                    total file size = 12649 B ~ 0 MiB



8,32KiB 0:00:00 [ 184MiB/s] [=============================>                ] 67%

6,00  0:00:00 [ 133k/s] [====================================>] 100%            

-----

filename with spaces: OK

md5summer: OK

md5summer1: OK

md5summer2: OK

.ttt: OK

zenity-info-message.png: OK

-----

184d0995cc8b6d8070f89f15caee35ce  filename with spaces

28227139997996c7838f07cd4c630ffc  md5summer

3383b86a0753e486215280f0baf94399  md5summer1

28227139997996c7838f07cd4c630ffc  md5summer2

31cd03f64a466e680e9c22fef4bcf14b  .ttt

670b8db45e57723b5f1b8a63399cdfa1  zenity-info-message.png
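The verbose check relies on `md5sum -c`, which re-reads every file named in a checksum list and compares the sums. A minimal sketch, assuming GNU coreutils; the function name `verify_files` is illustrative and not part of md5summer:

```shell
#!/bin/bash
# Illustrative helper, not part of md5summer: write a checksum list for the
# given files, sort it by name, and let md5sum -c re-read and verify each file.
verify_files() {
  local list status
  list=$(mktemp) || return 1
  md5sum -- "$@" | sort -k2 > "$list"
  # LC_ALL=C keeps the "OK" / "FAILED" labels untranslated
  LC_ALL=C md5sum -c "$list"
  status=$?
  rm -f "$list"
  return "$status"
}
```

The return status is that of `md5sum -c`, so it is non-zero if any file fails the check.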

Redirection to a file, first the screen output

$ md5summer ".* *" > subdir/save

8,32KiB 0:00:00 [ 180MiB/s] [=============================>                ] 67%

6,00  0:00:00 [ 162k/s] [====================================>] 100%

and then the saved output

$ cat subdir/save 



                    number of files = 6

                    total file size = 12649 B ~ 0 MiB



-----

184d0995cc8b6d8070f89f15caee35ce  filename with spaces

28227139997996c7838f07cd4c630ffc  md5summer

3383b86a0753e486215280f0baf94399  md5summer1

28227139997996c7838f07cd4c630ffc  md5summer2

31cd03f64a466e680e9c22fef4bcf14b  .ttt

670b8db45e57723b5f1b8a63399cdfa1  zenity-info-message.png
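The redirection works because pv writes its progress display to standard error, while the checksums go to standard output. A small stand-in shows the split; the function `report` and the name `empty-file` are made up for this sketch, and the hash shown is the MD5 of empty input.

```shell
#!/bin/bash
# Illustrative stand-in for the script: progress goes to stderr (as pv does),
# results go to stdout.
report() {
  echo "progress: 1 of 1 files" >&2
  echo "d41d8cd98f00b204e9800998ecf8427e  empty-file"
}
```

With `report > saved`, only the checksum line lands in the file and the progress line stays on the terminal.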

Checking iso files

$ md5summer "*.iso"



                    number of files = 10

                    total file size = 7112491008 B ~ 6783 MiB



28,0MiB 0:00:00 [ 160MiB/s] [>                                             ]  0%

10,0  0:00:00 [ 204k/s] [====================================>] 100%            

-----

7a27fdd46a63ba4375896891826c1c88  debian-live-8.6.0-amd64-lxde-desktop.iso

d70eec28cdbdee7f7aa95fb53b9bfdac  debian-live-8.7.1-amd64-standard.iso

382cfbe621ca446d12871b8945b50d20  debian-live-8.8.0-amd64-standard.iso

44473dfe2ee1aad0f71506f1d5862457  debian-live-8.8.0-i386-standard.iso

f396b3532fa84059e7738c3c1827bada  debian-live-9.3.0-amd64-cinnamon.iso

8f6def28ae7cbefa0a6e59407c884466  debian-live-9.6.0-amd64-cinnamon.iso

90b1815da0a5bf4ee4b00eec2b5d3587  debian-testing-amd64-netinst_2017-07-28.iso

8f75074ab98e166b7469299d3e459ac6  mini-amd64-2016-01-21-daily.iso

e580266fba58eb34b05bf6e13f51a047  mini-jessie-32.iso

646c109a9a16c0527ce1c7afa922e2ed  mini-jessie-64.iso
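The header lines come from a first pass that counts the matching files and adds up their sizes; the MiB figure is an integer division by 2**20, so 7112491008 B gives 6783 MiB. A minimal sketch of that pass follows. The function name `count_and_size` is illustrative, and it reads sizes with `wc -c` instead of the `find -printf "%s"` call used in the script.

```shell
#!/bin/bash
# Illustrative helper, not part of md5summer: count the regular files that an
# unquoted pattern expands to and add up their sizes in bytes.
count_and_size() {
  local pattern=$1 files=0 bytes=0 size f
  for f in $pattern; do          # unquoted on purpose: the shell expands it
    if [ -f "$f" ]; then
      size=$(wc -c < "$f")
      files=$((files + 1))
      bytes=$((bytes + size))
    fi
  done
  # prints: number of files, total bytes, total MiB (rounded down)
  echo "$files $bytes $((bytes / 2**20))"
}
```

As in the script, the pattern must be quoted on the command line, for example `count_and_size '*.iso'`, so that it is expanded inside the function rather than by the calling shell.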

edited Jan 20 at 17:04

answered Jan 20 at 3:21

sudodus

1,32016



