Using pv with md5sum












7















I used md5sum with pv to check 4 GiB of files that are in the same directory:



md5sum dir/* | pv -s 4g | sort


The command completes successfully in about 28 seconds, but pv's output is all wrong. This is the sort of output that is displayed throughout:



219 B 0:00:07 [ 125 B/s ] [>                                ]  0% ETA 1668:01:09:02


It's the same without the -s 4g and | sort as well. I've also tried it with different files.



I've tried using pv with cat and the output was fine, so the problem seems to be caused by md5sum.




























  • 1





    It's likely a buffering issue. That is, the output from md5sum is not line-buffered and won't arrive at pv until the process is done or has produced enough data to fill the output buffer. I can't see an option in the md5sum manual to make it line-buffered. Or, you are misunderstanding what is happening, which is that the data sent through pv is only the checksums (and filenames). Also pv does not know how much data to expect, so it can't say how much is left.

    – Kusalananda
    Jan 19 at 16:42













  • It seems like only the checksums and filenames are going through pv (but this doesn't seem to affect anyone else?). Is there a way to make all of the file data go through pv?

    – EmmaV
    Jan 19 at 16:49











  • The issue with that is that you would lose the filename. Think of pv as a "fancy cat". Using cat file | md5sum, you would get the MD5 hash for a single file, but md5sum has no way of tagging the result with a filename.

    – Kusalananda
    Jan 19 at 16:51






  • 1





    You are using pv to rate the output of md5sum (which is a few bytes) and not md5sum's own progress of reading the files themselves. Maybe this answer is related: unix.stackexchange.com/q/16826/30851 (on second thought, maybe not - it's about textfiles...)

    – frostschutz
    Jan 19 at 17:05








  • 2





    Since you are not feeding 4 GiB of data down the pipe, but just the output of md5sum for a number of files, changing the -s 4g option so that it reflects an estimate of the size of md5sum's output, e.g. -s 512, should be a step in the right direction.

    – ozzy
    Jan 19 at 17:12

























pipe hashsum pv
















asked Jan 19 at 16:29









EmmaV




















5 Answers


















9














pv is a "fancy cat", which means that you may use pv in most situations where you would use cat.



Using cat with md5sum, you can compute the MD5 checksum of a single file with



cat file | md5sum


or, with pv,



pv file | md5sum


Unfortunately though, this does not allow md5sum to insert the filename into its output properly.



Now, fortunately, pv is a really fancy cat, and on some systems (Linux), it's able to watch the data being passed through another process. This is done by using its -d option with the process ID of that other process.



This means that you can do things like



md5sum dir/* | sort >sums &
sleep 1
pv -d "$(pgrep -n md5sum)"


This would allow pv to watch the md5sum process. The sleep is there to allow md5sum, which is running in the background, to properly start. pgrep -n md5sum would return the PID of the most recently started md5sum process that you own. pv will exit as soon as the process that it is watching terminates.



I've tested this particular way of running pv a few times and it seems to generally work well, but sometimes it seems to stop outputting anything as md5sum switches to the next file. Sometimes, it seems to spawn spurious background tasks in the shell.



It would probably be safest to run it as



md5sum dir/* >sums &
sleep 1
pv -W -d "$!"
sort -o sums sums


The -W option will cause pv to wait until there's actual data being transferred, although this also does not always seem to work reliably.
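The second variant can be wrapped in a small function so that the PID is captured directly from $! and md5sum's exit status is not lost. This is only a sketch: the name sum_watched and its argument order are made up for illustration, it assumes a pv build that supports -d (Linux only), and it simply skips the progress display when pv is missing or when md5sum has already finished.

```shell
# Hypothetical helper: sum_watched OUTFILE FILE...
# Checksums the files in the background and lets pv watch the md5sum process.
sum_watched() {
    out=$1
    shift
    md5sum -- "$@" > "$out" &
    pid=$!                      # PID of the background md5sum, no pgrep needed

    # Assumption: pv supports -d. Errors are ignored on purpose, for example
    # when md5sum has already exited by the time pv starts.
    if command -v pv >/dev/null 2>&1; then
        pv -d "$pid" || true
    fi

    wait "$pid"                 # collect md5sum's exit status
    status=$?
    sort -o "$out" "$out"
    return "$status"
}
```

It would be called as sum_watched sums dir/*, after which the sorted checksums are in the file sums.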
































  • The need for sleep is somewhat surprising!

    – Stephen Kitt
    Jan 19 at 17:43











  • @StephenKitt That's the only way I could get it to behave somewhat predictably. I don't really know what it does, but whatever it is, it seems to be possible for it to do it "too early", and the progress meter won't show at all.

    – Kusalananda
    Jan 19 at 17:45



















5














The data that you are feeding through the pipe is not the data of the files that md5sum is processing, but the md5sum output, which, for every file, consists of one line comprising the MD5 hash, two spaces, and the file name. Since we know this in advance, we can inform pv accordingly, so that it can display an accurate progress indicator. There are two ways of doing so.



The first, preferred method (suggested by frostschutz) makes use of the fact that md5sum generates one line per processed file, and the fact that pv has a line mode that counts lines rather than bytes. In this mode pv will only move the progress bar when it encounters a newline in the throughput, i.e. per file finished by md5sum. In Bash, this first method can look like this:



set -- *.iso; md5sum "$@" | pv --line-mode -s $# | sort


The set builtin is used to set the positional parameters to the files to be processed (the *.iso shell pattern is expanded by the shell). md5sum is then told to process these files ($@ expands to the positional parameters), and pv in line mode will move the progress indicator each time a file has been processed / a line is output by md5sum. Notably, pv is informed of the total number of lines it can expect (-s $#), as the special shell parameter $# expands to the number of positional arguments.
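The line-mode approach can also be written as a function, which avoids overwriting the positional parameters of the calling shell. This is a sketch under assumptions: the function name sum_with_file_progress is invented here, and the fallback to plain md5sum when pv is not installed is an addition, not part of the original command.

```shell
# Hypothetical helper: checksum every argument, with a progress bar that
# advances by one step per finished file.
sum_with_file_progress() {
    if command -v pv >/dev/null 2>&1; then
        # "$#" is the number of files, so pv expects exactly that many lines.
        md5sum -- "$@" | pv --line-mode --size "$#"
    else
        # Assumption: without pv, just print the checksums with no progress bar.
        md5sum -- "$@"
    fi
}
```

Used as sum_with_file_progress *.iso | sort, the progress bar is drawn on standard error while the checksums go through the pipe to sort.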



The second method is byte-based rather than line-based. With md5sum this is unnecessarily complicated, but some other program may produce, for instance, continuous data rather than lines, and then this approach may be more practical. I illustrate it with md5sum anyway. The idea is to calculate the amount of data that md5sum (or some other program) will produce, and use this to inform pv. In Bash, this could look as follows:



os=$(( $( ls -1 | wc -c ) + $( ls -1 | wc -l ) * 34 ))
md5sum * | pv -s $os | sort


The first line calculates the output size (os) estimate: the first term is the number of bytes necessary for encoding the filenames (incl. newline), the second term the number of bytes used for encoding the MD5-hashes (32 bytes each), plus 2 spaces. In the second line, we tell pv that the expected amount of data is os bytes, so that it can show an accurate progress indicator leading up to 100% (which indicator is updated per finished md5summed file).
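The same estimate can be computed without parsing the output of ls. The following is a sketch only: md5_output_size is a made-up name, and it assumes plain ASCII file names without backslashes or newlines (md5sum escapes those, which changes the length, and ${#name} counts characters rather than bytes).

```shell
# Hypothetical helper: print the number of bytes md5sum is expected to write
# for the given file names.
md5_output_size() {
    total=0
    for name in "$@"; do
        # 32 hex digits + 2 separator characters + file name + 1 newline
        total=$(( total + 32 + 2 + ${#name} + 1 ))
    done
    printf '%s\n' "$total"
}
```

With that in place, the pipeline could be written as md5sum -- * | pv -s "$(md5_output_size *)" | sort.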



Obviously, both methods are only practical when multiple files are to be processed. Also, since the size of md5sum's output is unrelated to the time md5sum spends crunching the underlying data, the progress indicator may be somewhat misleading. For example, in the second method, the file with the shortest name yields the smallest progress update, even though it may actually be the largest file. Then again, if all files have similar sizes and names, this shouldn't matter much.



























  • 2





    It's a very nice idea to calculate progress based on md5sum output (even though the progress would indicate how many files are still left, and not - how large or how long that would take). However it should not require parsing ls. pv supports --line-mode so set -- *.iso; md5sum "$@" | pv --line-mode -s $# | sort might be equivalent and still work if you replace md5sum with sha512sum or otherwise.

    – frostschutz
    Jan 19 at 19:39











  • @frostschutz You are undeniably right :-) This is a nicer, cleaner solution.

    – ozzy
    Jan 19 at 20:48



















3














Here's a dirty hack to get progress per file:



for f in iso/*
do
    pv "$f" | (
        cat > /dev/null &
        md5sum "$f"
        wait
    )
done


What it looks like:



4.15GiB 0:00:32 [ 130MiB/s] [================================>] 100%            
0db0b36fc7bad7b50835f68c369e854c iso/KNOPPIX_V7.6.1DVD-2016-01-16-EN.iso
792MiB 0:00:06 [ 130MiB/s] [================================>] 100%
97537db63e61d20a5cb71d29145b2937 iso/archlinux-2016.10.01-dual.iso
843MiB 0:00:06 [ 129MiB/s] [================================>] 100%
1b5dc31e038499b8409f7d4d720e3eba iso/lubuntu-16.04-desktop-i386.iso
259MiB 0:00:02 [ 130MiB/s] [=========> ] 30% ETA 0:00:04
...


Now, this makes several assumptions. Firstly, that reading data is slower than hashing it. Secondly, that the OS will cache the I/O, so the data won't be (physically) read twice even though pv and md5sum are completely independent readers.



The nice thing about such a dirty, dirty hack is that you can easily adapt it to make a progress bar across all the data, not just one file. And still do weird stuff like sort the output afterwards.



pv iso/* | (
    cat > /dev/null &
    md5sum iso/* | sort
    wait
)


What it looks like (ongoing):



15.0GiB 0:01:47 [ 131MiB/s] [===========================>      ] 83% ETA 0:00:21


What it looks like (finished):



18.0GiB 0:02:11 [ 140MiB/s] [================================>] 100%            
0db0b36fc7bad7b50835f68c369e854c iso/KNOPPIX_V7.6.1DVD-2016-01-16-EN.iso
155603390e65f2a8341328be3cb63875 iso/systemrescuecd-x86-4.2.0.iso
1b5dc31e038499b8409f7d4d720e3eba iso/lubuntu-16.04-desktop-i386.iso
1b6ed6ff8d399f53adadfafb20fb0d71 iso/systemrescuecd-x86-4.4.1.iso
25715326d7096c50f7ea126ac20eabfd iso/openSUSE-13.2-KDE-Live-i686.iso
...


Now, that's for the hacks. Check other answers for proper solutions. ;-)






























  • Reading the data is required for hashing it, so hashing the data is almost certainly slower than just reading it. Also, you force reading the data twice for each file (once with pv and once with md5sum).

    – Kusalananda
    Jan 19 at 18:52











  • @Kusalananda ...and that's why I called it a hack! pv < /dev/zero | md5sum -> 637MiB/s on my machine. The ISO files I tested with were on a USB3 stick, 140MiB/s is about the max read speed. So md5sum can hash data faster than read it. Yes it reads data twice, but that's pedantic - thanks to OS caching, the USB stick still reads it only once; and running this hack is (in my case) not any slower than running md5sum by itself without progress bar.

    – frostschutz
    Jan 19 at 19:07











  • ...and yes, it can go wrong. If you were to do this on a slow machine with a very fast storage medium, pv would run way ahead of md5sum and data would probably end up being read twice for real, or in any case, the progress bar would not be in sync with md5sum at all.

    – frostschutz
    Jan 19 at 19:12





















2
















As already pointed out in comments and other answers:




  1. You are piping into pv only md5sum's output: checksums and file names; thus, pv's progress bar is not able to show how much data md5sum is reading.

  2. A size of 4 GB is of course far too large for that. Also, providing pv with the size of the file(s) you are piping into it (manually, with -s) is inconvenient.
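The first point is easy to check by counting the bytes that md5sum actually writes to the pipe. The snippet below is a small demonstration under assumptions: the temporary directory and the file name big.bin are invented for the example, and it relies on GNU head and md5sum.

```shell
# Assumption: a throwaway 1 MiB file stands in for the real data.
tmp=$(mktemp -d)
head -c 1048576 /dev/zero > "$tmp/big.bin"

# Count the bytes md5sum writes to the pipe, which is all pv would ever see:
# 32 hex digits + 2 separator characters + 7 characters of name + 1 newline = 42.
bytes=$(cd "$tmp" && md5sum big.bin | wc -c)
echo "$bytes bytes of md5sum output for a 1048576-byte file"

rm -rf "$tmp"
```

However large the input file is, the pipe only carries those few dozen bytes per file, which is why the progress bar in the question never leaves 0%.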


Piping the content of your files into pv and then into md5sum will give you a progress bar, but file names would be lost.



This code is a not-so-elegant way to have both a meaningful progress bar and file names with checksums:



#!/bin/sh

for file in "$@"; do
    pv -- "$file" |
        md5sum |
        sed 's/-$//' |
        printf '%s%s\n' "$(cat -)" "$file"
done


The script is meant to be invoked as:



./script dir/*


You can of course declare it as a function, to avoid having to type its path to call it (or adding it to your PATH):



function pvsum () {
    for file in "$@"; do
        pv -- "$file" | md5sum | sed 's/-$//' | printf '%s%s\n' "$(cat -)" "$file"
    done
}


This way, the command pvsum dir/* | sort will be equivalent to your md5sum dir/* | pv -s <size> | sort.



Its output:



$ ./testscript testdir/*
4.00GiB 0:00:09 [ 446MiB/s] [==============================>] 100%
9dab5f8add1f699bca108f99e5fa5342 testdir/file1
1.00GiB 0:00:02 [ 447MiB/s] [==============================>] 100%
06a738a71e3fd3119922bdac259fe29a testdir/file2


What it does:




  • It loops over the given files and, for each:


    • Pipes the file from pv into md5sum, showing the default progress bar.


    • sed is used to remove the - printed by md5sum (which is reading from standard input); this also attempts to make the output suitable for being consumed by md5sum -c (thanks to frostschutz for pointing this out; see footnote 1).

    • Prints the checksum followed by the file name on standard output.
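The same steps can be sketched without sed and cat by capturing md5sum's output in a variable and cutting it at the first space. This is an illustrative variant, not the script above: the name pvsum_sketch is invented, the fallback for a missing pv is an addition, and file names with newlines or backslashes are not handled.

```shell
# Hypothetical variant: per-file progress bar plus "hash  name" output lines.
pvsum_sketch() {
    for file in "$@"; do
        if command -v pv >/dev/null 2>&1; then
            # pv draws its progress bar on stderr; only the hash line is captured.
            sum=$(pv -- "$file" | md5sum)
        else
            # Assumption: without pv, hash the file with no progress display.
            sum=$(md5sum < "$file")
        fi
        # md5sum prints "<hash>  -" for standard input; keep only the hash.
        printf '%s  %s\n' "${sum%% *}" "$file"
    done
}
```

For ordinary file names the lines it prints have the same form as those of md5sum itself, so pvsum_sketch dir/* | sort can be used in the same way.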




About sort:



I'm not sure about your expected results, so I have just ignored it. Since pv writes its progress bar to standard error, piping everything into sort will detach pv's output from md5sum's output.

Anyway, you can just append | sort after done in the code above and check if the result is fine to you.





1 Note that the output from the code shown above will not be suitable for md5sum -c if file names include newlines. Handling newlines is possible, but some versions of md5sum behave differently in this respect (see, for instance, answers to this question), making a general solution not easy (and out of the scope of this answer).



Assuming a recent version of md5sum, an attempt at solving this issue could be:



for file in "$@"; do
    pv -- "$file" |
        md5sum |
        sed 's/-$//' |
        printf '%s%s\n' "$(cat -)" "$file" |
        sed '$! N; s/\\/\\\\/g; s/\n/\\n/g; t x; q; :x s/^/\\/;'
done


Where the only addition, the final sed, will:




  • Put the whole input (checksum and file name) in pattern space, since it may contain newlines: $! matches any line except for the last one; N; appends a newline and the next line to the pattern space.

  • Escape with a backslash (\) any backslash found.

  • Replace with \n any newline found.

  • Only if at least a backslash or newline has been replaced (t x;: branch to label x), a backslash is added at the beginning of the checksum to signal md5sum -c that something has to be unescaped; otherwise just quit.







































    0














    I have also enjoyed taming the 'fancy cat', pv, for md5sum :-)




    • I think my shellscript is rather stable now

    • There is a usage output, if you do not enter the pattern correctly.

    • It works with wild cards, but does not recurse into subdirectories

    • You can enter more than one pattern, for example ".* *"

    • There is a verbosity switch that turns on checking the md5sums ... OK

    • You can redirect the relevant output into a file; the process view output of pv will stay on the {screen/terminal window}

    • There are two pv processes in a for loop, one global and one for each file, the global pv 'only counts the files', and the other one measures the speed and amount of data transferred

    • ANSI escape sequences are used to keep the process view in a stable position


    I use the name md5summer, make the shellscript executable and put it in a directory in PATH (my ~/bin directory, you may prefer /usr/local/bin).



    #!/bin/bash

    # date sign comment
    # 20190119 sudodus created md5summer version 1.0

    if [ "$1" == "-v" ]
    then
        verbose=true
        shift
    else
        verbose=false
    fi
    if [ $# -ne 1 ]
    then
        echo "Usage: $0 [-v] <pattern>"
        echo "Example: $0 '*.iso' # notice the quotes"
        echo " $0 -v '*.iso' # verbose"
        exit
    fi
    # $1 is deliberately unquoted so that the shell expands the pattern
    tmpstr=$(find $1 -maxdepth 0 -type f 2> /dev/null)
    if [ "$tmpstr" == "" ]
    then
        echo "No such file '$1'. Try another pattern!"
        exit
    fi

    tmpdir=$(mktemp -d)
    tmpfil="$tmpdir/fil1"
    tmpfi2="$tmpdir/fil2"
    # ANSI escape sequences used to keep the two pv lines in place
    resetvid="\033[0m"
    prev2line="\033[2F"
    next2line="\033[2E"

    sln=1
    cln=0
    cnt=0
    # first pass: count the files and add up their sizes
    for i in $1
    do
        if test -f "$i"
        then
            cln=$((cln+1))
            tmp=$(find -L "$i" -printf "%s")
            cnt=$((cnt+tmp))
        fi
    done
    echo "
    number of files = $cln
    total file size = $cnt B ~ $(($cnt/2**20)) MiB
    "
    # second pass: one pv for the data, one pv counting finished files
    for i in $1
    do
        if test -f "$i"
        then
            tmpnam=$(echo -n "$i")
            tmpsum=$(< "$i" pv -ptrbs "$cnt" | md5sum)
            sleep 0.05
            echo "$sln" | pv -ls "$cln" > /dev/null
            sleep 0.05
            sln="$sln
    $i"
            sleep 0.05
            # fixed format string, so that % in a file name is printed literally
            printf '%s%s\n' "${tmpsum/-}" "$tmpnam" >> "$tmpfil"
            echo -ne "$prev2line" > /dev/stderr
        fi
    done

    sync
    sleep 0.1
    echo -ne "$next2line" > /dev/stderr

    echo "-----"
    if $verbose
    then
        sort -k2 "$tmpfil" | tee "$tmpfi2" | md5sum -c
        echo "-----"
        cat "$tmpfi2"
    else
        sort -k2 "$tmpfil"
    fi
    sleep 0.5
    sync
    rm -r "$tmpdir"


    Demo example



    Usage



    $ md5summer 
    Usage: /home/sudodus/bin/md5summer [-v] <pattern>
    Example: /home/sudodus/bin/md5summer '*.iso' # notice the quotes
    /home/sudodus/bin/md5summer -v '*.iso' # verbose


    I tested in this directory



    $ ls -1a
    .
    ..
    'filename with spaces'
    md5summer
    md5summer1
    md5summer2
    subdir
    .ttt
    zenity-info-message.png


    Normal usage plus pattern to see hidden files



    $ md5summer ".* *"

    number of files = 6
    total file size = 12649 B ~ 0 MiB

    8,32KiB 0:00:00 [ 156MiB/s] [=============================> ] 67%
    6,00 0:00:00 [ 133k/s] [====================================>] 100%
    -----
    184d0995cc8b6d8070f89f15caee35ce filename with spaces
    28227139997996c7838f07cd4c630ffc md5summer
    3383b86a0753e486215280f0baf94399 md5summer1
    28227139997996c7838f07cd4c630ffc md5summer2
    31cd03f64a466e680e9c22fef4bcf14b .ttt
    670b8db45e57723b5f1b8a63399cdfa1 zenity-info-message.png


    Verbose output plus pattern to see hidden files



    $ md5summer -v ".* *"

    number of files = 6
    total file size = 12649 B ~ 0 MiB

    8,32KiB 0:00:00 [ 184MiB/s] [=============================> ] 67%
    6,00 0:00:00 [ 133k/s] [====================================>] 100%
    -----
    filename with spaces: OK
    md5summer: OK
    md5summer1: OK
    md5summer2: OK
    .ttt: OK
    zenity-info-message.png: OK
    -----
    184d0995cc8b6d8070f89f15caee35ce filename with spaces
    28227139997996c7838f07cd4c630ffc md5summer
    3383b86a0753e486215280f0baf94399 md5summer1
    28227139997996c7838f07cd4c630ffc md5summer2
    31cd03f64a466e680e9c22fef4bcf14b .ttt
    670b8db45e57723b5f1b8a63399cdfa1 zenity-info-message.png


    Redirection to a file, first the screen output



    $ md5summer ".* *" > subdir/save
    8,32KiB 0:00:00 [ 180MiB/s] [=============================> ] 67%
    6,00 0:00:00 [ 162k/s] [====================================>] 100%


    and then the saved output



    $ cat subdir/save 

    number of files = 6
    total file size = 12649 B ~ 0 MiB

    -----
    184d0995cc8b6d8070f89f15caee35ce filename with spaces
    28227139997996c7838f07cd4c630ffc md5summer
    3383b86a0753e486215280f0baf94399 md5summer1
    28227139997996c7838f07cd4c630ffc md5summer2
    31cd03f64a466e680e9c22fef4bcf14b .ttt
    670b8db45e57723b5f1b8a63399cdfa1 zenity-info-message.png


    Checking iso files



    $ md5summer "*.iso"

    number of files = 10
    total file size = 7112491008 B ~ 6783 MiB

    28,0MiB 0:00:00 [ 160MiB/s] [> ] 0%
    10,0 0:00:00 [ 204k/s] [====================================>] 100%
    -----
    7a27fdd46a63ba4375896891826c1c88 debian-live-8.6.0-amd64-lxde-desktop.iso
    d70eec28cdbdee7f7aa95fb53b9bfdac debian-live-8.7.1-amd64-standard.iso
    382cfbe621ca446d12871b8945b50d20 debian-live-8.8.0-amd64-standard.iso
    44473dfe2ee1aad0f71506f1d5862457 debian-live-8.8.0-i386-standard.iso
    f396b3532fa84059e7738c3c1827bada debian-live-9.3.0-amd64-cinnamon.iso
    8f6def28ae7cbefa0a6e59407c884466 debian-live-9.6.0-amd64-cinnamon.iso
    90b1815da0a5bf4ee4b00eec2b5d3587 debian-testing-amd64-netinst_2017-07-28.iso
    8f75074ab98e166b7469299d3e459ac6 mini-amd64-2016-01-21-daily.iso
    e580266fba58eb34b05bf6e13f51a047 mini-jessie-32.iso
    646c109a9a16c0527ce1c7afa922e2ed mini-jessie-64.iso





      5 Answers


      9














      pv is a "fancy cat", which means that you may use pv in most situations where you would use cat.



      Using cat with md5sum, you can compute the MD5 checksum of a single file with



      cat file | md5sum


      or, with pv,



      pv file | md5sum


      Unfortunately though, this does not allow md5sum to insert the filename into its output properly.
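
      One possible way around the missing filename, offered here as a sketch rather than as part of the answer above, is to hash one file at a time through pv and print the name yourself. The function name sum_with_progress is made up for illustration. It assumes that md5sum prints the hash as the first space-separated field and that pv writes its progress display to standard error, so the display stays out of the captured output.

```shell
# Hypothetical helper: hash each file through pv, then re-attach the name.
sum_with_progress() {
    for f in "$@"; do
        # pv draws its meter on stderr; only the hash is captured from stdout.
        sum=$(pv "$f" | md5sum | cut -d ' ' -f 1) || return
        # Same layout as md5sum's own output: hash, two spaces, file name.
        printf '%s  %s\n' "$sum" "$f"
    done
}
```

      It could be called as sum_with_progress dir/* | sort. Note that this shows one progress bar per file, not a single bar for all of the data.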



      Now, fortunately, pv is a really fancy cat, and on some systems (Linux), it's able to watch the data being passed through another process. This is done by using its -d option with the process ID of that other process.



      This means that you can do things like



      md5sum dir/* | sort >sums &
      sleep 1
      pv -d "$(pgrep -n md5sum)"


      This would allow pv to watch the md5sum process. The sleep is there to allow md5sum, which is running in the background, to properly start. pgrep -n md5sum would return the PID of the most recently started md5sum process that you own. pv will exit as soon as the process that it is watching terminates.
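
      The example uses pgrep rather than "$!" because, for a background pipeline, "$!" holds the PID of the pipeline's last command (sort in this case), not that of md5sum. The following sketch checks that behaviour in your own shell; the function name and the temporary file are purely illustrative.

```shell
# Hypothetical check: succeeds if "$!" after a background pipeline is the
# PID of the pipeline's LAST command.
last_pid_is_final_stage() {
    tmp=$(mktemp) || return 1
    # The final stage records its own PID ($$ inside the child sh).
    true | sh -c 'echo "$$" >"$1"' sh "$tmp" &
    bg=$!
    wait "$bg"
    recorded=$(cat "$tmp")
    rm -f "$tmp"
    [ "$recorded" = "$bg" ]
}
```

      Running last_pid_is_final_stage && echo yes should print yes in a POSIX-style shell, which is why a pipeline ending in sort needs pgrep to find md5sum.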



      I've tested this particular way of running pv a few times and it seems to generally work well, but sometimes it seems to stop outputting anything as md5sum switches to the next file. Sometimes, it seems to spawn spurious background tasks in the shell.



      It would probably be safest to run it as



      md5sum dir/* >sums &
      sleep 1
      pv -W -d "$!"
      sort -o sums sums


      The -W option will cause pv to wait until there's actual data being transferred, although this, too, does not always seem to work reliably.








      • The need for sleep is somewhat surprising!

        – Stephen Kitt
        Jan 19 at 17:43











      • @StephenKitt That's the only way I could get it to behave somewhat predictably. I don't really know what it does, but whatever it is, it seems to be possible for it to do it "too early", and the progress meter won't show at all.

        – Kusalananda
        Jan 19 at 17:45


















      edited Jan 19 at 17:32

      answered Jan 19 at 17:14

      Kusalananda


























      5














      The data that you are feeding through the pipe is not the data of the files that md5sum is processing, but instead the md5sum output, which, for every file, consists of one line comprising: the MD5 hash, two spaces, and the file name. Since we know this in advance, we can inform pv accordingly, so as to enable it to display an accurate progress indicator. There are two ways of doing so.



      The first, preferred method (suggested by frostschutz) makes use of the fact that md5sum generates one line per processed file, and the fact that pv has a line mode that counts lines rather than bytes. In this mode pv will only move the progress bar when it encounters a newline in the throughput, i.e. per file finished by md5sum. In Bash, this first method can look like this:



      set -- *.iso; md5sum "$@" | pv --line-mode -s $# | sort


      The set builtin is used to set the positional parameters to the files to be processed (the *.iso shell pattern is expanded by the shell). md5sum is then told to process these files ($@ expands to the positional parameters), and pv in line mode will move the progress indicator each time a file has been processed / a line is output by md5sum. Notably, pv is informed of the total number of lines it can expect (-s $#), as the special shell parameter $# expands to the number of positional arguments.
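
      Since set -- overwrites the positional parameters of the current shell, it may be convenient to wrap the same pipeline in a function, where "$@" and "$#" refer to the function's own arguments. This is only a sketch: the name md5_progress is invented here, and it assumes a pv that supports --line-mode, as used above.

```shell
# Hypothetical wrapper around the line-mode pipeline shown above.
md5_progress() {
    # md5sum prints one line per file, so the expected line count is "$#".
    md5sum "$@" | pv --line-mode -s "$#" | sort
}
```

      It could then be called as md5_progress *.iso, leaving the caller's positional parameters untouched.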



      The second method is not line-based but byte-based. With md5sum this is unnecessarily complicated, but some other program may not produce lines but, for instance, continuous data, and then this approach may be more practical. I illustrate it with md5sum though. The idea is to calculate the amount of data that md5sum (or some other program) will produce, and use this to inform pv. In Bash, this could look as follows:



      os=$(( $( ls -1 | wc -c ) + $( ls -1 | wc -l ) * 34 ))
      md5sum * | pv -s $os | sort


      The first line calculates an estimate of the output size (os): the first term is the number of bytes needed for the filenames (including newlines), and the second term is the number of bytes used for the MD5 hashes (32 bytes each) plus 2 spaces per line. In the second line, we tell pv that the expected amount of data is os bytes, so that it can show an accurate progress indicator leading up to 100% (the indicator is updated each time md5sum finishes a file).
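
      The same estimate can also be computed from the shell's own expansion of the file names, without parsing the output of ls. The following is a sketch under stated assumptions: the function name expected_md5sum_bytes is made up, and the arithmetic only holds for ordinary file names, because GNU md5sum escapes names that contain a backslash or a newline, which changes the line length.

```shell
# Hypothetical helper: predict how many bytes md5sum will print for the
# given files (32 hex digits + 2 separator characters + name + newline each).
expected_md5sum_bytes() {
    [ "$#" -gt 0 ] || { echo 0; return; }
    # printf emits every name followed by a newline; wc -c counts the bytes.
    name_bytes=$(printf '%s\n' "$@" | wc -c)
    echo $(( name_bytes + $# * 34 ))
}
```

      A possible use would be md5sum * | pv -s "$(expected_md5sum_bytes *)" | sort.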



      Obviously, both methods are only practical when multiple files are to be processed. Also note that, since the amount of output from md5sum is unrelated to the time md5sum spends crunching the underlying data, the progress indicator may be somewhat misleading. E.g., in the second method, the file with the shortest name will yield the smallest progress update, even though it may actually be the biggest in size. Then again, if all files have similar sizes and names, this shouldn't matter much.







      • 2





        It's a very nice idea to calculate progress based on md5sum output (even though the progress would indicate how many files are still left, and not - how large or how long that would take). However it should not require parsing ls. pv supports --line-mode so set -- *.iso; md5sum "$@" | pv --line-mode -s $# | sort might be equivalent and still work if you replace md5sum with sha512sum or otherwise.

        – frostschutz
        Jan 19 at 19:39











      • @frostschutz You are undeniably right :-) This is a nicer, cleaner solution.

        – ozzy
        Jan 19 at 20:48

















      edited Jan 20 at 9:14

      answered Jan 19 at 17:48

      ozzy



















      3














      Here's a dirty hack to get progress per file:



      for f in iso/*
      do
          pv "$f" | (
              cat > /dev/null &
              md5sum "$f"
              wait
          )
      done


      What it looks like:



      4.15GiB 0:00:32 [ 130MiB/s] [================================>] 100%            
      0db0b36fc7bad7b50835f68c369e854c iso/KNOPPIX_V7.6.1DVD-2016-01-16-EN.iso
      792MiB 0:00:06 [ 130MiB/s] [================================>] 100%
      97537db63e61d20a5cb71d29145b2937 iso/archlinux-2016.10.01-dual.iso
      843MiB 0:00:06 [ 129MiB/s] [================================>] 100%
      1b5dc31e038499b8409f7d4d720e3eba iso/lubuntu-16.04-desktop-i386.iso
      259MiB 0:00:02 [ 130MiB/s] [=========> ] 30% ETA 0:00:04
      ...


      Now, this makes several assumptions. Firstly, that reading data is slower than hashing it. Secondly, that the OS will cache the I/O so data won't be (physically) read twice even though pv and md5sum are completely independent readers.



      The nice thing about such a dirty, dirty hack is that you can easily adapt it to make a progress bar across all the data, not just one file. And still do weird stuff like sort the output afterwards.



      pv iso/* | (
          cat > /dev/null &
          md5sum iso/* | sort
          wait
      )


      What it looks like (ongoing):



      15.0GiB 0:01:47 [ 131MiB/s] [===========================>      ] 83% ETA 0:00:21


      What it looks like (finished):



      18.0GiB 0:02:11 [ 140MiB/s] [================================>] 100%            
      0db0b36fc7bad7b50835f68c369e854c iso/KNOPPIX_V7.6.1DVD-2016-01-16-EN.iso
      155603390e65f2a8341328be3cb63875 iso/systemrescuecd-x86-4.2.0.iso
      1b5dc31e038499b8409f7d4d720e3eba iso/lubuntu-16.04-desktop-i386.iso
      1b6ed6ff8d399f53adadfafb20fb0d71 iso/systemrescuecd-x86-4.4.1.iso
      25715326d7096c50f7ea126ac20eabfd iso/openSUSE-13.2-KDE-Live-i686.iso
      ...


      Now, that's for the hacks. Check other answers for proper solutions. ;-)







      • Reading the data would be required for hashing it, so hashing the data is almost certainly slower than just reading it. Also, you force reading the data twice for each file (once with pv and once with md5sum).

        – Kusalananda
        Jan 19 at 18:52











      • @Kusalananda ...and that's why I called it a hack! pv < /dev/zero | md5sum -> 637MiB/s on my machine. The ISO files I tested with were on a USB3 stick, 140MiB/s is about the max read speed. So md5sum can hash data faster than read it. Yes it reads data twice, but that's pedantic - thanks to OS caching, the USB stick still reads it only once; and running this hack is (in my case) not any slower than running md5sum by itself without progress bar.

        – frostschutz
        Jan 19 at 19:07











      • ...and yes, it can go wrong. If you were to do this on a slow machine with a very fast storage medium, pv would run way ahead of md5sum and data would probably end up being read twice for real, or in any case, the progress bar would not be in sync with md5sum at all.

        – frostschutz
        Jan 19 at 19:12



















      Here's a dirty hack to get progress per file:



      for f in iso/*
      do
          pv "$f" | (
              cat > /dev/null &
              md5sum "$f"
              wait
          )
      done


      What it looks like:



      4.15GiB 0:00:32 [ 130MiB/s] [================================>] 100%            
      0db0b36fc7bad7b50835f68c369e854c iso/KNOPPIX_V7.6.1DVD-2016-01-16-EN.iso
      792MiB 0:00:06 [ 130MiB/s] [================================>] 100%
      97537db63e61d20a5cb71d29145b2937 iso/archlinux-2016.10.01-dual.iso
      843MiB 0:00:06 [ 129MiB/s] [================================>] 100%
      1b5dc31e038499b8409f7d4d720e3eba iso/lubuntu-16.04-desktop-i386.iso
      259MiB 0:00:02 [ 130MiB/s] [=========> ] 30% ETA 0:00:04
      ...


      Now, this makes two assumptions. Firstly, that reading the data is slower than hashing it. Secondly, that the OS will cache the I/O, so the data won't be (physically) read twice even though pv and md5sum are completely independent readers.



      The nice thing about such a dirty, dirty hack is that you can easily adapt it to make a progress bar across all the data, not just one file. And still do weird stuff like sort the output afterwards.



      pv iso/* | (
          cat > /dev/null &
          md5sum iso/* | sort
          wait
      )


      What it looks like (ongoing):



      15.0GiB 0:01:47 [ 131MiB/s] [===========================>      ] 83% ETA 0:00:21


      What it looks like (finished):



      18.0GiB 0:02:11 [ 140MiB/s] [================================>] 100%            
      0db0b36fc7bad7b50835f68c369e854c iso/KNOPPIX_V7.6.1DVD-2016-01-16-EN.iso
      155603390e65f2a8341328be3cb63875 iso/systemrescuecd-x86-4.2.0.iso
      1b5dc31e038499b8409f7d4d720e3eba iso/lubuntu-16.04-desktop-i386.iso
      1b6ed6ff8d399f53adadfafb20fb0d71 iso/systemrescuecd-x86-4.4.1.iso
      25715326d7096c50f7ea126ac20eabfd iso/openSUSE-13.2-KDE-Live-i686.iso
      ...


      Now, that's for the hacks. Check other answers for proper solutions. ;-)

















      answered Jan 19 at 18:47









      frostschutz













      • Reading the data is required for hashing it, so hashing the data is almost certainly slower than just reading it. Also, you force the data to be read twice for each file (once by pv and once by md5sum).

        – Kusalananda
        Jan 19 at 18:52











      • @Kusalananda ...and that's why I called it a hack! pv < /dev/zero | md5sum -> 637MiB/s on my machine. The ISO files I tested with were on a USB3 stick, and 140MiB/s is about its maximum read speed. So md5sum can hash data faster than it can be read. Yes, it reads the data twice, but that's pedantic: thanks to OS caching, the USB stick is still read only once, and running this hack is (in my case) not any slower than running md5sum by itself without a progress bar.

        – frostschutz
        Jan 19 at 19:07











      • ...and yes, it can go wrong. If you were to do this on a slow machine with a very fast storage medium, pv would run way ahead of md5sum and data would probably end up being read twice for real, or in any case, the progress bar would not be in sync with md5sum at all.

        – frostschutz
        Jan 19 at 19:12





















      2
















      As already pointed out in comments and other answers:




      1. You are piping into pv only md5sum's output: checksums and file names; thus, pv's progress bar is not able to show how much data md5sum is reading.

      2. A size of 4 GiB (-s 4g) is of course far too large for that output. Also, having to give pv the size of the data you are piping into it manually, with -s, is inconvenient.
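If a total for -s is ever needed (for example when the data reaches pv on standard input rather than as file arguments), it can be computed instead of typed by hand. This is only a minimal sketch, assuming a POSIX shell; total_size is a hypothetical helper name, not something provided by pv or md5sum:

```shell
# total_size: print the combined size in bytes of the regular files given as arguments.
# (Hypothetical helper, for illustration only.)
total_size() {
    total=0
    for f in "$@"; do
        [ -f "$f" ] || continue      # skip directories and other non-regular files
        size=$(wc -c < "$f")         # byte count of this file
        total=$((total + size))
    done
    printf '%s\n' "$total"
}

# Possible use (not run here):
#   cat dir/* | pv -s "$(total_size dir/*)" | md5sum
```

Note that the pipeline in the final comment produces one checksum for the concatenated data, so it only illustrates how -s could be filled in, not how to keep per-file checksums.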


      Piping the content of your files into pv and then into md5sum will give you a progress bar, but file names would be lost.



      This code is a not-so-elegant way to get both a meaningful progress bar and file names with checksums:



      #!/bin/sh

      for file in "$@"; do
        pv -- "$file" |
        md5sum |
        sed 's/-$//' |
        printf '%s%s\n' "$(cat -)" "$file"
      done


      The script is meant to be invoked as:



      ./script dir/*


      You can of course declare it as a function, to avoid having to type its path to call it (or adding it to your PATH):



      function pvsum () {
        for file in "$@"; do
          pv -- "$file" | md5sum | sed 's/-$//' | printf '%s%s\n' "$(cat -)" "$file"
        done
      }


      This way, the command pvsum dir/* | sort will be equivalent to your md5sum dir/* | pv -s <size> | sort.



      Its output:



      $ ./testscript testdir/*
      4.00GiB 0:00:09 [ 446MiB/s] [==============================>] 100%
      9dab5f8add1f699bca108f99e5fa5342 testdir/file1
      1.00GiB 0:00:02 [ 447MiB/s] [==============================>] 100%
      06a738a71e3fd3119922bdac259fe29a testdir/file2


      What it does:




      • It loops over the given files and, for each:


        • Pipes the file from pv into md5sum, showing the default progress bar.


        • sed is used to remove the - printed by md5sum (which is reading from standard input); this also attempts to make the output suitable for being consumed by md5sum -c (thanks to frostschutz for pointing this out) [1].

        • Prints the checksum followed by the file name on standard output.




      About sort:



      I'm not sure about your expected results, so I have just ignored it. Since pv writes its progress bar to standard error, piping everything into sort will detach pv's output from md5sum's output.

      Anyway, you can just append | sort after done in the code above and check whether the result is fine for you.
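The separation of the two streams can be tried without pv at all. In the sketch below, report is a hypothetical stand-in for the pv | md5sum loop: it writes a progress-like message to standard error and a result line to standard output, so a following sort only ever sees the result lines:

```shell
# report: stand-in for the checksum loop (hypothetical, for illustration only).
# Progress messages go to standard error, result lines to standard output.
report() {
    for name in "$@"; do
        printf 'processing %s\n' "$name" >&2   # plays the role of pv's progress bar
        printf '%s\n' "$name"                  # plays the role of a checksum line
    done
}

# Possible use (not run here):
#   report b a c | sort
# The "processing" lines are written straight to the terminal; only the
# names pass through the pipe and come out sorted.
```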





      [1] Note that the output from the code shown above will not be suitable for md5sum -c if file names include newlines. Handling newlines is possible, but some versions of md5sum behave differently in this respect (see, for instance, answers to this question), making a general solution not easy (and out of the scope of this answer).



      Assuming a recent version of md5sum, an attempt at solving this issue could be:



      for file in "$@"; do
        pv -- "$file" |
        md5sum |
        sed 's/-$//' |
        printf '%s%s\n' "$(cat -)" "$file" |
        sed '$! N; s/\\/\\\\/g; s/\n/\\n/g; t x; q; :x s/^/\\/;'
      done


      Where the only addition, the final sed, will:




      • Put the whole input (checksum and file name) in pattern space, since it may contain newlines: $! matches any line except for the last one; N; appends a newline and the next line to the pattern space.

      • Escape any backslash found with a backslash (\).

      • Replace any newline found with \n.

      • Only if at least a backslash or newline has been replaced (t x;: branch to label x), a backslash is added at the beginning of the checksum to signal md5sum -c that something has to be unescaped; otherwise just quit.
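The escaping steps listed above can also be sketched as a small helper that builds one checksum line from a checksum and a file name. This is a sketch under assumptions, not the code from the answer: md5_line is a hypothetical name, the two spaces between checksum and name assume GNU md5sum's text-mode format, and the sed loop is a different formulation that joins all input lines first, so that names with several newlines are covered too:

```shell
# md5_line: print "checksum  name", escaped the way recent GNU md5sum does it.
# (Hypothetical helper; $1 = checksum, $2 = file name.)
md5_line() {
    sum=$1
    name=$2
    # Join all lines into the pattern space, then escape backslashes first
    # and newlines second, so the backslashes added for newlines stay single.
    escaped=$(printf '%s\n' "$name" |
        sed -e ':a' -e '$!{N;ba' -e '}' -e 's/\\/\\\\/g' -e 's/\n/\\n/g')
    if [ "$escaped" = "$name" ]; then
        printf '%s  %s\n' "$sum" "$name"          # nothing needed escaping
    else
        printf '\\%s  %s\n' "$sum" "$escaped"     # leading backslash marks an escaped name
    fi
}

# Possible use (not run here):
#   md5_line "$(pv -- "$file" | md5sum | sed 's/ .*//')" "$file"
```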






          edited Jan 20 at 18:15

























          answered Jan 19 at 18:23









          fra-san























              0














              I have also enjoyed taming the 'fancy cat', pv, for md5sum :-)




              • I think my shellscript is rather stable now.

              • A usage message is printed if you do not enter the pattern correctly.

              • It works with wildcards, but does not recurse into subdirectories.

              • You can enter more than one pattern, for example ".* *".

              • There is a verbosity switch (-v) that turns on checking of the md5sums (file name: OK).

              • You can redirect the relevant output into a file; the progress output of pv will stay on the screen (terminal window).

              • There are two pv processes in a for loop, one global and one for each file. The global pv 'only counts the files', and the other one measures the speed and amount of data transferred.

              • ANSI escape sequences are used to keep the progress display in a stable position.
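The way a single quoted argument such as ".* *" can stand for several patterns is ordinary shell behaviour: the variable is expanded unquoted, so the shell splits it into words and globs each one. A minimal sketch of just that step, assuming a POSIX shell (list_matches is a hypothetical name, not part of the script):

```shell
# list_matches: print every regular file matched by the pattern(s) in $1, one per line.
# $1 is a single quoted argument such as ".* *"; it is expanded unquoted on purpose,
# so that the shell splits it into patterns and globs each of them.
list_matches() {
    for f in $1; do
        [ -f "$f" ] && printf '%s\n' "$f"   # drops directories and unmatched patterns
    done
    return 0
}

# Possible use (not run here):
#   list_matches ".* *"
```

File names containing spaces survive this, because the results of globbing are not split again; only the pattern string itself is split.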


              I use the name md5summer, make the shellscript executable and put it in a directory in PATH (my ~/bin directory, you may prefer /usr/local/bin).



              #!/bin/bash

              # date      sign     comment
              # 20190119  sudodus  created md5summer version 1.0

              if [ "$1" == "-v" ]
              then
                  verbose=true
                  shift
              else
                  verbose=false
              fi
              if [ $# -ne 1 ]
              then
                  echo "Usage: $0 [-v] <pattern>"
                  echo "Example: $0 '*.iso' # notice the quotes"
                  echo "         $0 -v '*.iso' # verbose"
                  exit
              fi
              # $1 is left unquoted on purpose so that the pattern(s) are expanded
              tmpstr=$(find $1 -maxdepth 0 -type f 2> /dev/null)
              if [ "$tmpstr" == "" ]
              then
                  echo "No such file '$1'. Try another pattern!"
                  exit
              fi

              tmpdir=$(mktemp -d)
              tmpfil="$tmpdir/fil1"
              tmpfi2="$tmpdir/fil2"
              resetvid="\033[0m"
              prev2line="\033[2F"
              next2line="\033[2E"

              sln=1
              cln=0
              cnt=0
              # first pass: count the regular files and add up their sizes
              for i in $1
              do
                  if test -f "$i"
                  then
                      cln=$((cln+1))
                      tmp=$(find -L "$i" -printf "%s")
                      cnt=$((cnt+tmp))
                  fi
              done
              echo "
              number of files = $cln
              total file size = $cnt B ~ $(($cnt/2**20)) MiB
              "
              # second pass: checksum each file, with one pv per file and one pv counting files
              for i in $1
              do
                  if test -f "$i"
                  then
                      tmpnam=$(echo -n "$i")
                      tmpsum=$(< "$i" pv -ptrbs "$cnt" | md5sum)
                      sleep 0.05
                      echo "$sln" | pv -ls "$cln" > /dev/null
                      sleep 0.05
                      sln="$sln
              $i"
                      sleep 0.05
                      # fixed format string, so that % or \ in a file name is printed literally
                      printf '%s%s\n' "${tmpsum/-}" "$tmpnam" >> "$tmpfil"
                      echo -ne "$prev2line" > /dev/stderr
                  fi
              done

              sync
              sleep 0.1
              echo -ne "$next2line" > /dev/stderr

              echo "-----"
              if $verbose
              then
                  sort -k2 "$tmpfil" | tee "$tmpfi2" | md5sum -c
                  echo "-----"
                  cat "$tmpfi2"
              else
                  sort -k2 "$tmpfil"
              fi
              sleep 0.5
              sync
              rm -r "$tmpdir"


              Demo example



              Usage



              $ md5summer 
              Usage: /home/sudodus/bin/md5summer [-v] <pattern>
              Example: /home/sudodus/bin/md5summer '*.iso' # notice the quotes
              /home/sudodus/bin/md5summer -v '*.iso' # verbose


              I tested in this directory



              $ ls -1a
              .
              ..
              'filename with spaces'
              md5summer
              md5summer1
              md5summer2
              subdir
              .ttt
              zenity-info-message.png


              Normal usage plus pattern to see hidden files



              $ md5summer ".* *"

              number of files = 6
              total file size = 12649 B ~ 0 MiB

              8,32KiB 0:00:00 [ 156MiB/s] [=============================> ] 67%
              6,00 0:00:00 [ 133k/s] [====================================>] 100%
              -----
              184d0995cc8b6d8070f89f15caee35ce filename with spaces
              28227139997996c7838f07cd4c630ffc md5summer
              3383b86a0753e486215280f0baf94399 md5summer1
              28227139997996c7838f07cd4c630ffc md5summer2
              31cd03f64a466e680e9c22fef4bcf14b .ttt
              670b8db45e57723b5f1b8a63399cdfa1 zenity-info-message.png


              Verbose output plus pattern to see hidden files



              $ md5summer -v ".* *"

              number of files = 6
              total file size = 12649 B ~ 0 MiB

              8,32KiB 0:00:00 [ 184MiB/s] [=============================> ] 67%
              6,00 0:00:00 [ 133k/s] [====================================>] 100%
              -----
              filename with spaces: OK
              md5summer: OK
              md5summer1: OK
              md5summer2: OK
              .ttt: OK
              zenity-info-message.png: OK
              -----
              184d0995cc8b6d8070f89f15caee35ce filename with spaces
              28227139997996c7838f07cd4c630ffc md5summer
              3383b86a0753e486215280f0baf94399 md5summer1
              28227139997996c7838f07cd4c630ffc md5summer2
              31cd03f64a466e680e9c22fef4bcf14b .ttt
              670b8db45e57723b5f1b8a63399cdfa1 zenity-info-message.png


              Redirection to a file, first the screen output



              $ md5summer ".* *" > subdir/save
              8,32KiB 0:00:00 [ 180MiB/s] [=============================> ] 67%
              6,00 0:00:00 [ 162k/s] [====================================>] 100%


              and then the saved output



              $ cat subdir/save 

              number of files = 6
              total file size = 12649 B ~ 0 MiB

              -----
              184d0995cc8b6d8070f89f15caee35ce filename with spaces
              28227139997996c7838f07cd4c630ffc md5summer
              3383b86a0753e486215280f0baf94399 md5summer1
              28227139997996c7838f07cd4c630ffc md5summer2
              31cd03f64a466e680e9c22fef4bcf14b .ttt
              670b8db45e57723b5f1b8a63399cdfa1 zenity-info-message.png


              Checking iso files



              $ md5summer "*.iso"

              number of files = 10
              total file size = 7112491008 B ~ 6783 MiB

              28,0MiB 0:00:00 [ 160MiB/s] [> ] 0%
              10,0 0:00:00 [ 204k/s] [====================================>] 100%
              -----
              7a27fdd46a63ba4375896891826c1c88 debian-live-8.6.0-amd64-lxde-desktop.iso
              d70eec28cdbdee7f7aa95fb53b9bfdac debian-live-8.7.1-amd64-standard.iso
              382cfbe621ca446d12871b8945b50d20 debian-live-8.8.0-amd64-standard.iso
              44473dfe2ee1aad0f71506f1d5862457 debian-live-8.8.0-i386-standard.iso
              f396b3532fa84059e7738c3c1827bada debian-live-9.3.0-amd64-cinnamon.iso
              8f6def28ae7cbefa0a6e59407c884466 debian-live-9.6.0-amd64-cinnamon.iso
              90b1815da0a5bf4ee4b00eec2b5d3587 debian-testing-amd64-netinst_2017-07-28.iso
              8f75074ab98e166b7469299d3e459ac6 mini-amd64-2016-01-21-daily.iso
              e580266fba58eb34b05bf6e13f51a047 mini-jessie-32.iso
              646c109a9a16c0527ce1c7afa922e2ed mini-jessie-64.iso





              share|improve this answer

















































































                  edited Jan 20 at 17:04

























                  answered Jan 20 at 3:21









                  sudodus






























