Using pv with md5sum
I used md5sum with pv to check 4 GiB of files that are in the same directory:
md5sum dir/* | pv -s 4g | sort
The command completes successfully in about 28 seconds, but pv's output is all wrong. This is the sort of output that is displayed throughout:
219 B 0:00:07 [ 125 B/s ] [> ] 0% ETA 1668:01:09:02
It's like this without the -s 4g and | sort as well. I've also tried it with different files.
I've tried using pv with cat and the output was fine, so the problem seems to be caused by md5sum.
pipe hashsum pv
It's likely a buffering issue. That is, the output from md5sum is not line-buffered and won't arrive at pv until the process is done or has produced enough data to fill the output buffer. I can't see an option in the md5sum manual to make it line-buffered. Or, you are misunderstanding what is happening, which is that the data sent through pv is only the checksums (and filenames). Also, pv does not know how much data to expect, so it can't say how much is left.
– Kusalananda
Jan 19 at 16:42
It seems like only the checksums and filenames are going through pv (but this doesn't seem to affect anyone else?). Is there a way to make all of the file data go through pv?
– EmmaV
Jan 19 at 16:49
The issue with that is that you would lose the filename. Think of pv as a "fancy cat". Using cat file | md5sum, you would get the MD5 hash for a single file, but md5sum has no way of tagging the result with a filename.
– Kusalananda
Jan 19 at 16:51
You are using pv to measure the output of md5sum (which is a few bytes) and not md5sum's own progress of reading the files themselves. Maybe this answer is related: unix.stackexchange.com/q/16826/30851 (on second thought, maybe not - it's about text files...)
– frostschutz
Jan 19 at 17:05
Since you are not feeding 4 GB of data down the pipe, but just the output of md5sum for a number of files, changing the -s 4g option such that it reflects an estimate of the size of md5sum's output, e.g. -s 512, should be a step in the right direction.
– ozzy
Jan 19 at 17:12
asked Jan 19 at 16:29
EmmaV
5 Answers
pv is a "fancy cat", meaning that you may use pv in most situations where you would use cat.
Using cat with md5sum, you can compute the MD5 checksum of a single file with
cat file | md5sum
or, with pv,
pv file | md5sum
Unfortunately though, this does not allow md5sum to insert the filename into its output properly.
Now, fortunately, pv is a really fancy cat, and on some systems (Linux), it's able to watch the data being passed through another process. This is done by using its -d option with the process ID of that other process.
This means that you can do things like
md5sum dir/* | sort >sums &
sleep 1
pv -d "$(pgrep -n md5sum)"
This would allow pv to watch the md5sum process. The sleep is there to allow md5sum, which is running in the background, to properly start. pgrep -n md5sum would return the PID of the most recently started md5sum process that you own. pv will exit as soon as the process that it is watching terminates.
I've tested this particular way of running pv a few times and it seems to generally work well, but sometimes it seems to stop outputting anything as md5sum switches to the next file. Sometimes, it seems to spawn spurious background tasks in the shell.
It would probably be safest to run it as
md5sum dir/* >sums &
sleep 1
pv -W -d "$!"
sort -o sums sums
The -W option will cause pv to wait until there's actual data being transferred, although this also does not always seem to work reliably.
The need for sleep is somewhat surprising!
– Stephen Kitt
Jan 19 at 17:43
@StephenKitt That's the only way I could get it to behave somewhat predictably. I don't really know what it does, but whatever it is, it seems to be possible for it to do it "too early", and the progress meter won't show at all.
– Kusalananda
Jan 19 at 17:45
The data that you are feeding through the pipe is not the data of the files that md5sum is processing, but instead the md5sum output, which, for every file, consists of one line comprising: the MD5 hash, two spaces, and the file name. Since we know this in advance, we can inform pv accordingly, so as to enable it to display an accurate progress indicator. There are two ways of doing so.
The first, preferred method (suggested by frostschutz) makes use of the fact that md5sum generates one line per processed file, and the fact that pv has a line mode that counts lines rather than bytes. In this mode pv will only move the progress bar when it encounters a newline in the throughput, i.e. per file finished by md5sum. In Bash, this first method can look like this:
set -- *.iso; md5sum "$@" | pv --line-mode -s $# | sort
The set builtin is used to set the positional parameters to the files to be processed (the *.iso shell pattern is expanded by the shell). md5sum is then told to process these files ($@ expands to the positional parameters), and pv in line mode will move the progress indicator each time a file has been processed / a line is output by md5sum. Notably, pv is informed of the total number of lines it can expect (-s $#), as the special shell parameter $# expands to the number of positional arguments.
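To make the line-mode approach reusable, it could be wrapped in a small function. This is only a sketch: the function name md5sum_counted is made up for illustration, and it assumes a pv that supports --line-mode and an md5sum that accepts -- to end option parsing (as GNU md5sum does).

```shell
# Hypothetical wrapper; the name md5sum_counted is not from the answer above.
md5sum_counted() {
    # "$#" is the number of file arguments, which is also the number of
    # lines md5sum is expected to print (one line per file).
    md5sum -- "$@" | pv --line-mode -s "$#"
}
```

It would be called as md5sum_counted *.iso | sort. Unlike the set -- variant, this leaves the positional parameters of the calling shell untouched.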
The second method is not line-based but byte-based. With md5sum this is unnecessarily complicated, but some other program may not produce lines but, for instance, continuous data, and then this approach may be more practical. I illustrate it with md5sum though. The idea is to calculate the amount of data that md5sum (or some other program) will produce, and use this to inform pv. In Bash, this could look as follows:
os=$(( $( ls -1 | wc -c ) + $( ls -1 | wc -l ) * 34 ))
md5sum * | pv -s $os | sort
The first line calculates the output size (os) estimate: the first term is the number of bytes necessary for encoding the filenames (incl. newline), the second term is the number of bytes used for encoding the MD5 hashes (32 bytes each), plus 2 spaces. In the second line, we tell pv that the expected amount of data is os bytes, so that it can show an accurate progress indicator leading up to 100% (which is updated per file finished by md5sum).
Obviously, both methods are only practical in case multiple files are to be processed. Also, it should be noted that since the output of md5sum is not related to the amount of time the md5sum program has to spend crunching the underlying data, the progress indicator may be considered somewhat misleading. E.g., in the second method, the file with the shortest name will yield the smallest progress update, even though it may actually be the biggest in size. Then again, if all files have similar sizes and names, this shouldn't matter much.
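The byte estimate can also be computed from the file names directly, without parsing ls. The following is a rough sketch under assumptions: md5_output_size is a hypothetical helper name, and the arithmetic only holds for names without backslashes or newlines, since GNU md5sum escapes those and prints extra bytes.

```shell
# Hypothetical helper: estimate how many bytes md5sum prints for the given files.
md5_output_size() {
    local total=0 name len
    for name in "$@"; do
        # Byte length of the name as passed on the command line.
        len=$(printf '%s' "$name" | wc -c)
        # 32 hex digits + 2 separator characters + name + trailing newline.
        total=$(( total + 32 + 2 + len + 1 ))
    done
    printf '%s\n' "$total"
}
```

With that in place, the pipeline could be written as md5sum dir/* | pv -s "$(md5_output_size dir/*)" | sort.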
It's a very nice idea to calculate progress based on md5sum output (even though the progress would indicate how many files are still left, and not how large they are or how long that would take). However, it should not require parsing ls. pv supports --line-mode, so set -- *.iso; md5sum "$@" | pv --line-mode -s $# | sort might be equivalent and still work if you replace md5sum with sha512sum or otherwise.
– frostschutz
Jan 19 at 19:39
@frostschutz You are undeniably right :-) This is a nicer, cleaner solution.
– ozzy
Jan 19 at 20:48
Here's a dirty hack to get progress per file:
for f in iso/*
do
    pv "$f" | (
        cat > /dev/null &
        md5sum "$f"
        wait
    )
done
What it looks like:
4.15GiB 0:00:32 [ 130MiB/s] [================================>] 100%
0db0b36fc7bad7b50835f68c369e854c iso/KNOPPIX_V7.6.1DVD-2016-01-16-EN.iso
792MiB 0:00:06 [ 130MiB/s] [================================>] 100%
97537db63e61d20a5cb71d29145b2937 iso/archlinux-2016.10.01-dual.iso
843MiB 0:00:06 [ 129MiB/s] [================================>] 100%
1b5dc31e038499b8409f7d4d720e3eba iso/lubuntu-16.04-desktop-i386.iso
259MiB 0:00:02 [ 130MiB/s] [=========> ] 30% ETA 0:00:04
...
Now, this makes several assumptions. Firstly, that reading data is slower than hashing it. Secondly, that the OS will cache the I/O so data won't be (physically) read twice even though pv and md5sum are completely independent readers.
The nice thing about such a dirty, dirty hack is that you can easily adapt it to make a progress bar across all the data, not just one file. And still do weird stuff like sort the output afterwards.
pv iso/* | (
    cat > /dev/null &
    md5sum iso/* | sort
    wait
)
What it looks like (ongoing):
15.0GiB 0:01:47 [ 131MiB/s] [===========================> ] 83% ETA 0:00:21
What it looks like (finished):
18.0GiB 0:02:11 [ 140MiB/s] [================================>] 100%
0db0b36fc7bad7b50835f68c369e854c iso/KNOPPIX_V7.6.1DVD-2016-01-16-EN.iso
155603390e65f2a8341328be3cb63875 iso/systemrescuecd-x86-4.2.0.iso
1b5dc31e038499b8409f7d4d720e3eba iso/lubuntu-16.04-desktop-i386.iso
1b6ed6ff8d399f53adadfafb20fb0d71 iso/systemrescuecd-x86-4.4.1.iso
25715326d7096c50f7ea126ac20eabfd iso/openSUSE-13.2-KDE-Live-i686.iso
...
Now, that's for the hacks. Check other answers for proper solutions. ;-)
Reading the data is required for hashing it, so hashing the data is almost certainly slower than just reading it. Also, you force reading the data twice for each file (once with pv and once with md5sum).
– Kusalananda
Jan 19 at 18:52
@Kusalananda ...and that's why I called it a hack! pv < /dev/zero | md5sum -> 637MiB/s on my machine. The ISO files I tested with were on a USB3 stick, 140MiB/s is about the max read speed. So md5sum can hash data faster than it can be read. Yes, it reads data twice, but that's pedantic - thanks to OS caching, the USB stick still reads it only once; and running this hack is (in my case) not any slower than running md5sum by itself without a progress bar.
– frostschutz
Jan 19 at 19:07
...and yes, it can go wrong. If you were to do this on a slow machine with a very fast storage medium, pv would run way ahead of md5sum and data would probably end up being read twice for real, or in any case, the progress bar would not be in sync with md5sum at all.
– frostschutz
Jan 19 at 19:12
As already pointed out in comments and other answers:
- You are piping into pv only md5sum's output: checksums and file names; thus, pv's progress bar is not able to show how much data md5sum is reading.
- A size of 4 GB will be of course too much for that. Also, providing pv with the size of the file(s) you are piping into it (manually, with -s) is inconvenient.
Piping the content of your files into pv and then into md5sum will give you a progress bar, but file names would be lost.
This code is a not so elegant way to have both: a meaningful progress bar and file names with checksums.
#!/bin/sh
for file in "$@"; do
    pv -- "$file" |
    md5sum |
    sed 's/-$//' |
    printf '%s%s\n' "$(cat -)" "$file"
done
The script is meant to be invoked as:
./script dir/*
You can of course declare it as a function, to avoid having to type its path to call it (or adding it to your PATH):
pvsum() {
    for file in "$@"; do
        pv -- "$file" | md5sum | sed 's/-$//' | printf '%s%s\n' "$(cat -)" "$file"
    done
}
This way, the command pvsum dir/* | sort will be equivalent to your md5sum dir/* | pv -s <size> | sort.
Its output:
$ ./testscript testdir/*
4.00GiB 0:00:09 [ 446MiB/s] [==============================>] 100%
9dab5f8add1f699bca108f99e5fa5342 testdir/file1
1.00GiB 0:00:02 [ 447MiB/s] [==============================>] 100%
06a738a71e3fd3119922bdac259fe29a testdir/file2
What it does:
- It loops over the given files and, for each:
  - Pipes the file from pv into md5sum, showing the default progress bar.
  - sed is used to remove the - printed by md5sum (which is reading from standard input); this also attempts to make the output suitable for being consumed by md5sum -c (thanks to frostschutz for pointing this out; see note 1).
  - Prints the checksum followed by the file name on standard output.
About sort:
I'm not sure about your expected results, so I have just ignored it. Since pv writes its progress bar to standard error, piping everything into sort will detach pv's output from md5sum's output.
Anyway, you can just append | sort after done in the code above and check if the result is fine to you.
1 Note that the output from the code shown above will not be suitable for md5sum -c if file names include newlines. Handling newlines is possible, but some versions of md5sum behave differently in this respect (see, for instance, answers to this question), making a general solution not easy (and out of the scope of this answer).
Assuming a recent version of md5sum, an attempt at solving this issue could be:
for file in "$@"; do
    pv -- "$file" |
    md5sum |
    sed 's/-$//' |
    printf '%s%s\n' "$(cat -)" "$file" |
    sed '$! N; s/\\/\\\\/g; s/\n/\\n/g; t x; q; :x s/^/\\/;'
done
Where the only addition, the final sed, will:
- Put the whole input (checksum and file name) in pattern space, since it may contain newlines: $! matches any line except for the last one; N; appends a newline and the next line to the pattern space.
- Escape with a backslash (\) any backslash found.
- Replace with \n any newline found.
- Only if at least a backslash or newline has been replaced (t x;: branch to label x), add a backslash at the beginning of the checksum to signal md5sum -c that something has to be unescaped; otherwise just quit.
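As a variation on the pvsum function above, the checksum can be captured in a variable instead of being passed through sed and cat. This is a sketch, not a drop-in replacement: the name pvsum_simple is made up, and like the main script it assumes file names without newlines or backslashes.

```shell
# Hypothetical variant of pvsum; pvsum_simple is an illustrative name.
pvsum_simple() {
    local file sum
    for file in "$@"; do
        # pv writes its progress bar to standard error, so the command
        # substitution only captures md5sum's "<hash>  -" line.
        sum=$(pv -- "$file" | md5sum) || return
        # Keep the hash (everything before the first space) and append
        # two spaces plus the real file name, as md5sum itself would.
        printf '%s  %s\n' "${sum%% *}" "$file"
    done
}
```

It is used the same way, e.g. pvsum_simple dir/* | sort.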
I have also enjoyed taming the 'fancy cat', pv, for md5sum :-)
- I think my shellscript is rather stable now
- There is a usage output, if you do not enter the pattern correctly.
- It works with wild cards, but does not recurse into subdirectories
- You can enter more than one pattern, for example ".* *"
- There is a verbosity switch that turns on checking the md5sums ... OK
- You can redirect the relevant output into a file; the process view output of pv will stay on the {screen/terminal window}
- There are two pv processes in a for loop, one global and one for each file, the global pv 'only counts the files', and the other one measures the speed and amount of data transferred
- ANSI escape sequences are used to keep the process view in a stable position
I use the name md5summer, make the shellscript executable and put it in a directory in PATH (my ~/bin directory, you may prefer /usr/local/bin).
#!/bin/bash
# date sign comment
# 20190119 sudodus created md5summer version 1.0
if [ "$1" == "-v" ]
then
    verbose=true
    shift
else
    verbose=false
fi
if [ $# -ne 1 ]
then
    echo "Usage: $0 [-v] <pattern>"
    echo "Example: $0 '*.iso' # notice the quotes"
    echo " $0 -v '*.iso' # verbose"
    exit
fi
# $1 is left unquoted on purpose so that the shell expands the pattern
tmpstr=$(find $1 -maxdepth 0 -type f 2> /dev/null)
if [ "$tmpstr" == "" ]
then
    echo "No such file '$1'. Try another pattern!"
    exit
fi
tmpdir=$(mktemp -d)
tmpfil="$tmpdir/fil1"
tmpfi2="$tmpdir/fil2"
# ANSI escape sequences: reset attributes, move cursor two lines up / down
resetvid="\033[0m"
prev2line="\033[2F"
next2line="\033[2E"
sln=1
cln=0
cnt=0
# first pass: count the files and sum up their sizes
for i in $1
do
    if test -f "$i"
    then
        cln=$((cln+1))
        tmp=$(find -L "$i" -printf "%s")
        cnt=$((cnt+tmp))
    fi
done
echo "
number of files = $cln
total file size = $cnt B ~ $(($cnt/2**20)) MiB
"
# second pass: hash each file through pv, then update the file counter
for i in $1
do
    if test -f "$i"
    then
        tmpnam=$(echo -n "$i")
        tmpsum=$(< "$i" pv -ptrbs "$cnt" | md5sum)
        sleep 0.05
        echo "$sln" | pv -ls "$cln" > /dev/null
        sleep 0.05
        sln="$sln
$i"
        sleep 0.05
        # use a fixed format string so that '%' in file names is harmless
        printf '%s%s\n' "${tmpsum/-}" "$tmpnam" >> "$tmpfil"
        echo -ne "$prev2line" > /dev/stderr
    fi
done
sync
sleep 0.1
echo -ne "$next2line" > /dev/stderr
echo "-----"
if $verbose
then
    sort -k2 "$tmpfil" | tee "$tmpfi2" | md5sum -c
    echo "-----"
    cat "$tmpfi2"
else
    sort -k2 "$tmpfil"
fi
sleep 0.5
sync
rm -r "$tmpdir"
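The size-summing pass in the script relies on find -printf, which is a GNU extension. Where that is not available, the total could be computed along these lines. This is a sketch under assumptions: total_size is a hypothetical helper, and it uses wc -c, which may have to read the whole file on some systems.

```shell
# Hypothetical helper: print the combined size in bytes of the regular
# files among the arguments; anything else (e.g. directories) is skipped.
total_size() {
    local total=0 size f
    for f in "$@"; do
        [ -f "$f" ] || continue
        size=$(wc -c < "$f")
        total=$(( total + size ))
    done
    printf '%s\n' "$total"
}
```

The result could then be handed to pv as the expected size, e.g. pv -s "$(total_size *.iso)".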
Demo example
Usage
$ md5summer
Usage: /home/sudodus/bin/md5summer [-v] <pattern>
Example: /home/sudodus/bin/md5summer '*.iso' # notice the quotes
/home/sudodus/bin/md5summer -v '*.iso' # verbose
I tested in this directory
$ ls -1a
.
..
'filename with spaces'
md5summer
md5summer1
md5summer2
subdir
.ttt
zenity-info-message.png
Normal usage plus pattern to see hidden files
$ md5summer ".* *"
number of files = 6
total file size = 12649 B ~ 0 MiB
8,32KiB 0:00:00 [ 156MiB/s] [=============================> ] 67%
6,00 0:00:00 [ 133k/s] [====================================>] 100%
-----
184d0995cc8b6d8070f89f15caee35ce filename with spaces
28227139997996c7838f07cd4c630ffc md5summer
3383b86a0753e486215280f0baf94399 md5summer1
28227139997996c7838f07cd4c630ffc md5summer2
31cd03f64a466e680e9c22fef4bcf14b .ttt
670b8db45e57723b5f1b8a63399cdfa1 zenity-info-message.png
Verbose output plus pattern to see hidden files
$ md5summer -v ".* *"
number of files = 6
total file size = 12649 B ~ 0 MiB
8,32KiB 0:00:00 [ 184MiB/s] [=============================> ] 67%
6,00 0:00:00 [ 133k/s] [====================================>] 100%
-----
filename with spaces: OK
md5summer: OK
md5summer1: OK
md5summer2: OK
.ttt: OK
zenity-info-message.png: OK
-----
184d0995cc8b6d8070f89f15caee35ce filename with spaces
28227139997996c7838f07cd4c630ffc md5summer
3383b86a0753e486215280f0baf94399 md5summer1
28227139997996c7838f07cd4c630ffc md5summer2
31cd03f64a466e680e9c22fef4bcf14b .ttt
670b8db45e57723b5f1b8a63399cdfa1 zenity-info-message.png
Redirection to a file, first the screen output
$ md5summer ".* *" > subdir/save
8,32KiB 0:00:00 [ 180MiB/s] [=============================> ] 67%
6,00 0:00:00 [ 162k/s] [====================================>] 100%
and then the saved output
$ cat subdir/save
number of files = 6
total file size = 12649 B ~ 0 MiB
-----
184d0995cc8b6d8070f89f15caee35ce filename with spaces
28227139997996c7838f07cd4c630ffc md5summer
3383b86a0753e486215280f0baf94399 md5summer1
28227139997996c7838f07cd4c630ffc md5summer2
31cd03f64a466e680e9c22fef4bcf14b .ttt
670b8db45e57723b5f1b8a63399cdfa1 zenity-info-message.png
Checking iso files
$ md5summer "*.iso"
number of files = 10
total file size = 7112491008 B ~ 6783 MiB
28,0MiB 0:00:00 [ 160MiB/s] [> ] 0%
10,0 0:00:00 [ 204k/s] [====================================>] 100%
-----
7a27fdd46a63ba4375896891826c1c88 debian-live-8.6.0-amd64-lxde-desktop.iso
d70eec28cdbdee7f7aa95fb53b9bfdac debian-live-8.7.1-amd64-standard.iso
382cfbe621ca446d12871b8945b50d20 debian-live-8.8.0-amd64-standard.iso
44473dfe2ee1aad0f71506f1d5862457 debian-live-8.8.0-i386-standard.iso
f396b3532fa84059e7738c3c1827bada debian-live-9.3.0-amd64-cinnamon.iso
8f6def28ae7cbefa0a6e59407c884466 debian-live-9.6.0-amd64-cinnamon.iso
90b1815da0a5bf4ee4b00eec2b5d3587 debian-testing-amd64-netinst_2017-07-28.iso
8f75074ab98e166b7469299d3e459ac6 mini-amd64-2016-01-21-daily.iso
e580266fba58eb34b05bf6e13f51a047 mini-jessie-32.iso
646c109a9a16c0527ce1c7afa922e2ed mini-jessie-64.iso
pv
is a "fancy cat
", which is that you may use pv
in most situations where you would use cat
.
Using cat
with md5sum
, you can compute the MD5 checksum of a single file with
cat file | md5sum
or, with pv
,
pv file | md5sum
Unfortunately though, this does not allow md5sum
to insert the filename into its output properly.
Now, fortunately, pv
is a really fancy cat
, and on some systems (Linux), it's able to watch the data being passed through another process. This is done by using its -d
option with the process ID of that other process.
This means that you can do things like
md5sum dir/* | sort >sums &
sleep 1
pv -d "$(pgrep -n md5sum)"
This would allow pv
to watch the md5sum
process. The sleep
is there to allow md5sum
, which is running in the background, to properly start. pgrep -n md5sum
would return the PID of the most recently started md5sum
process that you own. pv
will exit as soon as the process that it is watching terminates.
I've tested this particular way of running pv
a few times and it seems to generally work well, but sometimes it seems to stop outputting anything as md5sum
switches to the next file. Sometimes, it seems to spawn spurious background tasks in the shell.
It would probably be safest to run it as
md5sum dir/* >sums &
sleep 1
pv -W -d "$!"
sort -o sums sums
The -W
option will cause pv
to wait until there's actual data being transferred, although this does also not always seem to work reliably.
The need forsleep
is somewhat surprising!
– Stephen Kitt
Jan 19 at 17:43
@StephenKitt That's the only way I could get it to behave somewhat predictably. I don't really know what it does, but whatever it is, it seems to be possible for it to do it "too early", and the progress meter won't show at all.
– Kusalananda
Jan 19 at 17:45
add a comment |
pv
is a "fancy cat
", which is that you may use pv
in most situations where you would use cat
.
Using cat
with md5sum
, you can compute the MD5 checksum of a single file with
cat file | md5sum
or, with pv
,
pv file | md5sum
Unfortunately though, this does not allow md5sum
to insert the filename into its output properly.
Now, fortunately, pv
is a really fancy cat
, and on some systems (Linux), it's able to watch the data being passed through another process. This is done by using its -d
option with the process ID of that other process.
This means that you can do things like
md5sum dir/* | sort >sums &
sleep 1
pv -d "$(pgrep -n md5sum)"
This would allow pv
to watch the md5sum
process. The sleep
is there to allow md5sum
, which is running in the background, to properly start. pgrep -n md5sum
would return the PID of the most recently started md5sum
process that you own. pv
will exit as soon as the process that it is watching terminates.
I've tested this particular way of running pv
a few times and it seems to generally work well, but sometimes it seems to stop outputting anything as md5sum
switches to the next file. Sometimes, it seems to spawn spurious background tasks in the shell.
It would probably be safest to run it as
md5sum dir/* >sums &
sleep 1
pv -W -d "$!"
sort -o sums sums
The -W
option will cause pv
to wait until there's actual data being transferred, although this does also not always seem to work reliably.
The need forsleep
is somewhat surprising!
– Stephen Kitt
Jan 19 at 17:43
@StephenKitt That's the only way I could get it to behave somewhat predictably. I don't really know what it does, but whatever it is, it seems to be possible for it to do it "too early", and the progress meter won't show at all.
– Kusalananda
Jan 19 at 17:45
edited Jan 19 at 17:32
answered Jan 19 at 17:14
Kusalananda
The data that you are feeding through the pipe is not the data of the files that md5sum is processing, but instead the md5sum output, which, for every file, consists of one line comprising: the MD5 hash, two spaces, and the file name. Since we know this in advance, we can inform pv accordingly, so as to enable it to display an accurate progress indicator. There are two ways of doing so.
The first, preferred method (suggested by frostschutz) makes use of the fact that md5sum generates one line per processed file, and the fact that pv has a line mode that counts lines rather than bytes. In this mode pv will only move the progress bar when it encounters a newline in the throughput, i.e. per file finished by md5sum. In Bash, this first method can look like this:
set -- *.iso; md5sum "$@" | pv --line-mode -s $# | sort
The set builtin is used to set the positional parameters to the files to be processed (the *.iso shell pattern is expanded by the shell). md5sum is then told to process these files ($@ expands to the positional parameters), and pv in line mode will move the progress indicator each time a file has been processed, i.e. each time a line is output by md5sum. Notably, pv is informed of the total number of lines it can expect (-s $#), as the special shell parameter $# expands to the number of positional parameters.
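One edge case the one-liner above does not cover: if *.iso matches nothing, the shell passes the pattern through unexpanded, so $# is 1 and md5sum fails on a file literally named *.iso. A minimal sketch of a wrapper that checks for this first. The function name md5_line_progress and the fallback to plain md5sum when pv is not installed are assumptions of this sketch, not part of the original suggestion:

```shell
# Hypothetical helper: checksum the given files with a per-file progress bar.
md5_line_progress() {
    # An unmatched glob is passed through literally, so test the first argument.
    if [ "$#" -eq 0 ] || [ ! -e "$1" ]; then
        echo "md5_line_progress: no input files" >&2
        return 1
    fi
    if command -v pv >/dev/null 2>&1; then
        # md5sum writes one line per file, so pv is told to expect $# lines.
        md5sum -- "$@" | pv --line-mode -s "$#" | sort
    else
        # Assumption of this sketch: without pv, just print the sorted checksums.
        md5sum -- "$@" | sort
    fi
}
```

It would be called as md5_line_progress *.iso; the progress bar still goes to standard error, so the sorted checksums can be redirected to a file as usual.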
The second method is not line-based but byte-based. With md5sum this is unnecessarily complicated, but some other program may not produce lines but, for instance, continuous data, and then this approach may be more practical. I illustrate it with md5sum though. The idea is to calculate the amount of data that md5sum (or some other program) will produce, and use this to inform pv. In Bash, this could look as follows:
os=$(( $( ls -1 | wc -c ) + $( ls -1 | wc -l ) * 34 ))
md5sum * | pv -s $os | sort
The first line calculates the output size (os) estimate: the first term is the number of bytes necessary for encoding the filenames (incl. newline), the second term is the number of bytes used for encoding the MD5 hashes (32 bytes each) plus the 2 spaces, i.e. 34 bytes per file. In the second line, we tell pv that the expected amount of data is os bytes, so that it can show an accurate progress indicator leading up to 100% (the indicator is updated once per file finished by md5sum).
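The same estimate can be computed without parsing ls output, working directly from the list of files handed to md5sum. A minimal sketch, assuming GNU md5sum's default output format (32 hex digits, two separator characters, the name, a newline) and file names without backslashes or newlines, which md5sum escapes and which would make the estimate too small. The function name md5_output_size is made up for illustration:

```shell
# Hypothetical helper: print the number of bytes md5sum is expected to
# write for the given file names (assumes at least one name is given).
md5_output_size() {
    # Bytes taken up by the names, each followed by one newline.
    name_bytes=$(printf '%s\n' "$@" | wc -c)
    # 32 hex digits plus 2 separator characters per output line.
    echo $(( name_bytes + $# * 34 ))
}
```

A possible use would be set -- *; md5sum "$@" | pv -s "$(md5_output_size "$@")" | sort, which passes the same list of names to both md5sum and the size estimate.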
Obviously, both methods are only practical in case multiple files are to be processed. Also, it should be noted that since the size of the md5sum output is not related to the amount of time the md5sum program has to spend crunching the underlying data, the progress indicator may be considered somewhat misleading. E.g., in the second method, the file with the shortest name will yield the smallest progress update, even though it may actually be the biggest in size. Then again, if all files have similar sizes and names, this shouldn't matter much.
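If the files differ a lot in size, one way around this mismatch is to weight each finished file by its size. The following is a rough sketch of that idea and does not use pv at all: it prints a cumulative percentage to standard error after each file, while the checksums go to standard output as usual. The function name md5_size_weighted is hypothetical, and sizes are obtained with wc -c, which is an assumption made for portability rather than speed:

```shell
# Hypothetical helper: run md5sum file by file and report progress
# by bytes processed rather than by number of files.
md5_size_weighted() {
    total=0
    for f in "$@"; do
        total=$(( total + $(wc -c < "$f") ))
    done
    seen=0
    for f in "$@"; do
        md5sum -- "$f"
        seen=$(( seen + $(wc -c < "$f") ))
        # Skip the report when every file is empty (avoids division by zero).
        if [ "$total" -gt 0 ]; then
            printf '%3d%% done after %s\n' $(( seen * 100 / total )) "$f" >&2
        fi
    done
}
```

Because only the checksums are written to standard output, md5_size_weighted dir/* | sort still produces a clean, sorted list.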
It's a very nice idea to calculate progress based on md5sum output (even though the progress would indicate how many files are still left, and not how large they are or how long that would take). However, it should not require parsing ls. pv supports --line-mode, so set -- *.iso; md5sum "$@" | pv --line-mode -s $# | sort might be equivalent and still work if you replace md5sum with sha512sum or otherwise.
– frostschutz
Jan 19 at 19:39
@frostschutz You are undeniably right :-) This is a nicer, cleaner solution.
– ozzy
Jan 19 at 20:48
edited Jan 20 at 9:14
answered Jan 19 at 17:48
ozzy
Here's a dirty hack to get progress per file:
for f in iso/*
do
    pv "$f" | (
        cat > /dev/null &
        md5sum "$f"
        wait
    )
done
What it looks like:
4.15GiB 0:00:32 [ 130MiB/s] [================================>] 100%
0db0b36fc7bad7b50835f68c369e854c iso/KNOPPIX_V7.6.1DVD-2016-01-16-EN.iso
792MiB 0:00:06 [ 130MiB/s] [================================>] 100%
97537db63e61d20a5cb71d29145b2937 iso/archlinux-2016.10.01-dual.iso
843MiB 0:00:06 [ 129MiB/s] [================================>] 100%
1b5dc31e038499b8409f7d4d720e3eba iso/lubuntu-16.04-desktop-i386.iso
259MiB 0:00:02 [ 130MiB/s] [=========> ] 30% ETA 0:00:04
...
Now, this makes several assumptions. Firstly, that reading data is slower than hashing it. Secondly, that the OS will cache the I/O so data won't be (physically) read twice even though pv and md5sum are completely independent readers.
The nice thing about such a dirty, dirty hack is that you can easily adapt it to make a progress bar across all the data, not just one file. And still do weird stuff like sort the output afterwards.
pv iso/* | (
    cat > /dev/null &
    md5sum iso/* | sort
    wait
)
What it looks like (ongoing):
15.0GiB 0:01:47 [ 131MiB/s] [===========================> ] 83% ETA 0:00:21
What it looks like (finished):
18.0GiB 0:02:11 [ 140MiB/s] [================================>] 100%
0db0b36fc7bad7b50835f68c369e854c iso/KNOPPIX_V7.6.1DVD-2016-01-16-EN.iso
155603390e65f2a8341328be3cb63875 iso/systemrescuecd-x86-4.2.0.iso
1b5dc31e038499b8409f7d4d720e3eba iso/lubuntu-16.04-desktop-i386.iso
1b6ed6ff8d399f53adadfafb20fb0d71 iso/systemrescuecd-x86-4.4.1.iso
25715326d7096c50f7ea126ac20eabfd iso/openSUSE-13.2-KDE-Live-i686.iso
...
Now, that's for the hacks. Check other answers for proper solutions. ;-)
Reading the data would be required for hashing it, so hashing the data is almost certainly slower than just reading it. Also, you force reading the data twice for each file (once with pv and once with md5sum).
– Kusalananda
Jan 19 at 18:52
@Kusalananda ...and that's why I called it a hack! pv < /dev/zero | md5sum -> 637MiB/s on my machine. The ISO files I tested with were on a USB3 stick, 140MiB/s is about the max read speed. So md5sum can hash data faster than read it. Yes it reads data twice, but that's pedantic - thanks to OS caching, the USB stick still reads it only once; and running this hack is (in my case) not any slower than running md5sum by itself without progress bar.
– frostschutz
Jan 19 at 19:07
...and yes, it can go wrong. If you were to do this on a slow machine with a very fast storage medium, pv would run way ahead of md5sum and data would probably end up being read twice for real, or in any case, the progress bar would not be in sync with md5sum at all.
– frostschutz
Jan 19 at 19:12
answered Jan 19 at 18:47
frostschutz
As already pointed out in comments and other answers:
- You are piping into pv only md5sum's output: checksums and file names; thus, pv's progress bar is not able to show how much data md5sum is reading.
- A size of 4 GB will of course be too much for that. Also, providing pv with the size of the file(s) you are piping into it (manually, with -s) is inconvenient.
Piping the content of your files into pv and then into md5sum will give you a progress bar, but file names would be lost.
This code is a not so elegant way to have both: a meaningful progress bar and file names with checksums.
#!/bin/sh
for file in "$@"; do
  pv -- "$file" |
  md5sum |
  sed 's/-$//' |
  printf '%s%s\n' "$(cat -)" "$file"
done
The script is meant to be invoked as:
./script dir/*
You can of course declare it as a function, to avoid having to type its path to call it (or adding it to your PATH):
function pvsum () {
  for file in "$@"; do
    pv -- "$file" | md5sum | sed 's/-$//' | printf '%s%s\n' "$(cat -)" "$file"
  done
}
This way, the command pvsum dir/* | sort will be equivalent to your md5sum dir/* | pv -s <size> | sort.
Its output:
$ ./testscript testdir/*
4.00GiB 0:00:09 [ 446MiB/s] [==============================>] 100%
9dab5f8add1f699bca108f99e5fa5342 testdir/file1
1.00GiB 0:00:02 [ 447MiB/s] [==============================>] 100%
06a738a71e3fd3119922bdac259fe29a testdir/file2
What it does:
- It loops over the given files and, for each:
  - Pipes the file from pv into md5sum, showing the default progress bar.
  - sed is used to remove the - printed by md5sum (which is reading from standard input); this also attempts to make the output suitable for being consumed by md5sum -c (thanks to frostschutz for pointing this out; see note 1).
  - Prints the checksum followed by the file name on standard output.
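The same loop can be sketched without the sed and cat steps, by capturing md5sum's output in a variable and trimming it with a parameter expansion. This is only a variant for illustration, not the code used above: the name pvsum_alt is made up, and the fallback to cat when pv is not installed is an assumption added so that the sketch runs anywhere.

```shell
# Hypothetical variant of the loop; falls back to cat if pv is missing.
pvsum_alt() {
    if command -v pv >/dev/null 2>&1; then reader=pv; else reader=cat; fi
    for file in "$@"; do
        # md5sum reads standard input here, so it prints "<hash>  -"
        sum=$("$reader" -- "$file" | md5sum) || return
        # keep only the hash (everything before the first space),
        # then add the two spaces and the real file name
        printf '%s  %s\n' "${sum%% *}" "$file"
    done
}
```

It would be called like the original function, for example pvsum_alt dir/* | sort. As with the original, file names containing backslashes or newlines are not escaped for md5sum -c.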
About sort:
I'm not sure about your expected results, so I have just ignored it. Since pv writes its progress bar to standard error, piping everything into sort will detach pv's output from md5sum's output.
Anyway, you can just append | sort after done in the code above and check if the result is fine to you.
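The separation of the two streams can be checked without pv. In this sketch the function fake_progress is hypothetical and merely imitates a tool that reports progress on standard error while writing its data to standard output:

```shell
# Hypothetical stand-in for "pv ... | md5sum": progress on stderr, data on stdout.
fake_progress() {
    for name in c a b; do
        echo "progress: $name" >&2   # never enters the pipe, like pv's bar
        echo "$name"                 # this is what the next command receives
    done
}

# Only standard output reaches sort. Standard error is discarded here to keep
# the demo quiet; without 2>/dev/null it would show up on the terminal, unsorted.
sorted=$(fake_progress 2>/dev/null | sort)
printf '%s\n' "$sorted"
```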
1 Note that the output from the code shown above will not be suitable for md5sum -c if file names include newlines. Handling newlines is possible, but some versions of md5sum behave differently in this respect (see, for instance, answers to this question), making a general solution not easy (and out of the scope of this answer).
Assuming a recent version of md5sum, an attempt at solving this issue could be:
for file in "$@"; do
  pv -- "$file" |
  md5sum |
  sed 's/-$//' |
  printf '%s%s\n' "$(cat -)" "$file" |
  sed '$! N; s/\\/\\\\/g; s/\n/\\n/g; t x; q; :x s/^/\\/;'
done
Where the only addition, the final sed, will:
- Put the whole input (checksum and file name) in pattern space, since it may contain newlines: $! matches any line except for the last one; N; appends a newline and the next line to the pattern space.
- Escape with a backslash (\) any backslash found.
- Replace with \n any newline found.
- Only if at least a backslash or newline has been replaced (t x;: branch to label x), a backslash is added at the beginning of the checksum to signal md5sum -c that something has to be unescaped; otherwise just quit.
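For comparison, this is how GNU md5sum itself marks such names, which is the format the final sed tries to reproduce. A minimal sketch, assuming GNU coreutils; the temporary directory and the file name back\slash are invented for the demonstration:

```shell
# Assumes GNU md5sum; other implementations may not escape file names.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1
: > 'back\slash'                 # empty file whose name contains a backslash

line=$(md5sum -- 'back\slash')
printf '%s\n' "$line"            # starts with "\", and the name's backslash is doubled

# The escaped line is accepted again by md5sum -c, read from standard input.
printf '%s\n' "$line" | md5sum -c -
```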
edited Jan 20 at 18:15
answered Jan 19 at 18:23
fra-san
I have also enjoyed taming the 'fancy cat', pv, for md5sum :-)
- I think my shellscript is rather stable now
- There is a usage output, if you do not enter the pattern correctly.
- It works with wild cards, but does not recurse into subdirectories
- You can enter more than one pattern, for example ".* *"
- There is a verbosity switch that turns on checking the md5sums ... OK
- You can redirect the relevant output into a file; the process view output of pv will stay on the screen (terminal window)
- There are two pv processes in a for loop, one global and one for each file, the global pv 'only counts the files', and the other one measures the speed and amount of data transferred
- ANSI escape sequences are used to keep the process view in a stable position
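The cursor movement mentioned in the last item relies on two standard ANSI (CSI) sequences: ESC [ n F moves the cursor to the start of the line n lines up, and ESC [ n E to the start of the line n lines down. A small sketch of the idea, independent of the script (the two-line fake status display is made up for illustration):

```shell
# CSI sequences: "\033[2F" = up two lines, "\033[2E" = down two lines.
prev2line=$(printf '\033[2F')
next2line=$(printf '\033[2E')

# Redraw a two-line status block in place three times on standard error.
for step in 1 2 3; do
    printf 'bytes line, step %s\nfiles line, step %s\n' "$step" "$step" >&2
    printf '%s' "$prev2line" >&2    # jump back up so the next pass overwrites
done
printf '%s' "$next2line" >&2        # finally move below the status block
```

Because everything is written to standard error, redirecting standard output to a file leaves this display on the terminal.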
I use the name md5summer, make the shellscript executable and put it in a directory in PATH (my ~/bin directory, you may prefer /usr/local/bin).
#!/bin/bash

# date      sign     comment
# 20190119  sudodus  created md5summer version 1.0

# optional switch -v: also verify the result with 'md5sum -c'
if [ "$1" == "-v" ]
then
 verbose=true
 shift
else
 verbose=false
fi

if [ $# -ne 1 ]
then
 echo "Usage: $0 [-v] <pattern>"
 echo "Example: $0 '*.iso' # notice the quotes"
 echo " $0 -v '*.iso' # verbose"
 exit
fi

# $1 is deliberately unquoted here and in the loops, so that the pattern is expanded
tmpstr=$(find $1 -maxdepth 0 -type f 2> /dev/null)
if [ "$tmpstr" == "" ]
then
 echo "No such file '$1'. Try another pattern!"
 exit
fi

tmpdir=$(mktemp -d)
tmpfil="$tmpdir/fil1"
tmpfi2="$tmpdir/fil2"

# ANSI escape sequences (attribute reset and cursor movement)
resetvid="\033[0m"
prev2line="\033[2F"
next2line="\033[2E"

sln=1
cln=0
cnt=0
# first pass: count the files and add up their sizes
for i in $1
do
 if test -f "$i"
 then
  cln=$((cln+1))
  tmp=$(find -L "$i" -printf "%s")
  cnt=$((cnt+tmp))
 fi
done
echo "
number of files = $cln
total file size = $cnt B ~ $(($cnt/2**20)) MiB
"
# second pass: checksum each file while two pv processes show the progress
for i in $1
do
 if test -f "$i"
 then
  tmpnam=$(echo -n "$i")
  tmpsum=$(< "$i" pv -ptrbs "$cnt" | md5sum)
  sleep 0.05
  echo "$sln" | pv -ls "$cln" > /dev/null
  sleep 0.05
  sln="$sln
$i"
  sleep 0.05
  # fixed format string, so that % or \ in a file name is printed literally
  printf '%s%s\n' "${tmpsum/-}" "$tmpnam" >> "$tmpfil"
  echo -ne "$prev2line" > /dev/stderr
 fi
done
sync
sleep 0.1
echo -ne "$next2line" > /dev/stderr
echo "-----"
if $verbose
then
 sort -k2 "$tmpfil" | tee "$tmpfi2" | md5sum -c
 echo "-----"
 cat "$tmpfi2"
else
 sort -k2 "$tmpfil"
fi
sleep 0.5
sync
rm -r "$tmpdir"
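The script's first loop gets each file size from find -printf, which is a GNU extension. Where that is not available, the total could be sketched with wc -c instead. Note that total_size is a hypothetical helper name, and reading sizes via wc -c is a swapped-in technique, not what md5summer does:

```shell
# Hypothetical helper: sum the sizes (in bytes) of the regular files given.
total_size() {
    total=0
    for f in "$@"; do
        [ -f "$f" ] || continue          # skip directories and the like
        size=$(wc -c < "$f") || return   # byte count of this file
        total=$((total + size))
    done
    echo "$total"
}
```

The result is the kind of number that can be handed to pv -s, for example total_size *.iso.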
Demo example
Usage
$ md5summer
Usage: /home/sudodus/bin/md5summer [-v] <pattern>
Example: /home/sudodus/bin/md5summer '*.iso' # notice the quotes
/home/sudodus/bin/md5summer -v '*.iso' # verbose
I tested in this directory
$ ls -1a
.
..
'filename with spaces'
md5summer
md5summer1
md5summer2
subdir
.ttt
zenity-info-message.png
Normal usage plus pattern to see hidden files
$ md5summer ".* *"
number of files = 6
total file size = 12649 B ~ 0 MiB
8,32KiB 0:00:00 [ 156MiB/s] [=============================> ] 67%
6,00 0:00:00 [ 133k/s] [====================================>] 100%
-----
184d0995cc8b6d8070f89f15caee35ce filename with spaces
28227139997996c7838f07cd4c630ffc md5summer
3383b86a0753e486215280f0baf94399 md5summer1
28227139997996c7838f07cd4c630ffc md5summer2
31cd03f64a466e680e9c22fef4bcf14b .ttt
670b8db45e57723b5f1b8a63399cdfa1 zenity-info-message.png
Verbose output plus pattern to see hidden files
$ md5summer -v ".* *"
number of files = 6
total file size = 12649 B ~ 0 MiB
8,32KiB 0:00:00 [ 184MiB/s] [=============================> ] 67%
6,00 0:00:00 [ 133k/s] [====================================>] 100%
-----
filename with spaces: OK
md5summer: OK
md5summer1: OK
md5summer2: OK
.ttt: OK
zenity-info-message.png: OK
-----
184d0995cc8b6d8070f89f15caee35ce filename with spaces
28227139997996c7838f07cd4c630ffc md5summer
3383b86a0753e486215280f0baf94399 md5summer1
28227139997996c7838f07cd4c630ffc md5summer2
31cd03f64a466e680e9c22fef4bcf14b .ttt
670b8db45e57723b5f1b8a63399cdfa1 zenity-info-message.png
Redirection to a file, first the screen output
$ md5summer ".* *" > subdir/save
8,32KiB 0:00:00 [ 180MiB/s] [=============================> ] 67%
6,00 0:00:00 [ 162k/s] [====================================>] 100%
and then the saved output
$ cat subdir/save
number of files = 6
total file size = 12649 B ~ 0 MiB
-----
184d0995cc8b6d8070f89f15caee35ce filename with spaces
28227139997996c7838f07cd4c630ffc md5summer
3383b86a0753e486215280f0baf94399 md5summer1
28227139997996c7838f07cd4c630ffc md5summer2
31cd03f64a466e680e9c22fef4bcf14b .ttt
670b8db45e57723b5f1b8a63399cdfa1 zenity-info-message.png
Checking iso files
$ md5summer "*.iso"
number of files = 10
total file size = 7112491008 B ~ 6783 MiB
28,0MiB 0:00:00 [ 160MiB/s] [> ] 0%
10,0 0:00:00 [ 204k/s] [====================================>] 100%
-----
7a27fdd46a63ba4375896891826c1c88 debian-live-8.6.0-amd64-lxde-desktop.iso
d70eec28cdbdee7f7aa95fb53b9bfdac debian-live-8.7.1-amd64-standard.iso
382cfbe621ca446d12871b8945b50d20 debian-live-8.8.0-amd64-standard.iso
44473dfe2ee1aad0f71506f1d5862457 debian-live-8.8.0-i386-standard.iso
f396b3532fa84059e7738c3c1827bada debian-live-9.3.0-amd64-cinnamon.iso
8f6def28ae7cbefa0a6e59407c884466 debian-live-9.6.0-amd64-cinnamon.iso
90b1815da0a5bf4ee4b00eec2b5d3587 debian-testing-amd64-netinst_2017-07-28.iso
8f75074ab98e166b7469299d3e459ac6 mini-amd64-2016-01-21-daily.iso
e580266fba58eb34b05bf6e13f51a047 mini-jessie-32.iso
646c109a9a16c0527ce1c7afa922e2ed mini-jessie-64.iso
edited Jan 20 at 17:04
answered Jan 20 at 3:21
sudodus
1
You are using pv to measure the output of md5sum (which is a few bytes) and not md5sum's own progress of reading the files themselves. Maybe this answer is related: unix.stackexchange.com/q/16826/30851 (on second thought, maybe not - it's about text files...)
– frostschutz
Jan 19 at 17:05
2
Since you are not feeding 4 GiB of data down the pipe, but just the output of md5sum for several files, changing the -s 4g option so that it reflects an estimate of the size of md5sum's output, e.g. -s 512, should be a step in the right direction.
– ozzy
Jan 19 at 17:12
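Since only md5sum's output travels down the pipe, another option is to let pv count lines rather than bytes: with -l, pv counts newlines, and -s is then read as the expected number of lines. The sketch below assumes one output line per regular file; the function name md5_count_progress is hypothetical, and the fallback without pv is an assumption for portability.

```shell
#!/bin/sh
# Hypothetical helper: show progress as "files hashed so far" by counting
# md5sum's output lines with pv -l.
md5_count_progress() {
    # Keep only regular files in "$@" (drops directories such as subdir).
    for f in "$@"; do
        shift
        [ -f "$f" ] && set -- "$@" "$f"
    done
    [ "$#" -gt 0 ] || return 0               # nothing to hash

    if command -v pv >/dev/null 2>&1; then
        # -l counts lines; -s "$#" is the expected number of lines
        md5sum -- "$@" | pv -l -s "$#"
    else
        md5sum -- "$@"                       # assumed fallback without pv
    fi
}
```

For example, md5_count_progress dir/* | sort -k2 prints the sorted checksums while the bar advances by one per file. Because md5sum's output to a pipe is not line-buffered, the count may arrive in bursts; running it as stdbuf -oL md5sum may smooth that out, though that is untested here.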