Count lines containing word
I have a file with multiple lines. I want to know, for each word that appears anywhere in the file, how many lines contain that word. For example:
0 hello world the man is world
1 this is the world
2 a different man is the possible one
The result I'm expecting is:
0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2
Note that the count for "world" is 2, not 3, since the word appears on 2 lines. Because of this, translating blanks to newline chars wouldn't be the exact solution.
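For instance, this naive pipeline (a sketch, assuming the sample above is saved as file) counts occurrences rather than lines and reports 3 for "world":

$ tr ' ' '\n' < file | sort | uniq -c | grep -w world
      3 world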
text-processing
What have you tried so far?
– Romeo Ninov
yesterday
This seems highly relevant: unix.stackexchange.com/a/332890/224077
– Panki
yesterday
8 Answers
Another Perl variant, using List::Util
$ perl -MList::Util=uniq -alne '
map { $h{$_}++ } uniq @F }{ for $k (sort keys %h) {print "$k: $h{$k}"}
' file
0: 1
1: 1
2: 1
a: 1
different: 1
hello: 1
is: 3
man: 2
one: 1
possible: 1
the: 3
this: 1
world: 2
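Note that this variant prints a space after the colon (world: 2 rather than world:2); if the exact format from the question is needed, dropping the space in the print statement (print "$k:$h{$k}") should do it.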
It's a pretty straightforward Perl script:
#!/usr/bin/perl -w
use strict;
my %words = ();
while (<>) {
chomp;
my %linewords = ();
map { $linewords{$_}=1 } split / /;
foreach my $word (keys %linewords) {
$words{$word}++;
}
}
foreach my $word (sort keys %words) {
print "$word:$words{$word}n";
}
The basic idea is to loop over the input; for each line, split it into words, then save those words into a hash (associative array) in order to remove any duplicates, then loop over that array of words and add one to an overall counter for that word. At the end, report on the words and their counts.
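For example, saving the script as countlines.pl (an assumed name) and the sample input as file:

$ perl countlines.pl file

This prints exactly the word:count listing shown in the question.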
1
A slight problem with this is in my opinion that it does not respect what the usual definition of a word is, since it splits on a single space character. If two spaces were found somewhere, an empty string inbetween would be considered a word as well if I'm not mistaken. Let alone if words were separated by other punctuation characters. Of course, it was not specified in the question whether "word" is understood as the programmer's concept of a "word", or as a word of a natural language.
– Larry
yesterday
add a comment |
Straightforward-ish in bash:
declare -A wordcount
while read -ra words; do
# unique words on this line
declare -A uniq
for word in "${words[@]}"; do
uniq[$word]=1
done
# accumulate the words
for word in "${!uniq[@]}"; do
((wordcount[$word]++))
done
unset uniq
done < file
Looking at the data:
$ declare -p wordcount
declare -A wordcount='([possible]="1" [one]="1" [different]="1" [this]="1" [a]="1" [hello]="1" [world]="2" [man]="2" [0]="1" [1]="1" [2]="1" [is]="3" [the]="3" )'
and formatting as you want:
$ printf "%s\n" "${!wordcount[@]}" | sort | while read key; do echo "$key:${wordcount[$key]}"; done
0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2
add a comment |
A solution that calls several programs from a shell:
fmt -1 words.txt | sort -u | xargs -Ipattern sh -c 'echo "pattern:$(grep -cw pattern words.txt)"'
A little explanation:

fmt -1 words.txt prints out all the words, one per line, and sort -u sorts this output and keeps only the unique words.

To count how many lines contain a word, one can use grep (a tool meant to search files for patterns). With the -c option grep reports the number of matching lines, and with -w it matches whole words only, so grep -cw pattern words.txt gives the number of lines containing the word pattern.

xargs lets us do this for each and every word output by sort. The -Ipattern option means that xargs will execute the following command once per input line, replacing each occurrence of pattern with a word it reads from standard input, which is what it gets from sort.

The indirection through sh is needed because xargs only knows how to execute a single program, given its name, passing everything else as arguments to it; it does not handle things like command substitution. The $(...) in the snippet above is command substitution: it substitutes the output of grep into the echo, allowing the result to be formatted correctly. Since we need command substitution, we must use sh -c, which runs whatever it receives as an argument in its own shell.
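Concretely, for the word world, xargs effectively runs something like the following (a sketch of the expanded command, assuming the sample input is in words.txt):

$ sh -c 'echo "world:$(grep -cw world words.txt)"'
world:2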
An optimisation to this approach: fmt -1 words.txt | sort | uniq -c | awk '{ print $2 ":" $1 }'
– matja
yesterday
@matja is sort | uniq -c more efficient than sort -u?
– vikarjramun
yesterday
@vikarjramun no, but uniq -c gives you the counts of each word in one pass, so you don't have to use xargs to do multiple passes over the input file for each word.
– matja
19 hours ago
@matja: I actually wrote the answer you provided before the current one. However, it does not do what the OP asked for. I misread the question entirely at first as well, and was corrected by glenn jackman. What you are suggesting would count every occurrence of each word; what the OP asked for is to count the number of lines each word occurs in at least once.
– Larry
19 hours ago
Another simple alternative would be to use Python (3.6+). This solution has the same problem as the one mentioned by @Larry in his comment.
from collections import Counter
with open("words.txt") as f:
c = Counter(word for line in [line.strip().split() for line in f] for word in set(line))
for word, occurrence in sorted(c.items()):
print(f'{word}:{occurrence}')
# for Python 2.7.x compatibility you can replace the above line with
# the following one:
# print('{}:{}'.format(word, occurrence))
A more explicit version of the above:
from collections import Counter
FILENAME = "words.txt"
def find_unique_words():
with open(FILENAME) as f:
lines = [line.strip().split() for line in f]
unique_words = Counter(word for line in lines for word in set(line))
return sorted(unique_words.items())
def print_unique_words():
unique_words = find_unique_words()
for word, occurrence in unique_words:
print(f'{word}:{occurrence}')
def main():
print_unique_words()
if __name__ == '__main__':
main()
Output:
0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2
The above also assumes that words.txt is in the same directory as script.py. Note that this is not much different from the other solutions provided here, but perhaps somebody will find it useful.
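For example (assuming the explicit version is saved as script.py next to words.txt):

$ python3 script.py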
Trying to do it with awk:
count.awk:
#!/usr/bin/awk -f
# count lines containing word
{
for (i = 1 ; i <= NF ; i++) {
word_in_a_line[$i] ++
if (word_in_a_line[$i] == 1) {
word_line_count[$i] ++
}
}
delete word_in_a_line
}
END {
for (word in word_line_count){
printf "%s:%dn",word,word_line_count[word]
}
}
Run it by:
$ awk -f count.awk ./test.data | sort
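Run against the sample input (saved here as test.data), this produces the word:count listing from the question. The trailing sort matters because awk's for (word in word_line_count) loop visits keys in no particular order.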
An answer using a bash loop over the lines:
echo "0 hello world the man is world
1 this is the world
2 a different man is the possible one" | while IFS=$'\n' read -r line; do echo $line | tr ' ' '\n' | sort -u; done | sort | uniq -c
1 0
1 1
1 2
1 a
1 different
1 hello
3 is
2 man
1 one
1 possible
3 the
1 this
2 world
I looped over the lines, reduced each line to its unique words with sort -u, and passed the combined output to uniq -c
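To get the exact word:count format requested in the question, the uniq -c output could be post-processed, for example by appending an awk step to the pipeline (a sketch):

... | sort | uniq -c | awk '{ print $2 ":" $1 }'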
edit: I did not see glenn's answer. I found it strange not to see a bash answer
Simple, though doesn't care if it reads the file many times:
sed 's/ /\n/g' file.txt | sort | uniq | while read word; do
    printf "%s:%d\n" "$word" "$(grep -Fw $word file.txt | wc -l)"
done
Read the question again. It literally says translating blanks to newline chars wouldn't be the exact solution.
– Sparhawk
19 hours ago