Count lines containing word












6














I have a file with multiple lines. I want to know, for each word that appears in the total file, how many lines contain that word, for example:



0 hello world the man is world
1 this is the world
2 a different man is the possible one


The result I'm expecting is:



0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2


Note that the count for "world" is 2, not 3, since the word appears on 2 lines. Because of this, translating blanks to newline chars wouldn't be the exact solution.






























  • What have you tried so far?
    – Romeo Ninov
    yesterday










  • This seems highly relevant: unix.stackexchange.com/a/332890/224077
    – Panki
    yesterday























text-processing














edited yesterday









Jeff Schaller

39k1053125













asked yesterday









Netzsooc

586














8 Answers


















5














Another Perl variant, using List::Util



$ perl -MList::Util=uniq -alne '
map { $h{$_}++ } uniq @F }{ for $k (sort keys %h) {print "$k: $h{$k}"}
' file
0: 1
1: 1
2: 1
a: 1
different: 1
hello: 1
is: 3
man: 2
one: 1
possible: 1
the: 3
this: 1
world: 2


































    4














    It's a pretty straightforward Perl script:

    #!/usr/bin/perl -w
    use strict;

    my %words = ();
    while (<>) {
        chomp;
        # collect the distinct words on this line
        my %linewords = ();
        map { $linewords{$_}=1 } split / /;
        # each distinct word counts once per line
        foreach my $word (keys %linewords) {
            $words{$word}++;
        }
    }

    foreach my $word (sort keys %words) {
        print "$word:$words{$word}\n";
    }


    The basic idea is to loop over the input; for each line, split it into words, then save those words as keys of a hash (associative array) in order to remove any duplicates, then loop over that hash's keys and add one to an overall counter for each word. At the end, report on the words and their counts.
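The same idea can be sketched in Python, for readers who find that easier to follow. This is only an illustration, not the answer's code: the function name `count_lines_containing` and the in-memory `sample` list are made up for the example, and it splits on a single space to mirror the Perl `split / /` above.

```python
from collections import Counter


def count_lines_containing(lines):
    """Map each word to the number of lines in which it appears at least once."""
    totals = Counter()
    for line in lines:
        # set() drops repeats within one line, so a word counts once per line
        totals.update(set(line.split(" ")))
    return totals


# Sample data copied from the question
sample = [
    "0 hello world the man is world",
    "1 this is the world",
    "2 a different man is the possible one",
]

counts = count_lines_containing(sample)
for word in sorted(counts):
    print(f"{word}:{counts[word]}")
```

With the question's sample, "world" comes out as 2 even though it occurs three times, because the per-line set collapses the repeat on the first line.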























    • 1




      A slight problem with this, in my opinion, is that it does not respect the usual definition of a word, since it splits on a single space character. If two consecutive spaces were found somewhere, the empty string in between would be considered a word as well, if I'm not mistaken. Let alone if words were separated by punctuation characters. Of course, the question did not specify whether "word" is meant as the programmer's concept of a "word" or as a word of a natural language.
      – Larry
      yesterday
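The point about splitting can be illustrated with a small Python sketch. The input string is hypothetical (it is not from the question) and was chosen to contain a double space and a comma; which definition of "word" is right depends on the data.

```python
import re

# Hypothetical input: two spaces between the first two words, plus a comma
line = "hello  world, hello"

by_single_space = line.split(" ")        # split on each single space
by_whitespace = line.split()             # split on runs of whitespace
by_word_chars = re.findall(r"\w+", line)  # runs of letters, digits, underscore

print(by_single_space)  # ['hello', '', 'world,', 'hello']
print(by_whitespace)    # ['hello', 'world,', 'hello']
print(by_word_chars)    # ['hello', 'world', 'hello']
```

Splitting on a single space yields an empty "word", splitting on whitespace runs avoids that but keeps the comma attached, and only the regex variant strips punctuation.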



















    4














    Straightforward-ish in bash:

    declare -A wordcount
    while read -ra words; do
        # unique words on this line
        declare -A uniq
        for word in "${words[@]}"; do
            uniq[$word]=1
        done
        # accumulate the words
        for word in "${!uniq[@]}"; do
            ((wordcount[$word]++))
        done
        unset uniq
    done < file


    Looking at the data:



    $ declare -p wordcount
    declare -A wordcount='([possible]="1" [one]="1" [different]="1" [this]="1" [a]="1" [hello]="1" [world]="2" [man]="2" [0]="1" [1]="1" [2]="1" [is]="3" [the]="3" )'


    and formatting as you want:



    $ printf "%s\n" "${!wordcount[@]}" | sort | while read key; do echo "$key:${wordcount[$key]}"; done
    0:1
    1:1
    2:1
    a:1
    different:1
    hello:1
    is:3
    man:2
    one:1
    possible:1
    the:3
    this:1
    world:2


































      2














      A solution that calls several programs from a shell:



      fmt -1 words.txt | sort -u | xargs -Ipattern sh -c 'echo "pattern:$(grep -cw pattern words.txt)"'



      A little explanation:



      The fmt -1 words.txt prints out all the words, 1 per line, and the | sort -u sorts this output and extracts only the unique words from it.



      To count the lines in which a word appears, one can use grep (a tool meant to search files for patterns). With the -c option, grep prints the number of matching lines instead of the lines themselves, and -w restricts matches to whole words. So grep -cw pattern words.txt gives the number of lines that contain pattern as a whole word, which is the count the question asks for.



      The tool xargs allows us to do this for each and every single word output by sort. The -Ipattern means that it will execute the following command multiple times, replacing each occurrence of pattern with a word it reads from standard input, which is what it gets from sort.



      The indirection with sh is needed because xargs only knows how to execute a single program, given its name, passing everything else as arguments to it. xargs does not handle things like command substitution. The $(...) in the above snippet is command substitution: it substitutes the output from grep into the echo command line so that the result is formatted correctly. Since we need the command substitution, we must use sh -c, which runs whatever it receives as an argument in its own shell.
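For comparison, the per-word, multi-pass structure of this pipeline can be sketched in Python. This is an approximation under stated assumptions, not a reimplementation of grep: the helper name `lines_matching_word` is invented for the example, the lookarounds only roughly imitate `grep -w`, and the data is held in a list rather than read from words.txt.

```python
import re


def lines_matching_word(word, lines):
    """Count the lines that contain `word` as a whole word (roughly `grep -cw`)."""
    # (?<!\w) and (?!\w) require that no word character touches the match;
    # re.escape makes the word literal instead of a regex
    pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")
    return sum(1 for line in lines if pattern.search(line))


# Sample data copied from the question
lines = [
    "0 hello world the man is world",
    "1 this is the world",
    "2 a different man is the possible one",
]

# Equivalent of `fmt -1 | sort -u`: every distinct word, sorted
unique_words = sorted({word for line in lines for word in line.split()})

# Equivalent of the xargs loop: one scan of the data per distinct word
result = {word: lines_matching_word(word, lines) for word in unique_words}
for word, count in result.items():
    print(f"{word}:{count}")
```

As with the shell version, the data is scanned once per distinct word, so the cost grows with the vocabulary size; the single-pass answers avoid that.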





























      • An optimisation to this approach: fmt -1 words.txt | sort | uniq -c | awk '{ print $2 ":" $1 }'
        – matja
        yesterday












      • @matja is sort | uniq -c more efficient than sort -u?
        – vikarjramun
        yesterday










      • @vikarjramun No, but uniq -c gives you the counts of each word in one pass, so you don't have to use xargs to do multiple passes of the input file for each word.
        – matja
        19 hours ago






      • 1




        @matja: I actually made the answer you provided before the current one. However, it does not do what OP asked for. I misread the question at first entirely as well, and was corrected by glenn jackman. What you are suggesting would count every occurrence of each word. What OP asked for is to count the number of lines each word occurs in at least once.
        – Larry
        19 hours ago
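The distinction drawn in this comment can be made concrete with a short Python sketch (an illustration only; the variable names are invented for the example). It computes both quantities for the question's sample data.

```python
from collections import Counter

# Sample data copied from the question
lines = [
    "0 hello world the man is world",
    "1 this is the world",
    "2 a different man is the possible one",
]

# What `sort | uniq -c` on a one-word-per-line stream counts: every occurrence
occurrences = Counter(word for line in lines for word in line.split())

# What the question asks for: lines containing the word at least once
line_counts = Counter(word for line in lines for word in set(line.split()))

print(occurrences["world"], line_counts["world"])  # prints "3 2"
```

The two counters agree for every word except "world", which is repeated within the first line.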





















      2














      Another simple alternative would be to use Python (>=3.6, because of the f-strings). This solution has the same problem as the one mentioned by @Larry in his comment.



      from collections import Counter

      with open("words.txt") as f:
          # set(line) keeps each word once per line before counting
          c = Counter(word for line in [line.strip().split() for line in f] for word in set(line))
      for word, occurrence in sorted(c.items()):
          print(f'{word}:{occurrence}')
          # for Python 2.7.x compatibility you can replace the above line with
          # the following one:
          # print('{}:{}'.format(word, occurrence))


      A more explicit version of the above:

      from collections import Counter


      FILENAME = "words.txt"


      def find_unique_words():
          with open(FILENAME) as f:
              lines = [line.strip().split() for line in f]

          unique_words = Counter(word for line in lines for word in set(line))
          return sorted(unique_words.items())


      def print_unique_words():
          unique_words = find_unique_words()
          for word, occurrence in unique_words:
              print(f'{word}:{occurrence}')


      def main():
          print_unique_words()


      if __name__ == '__main__':
          main()


      Output:



      0:1
      1:1
      2:1
      a:1
      different:1
      hello:1
      is:3
      man:2
      one:1
      possible:1
      the:3
      this:1
      world:2


      The above also assumes that words.txt is in the same directory as script.py. Note that this is not much different from other solutions provided here, but perhaps somebody will find it useful.





































        0














        Trying to do it with awk:



        count.awk:



        #!/usr/bin/awk -f
        # count the lines containing each word

        {
            for (i = 1 ; i <= NF ; i++) {
                word_in_a_line[$i] ++
                # only the first occurrence on a line adds to the line count
                if (word_in_a_line[$i] == 1) {
                    word_line_count[$i] ++
                }
            }

            delete word_in_a_line
        }

        END {
            for (word in word_line_count){
                printf "%s:%d\n",word,word_line_count[word]
            }
        }


        Run it by:



        $ awk -f count.awk ./test.data | sort


































          0














          A pure bash answer



          echo "0 hello world the man is world
          1 this is the world
          2 a different man is the possible one" | while IFS=$'\n' read -r line; do echo $line | tr ' ' '\n' | sort -u; done | sort | uniq -c


          1 0
          1 1
          1 2
          1 a
          1 different
          1 hello
          3 is
          2 man
          1 one
          1 possible
          3 the
          1 this
          2 world


          I looped unique words on each line and passed it to uniq -c



          edit: I did not see glenn's answer. I found it strange to not see a bash answer





































            -2














            Simple, though doesn't care if it reads the file many times:



            sed 's/ /\n/g' file.txt | sort | uniq | while read word; do
                printf "%s:%d\n" "$word" "$(grep -Fw "$word" file.txt | wc -l)"
            done






















            • 1




              Read the question again. It literally says translating blanks to newline chars wouldn't be the exact solution.
              – Sparhawk
              19 hours ago





















                answered yesterday









                steeldriver

                34.6k35083














                    answered yesterday









                    Jeff Schaller

                    39k1053125














                        answered yesterday









                        glenn jackman

                        50.4k570107



























                            2














                            A solution that calls several programs from a shell:



                            fmt -1 words.txt | sort -u | xargs -Ipattern sh -c 'echo "pattern:$(grep -cw pattern words.txt)"'



                            A little explanation:



                            The fmt -1 words.txt prints out all the words, 1 per line, and the | sort -u sorts this output and extracts only the unique words from it.



                            To count the lines that contain a word, one can use grep (a tool meant to search files for patterns). With the -c option, grep prints the number of matching lines instead of the lines themselves, and -w restricts matches to whole words. So grep -cw pattern words.txt gives the number of lines in which pattern occurs as a whole word, which is the count the question asks for.
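To make the effect of the two options concrete, here is a minimal Python sketch of the same idea. It is only an approximation of grep: the function name `count_matching_lines` is made up for illustration, `\b` stands in for grep's whole-word rule, and the word is escaped so it is matched literally (grep would treat it as a regular expression).

```python
import re

# Sample input from the question.
lines = [
    "0 hello world the man is world",
    "1 this is the world",
    "2 a different man is the possible one",
]


def count_matching_lines(word, lines, whole_word=True):
    """Return how many lines match, in the spirit of grep -c (or grep -cw)."""
    escaped = re.escape(word)
    pattern = re.compile(rf"\b{escaped}\b" if whole_word else escaped)
    # Each line contributes at most 1, however often the word occurs on it.
    return sum(1 for line in lines if pattern.search(line))


print(count_matching_lines("world", lines))                 # 2 lines, although "world" occurs 3 times
print(count_matching_lines("a", lines))                     # 1
print(count_matching_lines("a", lines, whole_word=False))   # 2, because "man" contains an "a"
```

Without the whole-word restriction, short words such as "a" would also be counted on lines where they only appear inside longer words.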



                            The tool xargs allows us to do this for each and every single word output by sort. The -Ipattern means that it will execute the following command multiple times, replacing each occurrence of pattern with a word it reads from standard input, which is what it gets from sort.



                            The indirection with sh is needed because xargs only knows how to execute a single program, given its name, passing everything else as arguments to it. xargs does not handle things like command substitution. The $(...) in the above snippet is command substitution: it substitutes the output from grep into the argument of echo, so that the result can be formatted as word:count. Since we need the command substitution, we must use sh -c, which runs whatever it receives as an argument in its own shell.
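The overall shape of the pipeline (collect the distinct words once, then scan the whole input again for every word) can be sketched in Python as follows. This is an illustration under assumptions, not a translation of the commands: `per_word_line_counts` is a hypothetical name, and words are compared as whitespace-separated tokens rather than with grep's whole-word matching.

```python
# Sample input from the question.
lines = [
    "0 hello world the man is world",
    "1 this is the world",
    "2 a different man is the possible one",
]


def per_word_line_counts(lines):
    # Step 1 (role of fmt -1 | sort -u): the sorted set of distinct words.
    vocabulary = sorted({word for line in lines for word in line.split()})
    # Step 2 (role of xargs ... grep -cw): one full pass over the input per word.
    result = []
    for word in vocabulary:
        count = sum(1 for line in lines if word in line.split())
        result.append(f"{word}:{count}")
    return result


print("\n".join(per_word_line_counts(lines)))
```

The nested iteration makes the cost visible: the input is read once per distinct word, which is fine for small files but grows quickly with the vocabulary size.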





























                            • An optimisation to this approach: fmt -1 words.txt | sort | uniq -c | awk '{ print $2 ":" $1 }'
                              – matja
                              yesterday












                            • @matja is sort | uniq -c more efficient than sort -u?
                              – vikarjramun
                              yesterday










                            • @vikarjramun: no, but uniq -c gives you the counts of each word in one pass, so you don't have to use xargs to do multiple passes of the input file for each word.
                              – matja
                              19 hours ago






                            • 1




                              @matja: I actually made the answer you provided before the current one. However, it does not do what OP asked for. I misread the question at first entirely as well, and was corrected by glenn jackman. What you are suggesting would count every occurrence of each word. What OP asked for is to count the number of lines each word occurs in at least once.
                              – Larry
                              19 hours ago
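The distinction drawn in this comment can be checked with a short Python sketch on the question's sample data. The variable names are illustrative only; the first counter plays the role of sort | uniq -c on one-word-per-line output, the second deduplicates each line's words before counting.

```python
from collections import Counter

# Sample input from the question.
lines = [
    "0 hello world the man is world",
    "1 this is the world",
    "2 a different man is the possible one",
]

# Every occurrence of every word, regardless of which line it is on.
occurrences = Counter(word for line in lines for word in line.split())

# Number of lines containing each word: set() drops repeats within a line.
line_counts = Counter(word for line in lines for word in set(line.split()))

print(occurrences["world"], line_counts["world"])  # 3 2
```

Only words that repeat within a single line, such as "world" here, give different results under the two definitions.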


























                            edited yesterday









                            vikarjramun

                            1428














                            answered yesterday









                            Larry

                            1065





























                            2














                            Another simple alternative would be to use Python (3.6 or later, because of the f-strings). This solution has the same problem as the one mentioned by @Larry in his comment.



                            from collections import Counter

                            with open("words.txt") as f:
                                c = Counter(word for line in [line.strip().split() for line in f] for word in set(line))
                            for word, occurrence in sorted(c.items()):
                                print(f'{word}:{occurrence}')
                                # for Python 2.7.x compatibility you can replace the above line with
                                # the following one:
                                # print('{}:{}'.format(word, occurrence))


                            A more explicit version of the above:



                            from collections import Counter


                            FILENAME = "words.txt"


                            def find_unique_words():
                                with open(FILENAME) as f:
                                    lines = [line.strip().split() for line in f]

                                unique_words = Counter(word for line in lines for word in set(line))
                                return sorted(unique_words.items())


                            def print_unique_words():
                                unique_words = find_unique_words()
                                for word, occurrence in unique_words:
                                    print(f'{word}:{occurrence}')


                            def main():
                                print_unique_words()


                            if __name__ == '__main__':
                                main()


                            Output:



                            0:1
                            1:1
                            2:1
                            a:1
                            different:1
                            hello:1
                            is:3
                            man:2
                            one:1
                            possible:1
                            the:3
                            this:1
                            world:2


                            The above also assumes that words.txt is in the current working directory, for example the same directory as script.py when the script is run from there. Note that this is not much different from other solutions provided here, but perhaps somebody will find it useful.
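One way to avoid tying the code to a fixed file location is to count over any iterable of lines, so the caller decides where the text comes from (an open file, sys.stdin, or a list). The sketch below is a variant under that assumption, not the answer's own code; `count_lines_per_word` is a name chosen for illustration. It also processes one line at a time instead of first building a list of all lines.

```python
from collections import Counter
import io


def count_lines_per_word(line_iter):
    """Count, for each word, the number of lines that contain it."""
    counts = Counter()
    for line in line_iter:
        # set() ensures a word repeated on one line is counted only once.
        counts.update(set(line.split()))
    return counts


# Any iterable of strings works; StringIO stands in for an open file here.
sample = io.StringIO("0 hello world the man is world\n1 this is the world\n")
for word, n in sorted(count_lines_per_word(sample).items()):
    print(f"{word}:{n}")
```

With a real file this could be called as `count_lines_per_word(open(path))` inside a `with` block, or with `sys.stdin` to use the script in a pipeline.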














                                edited 17 hours ago









                                David Foerster

                                951616














                                answered yesterday









                                яүυк

                                1216



























                                    0














                                    Trying to do it with awk:



                                    count.awk:



                                    #!/usr/bin/awk -f
                                    # count the lines containing each word

                                    {
                                        for (i = 1; i <= NF; i++) {
                                            # count a word only the first time it is seen on this line
                                            word_in_a_line[$i]++
                                            if (word_in_a_line[$i] == 1) {
                                                word_line_count[$i]++
                                            }
                                        }

                                        # reset the per-line bookkeeping before the next line
                                        delete word_in_a_line
                                    }

                                    END {
                                        for (word in word_line_count) {
                                            printf "%s:%d\n", word, word_line_count[word]
                                        }
                                    }


                                    Run it by:



                                    $ awk -f count.awk ./test.data | sort















                                        answered 4 hours ago









                                        Charles






































                                            A bash answer (a shell loop plus tr, sort and uniq):



                                            echo "0 hello world the man is world
                                            1 this is the world
                                            2 a different man is the possible one" | while IFS=$'\n' read -r line; do echo "$line" | tr -s ' ' '\n' | sort -u; done | sort | uniq -c


                                            1 0
                                            1 1
                                            1 2
                                            1 a
                                            1 different
                                            1 hello
                                            3 is
                                            2 man
                                            1 one
                                            1 possible
                                            3 the
                                            1 this
                                            2 world


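                                            The loop reduces each line to its unique words (sort -u), and the combined result is then passed to sort | uniq -c, so a word is counted at most once per line.



                                            Edit: I did not see glenn's answer at first. I found it strange not to see a bash answer.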
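The output above is in `uniq -c` format (count first), while the question asks for `word:count`. A small awk step can swap the columns. This is a sketch only: `to_word_count` is a hypothetical helper name, and it assumes words contain no whitespace.

```shell
#!/bin/sh
# to_word_count is a hypothetical helper: turns "      2 world" into "world:2".
to_word_count() {
    awk '{ print $2 ":" $1 }'
}

# Sample data from the question, run through the same per-line pipeline.
printf '%s\n' \
    '0 hello world the man is world' \
    '1 this is the world' \
    '2 a different man is the possible one' |
while IFS= read -r line; do
    # one word per line, duplicates within the line removed
    printf '%s\n' "$line" | tr -s ' ' '\n' | sort -u
done | LC_ALL=C sort | uniq -c | to_word_count
```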






                                                edited 56 mins ago

























                                                answered 1 hour ago









                                                user1462442






































                                                    Simple, though it reads the file again for every distinct word:



                                                    sed 's/ /\n/g' file.txt | sort -u | while IFS= read -r word; do
                                                        printf '%s:%d\n' "$word" "$(grep -cFw -- "$word" file.txt)"
                                                    done









                                                      Read the question again. It literally says translating blanks to newline chars wouldn't be the exact solution.
                                                      – Sparhawk
                                                      19 hours ago
















                                                    answered yesterday









                                                    JoL


















