What is the most efficient way to compute the difference of lines from two files?












13















I have two lists in python list_a and list_b. The list_a have some images links, and the list_b too. 99% of the items are the same, but i have to know this 1%. The all surplus items are in list_a, that means all items in list_b are in list_a. My initial idea is subtract all items:
list_a - list_b = list_c, where the list_c are my surplus items. My code is:



list_a = 
list_b =
list_c =

arq_b = open('list_b.txt','r')
for b in arq_b:
list_b.append(b)

arq_a = open('list_a.txt','r')
for a in arq_a:
if a not in arq_b:
list_c.append(a)

arq_c = open('list_c.txt','w')
for c in list_c:
arq_c.write(c)


I think the logic is right, if i have some items, the code is run fast. But i dont have 10 items, or 1.000, or even 100.000. I have 78.514.022 items in my list_b.txt and 78.616.777 in my list list_a.txt. I dont't know the cost of this expression: if a not in arq_b. But if i execute this code, i think wont finish in this year.



My pc have 8GB, and i allocate 15gb for swap to not explode my RAM.



My question is, there's another way to make this operation more efficiently(Faster)?




  • The list_a is ordinate but the list_b not.

  • Each item have this size: images/00000cd9fc6ae2fe9ec4bbdb2bf27318f2babc00.png

  • The order doesnt matter, i want know the surplus.










share|improve this question




















  • 5





    Does the order matter? If not, try using sets. With sets, subtraction should be linear: set_c = set_a - set_b.

    – L3viathan
    2 days ago













  • But is possible make this in python?

    – Vinicius Morais
    2 days ago











  • The python will use the most efficient way to make this operation?

    – Vinicius Morais
    2 days ago






  • 1





    Yes, I mean the Python datatype set.

    – L3viathan
    2 days ago








  • 1





    @tripleee It's not a duplicate of that - that question is about mapping subtraction over a list, this question is about the difference between what's included in the lists.

    – SpoonMeiser
    2 days ago
















13















I have two lists in python list_a and list_b. The list_a have some images links, and the list_b too. 99% of the items are the same, but i have to know this 1%. The all surplus items are in list_a, that means all items in list_b are in list_a. My initial idea is subtract all items:
list_a - list_b = list_c, where the list_c are my surplus items. My code is:



list_a = 
list_b =
list_c =

arq_b = open('list_b.txt','r')
for b in arq_b:
list_b.append(b)

arq_a = open('list_a.txt','r')
for a in arq_a:
if a not in arq_b:
list_c.append(a)

arq_c = open('list_c.txt','w')
for c in list_c:
arq_c.write(c)


I think the logic is right, if i have some items, the code is run fast. But i dont have 10 items, or 1.000, or even 100.000. I have 78.514.022 items in my list_b.txt and 78.616.777 in my list list_a.txt. I dont't know the cost of this expression: if a not in arq_b. But if i execute this code, i think wont finish in this year.



My pc have 8GB, and i allocate 15gb for swap to not explode my RAM.



My question is, there's another way to make this operation more efficiently(Faster)?




  • The list_a is ordinate but the list_b not.

  • Each item have this size: images/00000cd9fc6ae2fe9ec4bbdb2bf27318f2babc00.png

  • The order doesnt matter, i want know the surplus.










share|improve this question




















  • 5





    Does the order matter? If not, try using sets. With sets, subtraction should be linear: set_c = set_a - set_b.

    – L3viathan
    2 days ago













  • But is possible make this in python?

    – Vinicius Morais
    2 days ago











  • The python will use the most efficient way to make this operation?

    – Vinicius Morais
    2 days ago






  • 1





    Yes, I mean the Python datatype set.

    – L3viathan
    2 days ago








  • 1





    @tripleee It's not a duplicate of that - that question is about mapping subtraction over a list, this question is about the difference between what's included in the lists.

    – SpoonMeiser
    2 days ago














13












13








13


3






I have two lists in python list_a and list_b. The list_a have some images links, and the list_b too. 99% of the items are the same, but i have to know this 1%. The all surplus items are in list_a, that means all items in list_b are in list_a. My initial idea is subtract all items:
list_a - list_b = list_c, where the list_c are my surplus items. My code is:



list_a = 
list_b =
list_c =

arq_b = open('list_b.txt','r')
for b in arq_b:
list_b.append(b)

arq_a = open('list_a.txt','r')
for a in arq_a:
if a not in arq_b:
list_c.append(a)

arq_c = open('list_c.txt','w')
for c in list_c:
arq_c.write(c)


I think the logic is right, if i have some items, the code is run fast. But i dont have 10 items, or 1.000, or even 100.000. I have 78.514.022 items in my list_b.txt and 78.616.777 in my list list_a.txt. I dont't know the cost of this expression: if a not in arq_b. But if i execute this code, i think wont finish in this year.



My pc have 8GB, and i allocate 15gb for swap to not explode my RAM.



My question is, there's another way to make this operation more efficiently(Faster)?




  • The list_a is ordinate but the list_b not.

  • Each item have this size: images/00000cd9fc6ae2fe9ec4bbdb2bf27318f2babc00.png

  • The order doesnt matter, i want know the surplus.










share|improve this question
















I have two lists in python list_a and list_b. The list_a have some images links, and the list_b too. 99% of the items are the same, but i have to know this 1%. The all surplus items are in list_a, that means all items in list_b are in list_a. My initial idea is subtract all items:
list_a - list_b = list_c, where the list_c are my surplus items. My code is:



list_a = 
list_b =
list_c =

arq_b = open('list_b.txt','r')
for b in arq_b:
list_b.append(b)

arq_a = open('list_a.txt','r')
for a in arq_a:
if a not in arq_b:
list_c.append(a)

arq_c = open('list_c.txt','w')
for c in list_c:
arq_c.write(c)


I think the logic is right, if i have some items, the code is run fast. But i dont have 10 items, or 1.000, or even 100.000. I have 78.514.022 items in my list_b.txt and 78.616.777 in my list list_a.txt. I dont't know the cost of this expression: if a not in arq_b. But if i execute this code, i think wont finish in this year.



My pc have 8GB, and i allocate 15gb for swap to not explode my RAM.



My question is, there's another way to make this operation more efficiently(Faster)?




  • The list_a is ordinate but the list_b not.

  • Each item have this size: images/00000cd9fc6ae2fe9ec4bbdb2bf27318f2babc00.png

  • The order doesnt matter, i want know the surplus.







python python-3.x list performance






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 2 days ago









Jean-François Fabre

101k954109




101k954109










asked 2 days ago









Vinicius MoraisVinicius Morais

1779




1779








  • 5





    Does the order matter? If not, try using sets. With sets, subtraction should be linear: set_c = set_a - set_b.

    – L3viathan
    2 days ago













  • But is possible make this in python?

    – Vinicius Morais
    2 days ago











  • The python will use the most efficient way to make this operation?

    – Vinicius Morais
    2 days ago






  • 1





    Yes, I mean the Python datatype set.

    – L3viathan
    2 days ago








  • 1





    @tripleee It's not a duplicate of that - that question is about mapping subtraction over a list, this question is about the difference between what's included in the lists.

    – SpoonMeiser
    2 days ago














  • 5





    Does the order matter? If not, try using sets. With sets, subtraction should be linear: set_c = set_a - set_b.

    – L3viathan
    2 days ago













  • But is possible make this in python?

    – Vinicius Morais
    2 days ago











  • The python will use the most efficient way to make this operation?

    – Vinicius Morais
    2 days ago






  • 1





    Yes, I mean the Python datatype set.

    – L3viathan
    2 days ago








  • 1





    @tripleee It's not a duplicate of that - that question is about mapping subtraction over a list, this question is about the difference between what's included in the lists.

    – SpoonMeiser
    2 days ago








5




5





Does the order matter? If not, try using sets. With sets, subtraction should be linear: set_c = set_a - set_b.

– L3viathan
2 days ago







Does the order matter? If not, try using sets. With sets, subtraction should be linear: set_c = set_a - set_b.

– L3viathan
2 days ago















But is possible make this in python?

– Vinicius Morais
2 days ago





But is possible make this in python?

– Vinicius Morais
2 days ago













The python will use the most efficient way to make this operation?

– Vinicius Morais
2 days ago





The python will use the most efficient way to make this operation?

– Vinicius Morais
2 days ago




1




1





Yes, I mean the Python datatype set.

– L3viathan
2 days ago







Yes, I mean the Python datatype set.

– L3viathan
2 days ago






1




1





@tripleee It's not a duplicate of that - that question is about mapping subtraction over a list, this question is about the difference between what's included in the lists.

– SpoonMeiser
2 days ago





@tripleee It's not a duplicate of that - that question is about mapping subtraction over a list, this question is about the difference between what's included in the lists.

– SpoonMeiser
2 days ago












4 Answers
4






active

oldest

votes


















8














you can create one set of the first file contents, then just use difference or symmetric_difference depending on what you call a difference



with open("list_a.txt") as f:
set_a = set(f)

with open("list_b.txt") as f:
diffs = set_a.difference(f)


if list_b.txt contains more items than list_a.txt you want to swap them or use set_a.symmetric_difference(f) instead, depending on what you need.



difference(f) works but still has to construct a new set internally. Not a great performance gain (see set issubset performance difference depending on the argument type), but it's shorter.






share|improve this answer


























  • Nice, this avoids having to allocate space for the second set.

    – L3viathan
    2 days ago






  • 1





    Well, not really, because internally a set is created, then thrown away. but it's thrown away faster

    – Jean-François Fabre
    2 days ago













  • But the complexity is the same of subtract sets?

    – Vinicius Morais
    2 days ago











  • @ViniciusMorais The time complexity is the same, the space complexity (apparently), too.

    – L3viathan
    2 days ago






  • 1





    @L3viathan In case the original list (the original set) is not needed anymore you can use difference_update. This should not require to allocate a new set internally.

    – a_guest
    2 days ago





















8














Try using sets:



with open("list_a.txt") as f:
set_a = set(f)

with open("list_b.txt") as f:
set_b = set(f)

set_c = set_a - set_b

with open("list_c.txt","w") as f:
for c in set_c:
f.write(c)


The complexity of subtracting two sets is O(n) in the size of the set a.






share|improve this answer





















  • 2





    You know - an open file is an iterator - therefore you can simply do set_a = set(open("list_a.txt"))

    – jsbueno
    2 days ago






  • 11





    yes but doing set(f) in with block ensures that it closes the file

    – Jean-François Fabre
    2 days ago



















2














To extend the comment of @L3viathan
If order of element is not important set is the rigth way.
here a dummy example you can adapt:



l1 = [0,1,2,3,4,5]
l2 = [3,4,5]
setL1 = set(l1) # transform the list into a set
setL2 = set(l2)
setDiff = setl1 - setl2 # make the difference
listeDiff = list(setDiff) # if you want to have your element back in a list


as you see is pretty straightforward in python.






share|improve this answer































    2














    In case order matters you can presort the lists together with item indices and then iterate over them together:



    list_2 = sorted(list_2)
    diff_idx =
    j = 0
    for i, x in sorted(enumerate(list_1), key=lambda x: x[1]):
    if x != list_2[j]:
    diff_idx.append(i)
    else:
    j += 1
    diff = [list_1[i] for i in sorted(diff_idx)]


    This has time complexity of the sorting algorithm, i.e. O(n*log n).






    share|improve this answer

























      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54128876%2fwhat-is-the-most-efficient-way-to-compute-the-difference-of-lines-from-two-files%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      4 Answers
      4






      active

      oldest

      votes








      4 Answers
      4






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      8














      you can create one set of the first file contents, then just use difference or symmetric_difference depending on what you call a difference



      with open("list_a.txt") as f:
      set_a = set(f)

      with open("list_b.txt") as f:
      diffs = set_a.difference(f)


      if list_b.txt contains more items than list_a.txt you want to swap them or use set_a.symmetric_difference(f) instead, depending on what you need.



      difference(f) works but still has to construct a new set internally. Not a great performance gain (see set issubset performance difference depending on the argument type), but it's shorter.






      share|improve this answer


























      • Nice, this avoids having to allocate space for the second set.

        – L3viathan
        2 days ago






      • 1





        Well, not really, because internally a set is created, then thrown away. but it's thrown away faster

        – Jean-François Fabre
        2 days ago













      • But the complexity is the same of subtract sets?

        – Vinicius Morais
        2 days ago











      • @ViniciusMorais The time complexity is the same, the space complexity (apparently), too.

        – L3viathan
        2 days ago






      • 1





        @L3viathan In case the original list (the original set) is not needed anymore you can use difference_update. This should not require to allocate a new set internally.

        – a_guest
        2 days ago


















      8














      you can create one set of the first file contents, then just use difference or symmetric_difference depending on what you call a difference



      with open("list_a.txt") as f:
      set_a = set(f)

      with open("list_b.txt") as f:
      diffs = set_a.difference(f)


      if list_b.txt contains more items than list_a.txt you want to swap them or use set_a.symmetric_difference(f) instead, depending on what you need.



      difference(f) works but still has to construct a new set internally. Not a great performance gain (see set issubset performance difference depending on the argument type), but it's shorter.






      share|improve this answer


























      • Nice, this avoids having to allocate space for the second set.

        – L3viathan
        2 days ago






      • 1





        Well, not really, because internally a set is created, then thrown away. but it's thrown away faster

        – Jean-François Fabre
        2 days ago













      • But the complexity is the same of subtract sets?

        – Vinicius Morais
        2 days ago











      • @ViniciusMorais The time complexity is the same, the space complexity (apparently), too.

        – L3viathan
        2 days ago






      • 1





        @L3viathan In case the original list (the original set) is not needed anymore you can use difference_update. This should not require to allocate a new set internally.

        – a_guest
        2 days ago
















      8












      8








      8







      you can create one set of the first file contents, then just use difference or symmetric_difference depending on what you call a difference



      with open("list_a.txt") as f:
      set_a = set(f)

      with open("list_b.txt") as f:
      diffs = set_a.difference(f)


      if list_b.txt contains more items than list_a.txt you want to swap them or use set_a.symmetric_difference(f) instead, depending on what you need.



      difference(f) works but still has to construct a new set internally. Not a great performance gain (see set issubset performance difference depending on the argument type), but it's shorter.






      share|improve this answer















      you can create one set of the first file contents, then just use difference or symmetric_difference depending on what you call a difference



      with open("list_a.txt") as f:
      set_a = set(f)

      with open("list_b.txt") as f:
      diffs = set_a.difference(f)


      if list_b.txt contains more items than list_a.txt you want to swap them or use set_a.symmetric_difference(f) instead, depending on what you need.



      difference(f) works but still has to construct a new set internally. Not a great performance gain (see set issubset performance difference depending on the argument type), but it's shorter.







      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited 2 days ago

























      answered 2 days ago









      Jean-François FabreJean-François Fabre

      101k954109




      101k954109













      • Nice, this avoids having to allocate space for the second set.

        – L3viathan
        2 days ago






      • 1





        Well, not really, because internally a set is created, then thrown away. but it's thrown away faster

        – Jean-François Fabre
        2 days ago













      • But the complexity is the same of subtract sets?

        – Vinicius Morais
        2 days ago











      • @ViniciusMorais The time complexity is the same, the space complexity (apparently), too.

        – L3viathan
        2 days ago






      • 1





        @L3viathan In case the original list (the original set) is not needed anymore you can use difference_update. This should not require to allocate a new set internally.

        – a_guest
        2 days ago





















      • Nice, this avoids having to allocate space for the second set.

        – L3viathan
        2 days ago






      • 1





        Well, not really, because internally a set is created, then thrown away. but it's thrown away faster

        – Jean-François Fabre
        2 days ago













      • But the complexity is the same of subtract sets?

        – Vinicius Morais
        2 days ago











      • @ViniciusMorais The time complexity is the same, the space complexity (apparently), too.

        – L3viathan
        2 days ago






      • 1





        @L3viathan In case the original list (the original set) is not needed anymore you can use difference_update. This should not require to allocate a new set internally.

        – a_guest
        2 days ago



















      Nice, this avoids having to allocate space for the second set.

      – L3viathan
      2 days ago





      Nice, this avoids having to allocate space for the second set.

      – L3viathan
      2 days ago




      1




      1





      Well, not really, because internally a set is created, then thrown away. but it's thrown away faster

      – Jean-François Fabre
      2 days ago







      Well, not really, because internally a set is created, then thrown away. but it's thrown away faster

      – Jean-François Fabre
      2 days ago















      But the complexity is the same of subtract sets?

      – Vinicius Morais
      2 days ago





      But the complexity is the same of subtract sets?

      – Vinicius Morais
      2 days ago













      @ViniciusMorais The time complexity is the same, the space complexity (apparently), too.

      – L3viathan
      2 days ago





      @ViniciusMorais The time complexity is the same, the space complexity (apparently), too.

      – L3viathan
      2 days ago




      1




      1





      @L3viathan In case the original list (the original set) is not needed anymore you can use difference_update. This should not require to allocate a new set internally.

      – a_guest
      2 days ago







      @L3viathan In case the original list (the original set) is not needed anymore you can use difference_update. This should not require to allocate a new set internally.

      – a_guest
      2 days ago















      8














      Try using sets:



      with open("list_a.txt") as f:
      set_a = set(f)

      with open("list_b.txt") as f:
      set_b = set(f)

      set_c = set_a - set_b

      with open("list_c.txt","w") as f:
      for c in set_c:
      f.write(c)


      The complexity of subtracting two sets is O(n) in the size of the set a.






      share|improve this answer





















      • 2





        You know - an open file is an iterator - therefore you can simply do set_a = set(open("list_a.txt"))

        – jsbueno
        2 days ago






      • 11





        yes but doing set(f) in with block ensures that it closes the file

        – Jean-François Fabre
        2 days ago
















      8














      Try using sets:



      with open("list_a.txt") as f:
      set_a = set(f)

      with open("list_b.txt") as f:
      set_b = set(f)

      set_c = set_a - set_b

      with open("list_c.txt","w") as f:
      for c in set_c:
      f.write(c)


      The complexity of subtracting two sets is O(n) in the size of the set a.






      share|improve this answer





















      • 2





        You know - an open file is an iterator - therefore you can simply do set_a = set(open("list_a.txt"))

        – jsbueno
        2 days ago






      • 11





        yes but doing set(f) in with block ensures that it closes the file

        – Jean-François Fabre
        2 days ago














      8












      8








      8







      Try using sets:



      with open("list_a.txt") as f:
      set_a = set(f)

      with open("list_b.txt") as f:
      set_b = set(f)

      set_c = set_a - set_b

      with open("list_c.txt","w") as f:
      for c in set_c:
      f.write(c)


      The complexity of subtracting two sets is O(n) in the size of the set a.






      share|improve this answer















      Try using sets:



      with open("list_a.txt") as f:
      set_a = set(f)

      with open("list_b.txt") as f:
      set_b = set(f)

      set_c = set_a - set_b

      with open("list_c.txt","w") as f:
      for c in set_c:
      f.write(c)


      The complexity of subtracting two sets is O(n) in the size of the set a.







      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited 2 days ago

























      answered 2 days ago









      L3viathanL3viathan

      15.7k12847




      15.7k12847








      • 2





        You know - an open file is an iterator - therefore you can simply do set_a = set(open("list_a.txt"))

        – jsbueno
        2 days ago






      • 11





        yes but doing set(f) in with block ensures that it closes the file

        – Jean-François Fabre
        2 days ago














      • 2





        You know - an open file is an iterator - therefore you can simply do set_a = set(open("list_a.txt"))

        – jsbueno
        2 days ago






      • 11





        yes but doing set(f) in with block ensures that it closes the file

        – Jean-François Fabre
        2 days ago








      2




      2





      You know - an open file is an iterator - therefore you can simply do set_a = set(open("list_a.txt"))

      – jsbueno
      2 days ago





      You know - an open file is an iterator - therefore you can simply do set_a = set(open("list_a.txt"))

      – jsbueno
      2 days ago




      11




      11





      yes but doing set(f) in with block ensures that it closes the file

      – Jean-François Fabre
      2 days ago





      yes but doing set(f) in with block ensures that it closes the file

      – Jean-François Fabre
      2 days ago











      2














      To extend the comment of @L3viathan
      If order of element is not important set is the rigth way.
      here a dummy example you can adapt:



      l1 = [0,1,2,3,4,5]
      l2 = [3,4,5]
      setL1 = set(l1) # transform the list into a set
      setL2 = set(l2)
      setDiff = setl1 - setl2 # make the difference
      listeDiff = list(setDiff) # if you want to have your element back in a list


      as you see is pretty straightforward in python.






      share|improve this answer




























        2














        To extend the comment of @L3viathan
        If order of element is not important set is the rigth way.
        here a dummy example you can adapt:



        l1 = [0,1,2,3,4,5]
        l2 = [3,4,5]
        setL1 = set(l1) # transform the list into a set
        setL2 = set(l2)
        setDiff = setl1 - setl2 # make the difference
        listeDiff = list(setDiff) # if you want to have your element back in a list


        as you see is pretty straightforward in python.






        share|improve this answer


























          2












          2








          2







          To extend the comment of @L3viathan
          If order of element is not important set is the rigth way.
          here a dummy example you can adapt:



          l1 = [0,1,2,3,4,5]
          l2 = [3,4,5]
          setL1 = set(l1) # transform the list into a set
          setL2 = set(l2)
          setDiff = setl1 - setl2 # make the difference
          listeDiff = list(setDiff) # if you want to have your element back in a list


          as you see is pretty straightforward in python.






          share|improve this answer













          To extend the comment of @L3viathan
          If order of element is not important set is the rigth way.
          here a dummy example you can adapt:



          l1 = [0,1,2,3,4,5]
          l2 = [3,4,5]
          setL1 = set(l1) # transform the list into a set
          setL2 = set(l2)
          setDiff = setl1 - setl2 # make the difference
          listeDiff = list(setDiff) # if you want to have your element back in a list


          as you see is pretty straightforward in python.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered 2 days ago









          RomainL.RomainL.

          308313




          308313























              2














              In case order matters you can presort the lists together with item indices and then iterate over them together:



              list_2 = sorted(list_2)
              diff_idx =
              j = 0
              for i, x in sorted(enumerate(list_1), key=lambda x: x[1]):
              if x != list_2[j]:
              diff_idx.append(i)
              else:
              j += 1
              diff = [list_1[i] for i in sorted(diff_idx)]


              This has time complexity of the sorting algorithm, i.e. O(n*log n).






              share|improve this answer






























                2














                In case order matters you can presort the lists together with item indices and then iterate over them together:



                list_2 = sorted(list_2)
                diff_idx =
                j = 0
                for i, x in sorted(enumerate(list_1), key=lambda x: x[1]):
                if x != list_2[j]:
                diff_idx.append(i)
                else:
                j += 1
                diff = [list_1[i] for i in sorted(diff_idx)]


                This has time complexity of the sorting algorithm, i.e. O(n*log n).






                share|improve this answer




























                  2












                  2








                  2







                  In case order matters you can presort the lists together with item indices and then iterate over them together:



                  list_2 = sorted(list_2)
                  diff_idx =
                  j = 0
                  for i, x in sorted(enumerate(list_1), key=lambda x: x[1]):
                  if x != list_2[j]:
                  diff_idx.append(i)
                  else:
                  j += 1
                  diff = [list_1[i] for i in sorted(diff_idx)]


                  This has time complexity of the sorting algorithm, i.e. O(n*log n).






                  share|improve this answer















                  In case order matters you can presort the lists together with item indices and then iterate over them together:



                  list_2 = sorted(list_2)
                  diff_idx =
                  j = 0
                  for i, x in sorted(enumerate(list_1), key=lambda x: x[1]):
                  if x != list_2[j]:
                  diff_idx.append(i)
                  else:
                  j += 1
                  diff = [list_1[i] for i in sorted(diff_idx)]


                  This has time complexity of the sorting algorithm, i.e. O(n*log n).







                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited 2 days ago

























                  answered 2 days ago









                  a_guesta_guest

                  5,60821241




                  5,60821241






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54128876%2fwhat-is-the-most-efficient-way-to-compute-the-difference-of-lines-from-two-files%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      1300-talet

                      1300-talet

                      Has there ever been an instance of an active nuclear power plant within or near a war zone?