Why do the hash values differ for NaN and Inf - Inf?












18














I use this hash function a lot, i.e. to record the value of a dataframe. Wanted to see if I could break it. Why aren't these hash values identical?



This requires the digest package.



Plain text output:



> digest(Inf-Inf)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(NaN)
[1] "4e9653ddf814f0d16b72624aeb85bc20"
> digest(1)
[1] "6717f2823d3202449301145073ab8719"
> digest(1 + 0)
[1] "6717f2823d3202449301145073ab8719"
> digest(5)
[1] "5e338704a8e069ebd8b38ca71991cf94"
> digest(sum(1, 1, 1, 1, 1))
[1] "5e338704a8e069ebd8b38ca71991cf94"
> digest(1^0)
[1] "6717f2823d3202449301145073ab8719"
> 1^0
[1] 1
> digest(1)
[1] "6717f2823d3202449301145073ab8719"


Additional weirdness. Calculations that equal NaN have identical hash values, but NaN's hash values are not equivalent:



> Inf - Inf
[1] NaN
> 0/0
[1] NaN
> digest(Inf - Inf)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(0/0)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(NaN)
[1] "4e9653ddf814f0d16b72624aeb85bc20"









share|improve this question
























  • Looks like a bug. You could ask Dirk directly to fix it and and add a unit test for this particular case.
    – Oleg Sklyar
    2 days ago










  • What is the type of NaN if input directly? It is probably numeric/double when it results from a subtraction of a division operation.
    – Oleg Sklyar
    2 days ago






  • 1




    yes, it's numeric; str(Inf-Inf) and str(NaN) are both "num NaN" (and identical(Inf-Inf,NaN) is TRUE ...
    – Ben Bolker
    2 days ago










  • @OlegSklyar, class of NaN and class of operations that equal NaN (0/0, Inf - Inf) is "numeric".
    – King_Cordelia
    2 days ago
















18














I use this hash function a lot, i.e. to record the value of a dataframe. Wanted to see if I could break it. Why aren't these hash values identical?



This requires the digest package.



Plain text output:



> digest(Inf-Inf)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(NaN)
[1] "4e9653ddf814f0d16b72624aeb85bc20"
> digest(1)
[1] "6717f2823d3202449301145073ab8719"
> digest(1 + 0)
[1] "6717f2823d3202449301145073ab8719"
> digest(5)
[1] "5e338704a8e069ebd8b38ca71991cf94"
> digest(sum(1, 1, 1, 1, 1))
[1] "5e338704a8e069ebd8b38ca71991cf94"
> digest(1^0)
[1] "6717f2823d3202449301145073ab8719"
> 1^0
[1] 1
> digest(1)
[1] "6717f2823d3202449301145073ab8719"


Additional weirdness. Calculations that equal NaN have identical hash values, but NaN's hash values are not equivalent:



> Inf - Inf
[1] NaN
> 0/0
[1] NaN
> digest(Inf - Inf)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(0/0)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(NaN)
[1] "4e9653ddf814f0d16b72624aeb85bc20"









share|improve this question
























  • Looks like a bug. You could ask Dirk directly to fix it and and add a unit test for this particular case.
    – Oleg Sklyar
    2 days ago










  • What is the type of NaN if input directly? It is probably numeric/double when it results from a subtraction of a division operation.
    – Oleg Sklyar
    2 days ago






  • 1




    yes, it's numeric; str(Inf-Inf) and str(NaN) are both "num NaN" (and identical(Inf-Inf,NaN) is TRUE ...
    – Ben Bolker
    2 days ago










  • @OlegSklyar, class of NaN and class of operations that equal NaN (0/0, Inf - Inf) is "numeric".
    – King_Cordelia
    2 days ago














18












18








18


2





I use this hash function a lot, i.e. to record the value of a dataframe. Wanted to see if I could break it. Why aren't these hash values identical?



This requires the digest package.



Plain text output:



> digest(Inf-Inf)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(NaN)
[1] "4e9653ddf814f0d16b72624aeb85bc20"
> digest(1)
[1] "6717f2823d3202449301145073ab8719"
> digest(1 + 0)
[1] "6717f2823d3202449301145073ab8719"
> digest(5)
[1] "5e338704a8e069ebd8b38ca71991cf94"
> digest(sum(1, 1, 1, 1, 1))
[1] "5e338704a8e069ebd8b38ca71991cf94"
> digest(1^0)
[1] "6717f2823d3202449301145073ab8719"
> 1^0
[1] 1
> digest(1)
[1] "6717f2823d3202449301145073ab8719"


Additional weirdness. Calculations that equal NaN have identical hash values, but NaN's hash values are not equivalent:



> Inf - Inf
[1] NaN
> 0/0
[1] NaN
> digest(Inf - Inf)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(0/0)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(NaN)
[1] "4e9653ddf814f0d16b72624aeb85bc20"









share|improve this question















I use this hash function a lot, i.e. to record the value of a dataframe. Wanted to see if I could break it. Why aren't these hash values identical?



This requires the digest package.



Plain text output:



> digest(Inf-Inf)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(NaN)
[1] "4e9653ddf814f0d16b72624aeb85bc20"
> digest(1)
[1] "6717f2823d3202449301145073ab8719"
> digest(1 + 0)
[1] "6717f2823d3202449301145073ab8719"
> digest(5)
[1] "5e338704a8e069ebd8b38ca71991cf94"
> digest(sum(1, 1, 1, 1, 1))
[1] "5e338704a8e069ebd8b38ca71991cf94"
> digest(1^0)
[1] "6717f2823d3202449301145073ab8719"
> 1^0
[1] 1
> digest(1)
[1] "6717f2823d3202449301145073ab8719"


Additional weirdness. Calculations that equal NaN have identical hash values, but NaN's hash values are not equivalent:



> Inf - Inf
[1] NaN
> 0/0
[1] NaN
> digest(Inf - Inf)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(0/0)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(NaN)
[1] "4e9653ddf814f0d16b72624aeb85bc20"






r math hash digest






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 7 hours ago







King_Cordelia

















asked 2 days ago









King_CordeliaKing_Cordelia

936




936












  • Looks like a bug. You could ask Dirk directly to fix it and and add a unit test for this particular case.
    – Oleg Sklyar
    2 days ago










  • What is the type of NaN if input directly? It is probably numeric/double when it results from a subtraction of a division operation.
    – Oleg Sklyar
    2 days ago






  • 1




    yes, it's numeric; str(Inf-Inf) and str(NaN) are both "num NaN" (and identical(Inf-Inf,NaN) is TRUE ...
    – Ben Bolker
    2 days ago










  • @OlegSklyar, class of NaN and class of operations that equal NaN (0/0, Inf - Inf) is "numeric".
    – King_Cordelia
    2 days ago


















  • Looks like a bug. You could ask Dirk directly to fix it and and add a unit test for this particular case.
    – Oleg Sklyar
    2 days ago










  • What is the type of NaN if input directly? It is probably numeric/double when it results from a subtraction of a division operation.
    – Oleg Sklyar
    2 days ago






  • 1




    yes, it's numeric; str(Inf-Inf) and str(NaN) are both "num NaN" (and identical(Inf-Inf,NaN) is TRUE ...
    – Ben Bolker
    2 days ago










  • @OlegSklyar, class of NaN and class of operations that equal NaN (0/0, Inf - Inf) is "numeric".
    – King_Cordelia
    2 days ago
















Looks like a bug. You could ask Dirk directly to fix it and and add a unit test for this particular case.
– Oleg Sklyar
2 days ago




Looks like a bug. You could ask Dirk directly to fix it and and add a unit test for this particular case.
– Oleg Sklyar
2 days ago












What is the type of NaN if input directly? It is probably numeric/double when it results from a subtraction of a division operation.
– Oleg Sklyar
2 days ago




What is the type of NaN if input directly? It is probably numeric/double when it results from a subtraction of a division operation.
– Oleg Sklyar
2 days ago




1




1




yes, it's numeric; str(Inf-Inf) and str(NaN) are both "num NaN" (and identical(Inf-Inf,NaN) is TRUE ...
– Ben Bolker
2 days ago




yes, it's numeric; str(Inf-Inf) and str(NaN) are both "num NaN" (and identical(Inf-Inf,NaN) is TRUE ...
– Ben Bolker
2 days ago












@OlegSklyar, class of NaN and class of operations that equal NaN (0/0, Inf - Inf) is "numeric".
– King_Cordelia
2 days ago




@OlegSklyar, class of NaN and class of operations that equal NaN (0/0, Inf - Inf) is "numeric".
– King_Cordelia
2 days ago












2 Answers
2






active

oldest

votes


















26














tl;dr this has to do with very deep details of how NaNs are represented in binary. You could work around it by using digest(.,ascii=TRUE) ...



Following up on @Jozef's answer: note boldfaced digits ...




> base::serialize(Inf-Inf,connection=NULL)
[1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
[26] 00 0e 00 00 00 01 ff f8 00 00 00 00 00 00
> base::serialize(NaN,connection=NULL)
[1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
[26] 00 0e 00 00 00 01 7f f8 00 00 00 00 00 00


Alternatively, using pryr::bytes() ...



> bytes(NaN)
[1] "7F F8 00 00 00 00 00 00"
> bytes(Inf-Inf)
[1] "FF F8 00 00 00 00 00 00"


The Wikipedia article on floating point format/NaNs says:




Some operations of floating-point arithmetic are invalid, such as taking the square root of a negative number. The act of reaching an invalid result is called a floating-point exception. An exceptional result is represented by a special code called a NaN, for "Not a Number". All NaNs in IEEE 754-1985 have this format:




  • sign = either 0 or 1.

  • biased exponent = all 1 bits.

  • fraction = anything except all 0 bits (since all 0 bits represents infinity).




The sign is the first bit; the exponent is the next 11 bits; the fraction is the last 52 bits. Translating the first four hex digits given above to binary, Inf-Inf is 1111 1111 1111 0100 (sign=1; exponent is all ones, as required; fraction starts with 0100) whereas NaN is 0111 1111 1111 0100 (the same, but with sign=0).



To understand why Inf-Inf ends up with sign bit 1 and NaN has sign bit 0 you'd probably have to dig more deeply into the way floating point arithmetic is implemented on this platform ...



It might be worth raising an issue on the digest GitHub repo about this; I can't think of an elegant way to do it, but it seems reasonable that objects where identical(x,y) is TRUE in R should have identical hashes ... Note that identical() specifically ignores these differences in bit patterns via the single.NA (default TRUE) argument:




single.NA: logical indicating if there is conceptually just one numeric
‘NA’ and one ‘NaN’; ‘single.NA = FALSE’ differentiates bit
patterns.




Within the C code, it looks like R simply uses C's != operator to compare NaN values unless bitwise comparison is enabled, in which case it does an explicit check of equality of the memory locations: see here. That is, C's comparison operator appears to treat different kinds of NaN values as equivalent ...






share|improve this answer























  • Awesome, thanks.The point about some operations of floating-point arithmetic being invalid is a great one.
    – King_Cordelia
    2 days ago






  • 4




    Negative infinity is obviously bigger than positive infinity making the result negative. After all, any 2's compliment stored number has one more negative value than positive value. It's science.
    – corsiKa
    2 days ago






  • 3




    There's a good story floating around of a programmer who changed some C code that compared two arrays of floats element-by-element with a simple memcmp of the two arrays. It passed all unit tests but failed horribly for a key customer. There's the +0/-0 thing, the NaN!=NaN thing, and so on.
    – David Schwartz
    2 days ago








  • 4




    @corsiKa IEEE 754 floating point numbers are not 2's complement! They're more like sign-magnitude, which has equal ranges in the negative and the positive. You can easily see this, because floats know both +0 and -0.
    – marcelm
    2 days ago






  • 1




    @marcelm I guess you didn't umm.. get the joke. Obviously, mathematically positive nor negative infinity's size can be measured, making the first statement non-sensical... it follows the the rest of the statement is also non-sensical... so yeah... <3
    – corsiKa
    2 days ago



















8














This has to do with digest::digest using base::serialize, which gives non-identical results for the 2 mentioned objects with ascii = FALSE, which is the default passed to it by digest:



identical(
base::serialize(Inf-Inf, connection = NULL, ascii = FALSE),
base::serialize(NaN, connection = NULL, ascii = FALSE)
)
# [1] FALSE


Even though



identical(Inf-Inf, NaN)
# [1] TRUE





share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54095499%2fwhy-do-the-hash-values-differ-for-nan-and-inf-inf%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    26














    tl;dr this has to do with very deep details of how NaNs are represented in binary. You could work around it by using digest(.,ascii=TRUE) ...



    Following up on @Jozef's answer: note boldfaced digits ...




    > base::serialize(Inf-Inf,connection=NULL)
    [1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
    [26] 00 0e 00 00 00 01 ff f8 00 00 00 00 00 00
    > base::serialize(NaN,connection=NULL)
    [1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
    [26] 00 0e 00 00 00 01 7f f8 00 00 00 00 00 00


    Alternatively, using pryr::bytes() ...



    > bytes(NaN)
    [1] "7F F8 00 00 00 00 00 00"
    > bytes(Inf-Inf)
    [1] "FF F8 00 00 00 00 00 00"


    The Wikipedia article on floating point format/NaNs says:




    Some operations of floating-point arithmetic are invalid, such as taking the square root of a negative number. The act of reaching an invalid result is called a floating-point exception. An exceptional result is represented by a special code called a NaN, for "Not a Number". All NaNs in IEEE 754-1985 have this format:




    • sign = either 0 or 1.

    • biased exponent = all 1 bits.

    • fraction = anything except all 0 bits (since all 0 bits represents infinity).




    The sign is the first bit; the exponent is the next 11 bits; the fraction is the last 52 bits. Translating the first four hex digits given above to binary, Inf-Inf is 1111 1111 1111 0100 (sign=1; exponent is all ones, as required; fraction starts with 0100) whereas NaN is 0111 1111 1111 0100 (the same, but with sign=0).



    To understand why Inf-Inf ends up with sign bit 1 and NaN has sign bit 0 you'd probably have to dig more deeply into the way floating point arithmetic is implemented on this platform ...



    It might be worth raising an issue on the digest GitHub repo about this; I can't think of an elegant way to do it, but it seems reasonable that objects where identical(x,y) is TRUE in R should have identical hashes ... Note that identical() specifically ignores these differences in bit patterns via the single.NA (default TRUE) argument:




    single.NA: logical indicating if there is conceptually just one numeric
    ‘NA’ and one ‘NaN’; ‘single.NA = FALSE’ differentiates bit
    patterns.




    Within the C code, it looks like R simply uses C's != operator to compare NaN values unless bitwise comparison is enabled, in which case it does an explicit check of equality of the memory locations: see here. That is, C's comparison operator appears to treat different kinds of NaN values as equivalent ...






    share|improve this answer























    • Awesome, thanks.The point about some operations of floating-point arithmetic being invalid is a great one.
      – King_Cordelia
      2 days ago






    • 4




      Negative infinity is obviously bigger than positive infinity making the result negative. After all, any 2's compliment stored number has one more negative value than positive value. It's science.
      – corsiKa
      2 days ago






    • 3




      There's a good story floating around of a programmer who changed some C code that compared two arrays of floats element-by-element with a simple memcmp of the two arrays. It passed all unit tests but failed horribly for a key customer. There's the +0/-0 thing, the NaN!=NaN thing, and so on.
      – David Schwartz
      2 days ago








    • 4




      @corsiKa IEEE 754 floating point numbers are not 2's complement! They're more like sign-magnitude, which has equal ranges in the negative and the positive. You can easily see this, because floats know both +0 and -0.
      – marcelm
      2 days ago






    • 1




      @marcelm I guess you didn't umm.. get the joke. Obviously, mathematically positive nor negative infinity's size can be measured, making the first statement non-sensical... it follows the the rest of the statement is also non-sensical... so yeah... <3
      – corsiKa
      2 days ago
















    26














    tl;dr this has to do with very deep details of how NaNs are represented in binary. You could work around it by using digest(.,ascii=TRUE) ...



    Following up on @Jozef's answer: note boldfaced digits ...




    > base::serialize(Inf-Inf,connection=NULL)
    [1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
    [26] 00 0e 00 00 00 01 ff f8 00 00 00 00 00 00
    > base::serialize(NaN,connection=NULL)
    [1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
    [26] 00 0e 00 00 00 01 7f f8 00 00 00 00 00 00


    Alternatively, using pryr::bytes() ...



    > bytes(NaN)
    [1] "7F F8 00 00 00 00 00 00"
    > bytes(Inf-Inf)
    [1] "FF F8 00 00 00 00 00 00"


    The Wikipedia article on floating point format/NaNs says:




    Some operations of floating-point arithmetic are invalid, such as taking the square root of a negative number. The act of reaching an invalid result is called a floating-point exception. An exceptional result is represented by a special code called a NaN, for "Not a Number". All NaNs in IEEE 754-1985 have this format:




    • sign = either 0 or 1.

    • biased exponent = all 1 bits.

    • fraction = anything except all 0 bits (since all 0 bits represents infinity).




    The sign is the first bit; the exponent is the next 11 bits; the fraction is the last 52 bits. Translating the first four hex digits given above to binary, Inf-Inf is 1111 1111 1111 0100 (sign=1; exponent is all ones, as required; fraction starts with 0100) whereas NaN is 0111 1111 1111 0100 (the same, but with sign=0).



    To understand why Inf-Inf ends up with sign bit 1 and NaN has sign bit 0 you'd probably have to dig more deeply into the way floating point arithmetic is implemented on this platform ...



    It might be worth raising an issue on the digest GitHub repo about this; I can't think of an elegant way to do it, but it seems reasonable that objects where identical(x,y) is TRUE in R should have identical hashes ... Note that identical() specifically ignores these differences in bit patterns via the single.NA (default TRUE) argument:




    single.NA: logical indicating if there is conceptually just one numeric
    ‘NA’ and one ‘NaN’; ‘single.NA = FALSE’ differentiates bit
    patterns.




    Within the C code, it looks like R simply uses C's != operator to compare NaN values unless bitwise comparison is enabled, in which case it does an explicit check of equality of the memory locations: see here. That is, C's comparison operator appears to treat different kinds of NaN values as equivalent ...






    share|improve this answer























    • Awesome, thanks.The point about some operations of floating-point arithmetic being invalid is a great one.
      – King_Cordelia
      2 days ago






    • 4




      Negative infinity is obviously bigger than positive infinity making the result negative. After all, any 2's compliment stored number has one more negative value than positive value. It's science.
      – corsiKa
      2 days ago






    • 3




      There's a good story floating around of a programmer who changed some C code that compared two arrays of floats element-by-element with a simple memcmp of the two arrays. It passed all unit tests but failed horribly for a key customer. There's the +0/-0 thing, the NaN!=NaN thing, and so on.
      – David Schwartz
      2 days ago








    • 4




      @corsiKa IEEE 754 floating point numbers are not 2's complement! They're more like sign-magnitude, which has equal ranges in the negative and the positive. You can easily see this, because floats know both +0 and -0.
      – marcelm
      2 days ago






    • 1




      @marcelm I guess you didn't umm.. get the joke. Obviously, mathematically positive nor negative infinity's size can be measured, making the first statement non-sensical... it follows the the rest of the statement is also non-sensical... so yeah... <3
      – corsiKa
      2 days ago














    26












    26








    26






    tl;dr this has to do with very deep details of how NaNs are represented in binary. You could work around it by using digest(.,ascii=TRUE) ...



    Following up on @Jozef's answer: note boldfaced digits ...




    > base::serialize(Inf-Inf,connection=NULL)
    [1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
    [26] 00 0e 00 00 00 01 ff f8 00 00 00 00 00 00
    > base::serialize(NaN,connection=NULL)
    [1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
    [26] 00 0e 00 00 00 01 7f f8 00 00 00 00 00 00


    Alternatively, using pryr::bytes() ...



    > bytes(NaN)
    [1] "7F F8 00 00 00 00 00 00"
    > bytes(Inf-Inf)
    [1] "FF F8 00 00 00 00 00 00"


    The Wikipedia article on floating point format/NaNs says:




    Some operations of floating-point arithmetic are invalid, such as taking the square root of a negative number. The act of reaching an invalid result is called a floating-point exception. An exceptional result is represented by a special code called a NaN, for "Not a Number". All NaNs in IEEE 754-1985 have this format:




    • sign = either 0 or 1.

    • biased exponent = all 1 bits.

    • fraction = anything except all 0 bits (since all 0 bits represents infinity).




    The sign is the first bit; the exponent is the next 11 bits; the fraction is the last 52 bits. Translating the first four hex digits given above to binary, Inf-Inf is 1111 1111 1111 0100 (sign=1; exponent is all ones, as required; fraction starts with 0100) whereas NaN is 0111 1111 1111 0100 (the same, but with sign=0).



    To understand why Inf-Inf ends up with sign bit 1 and NaN has sign bit 0 you'd probably have to dig more deeply into the way floating point arithmetic is implemented on this platform ...



    It might be worth raising an issue on the digest GitHub repo about this; I can't think of an elegant way to do it, but it seems reasonable that objects where identical(x,y) is TRUE in R should have identical hashes ... Note that identical() specifically ignores these differences in bit patterns via the single.NA (default TRUE) argument:




    single.NA: logical indicating if there is conceptually just one numeric
    ‘NA’ and one ‘NaN’; ‘single.NA = FALSE’ differentiates bit
    patterns.




    Within the C code, it looks like R simply uses C's != operator to compare NaN values unless bitwise comparison is enabled, in which case it does an explicit check of equality of the memory locations: see here. That is, C's comparison operator appears to treat different kinds of NaN values as equivalent ...






    share|improve this answer














    tl;dr this has to do with very deep details of how NaNs are represented in binary. You could work around it by using digest(.,ascii=TRUE) ...



    Following up on @Jozef's answer: note boldfaced digits ...




    > base::serialize(Inf-Inf,connection=NULL)
    [1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
    [26] 00 0e 00 00 00 01 ff f8 00 00 00 00 00 00
    > base::serialize(NaN,connection=NULL)
    [1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
    [26] 00 0e 00 00 00 01 7f f8 00 00 00 00 00 00


    Alternatively, using pryr::bytes() ...



    > bytes(NaN)
    [1] "7F F8 00 00 00 00 00 00"
    > bytes(Inf-Inf)
    [1] "FF F8 00 00 00 00 00 00"


    The Wikipedia article on floating point format/NaNs says:




    Some operations of floating-point arithmetic are invalid, such as taking the square root of a negative number. The act of reaching an invalid result is called a floating-point exception. An exceptional result is represented by a special code called a NaN, for "Not a Number". All NaNs in IEEE 754-1985 have this format:




    • sign = either 0 or 1.

    • biased exponent = all 1 bits.

    • fraction = anything except all 0 bits (since all 0 bits represents infinity).




    The sign is the first bit; the exponent is the next 11 bits; the fraction is the last 52 bits. Translating the first four hex digits given above to binary, Inf-Inf is 1111 1111 1111 0100 (sign=1; exponent is all ones, as required; fraction starts with 0100) whereas NaN is 0111 1111 1111 0100 (the same, but with sign=0).



    To understand why Inf-Inf ends up with sign bit 1 and NaN has sign bit 0 you'd probably have to dig more deeply into the way floating point arithmetic is implemented on this platform ...



    It might be worth raising an issue on the digest GitHub repo about this; I can't think of an elegant way to do it, but it seems reasonable that objects where identical(x,y) is TRUE in R should have identical hashes ... Note that identical() specifically ignores these differences in bit patterns via the single.NA (default TRUE) argument:




    single.NA: logical indicating if there is conceptually just one numeric
    ‘NA’ and one ‘NaN’; ‘single.NA = FALSE’ differentiates bit
    patterns.




    Within the C code, it looks like R simply uses C's != operator to compare NaN values unless bitwise comparison is enabled, in which case it does an explicit check of equality of the memory locations: see here. That is, C's comparison operator appears to treat different kinds of NaN values as equivalent ...







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited 2 days ago

























    answered 2 days ago









    Ben BolkerBen Bolker

    133k11223311




    133k11223311












    • Awesome, thanks.The point about some operations of floating-point arithmetic being invalid is a great one.
      – King_Cordelia
      2 days ago






    • 4




      Negative infinity is obviously bigger than positive infinity making the result negative. After all, any 2's compliment stored number has one more negative value than positive value. It's science.
      – corsiKa
      2 days ago






    • 3




      There's a good story floating around of a programmer who changed some C code that compared two arrays of floats element-by-element with a simple memcmp of the two arrays. It passed all unit tests but failed horribly for a key customer. There's the +0/-0 thing, the NaN!=NaN thing, and so on.
      – David Schwartz
      2 days ago








    • 4




      @corsiKa IEEE 754 floating point numbers are not 2's complement! They're more like sign-magnitude, which has equal ranges in the negative and the positive. You can easily see this, because floats know both +0 and -0.
      – marcelm
      2 days ago






    • 1




      @marcelm I guess you didn't umm.. get the joke. Obviously, mathematically positive nor negative infinity's size can be measured, making the first statement non-sensical... it follows the the rest of the statement is also non-sensical... so yeah... <3
      – corsiKa
      2 days ago


















    • Awesome, thanks.The point about some operations of floating-point arithmetic being invalid is a great one.
      – King_Cordelia
      2 days ago






    • 4




      Negative infinity is obviously bigger than positive infinity making the result negative. After all, any 2's compliment stored number has one more negative value than positive value. It's science.
      – corsiKa
      2 days ago






    • 3




      There's a good story floating around of a programmer who changed some C code that compared two arrays of floats element-by-element with a simple memcmp of the two arrays. It passed all unit tests but failed horribly for a key customer. There's the +0/-0 thing, the NaN!=NaN thing, and so on.
      – David Schwartz
      2 days ago








    • 4




      @corsiKa IEEE 754 floating point numbers are not 2's complement! They're more like sign-magnitude, which has equal ranges in the negative and the positive. You can easily see this, because floats know both +0 and -0.
      – marcelm
      2 days ago






    • 1




      @marcelm I guess you didn't umm.. get the joke. Obviously, mathematically positive nor negative infinity's size can be measured, making the first statement non-sensical... it follows the the rest of the statement is also non-sensical... so yeah... <3
      – corsiKa
      2 days ago
















    Awesome, thanks.The point about some operations of floating-point arithmetic being invalid is a great one.
    – King_Cordelia
    2 days ago




    Awesome, thanks.The point about some operations of floating-point arithmetic being invalid is a great one.
    – King_Cordelia
    2 days ago




    4




    4




    Negative infinity is obviously bigger than positive infinity making the result negative. After all, any 2's compliment stored number has one more negative value than positive value. It's science.
    – corsiKa
    2 days ago




    Negative infinity is obviously bigger than positive infinity making the result negative. After all, any 2's compliment stored number has one more negative value than positive value. It's science.
    – corsiKa
    2 days ago




    3




    3




    There's a good story floating around of a programmer who changed some C code that compared two arrays of floats element-by-element with a simple memcmp of the two arrays. It passed all unit tests but failed horribly for a key customer. There's the +0/-0 thing, the NaN!=NaN thing, and so on.
    – David Schwartz
    2 days ago






    There's a good story floating around of a programmer who changed some C code that compared two arrays of floats element-by-element with a simple memcmp of the two arrays. It passed all unit tests but failed horribly for a key customer. There's the +0/-0 thing, the NaN!=NaN thing, and so on.
    – David Schwartz
    2 days ago






    4




    4




    @corsiKa IEEE 754 floating point numbers are not 2's complement! They're more like sign-magnitude, which has equal ranges in the negative and the positive. You can easily see this, because floats know both +0 and -0.
    – marcelm
    2 days ago




    @corsiKa IEEE 754 floating point numbers are not 2's complement! They're more like sign-magnitude, which has equal ranges in the negative and the positive. You can easily see this, because floats know both +0 and -0.
    – marcelm
    2 days ago




    1




    1




    @marcelm I guess you didn't umm.. get the joke. Obviously, mathematically positive nor negative infinity's size can be measured, making the first statement non-sensical... it follows the the rest of the statement is also non-sensical... so yeah... <3
    – corsiKa
    2 days ago




    @marcelm I guess you didn't umm.. get the joke. Obviously, mathematically positive nor negative infinity's size can be measured, making the first statement non-sensical... it follows the the rest of the statement is also non-sensical... so yeah... <3
    – corsiKa
    2 days ago













    8














    This has to do with digest::digest using base::serialize, which gives non-identical results for the 2 mentioned objects with ascii = FALSE, which is the default passed to it by digest:



    identical(
    base::serialize(Inf-Inf, connection = NULL, ascii = FALSE),
    base::serialize(NaN, connection = NULL, ascii = FALSE)
    )
    # [1] FALSE


    Even though



    identical(Inf-Inf, NaN)
    # [1] TRUE





    share|improve this answer




























      8














      This has to do with digest::digest using base::serialize, which gives non-identical results for the 2 mentioned objects with ascii = FALSE, which is the default passed to it by digest:



      identical(
      base::serialize(Inf-Inf, connection = NULL, ascii = FALSE),
      base::serialize(NaN, connection = NULL, ascii = FALSE)
      )
      # [1] FALSE


      Even though



      identical(Inf-Inf, NaN)
      # [1] TRUE





      share|improve this answer


























        8












        8








        8






        This has to do with digest::digest using base::serialize, which gives non-identical results for the 2 mentioned objects with ascii = FALSE, which is the default passed to it by digest:



        identical(
        base::serialize(Inf-Inf, connection = NULL, ascii = FALSE),
        base::serialize(NaN, connection = NULL, ascii = FALSE)
        )
        # [1] FALSE


        Even though



        identical(Inf-Inf, NaN)
        # [1] TRUE





        share|improve this answer














        This has to do with digest::digest using base::serialize, which gives non-identical results for the 2 mentioned objects with ascii = FALSE, which is the default passed to it by digest:



        identical(
        base::serialize(Inf-Inf, connection = NULL, ascii = FALSE),
        base::serialize(NaN, connection = NULL, ascii = FALSE)
        )
        # [1] FALSE


        Even though



        identical(Inf-Inf, NaN)
        # [1] TRUE






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited 2 days ago

























        answered 2 days ago









        JozefJozef

        849210




        849210






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54095499%2fwhy-do-the-hash-values-differ-for-nan-and-inf-inf%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            An IMO inspired problem

            Management

            Has there ever been an instance of an active nuclear power plant within or near a war zone?