regexpr syntax in R


























Going crazy here over the syntax of regexpr in R.

I am trying the following pattern, which should let me get everything between "productUrl":"// and the next ?:

(?<=\"productUrl\":\"\/\/)(.*?)(?=\?)

The above works on https://regexr.com/

I am then trying to escape the backslashes to fit that string into the grep function, but with no luck. What is the proper way of doing it?

EDIT:

Added link to example

EDIT2: I actually need to extract the substrings that match my pattern, so grep may have to be used in conjunction with another function.










      r regex














asked yesterday by Chapo (edited yesterday)
























          1 Answer














          Note that you do not need to escape / in R regex patterns: they are defined with string literals, and / is not a special regex metacharacter. If you want to write a " inside a "..." string literal, you should escape it with a single \, as you are already doing.
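
The escaping rules above can be checked directly in an R session. This is a minimal sketch, not part of the original answer; the variable names and the test strings are made up for illustration.

```r
# Two ways to write the same one-character string containing a double quote.
dq_escaped <- "\""   # backslash-escaped inside a double-quoted literal
dq_single  <- '"'    # no escape needed inside a single-quoted literal
same_string <- identical(dq_escaped, dq_single)

# A regex escape such as \? needs a doubled backslash in R source code,
# because R consumes one backslash when it parses the string literal.
pat <- "\\?"
pat_length <- nchar(pat)                      # backslash + question mark
has_qmark  <- grepl(pat, "a?b", perl = TRUE)  # literal ? is present
no_qmark   <- grepl(pat, "ab", perl = TRUE)   # literal ? is absent

# The forward slash is an ordinary character in R strings and in regexes.
has_slashes <- grepl("//", "http://example.org", perl = TRUE)
```

Printing `pat` with `cat(pat)` shows the pattern as the regex engine receives it, which is usually the quickest way to see whether a backslash was lost.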



          You may avoid overescaping here if you use single quotes to define the string literal and if you turn .*?(?=\?) into a negated character class:



          grep('(?<="productUrl":"//)([^?]*)', x, perl=TRUE)


          The [^?]* negated character class matches zero or more characters other than ?.
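
To see what grep does with such a pattern, here is a small sketch on a hypothetical character vector `v` (the URLs are invented). It suggests that grep reports which elements contain a match, and that `value = TRUE` returns those elements whole rather than the matched substring.

```r
# Hypothetical input: a character vector with three elements.
v <- c('"productUrl":"//a.example/p1.html?x=1"',
       'no url here',
       '"productUrl":"//b.example/p2.html?y=2"')

# Lookbehind for the prefix, then everything up to the next ?.
pat <- '(?<="productUrl":"//)[^?]*'

idx  <- grep(pat, v, perl = TRUE)                # positions of matching elements
vals <- grep(pat, v, perl = TRUE, value = TRUE)  # the matching elements, unchanged

print(idx)   # 1 3
print(vals)  # elements 1 and 3 in full, including the prefix and the query string
```

So grep is a filter over vector elements; pulling the URL out of each element needs a different function.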



          If the string you are checking against has no double quotes, remove them from the lookbehind:



          grep('(?<=productUrl://)([^?]*)', x, perl=TRUE)


          Instead of the lookbehind, you may also use \K to drop the text matched so far from the reported match (the backslash is doubled inside the R string literal):



          grep('productUrl://\\K[^?]*', x, perl=TRUE)


          Actually, you do not even need the capturing group in your pattern.



          Solving the actual task



          You cannot extract substrings with grep in R; grep only finds the elements of a character vector that contain a match. To extract substrings, use base R regmatches (with regexpr/gregexpr), stringr str_extract/str_extract_all, or similar match functions.



          Example with base R:



          > x <- '":"ppath","value":,"hidden":false,"locked":false}],"bizData":"","pos":0},"listItems":[{"name":"BRAND\'S® Lutein Essence 6 Bottles x 60ml","nid":"66765568","icons":[{"domClass":"lazMall","text":"LazMall","alias":"LazMallAlias","type":"img","group":"1","showType":"0","order":0}],\n"productUrl":"//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html?search=1","image":"https://sg-test-11.slatic.net/p/5337f879236ece2f14158c055adcdef7.jpg",\n"productUrl":"//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html?search=1","sku":"BR924HBAB3R0N4SGAMZ","skuId":"167303363"}],"restrictedAge":0,"categories":[1438,1565,4776,7305'
          > regmatches(x, gregexpr('"productUrl":"\\K[^?"]*', x, perl=TRUE))
          [[1]]
          [1] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
          [2] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"


          With stringr:



          > library(stringr)
          > str_extract_all(x, '(?<="productUrl":")[^?"]*')
          [[1]]
          [1] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
          [2] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
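
As a rough cross-check, the lookbehind and \K variants can be compared on a short made-up string `s` (hypothetical, much shorter than the x above). Here the `//` is kept inside the prefix, so the results start at the host name. Neither variant needs a capturing group to return several matches, because gregexpr reports every match in the string.

```r
# Hypothetical input with two occurrences of the key.
s <- '{"productUrl":"//a.example/p1.html?x=1","productUrl":"//b.example/p2.html?y=2"}'

# Variant 1: lookbehind, so the match starts right after the prefix.
m1 <- regmatches(s, gregexpr('(?<="productUrl":"//)[^?"]*', s, perl = TRUE))[[1]]

# Variant 2: \K discards the prefix from the reported match.
m2 <- regmatches(s, gregexpr('"productUrl":"//\\K[^?"]*', s, perl = TRUE))[[1]]

print(m1)  # "a.example/p1.html" "b.example/p2.html"
print(identical(m1, m2))
```

For a character vector with several elements, regmatches returns one list entry per element, and `unlist()` flattens the result if a single vector of URLs is wanted.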




























          • Thanks for the answer, but your solution doesn't yield the expected result, unfortunately. I might have several instances of the pattern, so I do need the capturing group, no?
            – Chapo
            yesterday










          • regexr.com/45ug5 is with my example
            – Chapo
            yesterday










          • I tried grep('(?<="productUrl":"//)[^?]*', a, perl=TRUE,value=TRUE) following your advice
            – Chapo
            yesterday










          • What is the input type? Is it a single string? Do you want to extract substrings (as matches) from it? You said you are trying to use it in grep command, that is why I used it in the answer.
            – Wiktor Stribiżew
            yesterday












          • I've put an input example in the link in the previous comment: regexr.com/45ug5
            – Chapo
            yesterday











          answered yesterday by Wiktor Stribiżew (edited yesterday)











