regexpr syntax in R
Going crazy here over syntax of regexpr in R
I am trying the following, which should allow me to get everything between productUrl://
and the next ? character:
(?<="productUrl":"//)(.*?)(?=\?)
The above works on https://regexr.com/
I am then trying to escape the backslashes to fit that string into the grep
function, but with no luck. What is the proper way of doing it?
EDIT:
Added link to example
EDIT2: I actually need to extract the substrings that match my pattern, so grep
may need to be used in conjunction with another function.
r regex
asked yesterday by Chapo (edited yesterday)
1 Answer
Note that you do not need to escape / in R regex patterns: they are defined with string literals, and / is not a special regex metacharacter. If you want to write a " inside a "..." string literal, escape it with a single \, as you are already doing.
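A minimal sketch of these escaping rules, using a short hypothetical sample string s in place of the real data: both literals below should produce the same pattern, while a regex-level backslash (such as the one in \?) has to be doubled inside any R string literal, because R consumes one backslash itself.

```r
# Hypothetical sample input, not the data from the question
s <- '"productUrl":"//www.example.com/item.html?search=1"'

# Double-quoted literal: each " inside the pattern is written as \"
p_double <- "(?<=\"productUrl\":\"//)[^?]*"
# Single-quoted literal: the same pattern, no escaping needed for "
p_single <- '(?<="productUrl":"//)[^?]*'
identical(p_double, p_single)  # TRUE

# A regex backslash must be doubled: "\\?" holds the two characters \ and ?
lit_q <- "\\?"
nchar(lit_q)       # 2
writeLines(lit_q)  # prints \?

# The question's original pattern, written as a double-quoted R literal
p_lazy <- "(?<=\"productUrl\":\"//)(.*?)(?=\\?)"
regmatches(s, regexpr(p_lazy, s, perl = TRUE))  # "www.example.com/item.html"
```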
You can avoid over-escaping here if you use single quotes to define the string literal and turn .*?(?=\?)
into a negated character class:
grep('(?<="productUrl":"//)([^?]*)', x, perl=TRUE)
The [^?]* negated character class matches 0 or more characters other than ?.
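To see why the negated class is the more robust choice, here is a small sketch with two hypothetical strings, one containing a ? and one without. The lazy pattern needs a ? somewhere to its right, while [^?]* simply runs to the end of the string when there is none (which can pull in a trailing ", one reason to also exclude " from the class, as in [^?"]*).

```r
# Two hypothetical inputs: only the first one contains a ?
s <- c('"productUrl":"//a.example/p1.html?search=1"',
       '"productUrl":"//a.example/p2.html"')

lazy    <- '(?<="productUrl":"//).*?(?=\\?)'  # requires a ? to the right
negated <- '(?<="productUrl":"//)[^?]*'       # stops at ? or at end of string

# regmatches() with regexpr() keeps only the elements that matched
regmatches(s, regexpr(lazy, s, perl = TRUE))
# only "a.example/p1.html": the second string has no ?, so it does not match
regmatches(s, regexpr(negated, s, perl = TRUE))
# "a.example/p1.html" and "a.example/p2.html\"" (trailing quote included)
```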
If the string you are checking has no double quotes, remove them from the lookbehind:
grep('(?<=productUrl://)([^?]*)', x, perl=TRUE)
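Instead of the lookbehind, you may also use the \K match reset operator to omit the text matched so far from the match. Note that inside an R string literal it has to be written as \\K:
grep('productUrl://\\K[^?]*', x, perl=TRUE)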
Actually, you do not even need the capturing group in your pattern.
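A quick sketch (hypothetical sample string s) suggesting that the lookbehind and \K variants extract the same text, and that neither needs a capturing group:

```r
# Hypothetical sample input
s <- 'x "productUrl":"//a.example/p1.html?search=1" y'

# Lookbehind: the prefix is checked but is never part of the match
via_lookbehind <- regmatches(s, regexpr('(?<="productUrl":"//)[^?]*', s, perl = TRUE))
# \K: the prefix is matched, then dropped from the reported match
via_k <- regmatches(s, regexpr('"productUrl":"//\\K[^?]*', s, perl = TRUE))

via_lookbehind                    # "a.example/p1.html"
identical(via_lookbehind, via_k)  # TRUE
```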
Solving the actual task
You cannot extract substrings with grep in R; grep only identifies which elements of a character vector match (returning their indices, or the whole elements with value=TRUE). To extract substrings, you need base R regmatches, stringr str_extract/str_extract_all, or similar matching functions.
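The difference is easy to see on a small hypothetical vector v: grep reports which elements match (or returns those elements unchanged with value=TRUE), while regmatches returns only the matched substrings. This is likely why the grep(..., value=TRUE) call tried in the comments returns whole strings.

```r
# Hypothetical character vector
v <- c('"productUrl":"//a.example/p1.html?search=1"',
       '"image":"https://img.example/1.jpg"',
       '"productUrl":"//a.example/p2.html?search=1"')
pat <- '(?<="productUrl":"//)[^?"]*'

grep(pat, v, perl = TRUE)                    # 1 3: indices of matching elements
grep(pat, v, perl = TRUE, value = TRUE)      # elements 1 and 3, unchanged
regmatches(v, regexpr(pat, v, perl = TRUE))  # only the matched substrings
```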
Example with base R:
> x <- '":"ppath","value":,"hidden":false,"locked":false}],"bizData":"","pos":0},"listItems":[{"name":"BRAND\'S® Lutein Essence 6 Bottles x 60ml","nid":"66765568","icons":[{"domClass":"lazMall","text":"LazMall","alias":"LazMallAlias","type":"img","group":"1","showType":"0","order":0}],n"productUrl":"//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html?search=1","image":"https://sg-test-11.slatic.net/p/5337f879236ece2f14158c055adcdef7.jpg",n"productUrl":"//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html?search=1","sku":"BR924HBAB3R0N4SGAMZ","skuId":"167303363"}],"restrictedAge":0,"categories":[1438,1565,4776,7305'
> regmatches(x, gregexpr('"productUrl":"\\K[^?"]*', x, perl=TRUE))
[[1]]
[1] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
[2] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
With stringr:
> library(stringr)
> str_extract_all(x, '(?<="productUrl":")[^?"]*')
[[1]]
[1] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
[2] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
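Regarding the comment about needing a capturing group for several instances: gregexpr already returns every match, so the group is optional. If you do prefer a capturing group, here is a base R sketch (hypothetical sample string x2) that reads the group positions gregexpr attaches as capture.start/capture.length attributes when perl=TRUE:

```r
# Hypothetical input with two productUrl entries
x2 <- '{"productUrl":"//a.example/p1.html?search=1","image":"//img.example/1.jpg","productUrl":"//a.example/p2.html?search=1"}'

# One capturing group; gregexpr() finds every match in the string
m <- gregexpr('"productUrl":"//([^?"]*)', x2, perl = TRUE)[[1]]

# capture.start/capture.length are matrices: one row per match, one column per group
starts <- attr(m, "capture.start")[, 1]
lens   <- attr(m, "capture.length")[, 1]

urls <- substring(x2, starts, starts + lens - 1)
urls  # "a.example/p1.html" "a.example/p2.html"
```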
answered yesterday by Wiktor Stribiżew (edited yesterday)
Thanks for the answer, but unfortunately your solution doesn't yield the expected result. I might have several instances of the pattern, so I do need the capturing group, no?
– Chapo
yesterday
regexr.com/45ug5 has my example
– Chapo
yesterday
I tried grep('(?<="productUrl":"//)[^?]*', a, perl=TRUE, value=TRUE)
following your advice
– Chapo
yesterday
What is the input type? Is it a single string? Do you want to extract substrings (as matches) from it? You said you are trying to use it in a grep
command; that is why I used it in the answer.
– Wiktor Stribiżew
yesterday
I've put an input example in the link in the previous comment: regexr.com/45ug5
– Chapo
yesterday