regexpr syntax in R

Going crazy here over syntax of regexpr in R

I am trying the following pattern, which should capture everything between "productUrl":"// and the next ?

(?<=\"productUrl\":\"\/\/)(.*?)(?=\?)

The above works on https://regexr.com/

I then tried escaping the backslashes so that the pattern fits into a string passed to the grep function, but with no luck. What is the proper way of doing it?

EDIT:

Added link to example

EDIT2: I actually need to extract the substrings that match my pattern so grep may be used in conjunction with another function.

edited yesterday

asked yesterday

Chapo


r regex

1 Answer
Note that you do not need to escape / in R regex patterns: they are defined with string literals, and / is not a special regex metacharacter. If you want to write a " inside a "..." string literal, you should escape it with a single \, as you are already doing.
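As a small illustration of these escaping rules, here is a sketch that builds only the literal prefix of the pattern. The names prefix, s and found, and the sample input, are made up for the example and are not the asker's data.

```r
# Inside "..." only the double quote needs a backslash; / needs none.
prefix <- "\"productUrl\":\"//"

# writeLines prints the text exactly as the regex engine receives it.
writeLines(prefix)

# Hypothetical one-line input, not the asker's data.
s <- '{"productUrl":"//example.com/item.html?search=1"}'

# The unescaped slashes match literally.
found <- grepl(prefix, s, perl = TRUE)
print(found)
```

Printing the pattern with writeLines is a quick way to check how many backslashes actually reach the regex engine.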

You may avoid overescaping here if you use single quotes to define the string literal and if you turn .*?(?=\?) into a negated character class:

grep('(?<="productUrl":"//)([^?]*)', x, perl=TRUE)

The [^?]* negated character class matches any 0 or more chars other than ?.
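To check that the negated character class finds the same text as the lazy pattern with a lookahead, the two can be compared on a short string. This is only a sketch: the sample string s and the variable names are assumptions made for the example.

```r
# Hypothetical sample input; not the asker's data.
s <- '{"productUrl":"//example.com/item.html?search=1"}'

# Lazy dot plus lookahead; the ? must be escaped as \\? in the R string.
lazy <- '(?<="productUrl":"//)(.*?)(?=\\?)'

# Negated character class; inside [...] the ? needs no escaping.
negated <- '(?<="productUrl":"//)[^?]*'

m_lazy    <- regmatches(s, regexpr(lazy, s, perl = TRUE))
m_negated <- regmatches(s, regexpr(negated, s, perl = TRUE))

print(m_lazy)
print(identical(m_lazy, m_negated))
```

One difference worth keeping in mind: the lookahead version requires a ? to follow, while [^?]* simply runs to the end of the string when no ? is present.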

If the string you are checking against has no double quotes, remove them from the lookbehind:

grep('(?<=productUrl://)([^?]*)', x, perl=TRUE)

Instead of the lookbehind, you may also use \K to omit the part of the text matched so far:

grep('productUrl://\\K[^?]*', x, perl=TRUE)

                   ^^^

Actually, you do not even need the capturing group in your pattern.
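The lookbehind and \K forms can be compared side by side. The sketch below uses a made-up string without double quotes, and the variable names are hypothetical.

```r
# Hypothetical input without double quotes around the key.
s <- 'productUrl://example.com/item.html?search=1'

# Lookbehind: the prefix is asserted but never becomes part of the match.
with_lookbehind <- regmatches(s, regexpr('(?<=productUrl://)[^?]*', s, perl = TRUE))

# \K: the prefix is consumed, then dropped from the reported match.
# Note the doubled backslash inside the R string literal.
with_K <- regmatches(s, regexpr('productUrl://\\K[^?]*', s, perl = TRUE))

print(with_lookbehind)
print(with_K)
```

Neither pattern uses a capturing group; the whole match is already the wanted substring.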

Solving the actual task

You cannot extract substrings with grep in R; grep only identifies which elements of a character vector match the pattern. To extract substrings, you need to use base R regmatches, stringr str_extract/str_extract_all, or similar match functions.
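A short sketch of what grep actually returns, using a made-up two-element vector v (all names here are assumptions for the example):

```r
# Hypothetical character vector; only the second element contains the key.
v <- c('no url here', '"productUrl":"//example.com/a.html?x=1"')
pattern <- '(?<="productUrl":"//)[^?]*'

# grep returns the indices of the matching elements...
idx <- grep(pattern, v, perl = TRUE)
print(idx)

# ...and with value = TRUE the whole matching elements, not the matched part.
whole <- grep(pattern, v, perl = TRUE, value = TRUE)
print(whole)
```

In both cases the matched substring itself is never returned, which is why a function such as regmatches is needed.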

Example with base R:

> x <- '":"ppath","value":,"hidden":false,"locked":false}],"bizData":"","pos":0},"listItems":[{"name":"BRAND\'S® Lutein Essence 6 Bottles x 60ml","nid":"66765568","icons":[{"domClass":"lazMall","text":"LazMall","alias":"LazMallAlias","type":"img","group":"1","showType":"0","order":0}],n"productUrl":"//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html?search=1","image":"https://sg-test-11.slatic.net/p/5337f879236ece2f14158c055adcdef7.jpg",n"productUrl":"//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html?search=1","sku":"BR924HBAB3R0N4SGAMZ","skuId":"167303363"}],"restrictedAge":0,"categories":[1438,1565,4776,7305'

> regmatches(x, gregexpr('"productUrl":"\\K[^?"]*', x, perl=TRUE))

[[1]]

[1] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"

[2] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"

With stringr:

> library(stringr)

> str_extract_all(x, '(?<="productUrl":")[^?"]*')

[[1]]

[1] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"

[2] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"

edited yesterday

answered yesterday

Wiktor Stribiżew


Thanks for the answer but your solution doesn't yield the expected result unfortunately. I might have several instances of the pattern so I do need the capturing group no ?
– Chapo
yesterday

regexr.com/45ug5 is with my example
– Chapo
yesterday

I tried grep('(?<="productUrl":"//)[^?]*', a, perl=TRUE,value=TRUE) following your advice
– Chapo
yesterday

What is the input type? Is it a single string? Do you want to extract substrings (as matches) from it? You said you are trying to use it in grep command, that is why I used it in the answer.
– Wiktor Stribiżew
yesterday

I've put an input example in the link in previous comment : regexr.com/45ug5
– Chapo
yesterday


