Remove recurring text strings [closed]
I am new to R and have searched the forum for almost 2 hours without getting this to work.
My problem: I have a long text string scraped from the internet. The scrape also picked up the embed code for images. Each image block starts with "Embed from Getty Images" and ends with "false })});n". I would like to remove each of these blocks, from the start marker through the end marker. I have tried gsub() as follows:
AmericanTexts3 <- gsub("Embed.*})});n", "", AmericanTexts)
But this removes everything from the start of the first image block to the end of the last one, including the text in between. Does anyone know how to solve this?
r regex
closed as off-topic by Henrik, Sven Hohenstein, Umair, Sahil Mahajan Mj, Rob Nov 16 '18 at 12:24
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example." – Henrik, Sven Hohenstein, Umair, Sahil Mahajan Mj, Rob
edited Nov 16 '18 at 8:24 by sindri_baldur
asked Nov 16 '18 at 7:57 by Victor
1 Answer
You need to use a non-greedy (lazy) regular expression.
Try
AmericanTexts3 <- gsub("Embed.*?})});n", "", AmericanTexts)
The ? after .* makes the quantifier lazy: it matches as few characters as possible, so each match ends at the first occurrence of the closing part of the regex, and only the text of each image block should be removed.
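To see the difference on something small, here is a minimal sketch comparing the greedy and the lazy pattern. The sample text is made up for illustration and is not the asker's data. Spelling out the full start marker, escaping the closing braces and parentheses, and passing perl = TRUE are my own assumptions, added so the pattern is unambiguous; they are not part of the answer above.

```r
# Hypothetical sample text with two embedded image blocks (not the asker's data).
txt <- paste0(
  "Intro. ",
  "Embed from Getty Images window.gie(function(){ tld: 'com', caption: false })});n",
  " Middle. ",
  "Embed from Getty Images window.gie(function(){ tld: 'se', caption: false })});n",
  " End."
)

# Greedy: .* runs on to the LAST terminator, so " Middle. " is removed as well.
greedy <- gsub("Embed from Getty Images.*\\}\\)\\}\\);n", "", txt, perl = TRUE)

# Lazy: .*? stops at the FIRST terminator after each start marker,
# so each image block is removed on its own and " Middle. " survives.
lazy <- gsub("Embed from Getty Images.*?\\}\\)\\}\\);n", "", txt, perl = TRUE)

print(greedy)
print(lazy)
```

If the real text contains line breaks inside an image block, note that with perl = TRUE the dot does not match a newline by default; prefixing the pattern with (?s) should change that.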
answered Nov 16 '18 at 8:05 by LAP