Remove recurring text strings [closed]

I am new to R and have searched the forum for almost 2 hours now without getting it to work for me.

My problem: I have a long text string scraped from the internet. When I scraped it, the code for images was included. These image blocks are coded so that they start with "Embed from Getty Images" and end with "false })});n". I would like to remove everything between those strings. I have tried gsub() as follows:

AmericanTexts3 <- gsub("Embed.*})});n", "", AmericanTexts)

But this removes everything between the first picture and the last picture. Does anyone know how to solve this?

edited Nov 16 '18 at 8:24

sindri_baldur

asked Nov 16 '18 at 7:57

Victor

closed as off-topic by Henrik, Sven Hohenstein, Umair, Sahil Mahajan Mj, Rob Nov 16 '18 at 12:24

This question appears to be off-topic. The users who voted to close gave this specific reason:

"Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example." – Henrik, Sven Hohenstein, Umair, Sahil Mahajan Mj, Rob

If this question can be reworded to fit the rules in the help center, please edit the question.

r regex

1 Answer

You need to use a non-greedy regular expression.

Try

AmericanTexts3 <- gsub("Embed.*?})});n", "", AmericanTexts)

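The ? makes .* non-greedy (lazy): it matches as few characters as possible, so each match stops at the first occurrence of the second part of the regex instead of the last. That way only the text between each pair of markers is removed.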
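To see the difference, here is a small sketch on made-up input. The text in txt is a hypothetical stand-in for the scraped string, not the asker's data. As an assumption on my part, the closing marker is escaped and perl = TRUE is set so the braces and parentheses are treated as literal characters; the pattern above may behave the same without that.

```r
# Toy input (hypothetical): two embed blocks with ordinary text around them.
embed1 <- "Embed from Getty Images window.gie(function(){ x: false })});n"
embed2 <- "Embed from Getty Images window.gie(function(){ y: false })});n"
txt <- paste0("Intro text. ", embed1, "Middle text. ", embed2, "Closing text.")

# Greedy: .* runs to the LAST closing marker, so "Middle text. " is lost too.
greedy <- gsub("Embed.*\\}\\)\\}\\);n", "", txt, perl = TRUE)

# Lazy: .*? stops at the FIRST closing marker after each "Embed".
lazy <- gsub("Embed.*?\\}\\)\\}\\);n", "", txt, perl = TRUE)

print(greedy)  # "Intro text. Closing text."
print(lazy)    # "Intro text. Middle text. Closing text."
```

If the real embed blocks span several lines, the pattern may need adjusting, since with perl = TRUE the dot does not match a newline by default.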

answered Nov 16 '18 at 8:05

LAP
