Remove recurring text strings [closed]

























I am new to R and have searched the forum for almost two hours without finding a solution that works for me.

My problem: I have a long text string scraped from the internet. The scrape also picked up the embed code for images. Each of these blocks starts with "Embed from Getty Images" and ends with "false })});n". I would like to remove each block, from its start string through its end string. I have tried gsub() as per:

AmericanTexts3 <- gsub("Embed.*})});n", "", AmericanTexts)

But this removes everything from the first picture through the last picture, including the text in between. Does anyone know how to solve this?
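A minimal reproduction of the behaviour, assuming a made-up sample string (txt) with two embed blocks; the real scraped text will look different. The brackets in the pattern are escaped and perl = TRUE is set so the pattern is valid in that engine, which is a change from the call above:

```r
# Hypothetical sample: three paragraphs separated by two embed blocks.
txt <- paste0(
  "First paragraph. ",
  "Embed from Getty Images gie(function(){load({id:'a',is360: false })});n",
  "Second paragraph. ",
  "Embed from Getty Images gie(function(){load({id:'b',is360: false })});n",
  "Third paragraph."
)

# Greedy pattern, same idea as the gsub() call above (brackets escaped).
pattern <- "Embed.*\\}\\)\\}\\);n"

# Collect every separate match the pattern finds in txt.
matched <- regmatches(txt, gregexpr(pattern, txt, perl = TRUE))[[1]]

length(matched)                                   # how many matches were found
grepl("Second paragraph", matched, fixed = TRUE)  # is wanted text inside a match?

# Removing the match also removes the paragraph between the two images.
gsub(pattern, "", txt, perl = TRUE)
```

In this sketch the pattern finds a single match that runs from the first "Embed" to the last end string, which is the unwanted behaviour described above.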

























closed as off-topic by Henrik, Sven Hohenstein, Umair, Sahil Mahajan Mj, Rob Nov 16 '18 at 12:24


This question appears to be off-topic. The users who voted to close gave this specific reason:


  • "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example." – Henrik, Sven Hohenstein, Umair, Sahil Mahajan Mj, Rob
If this question can be reworded to fit the rules in the help center, please edit the question.













































r regex

edited Nov 16 '18 at 8:24 by sindri_baldur
asked Nov 16 '18 at 7:57 by Victor


























1 Answer














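You need to use a non-greedy (lazy) regular expression.

Try

AmericanTexts3 <- gsub("Embed.*?})});n", "", AmericanTexts)

The ? after .* makes the quantifier non-greedy: each match stops at the first occurrence of the closing string that follows an "Embed", instead of running on to the last one. That way each image block is removed separately and the text between the blocks is kept.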
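To see the difference on a small example, here is a sketch with a made-up string (txt and its contents are assumptions, not the asker's data). The brackets are escaped and (?s) is added so that, with perl = TRUE, the dot also matches line breaks in case a block spans several lines:

```r
# Hypothetical sample text with two embed blocks; the real scraped text will differ.
txt <- paste0(
  "First paragraph. ",
  "Embed from Getty Images gie(function(){load({id:'a',is360: false })});n",
  "Second paragraph. ",
  "Embed from Getty Images gie(function(){load({id:'b',is360: false })});n",
  "Third paragraph."
)

# Greedy: .* runs on to the LAST end string, so the middle paragraph is lost.
greedy <- gsub("(?s)Embed from Getty Images.*\\}\\)\\}\\);n", "", txt, perl = TRUE)

# Non-greedy: .*? stops at the FIRST end string after each start string.
lazy <- gsub("(?s)Embed from Getty Images.*?\\}\\)\\}\\);n", "", txt, perl = TRUE)

greedy
lazy
```

If the real blocks always start with the full "Embed from Getty Images" text, spelling it out in the pattern as done here makes it less likely to hit an unrelated "Embed" elsewhere in the text.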
















answered Nov 16 '18 at 8:05 by LAP














