From HTML text links to a txt file in Python 2





















I need help writing a script, in Python 2 only, that takes the headlines from this page: https://lite.cnn.com/en and saves them to a text file, one headline per line, like this:

"Trump, Macron gloss over differences in France after rough start
Trump spars with Macron as Air Force One lands in France
Opinion: Which President Trump will show up in Paris?
Two leaders holding bilateral talks"
...

Please leave any suggestions you have. Thank you.

































  • Why not use BeautifulSoup?
    – Redanium
    2 days ago















python-2.7














edited 2 days ago by ACupOfBreadTea (195)

asked 2 days ago by xanpx (45)
























2 Answers






























There is an easy way I can read the HTML, but it reads the whole source code of the page, not just the headlines:

import urllib2

# Open the output file once and append the page source to it line by line
with open('testfile.txt', 'a') as f:
    for line in urllib2.urlopen("https://lite.cnn.com/en"):
        f.write(line)
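If installing BeautifulSoup or lxml is a problem, the raw source can probably be reduced to headlines with the standard library alone. The following is only a sketch: `HeadlineParser` and `extract_headlines` are made-up names, and it assumes the headlines are the text of `<a>` elements nested inside `<li>` elements, which may not match the page's real markup. The import is written so the same code runs on Python 2 and Python 3.

```python
# Python 2/3 compatible import of the standard-library HTML parser.
try:
    from HTMLParser import HTMLParser      # Python 2
except ImportError:
    from html.parser import HTMLParser     # Python 3


class HeadlineParser(HTMLParser):
    """Collects the text of every <a> element nested inside an <li>.

    Hypothetical helper: the <li>/<a> structure is an assumption about the page.
    """

    def __init__(self):
        # Explicit base-class call: HTMLParser is an old-style class in Python 2.
        HTMLParser.__init__(self)
        self.li_depth = 0       # how many <li> elements we are currently inside
        self.in_link = False    # True while inside an <a> that sits in an <li>
        self.parts = []         # text fragments of the current link
        self.headlines = []     # finished headline strings

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_depth += 1
        elif tag == "a" and self.li_depth > 0:
            self.in_link = True
            self.parts = []

    def handle_endtag(self, tag):
        if tag == "li" and self.li_depth > 0:
            self.li_depth -= 1
        elif tag == "a" and self.in_link:
            self.in_link = False
            text = "".join(self.parts).strip()
            if text:
                self.headlines.append(text)

    def handle_data(self, data):
        if self.in_link:
            self.parts.append(data)


def extract_headlines(html):
    """Return the link texts found inside list items of an HTML string."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines


# Small inline sample so the sketch runs without network access.
sample = """
<ul>
  <li><a href="/a">Trump spars with Macron as Air Force One lands in France</a></li>
  <li><a href="/b">Two leaders holding bilateral talks</a></li>
</ul>
<p><a href="/about">About</a></p>
"""
print(extract_headlines(sample))
```

To use it on the real page in Python 2, the string passed in would come from something like `urllib2.urlopen(url).read().decode("utf-8")`; the UTF-8 encoding is an assumption about the page, not something confirmed here.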













































    You can use BeautifulSoup to do the job:

    from bs4 import BeautifulSoup
    import requests

    url = "https://lite.cnn.com/en"
    r = requests.get(url)
    data = r.text

    # Available parsers: "lxml", "html5lib", "xml" and the built-in "html.parser"
    soup = BeautifulSoup(data, "html.parser")

    with open('testfile.txt', 'a') as f:
        # Loop through the links inside list items, writing one headline per line
        for link in soup.select('li a'):
            # Encode to UTF-8: writing non-ASCII unicode directly fails in Python 2
            f.write(link.text.encode('utf-8') + "\n")


    testfile.txt



    Whitaker's controversial prosecution of a gay Democrat
    Sessions realized too late that Whitaker was auditioning for his job
    Opinion: The other potential threat to Mueller's investigation
    How Kellyanne Conway's husband became an issue for President Trump
    Trump, Macron gloss over differences in France after rough start
    Trump spars with Macron as Air Force One lands in France
    Opinion: Which President Trump will show up in Paris?
    Trump's new aggression is forcing the world to change once again
    WSJ: Draft indictment detailed Trump's role in hush money scheme
    Raging infernos spread on both ends of California, killing 9 people
    Why the California fires are spreading so quickly
    Authorities believe gunman posted on Facebook around time of shooting, official says
    This California shooting victim's mom doesn't want your prayers
    What we know about the people killed in the Thousand Oaks shooting
    Will Thousand Oaks be the mass shooting that spurs change? Maybe not
    Must-watch videos of the week
    Settle in with these weekend reads
    How a night out turned into a night of horror at a bar in California
    When the dreaded 'other' is an angry white man
    How Democrats fought their way back to power in Washington
    Opinion: What we learned from WWI, the first "total war"
    How an eight-year-old American boy became a viral sensation in China
    Turkey gives recordings on Khashoggi's death to Saudis, US, Britain - Erdogan
    Democrats are in. Sessions is out. Here's what that means for immigration
    Why what's happening in Florida is a 'count' not a 'recount'
    Bill Nelson's campaign sues Florida secretary of state as vote count fight continues
    Scott's lawyer expects recount in FL Senate race
    No allegations of criminal activity in Florida election, law enforcement says
    Analysis: The question now facing Democrats: How to wake up the 'too woke to vote' crowd
    Washington Post: Michelle Obama says in memoir she'll 'never forgive' Trump for endangering her family
    How a century-old war affects you
    Toobin says 'racial dimension' to Trump's attacks on black female journalists
    Sri Lanka's President dissolves parliament and calls snap election amid political crisis
    Triple car bombings in Mogadishu kill at least 18 people, police say
    Snoop Dogg smokes a blunt in front of the White House
    New York parishioners are using the collection basket to ask embattled Catholic bishop to resign
    Trump trade adviser warns Wall Street 'globalists' over China
    Doctors share gun stories, demand action after NRA tells them to 'stay in their lane'
    Judge: 'We're approaching the end of reunification'
    Family apprehensions at southern border hit record monthly high
    Opinion: The President says he is keeping us safe. But at what cost?
    What happened this week (in anything but politics)
    5 tips for booking Thanksgiving flights
    Gobble up these Turkey Day destinations
    Thanksgiving in New York: Parade, dining and more
    Musician Lydia Lunch's fast friendship with Anthony Bourdain
    Mother sues facility after 10 children died in adenovirus outbreak
    Flash floods in Jordan kill at least 11
    US banks prepare for Iranian cyberattacks as retaliation for sanctions
    We need stronger cybersecurity laws for the Internet of Things
    The 'Year of the Woman' goes global
    How Hong Kong plans to replace 100,000 trees
    Ex-Goldman Sachs banker tied to 1MDB scandal blames bank's 'culture' in guilty plea
    Progressive backlash against Amazon HQ2 is growing. Here's why



























    • Thanks a lot for the comment. Using BeautifulSoup is a problem for me, because I have problems with the lxml parser.
      – xanpx
      2 days ago










    • You can specify parsers other than lxml in the BeautifulSoup constructor. Check the updated answer.
      – Redanium
      2 days ago










    • I got the output, along with this warning: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. The code that caused this warning is on line 15 of the file news.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor. What does this output give me? I want plain-text output; it will be read by espack later.
      – xanpx
      2 days ago











    • Try Python's built-in parser, html.parser: soup = BeautifulSoup(data,"html.parser")
      – Redanium
      2 days ago











    • OK, great, it works and the warning is gone. Next question: how do I get normal text from this? I got an error: UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 4: ordinal not in range(128)
      – xanpx
      2 days ago
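The UnicodeEncodeError in the last comment most likely means a unicode string containing a non-ASCII character (u'\xf1' is "ñ") was written to a file opened with plain open() in Python 2, which falls back to the ASCII codec. One possible way around it, shown here only as a sketch, is to open the file with io.open and an explicit encoding. The helper name `save_headlines` and the sample headlines are made up for illustration, and the file is opened in "w" mode (overwrite) rather than the answer's append mode.

```python
import io
import os
import tempfile


def save_headlines(headlines, path):
    """Write one headline per line to `path`, encoded as UTF-8.

    Hypothetical helper, not part of any library.
    """
    # io.open accepts an encoding argument in both Python 2 and Python 3.
    with io.open(path, "w", encoding="utf-8") as f:
        for headline in headlines:
            # In Python 2, a text-mode io.open file only accepts unicode objects,
            # so the newline is a unicode literal as well.
            f.write(headline + u"\n")


# Made-up sample data; u"\xf1" is the character from the error message.
headlines = [u"El Ni\xf1o returns", u"Two leaders holding bilateral talks"]
path = os.path.join(tempfile.mkdtemp(), "testfile.txt")
save_headlines(headlines, path)

# Read the file back to check that the text survived the round trip.
with io.open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(len(lines))
```

With BeautifulSoup, `link.text` is already a unicode string, so the headlines could be passed to such a helper as they are. The resulting file is plain UTF-8 text, which other tools should be able to read as long as they expect that encoding.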





















    First answer (the urllib2 one): answered 2 days ago by xanpx (45)






















            We need stronger cybersecurity laws for the Internet of Things
            The 'Year of the Woman' goes global
            How Hong Kong plans to replace 100,000 trees
            Ex-Goldman Sachs banker tied to 1MDB scandal blames bank's 'culture' in guilty plea
            Progressive backlash against Amazon HQ2 is growing. Here's why
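If installing a third-party parser is the obstacle, a rough standard-library-only sketch of the same idea (collect the text of every link that sits inside a list item) might look like the following. It swaps BeautifulSoup for the built-in HTMLParser class, so it is not the code from the answer above. HeadlineParser and extract_headlines are made-up names for illustration, the input is assumed to be unicode text (decode the downloaded bytes first), and on Python 2 this sketch does not decode character references such as &amp;.

```python
# -*- coding: utf-8 -*-
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2


class HeadlineParser(HTMLParser):
    """Hypothetical helper: collects the text of every <a> inside an <li>."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.li_depth = 0      # number of <li> elements currently open
        self.in_link = False   # True while inside an <a> within an <li>
        self.parts = []        # text fragments of the current link
        self.headlines = []    # finished headlines, in document order

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            self.li_depth += 1
        elif tag == 'a' and self.li_depth > 0:
            self.in_link = True
            self.parts = []

    def handle_endtag(self, tag):
        if tag == 'li' and self.li_depth > 0:
            self.li_depth -= 1
        elif tag == 'a' and self.in_link:
            self.in_link = False
            text = u''.join(self.parts).strip()
            if text:
                self.headlines.append(text)

    def handle_data(self, data):
        if self.in_link:
            self.parts.append(data)


def extract_headlines(html):
    """Return the link texts found inside list items of a unicode HTML string."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines
```

Calling extract_headlines on the decoded page source returns a list of unicode strings, which can then be written to the text file one per line.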






            edited 2 days ago

























            answered 2 days ago









            Redanium

            735413















            • Thanks a lot for the comment. Using BeautifulSoup is a problem for me: I have trouble with the lxml parser.
              – xanpx
              2 days ago










            • You can specify a parser other than lxml in the BeautifulSoup constructor. Check the updated answer.
              – Redanium
              2 days ago










            • I got the output, along with this warning: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. The code that caused this warning is on line 15 of the file news.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor. Also, what does this output give me? I want plain-text output; it will be read by espack later.
              – xanpx
              2 days ago











            • Try Python's built-in parser, html.parser: soup = BeautifulSoup(data, "html.parser")
              – Redanium
              2 days ago











            • OK, great, it works and the warning is gone. Next question: how do I get plain text out of this? I got an error: UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 4: ordinal not in range(128)
              – xanpx
              2 days ago
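The UnicodeEncodeError above typically appears in Python 2 when unicode text is written to a file opened with the plain open(), because the text is implicitly encoded with the ascii codec. One possible way around it, sketched here under the assumption that the headlines are already unicode strings, is to open the file with io.open and an explicit encoding. write_headlines is a hypothetical helper name, not something from the answer.

```python
# -*- coding: utf-8 -*-
import io


def write_headlines(headlines, path):
    """Hypothetical helper: append unicode headlines to `path`, one per line."""
    # io.open takes an explicit encoding and expects unicode text on both
    # Python 2 and Python 3, so no implicit ascii encoding takes place.
    with io.open(path, 'a', encoding='utf-8') as out:
        for headline in headlines:
            out.write(headline + u'\n')
```

With BeautifulSoup this could be called as write_headlines([link.text for link in soup.select('li a')], 'testfile.txt'), since link.text is already unicode.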



































             




























































































