from html text links to txt file in python 2
I need help writing a script, in Python 2 only, that takes the headlines from this page: https://lite.cnn.com/en and saves them to a text file, one per line, like this:
"Trump, Macron gloss over differences in France after rough start
Trump spars with Macron as Air Force One lands in France
Opinion: Which President Trump will show up in Paris?
Two leaders holding bilateral talks"
...
Please leave any suggestions you have. Thank you.
python-2.7
edited 2 days ago by ACupOfBreadTea (195)
asked 2 days ago by xanpx (45)
Why not use BeautifulSoup?
– Redanium
2 days ago
2 Answers
There is an easy way to read the HTML, but it reads the raw source code of the page rather than just the headlines:

    import urllib2

    # Append the page's HTML source to the file, line by line
    with open('testfile.txt', 'a') as out:
        for line in urllib2.urlopen("https://lite.cnn.com/en"):
            out.write(line)
answered 2 days ago by xanpx (45)
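The urllib2 loop above saves the whole page source. To keep only the link text without installing anything, one option is the standard library's HTMLParser. This is a minimal sketch, assuming the headlines are `<a>` elements nested inside `<li>` elements; the class name LinkTextParser and the sample markup are made up for illustration and are not taken from the real page. The import fallback is there only so the same code also runs on Python 3.

```python
try:
    from HTMLParser import HTMLParser      # Python 2
except ImportError:
    from html.parser import HTMLParser     # Python 3


class LinkTextParser(HTMLParser):
    """Collect the text of every <a> element nested inside an <li>."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.li_depth = 0      # number of <li> elements currently open
        self.in_link = False   # True while inside an <a> within an <li>
        self.parts = []        # text fragments of the current link
        self.headlines = []    # finished link texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_depth += 1
        elif tag == "a" and self.li_depth > 0:
            self.in_link = True
            self.parts = []

    def handle_endtag(self, tag):
        if tag == "li" and self.li_depth > 0:
            self.li_depth -= 1
        elif tag == "a" and self.in_link:
            self.in_link = False
            text = "".join(self.parts).strip()
            if text:
                self.headlines.append(text)

    def handle_data(self, data):
        if self.in_link:
            self.parts.append(data)


# Hypothetical markup standing in for the downloaded page source.
sample_html = """
<ul>
  <li><a href="/en/article/1">First headline</a></li>
  <li><a href="/en/article/2">Second headline</a></li>
</ul>
<p><a href="/about">About</a></p>
"""

parser = LinkTextParser()
parser.feed(sample_html)
parser.close()
headlines = parser.headlines
print(headlines)
```

With the real page, the string returned by `urllib2.urlopen(...).read()` would be fed to the parser in place of `sample_html`. The link outside the list is skipped because it is not inside an `<li>`.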
You can use BeautifulSoup to do the job:

    from bs4 import BeautifulSoup
    import requests

    url = "https://lite.cnn.com/en"
    r = requests.get(url)
    data = r.text

    # Available parsers: "lxml", "html5lib", "xml" and "html.parser"
    soup = BeautifulSoup(data, "html.parser")

    # Write the text of every link inside a list item, one per line
    with open('testfile.txt', 'a') as out:
        for link in soup.select('li a'):
            out.write(link.text + "\n")

testfile.txt
Whitaker's controversial prosecution of a gay Democrat
Sessions realized too late that Whitaker was auditioning for his job
Opinion: The other potential threat to Mueller's investigation
How Kellyanne Conway's husband became an issue for President Trump
Trump, Macron gloss over differences in France after rough start
Trump spars with Macron as Air Force One lands in France
Opinion: Which President Trump will show up in Paris?
Trump's new aggression is forcing the world to change once again
WSJ: Draft indictment detailed Trump's role in hush money scheme
Raging infernos spread on both ends of California, killing 9 people
Why the California fires are spreading so quickly
Authorities believe gunman posted on Facebook around time of shooting, official says
This California shooting victim's mom doesn't want your prayers
What we know about the people killed in the Thousand Oaks shooting
Will Thousand Oaks be the mass shooting that spurs change? Maybe not
Must-watch videos of the week
Settle in with these weekend reads
How a night out turned into a night of horror at a bar in California
When the dreaded 'other' is an angry white man
How Democrats fought their way back to power in Washington
Opinion: What we learned from WWI, the first "total war"
How an eight-year-old American boy became a viral sensation in China
Turkey gives recordings on Khashoggi's death to Saudis, US, Britain - Erdogan
Democrats are in. Sessions is out. Here's what that means for immigration
Why what's happening in Florida is a 'count' not a 'recount'
Bill Nelson's campaign sues Florida secretary of state as vote count fight continues
Scott's lawyer expects recount in FL Senate race
No allegations of criminal activity in Florida election, law enforcement says
Analysis: The question now facing Democrats: How to wake up the 'too woke to vote' crowd
Washington Post: Michelle Obama says in memoir she'll 'never forgive' Trump for endangering her family
How a century-old war affects you
Toobin says 'racial dimension' to Trump's attacks on black female journalists
Sri Lanka's President dissolves parliament and calls snap election amid political crisis
Triple car bombings in Mogadishu kill at least 18 people, police say
Snoop Dogg smokes a blunt in front of the White House
New York parishioners are using the collection basket to ask embattled Catholic bishop to resign
Trump trade adviser warns Wall Street 'globalists' over China
Doctors share gun stories, demand action after NRA tells them to 'stay in their lane'
Judge: 'We're approaching the end of reunification'
Family apprehensions at southern border hit record monthly high
Opinion: The President says he is keeping us safe. But at what cost?
What happened this week (in anything but politics)
5 tips for booking Thanksgiving flights
Gobble up these Turkey Day destinations
Thanksgiving in New York: Parade, dining and more
Musician Lydia Lunch's fast friendship with Anthony Bourdain
Mother sues facility after 10 children died in adenovirus outbreak
Flash floods in Jordan kill at least 11
US banks prepare for Iranian cyberattacks as retaliation for sanctions
We need stronger cybersecurity laws for the Internet of Things
The 'Year of the Woman' goes global
How Hong Kong plans to replace 100,000 trees
Ex-Goldman Sachs banker tied to 1MDB scandal blames bank's 'culture' in guilty plea
Progressive backlash against Amazon HQ2 is growing. Here's why
answered 2 days ago by Redanium (735413), edited 2 days ago
Thank you very much for the comment. Using BeautifulSoup is a problem for me: I have problems with the lxml parser.
– xanpx
2 days ago
You can specify a parser other than lxml in the BeautifulSoup constructor. Check the updated answer.
– Redanium
2 days ago
I got the output, but with this warning: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. The code that caused this warning is on line 15 of the file news.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor. What does this output mean? I want plain-text output; it will be read by espack later.
– xanpx
2 days ago
Try Python's built-in default parser, html.parser:
soup = BeautifulSoup(data,"html.parser")
– Redanium
2 days ago
OK, great, it works and the error is gone. Next question: how do I get normal text out of this? I got this error: UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 4: ordinal not in range(128)
– xanpx
2 days ago
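In Python 2, this UnicodeEncodeError is raised when a unicode string containing a non-ASCII character (here u'\xf1', "ñ") is written to a file opened with the built-in open(), because the text is implicitly encoded as ASCII. One possible fix, assuming UTF-8 output is acceptable for whatever reads the file later, is to open the file with io.open and an explicit encoding. The headlines and the temporary output path below are made up for illustration.

```python
import io
import os
import tempfile

# Made-up headlines; the second contains u'\xf1', the character from the error.
headlines = [u"Plain ASCII headline", u"Headline about El Ni\xf1o"]

# Hypothetical output path in a temporary directory, for illustration only.
out_path = os.path.join(tempfile.mkdtemp(), "testfile.txt")

# io.open encodes unicode text on write and behaves the same on Python 2 and 3.
# Mode "w" overwrites the file, so repeated runs do not pile up duplicates.
with io.open(out_path, "w", encoding="utf-8") as out:
    for headline in headlines:
        out.write(headline + u"\n")

# Read the file back to confirm the text survived the round trip.
with io.open(out_path, encoding="utf-8") as saved:
    lines = saved.read().splitlines()
```

In the answer's loop, `link.text` is already a unicode string, so it could be written the same way with `out.write(link.text + u"\n")`.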