Python requests. How to stay logged in?
I am new to web scraping and would like to learn how to do it properly and politely. My problem is similar to this.
'So I am trying to log into and navigate to a page using python and requests. I'm pretty sure I am getting logged in, but once I try to navigate to a page the HTML I print from that page states you must be logged in to see this page.'
I've checked the robots.txt of the website I would like to scrape. Is there anything in it that prevents me from scraping?
User-agent: *
Disallow: /caching/
Disallow: /admin3003/
Disallow: /admin5573/
Disallow: /members/
Disallow: /pp/
Disallow: /subdomains/
Disallow: /tags/
Disallow: /templates/
Disallow: /bin/
Disallow: /emails/
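The robots.txt question can also be checked programmatically. Below is a minimal sketch using the standard library's urllib.robotparser; it assumes the rules quoted above and a hypothetical host (example.com) and hypothetical article paths, since the real site is not named. It only reports what robots.txt says and does not cover anything in the site's terms of use.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, copied verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /caching/
Disallow: /admin3003/
Disallow: /admin5573/
Disallow: /members/
Disallow: /pp/
Disallow: /subdomains/
Disallow: /tags/
Disallow: /templates/
Disallow: /bin/
Disallow: /emails/
"""


def is_allowed(path, user_agent="*"):
    """Return True if robots.txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # example.com is a placeholder host; only the path is matched against the rules.
    return parser.can_fetch(user_agent, "https://example.com" + path)


# Hypothetical paths, for illustration only.
print(is_allowed("/news/some-article"))
print(is_allowed("/members/profile"))
```

Against a live site, RobotFileParser.set_url() followed by read() would fetch the file instead of parsing a pasted copy.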
My code, based on the solution from the link above, which does not work for me:

    import requests
    from bs4 import BeautifulSoup

    login_page = <login url>
    link = <required url>
    payload = {
        "username": <some username>,
        "password": <some password>
    }
    # Log in, then reuse the cookies from the login response
    p = requests.post(login_page, data=payload)
    cookies = p.cookies
    page_response = requests.get(link, cookies=cookies)
    page_content = BeautifulSoup(page_response.content, "html.parser")

The RequestsCookieJar (inspected with p.cookies) shows the cookie ASP.NET_SessionId=1adqylnfxbqf5n45p0ooy345 for WEBSITE.
Output of p.status_code: 200
UPDATE:
Using

    s = requests.session()

doesn't solve my problem. I had tried this before I started looking into cookies.
Update 2:
I am trying to collect news from a particular website. First I filtered the news with a search word and saved the links that appeared on the first page, using python requests + beautifulsoup. Now I would like to go through the links and extract the news from them. The full text is visible only with credentials. There is no separate login page; it's possible to log in from any page. There is a login button, and when one moves the mouse over it a login window appears, as in the attached image. I tried to log in both via the main page and via the page from which I would like to extract the text (not at the same time, but in different trials). Neither works.
I also tried to find a CSRF token by searching for "csrf_token", "authentication_token", "csrfmiddlewaretoken", "csrf", "auth". Nothing was found in the HTML of the web pages.
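A token does not have to contain "csrf" or "auth" in its name. The ASP.NET_SessionId cookie suggests the site may be an ASP.NET application, and such pages often carry hidden form fields such as __VIEWSTATE and __EVENTVALIDATION that are expected back in the POST. The following is a minimal sketch, not a confirmed fix for this site: it collects every hidden input from a page so the values can be merged into the login payload. The sample markup, field values and credentials are hypothetical, and the standard library parser is used only to keep the sketch dependency-free.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        input_type = (attributes.get("type") or "").lower()
        if input_type == "hidden" and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def hidden_fields(html):
    """Return a dict of the hidden form fields found in html."""
    collector = HiddenInputCollector()
    collector.feed(html)
    return collector.fields


# Hypothetical login form markup, for illustration only.
SAMPLE_HTML = """
<form method="post" action="/login">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTA4" />
  <input type="hidden" name="__EVENTVALIDATION" value="wEWAgK" />
  <input type="text" name="username" />
  <input type="password" name="password" />
</form>
"""

# Start from the hidden fields, then add the visible credentials.
payload = hidden_fields(SAMPLE_HTML)
payload.update({"username": "some username", "password": "some password"})
print(sorted(payload))
```

In practice the HTML would come from a GET of the login page made with the same session that later sends the POST, so the hidden values and the session cookie belong together.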
python web-scraping beautifulsoup python-requests robots.txt
you need to use session, check here stackoverflow.com/questions/12737740/… – Stack Nov 13 '18 at 16:40
Possible duplicate of Python Requests and persistent sessions – Antwane Nov 13 '18 at 16:51
edited Nov 14 '18 at 10:24
asked Nov 13 '18 at 16:30 by Mirit
1 Answer
You can use requests.Session() to stay logged in; one way is to log in once with a browser driver and save the login cookies as a JSON file. The example below shows scraping code that saves a Facebook login session as cookies in JSON format:

    import json
    import mechanicalsoup
    import requests
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    s = requests.Session()
    email = input("Enter your facebook login username/email: ")
    password = input("Enter your facebook password: ")

    def get_driver():
        # Selenium 4 style; older versions took executable_path directly
        driver = webdriver.Chrome(service=Service('your_path_to_chrome_driver'))
        driver.wait = WebDriverWait(driver, 3)
        return driver

    def get_url_cookie(driver):
        # Log in through the browser, then dump its cookies to disk
        driver.get('https://facebook.com')
        driver.find_element(By.NAME, 'email').send_keys(email)
        driver.find_element(By.NAME, 'pass').send_keys(password)
        driver.find_element(By.ID, 'loginbutton').click()
        cookies_list = driver.get_cookies()
        with open('facebook_cookie.json', 'w') as script:
            json.dump(cookies_list, script)

    driver = get_driver()
    get_url_cookie(driver)

The code above gets the login session cookies using driver.get_cookies() and saves them as a JSON file. To use the cookies, load them with:

    with open('facebook_cookie.json') as c:
        load = json.load(c)
    # Copy each saved cookie into the requests session
    for cookie in load:
        s.cookies.set(cookie['name'], cookie['value'])
    url = 'https://facebook.com/the_url_you_want_to_visit_on_facebook'
    browser = mechanicalsoup.StatefulBrowser(session=s)
    browser.open(url)

and you get your session loaded.
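The cookie-loading step can be tried without a browser or a network connection. Here is a minimal sketch, assuming a cookie list in the shape that driver.get_cookies() returns; the cookie name, value and domain below are made up, and load_cookies_into_session is a hypothetical helper, not part of requests. Note that the jar on a requests session is the cookies attribute (plural).

```python
import json

import requests


def load_cookies_into_session(session, cookie_dicts):
    """Copy Selenium-style cookie dicts into a requests session's cookie jar."""
    for cookie in cookie_dicts:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            # Keep domain and path so the cookie is only sent where it belongs.
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session


# Hypothetical cookie list, shaped like the output of driver.get_cookies().
example_cookies = [
    {"name": "ASP.NET_SessionId", "value": "abc123", "domain": "example.com", "path": "/"},
]

# Round-trip through JSON, as the saved file would.
restored = json.loads(json.dumps(example_cookies))
session = load_cookies_into_session(requests.Session(), restored)
print(session.cookies.get("ASP.NET_SessionId"))
```

Every later session.get() or session.post() to a matching domain then sends these cookies automatically, as long as the same session object is reused.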
answered Nov 14 '18 at 9:50 by PauLBincom