Python requests. How to stay logged in?

I am new to web scraping and would like to learn how to do it properly and politely. My problem is similar to this.

'So I am trying to log into and navigate to a page using python and requests. I'm pretty sure I am getting logged in, but once I try to navigate to a page the HTML I print from that page states you must be logged in to see this page.'

I've checked robots.txt of the website I would like to scrape. Is there something which prevents me from scraping?
User-agent: *
Disallow: /caching/
Disallow: /admin3003/
Disallow: /admin5573/
Disallow: /members/
Disallow: /pp/
Disallow: /subdomains/
Disallow: /tags/
Disallow: /templates/
Disallow: /bin/
Disallow: /emails/
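
For reference, rules like the ones above can be checked offline with the standard library's urllib.robotparser. This is only a sketch: the host example.com and the two test paths are placeholders, not taken from the real site, and the helper name is_allowed is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /caching/
Disallow: /admin3003/
Disallow: /admin5573/
Disallow: /members/
Disallow: /pp/
Disallow: /subdomains/
Disallow: /tags/
Disallow: /templates/
Disallow: /bin/
Disallow: /emails/
"""


def is_allowed(path, user_agent="*"):
    """Return True if the rules above permit user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # example.com is a placeholder host; only the path matters for these rules.
    return parser.can_fetch(user_agent, "https://example.com" + path)


# Hypothetical paths: one outside the disallowed prefixes, one inside.
print(is_allowed("/news/some-article"))
print(is_allowed("/members/profile"))
```

Any path that does not start with one of the listed prefixes is permitted by these rules; robots.txt says nothing about whether the site's terms of use allow scraping content behind a login.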

My code, using the solution from the link above, which does not work for me:

import requests
from bs4 import BeautifulSoup

login_page = <login url>
link = <required url>

payload = {
    "username": <some username>,
    "password": <some password>
}

# Log in, then reuse the cookies from the login response for the next request
p = requests.post(login_page, data=payload)
cookies = p.cookies
page_response = requests.get(link, cookies=cookies)
page_content = BeautifulSoup(page_response.content, "html.parser")

p.cookies shows a RequestsCookieJar containing the cookie ASP.NET_SessionId=1adqylnfxbqf5n45p0ooy345 for WEBSITE.

Output of p.status_code: 200

UPDATE:

s = requests.session()

doesn't solve my problem. I had tried this before I started looking into cookies.

UPDATE 2:
I am trying to collect news from a particular website. First I filtered the news with a search word and saved the links that appeared on the first page, using python requests + beautifulsoup. Now I would like to go through the links and extract the news from them. The full text is visible only with credentials. There is no separate login page; it is possible to log in from any page. There is a login button, and when one moves the mouse over it a login window appears, as in the attached image. I tried to log in both via the main page and via the page from which I would like to extract the text (not at the same time, but in different trials). Neither works.
I also tried to find a CSRF token by searching for "csrf_token", "authentication_token", "csrfmiddlewaretoken", "csrf" and "auth". Nothing was found in the HTML of the web pages.
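
A token does not have to be called "csrf". The ASP.NET_SessionId cookie suggests an ASP.NET site, and ASP.NET Web Forms pages typically carry hidden inputs such as __VIEWSTATE and __EVENTVALIDATION that must be posted back with the form. Whether this particular site does so is an assumption. The sketch below collects every hidden input from a login form and merges it with the credentials; the sample HTML and the names HiddenInputCollector and build_login_payload are made up for illustration.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def build_login_payload(login_html, username, password):
    """Merge the form's hidden fields with the credentials."""
    collector = HiddenInputCollector()
    collector.feed(login_html)
    payload = dict(collector.fields)
    # "username" and "password" are the field names used in the question;
    # the real form may use different names.
    payload["username"] = username
    payload["password"] = password
    return payload


# Hypothetical login form, for illustration only.
SAMPLE_HTML = """
<form method="post" action="/login">
  <input type="hidden" name="__VIEWSTATE" value="abc123">
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

print(build_login_payload(SAMPLE_HTML, "alice", "secret"))
```

With requests, the login page would first be fetched with a requests.Session, its response text passed to build_login_payload, and the resulting dictionary posted with the same session so that the session cookie and the hidden fields belong together.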

edited Nov 14 '18 at 10:24

asked Nov 13 '18 at 16:30

Mirit

you need to use session, check here stackoverflow.com/questions/12737740/…

– Stack
Nov 13 '18 at 16:40

Possible duplicate of Python Requests and persistent sessions

– Antwane
Nov 13 '18 at 16:51


python web-scraping beautifulsoup python-requests robots.txt

1 Answer

You can use requests.Session() to stay logged in, but you have to save the login cookies, for example as a JSON file. The example below shows scraping code that logs in to Facebook through Selenium and saves the session cookies in JSON format:

import json

import mechanicalsoup
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

s = requests.Session()

# input() is the Python 3 replacement for raw_input()
email = input("Enter your facebook login username/email: ")
password = input("Enter your facebook password: ")

def get_driver():
    # Assumes chromedriver can be found on PATH; older Selenium releases
    # also accept executable_path='your_path_to_chrome_driver'
    driver = webdriver.Chrome()
    driver.wait = WebDriverWait(driver, 3)
    return driver

def get_url_cookie(driver):
    # Log in through the browser, then dump the session cookies to disk
    driver.get('https://facebook.com')
    driver.find_element(By.NAME, 'email').send_keys(email)
    driver.find_element(By.NAME, 'pass').send_keys(password)
    driver.find_element(By.ID, 'loginbutton').click()
    cookies_list = driver.get_cookies()
    with open('facebook_cookie.json', 'w') as script:
        json.dump(cookies_list, script)

driver = get_driver()
get_url_cookie(driver)

The code above gets the login session cookies using driver.get_cookies() and saves them as a JSON file. To use the cookies, load them like this:

with open('facebook_cookie.json') as c:
    load = json.load(c)
# Copy each saved cookie into the requests session
for cookie in load:
    s.cookies.set(cookie['name'], cookie['value'])
url = 'https://facebook.com/the_url_you_want_to_visit_on_facebook'
browser = mechanicalsoup.StatefulBrowser(session=s)
browser.open(url)

and you get your session loaded...
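
As a possible refinement, the cookies can be copied together with their domain and path, so that requests only sends them to the site that issued them. This is a sketch under assumptions: the sample entries only imitate the shape of what driver.get_cookies() returns, and the helper name session_from_selenium_cookies is made up for illustration.

```python
import requests


def session_from_selenium_cookies(cookies_list):
    """Build a requests.Session holding cookies exported by Selenium."""
    session = requests.Session()
    for cookie in cookies_list:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            # Fall back to an empty domain and the root path when missing.
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session


# Hypothetical entries in the shape returned by driver.get_cookies().
sample_cookies = [
    {"name": "session_id", "value": "abc", "domain": ".example.com", "path": "/"},
    {"name": "lang", "value": "en", "domain": ".example.com", "path": "/"},
]

session = session_from_selenium_cookies(sample_cookies)
print(session.cookies.get("session_id", domain=".example.com"))
```

The returned session can be used directly with session.get(...) or handed to mechanicalsoup.StatefulBrowser(session=session).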

answered Nov 14 '18 at 9:50

PauLBincom
