Python requests. How to stay logged in?


























I am new to web scraping and would like to learn how to do it properly and politely. My problem is similar to this one:



'So I am trying to log into and navigate to a page using python and requests. I'm pretty sure I am getting logged in, but once I try to navigate to a page the HTML I print from that page states you must be logged in to see this page.'





I've checked the robots.txt of the website I would like to scrape. Is there anything in it that prevents me from scraping?
User-agent: *
Disallow: /caching/
Disallow: /admin3003/
Disallow: /admin5573/
Disallow: /members/
Disallow: /pp/
Disallow: /subdomains/
Disallow: /tags/
Disallow: /templates/
Disallow: /bin/
Disallow: /emails/
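One way to answer that question mechanically is to feed the rules to Python's standard-library robots.txt parser. This is a minimal sketch: the rules are copied from the listing above, while the helper name is_allowed and the two sample paths are hypothetical and only there for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt quoted in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /caching/
Disallow: /admin3003/
Disallow: /admin5573/
Disallow: /members/
Disallow: /pp/
Disallow: /subdomains/
Disallow: /tags/
Disallow: /templates/
Disallow: /bin/
Disallow: /emails/
"""


def is_allowed(path, robots_txt=ROBOTS_TXT, user_agent="*"):
    """Return True if robots_txt lets user_agent fetch the given path."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Hypothetical paths, chosen only to show one blocked and one open case.
print(is_allowed("/members/profile"))
print(is_allowed("/news/some-article"))
```

Under these rules only the listed directories are off limits, so a page outside them is not excluded by robots.txt. The site's terms of use may still apply.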



My code, based on the solution from the link above, which does not work for me:



import requests
from bs4 import BeautifulSoup

login_page = <login url>
link = <required url>

payload = {
    "username": <some username>,
    "password": <some password>
}

p = requests.post(login_page, data=payload)
cookies = p.cookies
page_response = requests.get(link, cookies=cookies)
page_content = BeautifulSoup(page_response.content, "html.parser")


p.cookies shows a RequestsCookieJar containing the cookie ASP.NET_SessionId=1adqylnfxbqf5n45p0ooy345 for WEBSITE.

Output of p.status_code: 200



UPDATE: Using s = requests.session() doesn't solve my problem. I had tried this before I started looking into cookies.
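For reference, the session-based approach mentioned in the update might look like the sketch below. It assumes the login is a plain form POST, which may not hold for this site. The function names and the marker constant are hypothetical. The marker phrase comes from the quoted message above, and checking for it is useful because a 200 status code alone does not prove that the login succeeded.

```python
import requests

# Phrase taken from the quoted error message; adjust it to the real page text.
LOGIN_WALL_MARKER = "you must be logged in"


def fetch_with_session(session, login_url, page_url, payload):
    """Log in and fetch a page through one session so cookies carry over."""
    login_response = session.post(login_url, data=payload)
    login_response.raise_for_status()
    # The same session object sends back any cookies set by the login response.
    page_response = session.get(page_url)
    page_response.raise_for_status()
    return page_response


def hit_login_wall(html, marker=LOGIN_WALL_MARKER):
    """Return True if the page still shows the 'must be logged in' message."""
    return marker.lower() in html.lower()


# Usage with placeholder values (not run here):
# with requests.Session() as s:
#     page = fetch_with_session(s, "<login url>", "<required url>",
#                               {"username": "...", "password": "..."})
#     print(hit_login_wall(page.text))
```

If hit_login_wall still returns True, the POST most likely did not match what the browser sends, for example because of missing form fields or a different login URL.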



Update 2:
I am trying to collect news from a particular website. First I filtered the news with a search word and saved the links that appeared on the first page, using Python requests + BeautifulSoup. Now I would like to go through the links and extract the news from them. The full text is visible only with credentials. There is no separate login page; it is possible to log in from any page. There is a login button, and when one hovers the mouse over it a login window appears, as in the attached image. I tried to log in both via the main page and via the page from which I would like to extract the text (not at the same time, but in different trials). Neither works.
I also tried to find a CSRF token by searching for "csrf_token", "authentication_token", "csrfmiddlewaretoken", ":csrf" and "auth". Nothing was found in the HTML of the web pages.
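A token does not have to contain "csrf" in its name. The ASP.NET_SessionId cookie suggests an ASP.NET site, and ASP.NET WebForms pages often carry hidden inputs such as __VIEWSTATE and __EVENTVALIDATION that must be posted back with the form. Whether this site uses them is an assumption. The sketch below collects every hidden input from a page and merges it with the credentials. The sample HTML, the class and function names, and the field names "username" and "password" are hypothetical.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def build_login_payload(login_html, username, password,
                        user_field="username", pass_field="password"):
    """Merge the page's hidden form fields with the login credentials."""
    collector = HiddenInputCollector()
    collector.feed(login_html)
    payload = dict(collector.fields)
    payload[user_field] = username
    payload[pass_field] = password
    return payload


# Hypothetical login form, standing in for the real page's HTML.
sample_html = """
<form method="post">
  <input type="hidden" name="__VIEWSTATE" value="abc123">
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

payload = build_login_payload(sample_html, "alice", "secret")
print(payload)
```

The real field names can be read from the login form's HTML or from the POST request shown in the browser's network tab while logging in manually.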


































  • You need to use a session; check here: stackoverflow.com/questions/12737740/…
    – Stack, Nov 13 '18 at 16:40

  • Possible duplicate of Python Requests and persistent sessions
    – Antwane, Nov 13 '18 at 16:51















python web-scraping beautifulsoup python-requests robots.txt














asked Nov 13 '18 at 16:30 by Mirit (edited Nov 14 '18 at 10:24)
























1 Answer
































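You can use requests.Session() to stay logged in. A session keeps its cookies only while the script runs, so to reuse a login you can save the login cookies as a JSON file and load them into the session later. The example below is scraping code that logs in to Facebook through Selenium and saves the login session cookies in JSON format: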



import json

import mechanicalsoup
import requests
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

s = requests.Session()

# input() is the Python 3 replacement for raw_input()
email = input("Enter your facebook login username/email: ")
password = input("Enter your facebook password: ")

def get_driver():
    # Selenium 4 takes the chromedriver path through a Service object
    driver = webdriver.Chrome(service=Service('your_path_to_chrome_driver'))
    driver.wait = WebDriverWait(driver, 3)
    return driver

def get_url_cookie(driver):
    driver.get('https://facebook.com')
    driver.find_element(By.NAME, 'email').send_keys(email)
    driver.find_element(By.NAME, 'pass').send_keys(password)
    driver.find_element(By.ID, 'loginbutton').click()
    # get_cookies() returns a list of dicts, one per cookie
    cookies_list = driver.get_cookies()
    with open('facebook_cookie.json', 'w') as script:
        json.dump(cookies_list, script)

driver = get_driver()
get_url_cookie(driver)


The code above gets the login session cookies with driver.get_cookies() and saves them as a JSON file. To use the cookies, load them like this:



with open('facebook_cookie.json') as c:
    load = json.load(c)
for cookie in load:
    # the session attribute is s.cookies (a RequestsCookieJar)
    s.cookies.set(cookie['name'], cookie['value'])
# requests needs the scheme in the URL
url = 'https://facebook.com/the_url_you_want_to_visit_on_facebook'
browser = mechanicalsoup.StatefulBrowser(session=s)
browser.open(url)

and your session is loaded.
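The cookie dicts returned by driver.get_cookies() also carry 'domain' and 'path' entries. Passing them along keeps each cookie scoped to the site it came from. The helper below is a small sketch of that idea; the name load_cookies and the sample cookie values are hypothetical.

```python
import requests


def load_cookies(session, cookies_list):
    """Copy Selenium-style cookie dicts into a requests session."""
    for cookie in cookies_list:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            # Fall back to an unscoped cookie when the dict has no domain.
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session


# Hypothetical cookie list in the shape that driver.get_cookies() returns.
sample_cookies = [
    {"name": "sid", "value": "abc123", "domain": "example.com", "path": "/"},
]

s = load_cookies(requests.Session(), sample_cookies)
print(s.cookies.get("sid", domain="example.com"))
```

With the JSON file from the answer, the list would come from json.load() instead of the inline sample.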






answered Nov 14 '18 at 9:50 by PauLBincom


























