Finding an element in a web page - RSelenium/rvest









I am trying to collect all the individual lawyer URLs from this website: https://www.linklaters.com/en/find-a-lawyer. I can't find a way to extract the URLs; when I use a CSS selector it doesn't work. Could you suggest another way to find a specific element in a web page?
Also, to collect all the data I need to click the "Load More" button, so I am using RSelenium.
I think I am not running RSelenium through Docker correctly, because I get this error:
Error in checkError(res) :
Undefined error in httr call. httr output: Failed to connect to localhost port 4445: Connection refused
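A "Connection refused" on port 4445 usually means no Selenium server is listening there. One common setup (a sketch, assuming the official standalone Chrome image; adjust the image tag and ports to your environment) is to map host port 4445 to the container's Selenium port 4444:

```shell
# Pull and start a standalone Chrome Selenium server in Docker.
# Host port 4445 is mapped to the container's Selenium port 4444,
# matching the port = 4445L used in remoteDriver() below.
docker pull selenium/standalone-chrome
docker run -d -p 4445:4444 selenium/standalone-chrome
```

Check that the container is actually running (`docker ps`) before calling `remDr$open()`.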



library(dplyr)
library(rvest)
library(stringr)
library(RSelenium)

link <- "https://www.linklaters.com/en/find-a-lawyer"
hlink <- read_html(link)

urls <- hlink %>%
  html_nodes(".listCta__subtitle--top") %>%
  html_attr("href")
urls <- data.frame(urls = urls, stringsAsFactors = FALSE)

remDr <- RSelenium::remoteDriver(remoteServerAddr = "localhost",
                                 port = 4445L,
                                 browserName = "chrome")
remDr$open()

# Note: multiple statements inside replicate() must be wrapped in { }
replicate(20, {
  # scroll down
  webElem <- remDr$findElement("css", "body")
  webElem$sendKeysToElement(list(key = "end"))
  # find the "Load More" button
  allURL <- remDr$findElement(using = "css selector", ".listCta__subtitle--top")
  # click it
  allURL$clickElement()
  Sys.sleep(6)
})

allURL <- xml2::read_html(remDr$getPageSource()[[1]]) %>%
  rvest::html_nodes(".field--type-ds a") %>%
  html_attr("href")









      r rvest rselenium






      edited Nov 10 at 20:57









      hrbrmstr











      asked Nov 10 at 19:35









      Rosi Ilieva























          1 Answer






























          It's just loading dynamic data over XHR requests. Just grab the lovely JSON:



          jsonlite::fromJSON("https://www.linklaters.com/en/api/lawyers/getlawyers")
          jsonlite::fromJSON("https://www.linklaters.com/en/api/lawyers/getlawyers?searchTerm=&sort=asc&showing=30")
          jsonlite::fromJSON("https://www.linklaters.com/en/api/lawyers/getlawyers?searchTerm=&sort=asc&showing=60")
          jsonlite::fromJSON("https://www.linklaters.com/en/api/lawyers/getlawyers?searchTerm=&sort=asc&showing=90")


          Keep incrementing by 30 until an errant result comes back, preferably with a 5s sleep delay between requests so as not to come off as a jerk.
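That paging pattern can be sketched in R roughly like this (a sketch, not a definitive implementation: the stopping condition depends on what the API actually returns for an out-of-range `showing` value, so inspect the first response before relying on it):

```r
library(jsonlite)

# Template for the paged endpoint; %d is filled with the "showing" count.
base <- "https://www.linklaters.com/en/api/lawyers/getlawyers?searchTerm=&sort=asc&showing=%d"

fetch_all_lawyers <- function(step = 30, pause = 5) {
  pages <- list()
  showing <- step
  repeat {
    url <- sprintf(base, showing)
    # Treat a request error as the end of the data.
    res <- tryCatch(fromJSON(url), error = function(e) NULL)
    if (is.null(res) || length(res) == 0) break
    pages[[length(pages) + 1]] <- res
    showing <- showing + step
    Sys.sleep(pause)  # polite delay between requests
  }
  pages
}

# lawyers <- fetch_all_lawyers()
```

The function returns a list of parsed pages; how you flatten them into one data frame depends on the JSON structure the endpoint returns.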


























          • Thanks! I need to do the same thing for 100 more websites. How can I see the JSON for other websites?
            – Rosi Ilieva
            Nov 11 at 8:31










          • stackoverflow.com/search?q=%5Br%5D+%5Brvest%5D+developer+tools
            – hrbrmstr
            Nov 11 at 10:38









