Finding an element in a web page - RSelenium/rvest
I am trying to collect all the individual lawyer URLs from this website - https://www.linklaters.com/en/find-a-lawyer. I can't find a way to extract the URLs - when I use a CSS selector it doesn't work. Could you suggest another way to find a specific element in a web page?
Also, to collect all the data I need to click the "Load More" button, and for that I am using RSelenium.
I think I am not doing something correctly when running RSelenium through Docker, as this error appears:
Error in checkError(res) :
Undefined error in httr call. httr output: Failed to connect to localhost port 4445: Connection refused
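For context, that error usually just means no Selenium server is actually listening on localhost:4445. Assuming the standard selenium/standalone-chrome Docker image (as used in the RSelenium Docker vignette), one common way to expose it on that port is:
docker run -d -p 4445:4444 selenium/standalone-chrome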
library(dplyr)
library(rvest)
library(stringr)
library(RSelenium)

link <- "https://www.linklaters.com/en/find-a-lawyer"
hlink <- read_html(link)

# static scrape of the profile links that are present on first load
urls <- hlink %>%
  html_nodes(".listCta__subtitle--top") %>%
  html_attr("href")
urls <- as.data.frame(urls, stringsAsFactors = FALSE)
names(urls) <- "urls"

remDr <- RSelenium::remoteDriver(remoteServerAddr = "localhost",
                                 port = 4445L,
                                 browserName = "chrome")
remDr$open()

# the body of replicate() must be a single expression, so wrap it in braces
replicate(20, {
  # scroll to the bottom of the page
  webElem <- remDr$findElement("css", "body")
  webElem$sendKeysToElement(list(key = "end"))
  # find the "Load More" button
  allURL <- remDr$findElement(using = "css selector", ".listCta__subtitle--top")
  # click it, then wait for the new content to load
  allURL$clickElement()
  Sys.sleep(6)
})

allURL <- xml2::read_html(remDr$getPageSource()[[1]]) %>%
  rvest::html_nodes(".field--type-ds a") %>%
  html_attr("href")
r rvest rselenium
edited Nov 10 at 20:57
hrbrmstr
asked Nov 10 at 19:35
Rosi Ilieva
1 Answer
It's just loading dynamic data over XHR requests. Just grab the lovely JSON:
jsonlite::fromJSON("https://www.linklaters.com/en/api/lawyers/getlawyers")
jsonlite::fromJSON("https://www.linklaters.com/en/api/lawyers/getlawyers?searchTerm=&sort=asc&showing=30")
jsonlite::fromJSON("https://www.linklaters.com/en/api/lawyers/getlawyers?searchTerm=&sort=asc&showing=60")
jsonlite::fromJSON("https://www.linklaters.com/en/api/lawyers/getlawyers?searchTerm=&sort=asc&showing=90")
Keep incrementing by 30 until an errant result comes back, preferably with a 5s sleep delay between requests so as not to come off as a jerk.
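A minimal loop along those lines might look like this. It's a sketch: the empty-result stopping condition and the exact JSON shape are assumptions, so inspect str(res) on a real response first (the showing parameter also looks cumulative, i.e. each response may contain all results up to that count, in which case the last good response is all you need).
library(jsonlite)

base <- "https://www.linklaters.com/en/api/lawyers/getlawyers?searchTerm=&sort=asc&showing=%d"

showing <- 30
last_ok <- NULL
repeat {
  # fetch the next batch; NULL on an HTTP or parse error
  res <- tryCatch(fromJSON(sprintf(base, showing)), error = function(e) NULL)
  # stop once an errant or empty result comes back
  if (is.null(res) || length(res) == 0) break
  last_ok <- res          # if "showing" is cumulative, this holds everything so far
  showing <- showing + 30
  Sys.sleep(5)            # 5s delay between requests, per the advice above
}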
answered Nov 10 at 19:43
hrbrmstr
Thanks! As I need to do the same thing for 100 more websites, how can I see the JSON for other websites?
– Rosi Ilieva
Nov 11 at 8:31
stackoverflow.com/search?q=%5Br%5D+%5Brvest%5D+developer+tools
– hrbrmstr
Nov 11 at 10:38