Href not visible in scrapy result but visible in html
Set-up
I have the next-page button element from this page,
<li class="Pagination-item Pagination-item--next Pagination-item--nextSolo ">
<button type="button" class="Pagination-link js-veza-stranica kist-FauxAnchor" data-page="2" data-href="https://www.njuskalo.hr/prodaja-kuca?page=2" role="link">Sljedeća <span aria-hidden="true" role="presentation">»</span></button>
</li>
I need to obtain the URL in the data-href attribute.
Code
Using the following XPath for the button element in the Scrapy shell,
response.xpath('//*[@id="form_browse_detailed_search"]/div/div[1]/div[5]/div[1]/nav/ul/li[8]/button').extract_first()
I retrieve,
'<button type="button" class="Pagination-link js-veza-stranica" data-page="2">Sljedeća\xa0<span aria-hidden="true" role="presentation">»</span></button>'
Question
Where did the data-href attribute go?
How do I obtain the URL?
python scrapy attributes
asked Nov 16 '18 at 12:13
LucSpan
1 Answer
The data-href attribute is most likely added by JavaScript code running in your browser. If you look at the raw source of the page (the "view page source" option in your browser), you won't find that attribute there.
The output you see in the developer tools is the DOM as rendered by your browser, so you can expect differences between your browser view and what Scrapy actually fetches, which is the raw HTML source. Keep in mind that Scrapy doesn't execute any JavaScript code.
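To make the difference concrete, here is a minimal sketch that compares the two button snippets from the question using only the standard library. The helper names (ButtonAttrCollector, button_attributes) are made up for this illustration and are not part of Scrapy; the two HTML strings are copied from the question.

```python
from html.parser import HTMLParser


class ButtonAttrCollector(HTMLParser):
    """Collect the attributes of every <button> tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.buttons = []

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            self.buttons.append(dict(attrs))


def button_attributes(html):
    """Return one attribute dict per <button> found in html."""
    parser = ButtonAttrCollector()
    parser.feed(html)
    parser.close()
    return parser.buttons


# Markup as received by Scrapy (from the question's shell output).
raw_html = (
    '<button type="button" class="Pagination-link js-veza-stranica" '
    'data-page="2">Sljedeća\xa0<span aria-hidden="true" '
    'role="presentation">»</span></button>'
)

# Markup as shown in the browser's developer tools.
rendered_html = (
    '<button type="button" '
    'class="Pagination-link js-veza-stranica kist-FauxAnchor" '
    'data-page="2" data-href="https://www.njuskalo.hr/prodaja-kuca?page=2" '
    'role="link">Sljedeća <span aria-hidden="true" '
    'role="presentation">»</span></button>'
)

raw_attrs = button_attributes(raw_html)[0]
rendered_attrs = button_attributes(rendered_html)[0]

# Attribute names that exist only in the browser-rendered markup.
added_in_browser = sorted(set(rendered_attrs) - set(raw_attrs))
print(added_in_browser)
```

The attributes listed in added_in_browser are the ones missing from the raw response, while data-page is present in both snippets, which is why it is a usable starting point.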
One way to solve this is to build the pagination URL from the data-page attribute:
from w3lib.url import add_or_replace_parameter
...
next_page = response.css('.Pagination-item--nextSolo button::attr(data-page)').get()
next_page_url = add_or_replace_parameter(response.url, 'page', next_page)
w3lib is an open source library: https://github.com/scrapy/w3lib
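If you would rather not depend on w3lib for this step, the same idea can be sketched with the standard library. This is not a reimplementation of add_or_replace_parameter: the function name next_page_url is hypothetical, and it always moves the page parameter to the end of the query string. It also returns None when no page number was found, on the assumption that the selector matches nothing once there is no next-page button.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def next_page_url(current_url, next_page):
    """Return current_url with its 'page' query parameter set to next_page.

    Returns None when next_page is None, which is what .get() yields
    when the selector matches nothing.
    """
    if next_page is None:
        return None
    parts = urlsplit(current_url)
    # Keep every existing parameter except 'page', preserving their order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    query.append(("page", next_page))
    return urlunsplit(parts._replace(query=urlencode(query)))


first = next_page_url("https://www.njuskalo.hr/prodaja-kuca", "2")
second = next_page_url("https://www.njuskalo.hr/prodaja-kuca?page=2", "3")
last = next_page_url("https://www.njuskalo.hr/prodaja-kuca?page=3", None)
print(first)
print(second)
print(last)
```

In a spider, the value passed as next_page would be the result of the data-page selector shown above, and a None result would simply mean there is nothing more to request.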
answered Nov 16 '18 at 12:43
Valdir Stumm Junior
Thank you for the explanation and solution. w3lib.url is very neat!
– LucSpan, Nov 16 '18 at 13:42