Getting 502 Bad Gateway and then redirected to another website. Scrapy

Scrapy mega noob here.

When I run scrapy shell against a website, for example:

scrapy shell https://shop.coles.com.au/a/a-vic-metro-oakleigh/product/gasmate-cartridge-butane

I get the following messages:

...
[scrapy.core.engine] INFO: Spider opened
[scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://shop.coles.com.au/a/a-vic-metro-oakleigh/product/gasmate-cartridge-butane> (failed 1 times): 502 Bad Gateway
[scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://shop.coles.com.au/a/a-vic-metro-oakleigh/product/gasmate-cartridge-butane> (failed 2 times): 502 Bad Gateway
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://shop.coles.com.au/a/a-vic-metro-oakleigh/product/gasmate-cartridge-butane> (referer: None)
...

Then, when I inspect response.body:

In [1]: print(response.body)
b'<html><body><script>var $j='c';$3='c';$q='c';$G='f';$s='c';$F='c';$X='c';$t='c';$H='c';$e='=';$D='c';$g='c';$8='c';$A='c';$6='c';$O='=';$P='c';$U='5';$4='6';$y='c';$v='c';$u='c';$b='c';$V='b';$r='5';$2='6';$Q='f';$R='c';$5='c';$9='c';$c='c';$S='c';$l='c';$k='c';$m='_';$M='5';$N='c';$C='c';$d='c';$J='b';$E='5';$1='6';$i='f';document.cookie=(!4?$j:"")+(!""?$3:"")+(!4?$q:"")+(!4?$G:"")+(!()?$s:"")+(!NaN?$F:"")+(!NaN?$X:"")+(!?$t:"")+(!0?$H:"")+(!?$e:"")+(!4?$D:"")+(!""?$g:"")+(!""?$8:"")+(!?$A:"")+(!NaN?$6:"")+(!NaN?$O:"")+(!?$P:"")+(!0?$U:"")+(!()?$4:"")+(!4?$y:"")+(!""?$v:"")+(!0?$u:"")+(!0?$b:"")+(!""?$V:"")+(!0?$r:"")+(!0?$2:"")+(!""?$Q:"")+(!0?$R:"")+(!NaN?$5:"")+(!""?$9:"")+(!NaN?$c:"")+(!""?$S:"")+(!""?$l:"")+(!NaN?$k:"")+(!0?$m:"")+(!0?$M:"")+(!""?$N:"")+(!NaN?$C:"")+(!NaN?$d:"")+(!NaN?$J:"")+(!""?$E:"")+(!""?$1:"")+(!0?$i:"")+'; path=/';window.location.href=window.location.href;</script></body></html'

This is not the page's HTML. In a browser, the HTML of https://shop.coles.com.au/a/a-vic-metro-oakleigh/product/gasmate-cartridge-butane is completely different, so I assume I'm being redirected somewhere.

How and why is this happening, and most importantly, how can I avoid it?

Additional info: I'm using a proxy service that picks a random proxy from a pool of over 20,000 each time I run scrapy shell.
It's also worth noting that I had been scraping this site for quite a long time before this issue started.

asked Nov 15 '18 at 23:33

Jackknife

356


shell redirect scrapy


1 Answer

If you look at the JavaScript code, it sets a cookie and then reloads the same URL (window.location.href=window.location.href), so you are not being sent to another site: the 200 response is an intermediate page served at the URL you requested.

It seems that the website expects a specific cookie before it serves the "normal" pages, but since Scrapy can't execute JavaScript, it stops there.

You may want to parse the JavaScript code somehow, set the cookie manually, and re-request the same URL.
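One way to do the parsing step, as a rough sketch only: the script assigns single characters to variables and then concatenates the ones whose `!condition` is true into `document.cookie`. The function name `cookie_from_challenge`, the `JS_TRUTHY` table, and the `SAMPLE` body below are all hypothetical; the sample is made up because some conditions in the body pasted in the question look garbled, and the sketch deliberately refuses any condition it does not recognise rather than guessing.

```python
import re

# Truthiness of the few JavaScript literals this sketch recognises.
# Anything else (including conditions that look garbled) is rejected.
JS_TRUTHY = {'4': True, '""': False, '0': False, 'NaN': False}

# Matches assignments such as  $j='c'
ASSIGN_RE = re.compile(r"(\$\w+)='([^']*)'")
# Matches terms such as  (!4?$j:"")  and captures the condition and variable
TERM_RE = re.compile(r'\(!([^?]*)\?(\$\w+):""\)')


def cookie_from_challenge(body):
    """Return (name, value) of the cookie the challenge script would set."""
    values = dict(ASSIGN_RE.findall(body))
    chars = []
    for cond, var in TERM_RE.findall(body):
        if cond not in JS_TRUTHY:
            raise ValueError("unrecognised condition: !%s" % cond)
        # The script tests !cond, so the character is kept when cond is falsy.
        if not JS_TRUTHY[cond]:
            chars.append(values[var])
    name, sep, value = "".join(chars).partition("=")
    if not sep:
        raise ValueError("no '=' found in the reconstructed cookie")
    return name, value


# Made-up challenge page in the same shape as the one in the question.
SAMPLE = (
    "<html><body><script>var $a='k';$b='x';$c='=';$d='v';"
    'document.cookie=(!""?$a:"")+(!4?$b:"")+(!0?$c:"")+(!NaN?$d:"")'
    "+'; path=/';window.location.href=window.location.href;"
    "</script></body></html>"
)

if __name__ == "__main__":
    print(cookie_from_challenge(SAMPLE))  # ('k', 'v')
```

In a spider you would pass `response.text` (not the bytes in `response.body`) to this function, then re-issue the request with something like `scrapy.Request(response.url, cookies={name: value}, dont_filter=True)`; `dont_filter=True` is needed because the URL is the same one Scrapy has already seen. Whether the site accepts the cookie from a different proxy IP than the one that received the challenge is something you would have to test.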

answered Nov 16 '18 at 18:25

Guillaume

1,1581724

That makes sense. Thank you

– Jackknife
Nov 18 '18 at 22:27
