Scrape the web page using jsoup

I need to scrape the postcode from the HTML below using jsoup. I only need the postcode, W2, which is part of the href attribute of the a tag:

<a href="/properties-for-sale/w2/chpk3848653" class="property_photo_holder" style="backgroundimage:url(https://assets.foxtons.co.uk/w/480/1523289105/chpk3848653-23.jpg)"></a>

This is the HTML code:

</div>

<div id="property_1062067" class="property_summary">

<h6><a href="/properties-for-sale/w2/chpk3848653">Lancaster Gate, <span class="property_address_location_name">Bayswater,</span> W2</a></h6>


Can anyone help?
Thank you.

  • What do you mean by "I only need postcode which is W2"? Also, could you post something you have tried?
    – Subhasish Bhattacharjee
    Nov 11 at 9:18
  • I just tried to show exactly what data I want to scrape. Please see below:
    – Hakan
    Nov 11 at 9:48
  • >Bayswater,</span> W2</a></h6>
    – Hakan
    Nov 11 at 9:48
  • This is the code I tried:
    – Hakan
    Nov 11 at 9:51
  • Elements postcodes = doc.select("span.property_address_location_name"); for (Element postcode : postcodes) System.out.println(postcode.text());
    – Hakan
    Nov 11 at 9:51

java html parsing web-scraping jsoup

edited Nov 11 at 11:25

Dinko Pehar

596324

asked Nov 11 at 8:46

Hakan

113

1 Answer

You can use jsoup for that; you just need to retrieve the value of the href attribute, as follows:



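import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// Fetch the page; get() throws IOException if the request fails
Document document = Jsoup.connect(URL).userAgent("Mozilla/5.0").get();
// Select only the property links instead of every <a> on the page
Elements elements = document.select("div.property_summary h6 a[href]");
// Elements.attr() returns the value from the first matched element that has it
String href = elements.attr("href");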


Now that you have the href value as a String, apply a regular expression to extract the field you want, in this case the postcode contained in "/properties-for-sale/w2/chpk3848653":



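import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The postcode is the path segment right after "/properties-for-sale/"
Pattern pattern = Pattern.compile("/properties-for-sale/([a-zA-Z0-9]+)/");
Matcher matcher = pattern.matcher(href);
// find() returns a boolean, so read the captured group from the matcher itself
String postalCode = matcher.find() ? matcher.group(1).toUpperCase() : null;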
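To try the regex step on its own, without fetching the page, here is a minimal self-contained sketch. It assumes the postcode is always the path segment directly after `/properties-for-sale/`, which holds for the sample href in the question but has not been checked against every listing; the class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PostcodeFromHref {

    // Assumption: hrefs look like "/properties-for-sale/<postcode>/<listing-id>"
    private static final Pattern POSTCODE_IN_PATH =
            Pattern.compile("/properties-for-sale/([a-zA-Z0-9]+)/");

    // Returns the upper-cased postcode segment, or null if the href does not match
    static String extractPostcode(String href) {
        Matcher matcher = POSTCODE_IN_PATH.matcher(href);
        if (!matcher.find()) {
            return null;
        }
        return matcher.group(1).toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Sample href taken from the question's HTML
        System.out.println(extractPostcode("/properties-for-sale/w2/chpk3848653"));
    }
}
```

Returning null for a non-matching href lets the caller skip links that are not property listings instead of failing on them.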


That's all. If you need anything else, feel free to ask! I hope this helps.

  • Something is wrong with this code. Thanks anyway.
    – Hakan
    Nov 13 at 19:07
  • @Hakan No problem! Ask me if you need anything else; that was just sample code meant as a guide. +1 if you found it useful!
    – alvarobartt
    Nov 14 at 8:35
  • This is how I scraped all the other attributes, etc.:
    – Hakan
    Nov 15 at 10:52
  • //Get the location of property
    Elements locations = items.get(i).getElementsByTag("h6");
    //Get the post code of property
    Elements postcodes = items.get(i).getElementsByTag("h6.a[href]");
    //Get the longitude
    Elements longitude = items.get(i).select("div");
    – Hakan
    Nov 15 at 10:52
  • foxtons.co.uk/… This is the link to the page I am scraping.
    – Hakan
    Nov 15 at 10:53
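As an alternative to parsing the href, the postcode also appears as the last comma-separated part of the link text ("Lancaster Gate, Bayswater, W2"). Note that jsoup's `getElementsByTag` expects a bare tag name, so `getElementsByTag("h6.a[href]")` in the comment above matches no elements; a CSS query such as `select("h6 a[href]")` followed by `text()` should return the full link text. The sketch below covers only the string step, uses only the standard library, and assumes the postcode is always the final comma-separated token; the class and method names are hypothetical.

```java
public class PostcodeFromLinkText {

    // Assumption: the postcode is the last comma-separated part of the link text,
    // e.g. "Lancaster Gate, Bayswater, W2"
    static String lastCommaSeparatedPart(String linkText) {
        // limit -1 keeps trailing empty parts, so "a," yields an empty last part
        String[] parts = linkText.split(",", -1);
        String last = parts[parts.length - 1].trim();
        // Return null when there is nothing after the final comma
        return last.isEmpty() ? null : last;
    }

    public static void main(String[] args) {
        // Link text taken from the question's HTML
        System.out.println(lastCommaSeparatedPart("Lancaster Gate, Bayswater, W2"));
    }
}
```

This avoids depending on the URL layout, but it will break if the site changes the order of the address parts, so the href-based regex may be the more robust choice.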

answered Nov 13 at 13:06

alvarobartt

12517