SQL slow only when comparing values in a JOIN (Presto / Amazon Athena)










0














I have two tables, one containing data the other is metadata.



The main data table contains a grid of geospatial coordinates (up to billions of rows). The coordinates are projected to a specific coordinate system. The relevant part of the schema is:



 ------------------
| x | smallint |
|------------------|
| y | smallint |
|------------------|
| value | string |
------------------


The metadata table contains the corresponding latitude and longitude values for the x, y coordinates. The relevant part of the schema is:



 ----------------------------
| x | smallint |
|----------------------------|
| y | smallint |
|----------------------------|
| latitude | decimal(18,15) |
|----------------------------|
| longitude | decimal(18,15) |
----------------------------


A JOIN on these two tables allows one to know the actual latitude/longitude for a particular X/Y coordinate. This will make queries on the table easier since you don't need to know anything about the map projection.



A sample query that retrieves rows from the table:



SELECT 
main.x,
main.y,
latitude,
longitude,
value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
main.x=1 AND main.y<=2


The result set would look like:



 --------------------------------------
| x | y | latitude | longitude | value |
|--------------------------------------|
| 1 | 1 | 12.345 | 54.321 | row1 |
|--------------------------------------|
| 1 | 2 | 12.345 | 98.765 | row2 |
--------------------------------------


This query only takes 1-2 seconds, great!



My problem is, when I run a query that adds a WHERE clause comparing latitude and longitudes, the query works but it takes over 60 seconds to run...



For example:



SELECT 
main.x,
main.y,
latitude,
longitude,
value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
latitude=DECIMAL '12.345' AND longitude=DECIMAL '98.765'


I know that I am probably missing something fundamental about SQL and JOINs that is causing this query to be slow. Independent queries on either table are ultra fast so I know I'm doing something wrong here that has to do with the JOIN.



So the question is, how do I make it so that this (seemingly) simple comparison completes in only a few seconds rather than over 60 seconds?










share|improve this question























  • Please share the indexing details on your table(s).
    – Madhur Bhaiya
    Nov 12 at 20:37










  • Unless I'm misunderstanding what you're asking for, there are no indexes. These queries are run against flat files using PrestoDB/Amazon Athena
    – Marty
    Nov 12 at 20:39










  • Well question was earlier tagged MySQL too. Not sure about PrestoDB though. Check if DESCRIBE command works ?
    – Madhur Bhaiya
    Nov 12 at 20:40










  • Yes, that tag was by mistake, sorry. I've included the relevant schema info in the question.
    – Marty
    Nov 12 at 20:41






  • 1




    It's a bit strange that the join by itself appears fast and you only have the issue on the filter. Could be that there is an arbitrary limit to the number of rows that return on the query with no filter. So the query stops or returns results when that limit is hit (IE limit 100). What's the performance like if you just select from main on your filter?
    – vanlee1987
    Nov 13 at 5:44















0














I have two tables, one containing data the other is metadata.



The main data table contains a grid of geospatial coordinates (up to billions of rows). The coordinates are projected to a specific coordinate system. The relevant part of the schema is:



 ------------------
| x | smallint |
|------------------|
| y | smallint |
|------------------|
| value | string |
------------------


The metadata table contains the corresponding latitude and longitude values for the x, y coordinates. The relevant part of the schema is:



 ----------------------------
| x | smallint |
|----------------------------|
| y | smallint |
|----------------------------|
| latitude | decimal(18,15) |
|----------------------------|
| longitude | decimal(18,15) |
----------------------------


A JOIN on these two tables allows one to know the actual latitude/longitude for a particular X/Y coordinate. This will make queries on the table easier since you don't need to know anything about the map projection.



A sample query that retrieves rows from the table:



SELECT 
main.x,
main.y,
latitude,
longitude,
value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
main.x=1 AND main.y<=2


The result set would look like:



 --------------------------------------
| x | y | latitude | longitude | value |
|--------------------------------------|
| 1 | 1 | 12.345 | 54.321 | row1 |
|--------------------------------------|
| 1 | 2 | 12.345 | 98.765 | row2 |
--------------------------------------


This query only takes 1-2 seconds, great!



My problem is, when I run a query that adds a WHERE clause comparing latitude and longitudes, the query works but it takes over 60 seconds to run...



For example:



SELECT 
main.x,
main.y,
latitude,
longitude,
value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
latitude=DECIMAL '12.345' AND longitude=DECIMAL '98.765'


I know that I am probably missing something fundamental about SQL and JOINs that is causing this query to be slow. Independent queries on either table are ultra fast so I know I'm doing something wrong here that has to do with the JOIN.



So the question is, how do I make it so that this (seemingly) simple comparison completes in only a few seconds rather than over 60 seconds?










share|improve this question























  • Please share the indexing details on your table(s).
    – Madhur Bhaiya
    Nov 12 at 20:37










  • Unless I'm misunderstanding what you're asking for, there are no indexes. These queries are run against flat files using PrestoDB/Amazon Athena
    – Marty
    Nov 12 at 20:39










  • Well question was earlier tagged MySQL too. Not sure about PrestoDB though. Check if DESCRIBE command works ?
    – Madhur Bhaiya
    Nov 12 at 20:40










  • Yes, that tag was by mistake, sorry. I've included the relevant schema info in the question.
    – Marty
    Nov 12 at 20:41






  • 1




    It's a bit strange that the join by itself appears fast and you only have the issue on the filter. Could be that there is an arbitrary limit to the number of rows that return on the query with no filter. So the query stops or returns results when that limit is hit (IE limit 100). What's the performance like if you just select from main on your filter?
    – vanlee1987
    Nov 13 at 5:44













0












0








0







I have two tables, one containing data the other is metadata.



The main data table contains a grid of geospatial coordinates (up to billions of rows). The coordinates are projected to a specific coordinate system. The relevant part of the schema is:



 ------------------
| x | smallint |
|------------------|
| y | smallint |
|------------------|
| value | string |
------------------


The metadata table contains the corresponding latitude and longitude values for the x, y coordinates. The relevant part of the schema is:



 ----------------------------
| x | smallint |
|----------------------------|
| y | smallint |
|----------------------------|
| latitude | decimal(18,15) |
|----------------------------|
| longitude | decimal(18,15) |
----------------------------


A JOIN on these two tables allows one to know the actual latitude/longitude for a particular X/Y coordinate. This will make queries on the table easier since you don't need to know anything about the map projection.



A sample query that retrieves rows from the table:



SELECT 
main.x,
main.y,
latitude,
longitude,
value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
main.x=1 AND main.y<=2


The result set would look like:



 --------------------------------------
| x | y | latitude | longitude | value |
|--------------------------------------|
| 1 | 1 | 12.345 | 54.321 | row1 |
|--------------------------------------|
| 1 | 2 | 12.345 | 98.765 | row2 |
--------------------------------------


This query only takes 1-2 seconds, great!



My problem is, when I run a query that adds a WHERE clause comparing latitude and longitudes, the query works but it takes over 60 seconds to run...



For example:



SELECT 
main.x,
main.y,
latitude,
longitude,
value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
latitude=DECIMAL '12.345' AND longitude=DECIMAL '98.765'


I know that I am probably missing something fundamental about SQL and JOINs that is causing this query to be slow. Independent queries on either table are ultra fast so I know I'm doing something wrong here that has to do with the JOIN.



So the question is, how do I make it so that this (seemingly) simple comparison completes in only a few seconds rather than over 60 seconds?










share|improve this question















I have two tables, one containing data the other is metadata.



The main data table contains a grid of geospatial coordinates (up to billions of rows). The coordinates are projected to a specific coordinate system. The relevant part of the schema is:



 ------------------
| x | smallint |
|------------------|
| y | smallint |
|------------------|
| value | string |
------------------


The metadata table contains the corresponding latitude and longitude values for the x, y coordinates. The relevant part of the schema is:



 ----------------------------
| x | smallint |
|----------------------------|
| y | smallint |
|----------------------------|
| latitude | decimal(18,15) |
|----------------------------|
| longitude | decimal(18,15) |
----------------------------


A JOIN on these two tables allows one to know the actual latitude/longitude for a particular X/Y coordinate. This will make queries on the table easier since you don't need to know anything about the map projection.



A sample query that retrieves rows from the table:



SELECT 
main.x,
main.y,
latitude,
longitude,
value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
main.x=1 AND main.y<=2


The result set would look like:



 --------------------------------------
| x | y | latitude | longitude | value |
|--------------------------------------|
| 1 | 1 | 12.345 | 54.321 | row1 |
|--------------------------------------|
| 1 | 2 | 12.345 | 98.765 | row2 |
--------------------------------------


This query only takes 1-2 seconds, great!



My problem is, when I run a query that adds a WHERE clause comparing latitude and longitudes, the query works but it takes over 60 seconds to run...



For example:



SELECT 
main.x,
main.y,
latitude,
longitude,
value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
latitude=DECIMAL '12.345' AND longitude=DECIMAL '98.765'


I know that I am probably missing something fundamental about SQL and JOINs that is causing this query to be slow. Independent queries on either table are ultra fast so I know I'm doing something wrong here that has to do with the JOIN.



So the question is, how do I make it so that this (seemingly) simple comparison completes in only a few seconds rather than over 60 seconds?







sql amazon-athena prestodb






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 12 at 20:36









Gordon Linoff

758k35291399




758k35291399










asked Nov 12 at 20:35









Marty

1,01311335




1,01311335











  • Please share the indexing details on your table(s).
    – Madhur Bhaiya
    Nov 12 at 20:37










  • Unless I'm misunderstanding what you're asking for, there are no indexes. These queries are run against flat files using PrestoDB/Amazon Athena
    – Marty
    Nov 12 at 20:39










  • Well question was earlier tagged MySQL too. Not sure about PrestoDB though. Check if DESCRIBE command works ?
    – Madhur Bhaiya
    Nov 12 at 20:40










  • Yes, that tag was by mistake, sorry. I've included the relevant schema info in the question.
    – Marty
    Nov 12 at 20:41






  • 1




    It's a bit strange that the join by itself appears fast and you only have the issue on the filter. Could be that there is an arbitrary limit to the number of rows that return on the query with no filter. So the query stops or returns results when that limit is hit (IE limit 100). What's the performance like if you just select from main on your filter?
    – vanlee1987
    Nov 13 at 5:44
















  • Please share the indexing details on your table(s).
    – Madhur Bhaiya
    Nov 12 at 20:37










  • Unless I'm misunderstanding what you're asking for, there are no indexes. These queries are run against flat files using PrestoDB/Amazon Athena
    – Marty
    Nov 12 at 20:39










  • Well question was earlier tagged MySQL too. Not sure about PrestoDB though. Check if DESCRIBE command works ?
    – Madhur Bhaiya
    Nov 12 at 20:40










  • Yes, that tag was by mistake, sorry. I've included the relevant schema info in the question.
    – Marty
    Nov 12 at 20:41






  • 1




    It's a bit strange that the join by itself appears fast and you only have the issue on the filter. Could be that there is an arbitrary limit to the number of rows that return on the query with no filter. So the query stops or returns results when that limit is hit (IE limit 100). What's the performance like if you just select from main on your filter?
    – vanlee1987
    Nov 13 at 5:44















Please share the indexing details on your table(s).
– Madhur Bhaiya
Nov 12 at 20:37




Please share the indexing details on your table(s).
– Madhur Bhaiya
Nov 12 at 20:37












Unless I'm misunderstanding what you're asking for, there are no indexes. These queries are run against flat files using PrestoDB/Amazon Athena
– Marty
Nov 12 at 20:39




Unless I'm misunderstanding what you're asking for, there are no indexes. These queries are run against flat files using PrestoDB/Amazon Athena
– Marty
Nov 12 at 20:39












Well question was earlier tagged MySQL too. Not sure about PrestoDB though. Check if DESCRIBE command works ?
– Madhur Bhaiya
Nov 12 at 20:40




Well question was earlier tagged MySQL too. Not sure about PrestoDB though. Check if DESCRIBE command works ?
– Madhur Bhaiya
Nov 12 at 20:40












Yes, that tag was by mistake, sorry. I've included the relevant schema info in the question.
– Marty
Nov 12 at 20:41




Yes, that tag was by mistake, sorry. I've included the relevant schema info in the question.
– Marty
Nov 12 at 20:41




1




1




It's a bit strange that the join by itself appears fast and you only have the issue on the filter. Could be that there is an arbitrary limit to the number of rows that return on the query with no filter. So the query stops or returns results when that limit is hit (IE limit 100). What's the performance like if you just select from main on your filter?
– vanlee1987
Nov 13 at 5:44




It's a bit strange that the join by itself appears fast and you only have the issue on the filter. Could be that there is an arbitrary limit to the number of rows that return on the query with no filter. So the query stops or returns results when that limit is hit (IE limit 100). What's the performance like if you just select from main on your filter?
– vanlee1987
Nov 13 at 5:44












1 Answer
1






active

oldest

votes


















-1














The most likely explanation for poor performance when adding a JOIN is that there is no index that can be used to optimize the join.



In your case, specifying on index on main(x,y) is most likely to speed up the JOIN.






share|improve this answer
















  • 1




    I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
    – Marty
    Nov 12 at 20:46










Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













draft saved

draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53269699%2fsql-slow-only-when-comparing-values-in-a-join-presto-amazon-athena%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









-1














The most likely explanation for poor performance when adding a JOIN is that there is no index that can be used to optimize the join.



In your case, specifying on index on main(x,y) is most likely to speed up the JOIN.






share|improve this answer
















  • 1




    I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
    – Marty
    Nov 12 at 20:46















-1














The most likely explanation for poor performance when adding a JOIN is that there is no index that can be used to optimize the join.



In your case, specifying on index on main(x,y) is most likely to speed up the JOIN.






share|improve this answer
















  • 1




    I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
    – Marty
    Nov 12 at 20:46













-1












-1








-1






The most likely explanation for poor performance when adding a JOIN is that there is no index that can be used to optimize the join.



In your case, specifying on index on main(x,y) is most likely to speed up the JOIN.






share|improve this answer












The most likely explanation for poor performance when adding a JOIN is that there is no index that can be used to optimize the join.



In your case, specifying on index on main(x,y) is most likely to speed up the JOIN.







share|improve this answer












share|improve this answer



share|improve this answer










answered Nov 12 at 20:43









Larry Lustig

40k1282129




40k1282129







  • 1




    I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
    – Marty
    Nov 12 at 20:46












  • 1




    I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
    – Marty
    Nov 12 at 20:46







1




1




I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
– Marty
Nov 12 at 20:46




I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
– Marty
Nov 12 at 20:46

















draft saved

draft discarded
















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.





Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


Please pay close attention to the following guidance:


  • Please be sure to answer the question. Provide details and share your research!

But avoid


  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53269699%2fsql-slow-only-when-comparing-values-in-a-join-presto-amazon-athena%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Top Tejano songwriter Luis Silva dead of heart attack at 64

ReactJS Fetched API data displays live - need Data displayed static

政党