SQL slow only when comparing values in a JOIN (Presto / Amazon Athena)
I have two tables, one containing data the other is metadata.
The main data table contains a grid of geospatial coordinates (up to billions of rows). The coordinates are projected to a specific coordinate system. The relevant part of the schema is:
------------------
| x | smallint |
|------------------|
| y | smallint |
|------------------|
| value | string |
------------------
The metadata table contains the corresponding latitude and longitude values for the x, y coordinates. The relevant part of the schema is:
----------------------------
| x | smallint |
|----------------------------|
| y | smallint |
|----------------------------|
| latitude | decimal(18,15) |
|----------------------------|
| longitude | decimal(18,15) |
----------------------------
A JOIN on these two tables allows one to know the actual latitude/longitude for a particular X/Y coordinate. This will make queries on the table easier since you don't need to know anything about the map projection.
A sample query that retrieves rows from the table:
SELECT
main.x,
main.y,
latitude,
longitude,
value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
main.x=1 AND main.y<=2
The result set would look like:
--------------------------------------
| x | y | latitude | longitude | value |
|--------------------------------------|
| 1 | 1 | 12.345 | 54.321 | row1 |
|--------------------------------------|
| 1 | 2 | 12.345 | 98.765 | row2 |
--------------------------------------
This query only takes 1-2 seconds, great!
My problem is, when I run a query that adds a WHERE clause comparing latitude and longitudes, the query works but it takes over 60 seconds to run...
For example:
SELECT
main.x,
main.y,
latitude,
longitude,
value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
latitude=DECIMAL '12.345' AND longitude=DECIMAL '98.765'
I know that I am probably missing something fundamental about SQL and JOINs that is causing this query to be slow. Independent queries on either table are ultra fast so I know I'm doing something wrong here that has to do with the JOIN.
So the question is, how do I make it so that this (seemingly) simple comparison completes in only a few seconds rather than over 60 seconds?
sql amazon-athena prestodb
|
show 7 more comments
I have two tables, one containing data the other is metadata.
The main data table contains a grid of geospatial coordinates (up to billions of rows). The coordinates are projected to a specific coordinate system. The relevant part of the schema is:
------------------
| x | smallint |
|------------------|
| y | smallint |
|------------------|
| value | string |
------------------
The metadata table contains the corresponding latitude and longitude values for the x, y coordinates. The relevant part of the schema is:
----------------------------
| x | smallint |
|----------------------------|
| y | smallint |
|----------------------------|
| latitude | decimal(18,15) |
|----------------------------|
| longitude | decimal(18,15) |
----------------------------
A JOIN on these two tables allows one to know the actual latitude/longitude for a particular X/Y coordinate. This will make queries on the table easier since you don't need to know anything about the map projection.
A sample query that retrieves rows from the table:
SELECT
main.x,
main.y,
latitude,
longitude,
value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
main.x=1 AND main.y<=2
The result set would look like:
--------------------------------------
| x | y | latitude | longitude | value |
|--------------------------------------|
| 1 | 1 | 12.345 | 54.321 | row1 |
|--------------------------------------|
| 1 | 2 | 12.345 | 98.765 | row2 |
--------------------------------------
This query only takes 1-2 seconds, great!
My problem is, when I run a query that adds a WHERE clause comparing latitude and longitudes, the query works but it takes over 60 seconds to run...
For example:
SELECT
main.x,
main.y,
latitude,
longitude,
value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
latitude=DECIMAL '12.345' AND longitude=DECIMAL '98.765'
I know that I am probably missing something fundamental about SQL and JOINs that is causing this query to be slow. Independent queries on either table are ultra fast so I know I'm doing something wrong here that has to do with the JOIN.
So the question is, how do I make it so that this (seemingly) simple comparison completes in only a few seconds rather than over 60 seconds?
sql amazon-athena prestodb
Please share the indexing details on your table(s).
– Madhur Bhaiya
Nov 12 at 20:37
Unless I'm misunderstanding what you're asking for, there are no indexes. These queries are run against flat files using PrestoDB/Amazon Athena
– Marty
Nov 12 at 20:39
Well question was earlier tagged MySQL too. Not sure about PrestoDB though. Check ifDESCRIBE
command works ?
– Madhur Bhaiya
Nov 12 at 20:40
Yes, that tag was by mistake, sorry. I've included the relevant schema info in the question.
– Marty
Nov 12 at 20:41
1
It's a bit strange that the join by itself appears fast and you only have the issue on the filter. Could be that there is an arbitrary limit to the number of rows that return on the query with no filter. So the query stops or returns results when that limit is hit (IE limit 100). What's the performance like if you just select from main on your filter?
– vanlee1987
Nov 13 at 5:44
|
show 7 more comments
I have two tables, one containing data the other is metadata.
The main data table contains a grid of geospatial coordinates (up to billions of rows). The coordinates are projected to a specific coordinate system. The relevant part of the schema is:
------------------
| x | smallint |
|------------------|
| y | smallint |
|------------------|
| value | string |
------------------
The metadata table contains the corresponding latitude and longitude values for the x, y coordinates. The relevant part of the schema is:
----------------------------
| x | smallint |
|----------------------------|
| y | smallint |
|----------------------------|
| latitude | decimal(18,15) |
|----------------------------|
| longitude | decimal(18,15) |
----------------------------
A JOIN on these two tables allows one to know the actual latitude/longitude for a particular X/Y coordinate. This will make queries on the table easier since you don't need to know anything about the map projection.
A sample query that retrieves rows from the table:
SELECT
main.x,
main.y,
latitude,
longitude,
value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
main.x=1 AND main.y<=2
The result set would look like:
--------------------------------------
| x | y | latitude | longitude | value |
|--------------------------------------|
| 1 | 1 | 12.345 | 54.321 | row1 |
|--------------------------------------|
| 1 | 2 | 12.345 | 98.765 | row2 |
--------------------------------------
This query only takes 1-2 seconds, great!
My problem is, when I run a query that adds a WHERE clause comparing latitude and longitudes, the query works but it takes over 60 seconds to run...
For example:
SELECT
main.x,
main.y,
latitude,
longitude,
value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
latitude=DECIMAL '12.345' AND longitude=DECIMAL '98.765'
I know that I am probably missing something fundamental about SQL and JOINs that is causing this query to be slow. Independent queries on either table are ultra fast so I know I'm doing something wrong here that has to do with the JOIN.
So the question is, how do I make it so that this (seemingly) simple comparison completes in only a few seconds rather than over 60 seconds?
sql amazon-athena prestodb
I have two tables, one containing data the other is metadata.
The main data table contains a grid of geospatial coordinates (up to billions of rows). The coordinates are projected to a specific coordinate system. The relevant part of the schema is:
------------------
| x | smallint |
|------------------|
| y | smallint |
|------------------|
| value | string |
------------------
The metadata table contains the corresponding latitude and longitude values for the x, y coordinates. The relevant part of the schema is:
----------------------------
| x | smallint |
|----------------------------|
| y | smallint |
|----------------------------|
| latitude | decimal(18,15) |
|----------------------------|
| longitude | decimal(18,15) |
----------------------------
A JOIN on these two tables allows one to know the actual latitude/longitude for a particular X/Y coordinate. This will make queries on the table easier since you don't need to know anything about the map projection.
A sample query that retrieves rows from the table:
SELECT
main.x,
main.y,
latitude,
longitude,
value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
main.x=1 AND main.y<=2
The result set would look like:
--------------------------------------
| x | y | latitude | longitude | value |
|--------------------------------------|
| 1 | 1 | 12.345 | 54.321 | row1 |
|--------------------------------------|
| 1 | 2 | 12.345 | 98.765 | row2 |
--------------------------------------
This query only takes 1-2 seconds, great!
My problem is, when I run a query that adds a WHERE clause comparing latitude and longitudes, the query works but it takes over 60 seconds to run...
For example:
SELECT
main.x,
main.y,
latitude,
longitude,
value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
latitude=DECIMAL '12.345' AND longitude=DECIMAL '98.765'
I know that I am probably missing something fundamental about SQL and JOINs that is causing this query to be slow. Independent queries on either table are ultra fast so I know I'm doing something wrong here that has to do with the JOIN.
So the question is, how do I make it so that this (seemingly) simple comparison completes in only a few seconds rather than over 60 seconds?
sql amazon-athena prestodb
sql amazon-athena prestodb
edited Nov 12 at 20:36
Gordon Linoff
758k35291399
758k35291399
asked Nov 12 at 20:35
Marty
1,01311335
1,01311335
Please share the indexing details on your table(s).
– Madhur Bhaiya
Nov 12 at 20:37
Unless I'm misunderstanding what you're asking for, there are no indexes. These queries are run against flat files using PrestoDB/Amazon Athena
– Marty
Nov 12 at 20:39
Well question was earlier tagged MySQL too. Not sure about PrestoDB though. Check ifDESCRIBE
command works ?
– Madhur Bhaiya
Nov 12 at 20:40
Yes, that tag was by mistake, sorry. I've included the relevant schema info in the question.
– Marty
Nov 12 at 20:41
1
It's a bit strange that the join by itself appears fast and you only have the issue on the filter. Could be that there is an arbitrary limit to the number of rows that return on the query with no filter. So the query stops or returns results when that limit is hit (IE limit 100). What's the performance like if you just select from main on your filter?
– vanlee1987
Nov 13 at 5:44
|
show 7 more comments
Please share the indexing details on your table(s).
– Madhur Bhaiya
Nov 12 at 20:37
Unless I'm misunderstanding what you're asking for, there are no indexes. These queries are run against flat files using PrestoDB/Amazon Athena
– Marty
Nov 12 at 20:39
Well question was earlier tagged MySQL too. Not sure about PrestoDB though. Check ifDESCRIBE
command works ?
– Madhur Bhaiya
Nov 12 at 20:40
Yes, that tag was by mistake, sorry. I've included the relevant schema info in the question.
– Marty
Nov 12 at 20:41
1
It's a bit strange that the join by itself appears fast and you only have the issue on the filter. Could be that there is an arbitrary limit to the number of rows that return on the query with no filter. So the query stops or returns results when that limit is hit (IE limit 100). What's the performance like if you just select from main on your filter?
– vanlee1987
Nov 13 at 5:44
Please share the indexing details on your table(s).
– Madhur Bhaiya
Nov 12 at 20:37
Please share the indexing details on your table(s).
– Madhur Bhaiya
Nov 12 at 20:37
Unless I'm misunderstanding what you're asking for, there are no indexes. These queries are run against flat files using PrestoDB/Amazon Athena
– Marty
Nov 12 at 20:39
Unless I'm misunderstanding what you're asking for, there are no indexes. These queries are run against flat files using PrestoDB/Amazon Athena
– Marty
Nov 12 at 20:39
Well question was earlier tagged MySQL too. Not sure about PrestoDB though. Check if
DESCRIBE
command works ?– Madhur Bhaiya
Nov 12 at 20:40
Well question was earlier tagged MySQL too. Not sure about PrestoDB though. Check if
DESCRIBE
command works ?– Madhur Bhaiya
Nov 12 at 20:40
Yes, that tag was by mistake, sorry. I've included the relevant schema info in the question.
– Marty
Nov 12 at 20:41
Yes, that tag was by mistake, sorry. I've included the relevant schema info in the question.
– Marty
Nov 12 at 20:41
1
1
It's a bit strange that the join by itself appears fast and you only have the issue on the filter. Could be that there is an arbitrary limit to the number of rows that return on the query with no filter. So the query stops or returns results when that limit is hit (IE limit 100). What's the performance like if you just select from main on your filter?
– vanlee1987
Nov 13 at 5:44
It's a bit strange that the join by itself appears fast and you only have the issue on the filter. Could be that there is an arbitrary limit to the number of rows that return on the query with no filter. So the query stops or returns results when that limit is hit (IE limit 100). What's the performance like if you just select from main on your filter?
– vanlee1987
Nov 13 at 5:44
|
show 7 more comments
1 Answer
1
active
oldest
votes
The most likely explanation for poor performance when adding a JOIN
is that there is no index that can be used to optimize the join.
In your case, specifying on index on main(x,y)
is most likely to speed up the JOIN
.
1
I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
– Marty
Nov 12 at 20:46
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53269699%2fsql-slow-only-when-comparing-values-in-a-join-presto-amazon-athena%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
The most likely explanation for poor performance when adding a JOIN
is that there is no index that can be used to optimize the join.
In your case, specifying on index on main(x,y)
is most likely to speed up the JOIN
.
1
I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
– Marty
Nov 12 at 20:46
add a comment |
The most likely explanation for poor performance when adding a JOIN
is that there is no index that can be used to optimize the join.
In your case, specifying on index on main(x,y)
is most likely to speed up the JOIN
.
1
I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
– Marty
Nov 12 at 20:46
add a comment |
The most likely explanation for poor performance when adding a JOIN
is that there is no index that can be used to optimize the join.
In your case, specifying on index on main(x,y)
is most likely to speed up the JOIN
.
The most likely explanation for poor performance when adding a JOIN
is that there is no index that can be used to optimize the join.
In your case, specifying on index on main(x,y)
is most likely to speed up the JOIN
.
answered Nov 12 at 20:43
Larry Lustig
40k1282129
40k1282129
1
I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
– Marty
Nov 12 at 20:46
add a comment |
1
I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
– Marty
Nov 12 at 20:46
1
1
I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
– Marty
Nov 12 at 20:46
I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
– Marty
Nov 12 at 20:46
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53269699%2fsql-slow-only-when-comparing-values-in-a-join-presto-amazon-athena%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Please share the indexing details on your table(s).
– Madhur Bhaiya
Nov 12 at 20:37
Unless I'm misunderstanding what you're asking for, there are no indexes. These queries are run against flat files using PrestoDB/Amazon Athena
– Marty
Nov 12 at 20:39
Well question was earlier tagged MySQL too. Not sure about PrestoDB though. Check if
DESCRIBE
command works ?– Madhur Bhaiya
Nov 12 at 20:40
Yes, that tag was by mistake, sorry. I've included the relevant schema info in the question.
– Marty
Nov 12 at 20:41
1
It's a bit strange that the join by itself appears fast and you only have the issue on the filter. Could be that there is an arbitrary limit to the number of rows that return on the query with no filter. So the query stops or returns results when that limit is hit (IE limit 100). What's the performance like if you just select from main on your filter?
– vanlee1987
Nov 13 at 5:44