SQL slow only when comparing values in a JOIN (Presto / Amazon Athena)

I have two tables, one containing data the other is metadata.

The main data table contains a grid of geospatial coordinates (up to billions of rows). The coordinates are projected to a specific coordinate system. The relevant part of the schema is:

 ------------------
| x | smallint |
|------------------|
| y | smallint |
|------------------|
| value | string |
 ------------------

The metadata table contains the corresponding latitude and longitude values for the x, y coordinates. The relevant part of the schema is:

 ----------------------------
| x | smallint |
|----------------------------|
| y | smallint |
|----------------------------|
| latitude | decimal(18,15) |
|----------------------------|
| longitude | decimal(18,15) |
 ----------------------------

A JOIN on these two tables allows one to know the actual latitude/longitude for a particular X/Y coordinate. This will make queries on the table easier since you don't need to know anything about the map projection.

A sample query that retrieves rows from the table:

SELECT 
 main.x,
 main.y,
 latitude,
 longitude,
 value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
 main.x=1 AND main.y<=2

The result set would look like:

 --------------------------------------
| x | y | latitude | longitude | value |
|--------------------------------------|
| 1 | 1 | 12.345 | 54.321 | row1 |
|--------------------------------------|
| 1 | 2 | 12.345 | 98.765 | row2 |
 --------------------------------------

This query only takes 1-2 seconds, great!

My problem is, when I run a query that adds a WHERE clause comparing latitude and longitudes, the query works but it takes over 60 seconds to run...

For example:

SELECT 
 main.x,
 main.y,
 latitude,
 longitude,
 value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
 latitude=DECIMAL '12.345' AND longitude=DECIMAL '98.765'

I know that I am probably missing something fundamental about SQL and JOINs that is causing this query to be slow. Independent queries on either table are ultra fast so I know I'm doing something wrong here that has to do with the JOIN.

So the question is, how do I make it so that this (seemingly) simple comparison completes in only a few seconds rather than over 60 seconds?

edited Nov 12 at 20:36

Gordon Linoff

758k35291399

asked Nov 12 at 20:35

Marty

1,01311335

Please share the indexing details on your table(s).
– Madhur Bhaiya
Nov 12 at 20:37

Unless I'm misunderstanding what you're asking for, there are no indexes. These queries are run against flat files using PrestoDB/Amazon Athena
– Marty
Nov 12 at 20:39

Well question was earlier tagged MySQL too. Not sure about PrestoDB though. Check if DESCRIBE command works ?
– Madhur Bhaiya
Nov 12 at 20:40

Yes, that tag was by mistake, sorry. I've included the relevant schema info in the question.
– Marty
Nov 12 at 20:41

1

It's a bit strange that the join by itself appears fast and you only have the issue on the filter. Could be that there is an arbitrary limit to the number of rows that return on the query with no filter. So the query stops or returns results when that limit is hit (IE limit 100). What's the performance like if you just select from main on your filter?
– vanlee1987
Nov 13 at 5:44

|
show 7 more comments

I have two tables, one containing data the other is metadata.

The main data table contains a grid of geospatial coordinates (up to billions of rows). The coordinates are projected to a specific coordinate system. The relevant part of the schema is:

 ------------------
| x | smallint |
|------------------|
| y | smallint |
|------------------|
| value | string |
 ------------------

The metadata table contains the corresponding latitude and longitude values for the x, y coordinates. The relevant part of the schema is:

 ----------------------------
| x | smallint |
|----------------------------|
| y | smallint |
|----------------------------|
| latitude | decimal(18,15) |
|----------------------------|
| longitude | decimal(18,15) |
 ----------------------------

A sample query that retrieves rows from the table:

SELECT 
 main.x,
 main.y,
 latitude,
 longitude,
 value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
 main.x=1 AND main.y<=2

The result set would look like:

 --------------------------------------
| x | y | latitude | longitude | value |
|--------------------------------------|
| 1 | 1 | 12.345 | 54.321 | row1 |
|--------------------------------------|
| 1 | 2 | 12.345 | 98.765 | row2 |
 --------------------------------------

This query only takes 1-2 seconds, great!

My problem is, when I run a query that adds a WHERE clause comparing latitude and longitudes, the query works but it takes over 60 seconds to run...

For example:

SELECT 
 main.x,
 main.y,
 latitude,
 longitude,
 value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
 latitude=DECIMAL '12.345' AND longitude=DECIMAL '98.765'

So the question is, how do I make it so that this (seemingly) simple comparison completes in only a few seconds rather than over 60 seconds?

edited Nov 12 at 20:36

Gordon Linoff

758k35291399

asked Nov 12 at 20:35

Marty

1,01311335

Please share the indexing details on your table(s).
– Madhur Bhaiya
Nov 12 at 20:37

Unless I'm misunderstanding what you're asking for, there are no indexes. These queries are run against flat files using PrestoDB/Amazon Athena
– Marty
Nov 12 at 20:39

Well question was earlier tagged MySQL too. Not sure about PrestoDB though. Check if DESCRIBE command works ?
– Madhur Bhaiya
Nov 12 at 20:40

Yes, that tag was by mistake, sorry. I've included the relevant schema info in the question.
– Marty
Nov 12 at 20:41

1

It's a bit strange that the join by itself appears fast and you only have the issue on the filter. Could be that there is an arbitrary limit to the number of rows that return on the query with no filter. So the query stops or returns results when that limit is hit (IE limit 100). What's the performance like if you just select from main on your filter?
– vanlee1987
Nov 13 at 5:44

|
show 7 more comments

I have two tables, one containing data the other is metadata.

The main data table contains a grid of geospatial coordinates (up to billions of rows). The coordinates are projected to a specific coordinate system. The relevant part of the schema is:

 ------------------
| x | smallint |
|------------------|
| y | smallint |
|------------------|
| value | string |
 ------------------

The metadata table contains the corresponding latitude and longitude values for the x, y coordinates. The relevant part of the schema is:

 ----------------------------
| x | smallint |
|----------------------------|
| y | smallint |
|----------------------------|
| latitude | decimal(18,15) |
|----------------------------|
| longitude | decimal(18,15) |
 ----------------------------

A sample query that retrieves rows from the table:

SELECT 
 main.x,
 main.y,
 latitude,
 longitude,
 value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
 main.x=1 AND main.y<=2

The result set would look like:

 --------------------------------------
| x | y | latitude | longitude | value |
|--------------------------------------|
| 1 | 1 | 12.345 | 54.321 | row1 |
|--------------------------------------|
| 1 | 2 | 12.345 | 98.765 | row2 |
 --------------------------------------

This query only takes 1-2 seconds, great!

My problem is, when I run a query that adds a WHERE clause comparing latitude and longitudes, the query works but it takes over 60 seconds to run...

For example:

SELECT 
 main.x,
 main.y,
 latitude,
 longitude,
 value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
 latitude=DECIMAL '12.345' AND longitude=DECIMAL '98.765'

So the question is, how do I make it so that this (seemingly) simple comparison completes in only a few seconds rather than over 60 seconds?

edited Nov 12 at 20:36

Gordon Linoff

758k35291399

asked Nov 12 at 20:35

Marty

1,01311335

I have two tables, one containing data the other is metadata.

The main data table contains a grid of geospatial coordinates (up to billions of rows). The coordinates are projected to a specific coordinate system. The relevant part of the schema is:

 ------------------
| x | smallint |
|------------------|
| y | smallint |
|------------------|
| value | string |
 ------------------

The metadata table contains the corresponding latitude and longitude values for the x, y coordinates. The relevant part of the schema is:

 ----------------------------
| x | smallint |
|----------------------------|
| y | smallint |
|----------------------------|
| latitude | decimal(18,15) |
|----------------------------|
| longitude | decimal(18,15) |
 ----------------------------

A sample query that retrieves rows from the table:

SELECT 
 main.x,
 main.y,
 latitude,
 longitude,
 value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
 main.x=1 AND main.y<=2

The result set would look like:

 --------------------------------------
| x | y | latitude | longitude | value |
|--------------------------------------|
| 1 | 1 | 12.345 | 54.321 | row1 |
|--------------------------------------|
| 1 | 2 | 12.345 | 98.765 | row2 |
 --------------------------------------

This query only takes 1-2 seconds, great!

My problem is, when I run a query that adds a WHERE clause comparing latitude and longitudes, the query works but it takes over 60 seconds to run...

For example:

SELECT 
 main.x,
 main.y,
 latitude,
 longitude,
 value
FROM database.main JOIN database.meta
ON main.x=meta.x AND main.y=meta.y
WHERE
 latitude=DECIMAL '12.345' AND longitude=DECIMAL '98.765'

So the question is, how do I make it so that this (seemingly) simple comparison completes in only a few seconds rather than over 60 seconds?

sql amazon-athena prestodb

edited Nov 12 at 20:36

Gordon Linoff

758k35291399

asked Nov 12 at 20:35

Marty

1,01311335

edited Nov 12 at 20:36

Gordon Linoff

758k35291399

asked Nov 12 at 20:35

Marty

1,01311335

edited Nov 12 at 20:36

Gordon Linoff

758k35291399

edited Nov 12 at 20:36

Gordon Linoff

758k35291399

edited Nov 12 at 20:36

Gordon Linoff

758k35291399

asked Nov 12 at 20:35

Marty

1,01311335

asked Nov 12 at 20:35

Marty

1,01311335

asked Nov 12 at 20:35

Marty

1,01311335

Please share the indexing details on your table(s).
– Madhur Bhaiya
Nov 12 at 20:37

Unless I'm misunderstanding what you're asking for, there are no indexes. These queries are run against flat files using PrestoDB/Amazon Athena
– Marty
Nov 12 at 20:39

Well question was earlier tagged MySQL too. Not sure about PrestoDB though. Check if DESCRIBE command works ?
– Madhur Bhaiya
Nov 12 at 20:40

Yes, that tag was by mistake, sorry. I've included the relevant schema info in the question.
– Marty
Nov 12 at 20:41

1

It's a bit strange that the join by itself appears fast and you only have the issue on the filter. Could be that there is an arbitrary limit to the number of rows that return on the query with no filter. So the query stops or returns results when that limit is hit (IE limit 100). What's the performance like if you just select from main on your filter?
– vanlee1987
Nov 13 at 5:44

|
show 7 more comments

Please share the indexing details on your table(s).
– Madhur Bhaiya
Nov 12 at 20:37

Unless I'm misunderstanding what you're asking for, there are no indexes. These queries are run against flat files using PrestoDB/Amazon Athena
– Marty
Nov 12 at 20:39

Well question was earlier tagged MySQL too. Not sure about PrestoDB though. Check if DESCRIBE command works ?
– Madhur Bhaiya
Nov 12 at 20:40

Yes, that tag was by mistake, sorry. I've included the relevant schema info in the question.
– Marty
Nov 12 at 20:41

1

It's a bit strange that the join by itself appears fast and you only have the issue on the filter. Could be that there is an arbitrary limit to the number of rows that return on the query with no filter. So the query stops or returns results when that limit is hit (IE limit 100). What's the performance like if you just select from main on your filter?
– vanlee1987
Nov 13 at 5:44

Please share the indexing details on your table(s).
– Madhur Bhaiya
Nov 12 at 20:37

Unless I'm misunderstanding what you're asking for, there are no indexes. These queries are run against flat files using PrestoDB/Amazon Athena
– Marty
Nov 12 at 20:39

Well question was earlier tagged MySQL too. Not sure about PrestoDB though. Check if DESCRIBE command works ?
– Madhur Bhaiya
Nov 12 at 20:40

Yes, that tag was by mistake, sorry. I've included the relevant schema info in the question.
– Marty
Nov 12 at 20:41

It's a bit strange that the join by itself appears fast and you only have the issue on the filter. Could be that there is an arbitrary limit to the number of rows that return on the query with no filter. So the query stops or returns results when that limit is hit (IE limit 100). What's the performance like if you just select from main on your filter?
– vanlee1987
Nov 13 at 5:44

|
show 7 more comments

1 Answer
1

active

oldest

votes

-1

The most likely explanation for poor performance when adding a JOIN is that there is no index that can be used to optimize the join.

In your case, specifying on index on main(x,y) is most likely to speed up the JOIN.

answered Nov 12 at 20:43

Larry Lustig

40k1282129

1

I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
– Marty
Nov 12 at 20:46

add a comment |

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53269699%2fsql-slow-only-when-comparing-values-in-a-join-presto-amazon-athena%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

-1

The most likely explanation for poor performance when adding a JOIN is that there is no index that can be used to optimize the join.

In your case, specifying on index on main(x,y) is most likely to speed up the JOIN.

answered Nov 12 at 20:43

Larry Lustig

40k1282129

1

I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
– Marty
Nov 12 at 20:46

add a comment |

-1

The most likely explanation for poor performance when adding a JOIN is that there is no index that can be used to optimize the join.

In your case, specifying on index on main(x,y) is most likely to speed up the JOIN.

answered Nov 12 at 20:43

Larry Lustig

40k1282129

1

I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
– Marty
Nov 12 at 20:46

add a comment |

-1

The most likely explanation for poor performance when adding a JOIN is that there is no index that can be used to optimize the join.

In your case, specifying on index on main(x,y) is most likely to speed up the JOIN.

answered Nov 12 at 20:43

Larry Lustig

40k1282129

The most likely explanation for poor performance when adding a JOIN is that there is no index that can be used to optimize the join.

In your case, specifying on index on main(x,y) is most likely to speed up the JOIN.

answered Nov 12 at 20:43

Larry Lustig

40k1282129

answered Nov 12 at 20:43

Larry Lustig

40k1282129

answered Nov 12 at 20:43

Larry Lustig

40k1282129

answered Nov 12 at 20:43

Larry Lustig

40k1282129

1

I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
– Marty
Nov 12 at 20:46

add a comment |

1

I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
– Marty
Nov 12 at 20:46

I don't think indexes are a thing in Presto since it quickly scans flat files in columnar format. Hence, the metadata table already is the index in this context. Also, when I perform the JOIN without the comparison, it is very quick, would a lack of index explain that problem? Or maybe I could alter the metadata table in such a way that it can perform better?
– Marty
Nov 12 at 20:46

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

6RhcvM3,RknXue UFKHfH,lJFgo8YJcd8Zs5VAyxwCvELi553Z,Nn L96dLllg 5wBjLg6,JmhjD,fS ubZxfu

搜尋此網誌

Myujth