BigQuery-SQL - joining 2 table for revenue calculation









up vote
0
down vote

favorite












I've the following 2 tables:



Table1 - raw data without revenue data:



datehour | geo | platform | dimension_4 | dimension_5
metric_1 | metric_2 | billable_actions (int)


Table2 - the revenue data, which is originated from another source
but eventually inserted into a table with the following schema
(please note that it has 2 missing dimensions):



datehour | geo | platform | revenue (float)


I need to merge Table2 into Table1 ,
so I can calculate the revenue only from table1.



THE QUERY



SELECT
datehour,
cc,
platform,
dimension_1,
dimension_2,
billable_actions,
CASE
WHEN revenue > 0 AND billable_actions > 0 THEN (revenue/total_billable_actions)*billable_actions
WHEN metric_1 IS NULL AND metric_2 IS NULL THEN revenue
ELSE 0
END calc_revenue
FROM (
SELECT
IFNULL(e.datehour,d.from_ts) datehour,
IFNULL(e.cc, d.cc) cc,
IFNULL(e.platform, d.platform) platform,
e.dimension1,
e.dimension2,
e.billable_actions,
SUM(e.billable_actions) OVER (PARTITION BY e.datehour, e.platform, e.geo) total_billable_actions,
d.revenue
FROM Table1 e
FULL JOIN Table2 d
ON
e.datehour = d.from_ts AND
e.appkey = d.appkey AND
e.cc = d.cc AND
e.platform = d.platform
)


EXPECTED RESULT & THE PROBLEM



Both tables revenues should be equal after the join.
I'm testing:



select sum(revenue) from Table2 ==
select sum(row_revenue) from JoinedTable


but row_revenue is always lower:
Table2 sum = 0.44449199771042913
Joined table sum = 0.4421989977126941



Can some one please point me for the reason why they are not equal?
I've a working example at the end of the question .



NOTES



note 1 - the revenue is calculated the following way:



1) Table1 FULL JOIN Table2 ON:



T1.datehour = T2.datehour AND
T1.geo = T2.geo AND
T1.platform = T2.platform


*please read note 2,3 regard why I use full join.



2) on Table1 , sum the billable actions, partition by datehour, geo and platform.



SUM(billable_actions) OVER (PARTITION BY datehour, geo, platform) 
AS total_billable_actions


3) Now , we have a table , where each row contains the total_billable_actions per the group , the total revenue and the total actions for the row. wrapping it with another select and calculating the row revenue:



row_revenue = (revenue / total_billable_actions) * billable_actions


note 2
we may have rows from Table1 which do not match any row from table2,
but we still need the data found in those rows . those rows will not affect the revenue calculations.



note 3
we may have rows from Table2 which also do not fit the join.
we need those rows , stay untouched , with the same revenue.



note 4 regard floating point issues:
BigQuery uses 8byte double precision for float,
and I've multiplied all results with great numbers (1000, 1000000) to find out if the issue is decimal point precision.
It seems that multiplying the result gives the same exact result.



working example



for simplicity I've taken out one dimension and the other metrics.



Left table:



datehour geo platform dimension_1 billable_actions
15:00 US Android shmoo 10
15:00 US Android foo 5
15:00 US IOS shmoo 8
15:00 US IOS foo 4
15:00 US UNKNOWN shmoo 4


Right table:



datehour geo platform revenue
15:00 US Android 5$
15:00 US IOS 10$
15:00 UNKNOWN IOS 2$


Joined table:



15:00 US Android shmoo 10 15 (5$/15)*10 = 3.333..3
15:00 US Android foo 5 15 (5$/15)*5 = 1.6666..7
15:00 US IOS shmoo 8 12 (10$/12)*8 = 6.6666..7
15:00 US IOS foo 4 12 (10$/12)*4 = 3.3333..3
15:00 US UNKNOWN shmoo 4 4 0$
15:00 UNKNOWN IOS null 0 0 2$


sum(row_revenue) = 17$



Thanks all !










share|improve this question























  • You should use NUMERIC instead of FLOAT64 to rule out precision loss
    – Elliott Brossard
    Nov 10 at 22:57










  • A Minimal, Complete, and Verifiable example includes DBMS (with version), a clear specification, cut & paste & runnable code (with small representative input) (format table initializations as tables) & diffable (hence ordered) desired output. But absolute basics of debugging say: Show that your program does what you expect as it goes through (sub)expressions by saying what that is & showing that it actually does it via incremental output. On adding problem code that you can't fix, research (the manual & the web). Repeat, minimizing working & wrong code. Then ask re the (small) difference between working & non-working examples.
    – philipxy
    Nov 10 at 23:01











  • @ElliottBrossard . I can't believe I missed it . Thx.
    – Amit Triffon
    Nov 10 at 23:04











  • @ElliottBrossard , Hi ! Unfortunately it did not help , I'm getting a very very close result , i've also changed the full join into inner join for a sprcific period I know I don't have any leftovers (both left and right) but the revenue is still very close but not the same . 1) SOURCE 0.20103616 2) QUERY 0.200784255
    – Amit Triffon
    Nov 11 at 11:47















up vote
0
down vote

favorite












I've the following 2 tables:



Table1 - raw data without revenue data:



datehour | geo | platform | dimension_4 | dimension_5
metric_1 | metric_2 | billable_actions (int)


Table2 - the revenue data, which is originated from another source
but eventually inserted into a table with the following schema
(please note that it has 2 missing dimensions):



datehour | geo | platform | revenue (float)


I need to merge Table2 into Table1 ,
so I can calculate the revenue only from table1.



THE QUERY



SELECT
datehour,
cc,
platform,
dimension_1,
dimension_2,
billable_actions,
CASE
WHEN revenue > 0 AND billable_actions > 0 THEN (revenue/total_billable_actions)*billable_actions
WHEN metric_1 IS NULL AND metric_2 IS NULL THEN revenue
ELSE 0
END calc_revenue
FROM (
SELECT
IFNULL(e.datehour,d.from_ts) datehour,
IFNULL(e.cc, d.cc) cc,
IFNULL(e.platform, d.platform) platform,
e.dimension1,
e.dimension2,
e.billable_actions,
SUM(e.billable_actions) OVER (PARTITION BY e.datehour, e.platform, e.geo) total_billable_actions,
d.revenue
FROM Table1 e
FULL JOIN Table2 d
ON
e.datehour = d.from_ts AND
e.appkey = d.appkey AND
e.cc = d.cc AND
e.platform = d.platform
)


EXPECTED RESULT & THE PROBLEM



Both tables revenues should be equal after the join.
I'm testing:



select sum(revenue) from Table2 ==
select sum(row_revenue) from JoinedTable


but row_revenue is always lower:
Table2 sum = 0.44449199771042913
Joined table sum = 0.4421989977126941



Can some one please point me for the reason why they are not equal?
I've a working example at the end of the question .



NOTES



note 1 - the revenue is calculated the following way:



1) Table1 FULL JOIN Table2 ON:



T1.datehour = T2.datehour AND
T1.geo = T2.geo AND
T1.platform = T2.platform


*please read note 2,3 regard why I use full join.



2) on Table1 , sum the billable actions, partition by datehour, geo and platform.



SUM(billable_actions) OVER (PARTITION BY datehour, geo, platform) 
AS total_billable_actions


3) Now , we have a table , where each row contains the total_billable_actions per the group , the total revenue and the total actions for the row. wrapping it with another select and calculating the row revenue:



row_revenue = (revenue / total_billable_actions) * billable_actions


note 2
we may have rows from Table1 which do not match any row from table2,
but we still need the data found in those rows . those rows will not affect the revenue calculations.



note 3
we may have rows from Table2 which also do not fit the join.
we need those rows , stay untouched , with the same revenue.



note 4 regard floating point issues:
BigQuery uses 8byte double precision for float,
and I've multiplied all results with great numbers (1000, 1000000) to find out if the issue is decimal point precision.
It seems that multiplying the result gives the same exact result.



working example



for simplicity I've taken out one dimension and the other metrics.



Left table:



datehour geo platform dimension_1 billable_actions
15:00 US Android shmoo 10
15:00 US Android foo 5
15:00 US IOS shmoo 8
15:00 US IOS foo 4
15:00 US UNKNOWN shmoo 4


Right table:



datehour geo platform revenue
15:00 US Android 5$
15:00 US IOS 10$
15:00 UNKNOWN IOS 2$


Joined table:



15:00 US Android shmoo 10 15 (5$/15)*10 = 3.333..3
15:00 US Android foo 5 15 (5$/15)*5 = 1.6666..7
15:00 US IOS shmoo 8 12 (10$/12)*8 = 6.6666..7
15:00 US IOS foo 4 12 (10$/12)*4 = 3.3333..3
15:00 US UNKNOWN shmoo 4 4 0$
15:00 UNKNOWN IOS null 0 0 2$


sum(row_revenue) = 17$



Thanks all !










share|improve this question























  • You should use NUMERIC instead of FLOAT64 to rule out precision loss
    – Elliott Brossard
    Nov 10 at 22:57










  • A Minimal, Complete, and Verifiable example includes DBMS (with version), a clear specification, cut & paste & runnable code (with small representative input) (format table initializations as tables) & diffable (hence ordered) desired output. But absolute basics of debugging say: Show that your program does what you expect as it goes through (sub)expressions by saying what that is & showing that it actually does it via incremental output. On adding problem code that you can't fix, research (the manual & the web). Repeat, minimizing working & wrong code. Then ask re the (small) difference between working & non-working examples.
    – philipxy
    Nov 10 at 23:01











  • @ElliottBrossard . I can't believe I missed it . Thx.
    – Amit Triffon
    Nov 10 at 23:04











  • @ElliottBrossard , Hi ! Unfortunately it did not help , I'm getting a very very close result , i've also changed the full join into inner join for a sprcific period I know I don't have any leftovers (both left and right) but the revenue is still very close but not the same . 1) SOURCE 0.20103616 2) QUERY 0.200784255
    – Amit Triffon
    Nov 11 at 11:47













up vote
0
down vote

favorite









up vote
0
down vote

favorite











I've the following 2 tables:



Table1 - raw data without revenue data:



datehour | geo | platform | dimension_4 | dimension_5
metric_1 | metric_2 | billable_actions (int)


Table2 - the revenue data, which is originated from another source
but eventually inserted into a table with the following schema
(please note that it has 2 missing dimensions):



datehour | geo | platform | revenue (float)


I need to merge Table2 into Table1 ,
so I can calculate the revenue only from table1.



THE QUERY



SELECT
datehour,
cc,
platform,
dimension_1,
dimension_2,
billable_actions,
CASE
WHEN revenue > 0 AND billable_actions > 0 THEN (revenue/total_billable_actions)*billable_actions
WHEN metric_1 IS NULL AND metric_2 IS NULL THEN revenue
ELSE 0
END calc_revenue
FROM (
SELECT
IFNULL(e.datehour,d.from_ts) datehour,
IFNULL(e.cc, d.cc) cc,
IFNULL(e.platform, d.platform) platform,
e.dimension1,
e.dimension2,
e.billable_actions,
SUM(e.billable_actions) OVER (PARTITION BY e.datehour, e.platform, e.geo) total_billable_actions,
d.revenue
FROM Table1 e
FULL JOIN Table2 d
ON
e.datehour = d.from_ts AND
e.appkey = d.appkey AND
e.cc = d.cc AND
e.platform = d.platform
)


EXPECTED RESULT & THE PROBLEM



Both tables revenues should be equal after the join.
I'm testing:



select sum(revenue) from Table2 ==
select sum(row_revenue) from JoinedTable


but row_revenue is always lower:
Table2 sum = 0.44449199771042913
Joined table sum = 0.4421989977126941



Can some one please point me for the reason why they are not equal?
I've a working example at the end of the question .



NOTES



note 1 - the revenue is calculated the following way:



1) Table1 FULL JOIN Table2 ON:



T1.datehour = T2.datehour AND
T1.geo = T2.geo AND
T1.platform = T2.platform


*please read note 2,3 regard why I use full join.



2) on Table1 , sum the billable actions, partition by datehour, geo and platform.



SUM(billable_actions) OVER (PARTITION BY datehour, geo, platform) 
AS total_billable_actions


3) Now , we have a table , where each row contains the total_billable_actions per the group , the total revenue and the total actions for the row. wrapping it with another select and calculating the row revenue:



row_revenue = (revenue / total_billable_actions) * billable_actions


note 2
we may have rows from Table1 which do not match any row from table2,
but we still need the data found in those rows . those rows will not affect the revenue calculations.



note 3
we may have rows from Table2 which also do not fit the join.
we need those rows , stay untouched , with the same revenue.



note 4 regard floating point issues:
BigQuery uses 8byte double precision for float,
and I've multiplied all results with great numbers (1000, 1000000) to find out if the issue is decimal point precision.
It seems that multiplying the result gives the same exact result.



working example



for simplicity I've taken out one dimension and the other metrics.



Left table:



datehour geo platform dimension_1 billable_actions
15:00 US Android shmoo 10
15:00 US Android foo 5
15:00 US IOS shmoo 8
15:00 US IOS foo 4
15:00 US UNKNOWN shmoo 4


Right table:



datehour geo platform revenue
15:00 US Android 5$
15:00 US IOS 10$
15:00 UNKNOWN IOS 2$


Joined table:



15:00 US Android shmoo 10 15 (5$/15)*10 = 3.333..3
15:00 US Android foo 5 15 (5$/15)*5 = 1.6666..7
15:00 US IOS shmoo 8 12 (10$/12)*8 = 6.6666..7
15:00 US IOS foo 4 12 (10$/12)*4 = 3.3333..3
15:00 US UNKNOWN shmoo 4 4 0$
15:00 UNKNOWN IOS null 0 0 2$


sum(row_revenue) = 17$



Thanks all !










share|improve this question















I've the following 2 tables:



Table1 - raw data without revenue data:



datehour | geo | platform | dimension_4 | dimension_5
metric_1 | metric_2 | billable_actions (int)


Table2 - the revenue data, which is originated from another source
but eventually inserted into a table with the following schema
(please note that it has 2 missing dimensions):



datehour | geo | platform | revenue (float)


I need to merge Table2 into Table1 ,
so I can calculate the revenue only from table1.



THE QUERY



SELECT
datehour,
cc,
platform,
dimension_1,
dimension_2,
billable_actions,
CASE
WHEN revenue > 0 AND billable_actions > 0 THEN (revenue/total_billable_actions)*billable_actions
WHEN metric_1 IS NULL AND metric_2 IS NULL THEN revenue
ELSE 0
END calc_revenue
FROM (
SELECT
IFNULL(e.datehour,d.from_ts) datehour,
IFNULL(e.cc, d.cc) cc,
IFNULL(e.platform, d.platform) platform,
e.dimension1,
e.dimension2,
e.billable_actions,
SUM(e.billable_actions) OVER (PARTITION BY e.datehour, e.platform, e.geo) total_billable_actions,
d.revenue
FROM Table1 e
FULL JOIN Table2 d
ON
e.datehour = d.from_ts AND
e.appkey = d.appkey AND
e.cc = d.cc AND
e.platform = d.platform
)


EXPECTED RESULT & THE PROBLEM



Both tables revenues should be equal after the join.
I'm testing:



select sum(revenue) from Table2 ==
select sum(row_revenue) from JoinedTable


but row_revenue is always lower:
Table2 sum = 0.44449199771042913
Joined table sum = 0.4421989977126941



Can some one please point me for the reason why they are not equal?
I've a working example at the end of the question .



NOTES



note 1 - the revenue is calculated the following way:



1) Table1 FULL JOIN Table2 ON:



T1.datehour = T2.datehour AND
T1.geo = T2.geo AND
T1.platform = T2.platform


*please read note 2,3 regard why I use full join.



2) on Table1 , sum the billable actions, partition by datehour, geo and platform.



SUM(billable_actions) OVER (PARTITION BY datehour, geo, platform) 
AS total_billable_actions


3) Now , we have a table , where each row contains the total_billable_actions per the group , the total revenue and the total actions for the row. wrapping it with another select and calculating the row revenue:



row_revenue = (revenue / total_billable_actions) * billable_actions


note 2
we may have rows from Table1 which do not match any row from table2,
but we still need the data found in those rows . those rows will not affect the revenue calculations.



note 3
we may have rows from Table2 which also do not fit the join.
we need those rows , stay untouched , with the same revenue.



note 4 regard floating point issues:
BigQuery uses 8byte double precision for float,
and I've multiplied all results with great numbers (1000, 1000000) to find out if the issue is decimal point precision.
It seems that multiplying the result gives the same exact result.



working example



for simplicity I've taken out one dimension and the other metrics.



Left table:



datehour geo platform dimension_1 billable_actions
15:00 US Android shmoo 10
15:00 US Android foo 5
15:00 US IOS shmoo 8
15:00 US IOS foo 4
15:00 US UNKNOWN shmoo 4


Right table:



datehour geo platform revenue
15:00 US Android 5$
15:00 US IOS 10$
15:00 UNKNOWN IOS 2$


Joined table:



15:00 US Android shmoo 10 15 (5$/15)*10 = 3.333..3
15:00 US Android foo 5 15 (5$/15)*5 = 1.6666..7
15:00 US IOS shmoo 8 12 (10$/12)*8 = 6.6666..7
15:00 US IOS foo 4 12 (10$/12)*4 = 3.3333..3
15:00 US UNKNOWN shmoo 4 4 0$
15:00 UNKNOWN IOS null 0 0 2$


sum(row_revenue) = 17$



Thanks all !







sql join google-bigquery






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 10 at 23:01

























asked Nov 10 at 22:37









Amit Triffon

112




112











  • You should use NUMERIC instead of FLOAT64 to rule out precision loss
    – Elliott Brossard
    Nov 10 at 22:57










  • A Minimal, Complete, and Verifiable example includes DBMS (with version), a clear specification, cut & paste & runnable code (with small representative input) (format table initializations as tables) & diffable (hence ordered) desired output. But absolute basics of debugging say: Show that your program does what you expect as it goes through (sub)expressions by saying what that is & showing that it actually does it via incremental output. On adding problem code that you can't fix, research (the manual & the web). Repeat, minimizing working & wrong code. Then ask re the (small) difference between working & non-working examples.
    – philipxy
    Nov 10 at 23:01











  • @ElliottBrossard . I can't believe I missed it . Thx.
    – Amit Triffon
    Nov 10 at 23:04











  • @ElliottBrossard , Hi ! Unfortunately it did not help , I'm getting a very very close result , i've also changed the full join into inner join for a sprcific period I know I don't have any leftovers (both left and right) but the revenue is still very close but not the same . 1) SOURCE 0.20103616 2) QUERY 0.200784255
    – Amit Triffon
    Nov 11 at 11:47

















  • You should use NUMERIC instead of FLOAT64 to rule out precision loss
    – Elliott Brossard
    Nov 10 at 22:57










  • A Minimal, Complete, and Verifiable example includes DBMS (with version), a clear specification, cut & paste & runnable code (with small representative input) (format table initializations as tables) & diffable (hence ordered) desired output. But absolute basics of debugging say: Show that your program does what you expect as it goes through (sub)expressions by saying what that is & showing that it actually does it via incremental output. On adding problem code that you can't fix, research (the manual & the web). Repeat, minimizing working & wrong code. Then ask re the (small) difference between working & non-working examples.
    – philipxy
    Nov 10 at 23:01











  • @ElliottBrossard . I can't believe I missed it . Thx.
    – Amit Triffon
    Nov 10 at 23:04











  • @ElliottBrossard , Hi ! Unfortunately it did not help , I'm getting a very very close result , i've also changed the full join into inner join for a sprcific period I know I don't have any leftovers (both left and right) but the revenue is still very close but not the same . 1) SOURCE 0.20103616 2) QUERY 0.200784255
    – Amit Triffon
    Nov 11 at 11:47
















You should use NUMERIC instead of FLOAT64 to rule out precision loss
– Elliott Brossard
Nov 10 at 22:57




You should use NUMERIC instead of FLOAT64 to rule out precision loss
– Elliott Brossard
Nov 10 at 22:57












A Minimal, Complete, and Verifiable example includes DBMS (with version), a clear specification, cut & paste & runnable code (with small representative input) (format table initializations as tables) & diffable (hence ordered) desired output. But absolute basics of debugging say: Show that your program does what you expect as it goes through (sub)expressions by saying what that is & showing that it actually does it via incremental output. On adding problem code that you can't fix, research (the manual & the web). Repeat, minimizing working & wrong code. Then ask re the (small) difference between working & non-working examples.
– philipxy
Nov 10 at 23:01





A Minimal, Complete, and Verifiable example includes DBMS (with version), a clear specification, cut & paste & runnable code (with small representative input) (format table initializations as tables) & diffable (hence ordered) desired output. But absolute basics of debugging say: Show that your program does what you expect as it goes through (sub)expressions by saying what that is & showing that it actually does it via incremental output. On adding problem code that you can't fix, research (the manual & the web). Repeat, minimizing working & wrong code. Then ask re the (small) difference between working & non-working examples.
– philipxy
Nov 10 at 23:01













@ElliottBrossard . I can't believe I missed it . Thx.
– Amit Triffon
Nov 10 at 23:04





@ElliottBrossard . I can't believe I missed it . Thx.
– Amit Triffon
Nov 10 at 23:04













@ElliottBrossard , Hi ! Unfortunately it did not help , I'm getting a very very close result , i've also changed the full join into inner join for a sprcific period I know I don't have any leftovers (both left and right) but the revenue is still very close but not the same . 1) SOURCE 0.20103616 2) QUERY 0.200784255
– Amit Triffon
Nov 11 at 11:47





@ElliottBrossard , Hi ! Unfortunately it did not help , I'm getting a very very close result , i've also changed the full join into inner join for a sprcific period I know I don't have any leftovers (both left and right) but the revenue is still very close but not the same . 1) SOURCE 0.20103616 2) QUERY 0.200784255
– Amit Triffon
Nov 11 at 11:47


















active

oldest

votes











Your Answer






StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);













 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53244118%2fbigquery-sql-joining-2-table-for-revenue-calculation%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown






























active

oldest

votes













active

oldest

votes









active

oldest

votes






active

oldest

votes















 

draft saved


draft discarded















































 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53244118%2fbigquery-sql-joining-2-table-for-revenue-calculation%23new-answer', 'question_page');

);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Top Tejano songwriter Luis Silva dead of heart attack at 64

政党

天津地下鉄3号線