Finding the collection length in Firebase
I have over 20k objects in my Firebase Realtime Database. I now need to take out all these objects and do stuff to them. The problem is the server runs out of memory every time I do it. This is my current code:
sendEmail.get('/:types/:message', cors(), async (req, res, next) => {
    console.log(5);
    const types = JSON.parse(req.params.types);
    console.log('types', types);
    let recipients = [];
    let mails = [];
    if (types.includes('students')) {
        console.log(1);
        const tmpUsers = await admin.database().ref('Users').orderByChild('student').equalTo(true).once('value').then(r => r.val()).catch(e => console.log(e));
        recipients = recipients.concat(tmpUsers);
    }
    if (types.includes('solvers')) {
        console.log(2);
        let tmpUsers = await admin.database().ref('Users').orderByChild('userType').equalTo('person').once('value').then(r => r.val()).catch(e => console.log(e));
        tmpUsers = tmpUsers.concat(arrayFromObject(await admin.database().ref('Users').orderByChild('userType').equalTo('company').once('value').then(r => r.val()).catch(e => console.log(e))));
        recipients = recipients.concat(tmpUsers);
    }
});
So I have two options: streaming, or limiting the response with startAt and endAt. But to limit the responses I need to know exactly how many objects I have, and to find that out I would need to download the whole collection... You see my problem now. How can I find out how many objects I have without downloading the whole collection?
javascript node.js firebase firebase-realtime-database
The trick is to use limitToFirst/limitToLast combined with startAt/endAt. For example, you can perform the first query with limitToFirst(100), then obtain the last key from this returned list and use that with startAt(key) and another limitToFirst(100). There is a simple example in the documentation discussing range queries that hints at this. I'll see if I can draw up a node.js example.
– Grimthorr
Nov 14 '18 at 11:11
I'd appreciate an example. I think I see what you're going for, but I'm not sure I fully understand
– Alex Ironside
Nov 14 '18 at 11:24
I'll try to whip one up. Thinking about this a little more though: if it's the Cloud Function that's timing out when dealing with large datasets, you could increase the function's timeout and memory allocation instead.
– Grimthorr
Nov 14 '18 at 11:30
I already tried that. 2GB are not enough. I was shocked myself
– Alex Ironside
Nov 14 '18 at 11:31
I'm actually not sure if paginating will help now that I've written an example. I'll post my answer anyway, but you might have to split the data processing across multiple function invocations instead.
– Grimthorr
Nov 14 '18 at 13:38
edited Nov 14 '18 at 14:27 by Frank van Puffelen
asked Nov 14 '18 at 10:37 by Alex Ironside
2 Answers
You could try paginating your query by combining limitToFirst/limitToLast and startAt/endAt. For example, you could perform the first query with limitToFirst(1000), then obtain the last key from this returned list and use that with startAt(key) and another limitToFirst(1000), repeating until you reach the end of the collection.
In node.js, it might look something like this (untested code):
let recipients = [];
let tmpUsers = await next();
recipients = filter(recipients, tmpUsers);
// startAt is inclusive, so when this reaches the last result there will only be 1
while (tmpUsers.length > 1) {
    let lastKey = tmpUsers.slice(-1).pop().key;
    tmpUsers = await next(lastKey);
    if (tmpUsers.length > 1) { // Avoid duplicating the last result
        recipients = filter(recipients, tmpUsers);
    }
}

async function next(startAt) {
    let query = admin.database().ref('Users').orderByKey();
    if (startAt) {
        query = query.startAt(startAt);
    }
    return await query
        .limitToFirst(1000)
        .once('value')
        // Collect the child snapshots into an array so each entry keeps its key
        .then(snap => {
            const users = [];
            snap.forEach(child => { users.push({ key: child.key, ...child.val() }); });
            return users;
        })
        .catch(e => console.log(e));
}

function filter(array1, array2) {
    // TODO: Filter the results here as we can't combine orderByChild/orderByKey
    return array1.concat(array2);
}
The problem with this is that you won't be able to use database-side filtering, so you'd need to filter the results manually, which might make things worse, depending on how many items you need to keep in the recipients variable at a time.
Another option would be to process them in batches (of 1000, for example), pop them from the recipients array to free up resources, and then move on to the next batch. It depends entirely on what actions you need to perform on the objects, and you'll need to weigh up whether it's actually necessary to process (and keep in memory) the entire result set in one go.
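As a sketch of that batching approach, decoupled from Firebase itself: the loop below pages through a key-ordered dataset with an inclusive start cursor (mirroring startAt), processes each batch, and discards it before fetching the next, so memory stays bounded. The fetchPage function is a stand-in for the .orderByKey().startAt(...).limitToFirst(...) query; here it is faked with an in-memory object purely so the control flow can be seen on its own.

```javascript
// Pages through a dataset using an inclusive start cursor, handling each
// batch and dropping it before the next fetch so memory use stays bounded.
async function processInBatches(fetchPage, handleBatch, batchSize) {
  let cursor = undefined;
  while (true) {
    const page = await fetchPage(cursor, batchSize);
    const keys = Object.keys(page);
    // An inclusive startAt returns the cursor entry again on every page
    // after the first, so filter it out before handing the batch over.
    const fresh = cursor === undefined ? keys : keys.filter(k => k !== cursor);
    if (fresh.length === 0) break; // only the cursor came back: end of data
    await handleBatch(fresh.map(k => page[k]));
    cursor = keys[keys.length - 1];
  }
}

// In-memory stand-in for the Firebase query, for illustration only.
const data = {};
for (let i = 0; i < 25; i++) data['user' + String(i).padStart(2, '0')] = { id: i };

async function fakeFetchPage(startAt, limit) {
  const keys = Object.keys(data).sort();
  const from = startAt === undefined ? 0 : keys.indexOf(startAt);
  const slice = keys.slice(from, from + limit);
  const page = {};
  for (const k of slice) page[k] = data[k];
  return page;
}
```

The same loop would work against the real database by replacing fakeFetchPage with a function that runs the key-ordered Firebase query shown above.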
One question. How will the code know when to stop?
– Alex Ironside
Nov 14 '18 at 13:57
It will keep going until the database returns 1 result or fewer (while (tmpUsers.length > 1)), which means it's reached the end of the list. This is because startAt is inclusive, so getting back a single result means there is nothing new. At least, that's the idea, if my code works!
– Grimthorr
Nov 14 '18 at 14:05
Ooook. That makes sense. I'll take a look at it
– Alex Ironside
Nov 14 '18 at 14:12
Great answer @Grimthorr!
– Frank van Puffelen
Nov 14 '18 at 14:27
answered Nov 14 '18 at 13:54 by Grimthorr
You don't need to know the size of the collection to process it in batches.
You can do it by ordering the objects by key, limiting to 1000 or so, and then starting the next batch at the last key of the previous batch.
If you still want to know the size of the collection, the only good way is to maintain the count in a separate node and keep it updated whenever the collection changes.
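As a sketch of that counter idea (the /Users_count node name and the trigger wiring below are assumptions for illustration, not part of the question's schema): the update function passed to the RTDB transaction() receives the node's current value, which is null when the node doesn't exist yet, so that case is worth handling explicitly.

```javascript
// Pure update functions for a counter node. transaction() calls these with
// the current value of the node, which is null if it doesn't exist yet.
const increment = current => (current || 0) + 1;
const decrement = current => Math.max((current || 0) - 1, 0);

// With firebase-admin and Cloud Functions, this could be wired up roughly as
// follows (hypothetical sketch, not run here):
//
//   exports.onUserCreated = functions.database.ref('/Users/{uid}')
//     .onCreate(() => admin.database().ref('/Users_count').transaction(increment));
//   exports.onUserDeleted = functions.database.ref('/Users/{uid}')
//     .onDelete(() => admin.database().ref('/Users_count').transaction(decrement));
//
// Reading /Users_count is then one tiny read, instead of downloading the
// whole Users collection just to count it.
```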
answered Nov 14 '18 at 14:00 by emil