Multiplex between Iterators in TensorFlow










What I need, functionally: My dataset is partitioned into blocks, and each block sits in a binary file.
I have an algorithm that also operates on blocks to reduce computational complexity and then merges the results together after visiting all blocks. It is important that a single minibatch of data originates from a single block, and that I know exactly which block it is, so I can pass some block-specific parameters into the graph. On the next pass, when starting again at block 0, the next minibatch of every block should be used. Blocks can have unequal lengths and should repeat forever.



My current solution: I create a tf.data.Iterator per block (i.e. per file), built from a tf.data.FixedLengthRecordDataset:



# for every file:
ds = tf.data.FixedLengthRecordDataset(...)
ds = ds.repeat()                 # each block repeats forever
ds = ds.batch(...)
ds = ds.map(...)
ds = ds.prefetch(buffer_size=1)

it = ds.make_one_shot_iterator()
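
The string handle of each of these file-level iterators is fetched once up front, presumably along these lines (a minimal sketch assuming the TF 1.x session API; build_block_iterator, block_files and sess are placeholder names, not part of the question's code):

# Hypothetical: one iterator per block file, plus its string handle value.
iterators = [build_block_iterator(f) for f in block_files]    # the pipeline above, per file
handles = [sess.run(it.string_handle()) for it in iterators]  # evaluated once, reused later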


Then I have a "master" iterator that multiplexes between the file-level iterators. This is done through:



itr_handle = tf.placeholder(tf.string, shape=())
master_itr = tf.data.Iterator.from_string_handle(itr_handle, output_types)
master_next = master_itr.get_next()


So each time the graph is executed, I feed the placeholder the string handle of the iterator I want to use for that execution. This way every file-level iterator keeps its own state, so when the same block-file is asked for its next minibatch, it actually returns the next minibatch instead of reopening the file and simply returning the first minibatch again.
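
A driver loop along these lines would match that description (a sketch under the same TF 1.x assumptions; handles is the hypothetical list of string handles collected above):

# Hypothetical driver loop: one sess.run per minibatch, cycling over the blocks forever.
while True:
    for block_idx, handle in enumerate(handles):
        batch = sess.run(master_next, feed_dict={itr_handle: handle})
        # block_idx identifies the block, so block-specific parameters can be applied here.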



The problem: Creating the file-level iterators is slow. It takes at least 200 ms to create an Iterator per file, and the dataset I use can easily contain up to 100 block-files, so TensorFlow/Python sits there building Iterator objects and graph nodes for 20 seconds without actually processing any data.



Question:



  1. Is there another approach to tackle this problem, for example with only one iterator?

  2. If not, how can Iterator creation be sped up?









python tensorflow

asked Nov 12 '18 at 23:21
Martijn Courteaux

  • Looks like you can use the tf.data API, specifically interleave or parallel_interleave. You would create a single dataset, interleaving data according to the ordering you need, and then you batch it and have a single iterator go through it (see the sketch after these comments).
    – kvish
    Nov 13 '18 at 2:02

  • Do you need to explicitly control which block to pick a batch from on every graph run, or is this just something you added as part of your current solution?
    – jdehesa
    Nov 13 '18 at 11:53

  • @jdehesa: I want every block exactly once, then merge the results. Then I can repeat the whole process. While processing block by block, I need to know which block it is, to configure the parameters in the algorithm specific to that block.
    – Martijn Courteaux
    Nov 13 '18 at 12:06
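
A minimal sketch of the single-dataset interleave approach kvish suggests, assuming TF 1.x (the API used in the question); filenames, record_bytes and batch_size are placeholders, and tagging each batch with its block index is one way to still know which block a minibatch came from:

import tensorflow as tf

filenames = ["block_0.bin", "block_1.bin"]   # hypothetical block files
record_bytes = 128                           # hypothetical fixed record size
batch_size = 32                              # hypothetical batch size

def block_dataset(index, filename):
    # One infinitely repeating, batched dataset per block, tagged with its block index.
    ds = tf.data.FixedLengthRecordDataset(filename, record_bytes)
    ds = ds.repeat().batch(batch_size)
    # Decode / reshape the raw records here, as in the original ds.map(...).
    return ds.map(lambda batch: (index, batch))

files = tf.data.Dataset.from_tensor_slices(
    (tf.range(len(filenames), dtype=tf.int64), tf.constant(filenames)))

# With cycle_length equal to the number of files and block_length=1, interleave takes one
# batch from block 0, one from block 1, ..., then the next batch from block 0, and so on
# forever; each block keeps its own read position because the inner datasets never restart.
ds = files.interleave(block_dataset, cycle_length=len(filenames), block_length=1)
ds = ds.prefetch(1)

it = ds.make_one_shot_iterator()
block_index, batch = it.get_next()   # block_index says which block this minibatch is from

This keeps a single iterator (one set of graph nodes instead of ~100), and the round-robin order plus per-block read positions match the hand-rolled multiplexing above; the trade-off is that the block order is fixed inside the pipeline rather than chosen per graph run.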














