PyTorch next(iter(training_loader)) extremely slow, simple data, can't use num_workers?
Here x_dat and y_dat are just really long 1-dimensional tensors.
import torch
from torch.utils.data import Dataset, DataLoader, SubsetRandomSampler

class FunctionDataset(Dataset):
    def __init__(self):
        x_dat, y_dat = data_product()
        self.length = len(x_dat)
        self.y_dat = y_dat
        self.x_dat = x_dat

    def __getitem__(self, index):
        sample = self.x_dat[index]
        label = self.y_dat[index]
        return sample, label

    def __len__(self):
        return self.length
...
data_set = FunctionDataset()
...
training_sampler = SubsetRandomSampler(train_indices)
validation_sampler = SubsetRandomSampler(validation_indices)
training_loader = DataLoader(data_set, sampler=training_sampler, batch_size=params['batch_size'], shuffle=False)
validation_loader = DataLoader(data_set, sampler=validation_sampler, batch_size=valid_size, shuffle=False)
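(For reference, data_product() is not shown above; a minimal hypothetical stand-in, assuming it just returns two equal-length 1-D tensors, would be:)

def data_product():
    # Hypothetical stand-in -- the real data_product() is not shown.
    # Assumes it returns two equal-length 1-D tensors.
    x = torch.linspace(-1.0, 1.0, 3_000_000)
    y = torch.sin(x)
    return x, y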
I have also tried pinning the memory for the two loaders. Setting num_workers > 0 gives me runtime errors between the worker processes (EOF and interruption errors). I get my batch with:
x_val, target = next(iter(training_loader))
The entire dataset would fit into memory (or onto the GPU), but I would like to emulate batches for this experiment. Profiling my process gives me the following:
16276989 function calls (16254744 primitive calls) in 38.779 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1745/1 0.028 0.000 38.780 38.780 built-in method builtins.exec
1 0.052 0.052 38.780 38.780 simple aprox.py:3(<module>)
1 0.000 0.000 36.900 36.900 simple aprox.py:519(exploreHeatmap)
1 0.000 0.000 36.900 36.900 simple aprox.py:497(optFromSample)
1 0.033 0.033 36.900 36.900 simple aprox.py:274(train)
705/483 0.001 0.000 34.495 0.071 built-in method builtins.next
222 1.525 0.007 34.493 0.155 dataloader.py:311(__next__)
222 0.851 0.004 12.752 0.057 dataloader.py:314(<listcomp>)
3016001 11.901 0.000 11.901 0.000 simple aprox.py:176(__getitem__)
21 0.010 0.000 10.891 0.519 simple aprox.py:413(validationError)
443 1.380 0.003 9.664 0.022 sampler.py:136(__iter__)
663/221 2.209 0.003 8.652 0.039 dataloader.py:151(default_collate)
221 0.070 0.000 6.441 0.029 dataloader.py:187(<listcomp>)
442 6.369 0.014 6.369 0.014 built-in method stack
3060221 2.799 0.000 5.890 0.000 sampler.py:68(<genexpr>)
3060000 3.091 0.000 3.091 0.000 tensor.py:382(<lambda>)
222 0.001 0.000 1.985 0.009 sampler.py:67(__iter__)
222 1.982 0.009 1.982 0.009 built-in method randperm
663/221 0.002 0.000 1.901 0.009 dataloader.py:192(pin_memory_batch)
221 0.000 0.000 1.899 0.009 dataloader.py:200(<listcomp>)
....
This suggests the data loader is immensely slow compared to the remaining activity of my experiment (training the model, plus lots of other computations). What's going wrong, and what would be the best way to speed this up?
python performance machine-learning iterator pytorch
asked Nov 13 '18 at 12:24 by ZirconCode, edited Nov 14 '18 at 4:47 by Milo Lu
1 Answer
When retrieving a batch with
x, y = next(iter(training_loader))
you actually create a new instance of the dataloader iterator at each call (!). See this thread for more information.
What you should do instead is create the iterator once (per epoch):
training_loader_iter = iter(training_loader)
and then call next for each batch on the iterator:
for i in range(num_batches_in_epoch):
    x, y = next(training_loader_iter)
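Equivalently, looping over the loader directly builds the iterator once per epoch for you. A minimal sketch, assuming num_epochs and a hypothetical train_step function alongside the training_loader from the question:

for epoch in range(num_epochs):
    # iter(training_loader) is created once here, implicitly,
    # and exhausted over the course of the epoch
    for x, y in training_loader:
        train_step(x, y)  # hypothetical training-step function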
I had a similar issue before, and this also made the EOF errors you experience when using multiple workers go away.
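As a side note: since your whole dataset fits in memory (or on the GPU), another option is to bypass the DataLoader entirely and slice random batches from the tensors yourself, replacing the per-element __getitem__/collate path that dominates your profile. A sketch, not your original code, assuming x_dat and y_dat are the 1-D tensors from the question:

perm = torch.randperm(len(x_dat))  # random order over the whole dataset
batch_size = params['batch_size']
for i in range(0, len(perm), batch_size):
    idx = perm[i:i + batch_size]
    # one advanced-indexing call per batch instead of
    # batch_size separate __getitem__ calls plus collation
    x_batch = x_dat[idx]
    y_batch = y_dat[idx]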
answered Nov 14 '18 at 5:58 by Shai