shortnames and parallel file copy
I am dealing with code that:
- enumerates source and destination directories and generates src, dst pairs
- sends each pair to a pool of worker threads
- ... which perform the work, for example "copy src to dst"
(All of this is simplified quite a bit.)
Problem:
When a file is created it also gets a shortname, which can be the same as the name of another file in the source directory (a name collision). This leads to a variety of effects, depending on the order of operations. For example, copying two files, "my file" and "MYFILE~1", can produce two files or one file in the destination (depending on your luck), probably with corrupted content in the latter case.
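To make the order dependence concrete, here is a toy model in Python (not Win32 code). The `ToyDirectory` class, its alias rule (strip spaces, uppercase, first six characters plus `~N`) and the `copy_all` helper are all assumptions made up for illustration; a real filesystem uses its own scheme. The only behaviour taken from the problem above is that a lookup matches either the long name or the short name.

```python
class ToyDirectory:
    """Toy directory: each entry has a long name and an optional short alias.

    Lookups match either name, case-insensitively. The alias scheme below is
    made up for illustration; real filesystems use their own (unknown) rules.
    """

    def __init__(self):
        # Each entry: {"long": str, "short": str or None, "data": bytes}
        self.entries = []

    def find(self, name):
        key = name.upper()
        for entry in self.entries:
            short = entry["short"]
            if entry["long"].upper() == key or (short is not None and short.upper() == key):
                return entry
        return None

    def _make_short(self, long_name):
        # Assumed rule: names without spaces and up to 8 characters need no alias.
        if " " not in long_name and len(long_name) <= 8:
            return None
        stem = long_name.replace(" ", "").upper()[:6]
        n = 1
        while self.find(f"{stem}~{n}") is not None:
            n += 1
        return f"{stem}~{n}"

    def write(self, name, data):
        """Create-or-overwrite, like opening the destination for writing."""
        entry = self.find(name)
        if entry is None:
            entry = {"long": name, "short": self._make_short(name), "data": data}
            self.entries.append(entry)
        else:
            entry["data"] = data  # existing file reached by long OR short name
        return entry


def copy_all(names):
    """Sequentially 'copy' the named files into a fresh toy destination."""
    dst = ToyDirectory()
    for name in names:
        dst.write(name, ("content of " + name).encode())
    return dst


one_file = copy_all(["my file", "MYFILE~1"])   # alias of "my file" captures MYFILE~1
two_files = copy_all(["MYFILE~1", "my file"])  # alias becomes MYFILE~2 instead
print(len(one_file.entries), len(two_files.entries))  # prints "1 2"
```

Even in this sequential model the result depends purely on the order of the two writes. With worker threads the order is not fixed, which is where the "depending on your luck" part comes from.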
Question:
How can I avoid the problems that arise from such collisions? It would be nice to have a function that creates/opens a file while ignoring shortnames...
Notes:
- I can't assume anything about the way a shortname is generated; various systems employ different schemes (see this)
- even if these jobs run sequentially (one by one), they need to be executed in an order that depends on the shortname generation logic (which is unknown). Plus, this implies loading and sorting the entire directory in memory before running any jobs
- both source and destination can be very big (potentially millions of files); if possible, I'd like to avoid loading an entire directory into memory or enumerating it multiple times
- I can't switch off shortname generation on the destination volume, and making that a requirement is not an option (plus, switching it off doesn't remove existing shortnames anyway)
- the application is limited to the Win32 API and the NT API
Edit: it occurred to me that in the general case you can't do it even if everything happens on one thread, simply because, regardless of the order you choose, there will be a shortname generation scheme and a set of filenames that is guaranteed to produce a collision during processing.
If this is correct, how do system utilities copy files? Do they assume something about shortnames, or do they perform a "validate and fix discrepancies" pass after the copy is complete?
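One way to picture the "validate" idea is a per-file check: after opening a destination entry, compare the name that was requested with the long and short names the filesystem reports for the entry that was actually opened. The sketch below is plain Python with hypothetical helper names (`classify_match`, `is_safe_to_write`). It assumes the two names have already been obtained somehow; on Win32 they might come from the `cFileName` and `cAlternateFileName` fields of `WIN32_FIND_DATAW` as filled in by `FindFirstFileW`, but that part is not shown and is an assumption.

```python
def classify_match(requested, long_name, short_name):
    """Say how `requested` matched a directory entry: "long", "short" or "none".

    `long_name` and `short_name` are whatever the filesystem reports for the
    entry that was actually opened; `short_name` may be None or empty.
    casefold() only approximates the filesystem's own case-insensitivity.
    """
    wanted = requested.casefold()
    if wanted == long_name.casefold():
        return "long"
    if short_name and wanted == short_name.casefold():
        return "short"
    return "none"


def is_safe_to_write(requested, long_name, short_name):
    """Only write when the opened entry really carries the requested long name."""
    return classify_match(requested, long_name, short_name) == "long"


# Destination entry "my file" was given the alias MYFILE~1 by the filesystem.
print(is_safe_to_write("my file", "my file", "MYFILE~1"))   # prints "True"
print(is_safe_to_write("MYFILE~1", "my file", "MYFILE~1"))  # prints "False"
```

This only detects that an open landed on another file's short name. It does not say what to do next, and it does not close the window between the check and the write when several workers run in parallel.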
winapi filesystems ntdll
asked Nov 16 '18 at 2:36, edited Nov 16 '18 at 2:51
C.M.
Are you sure you've diagnosed this correctly? I really doubt the filesystem will create two files with the same short name.
– Jonathan Potter
Nov 16 '18 at 3:12
No, it won't. Copying "my file" (which has the shortname MYFILE~2) will create "my file" in the dst directory (likely with the shortname MYFILE~1). If you then copy MYFILE~1 (which doesn't have a shortname), it will overwrite the file created in the previous step and (depending on many things) probably corrupt its data (if the copies happen in parallel).
– C.M.
Nov 16 '18 at 3:17
Why are you using short filenames at all? Copy the files using their full filenames. On some filesystems, you can even turn off the generation of short filenames.
– Remy Lebeau
Nov 16 '18 at 3:24
@Remy I think the asker is copying using the full filenames, but the short filename generated on the destination is different from that at the source, and then interferes with subsequent operations.
– David Heffernan
Nov 16 '18 at 9:25
@C.M. I'd wonder why you are using threads for an operation which is not CPU bound
– David Heffernan
Nov 16 '18 at 9:26
1 Answer
C.M.,
If you want to disable file shortnames, you can take the steps below:
- Find this key in the registry: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem
- Set the value of "NtfsDisable8dot3NameCreation" to 1; 1 disables shortname creation and 0 enables it.
answered Nov 16 '18 at 9:47
Drake Wu - MSFT
Don’t use global state to manage a local problem.
– IInspectable
Nov 16 '18 at 10:18
I did mention that switching off shortname generation is not an option (I have no control over destination volume).
– C.M.
Nov 16 '18 at 19:43