shortnames and parallel file copy










I am dealing with code that:

  • enumerates source and destination directories and generates (src, dst) pairs

  • sends each pair to a pool of worker threads

  • has each worker perform the job, for example "copy src to dst"

(All of this is simplified quite a bit.)
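The pipeline described above can be sketched roughly as follows. This is a minimal illustration in Python, not the Win32/NT code the question is about. The names iter_pairs and copy_all are hypothetical, and subdirectories are skipped for brevity.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def iter_pairs(src_dir, dst_dir):
    """Lazily yield (src, dst) path pairs for the regular files in src_dir."""
    with os.scandir(src_dir) as entries:
        for entry in entries:
            if entry.is_file():
                yield entry.path, os.path.join(dst_dir, entry.name)


def copy_all(src_dir, dst_dir, workers=4):
    """Copy every file with a pool of worker threads; return the number of jobs run."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One future per file is kept here for simplicity; a real tool
        # handling millions of files would bound this queue.
        futures = [
            pool.submit(shutil.copyfile, src, dst)
            for src, dst in iter_pairs(src_dir, dst_dir)
        ]
        for future in futures:
            future.result()  # re-raises any exception from the worker
    return len(futures)
```

Because the jobs run concurrently, nothing in this sketch controls the order in which destination files are created.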



Problem:



When a file is created, it also gets a shortname, which can be the same as the name of another file in the source directory (a name collision). This leads to a variety of effects, depending on the order of operations. For example, copying two files, "my file" and "MYFILE~1", can produce two files or one file in the destination (depending on your luck), and in the latter case the content is probably corrupted.
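To make the order dependence concrete, here is a toy model in Python. It is only a sketch: the Directory class and its alias scheme (first six characters plus ~N) are hypothetical stand-ins, since the real shortname generation logic is unknown. The only property it relies on is that a name lookup matches either the long name or the shortname of an existing entry.

```python
class Directory:
    """Toy directory: each entry has a long name and an optional short alias."""

    def __init__(self):
        self.entries = []  # dicts with keys "long", "short" (may be None), "data"

    def _find(self, name):
        key = name.upper()
        for entry in self.entries:
            short = entry["short"]
            if entry["long"].upper() == key or (short and short.upper() == key):
                return entry
        return None

    def _alias_for(self, long_name):
        # Hypothetical scheme: names without spaces and at most 8 characters
        # need no alias; otherwise use the first free STEM~N.
        if " " not in long_name and len(long_name) <= 8:
            return None
        stem = long_name.replace(" ", "").upper()[:6]
        n = 1
        while self._find(f"{stem}~{n}"):
            n += 1
        return f"{stem}~{n}"

    def write(self, name, data):
        """Create the file, or overwrite whatever existing entry the name resolves to."""
        entry = self._find(name)
        if entry is None:
            self.entries.append(
                {"long": name, "short": self._alias_for(name), "data": data})
        else:
            entry["data"] = data


def copy_in_order(files, order):
    """Copy files into an empty toy directory in the given order."""
    dst = Directory()
    for name in order:
        dst.write(name, files[name])
    return [(e["long"], e["data"]) for e in dst.entries]


files = {"my file": "A", "MYFILE~1": "B"}
print(copy_in_order(files, ["my file", "MYFILE~1"]))  # one entry, holding B's data
print(copy_in_order(files, ["MYFILE~1", "my file"]))  # two entries, both intact
```

In the first order, "my file" is created with the alias MYFILE~1, so the later write to MYFILE~1 lands on it. In the second order, the alias generator skips the name that is already taken.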



Question:



How can I avoid the problems that arise from such collisions? It would be nice to have a function that creates/opens a file while ignoring shortnames...



Notes:



  • I can't assume anything about the way a shortname is generated. Various systems employ different schemes (see this).

  • Even if these jobs run sequentially (one by one), they need to be executed in an order that depends on the shortname generation logic (which is unknown). This also implies loading and sorting the entire directory in memory before running any jobs.

  • Both source and destination can be very big (potentially millions of files). If possible, I'd like to avoid loading an entire directory into memory or enumerating it multiple times.

  • I can't switch off shortname generation on the destination volume, and making that a requirement is not an option (besides, switching it off doesn't remove existing shortnames anyway).

  • The application is limited to the Win32 API and the NT API.


Edit: it occurred to me that in the general case you can't do it even if everything happens on one thread, simply because, regardless of the order you choose, there will be a shortname generation scheme and a set of filenames that is guaranteed to produce a collision during processing.

If this is correct, how do system utilities copy files? Do they assume something about shortnames, or do they perform a "validate and fix discrepancies" pass after the copy is complete?










  • Are you sure you've diagnosed this correctly? I really doubt the filesystem will create two files with the same short name.

    – Jonathan Potter
    Nov 16 '18 at 3:12






    No, it won't. Copying "my file" (which has the shortname MYFILE~2) will create "my file" in the dst directory (likely with the shortname MYFILE~1). If you then copy MYFILE~1 (which doesn't have a shortname), it will overwrite the file created in the previous step and (depending on many things) probably corrupt its data (if the copies happen in parallel).

    – C.M.
    Nov 16 '18 at 3:17












  • Why are you using short filenames at all? Copy the files using their full filenames. On some filesystems, you can even turn off the generation of short filenames.

    – Remy Lebeau
    Nov 16 '18 at 3:24












  • @Remy I think the asker is copying using the full filenames, but the short filename generated on the destination is different from that at the source, and then interferes with subsequent operations.

    – David Heffernan
    Nov 16 '18 at 9:25






    @C.M. I'd wonder why you are using threads for an operation which is not CPU bound

    – David Heffernan
    Nov 16 '18 at 9:26















winapi filesystems ntdll






edited Nov 16 '18 at 2:51







C.M.

















asked Nov 16 '18 at 2:36









C.M.












1 Answer
C.M.,
If you want to disable file shortnames, you can take the steps below:

  1. Find this key in the registry: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem

  2. Set the value of "NtfsDisable8dot3NameCreation" to 1. A value of 1 disables shortname creation and 0 enables it.
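As a hedged illustration of the steps above, the current value can be inspected without changing anything. The sketch below is in Python using the standard winreg module; the function name read_8dot3_setting is hypothetical. It only reads the value, and it returns None on non-Windows systems or when the value is missing or inaccessible.

```python
import sys

KEY_PATH = r"SYSTEM\CurrentControlSet\Control\FileSystem"
VALUE_NAME = "NtfsDisable8dot3NameCreation"


def read_8dot3_setting():
    """Return the registry value as an int, or None if it cannot be read."""
    if sys.platform != "win32":
        return None  # the registry only exists on Windows
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return int(value)
    except OSError:
        return None  # key or value missing, or access denied


if __name__ == "__main__":
    print(read_8dot3_setting())
```

Note that this is a machine-wide setting, and changing it does not remove shortnames that already exist.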





    Don’t use global state to manage a local problem.

    – IInspectable
    Nov 16 '18 at 10:18











  • I did mention that switching off shortname generation is not an option (I have no control over destination volume).

    – C.M.
    Nov 16 '18 at 19:43










answered Nov 16 '18 at 9:47









Drake Wu - MSFT






