How to identify and potentially remove big binary commits inside an SVN repository?











14















I am working with an SVN repository that is over 3 years old, contains over 6,100 commits and is over 1.5 GB in size. I want to reduce the size of the SVN repository (I'm not talking about the size of a full SVN export - I mean the full repository as it would exist on the server) before moving it to a new server.



The current repository contains the source code for all of our software projects but it also contains relatively large binary files of no significance such as:



  • Full installers for a number of 3rd party tools.

  • .jpg & .png files (which are unmodified exports of PSDs that live in the same folder).

  • Bin and Obj folders (which are then 'svn ignored' the next commit).

  • Resharper directories.

A number of these large files have been 'SVN deleted' since they were added, which makes it even harder to identify the biggest offenders.



I want to either:



  • Create a new SVN repository that contains only the code for all of the software projects - it is really important that the copied files maintain their SVN history from the old repository.

  • Remove the large binary commits and files from the existing repository.

Is either of these possible?





























  • 1





    The day will come when you regret doing this. But Otherside is right about "svnadmin dump" if you go ahead anyway.

    – Ross Patterson
    Feb 2 '10 at 1:54






  • 2





    Why would I regret this (honest question - rather than a challenge!)? I'm just trying to get rid of content inside SVN that can either be stored elsewhere (which I will do) or doesn't need to be stored at all. As far as I can see, the only regret I would have is if svnadmin dump and svndumpfilter corrupted the repository history and it was only identified after many commits had been made. Do you mean that historical corruption is likely?

    – InvertedAcceleration
    Feb 2 '10 at 9:45
























svn fsfs














edited Nov 16 '18 at 13:45 by bahrep

asked Feb 1 '10 at 13:06 by InvertedAcceleration



















7 Answers


















4














You will have to use svnadmin dump to get a dump file of your current repository, and possibly svndumpfilter to process the dump file. You can also modify the dump file manually, as long as you're careful.



It's probably not going to be a quick and easy job, but it can be done. I've done something similar, only to a much smaller repository. I had a repo with about 150 revisions that took about 600MB.



Make a dump from your current repository, make the necessary changes and try to load the modified dumpfile in a new repository. Then check the new repository to make sure everything is still making sense (History is still correct, no weird changes in paths, ...).
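Part of that final check can be scripted. What follows is a minimal sketch rather than the procedure actually used for this answer: the helper name count_revisions is made up for illustration, and it assumes you feed it the output of `svn log -q`. By default svndumpfilter keeps revisions that filtering leaves empty (unless --drop-empty-revs is given), so the old and the new repository would normally report the same number of revisions.

```shell
#!/bin/sh
# count_revisions: read `svn log -q` output on stdin and print how many
# revision header lines (e.g. "r42 | alice | 2010-02-01 ...") it contains.
# Hypothetical helper, named here for illustration only.
count_revisions() {
    grep -c '^r[0-9][0-9]* | '
}

# Usage (the repository paths are placeholders):
#   old=$(svn log -q file:///path/to/old-repo | count_revisions)
#   new=$(svn log -q file:///path/to/new-repo | count_revisions)
#   [ "$old" -eq "$new" ] || echo "revision counts differ: $old vs $new"
```

A matching count only shows that no revisions went missing; it says nothing about the contents of each revision, so spot checks of paths and history are still needed.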






































    8














    Otherside is right about svnadmin dump, etc. Something like this will get you a rough pointer to revisions that added lots of data to your repo, and are candidates for svndumpfilter:



    # Print the size of each revision's diff, in bytes.
    for r in $(svn log -q | grep '^r[0-9]' | cut -d ' ' -f 1 | tr -d r); do
        echo "revision $r is $(svn diff -c "$r" | wc -c) bytes"
    done


    You could also try something like this to find revisions that added files with a particular extension (here, .jpg):



    # Escape the dot so that only names really ending in ".jpg" match.
    svn log -vq | grep -E '^r|\.jpg$' | grep -B 1 '\.jpg$'
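To turn the per-revision sizes into a shortlist, the numbers can be sorted. This is only a sketch under assumptions: rank_by_size is a hypothetical helper name, and it expects plain "REVISION BYTES" lines rather than the sentence printed by the loop. One caveat I am fairly sure of: svn diff normally prints just a short notice for files marked as binary, so the byte counts may understate revisions that added installers or images.

```shell
#!/bin/sh
# rank_by_size [N]: read lines of the form "REVISION BYTES" on stdin and print
# the N largest (default 10), biggest first. Hypothetical helper for
# illustration only.
rank_by_size() {
    sort -k2,2nr -k1,1n | head -n "${1:-10}"
}

# Usage sketch ($url is a placeholder for the repository URL):
#   for r in $(svn log -q "$url" | grep '^r[0-9]' | cut -d ' ' -f 1 | tr -d r); do
#       printf '%s %s\n' "$r" "$(svn diff -c "$r" "$url" | wc -c)"
#   done | rank_by_size 20
```

Ties in size are broken by revision number, so the output order is stable between runs.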





































      1














      If you deleted files from the repository using "SVN Delete", you didn't actually delete them. That is the beauty of SVN: once a file is added to the repository, it is there forever (unless you use dump & load). "Deleting" a file actually creates a new revision that records the deletion, but the file continues to exist in earlier revisions.



      I've done a dump & load myself, on a much bigger repository of around 60,000 (!!!) revisions. It took time, but in the end, after careful loading, the repository was rebuilt.



      The only way is to list the revisions in which the files were added, modified and deleted, then dump the revisions in between and load them in the right order. BE AWARE: there is no room for mistakes. If you make one, you will have to start over and dump & load from the start.



      My suggestion: if the large files are such a problem, consider creating a fresh repository with no history. Keep the old one for history comparison, and start working from the fresh one.



      Good Luck.
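Because deleted files live on in earlier revisions, a list of every path that was ever deleted is a reasonable starting point for the revision list described above. The following is a minimal sketch, not part of the original answer: deleted_paths is a hypothetical helper, and it assumes the usual layout of `svn log -v` output, where each deletion appears as a line of the form "   D /path" under "Changed paths:".

```shell
#!/bin/sh
# deleted_paths: read `svn log -v` output on stdin and print every deleted
# path together with the revision that deleted it, e.g. "r57 /trunk/a.exe".
# Hypothetical helper; a log message that happens to contain a line shaped
# like "   D /something" would be reported too.
deleted_paths() {
    awk '
        /^r[0-9]+ [|] / { rev = $1 }                   # remember the current revision
        /^   D \//      { sub(/^   D /, ""); print rev, $0 }
    '
}

# Usage (the URL is a placeholder):
#   svn log -v file:///path/to/old-repo | deleted_paths
```

Each reported path can then be looked up in the same log output to find the revision that added it.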




































        1














        If you just need to find the offending commits and you have access to the server hosting the repository, look for large files in the db/revs subdirectory of the repository (assuming it uses the FSFS format).
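A rough sketch of that lookup, under stated assumptions: largest_rev_files is a hypothetical helper name, the repository uses FSFS, and its revision files have not been packed (in an unpacked FSFS repository each file under db/revs is named after its revision number, so the path identifies the commit).

```shell
#!/bin/sh
# largest_rev_files REPO [N]: print "BYTES PATH" for the N largest files
# (default 10) under REPO/db/revs, biggest first. Hypothetical helper for
# illustration only; assumes an unpacked FSFS repository.
largest_rev_files() {
    find "$1/db/revs" -type f -exec wc -c {} + |
        grep -v ' total$' |       # drop the summary line(s) that wc appends
        sed 's/^ *//' |           # strip the padding some wc versions add
        sort -k1,1nr |
        head -n "${2:-10}"
}

# Usage (the path is a placeholder):
#   largest_rev_files /srv/svn/old-repo 20
```

The sizes are those of the stored, deltified revision data, so they reflect what each commit costs on disk rather than the size of the files as checked out.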




































          0














          Isn't this just a different problem, with an extra step? I.e. you need to locate files that you consider to be large and binary, and then check if they are indeed managed by SVN or have been built locally (or imported from the parallel asset system, if it's already in place).



          So, just find the files, then do svn info on them to find out if they're part of the repository.





























          • The SVN repository has been alive for over 3 years and during that time a large percentage of the files I'm referring to have been 'SVN deleted'. There is also the problem of large binary files that were in flux during development (like large PSDs) that have since solidified and will no longer change - so there may be 20 MB in deltas across various commits for such a file (which I'm not sure how to find).

            – InvertedAcceleration
            Feb 1 '10 at 13:20












          • I have substantially updated the question based on your answer to make sure I'm communicating the situation correctly. I hope it helps clarify a number of points. Thanks for the initial answer.

            – InvertedAcceleration
            Feb 1 '10 at 13:46


















          0














          Just a small thought, you say that the current state of the repository (the current HEAD) is good, i.e. the large binary files have been svn delete'ed in the past. Therefore your issue is purely the size of the repository?



          I know you said you would like to keep all the commit history, but as an option, you could do two dumps, one for the whole revision history, and one for the current HEAD revision.



          If you put the full dump on to a DVD, for example, you would have the data available if you ever needed it. You could then delete the whole repository, create a fresh one and load the HEAD-revision dump into it with svnadmin load, leaving you with a small, clean repository.



          It is also possible to dump from a specific revision onwards, rather than just the HEAD, so for example you could keep the last 3 months of revisions and dump everything older on to a DVD.
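The two-dump idea could look roughly like this. It is a sketch, not a tested migration script: split_dumps and the two file names are made up for illustration. A non-incremental dump of a single revision contains the whole tree as it stood at that revision, so loading it into an empty repository gives a one-revision repository with no earlier history.

```shell
#!/bin/sh
# split_dumps REPO OUTDIR: write two dump files into OUTDIR,
#   full-history.dump  - every revision, for archiving
#   head-only.dump     - only the youngest revision, as one full-tree snapshot
# and print the youngest revision number. Hypothetical helper; the file names
# are made up. The body runs in a subshell so its variables stay local.
split_dumps() (
    repo=$1
    outdir=$2
    head_rev=$(svnlook youngest "$repo") || exit 1
    svnadmin dump --quiet "$repo" > "$outdir/full-history.dump" || exit 1
    svnadmin dump --quiet -r "$head_rev" "$repo" > "$outdir/head-only.dump" || exit 1
    printf '%s\n' "$head_rev"
)

# Usage (all paths are placeholders):
#   split_dumps /srv/svn/old-repo /backup
#   svnadmin create /srv/svn/new-repo
#   svnadmin load --quiet /srv/svn/new-repo < /backup/head-only.dump
```

To keep recent history instead of only the HEAD, the second dump would take a revision range such as `-r N:HEAD`, where N is the first revision to keep.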




































            0














            Elaborating on Otherside's answer, here's what specifically worked for me:



            svnadmin create new-repo
            svnadmin dump old-repo | svndumpfilter exclude --pattern '*.exe' '*.jpg' '*.png' | svnadmin load new-repo


            You might be able to exclude your Obj and Bin directories by adding them to the svndumpfilter command – I didn't try it.



            Also, Subversion's fsfs-stats program (new in Subversion 1.8, replaced in 1.9 by svnfsfs stats) might be useful for quantifying the file types and specific files that are filling up your repository.



            This might be useful for comparing the repositories afterward:



            colordiff -u <(svn log -v file:///.../old-repo ) <(svn log -v file:///.../new-repo)
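When deciding which patterns to pass to svndumpfilter, a quick tally of the extensions that were ever added may help. This is a rough sketch under assumptions: added_extensions is a hypothetical helper, it reads `svn log -v` output, and it counts added paths rather than bytes, so it complements the FSFS statistics tools instead of replacing them.

```shell
#!/bin/sh
# added_extensions: read `svn log -v` output on stdin and print how many added
# paths ("   A /path") end in each dot-suffix, most frequent first, as
# "COUNT EXTENSION". Suffixes are lower-cased. Hypothetical helper for
# illustration only.
added_extensions() {
    awk '
        /^   A \// {
            sub(/ \(from .*\)$/, "")        # drop copy-source annotations
            n = split($0, parts, "/")
            name = parts[n]                 # last path component
            if (match(name, /\.[^.]+$/))    # name ends in a dot-suffix
                count[tolower(substr(name, RSTART))]++
        }
        END { for (ext in count) print count[ext], ext }
    ' | sort -k1,1nr -k2,2
}

# Usage (the URL is a placeholder):
#   svn log -v file:///path/to/old-repo | added_extensions | head -n 20
```

Directories whose names contain a dot are counted as well, so the list should be read as a hint, not as an exact inventory.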




































Otherside's answer (score 4): answered Feb 1 '10 at 14:38; edited Mar 22 '13 at 8:59 by bahrep































Matt McHenry's answer (score 8): answered Feb 2 '10 at 1:40; edited Feb 2 '10 at 12:22































Oded's answer (score 1, dump & load of a 60,000-revision repository): answered Feb 1 '10 at 14:49































                                          answered Dec 18 '13 at 6:59









sendmoreinfo





















                                              0














Isn't this just a different problem, with an extra step? That is, you need to locate the files you consider large and binary, and then check whether they are managed by SVN or were built locally (or imported from a parallel asset system, if one is already in place).

So: find the files in a working copy, then run svn info on each of them to find out whether it is part of the repository.
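One way this could look in a working copy, as a sketch only. The function names and the size threshold are assumptions; the check relies on svn info exiting with an error for paths that are not under version control. Note that it only sees files that still exist at the checked-out revision.

```shell
# Print files under DIR larger than MIN_BYTES, skipping SVN's own metadata.
# (large_files and check_versioned are hypothetical helper names.)
large_files() {
  dir=$1
  min_bytes=$2
  find "$dir" -type f -size +"${min_bytes}"c ! -path '*/.svn/*'
}

# Label each large file as versioned or unversioned using `svn info`.
check_versioned() {
  large_files "$1" "$2" | while IFS= read -r f; do
    if svn info "$f" >/dev/null 2>&1; then
      echo "versioned:   $f"
    else
      echo "unversioned: $f"
    fi
  done
}

# Example: everything over 5 MB in the current working copy
#   check_versioned . 5242880
```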
















                                              answered Feb 1 '10 at 13:13









unwind












• The SVN repository has been alive for over 3 years, and during that time a large percentage of the files I'm referring to have been 'SVN deleted'. There is also the problem of large binary files that were in flux during development (like large PSDs) but have since solidified and will no longer change, so there may be 20 MB in deltas across various commits for such a file (which I'm not sure how to find).

  – InvertedAcceleration
  Feb 1 '10 at 13:20












                                              • I have substantially updated the question based on your answer to make sure I'm communicating the situation correctly. I hope it helps clarify a number of points. Thanks for the initial answer.

                                                – InvertedAcceleration
                                                Feb 1 '10 at 13:46




























                                              0














Just a small thought: you say that the current state of the repository (the current HEAD) is good, i.e. the large binary files have already been svn deleted. So your issue is purely the size of the repository?

I know you said you would like to keep all the commit history, but as an option, you could do two dumps: one of the whole revision history and one of just the current HEAD revision.

If you put the full dump on a DVD, for example, the data would be available if you ever needed it. You could then delete the whole repository and svnadmin load the HEAD-only dump, leaving you with a small, clean repository.

It is also possible to dump from a specific revision onwards rather than just HEAD, so you could, for example, keep the last 3 months of revisions and archive everything older on a DVD.
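The two-dump idea could be sketched like this. The function name trim_history and all paths are placeholders. It relies on svnadmin dump writing the first revision of a range as a complete tree when --incremental is not given, so the new repository starts from a full snapshot.

```shell
# Archive the full history, then build a small repository from START_REV on.
# Usage: trim_history OLD_REPO NEW_REPO ARCHIVE_DUMP [START_REV]
# (trim_history is a hypothetical helper name; START_REV defaults to HEAD.)
trim_history() {
  old=$1
  new=$2
  archive=$3
  start=${4:-HEAD}
  # 1. Full dump of every revision, to be kept offline (DVD, backup disk).
  svnadmin dump --quiet "$old" > "$archive" || return 1
  # 2. New, empty repository.
  svnadmin create "$new" || return 1
  # 3. Load only the wanted range; its first revision carries the whole tree.
  svnadmin dump --quiet -r "$start:HEAD" "$old" | svnadmin load --quiet "$new"
}

# Example (revision number is made up):
#   trim_history old-repo new-repo full-history.dump 5800
```

Revisions in the new repository are numbered from 1 again, so old revision numbers quoted in bug trackers or commit messages will no longer line up.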
















                                                  answered Feb 1 '10 at 16:22









BParker





















                                                      0














Elaborating on Otherside's answer, here's what specifically worked for me:

svnadmin create new-repo
svnadmin dump old-repo | svndumpfilter exclude --pattern '*.exe' '*.jpg' '*.png' | svnadmin load new-repo

You might be able to exclude your Obj and Bin directories by adding them to the svndumpfilter command; I didn't try it.

Also, Subversion's fsfs-stats program (new in Subversion 1.8, replaced in 1.9 by svnfsfs stats) might be useful for quantifying the file types and specific files that are filling up your repository.

This might be useful for comparing the repositories afterward:

colordiff -u <(svn log -v file:///.../old-repo ) <(svn log -v file:///.../new-repo)
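For the Obj and Bin directories, something along these lines might work. It is untested, like the suggestion above, and the glob patterns are an assumption about how --pattern matches paths: one pattern for the directory node itself and one for everything below it. The function name filter_repo is made up.

```shell
# Rebuild NEW_REPO from OLD_REPO, dropping every path that matches a glob.
# Usage: filter_repo OLD_REPO NEW_REPO PATTERN...
# (filter_repo is a hypothetical helper name.)
filter_repo() {
  old=$1
  new=$2
  shift 2
  svnadmin create "$new" || return 1
  svnadmin dump --quiet "$old" \
    | svndumpfilter exclude --pattern "$@" \
    | svnadmin load --quiet "$new"
}

# Example; the directory patterns are assumptions and case-sensitive:
#   filter_repo old-repo new-repo '*.exe' '*.jpg' '*.png' \
#     '*/Bin' '*/Bin/*' '*/Obj' '*/Obj/*'
```

svndumpfilter also has --drop-empty-revs and --renumber-revs for revisions that end up empty after filtering. Using them changes revision numbers, so the log comparison above would no longer line up one to one.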















                                                          answered Oct 3 '17 at 19:46









Robert Fleming


























