How to identify and potentially remove big binary commits inside an SVN repository?
I am working with an SVN repository that is over 3 years old, contains over 6,100 commits and is over 1.5 GB in size. I want to reduce the size of the SVN repository (I'm not talking about the size of a full SVN export - I mean the full repository as it would exist on the server) before moving it to a new server.
The current repository contains the source code for all of our software projects but it also contains relatively large binary files of no significance such as:
- Full installers for a number of 3rd party tools.
- .jpg & .png files (which are unmodified exports of PSDs that live in the same folder).
- Bin and Obj folders (which are then 'svn ignored' the next commit).
- Resharper directories.
A number of these large files have been 'SVN deleted' since they were added, which creates the further problem of identifying the biggest offenders.
I want to either:
- Create a new SVN repository that contains only the code for all of the software projects - it is really important that the copied files maintain their SVN history from the old repository.
- Remove the large binary commits and files from the existing repository.
Is either of these possible?
svn fsfs
edited Nov 16 '18 at 13:45 by bahrep
asked Feb 1 '10 at 13:06 by InvertedAcceleration
1
The day will come when you regret doing this. But Otherside is right about "svnadmin dump" if you go ahead anyway.
– Ross Patterson
Feb 2 '10 at 1:54
2
Why would I regret this (honest question, rather than a challenge!)? I'm just trying to get rid of content inside SVN that can either be stored elsewhere (which I will do) or doesn't need to be stored at all. As far as I can see now, the only regret I would have is if svnadmin dump and svndumpfilter corrupt the repository history and it is only identified after many, many commits have been made. Do you mean that historical corruption is likely?
– InvertedAcceleration
Feb 2 '10 at 9:45
7 Answers
Just a small thought: you say that the current state of the repository (the current HEAD) is good, i.e. the large binary files have been svn delete'd in the past. So your issue is purely the size of the repository?
I know you said you would like to keep all the commit history, but as an option you could do two dumps: one of the whole revision history, and one of the current HEAD revision only.
If you put the full dump onto a DVD, for example, you would have the data available if you ever needed it, but you could then delete the whole repository and svnadmin load the HEAD dump, leaving you with a small, clean repository.
It is also possible to dump from a specific revision onwards, rather than just the HEAD, so for example you could keep the last 3 months of revisions and dump everything older onto a DVD...
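A rough sketch of the two-dump idea above, assuming shell access to the server and svnadmin on the PATH. The function name split_history and its arguments are made up for illustration and are not part of Subversion. The reloaded repository starts again at revision 1, so existing working copies would most likely need a fresh checkout.

```shell
# Hypothetical helper: archive the full history, then build a trimmed repository.
# Usage: split_history OLD_REPO NEW_REPO ARCHIVE_FILE [FROM_REV]
split_history() {
    old_repo=$1            # path to the existing repository on disk
    new_repo=$2            # path where the trimmed repository is created
    archive=$3             # file that receives the full-history dump
    from_rev=${4:-HEAD}    # first revision to keep; defaults to HEAD only

    # 1. Full dump of every revision, to be stored offline (DVD, backup server, ...).
    svnadmin dump -q "$old_repo" > "$archive" || return 1

    # 2. New repository holding only FROM_REV..HEAD. Without --incremental, the
    #    first dumped revision is written as a complete snapshot of the tree.
    svnadmin create "$new_repo" || return 1
    svnadmin dump -q -r "$from_rev:HEAD" "$old_repo" |
        svnadmin load -q "$new_repo"
}
```

For example, split_history /srv/svn/old /srv/svn/new /backup/full.dump 5800 would keep revisions 5800 onwards (the paths and the revision number are placeholders). Anything deleted before the first kept revision is not in the new repository at all.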
Elaborating on Otherside's answer, here's what specifically worked for me:
svnadmin create new-repo
svnadmin dump old-repo | svndumpfilter exclude --pattern '*.exe' '*.jpg' '*.png' | svnadmin load new-repo
You might be able to exclude your Obj and Bin directories by adding them to the svndumpfilter command – I didn't try it.
Also, Subversion's fsfs-stats program (new in Subversion 1.8, replaced in 1.9 by svnfsfs stats) might be useful for quantifying the file types and specific files that are filling up your repository.
This might be useful for comparing the repositories afterward:
colordiff -u <(svn log -v file:///.../old-repo ) <(svn log -v file:///.../new-repo)
You will have to use svnadmin dump to get a dump file of your current repository, and possibly svndumpfilter to process the dump file. You can also modify the dump file manually, as long as you're careful.
It's probably not going to be a quick and easy job, but it can be done. I've done something similar, only on a much smaller repository: I had a repo with about 150 revisions that took up about 600 MB.
Make a dump of your current repository, make the necessary changes, and try to load the modified dump file into a new repository. Then check the new repository to make sure everything still makes sense (history is still correct, no weird changes in paths, ...).
edited Mar 22 '13 at 8:59 by bahrep
answered Feb 1 '10 at 14:38 by Otherside
Otherside is right about svnadmin dump, etc. Something like this will give you a rough pointer to the revisions that added lots of data to your repo and are therefore candidates for svndumpfilter:
for r in $(svn log -q | grep '^r' | cut -d ' ' -f 1 | tr -d r); do
    # Size of the revision's diff output. Files marked as binary show up only
    # as a short placeholder in svn diff, so their size is not reflected here.
    echo "revision $r is $(svn diff -c "$r" | wc -c) bytes"
done
You could also try something like this to find revisions that added files with a particular extension (here, .jpg):
svn log -vq | grep -E '^r|\.jpg$' | grep -B 1 '\.jpg$'
edited Feb 2 '10 at 12:22
answered Feb 2 '10 at 1:40 by Matt McHenry
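As a variation on the extension search in the answer above, here is a minimal sketch that pairs each added path with its revision. It reads svn log -vq output from standard input. The function name added_by_ext is hypothetical, and the parsing assumes the usual "   A /path" layout of the changed-paths list.

```shell
# Hypothetical helper: print "rREV PATH" for every added path ending in EXT.
# Usage: svn log -vq REPO_URL | added_by_ext .jpg
added_by_ext() {
    ext=$1
    awk -v ext="$ext" '
        # Revision header lines look like: r123 | user | date
        /^r[0-9]+ \|/ { rev = $1; next }
        # Changed-path lines look like: "   A /trunk/file (from /other:12)"
        $1 == "A" {
            path = $0
            sub(/^ +A /, "", path)             # drop the action column
            sub(/ \(from .*\)$/, "", path)     # drop copy-source information
            n = length(path); m = length(ext)
            if (n >= m && substr(path, n - m + 1) == ext) print rev, path
        }
    '
}
```

Because only lines with action A are reported, later modifications and deletions of the same files are ignored. This makes it easier to see where a file first entered the history, which also works for files that have since been 'SVN deleted'.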
If you deleted files from the repository using "SVN Delete", you didn't actually delete the files. This is the beauty of SVN: once a file is added to the repository, it is there forever (unless you use dump & load). When you "delete" the files, you actually create a new revision that marks the deletion, but the files continue to exist in previous revisions.
I've done some dump & load, but on a much, much bigger repository: around 60,000 (!!!) revisions. It took time, but in the end, after careful loading, the repository was rebuilt.
Your only way is to list the revisions in which the files were added, modified and deleted, then dump the revisions in between and load them in the right order. BE AWARE: there is no room for mistakes. If you make one, you will have to start the dump & load over from the beginning.
My suggestion: if the large files are such a problem, consider creating a fresh repository with no history. Keep the old one for history comparison, and start working from the fresh one.
Good luck.
answered Feb 1 '10 at 14:49 by Oded
If you just need to find the offending commits and you have access to the server hosting the repository, look for large files in the db/revs subdirectory of the repository (assuming it uses the FSFS format).
answered Dec 18 '13 at 6:59 by sendmoreinfo
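The db/revs suggestion above could be sketched roughly as follows, assuming an FSFS repository and direct file access on the server. The function name largest_revs is made up for illustration. In an unpacked FSFS repository each file under db/revs is named after its revision number, so the largest files point at the largest commits.

```shell
# Hypothetical helper: list the N largest files under REPO/db/revs.
# Usage: largest_revs /path/to/repo [N]
# Note: paths containing spaces are not handled by this simple parsing.
largest_revs() {
    repo=$1
    count=${2:-10}
    # wc -c prints "SIZE PATH" per file, plus a "total" line for each batch.
    find "$repo/db/revs" -type f -exec wc -c {} + |
        awk '$2 != "total" { print $1, $2 }' |
        sort -rn |
        head -n "$count"
}
```

If the repository has been packed, several revisions are combined into one pack file, so a large entry only narrows things down to a range of revisions rather than a single commit.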
Isn't this just a different problem, with an extra step? I.e. you need to locate files that you consider to be large and binary, and then check whether they are indeed managed by SVN or have been built locally (or imported from the parallel asset system, if it's already in place).
So, just find the files, then do svn info on them to find out if they're part of the repository.
answered Feb 1 '10 at 13:13 by unwind
The SVN repository has been alive for over 3 years, and during that time a large percentage of the files I'm referring to have been 'SVN deleted'. There is also the problem of large binary files that were in flux during development (like large PSDs) but have since solidified and will no longer change, so there may be 20 MB in deltas across various commits for such a file (which I'm not sure how to find).
– InvertedAcceleration
Feb 1 '10 at 13:20
I have substantially updated the question based on your answer to make sure I'm communicating the situation correctly. I hope it helps clarify a number of points. Thanks for the initial answer.
– InvertedAcceleration
Feb 1 '10 at 13:46
add a comment |
The SVN repository has been alive for over 3 years and during that time a large percentage of the files I'm referring to have been 'SVN deleted'. There is also the problem of large binary files that were in flux during development (like large PSDs) that have since then solidified and will no longer change - so there may be 20 MB in deltas across varies commits for such a file (which I'm not sure how to find).
– InvertedAcceleration
Feb 1 '10 at 13:20
I have substantially updated the question based on your answer to make sure I'm communicating the situation correctly. I hope it helps clarify a number of points. Thanks for the initial answer.
– InvertedAcceleration
Feb 1 '10 at 13:46
The SVN repository has been alive for over 3 years and during that time a large percentage of the files I'm referring to have been 'SVN deleted'. There is also the problem of large binary files that were in flux during development (like large PSDs) that have since then solidified and will no longer change - so there may be 20 MB in deltas across varies commits for such a file (which I'm not sure how to find).
– InvertedAcceleration
Feb 1 '10 at 13:20
The SVN repository has been alive for over 3 years and during that time a large percentage of the files I'm referring to have been 'SVN deleted'. There is also the problem of large binary files that were in flux during development (like large PSDs) that have since then solidified and will no longer change - so there may be 20 MB in deltas across varies commits for such a file (which I'm not sure how to find).
– InvertedAcceleration
Feb 1 '10 at 13:20
I have substantially updated the question based on your answer to make sure I'm communicating the situation correctly. I hope it helps clarify a number of points. Thanks for the initial answer.
– InvertedAcceleration
Feb 1 '10 at 13:46
I have substantially updated the question based on your answer to make sure I'm communicating the situation correctly. I hope it helps clarify a number of points. Thanks for the initial answer.
– InvertedAcceleration
Feb 1 '10 at 13:46
add a comment |
Just a small thought, you say that the current state of the repository (the current HEAD) is good, i.e. the large binary files have been svn delete'ed in the past. Therefore your issue is purely the size of the repository?
I know you said you would like to keep all the commit history, but as an option, you could do two dumps, one for the whole revision history, and one for the current HEAD revision.
If you put the full dump on to a DVD for example you would have the data available if you ever needed it, but you could then delete the whole repository and svn load the revision dump, leaving you with a small clean repository.
it is also possible to dump from a specific revision onwards, rather than just the head, so for example you could keep the last 3 months of revisions and dump everything older on to a DVD....
answered Feb 1 '10 at 16:22
BParker
Elaborating on Otherside's answer, here's what specifically worked for me:
svnadmin create new-repo
svnadmin dump old-repo | svndumpfilter exclude --pattern '*.exe' '*.jpg' '*.png' | svnadmin load new-repo
You might be able to exclude your Obj and Bin directories by adding them to the svndumpfilter command; I didn't try it.
Also, Subversion's fsfs-stats program (new in Subversion 1.8, replaced in 1.9 by svnfsfs stats) might be useful for quantifying the file types and specific files that are filling up your repository.
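If neither stats tool is available, a rough alternative (a different technique, not what fsfs-stats does internally) is to scan the dump stream itself: every file change carries a `Text-content-length` header after its `Node-path` header. The helper below is a sketch under that assumption. `largest_nodes` is a made-up name, and a binary payload that happens to contain a line starting with one of these header names would produce a false hit.

```shell
# Sketch: list the largest file changes in a dump stream read from stdin.
# Output columns: size in bytes, revision number, node path (tab-separated).
# largest_nodes is a hypothetical helper name, not part of Subversion.
largest_nodes() {
  # grep -a treats the binary dump as text and keeps only the header lines we need.
  LC_ALL=C grep -a -E '^(Revision-number|Node-path|Text-content-length): ' |
    awk '
      /^Revision-number: /     { rev = $2 }
      /^Node-path: /           { path = substr($0, 12) }   # strip "Node-path: "
      /^Text-content-length: / { print $2 "\t" rev "\t" path }
    ' |
    sort -rn |
    head -n "${1:-20}"   # top N entries, 20 by default
}
```

Typical use would be `svnadmin dump -q old-repo | largest_nodes 30`. Because a plain dump (without `--deltas`) writes the full text of a file each time it changes, a large PSD that was committed ten times shows up ten times here, including files that have since been deleted. The sizes are therefore an upper bound on what the repository stores for those changes, not the exact on-disk delta size.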
This might be useful for comparing the repositories afterward:
colordiff -u <(svn log -v file:///.../old-repo ) <(svn log -v file:///.../new-repo)
answered Oct 3 '17 at 19:46
Robert Fleming
1
The day will come when you regret doing this. But Otherside is right about "svnadmin dump" if you go ahead anyway.
– Ross Patterson
Feb 2 '10 at 1:54
2
Why would I regret this (honest question, rather than a challenge!)? I'm just trying to get rid of the content inside SVN that can either be stored elsewhere (which I will do) or doesn't need to be stored at all. As far as I can see, the only regret I would have is if svnadmin dump and svndumpfilter corrupt the repository history and it's only identified after many, many commits have been made. Do you mean that historical corruption is likely?
– InvertedAcceleration
Feb 2 '10 at 9:45