Pruning Large Files From Git with BFG
These are my notes on removing large files from a git repository with the BFG Repo-Cleaner. They assume that you have already pushed the offending commits to a remote repository. If you've committed the changes but haven't pushed them, see this post.
1 Background
I accidentally committed an 89 MB file to git and pushed it upstream to GitHub. This is within the allowed file-size limit, but when I looked in the file it was filled with the same error message over and over again, so it wasn't useful to keep. I was only working with small log files, so any large file indicated an error, and I decided to clean anything over 50 MB from the repository using BFG. The instructions on the home page mostly work, but not exactly, so I'm making some notes here for next time.
2 The Process
Assuming you've downloaded the BFG jar file, this is what you need to do. First, clone the repository from GitHub using git clone with the --mirror flag.
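A minimal sketch of the clone step, wrapped in a small function so the remote is a parameter. The function name, the example URL and the target directory are placeholders I made up, not values from my repository.

```shell
# Clone a bare mirror of the remote repository.
# Both arguments are placeholders; substitute your own remote and directory.
mirror_clone() {
    remote_url="$1"   # e.g. git@github.com:your-user/your-repo.git (hypothetical)
    target_dir="$2"   # e.g. your-repo.git
    git clone --mirror "$remote_url" "$target_dir"
}

# Usage (hypothetical URL):
#   mirror_clone git@github.com:your-user/your-repo.git your-repo.git
```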
The --mirror flag creates a bare repository, so it will look a little odd (the top level has the contents of what is normally in the .git folder). When I first tried this I thought I could clone my local copy, but when I ran BFG on the clone it said that it couldn't find any large files. As noted in this bug report, the large files would be in the packfile if you clone from a remote repository, but not necessarily in the local repository, so I had to clone it from GitHub.
Next, run BFG against the mirror clone to strip every blob larger than 50 MB.
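Something like the following should do it. The --strip-blobs-bigger-than option is the one BFG documents for size-based cleaning; the function name and the jar and directory names are placeholders, since the downloaded jar normally carries a version number in its file name.

```shell
# Strip every blob larger than 50 MB from the history of a mirror clone.
run_bfg() {
    bfg_jar="$1"      # path to the downloaded BFG jar (file name is an assumption)
    mirror_dir="$2"   # the bare clone created with --mirror
    java -jar "$bfg_jar" --strip-blobs-bigger-than 50M "$mirror_dir"
}

# Usage (hypothetical file names):
#   run_bfg bfg.jar your-repo.git
```

Note that BFG protects the contents of the most recent commit by default, so the large file should already be deleted in the latest commit on the remote before you run this.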
Now change into the cloned repository, then expire the reflog and run the garbage collector so that the stripped objects are actually deleted.
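The commands below are the ones the BFG home page suggests for this step, sketched here as a function that takes the mirror directory as its argument (the function name and directory name are placeholders). BFG only rewrites the history; the old objects stay on disk until the reflog is expired and the garbage collector prunes them.

```shell
# Change into the mirror clone and physically remove the unreferenced objects.
expire_and_gc() {
    mirror_dir="$1"   # e.g. your-repo.git (hypothetical)
    cd "$mirror_dir" || return 1
    # Drop reflog entries that still point at the old history,
    # then prune the now-unreachable objects immediately.
    git reflog expire --expire=now --all &&
        git gc --prune=now --aggressive
}

# Usage (hypothetical directory):
#   expire_and_gc your-repo.git
```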
Now push it back to GitHub.
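Because the repository was cloned with --mirror, a plain git push sends every ref back to the remote. A minimal sketch, with a made-up function name and the mirror directory as a parameter:

```shell
# Push the rewritten history from the mirror clone back to its remote.
# A clone made with --mirror is configured so that "git push" updates all refs.
push_cleaned_history() {
    mirror_dir="$1"   # e.g. your-repo.git (hypothetical)
    (cd "$mirror_dir" && git push)
}

# Usage (hypothetical directory):
#   push_cleaned_history your-repo.git
```

GitHub may reject updates to some of its own read-only refs (such as the ones it keeps for pull requests) during a mirror push; as far as I can tell that doesn't affect the branches and tags.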
At this point the original local copy of the repository will still have the large file(s) in its history, so if you just do a git pull it will think you are ahead of the remote. You have to remove your original local repository and re-clone the remote.
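A cautious sketch of the re-clone step. Instead of deleting the old working copy outright, this variation moves it aside under a .pre-bfg suffix (a name I made up) so that any uncommitted work can still be recovered; delete the backup once you're happy with the fresh clone. The function name, URL and directory are placeholders.

```shell
# Replace a stale working copy with a fresh clone of the cleaned remote.
reclone() {
    remote_url="$1"   # e.g. git@github.com:your-user/your-repo.git (hypothetical)
    local_dir="$2"    # the old working copy, e.g. your-repo
    backup_dir="$local_dir.pre-bfg"

    # Refuse to overwrite an earlier backup.
    if [ -e "$backup_dir" ]; then
        echo "backup already exists: $backup_dir" >&2
        return 1
    fi

    mv "$local_dir" "$backup_dir" &&
        git clone "$remote_url" "$local_dir"
}

# Usage (hypothetical URL and directory):
#   reclone git@github.com:your-user/your-repo.git your-repo
```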
One thing that tripped me up a little was that I had removed the master branch but BFG re-added it, so at first it looked like I had lost some changes. Once I switched back to my working branch everything was as I had expected.