Everything I wish I’d known about git...
Aug. 24th, 2012 01:12 pm
... But didn't know to ask
Many intelligent, knowledgeable, and well-intentioned people tried very hard to help me make the switch to git, and I am grateful for their efforts. They helped a lot, even if I wasn't always very appreciative at the time! That there was a gap between how they understood and explained things and what I needed to know is just one of those things that happens in life.
This post is primarily aimed at those, like me, who had many years of experience in other version control systems and are struggling with the transition to git. It is experiential rather than strictly practical, and goes over my background as well as my process of learning git. It is an opinion piece and I expect some will disagree rather vehemently with my point of view. It is possible that some of my explanations of how git works are not entirely technically correct, as I am still far from an expert on the subject. If you see something, let me know and I will correct it.
Before git, I had used SVN, CVS, MKS, and a little RCS. I had years of using both CVS and SVN, and knew how to use them well and without much pain (though if you'd asked me, I wouldn't have said I understood either one). The switch from CVS to SVN wasn't completely smooth, but it didn't take too long. The main thing I remember getting caught by was SVN remembering files when you moved them, something I wasn't used to having to worry about.
I will admit, I was a little annoyed with git before I ever used it. I don't enjoy interacting with zealots of any stripe. I want to hear "hey, this thing is great, here are some of the advantages," not "if you aren't using this you're doing it wrong." So many of the pages I tried to read about making the transition threw in some comment like "this might be hard if SVN has broken your brain." This is alienating and just not helpful. I think git champions would win much happier (and more!) converts if they focused on conveying the necessary knowledge with less superciliousness.
Another important data point is that I was using SVN within Eclipse, which took almost all the pain out of branching and merging. Occasionally there would be a particularly bad situation and I'd have to get into three-way merges on the command line, but it was rare. For my normal, day-to-day case, I didn't have to put much thought or effort into version control.
I knew that distributed version control was the new way and that I was going to have to switch, and I understand there are some very good reasons to use git or hg. I was, however, entirely comfortable with the situation I had, and in no hurry. And then my workplace announced that the decision had been made: we were switching to DVCS, and it was going to be git, not hg.
The EGit team has obviously been working very hard, and the plugin is much better now than it was when I first tried it. However, it still doesn't feel like using other version control systems in Eclipse (if there is a way to get the nice editable diff viewer, I haven't found it). I would much have preferred to stay in my IDE, but I ended up looking at everything available for OS X at the time, including various GUI tools (gitx, gity, etc.) and the command line. I ended up using PhpStorm mostly because it was the only thing I could find with decent git integration, and I fell in love with it (but that's a topic for another post :) ). All the JetBrains IDEs are excellent and integrate with git very nicely. If you prefer a standalone GUI client, I understand there are now far better git GUIs than when I was looking.
One of the earliest difficulties I encountered was not being able to keep track of what commands I needed to achieve the results I wanted. I ascribe this to two things:
- the division of functionality across commands is very different from anything else I've seen
- in many cases, names of commands are not obviously logically connected to their function
My (slightly facetious) advice for those first learning git:
- If you know another version control system, do your best to forget the names of commands and the association of functionality with commands
- If you know English (which you probably do, if you are reading this :) do your best to forget what the words used for commands mean in English
- If you don't know what command to use for a common operation, the answer is probably checkout or add, so look at them first!
Silliness aside, I really do think that part of my great difficulty in picking up git was trying to map functionality onto commands based on the verb of the command. Things went much better once I started treating the commands as (almost) random tokens that had to be assigned a meaning. I also had to accept that the division of functionality across commands was very different from what I was used to. There are some obscure commands for things that didn't have a separate command in SVN, and some commands are overloaded with multiple useful functions. I am given to understand that if you know git internals, the commands make sense, but frankly, I don't want to need to know that sort of thing to use a version control system. I found it a lot easier not to try to make sense of it and to just do my best to accept it.
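To illustrate the overloading, here is a minimal sketch in a throwaway repository (the identity, file name, and branch name are made up for the demo) showing `git add` doing two different jobs and `git checkout` doing three. `git add` is also how you mark a merge conflict as resolved, which isn't shown here.

```shell
set -e
# Throwaway identity and repository for the demo (assumptions, not real settings).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
cd "$(mktemp -d)"
git init -q

echo "v1" > notes.txt
git add notes.txt              # add, job 1: start tracking a new file
git commit -q -m "first"

echo "v2" > notes.txt
git add notes.txt              # add, job 2: stage a change to an already-tracked file
git commit -q -m "second"

git checkout -q -b feature     # checkout, job 1: create a branch and switch to it
echo "oops" > notes.txt
git checkout -- notes.txt      # checkout, job 2: discard unstaged edits to one file
git checkout -q -              # checkout, job 3: switch back to the previous branch
```

Git 2.23 and later also offer `git switch` and `git restore`, which split some of checkout's jobs into separately named commands.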
Which leads me to my next point! With git, you can't just ignore how it works, and you may not be able to pick it up on the fly. I know some people for whom it came really naturally, but I was not one of them. As much as I wanted to go back to my peaceful little world where version control took up a minimal part of my time and mental space, it was not to be. I do think that once you know git, using it doesn't take more time, but it is significantly more complex than something like SVN so I don't think you can avoid the mental space requirement.
Complexities that caught me up:
- doing anything takes multiple commands
- changes actually have 4 states, not 3
- many entities, including commits, point at trees and are not independent objects
At first I found it very frustrating that it seemed to take 3 commands to achieve 1 useful result. Many of my common use cases (e.g., committing all my modified files) were made tedious to allow for special cases I rarely used. I think most people using git from the command line have a whole lot of customization to make this work nicely. In my case, I mostly use tools that take care of the details. Do yourself a favour and get tab completion for git! I don't know why you don't get it by default with the install, and I cannot believe I didn't realize this was an option for my first several months of using git.
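A sketch of the "commit all my modified files" case, assuming a throwaway repository: the two-step route, the `-a` shortcut built into `git commit`, and an alias (`ci` is just a hypothetical example name). For tab completion in bash, git's source tree ships `contrib/completion/git-completion.bash`; where your install puts that file varies, so the path in the final comment is an assumption.

```shell
set -e
# Throwaway identity and repository for the demo (assumptions, not real settings).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
cd "$(mktemp -d)"
git init -q
printf 'a\n' > a.txt
printf 'b\n' > b.txt
git add a.txt b.txt
git commit -q -m "initial"

# Two-step route: stage every modified tracked file, then commit.
printf 'a2\n' > a.txt
git add -u
git commit -q -m "update a"

# One-step route: -a stages modified and deleted tracked files as part of the commit.
# Brand-new, untracked files still need an explicit "git add".
printf 'b2\n' > b.txt
git commit -q -a -m "update b"

# A repository-local alias; "git config --global" would make it available everywhere.
git config alias.ci "commit -a"
printf 'a3\n' > a.txt
git ci -q -m "update a again"

# Tab completion in bash (hypothetical path; it depends on how git was installed):
#   . ~/git-completion.bash
```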
I heard a lot about how wonderful git is because you can have local commits that aren't pushed. This feature didn't seem that great until I used it, and I still worry a little about code being lost because it wasn't pushed. At the same time, it is pretty sweet to be on an airplane and still be able to make atomic commits as you work. What I didn't hear about as much, at least at the beginning, is that there are actually four distinct states [EDIT: there may be more, these are the ones I know], not three:
- A file that has been modified
- A file that has been modified and staged (i.e. will be included in your next commit)
- Committed changes that have not been pushed
- Pushed changes
It was the second case that caught me up a lot at first. It is a bit mysterious how files become staged or unstaged as you merge/rebase/resolve conflicts/etc. "git status" is super helpful here, and things went better once I started using it all the time rather than trying to keep track of file state in my head. I do appreciate the helpful messages it provides, now that I know enough to understand them!
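The four states can be walked through in a throwaway setup. This sketch assumes a local bare repository standing in for the shared server, and uses `git status --short` (two columns: staged, then unstaged) and `@{u}`, git's shorthand for the current branch's upstream (origin/master, for example).

```shell
set -e
# Throwaway identity for the demo (an assumption, not a real setting).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
base="$(mktemp -d)"
git init -q --bare "$base/server.git"          # stands in for the shared server
git clone -q "$base/server.git" "$base/work" 2>/dev/null
cd "$base/work"
echo "hello" > file.txt
git add file.txt
git commit -q -m "base"
git push -q -u origin HEAD                      # publish and set the upstream branch

echo "more" >> file.txt
git status --short                              # " M file.txt": 1. modified
git add file.txt
git status --short                              # "M  file.txt": 2. modified and staged
git commit -q -m "local change"
git log --oneline '@{u}..HEAD'                  # lists "local change": 3. committed, not pushed
git push -q
git log --oneline '@{u}..HEAD'                  # prints nothing: 4. pushed
```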
And now, the single biggest pain point I had in starting to use git: commits are not independent entities! Each commit records the commit(s) it was made on top of (its parents), along with a tree holding a snapshot of the files. You cannot just take a commit and put it on an arbitrary branch, even if the code changes do not conflict. This was a major stumbling block for me. I was used to keeping a mental model of whether changes were non-conflicting, and I expected that if a commit did not conflict, I could move it across branches at will. In git, however, each commit knows where it came from. Because I did not understand this, at first I found branch management in git unbearably painful, far worse than my experience in SVN.
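One way to see that a commit is not a free-floating change is to print the raw commit object. A sketch, assuming a throwaway repository: besides the `tree` (the snapshot of the files), every commit after the first stores a `parent` line. Because the parent is part of the commit's content, it is also part of the commit's ID, which is why moving a change onto a different base always produces a new commit.

```shell
set -e
# Throwaway identity and repository for the demo (assumptions, not real settings).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
cd "$(mktemp -d)"
git init -q
echo one > f.txt
git add f.txt
git commit -q -m "first"
echo two > f.txt
git commit -q -a -m "second"

# Raw commit object: "tree <id>", "parent <id>", author, committer, blank line, message.
git cat-file -p HEAD

# The recorded parent is simply the previous commit.
parent="$(git cat-file -p HEAD | sed -n 's/^parent //p')"
echo "parent recorded in HEAD: $parent"
echo "previous commit:         $(git rev-parse HEAD~1)"
```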
The cause of the problem seems to boil down to the following situation:
- There was a main branch, with a branch created for each release.
- Local branches were created for features.
- The nature of my work was that I did not know at the time I started it if it would make it into a given release.
- I was typically the only person working in this part of the code, so I knew if there were conflicting changes on different branches.
- I was in the habit of making branches from the latest version on main, if there were no conflicting changes.
This caused merry havoc in git. I routinely ended up with sets of changes that I knew did not conflict, but that could not be merged, rebased, or even cherry-picked from my branch into the release branch. Several times, I ended up re-making the changes by hand (an obviously terrible solution!). It is possible there was some way of recovering, but we had some pretty knowledgeable folks and none of them could find a way to resolve my situation using git commands. I was already starting to deduce that, in some way, git commits knew what had come before. There used to be a fantastic page about the git object model, but http://vrac.cofares.net/git-one-page/ is now defunct. You may find the git book enlightening.
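For anyone in a similar spot, git does have a tool aimed at this shape of problem: `git rebase --onto`, which replays only the commits unique to a branch onto a different base, as new commits with new parents. This is a sketch in a throwaway repository with hypothetical branch names `main`, `release`, and `feature` (`git init -b` needs git 2.28 or later); it is not guaranteed to untangle every situation like the one described above.

```shell
set -e
# Throwaway identity and repository for the demo (assumptions, not real settings).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
cd "$(mktemp -d)"
git init -q -b main
echo base > base.txt
git add base.txt
git commit -q -m "A: shared base"
git branch release                       # release branch cut at A

echo "not for this release" > main-only.txt
git add main-only.txt
git commit -q -m "B: later work on main"

git checkout -q -b feature               # feature branch started from the tip of main (B)
echo "my feature" > feature.txt
git add feature.txt
git commit -q -m "F: my feature"

# Merging feature into release would also bring B along, because F's parent is B.
# Instead, replay only the commits on feature that are not on main (just F)
# on top of release. F gets a new parent, and therefore a new commit ID.
git rebase -q --onto release main feature

# release can now take the feature with a plain fast-forward merge.
git checkout -q release
git merge -q --ff-only feature
```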
Before I had this moment of enlightenment, I routinely trashed my local repository so badly that nobody could figure out how to make it workable again, and I'd end up re-cloning. This happened so often, and cloning our large repo was so slow, that I made a clean clone I never touched and copied it into my workspace each time I needed to start afresh. I don't think it's a bad thing to do when you are first learning. Just remember to git pull after you copy it to your workspace and before you do anything else!
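The fresh-start routine, as a sketch: the directory names are hypothetical, and a local bare repository stands in for the real server so the example is self-contained.

```shell
set -e
# Throwaway identity for the demo (an assumption, not a real setting).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
base="$(mktemp -d)"

# Stand-in for the shared server, with one commit on it.
git init -q --bare "$base/server.git"
git clone -q "$base/server.git" "$base/seed" 2>/dev/null
cd "$base/seed"
echo v1 > app.txt
git add app.txt
git commit -q -m "v1"
git push -q origin HEAD

# One-time: a pristine clone that is never worked in.
git clone -q "$base/server.git" "$base/pristine"

# Meanwhile, other people keep pushing.
echo v2 > app.txt
git commit -q -a -m "v2"
git push -q origin HEAD

# Each fresh start: replace the workspace with a copy of the pristine clone,
# then pull before doing anything else, because the copy is out of date.
rm -rf "$base/workspace"
cp -R "$base/pristine" "$base/workspace"
cd "$base/workspace"
git pull -q
```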
As you might have guessed by now, I didn't get a lot of coding done for the first few months I was using git. I seemed to spend all my time struggling with the version control. I think much of that was me just not getting how to interact with it in a way it liked. I have heard people say that git supports any workflow, but it most certainly did not support the one I used. I realize now that I actually had an excellent subconscious knowledge of SVN, such that I was able to use it and very rarely fall into one of its pitfalls. Learning a whole new way of working, so that I could equally well avoid git's pitfalls, took significant time. I still wish I had a way of knowing what will happen when I push (if you know, please comment!) [EDIT: "git diff HEAD..origin/master" does the trick and could probably be made prettier with a little work]. My solution has been to push and then go look in GitHub, which makes me a little sad.
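A few ways to preview a push, sketched against a throwaway setup in which a local bare repository stands in for the server. `@{u}` is git's shorthand for the current branch's upstream, which would be origin/master in the setup described here. Argument order matters: `git diff HEAD..origin/master` means the same as `git diff HEAD origin/master`, so it shows how to get from your HEAD to the remote, with your own additions appearing as deletions and any upstream work you haven't pulled appearing as additions. Putting the upstream first and using three dots shows only your side.

```shell
set -e
# Throwaway identity for the demo (an assumption, not a real setting).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
base="$(mktemp -d)"
git init -q --bare "$base/server.git"              # stands in for the shared server
git clone -q "$base/server.git" "$base/work" 2>/dev/null
cd "$base/work"
echo one > f.txt
git add f.txt
git commit -q -m "already pushed"
git push -q -u origin HEAD                          # publish and set the upstream (@{u})
echo two >> f.txt
git commit -q -a -m "not pushed yet"

# Commits a push would send: reachable from HEAD but not from the upstream.
git log --oneline '@{u}..HEAD'

# Their combined changes; three dots compare HEAD with its merge base with @{u}.
git diff '@{u}...HEAD'

# Ask what the push would do, without sending anything.
git push --dry-run
```

These commands compare against what git last fetched from the server, so running `git fetch` first keeps the preview current.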
I think I'd have made the adjustment more smoothly if I'd done some training sooner. As mentioned before, I had found it hard to read many resources due to the tone. Also, I'd never needed training to use version control effectively before, and so I resisted. Luckily for me, I finally stopped being stubborn and took the training course with Matthew McCullough. It was excellent: he was very clear and friendly and not condescending at all. By the time I took the course, I had learned a fair bit the hard way; I just wish I'd done the course sooner and saved myself all that grief.
It has now been over a year since I started using git, and it no longer causes me much pain. I can't say I love it, but I've made my peace with it and can work productively again. I am still uncomfortable with the degree to which it lets you rewrite history; I feel this defeats some of the purpose of version control. The design of the command line UI still does not impress me, but the messages given while doing commands are very helpful once you know enough to understand them. Stash is brilliant and I use it all the time; it's much better than the patching I used to do. I do like having little branches for each feature. I even occasionally make use of the ability to stage only some of my changes, though I generally feel one shouldn't need to do that often.
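A sketch of the stash round trip in a throwaway repository (the identity, file, and branch names are hypothetical). Staging only part of a file is interactive, since `git add -p` asks about each hunk, so it appears only as a comment.

```shell
set -e
# Throwaway identity and repository for the demo (assumptions, not real settings).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
cd "$(mktemp -d)"
git init -q
printf 'line 1\n' > a.txt
git add a.txt
git commit -q -m "init"

printf 'half-done work\n' >> a.txt
git stash push -q                 # shelve uncommitted changes; the working tree is clean again
git checkout -q -b quick-fix      # ...switch to something urgent on its own little branch...
git checkout -q -                 # ...and come back
git stash pop -q                  # reapply the shelved changes and drop them from the stash

# Staging part of a file (interactive, so not run here):
#   git add -p a.txt
```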
GitHub is great and allows for a really nice workflow. I love how easy it is to fork any project and play with it, and it's always fun to see people have forked your projects and possibly even submitted pull requests! GitHub is also pretty good for workplace teams, and I like not needing another tool for viewing diffs and doing code review. I recently had to decide on source control for a new company project, and although I looked around, I chose GitHub in the end. I'm not sure I'd pick git on its own, but GitHub definitely tips the balance.
Partly, git was a hard sell to me because I didn't feel the pain of SVN and many of the listed advantages of DVCS were not important to me. There are some definite wins, though, and hopefully those coming after me will find the wins sooner and with less pain than I did.