Factual Errors in “Git vs Mercurial: Why Git?” -- and corrections shown by example

Fri, 01/31/2014 - 16:42, by Draketo

Update 2016: Instead of fixing the article, the Atlassian web workers removed the comments which pointed out the misinformation in the article. *sigh*

Summary:

On the Atlassian blog, a Git proponent spread blatant misinformation, which the Atlassian folks leave uncorrected even though multiple people, and even the examples in the article itself, have shown it to be false.

The claims and corrections:

Claim: Git never loses unreferenced data. Mercurial needs special handling to retrieve unreferenced data. Reality: Due to automatic garbage collection, history editing in git unpredictably loses unreferenced history, while Mercurial stores permanent backups which can be retrieved with core commands.
Claim: Only git branches are namespaced. Reality: Mercurial bookmarks are namespaced as bookmark@path wherever there could be confusion. This is equivalent to git’s use of path/branch, but it is only used where it is needed, while git forces the user to always make that distinction.
Claim: Only git can provide a staging area. Reality: Activating Mercurial Queues (mq) and the record extension provides a staging area like the git index, for those who want it.
Claim: Git is more powerful. Reality: Both have the same raw power (as proven by transparent access with Mercurial to Git repos via hg-git), but its “cuddly command line” gives Mercurial an efficiency during actual usage which most people do not find in Git.

Two years ago, Atlassian developer Charles O’Farrell published the article Git vs. Mercurial: Why Git?, in which he claimed to show “the winning side of Git”. The article was part of the Dev Tools series at Atlassian and was written as a reply to the article Why Mercurial?. It was spiced with so much misinformation about Mercurial (statements which were factually wrong) that the comments exploded right away. But the article was never corrected. Just now I was referred to the text again, and I decided to do what I should have done two years ago: write an answer which debunks the myths.

“I also think that git isn’t the most beginner-friendly program. That’s why I’m only using its elementary features” — “I hear that from many git-users …” — part of the discussion which got me to write this article

Safer history and rewriting history with Git
Branching
Staging
Blame
Conclusion

Safer history and rewriting history with Git

Charles starts off by contradicting himself: he claims that git is safer because it “actually never lets you change anything”, and then goes on to explain that all unreferenced data can be garbage collected after 30 days. Since the git garbage collector nowadays runs automatically, all unreferenced changes are lost after approximately 30 days.

This obviously means that git does allow you to change something. That this change only becomes irreversible after 30 days is an implementation detail which you have to keep in mind if you want to be safe.¹
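The loss can be reproduced in a throwaway repository. The following is a minimal sketch, assuming a git command line is installed; it compresses the 30 days into a moment by expiring the reflog by hand, and the file name and commit messages are made up for the example.

```shell
#!/bin/sh
# Sketch: a commit replaced by `git commit --amend` vanishes for good
# once the reflog has expired and the garbage collector has pruned it.
set -eu

repo=$(mktemp -d)   # throwaway repository, left behind for inspection
cd "$repo"
git init -q .

# Pass an identity per command so the sketch does not depend on global config.
g() { git -c user.name=Example -c user.email=example@example.invalid "$@"; }

echo one > file.txt
g add file.txt
g commit -q -m "first version"
old=$(git rev-parse HEAD)

echo two > file.txt
g commit -q -a --amend -m "rewritten version"

# No branch points at the old commit anymore, but the reflog keeps it alive.
before=missing
if git cat-file -e "$old" 2>/dev/null; then before=present; fi

# Do by hand what automatic gc does on its own once the reflog entries
# of unreachable commits are older than the default of about 30 days.
git reflog expire --expire=now --all
git gc -q --prune=now

after=gone
if git cat-file -e "$old" 2>/dev/null; then after=present; fi

echo "before gc: $before, after gc: $after"
```

After the last step there is no git command left which could bring the first version of the commit back.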

He then goes on to say how this allows for easy history rewriting with the interactive rebase, and correctly notes that the histedit extension of Mercurial allows you to do the same. (He also mentions the Mercurial Queues Extension (mq), just to admit that it is not the equivalent of git rebase -i but instead provides a staging area for future commits.)

Then he starts the FUD²: Since histedit stores its backup in an external file, he asks rhetorically what new commands he would have to learn to restore it.

Dear reader, what new command might be required to pull data out of a backup? Something like git ref? Something like git reflog to find it and then something else?

It turns out that this is as easy and consistent as most things in Mercurial. Backup bundles can be treated just like repositories, so to restore the changes, simply use

hg pull backup.bundle

So, all FUD removed, his take on safer history and rewriting history is reduced to “in hg it’s different, and potentially confusing features are shipped as extensions. Recovering changes from backups is consistent with your day-to-day usage of hg”.

(Note that the flexibility of hg also enables the creation of extensions like mutable hg, which avoids all the potential race conditions of git rebase, even for code you share between repositories, which is a total no-go in git. Thanks to the core feature phases, it comes with a safety net which warns you if you try to change published history.)

Branching

On branching, Charles goes deep into misinformation. He wrote his article in 2012, when Mercurial had already provided named branches as well as anonymous branching for 6 years, and one year after bookmarks became a core feature in hg 1.8. Still, he kept talking about how Mercurial advised keeping one clone per branch, referring to a blog post which incorrectly assumed that the hg developers were using that workflow (obviously he did not bother to check that claim). He also went on clamoring that bookmarks initially could not be pushed between repositories, and how they were added “due to popular demand”. The reality is that at some point a developer simply said “I’ll write that”, and within a few months he implemented the equivalent of git branches. Before that, no hg developer saw enough need for them to exert that effort, and today most still simply use named branches.

But obviously Charles could not imagine named branches to work, so he kept talking about how bookmarks do not have namespaces while git branches have them, and that this would create confusion. He showed the following example for git and Mercurial (shortened here):

* 9e4b1b8 (origin/master, origin/test) Remove unused variable
| * 565ad9c (HEAD, master) Added Hello example
|/
* 46f0ac9 Initial commit

and

o  changeset:   2:67deb4acba33
|  bookmark:    master@default
|  summary:     Third commit
|
| @  changeset:   1:2d479c025719
|/   bookmark:    master
|    summary:     Second commit
|
o  changeset:   0:e0e024ff06ad
   summary:     First commit

Then he asked: “would the real master branch please stand up?”

Let’s try to answer that:

Git: There is a commit marked as (origin/master, origin/test), and one marked as (HEAD, master). If you know that origin is the canonical remote repository in git, then you can guess that the names prefixed with origin/ come from the remote repository.

Mercurial: There is a commit with the bookmark master@default and one with the bookmark master. If you know that default is the canonical remote repository in Mercurial, then you can guess that the bookmark suffixed with @default comes from the remote repository.
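The symmetry between the two naming schemes fits into a few lines of shell. This is only an illustration of the convention visible in the two logs above, not how git or Mercurial resolve names internally: the function describe_name is made up for this example, and it ignores that local git branch names may themselves contain slashes.

```shell
#!/bin/sh
# Illustration only: read a decorated name the way a human reader would.
# describe_name is a hypothetical helper, not part of git or hg.
describe_name() {
  case "$1" in
    # remote/branch: the part before the first slash names the remote
    */*) printf 'git: remote-tracking branch "%s" from remote "%s"\n' "${1#*/}" "${1%%/*}" ;;
    # bookmark@path: the part after the last @ names the remote path
    *@*) printf 'hg: bookmark "%s" from path "%s"\n' "${1%@*}" "${1##*@}" ;;
    # no marker at all: a purely local name
    *)   printf 'local name "%s"\n' "$1" ;;
  esac
}

for name in origin/master master@default master; do
  describe_name "$name"
done
```

In both tools the remote part is right there in the name; the only difference is whether it comes first or last.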

But Charles concludes his example with the sentence: “Because there is no notion of namespaces, we have no way of knowing which bookmarks are local and which ones are remote, and depending on what we call them, we might start running into conflicts.”

And this is not only FUD, it is factually wrong and disproven in his own example. After this, I cannot understand how anyone could take his text seriously.

But he goes on.

Staging

His final misinformation is about the git index, a staging area for uncommitted changes. He correctly identifies the index as “one of the things that people either love or hate about Git”. As Mercurial cares a lot about giving newcomers a safe environment to work in, it ships this controversial feature as an extension and not as a core command.

Charles now claims that the equivalent of the git index is the record extension, and then complains that it does not imitate the index exactly, because it does not give a staging area but rather allows committing partial changes. Instead of now turning towards the Mercurial Queues Extension, which he mentioned earlier as a staging area for commits, he asserts that record cannot provide the same feature as git.

Not very surprisingly, when you have an extension to provide partial commits (record) and one to provide a staging area (mq), if you want both, you simply activate both extensions. When you do that, Mercurial offers the qrecord command which stores partial changes in the current staging area.

Not mentioning this is simply a matter of not having done proper research for his article - and not updating the post means that he intentionally continues to spread misinformation.

Blame

The only thing he got right is that git blame is able to reconstruct copies of code from one file to another.

Mercurial provides this for renamed files, but not for directly copy-pasted lines. Analysis of the commits would naturally allow doing the same, and all the information for that is available, but this is not implemented yet. If people ask for it loudly enough, it will only be a matter of time, though. As bookmarks showed, the Mercurial code base is clean enough that it suffices to have a single developer who steps up and creates an extension for this. If enough people use it, the extension can become a core feature later on.

Conclusion

“There is a reason why hg users tend to talk less about hg: There is no need to talk about it that much.” — Arne Babenhauserheide as answer to Why Mercurial?

Charles concludes with “Git means never having to say, you should have”, and “Mercurial feels like Git lite”. Since he obviously did not do his research on Mercurial while he took the time to acquire in-depth knowledge of git, it’s quite understandable that he thinks this. But it is no basis for writing an article, especially not for Atlassian, the most prominent Mercurial hosting provider since their acquisition of Bitbucket, which grew big as a pure Mercurial host and added git after being acquired by Atlassian.

He then manages to finish his article with one more unfounded smoke bomb: “The repository format drives what is possible with our DVCS tools, now and in the future.”

While this statement is actually true, in the context of git-vs-mercurial it is a horrible misfit: the hg-git extension has shown since 2009, 3 years before Charles wrote his article, that it is possible to convert transparently from git to Mercurial and back. So the repository format of Mercurial has all the capabilities of the repository format of git. And since git cannot natively store named branches, represent branches with multiple heads, or push changes into a checked-out branch, the capabilities of the repository format of Mercurial are actually a superset of the capabilities of the storage format of Git.

But what he also states is that “there are more important things than having a cuddly command line”. And this is the final misleading statement to debunk: While the command line does not determine what is theoretically possible with the tool, it does determine what regular users can do with it. The horrible command line of git likely contributes to the many git users who never use anything but commit -a, push and pull, and to the proliferation of git gurus whom the normal users call when git has shot them in the foot again.

It’s sad when someone uses his writing skills to wrap FUD and misinformation into pretty packaging to get people to take his side. Even sadder is that this often works for quite some time, and that few people read the comments section.³

And now that I finished debunking the article, there is one final thing I want to share. It is a quote from the discussion which prompted me to write this piece:

<…> btw. I also think that git isn’t the most beginner-friendly program.
<…> That’s why I’m only using its elementary features
<ArneBab> I hear that from many git-users…
<…> oh, maybe I should have another look at hg after all

This is a translation of the real quote in German:

<…> ich finde btw auch dass git nicht gerade das anfängerfreundlichste programm ist
<…> darum nutze ich das auch nur recht rudimentär
<ArneBab> das höre ich von vielen git-Nutzern…
<…> oha. nagut, dann sollte ich mir hg vielleicht doch nochmal ansehen

Note: hg is short for Mercurial. It is the name under which Mercurial is called on the command line.

Footnotes:

¹ Garbage collection after 30 days means that you have to remember additional information while you work. And that is a problem: You waste resources which would be better spent on the code you write. A DVCS should be about having to remember less, because your DVCS keeps the state for you.

² FUD means fear-uncertainty-doubt and is a pretty common technique used to discredit things when one has no real arguments: Instead of giving a clear argument which can be debunked, just make some vague hints that something might be wrong or that there might be some deficiency or danger. Most readers will never check this, and so this establishes the notion that something IS wrong.

³ Lesson learned: If you take the time to debunk something in the comments, be sure to also write an article about it. Otherwise you might find the same misinformation still being spread 2 years later by the same people. When Atlassian bought Bitbucket, that essentially amounted to a hostile takeover of a Mercurial team by git-zealots. And they got away with this, because too few people called them out on it in public.
