The Daily WTF: Curious Perversions in Information Technology

2011-09-07

Touché.

On the other hand, when you know not everyone else is as passionate, you still have a chance ;)

And, between "funny" and "interesting", I know which one I'll pick any day -- passion is great.

And if the ACID 'I' sounds uninteresting to you, some of your code surely deserves to stand atop the very frist comment ;)

2011-09-07

Wikipedia is a huge asset, surely the main reason I think a good smartphone and 3G are good.

Look at the talk page for MySQL, there is already something about that subject and clearly no one wants to see those updates made...

2011-09-07

L.:
Anonymous Cow-Herd:
L.:
(yes, all of you who use MySQL can be included in this if you think innoDB is strictly ACID compliant for example, etc.)
I guess you're including MySQL and InnoBASE in this, since they seem to think InnoDB is ACID-compliant. Eight of the page 1 results for "innodb acid compliant" claim that it is, the other two are a bug report where someone claims that it isn't only to find they're wrong (and by "bug report", I mean "rant that ended up in the bug tracker"), and a MySQL vs PostgreSQL comparison which claims it but doesn't substantiate it. So, we could do with an explanation of why it's not the case, and those external anyone-can-edit sources could do with updating with said same.

The only ones claiming that MySQL is ACID compliant are MySQL / Oracle themselves.

ACID : 'C' compliance means any transaction will bring the database from a consistent state to a consistent state, both of which of course respect every single rule implemented in the system.

Due to the way MySQL treats CASCADE, triggers will NOT be fired on cascade operations, which violates the consistency rule by making a cascaded action bypass triggers which inherently contain consistency rules.

On the same topic, MSSQL's trigger nesting is limited to 32 levels, which implies that in the event that a 33rd trigger should have been fired, the database will be left in an inconsistent state, thus breaking 'C' compliance as well.

On the exact same topic, PostgreSQL's trigger nesting is NOT limited and their doc states developers should be careful not to create infinite trigger loops.

I do not know Oracle a lot but I would expect it to do the same as Postgres, considering how both are extremely focused on SQL standards, consistency and reliability.

Yes, most people don't care and most people don't notice and most people don't quite understand what ACID means and buy the sticker whether it's true or not, and that is why you can read everywhere that InnoDB is fine - written by people who don't use triggers/cascades/both (at least I hope so ...consequences would be interesting).

On the same ACID topic, for those who are interested, the 'I' is a very interesting beast ;)

If one is not in any position to pick the tools one uses, then it would be prudent for any engineer to learn the limitations of the tool to ensure that none of those limits are exceeded. I will wholeheartedly agree: use of a particular language, tool or environment etc. is not in itself a WTF, but making non-robust tools available for easy use by non-technical personnel definitely is.

This discussion about MySQL is a prime example of this. Without knowing about the limitations on triggers, and without familiarising myself with the ACID compliance (DB admin is not within my main field of expertise), I would have been blissfully ignorant about all this.

However, knowing that more than 32 levels of trigger are a Bad Thing on MySQL, I am a wiser man.

Having said that, my intuition already informs me that 32 levels of trigger is probably a WTF however you cut it.

2011-09-07

L.:
Wikipedia is a huge asset, surely the main reason I think a good smartphone and 3G are good.
Look at the talk page for MySQL, there is already something about that subject and clearly no one wants to see those updates made...

I read it as: no one wants to see those updates made without some decent sourcing to back it up. Understandably, they're not interested in "This says A, this says B, and I say A+B=C" - they're attempting an encyclopaedia, after all. As I mentioned earlier, most sources seem to say that InnoDB is ACID compliant, therefore, as far as Wikipedia is concerned, the proposition might not be entirely true but it is the only one that is properly "verifiable" (by their definition).

tl;dr it's not enough to provide the pieces, you need someone else to have completed the jigsaw.

2011-09-07

Poo:
So, have you tried git?

Yes. It's far better than CVS and SVN, and branch creation is trivially simple, but non-trivial merges (e.g. between two lines that don't directly descend from each other, but have a common ancestor) don't always happen cleanly.

And when there are merge conflicts, it does a 2-way diff (yours and the other branch), instead of a 3-way diff (adding in the common-ancestor revision) which sometimes makes it hard to figure out what changed without extra work.

Finally, git has little concept of revision history for individual files, preferring to work instead on the entire repo. So I can't easily get a list of the last 5 changes to a single file, or diff the current version against the previous version. I have written scripts to do this, but I don't think I should have to.

2011-09-07

32 levels is MSSQL and no it is not necessarily a WTF to have them. (mysql is the other problem)

I do agree that I haven't yet found a good reason to model anything using trigger chains expanding further than 32, but it makes sense that there would be extremely complex systems for which the best logical model would make extensive use of triggers.

2011-09-07

Part-time dev:
I rather suspect that Alex doesn't understand distributed source control systems like Git and Mercurial. ... One of the really key things I've found with Git is that you never have to 'check-out' a file and two people working on the same file is rarely an issue. ... With VSS, if she gets to the repository before me then I can't do anything. In other ones, it's extremely easy for either of us to accidentally wipe out the other's work. (Sync A, Sync B, Commit A, Commit B - A's commit just vanished!)

This feature doesn't require a distributed system.

Perforce (sorry again for sounding like an ad) lets multiple users check out the same file at once. When submitting (i.e. "committing" in git terminology), the first one in has no issues. The others, when submitting, are told that the file changed. They then issue a "resolve" command to merge the changes (using a 3-way diff to deal with conflicts) and re-submit once everything is merged satisfactorily.

So your sequence ends up as:

Sync A.
Sync B.
A edits.
B edits.
Submit A.
Submit B - generating error
Sync B - informing B about changes that need to be resolved
Resolve B - merging the diffs
Submit B - which now succeeds

All this using a centralized server.

Git does the same thing, except that you get the errors and have to perform the merges at "push" time instead of at "commit" time.

The ability to have multiple people editing files at once is critical for any project of non-trivial scope, but the feature can be implemented using centralized servers as well as with distributed systems.
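
The submit/resolve sequence above can be sketched as a toy model. This is only an illustration, not Perforce's implementation or API: the `Server` class, its `sync` and `submit` methods and the `StaleRevision` error are hypothetical names, and "resolve" is reduced to the client appending its change to the latest content.

```python
class StaleRevision(Exception):
    """Raised when a submit is based on an out-of-date revision."""


class Server:
    """Toy centralized server holding a single file (hypothetical model)."""

    def __init__(self, content):
        self.revision = 1
        self.content = content

    def sync(self):
        # A client receives the current revision number and content.
        return self.revision, self.content

    def submit(self, base_revision, new_content):
        # Reject the submit if someone else got in first.
        if base_revision != self.revision:
            raise StaleRevision(
                f"file is at revision {self.revision}, you edited {base_revision}"
            )
        self.revision += 1
        self.content = new_content
        return self.revision


server = Server("line1\n")
rev_a, text_a = server.sync()          # Sync A
rev_b, text_b = server.sync()          # Sync B
text_a += "A's change\n"               # A edits
text_b += "B's change\n"               # B edits
server.submit(rev_a, text_a)           # Submit A - first one in, no issues

try:
    server.submit(rev_b, text_b)       # Submit B - generating error
    b_was_rejected = False
except StaleRevision:
    b_was_rejected = True

rev_b, latest = server.sync()          # Sync B - B learns about A's change
merged = latest + "B's change\n"       # Resolve B - stand-in for a real 3-way merge
final_revision = server.submit(rev_b, merged)   # Submit B - which now succeeds
```

The point of the sketch is only that the server refuses a submit based on an old revision, so A's work cannot silently vanish.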

2011-09-07

Looks like you read the page. It's all a matter of subject knowledge:

If you understand math, you will accept that anyone writes 1+1=2
If you understand ACID and MySQL info pages, you will accept that MySQL InnoDB is not ACID compliant

There is no jigsaw, the information is right there without any modification IF you know the subject.

Abso · 2011-09-07

Your introduction to source control probably was a lot like mine: “here’s how you open SourceSafe, here’s your login, and here’s how you get your files... now get to work.”

Actually, it was closer to "you don't have to use version control for your class projects, but it's a good idea. Here's how to set up RCS..."

The same prof also recommended learning either vi or emacs. That was a great course.

2011-09-07

QJo:
Having said that, my intuition already informs me that 32 levels of trigger is probably a WTF however you cut it.

This reminds me that "The only numbers that make sense are zero, one, and infinity." I can't remember where I first heard it, but I found a reference:

http://www.catb.org/jargon/html/Z/Zero-One-Infinity-Rule.html

Gee, I guess URLs really do make the system think it's spam.

Matt Westwood · 2011-09-07

swahl:
QJo:
Having said that, my intuition already informs me that 32 levels of trigger is probably a WTF however you cut it.

This reminds me that "The only numbers that make sense are zero, one, and infinity." I can't remember where I first heard it, but I found a reference:

http://www.catb.org/jargon/html/Z/Zero-One-Infinity-Rule.html

Gee, I guess URLs really do make the system think it's spam.

Robert Ainsley: Bluff Your Way In Maths (a.k.a. The Bluffer's Guide to Maths): "You will be expected to be something of a professional mathematician at university, and you should choose your image accordingly. There are three sharply defined groups of university mathematicians which we will number 0, 1 and ∞ (the numbers 2 and 3 do not, of course, exist in university mathematics)."

2011-09-07

annie the moose:
You're doing it wrong!
C:\VersionControl
MyProg.201109060900.c
MyProg.201109060904.c
MyProg.201109060915.c

It's so easy.

I worked in one shop where the code was labeled prog.old, prog.new, prog.bad. The really hard part was finding the correlation between source and binaries. I instituted simple version control (whatever came native in Unix, don't remember what it was) and life became sane.

boomzilla · 2011-09-07

David C.:
So your sequence ends up as:
Sync A.
Sync B.
A edits.
B edits.
Submit A.
Submit B - generating error
Sync B - informing B about changes that need to be resolved
Resolve B - merging the diffs
Submit B - which now succeeds
All this using a centralized server.

Git does the same thing, except that you get the errors and have to perform the merges at "push" time instead of at "commit" time.

No, it doesn't do the same thing. B gets fully committed using git (or other modern DVCSes). This is a nontrivial difference, since there's no risk of losing B's changes during the merge. This is also one of the problems with svn.

David C.:
The ability to have multiple people editing files at once is critical for any project of non-trivial scope, but the feature can be implemented using centralized servers as well as with distributed systems.

I agree that there's no reason why not, but I'm not aware of one that does it like the modern DVCSes do, including Perforce, at least based on your description.

matchbox · 2011-09-07

Great article... please post some more on various software engineering topics :)

However i got some serious questions.

1. Can anyone give me a good example where "Branching by Rule" would feel like the right thing to do? I can only come up with "Branching by Exception" examples.
2. The author seems to dislike distributed source control a little but i had one scenario in the past where i wished i had it and i would like to hear your thoughts. Let's assume i'm on the train (no internet connection) and i want to work on two tickets. I got all the code and environment set up on my laptop so i'm good to go.

With a distributed source control system i would finish the first ticket and commit it locally with some appropriate meta information to close the ticket too.

With a traditional source control system i can either fix both tickets together and commit them together or i could create two branches in advance, one for each ticket, just to avoid mixing those two together, which seems like a lot of overhead if they are just small bugfixes.

What are your thoughts on that? How would you approach this situation?

2011-09-07

QJo:
wva:
Why the hell would you want to keep documentation out of the source control system?
What makes a latex/org/... file different from a c/py/.. file? In both cases you want to track and merge changes and see who did what when, and branch as ideas are tried or different "stable" versions are needed...

Documentation is usually written in Word...

Then you're doing it wrong. Way wrong.

matchbox · 2011-09-07

some dude:
QJo:
wva:
Why the hell would you want to keep documentation out of the source control system? What makes a latex/org/...
Documentation is usually written in Word...
Then you're doing it wrong. Way wrong.

Documentation is usually written in Doxygen or Javadoc i'd say. Don't you mean specification and requirements? Which is reasonable to write in Word since it's read by non-developers too.

2011-09-07

David C.:
Finally, git has little concept of revision history for individual files, preferring to work instead on the entire repo. So I can't easily get a list of the last 5 changes to a single file, or diff the current version against the previous version. I have written scripts to do this, but I don't think I should have to.

You are correct about git having little concept of individual files. The upside is that it makes it easy to follow code that is moved around between files. This does not, however, make it hard to follow individual files. Are you running an ancient version of git? It has been easy to track file changes for years.

git clone https://github.com/git/git.git
cd git

Change log for a file:
git log url.c

Changes between current and another commit for specific file:
git diff 3793a url.c

View changes in a gui:
gitk url.c

Alex Papadimoulis · 2011-09-07

matchbox:
1. Can anyone give me a good example where "Branching by Rule" would feel like the right thing to do?

As a rule of thumb, you should Branch By Rule when most of the releases are considered exceptional under Branch by Exception. There are a lot of scenarios for this, but here's one: 50 developers split into 6 teams that maintain a large application that's released on a monthly basis. Each team would work on a feature that's planned for a release 2-, 3-, 4-, 5-, or 6-months out.

matchbox:
2. The author seems to dislike distributed source control a little

I'm more frustrated by the buzz and excitement about it. The "delayed merging" and "easy shelving" have existed in proprietary systems (Perforce, Accurev, etc), and could have easily been added to Subversion clients.

Heck, Subversion could have even added the one benefit of distributed systems (offline history viewing), but instead, we just started from scratch again with Git/Mercurial/etc.

It's like sea mammals: one step forward (underwater, woo hoo!), several steps back (no gills).

2011-09-07

David C.:
Yes. It's far better than CVS and SVN, and branch creation is trivially simple, but non-trivial merges (e.g. between two lines that don't directly descend from each other, but have a common ancestor) don't always happen cleanly.
And when there are merge conflicts, it does a 2-way diff (yours and the other branch), instead of a 3-way diff (adding in the common-ancestor revision) which sometimes makes it hard to figure out what changed without extra work.

Finally, git has little concept of revision history for individual files, preferring to work instead on the entire repo. So I can't easily get a list of the last 5 changes to a single file, or diff the current version against the previous version. I have written scripts to do this, but I don't think I should have to.

Git is considered to be good at merging, so I'm not sure what specific issues you've been having. I think there's a bit of terminology confusion here: a three-way diff means that the merging code compares both versions to an ancestor version (all source control systems work this way), but it doesn't necessarily mean that those ancestor lines are displayed with a merge conflict. Git can be configured to display the ancestor lines (git config --global merge.conflictstyle diff3) but it's unfortunate that this isn't the default.

For viewing the last 5 changes to a specific file, try git log -n 5 -p filename.txt
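
A rough sketch of the three-way idea, for anyone who wants to see the decision rule spelled out. It assumes, for simplicity, that the three versions have the same number of lines and line up one to one (real merge tools first align the texts with a diff; this toy skips that). The function name `merge3` is made up for illustration, and the conflict markers imitate the diff3 style rather than reproducing git's exact output.

```python
def merge3(base, ours, theirs):
    """Merge three equally long lists of lines; return (merged_lines, had_conflict)."""
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this toy merge only handles line-aligned inputs")

    merged = []
    had_conflict = False
    for b, o, t in zip(base, ours, theirs):
        if o == t:
            merged.append(o)      # both sides agree (or neither changed)
        elif o == b:
            merged.append(t)      # only their side changed this line
        elif t == b:
            merged.append(o)      # only our side changed this line
        else:
            # Both sides changed the same line differently: a real conflict.
            # Showing the ancestor line in the middle makes it easier to
            # see what each side actually did.
            had_conflict = True
            merged += ["<<<<<<< ours", o, "||||||| base", b,
                       "=======", t, ">>>>>>> theirs"]
    return merged, had_conflict


base   = ["a = 1", "b = 2", "c = 3"]
ours   = ["a = 1", "b = 20", "c = 30"]
theirs = ["a = 10", "b = 2", "c = 300"]

merged, had_conflict = merge3(base, ours, theirs)
```

Here the first two lines merge cleanly because only one side touched each of them; only the third line, changed on both sides, ends up between conflict markers with the ancestor line shown in the middle.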

2011-09-07

L.:
Luiz Felipe:
The poop... of DOOM!:
Paratus:
The poop... of DOOM!:
The "Real" WTF:
6000 - using ACCESS as a database
7000 WTFP for using VB
7000 WTFP for using PHP

VB and PHP are certainly RWTFs, but there's no way that they're worse than using Access.
He said using Access as a database, so you can combine that.

A PHP application calling an Access database would result in 13000 WTFP (and a developer who's been committed to a mental hospital)

20000 Using firebird/interbase (its worse than access).

Access is a little db for simple use, its not WTF to use in correct situation, but its easy to abuse. There nothing wrong in using simple rdbms.

Firebird is crap, access can sustain more records and users.

Access is total crap, there is no valid reason to use Access instead of MySQL (which already is a simple rdbms that sucks a lot). I do agree that for very simple and basic db use, one can stick to mySQL or other half-assed dbms's, but it is also clear that a LOT of these cases are misunderstood.
I.E. developers who know nothing about SQL think it's only good to store objects in a table, thus take no advantage of the tool and thus design an application that uses little or no features which IS a WTF in itself, for using the wrong tools for the job.

I'm not a DBA and I'm quite surprised to see how much other devs have no clue about SQL in general (yes, all of you who use MySQL can be included in this if you think innoDB is strictly ACID compliant for example, etc.) - in the end, know your tools and use them right, also remember some tools are USELESS for some projects, there is NO using them right (like access for anything or MySQL for complex applications).

In the end, the only good ones are and will be those who try to do better every single time, spend time reading and learning all they can (and posting their own fails on tdwtf for our enjoyment).

I agree that developers don't know SQL, and that MySQL ISAM is not ACID.

But Access is not bad, it's only a tiny database.

Except that Outlook uses Access (it's a variant of the blue Jet that Access uses), and it works well (as long as you don't have more than a 2GB database of crappy emails, then you need Exchange).

Also, the Windows Installer uses Access to install the entire Windows, Office, and Visual Studio, and whatever uses the MSI installer uses the red Jet, which is a variant of blue Jet.

Access (JET) is fully ACID compliant, except that when you have more than +-512 locks (limited by the filesystem), it will break. It supports SQL also. The crap is what people have done with it, partly because of classic ASP and "webdevelopers" who think they can use a database; these people turn everything to shit.

People like to blame things that they don't know. I use JET to store temporary transactions when my (well, my client's) central SQL server goes off-line because of LAN/net problems (cheapo equipment and electromagnetic interference). For this purpose it is very good.

I agree also that JET is useless for most software, because it's so simple and limited. But you cannot consider a knife to be a worse thing if you need to cut a tree.

Sorry, my english is poor, its not my native.

2011-09-08

matchbox:
some dude:
QJo:
wva:
Why the hell would you want to keep documentation out of the source control system? What makes a latex/org/...
Documentation is usually written in Word...
Then you're doing it wrong. Way wrong.
Documentation is usually written in Doxygen or Javadoc i'd say. Don't you mean specification and requirements? Which is reasonable to write in Word since it's read by non-developers too.

Silly (and possibly deliberate for the sake of being obstreperous) misunderstanding. Javadocs are actually part of the code and are generated automatically and dynamically. As such, this documentation is, by default, part of the source code itself and this aspect of the documentation is subsumed into the source code version control system.

By "documentation" in the context of "what ought to be stored in the document version control system", we are talking about standalone documents, which are written either by or for the customer, which define what the application is supposed to do in the first place. It consists of things like invitation to tender, project initiation documents, purchase agreements, project plans, records of meeting minutes, technical architecture documents, business requirements, technical requirements, migration strategies, UAT strategies, and so on.

I would be prepared to agree that writing it in Word (and Excel) is "wrong. Way wrong" except that in order to be able to do business with our potential customers at all we need to be able to generate and receive documentation in such a format as the customer is prepared to work with. In every single project in which I have been involved, at least some documentation is written using Word and Excel.

Nasty as this is, it is a business truth which is ultimately futile to try and circumvent.

Here endeth the lesson.

2011-09-08

No, seriously Access IS a WTF, it has numerous fails and you just said it's not acid compliant either, it's under windows (lol ?? windows server is a WTF too) and you have numerous less limited alternatives.

2011-09-08

L.:
No, seriously Access IS a WTF, it has numerous fails and you just said it's not acid compliant either, it's under windows (lol ?? windows server is a WTF too) and you have numerous less limited alternatives.

I see your Windows server and I raise you an OSX server. All this: "It's supposed to be for end-users with no technical knowledge" and "it just works" bullcrap and then they make it into a server. You get a "genius" to pop a DVD into the drive and then you got a server, fully secure and set up to your specific needs? Rightoh!

2011-09-08

Alex Papadimoulis:
Heck, Subversion could have even added the one benefit of distributed systems (offline history viewing), but instead, we just started from scratch again with Git/Mercurial/etc.

Fixing old systems is not the Open Source Way. Start by assuming everything made before now is utterly compromised and kludged by well meaning but clueless engineers trying to fix fundamental problems that simply can't go away without rearchitecting.

Sometimes it is even true.

Alex Papadimoulis:
It's like sea mammals: one step forward (underwater, woo hoo!), several steps back (no gills).

Gills are awesome if you're not endothermic. Otherwise, they're a bit like running your blood supply through a pair of bloody great heatsinks.

Tuna and sharks and probably other species have some sort of awful hack in the form of clever heat exchangers that let them have a body temperature a couple of degrees above ambient (which enables them to be a bit more energetic than other species) but it isn't going to be any more than a half-arsed attempt at fixing a fundamental problem with the architecture ;-)

Sea mammals on the other hand get to be quite adaptable to a range of temperatures and environments, they can be very effective predators and they get to have penetrative sex. That's a bit of a killer app, as I'm sure you'll agree.

2011-09-08

gnasher729:
If you had worked 40 hours a week and told them for 18 months that everything was going to plan, they would have got exactly what they paid for, you would have enjoyed those 18 months a lot more, and you would have found a new job just the same.

You could ask yourself what would Wally do? Wally would do nothing other than write status reports for 18 months. Or you could do what a friend did back in the day, write status reports while working a contract job under their noses (Double Tap).

2011-09-08

The Poop... of DOOM!:
L.:
No, seriously Access IS a WTF, it has numerous fails and you just said it's not acid compliant either, it's under windows (lol ?? windows server is a WTF too) and you have numerous less limited alternatives.
I see your Windows server and I raise you an OSX server. All this: "It's supposed to be for end-users with no technical knowledge" and "it just works" bullcrap and then they make it into a server. You get a "genius" to pop a DVD into the drive and then you got a server, fully secure and set up to your specific needs? Rightoh!

Alright you win ... damn OSX. Is there really anything that could compete with it ??

2011-09-08

Gibbon1:
gnasher729:
If you had worked 40 hours a week and told them for 18 months that everything was going to plan, they would have got exactly what they paid for, you would have enjoyed those 18 months a lot more, and you would have found a new job just the same.

You could ask yourself what would Wally do? Wally would do nothing other than write status reports for 18 months. Or you could do what a friend did back in the day, write status reports while working a contract job under their noses (Double Tap).

Looking back on it, I think the reason I stuck with it so long was that I was actually enjoying the challenge, and it appeared at the time to be an opportunity to add some proper quality. Unfortunately that sort of relentlessness eventually takes its toll and you change your attitude towards it.

2011-09-08

Wouldn't it be great to have a place where the Git/Maven/Hibernate/you name it fanboys would not pollute everything with their belief confessions?

Lack of meaning in your work? Finally something you feel you are ahead with?

That alone makes me dislike Git (not to mention its terrible user interface, the lack of handling empty folders, partial checkouts, and pushes,... )

2011-09-08

I've worked on both proprietary and open-source projects, and I've had to submit patches to someone else and hope they're ultimately included in the shipping product (and fix them and resubmit if they're rejected) in both cases. It's called code review, and proprietary developers do it too.

The main difference between open-source and proprietary development that I've seen is that proprietary products have multiple standards for quality and use different standards on different branches in the same SCM repo. If a customer is paying for an ugly hack, they may get such a hack in their branch, while the same patch might be rejected by the main product team--but that's just like any patch for an open-source project that gets shipped in a product somewhere, but isn't merged upstream.

It's possible to set up a DVCS as a drop-in replacement for a non-distributed SCM, but doing so wastes the opportunity to do process flow improvements that DVCS can enable. Since a DVCS 4th-dimension object can physically live anywhere, there's no reason why integration, build, QA, production, custom development services, and major product revision branches can't have their own repos with stars of users around them--and plenty of reasons why they shouldn't all necessarily share one giant churning burning repo, no matter what SCM you're using.

No one should start a new project on SVN today. Subversion doesn't just need a central server with excellent network connectivity--it needs a central server 150 to 200 times the size of the equivalent git server for program source code, and that server needs a low-latency network link to its users as well as a high-bandwidth one. If you have a large product, that kind of waste puts stress on every system near it, from storage to backups to IT hardware budgets to network operations staff.

frits · 2011-09-08

Isn't it advisable to avoid shiny-new-toy syndrome when it comes to source and revision control?

2011-09-08

boomzilla:
David C.:
So your sequence ends up as:
Sync A.
Sync B.
A edits.
B edits.
Submit A.
Submit B - generating error
Sync B - informing B about changes that need to be resolved
Resolve B - merging the diffs
Submit B - which now succeeds
All this using a centralized server.

Git does the same thing, except that you get the errors and have to perform the merges at "push" time instead of at "commit" time.

No, it doesn't do the same thing. B gets fully committed using git (or other modern DVCSes). This is a nontrivial difference, since there's no risk of losing B's changes during the merge. This is also one of the problems with svn.

It really is the same thing. When you push your changes to the parent repo, you need to merge your changes with the other changes, and resolve conflicts.

The distributed systems preserve individual local changes because each person works with a local copy. Effectively, every client is a separate "shelf" branch.

A centralized system can do the exact same thing if every developer creates his own private branch. He can periodically merge the parent into his branch, similar to a "git pull", commit his changes to his branch without conflict, and then merge his changes back to the parent, resolving conflicts, similar to a "git push".

Same functionality, different command sequence. This applies equally well to any VCS that allows developers to easily create/merge branches at will, whether they are distributed or centralized.

2011-09-08

Mr.'; Drop Database --:
Git is considered to be good at merging, so I'm not sure what specific issues you've been having. I think there's a bit of terminology confusion here: a three-way diff means that the merging code compares both versions to an ancestor version (all source control systems work this way), but it doesn't necessarily mean that those ancestor lines are displayed with a merge conflict. Git can be configured to display the ancestor lines (git config --global merge.conflictstyle diff3) but it's unfortunate that this isn't the default.
For viewing the last 5 changes to a specific file, try git log -n 5 -p filename.txt

Thanks. I didn't know you could make it show the ancestor lines with conflicts. IMO, that makes it much much easier to resolve said conflicts. I assumed, because it wasn't showing the ancestor, that it didn't use it in the merge process either. I'll add this configuration option to my current git clients (My current work involves git, which has been quite a learning curve, compared to Perforce, which my previous project used. But not nearly as scary as ClearCase, which is also used here.)

WRT viewing changes, what I'd like to do is what I frequently did with Perforce. Over there, I could type "p4 diff foo#4" and it would show me the diffs between the current version and the fourth commit on the current branch.

"git log -n 5" shows me the most recent five commits, but not the diffs.

I wrote a perl script to give me the functionality I like. My "gitediff" script allows me to type "gitediff foo.c#-5" which will copy the fifth-most-recent commit to a temporary file, launch emacs with that and the current file, and start an "ediff" to let me compare them.

The script for this was not hard to write, but it wasn't trivial either. It does a "git log" to get the commits for a file, and numbers them. Then it counts the number of edits provided as an argument to get the commit ID string for that revision, then does a "git show" to extract the file before handing it off to emacs.

It's about a 230 line script for producing all kinds of diffs using git:

gitediff foo - compare foo against the latest committed version (HEAD)

gitediff foo#<ver> - compare foo against a specified version

gitediff foo#<ver1>#<ver2> - compare two revisions of foo

where <ver> may be either an integer - representing a sequential commit on the current branch, or a negative integer - representing the most recent "nth" version, or a git commit string.

I had to write the logic for this because the built-in syntax is repo-based instead of file-based. For example, HEAD~5 shows the file as it was in the fifth-most-recent commit, even if the file in question didn't change since then. In contrast #-5 (in my script's syntax) represents the fifth-most-recent change to the specified file, even if that change took place hundreds of commits ago.

If people are interested, I can post this script for others to enjoy. Or maybe people can point out an easier approach to the problem.

2011-09-08 Reply Admin

frits:
Isn't it advisable to avoid shiny-new-toy syndrome when it comes to source and revision control?

Isn't it advisable to always avoid the shiny-new-toy syndrome unless there is some justifiable benefit other than "it's new" and "it's shiny"? ;) Guess that's why I'm still happy with XP. With CVS->SVN there was some real benefit. With SVN->Git I guess it depends on the project.

2011-09-08 Reply Admin

even the lamest source control system (*cough*SourceSafe*cough*) will far outperform a Mercurial set-up with a bunch of haphazard commits and pushes

In other words, you can write Fortran in any language.

Matt Westwood · 2011-09-08 Reply Admin

Chris:
even the lamest source control system (*cough*SourceSafe*cough*) will far outperform a Mercurial set-up with a bunch of haphazard commits and pushes

In other words, you can write Fortran in any language.

Except COBOL, which of course isn't powerful enough.

2011-09-08 Reply Admin

David C.:
WRT viewing changes, what I'd like to do is what I frequently did with Perforce. Over there, I could type "p4 diff foo#4" and it would show me the diffs between the current version and the fourth commit on the current branch.
"git log -n 5" shows me the most recent five commits, but not the diffs.

You need the -p flag to make "git log" show diffs. It'll show one separate diff for each commit. I don't think there's a one-liner to view the combined diff across those versions though, short of specifying the file name twice and using shell trickery: git diff $(git log -n 5 --pretty=format:%H filename.txt | tail -n 1) HEAD filename.txt

Which I suppose is part of what your script does. So you're right, git doesn't provide the best tools for that sort of thing.
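For anyone who would rather script this than rely on shell substitution, here is a minimal Python sketch of the same idea. The function names are made up for illustration, and it assumes `git` is on the PATH and the current directory is inside a repository.

```python
import subprocess


def nth_last_change(log_output, n):
    """Pick the nth-most-recent commit ID from `git log --format=%H` output.

    git log lists commits newest first, so the nth line is the
    nth-most-recent commit that touched the file.
    """
    hashes = log_output.split()
    if not 1 <= n <= len(hashes):
        raise IndexError(f"file has fewer than {n} commits")
    return hashes[n - 1]


def diff_against_nth_change(path, n):
    """Return the diff of `path` between its nth-most-recent change and HEAD."""
    log = subprocess.run(
        ["git", "log", "-n", str(n), "--format=%H", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout
    old = nth_last_change(log, n)
    return subprocess.run(
        ["git", "diff", old, "HEAD", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout
```

As with the one-liner, the result covers the changes made after the selected commit, not the change introduced by that commit itself.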

jnareb · 2011-09-09 Reply Admin

David C.:
Poo:
So, have you tried git?
Yes. It's far better than CVS and SVN, and branch creation is trivially simple, but non-trivial merges (e.g. between two lines that don't directly descend from each other, but have a common ancestor) don't always happen cleanly.
And when there are merge conflicts, it does a 2-way diff (yours and the other branch), instead of a 3-way diff (adding in the common-ancestor revision) which sometimes makes it hard to figure out what changed without extra work.

I think you meant here that it does not by default display the ancestor version in merge conflict markers, because Git always uses 3-way merge when merging branches. You can make Git include the ancestor version either by setting the merge.conflictstyle config variable to "diff3", or by running git checkout --conflict=diff3 file.
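To illustrate what that setting changes, here is a small Python sketch that renders one conflicting hunk in both styles. The function `render_conflict` and the labels "ours", "base" and "theirs" are placeholders for illustration; Git prints its own labels after the markers.

```python
def render_conflict(ours, base, theirs, style="merge"):
    """Render one conflict hunk the way the two conflict styles lay it out.

    ours, base, theirs: lists of lines from each side and the common ancestor.
    With style="diff3" the ancestor lines appear between the two sides,
    introduced by a line of seven '|' characters.
    """
    lines = ["<<<<<<< ours", *ours]
    if style == "diff3":
        lines += ["||||||| base", *base]
    lines += ["=======", *theirs, ">>>>>>> theirs"]
    return "\n".join(lines)


print(render_conflict(["x = 2"], ["x = 1"], ["x = 3"], style="diff3"))
```

In the default style only the two sides appear; the extra middle section is what lets you see what each side changed relative to the common ancestor.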

David C.:
Finally, git has little concept of revision history for individual files, preferring to work instead on the entire repo. So I can't easily get a list of the last 5 changes to a single file, or diff the current version against the previous version. I have written scripts to do this, but I don't think I should have to.

You can: `git log file` and `git diff HEAD^! -- file` (see the git-log manpage for details on history simplification with respect to the former).

boomzilla · 2011-09-09 Reply Admin

David C.:

boomzilla:
No, it doesn't do the same thing. B gets fully committed using git (or other modern DVCSes). This is a nontrivial difference, since there's no risk of losing B's changes during the merge. This is also one of the problems with svn.
It really is the same thing. When you push your changes to the parent repo, you need to merge your changes with the other changes, and resolve conflicts.

Are you being deliberately dense? It truly isn't the same thing.

You do not have to do that merge. It's possible to have multiple anonymous branches (at least with mercurial, and I'd assume for most others, too).

David C.:
Same functionality, different command sequence. This applies equally well to any VCS that allows developers to easily create/merge branches at will, whether they are distributed or centralized.

Yes, it is possible to get the same end result with a lot more work by the users, and assuming that they always follow this pattern. But that's the problem. No one really does (it's just not the way humans function). It's a case of a tool making a common problem easier to solve. So far, I'm not aware of a centralized VCS that does it.

2011-09-09 Reply Admin

TRWTF is "WTF is this article doing on the TDWTF?"

2011-09-09 Reply Admin

valid user:
TRWTF is "WTF is this article doing on the TDWTF?"

After showing how it's done wrong all the time, I appreciate that they show how to do it right once in a while.

jnareb · 2011-09-09 Reply Admin

Alex Papadimoulis:
matchbox:
1. Can anyone give me a good example where "Branching by Rule" would feel like the right thing to do?
As a rule of thumb, you should Branch By Rule when most of the releases are considered exceptional under Branch by Exception. There are a lot of scenarios for this, but here's one: 50 developers split into 6 teams that maintain a large application that's released on a monthly basis. Each team would work on a feature that's planned for a release 2-, 3-, 4-, 5-, or 6-months out.

You can find a good example of the feature branch (lots of branches) approach in the last part of Eric Sink's Version Control by Example (available on-line). Git development itself makes use of feature branches.

In short: using feature branches allows you to select which features to include in the next release, and which are to be postponed.

Alex Papadimoulis:
matchbox:
2. The author seems to dislike distributed source control a little

I'm more frustrated by the buzz and excitement about it. The "delayed merging" and "easy shelving" have existed in proprietary systems (Perforce, Accurev, etc), and could have easily been added to Subversion clients.

Heck, Subversion could have even added the one benefit of distributed systems (offline history viewing), but instead, we just started from scratch again with Git/Mercurial/etc.

"Delayed merging" and "easy shelving" are only a subset of the workflows that DVCS allow.

Also, Subversion made some design decisions which cannot work in a distributed system, like global revision numbering (which requires a central numbering authority), and some bad design decisions, like "branch / tag is copy" (following Perforce AFAIK), which makes branch deletion and merging complicated, and tags next to useless.

Starting from scratch (as in the case of Git) was the only sensible choice.

jnareb · 2011-09-09 Reply Admin

G:
That alone makes me dislike Git (not to mention its terrible user interface, the lack of handling empty folders, partial checkouts, and pushes,... )

Partial checkouts are available in modern Git (though not partial clone).

I don't know what you meant by ", and pushes,..." there.

2011-09-11 Reply Admin

David C.:
It really is the same thing. When you push your changes to the parent repo, you need to merge your changes with the other changes, and resolve conflicts.
The distributed systems preserve individual local changes because each person works with a local copy. Effectively, every client is a separate "shelf" branch.

A centralized system can do the exact same thing if every developer creates his own private branch. He can periodically merge the parent into his branch, similar to a "git pull", commit his changes to his branch without conflict, and then merge his changes back to the parent, resolving conflicts, similar to a "git push".

Same functionality, different command sequence. This applies equally well to any VCS that allows developers to easily create/merge branches at will, whether they are distributed or centralized.

It is not the same thing, as you are not forced to push after each commit.

Or how would a centralized system solve this: I am far away from my workplace (maybe on vacation in another country) with my notebook set up, but without a (reliable/usable/any) internet connection. There is a big problem with a deadline soon after my leave ends, so there is no time to solve it AFTER I return to work, but there is time to connect to the company network before that deadline.

The problem consists of a couple of small unrelated subproblems.

With git I would simply make a branch for every such subproblem, work in the branch, make a lot of commits, and eventually even merge all those branches together in the end. Then, after returning to work, I would just merge master from the server, resolve conflicts (if any) and push back - maybe only seconds of work.

With a centralized VCS I would probably be stuck right at the start, unable even to create all those branches, let alone commit to them (or undo some bad decisions, as a VCS allows when it works).

Just because of a missing connection, I am effectively losing the better half of the VCS functionality on centralized systems.

2011-09-12 Reply Admin

gilhad:
It is not the same thing, as you are not forced to push after each commit.
Or how would a centralized system solve this: I am far away from my workplace (maybe on vacation in another country) with my notebook set up, but without a (reliable/usable/any) internet connection. There is a big problem with a deadline soon after my leave ends, so there is no time to solve it AFTER I return to work, but there is time to connect to the company network before that deadline.

The problem consists of a couple of small unrelated subproblems.

With git I would simply make a branch for every such subproblem, work in the branch, make a lot of commits, and eventually even merge all those branches together in the end. Then, after returning to work, I would just merge master from the server, resolve conflicts (if any) and push back - maybe only seconds of work.

With a centralized VCS I would probably be stuck right at the start, unable even to create all those branches, let alone commit to them (or undo some bad decisions, as a VCS allows when it works).

Just because of a missing connection, I am effectively losing the better half of the VCS functionality on centralized systems.

You don't have to merge after each commit with a centralized system either. It's only required if everybody is working from the same branch.

Your distribution is simply creating extra branches behind the scenes. There are no semantic differences.

You talk about your git solution being to create a bunch of branches for your tasks. Centralized systems all allow this. I do it all the time. And when you're done with a branch, you merge it back to the parent, which will make you resolve conflicts.

As for "not forced to push after each commit", why does that change anything? The server maintains all the branches (including the ones you create while working). It won't conflict with anything else unless others start working with your branches. The only potential advantage here is not needing the network bandwidth at commit time.

You seem to be hung up on the fact that some centralized systems make it difficult to operate with hundreds or thousands of branches in flight at once. I know that some of the most popular free ones (like CVS) fall over and die under those circumstances, but that's a problem with specific products, not with the concept of a central server.

Distribution lets people commit changes while disconnected from the network, and this is useful for many applications, but it doesn't create any other capabilities.

2011-09-12 Reply Admin

David C.:
gilhad:
It is not the same thing, as you are not forced to push after each commit.
Or how would a centralized system solve this: I am far away from my workplace (maybe on vacation in another country) with my notebook set up, but without a (reliable/usable/any) internet connection. There is a big problem with a deadline soon after my leave ends, so there is no time to solve it AFTER I return to work, but there is time to connect to the company network before that deadline.

The problem consists of a couple of small unrelated subproblems.

With git I would simply make a branch for every such subproblem, work in the branch, make a lot of commits, and eventually even merge all those branches together in the end. Then, after returning to work, I would just merge master from the server, resolve conflicts (if any) and push back - maybe only seconds of work.

With a centralized VCS I would probably be stuck right at the start, unable even to create all those branches, let alone commit to them (or undo some bad decisions, as a VCS allows when it works).

Just because of a missing connection, I am effectively losing the better half of the VCS functionality on centralized systems.

You don't have to merge after each commit with a centralized system either. It's only required if everybody is working from the same branch.
Your distribution is simply creating extra branches behind the scenes. There are no semantic differences.

No, it does not. It creates the branches on the client, not on the server, so I can do it offline. In a centralized VCS I cannot do that.

And I can create as many branches as I want without polluting the shared system with them.

David C.:
You talk about your git solution being to create a bunch of branches for your tasks. Centralized systems all allow this.

But only if you have a good connection.

David C.:
I do it all the time. And when you're done with a branch, you merge it back to the parent, which will make you resolve conflicts.
As for "not forced to push after each commit", why does that change anything? The server maintains all the branches (including the ones you create while working).

Only if I am online.

David C.:
It won't conflict with anything else unless others start working with your branches.

While in a DVCS it simply does not conflict, regardless of what others do or do not do.

David C.:
The only potential advantage here is not needing the network bandwidth at commit time.

Which allows me to commit as often as I need, without caring about resolving some conflict. And to modify as many files as I need without causing problems for everyone else.

David C.:

You seem to be hung up on the fact that some centralized systems make it difficult to operate with hundreds or thousands of branches in flight at once. I know that some of the most popular free ones (like CVS) fall over and die under those circumstances, but that's a problem with specific products, not with the concept of a central server.

There are also problems with the concept itself: the connectivity, the space wasted on the server, and the difficulty of arranging safe cooperation among many developers working on the same project at the same time.

David C.:

Distribution lets people commit changes while disconnected from the network, and this is useful for many applications, but it doesn't create any other capabilities.

It also allows people to share changes in more ways than are possible (or acceptable) in a centralized VCS.

Some things can be "emulated" in a centralized VCS with a lot of unnecessary work, but usually a centralized VCS is meant to solve a different, more restricted problem.

2011-09-12 Reply Admin

Let's say (as happened to me) that there are three programmers working on one system. Then they decide to part ways, so that two will continue on the system and the third will fork it as a different project. They divided the company they had formed and went their separate ways.

With a DVCS, the third simply removed one line from his configuration and was free.

With a centralized VCS, the third would have to set up a new server (including the hardware to run it), copy over the full project with all its history, and then change his configuration to point to the new server.

But let's continue: then the second parted with the first and had to undergo the same task.

But later the second and third formed another pact and wanted to continue developing together, while keeping their respective history.

With a DVCS, it just needed one address added to the configuration, and that was all for the setup. I do not know how to manage that simply in a centralized VCS.

2011-09-13 Reply Admin

QJo:
wva:
Why the hell would you want to keep documentation out of the source control system?
What makes a latex/org/... file different from a c/py/.. file? In both cases you want to track and merge changes and see who did what when, and branch as ideas are tried or different "stable" versions are needed...

Documentation is usually written in Word (or a similarly hard-to-maintain program), so in a source-control system it usually needs to be stored in binary form. In such a form it may not be as easy to establish what the differences are between versions.

If you're maintaining your documentation in e.g. TeX, then it may well be more appropriate to use a source-control system for the docs.

Another option is to use a wiki for the documentation.

There is a built-in diff-tool in MS Word (since it can/could do versioning on its own), so if your versioning system supports calling third-party diff-tools, it shouldn't be any problem having the document in source control.

2011-09-13 Reply Admin

The "Real" WTF:
6000 - using ACCESS as a database

666000 - using Excel as a database.

Alex Papadimoulis · 2011-09-13 Reply Admin

gilhad:
Let's say (as happened to me) that there are three programmers working on one system. Then they decide to part ways, so that two will continue on the system and the third will fork it as a different project. They divided the company they had formed and went their separate ways.

This is quite possibly the most ridiculous argument in favor of DVCS I've heard to date. DVCS saves a few hours off of the untold hours needed to dissolve a business?

If this is your workflow, then you're not developing software. You're hacking.

There's nothing wrong with hacking, but it's a different world than commercial software development.

2011-09-14 Reply Admin

Alex Papadimoulis:
This is quite possibly the most ridiculous argument in favor of DVCS I've heard to date. DVCS saves a few hours off of the untold hours needed to dissolve a business?
If this is your workflow, then you're not developing software. You're hacking.

There's nothing wrong with hacking, but it's a different world than commercial software development.

If I remember correctly, dividing up the company took less effort than setting up an SVN archive would have. Less working time, too.

But the point was not leaving the company; the point was easy forking and merging of projects. It could happen even without a company to start with :)

Source Control Done Right
