Source control is not a feature you can postpone to vNext
I was taking part in a session at the MVP Summit today, and I came out of it absolutely shocked and bitterly disappointed with the product under discussion. I am not sure if I can talk about it or not, so we will skip the name and the purpose. I have several issues with the product itself and its vision, but that is beside the point I am trying to make now.
What really bothered me is the utter ignorance of a critical requirement from Microsoft, who is supposed to know what they are doing with software development. That requirement is source control.
- Source control is not a feature
- Source control is a mandatory requirement
The main issue is that the product uses XML files as its serialization format. Those files are not meant for human consumption, but should be used only through a tool. The major problem here is that no one took source control into consideration when designing those XML files, so they are unmergable.
Let me give you a simple scenario:
- Developer A makes a change using the tool, let us say that he is modifying an attribute on an object.
- Developer B makes a change using the tool, let us say that he is modifying a different attribute on a different object.
The result?
Whoever tries to commit last will get an error; the file was already updated by someone else. Usually in such situations you simply merge the two versions together and don't worry about it.
The problem is that this XML file is implemented in such a way that each time you save it, a whole bunch of stuff gets moved around, all sorts of unrelated things change, etc. In short, even a very minor change causes a significant change in the underlying XML.
You can see this in products that are shipping today, like SSIS, WF, DSL Toolkit, etc.
The problem is that when you try to merge, you have too many unrelated changes, which completely defeat the purpose of merging.
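To make the failure mode concrete, here is a small sketch (with invented file contents, not the actual product's format) of why a serializer that shuffles things on every save defeats a line-based merge: a one-value edit shows up as a rewrite of every line the tool happened to move.

```python
# Sketch: a one-attribute edit becomes a many-line diff when the tool
# also reorders elements and attributes on save. File contents are invented.
import difflib

# The file as developer A last committed it.
v1 = """<pipeline>
  <task id="extract" timeout="30" retries="3" />
  <task id="transform" timeout="60" retries="1" />
  <task id="load" timeout="90" retries="5" />
</pipeline>""".splitlines()

# Developer B changed ONE value (load timeout 90 -> 120), but the tool
# rewrote the file, shuffling element order and attribute order.
v2 = """<pipeline>
  <task retries="1" id="transform" timeout="60" />
  <task retries="5" timeout="120" id="load" />
  <task retries="3" id="extract" timeout="30" />
</pipeline>""".splitlines()

diff = difflib.unified_diff(v1, v2, lineterm="")
changed = [l for l in diff
           if l[:1] in "+-" and not l.startswith(("+++", "---"))]
print(len(changed))  # every <task> line differs: 6 changed lines for 1 edit
```

A line-based merge tool sees every `<task>` line as conflicting, even though developer B made a single logical change. That is the "minor change, huge diff" effect in miniature.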
This, in turn, means that you lose the ability to work in a team environment. This product is supposed to be aimed at big companies, but it can't support a team of more than one! To make things worse, when I brought up this issue, the answer was something along the lines of: "Yes, we know about this issue, but you can avoid it using exclusive checkouts."
At that point, I am not really sure what to say. Merges happen not just when two developers modify the same file; merges also happen when you have branches. As a simple scenario, I have a development branch and a production branch. Fixing a bug in the production branch requires touching this XML file. But if I have made any change to it on the development branch, you can't merge that. What happens if I use a feature branch? Or version branches?
Not considering the implications of something as basic as source control is amateurish in the extreme. Repeating the same mistake, over and over again, across multiple products, despite customer feedback on how awful this is and how much it hurts the developers who are going to use it, shows contempt for the end developers, and is a sign of an even more serious issue: the team isn't dogfooding the product. Not in any real capacity. Otherwise, they would have noticed the issue much sooner in the lifetime of the product, with enough time to actually fix it.
As it was, I was told that there is nothing to be done for the v1 release, which puts the fix (at best) two years or so out. For something that is a complete deal breaker for any serious development.
I have run into merge issues with SSIS that caused us to drop days of work and recreate everything from scratch, costing us something on the order of two weeks. I know of people who have had the same issue with WF, and from my experiments, the DSL Toolkit has the exact same issue. The SSIS issues were initially reported in 2005, but are not going to be fixed for 2008 (or so I heard from public sources), which puts the nearest fix for something as basic as getting source control right at 2-3 years away.
The same goes for the product I am talking about here. I don't really get it; does Microsoft think that source control isn't an important issue? They keep repeating this critically serious mistake!
For me, this is unprofessional behavior of the first degree.
Deal breaker, buh-bye.
Comments
Don't tell me it's SourceSafe
We have the same problem with another Microsoft product - Visual Studio 2005. The contents of the .vcproj files used to store project details get moved around randomly every time a change is made. It makes merging changes automatically impossible.
Is the entity framework designer that bad? ;)
Btw, NHibernate XML mapping files (where everything is in 1 mapping file) have the same issue IMHO.
It's related to a file where references are stored between elements in that same file. You can only solve that by using some sort of text-based DSL, as source control merge algorithms are good at merging those.
The thing with XML is that it's easier to write a parser for an XML format (as you can get away with serializers) than to write a DSL.
I would love to hear a little more context on this subject but I suppose that when the product you are talking about hits the market, it will probably be rather obvious :)
@Tom: Ouch, I didn't know that about vcproj files. Is 2008 any better?
Linq to SQL does not seem to have this problem so it must be entity framework.
Frans: Why are you storing everything in 1 mapping file? We use 1 file per entity. Also, I don't see how NH mapping files have this problem. They are edited by hand and don't randomly change and move stuff.
Is this an MS problem, a VSTS problem, an X problem (where X is a program whose output is in xml format and is under source control) or is it the nature of the XML beast?
Which is it?
Clearly XML doesn't force you to place tags in a specific order. So, wouldn't this be an issue with any product on any platform with any source control?
We don't need no education.
We don't need no source control...
@el Guapo: 1 mapping file could be a choice for people with hundreds of entities perhaps. Otherwise you'll have xml files all over the place.
That they're edited by hand is no excuse. 300 entity definitions which have references to each other all over the place make it hard to version; especially with larger sets, changes can be bigger and happen more often than when a simple name change is performed.
Oh, I know what you are talking about...
If configuration were coded in Boo, it would be much easier to merge files.
Right?
I would guess the product is called Biztalk.
The problem is not that it's an XML file, it's the fact that when you change something the XML file structure gets reformatted, hence the conflict.
I would guess the product is whatever has a VISUAL DESIGNER.
It is almost impossible to show the delta or facilitate a merge visually, and if you look under the hood, nothing is human readable.
Ayende, I wrote a post kinda like this, about the minimum requirements for producing good software. I think it's even more than JUST source control.
To avoid being accused of link-whoring, I'll post the URL through tinyurl:
http://tinyurl.com/6bclpl
I think source control, continuous integration, etc all come back to 'repeatability'. If your software can't be found, built, and tested repeatably by someone on a PC other than yours, that's very bad!
The nHibernate mapping file argument doesn't hold up. XML is (or should be) easy to merge using an XML diff tool. The point being made, which I would imagine is exactly the same problem as with SSIS today, is that the tool doesn't just "change" an attribute. It just serialises a load of "stuff" into an XML file that isn't structured very well, so there's no way a diff tool can do a good job. I can't see why you would ever completely re-order and re-structure an nHibernate mapping file and then try to merge it. Usually you will change a value, add a node, remove a node ... all sorts of things that are easy to merge.
A second problem with the SSIS file format is that it isn't just XML, it's actually serialised XML within an xml file. No way you can merge that correctly. So Alex, I don't think the argument that if it was Boo it would be better is a good one either. If it was Boo, and the software would re-organise the structure and add loads of meta data in random places each time you hit save, then you would end up with the same problem!
Nothing wrong with using an XML file as a source in itself (in terms of source control, I don't want to get into the XML vs programming argument), but keeping source control in mind when implementing it is crucial.
So, I agree 100%, you can't design a development tool and not think of these issues and still expect to be taken seriously. It's a deal breaker for me too.
Merge engines in source control generally look for contextual changes in text files. While XML is text, it really is more of a structured data format.
How hard would it be to write something that looked at two XML data structures and noted the differences? It'd be like comparing tables in a database. Difficult, but not impossible.
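A rough sketch of that idea, under the assumption that elements carry a stable `id` attribute to match on (the tag names and attributes here are invented): compare the two documents as trees, pairing children by identity rather than by position, so pure reordering produces no differences at all.

```python
# Sketch of a structural XML diff: match elements by (tag, id) instead of
# by line position, so reordering is invisible and only real changes show.
import xml.etree.ElementTree as ET

def structural_diff(a, b, path=""):
    """Return (path, old, new) tuples for attribute-level differences."""
    here = f"{path}/{a.tag}"
    diffs = []
    for name in sorted(set(a.attrib) | set(b.attrib)):
        if a.attrib.get(name) != b.attrib.get(name):
            diffs.append((f"{here}@{name}",
                          a.attrib.get(name), b.attrib.get(name)))
    key = lambda el: (el.tag, el.attrib.get("id"))  # identity, not position
    kids_a = {key(c): c for c in a}
    kids_b = {key(c): c for c in b}
    for k in sorted(kids_a.keys() & kids_b.keys()):
        diffs.extend(structural_diff(kids_a[k], kids_b[k], here))
    return diffs

one = ET.fromstring('<root><task id="load" timeout="90"/>'
                    '<task id="extract" timeout="30"/></root>')
two = ET.fromstring('<root><task id="extract" timeout="30"/>'
                    '<task id="load" timeout="120"/></root>')

print(structural_diff(one, two))  # [('/root/task@timeout', '90', '120')]
```

A real tool would also have to report added and removed elements (keys present on only one side), and elements without a stable identity make the matching much harder, which is exactly why designer formats that lack stable ids are so painful to diff.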
Brian Harry strikes again. Checkout-Edit-Checkin is a far worse model than Edit-Merge-Commit, and you can't just shoehorn Edit-Merge-Commit on top of Checkout-Edit-Checkin in V2 and call it good, which is exactly the decision Brian made. They should not have put the VSS guy in charge of TFS source control, but they did. Of course they were already used to Source Depot (modified Perforce), so they didn't know any better regarding Edit-Merge-Commit, but still, the fact that they chose to forego source control entirely (WOW!) rather than use the model they knew (and hated) is a glaringly obvious sign that Checkout-Edit-Checkin is too heavyweight.
@ Oran: I don't think the Edit-Merge-Commit model would fix the problem that Ayende is ranting about. You still may have to merge two wildly different versions together at some point. Imagine that you have a team of developers working together in an Edit-Merge-Commit source control tool, but every time any one of them makes any kind of change, they randomly reorder all of the functions in the file. Somebody is going to feel some merge pain.
I agree that this sort of 'semi-structured' XML (for lack of a better term) has the potential to cause a TON of friction. I experienced this at my last job with SSAS (Analysis Services). I made a few changes to the cube to get my MDX queries to work, while my boss was simultaneously making major cube changes to get something else to work (unbeknownst to me). We discussed it over the phone and decided there shouldn't be a problem, since our changes didn't overlap. (I changed completely separate dimensions/measures than he did.) Accordingly he checked in his changes, and then I tried to check in mine, expecting Subversion to be able to merge the two. Unfortunately, I got that desperate-sounding "merge conflict" sound from TortoiseSVN and then was horrified to discover that there were a bazillion strange little differences between my boss's version of the cube (SSAS-specific XML) and mine, because of this exact type of random XML re-ordering that you're describing. I then proceeded to spend half a day re-implementing my changes on top of his, rather than try to merge the two together. The horror! What were they thinking?
That is the exact reason I've stopped trusting and using .resx files. I'm developing my own layout file format, where at least it won't shuffle the file all over the place, making me lose entire days of work during a merge.
It ain't just Microsoft, throw corporate America, Google, Sun, and open source in there too.
The programmers at these outfits aren't any better or worse than anyone else - we just wish they would be better.
Bottom line, protect yourself with comments, asserts, logs, and source control. To heck with the other guy because no doubt you'll be straightening out his mess, but he won't be working on your mess because there is no mess.
Darius,
No, it is not a source control system. It is a product that cannot be used in source control.
Tom,
There is a LOT of stuff from MS that does that. This is broken, period.
Frans,
NH mapping files are not randomly shuffled each time you edit them. If you edit something, it is a local modification.
If you edit something using the MS approach, 70% of the file has changed.
This is not an issue with XML, it is an issue with the way they are using XML.
El,
About Linq to SQL, no, it doesn't have this issue.
Will,
This is a problem in a lot of MS products. It is specifically an issue with a product that I saw that saves its files in a way that makes source control useless.
Frans,
In NH, you don't need to reference other entities in a way that is broken on each change.
If you make a modification, it is local and mergable.
I was working with a domain that had > 10,000 entities across ~700 databases, no issues.
Alex,
See the rest of the stories in the comments for clarifications. There is zero reason this can't work, it is just made broken.
Specifically, Driesie does a good job describing the problem.
Glyn,
Yes, that is the problem I am talking about.
Alex,
Yes, it has a designer.
Chad,
The first step, get your source control story right. This is about as basic as it can get.
After that, we can talk.
The other steve,
There is no issue with XML itself, but the way they save stuff is by randomly moving things around. A simple change affects the entire file.
Can't code review that, can't merge that, can't branch that. Broken.
Oran & Eric,
Actually, checkout / edit / commit was the proposed "solution" to this issue.
It doesn't work, because branches are actually useful.
Exactly, exclusive checkouts are not the right solution to their lack of foresight, but it's unfortunately the default solution most Microsofties will think of because that's the style of source control everyone uses there. Even if they dogfooded it, they would dogfood it with exclusive-checkout blinders on. :-(
@Ayende
"Alex, Yes, it has a designer."
Oh, perfect... Now answer the following question:
You have created a class diagram and checked it in. Two other developers checked out the diagram and moved boxes around (this doesn't affect code generation, so nothing changed but the designer file). Now they are checking in their files and are asked to merge their changes...
Assuming that they completely understand the file layout and it is not mixed up (sorted alphabetically :P ), how do they do it if one wants the base class on the left and the other wants the base class on the right (and I prefer them on top)?
Kind of a chicken and egg question.
If a language is designed to not be source control friendly, I don't understand why this is the source control system's fault.
I don't think source control is any friendlier to LISP or PROLOG or T-SQL language files.
Source control is especially nice to procedural languages like C, BASIC, PASCAL. (That's why CVS, Perforce, etc. are all line-based diff -- they are easier to implement.)
I think I am missing something here -- are you claiming that XML is also a procedural language, so source control systems must do the "same" job as they do for C, C++, Basic, Pascal?
The problem is that the designers use XAML; if you move an object on your diagram, the XAML is changed dramatically. Therefore when two developers try to merge their changes (e.g. the Linq to SQL designer) you have merge hell.
I chose on my latest project to just use sqlmetal.exe instead of the horrible designer.
Alex,
That issue is a classic conflict, you have to decide what to do.
And that scenario, FYI, completely breaks in that product.
Alex: probably the ordering of boxes in the designer (which has no implication to the generated classes) should be persisted as a per-user preference and not being versioned.
Even if it was
<diagram name="uml1">
<class name="entity">
</class>
</diagram>
<visual>
<position element="class[@name=entity]" x="0" y="250" />
</visual>
this should be easy to version. The problem arises with carelessly created XML (usually XML-serialized objects) that has no particular order and random formatting. If changing y="250" to y="251" leads to lots of line breaks and reordered class elements, this becomes unmergable.
@Jan
"should be persisted as a per-user preference"
Then solve this chicken/egg problem:
When you change the class diagram, the designer changes the generated classes; when you change the generated classes, the designer updates the diagram. All nice and peachy when you are on your own in Visual Studio. Now you have checked out the designer file and the generated classes from the SCM. How do you validate the checkout? Do you update the diagram or the classes?
@ Alex
I fail to see the problem, because the only part that could be out of sync is the visual metadata (because that is what is not versioned). So new objects would be placed at some default locations defined by the program, or removed if deleted.
Classes and diagrams should be in sync thanks to the diagramming tool (as you suggest). If not, it is the task of the designer to update itself from the concrete classes (the other way would be automatic, wouldn't it?).
Maybe I am missing something here...
Why do you still use .NET or Microsoft specific technologies then? Based on the Java SDK code that I've read, Sun and Java developers are far more competent.
@Jan
"So new objects would be placed on some default locations"
How do you know that new objects should be placed on the diagram in the first place? By having one diagram for all objects in the system?
"Classes and diagrams should be in sync..." Exactly. So, if an object is not on the diagram, should it be added to the diagram or deleted (because it was deleted from the diagram)? I use "diagram" as an example; you can imagine some UI designer that cannot be considered a secondary tool -- it is the primary tool to create something, but with limitations that force you to resort to code editing. (This is real life.)
Oh crap.......Not Oslo......Dam....Crap
Whatever.
Well, this always happens if you choose the "wizard" way of working. But I'm not sure a world without wizards would be a better place to live.
What can we do now? Use intelligent XML merge tools, if formatting and ordering is the only problem.
What can tool vendors do? Split files into multiple parts (fundamental content, layout stuff, other unimportant fluff) so we can merge only the fundamental part.
And yes, I must confirm that most companies work in checkout-edit-checkin mode...
Even if you use the checkout-edit-checkin scenario, what if a bug comes up? If it was a minor change which caused the issue, how do you find the minor change when 70% of the file has changed?
Makes me wonder what it would be like to compare a diff on a Word-generated OOXML file versus an OpenOffice-generated ODF file....
@Oren: every O/R mapping file has elements which refer to other elements by name. There can be a situation (e.g. with complex inheritance scenarios) where person A removes or changes elements in such a way that his file is still correct, but when merged with another file which is also a changed version of the SAME parent, you'll get merge conflicts.
That's what the gripe is all about, I think: a diff tool can only go so far before it has to give up, because it has to decide between two (or more) options and the user then has to fix it manually. With reshuffled XML this is hard to do. You won't sell me the story that making two copies of the same NHibernate mapping file and changing things at different places in these 2 files will never result in a merge conflict. That's not something bad about NHibernate; I just used it as an example. We use a binary file, which is also a pain in SCC. (We will therefore move to a text DSL soon.)
I think the main point is: HOW should XML be merged in an SCC system? I think Driesie gave a good hint: with an XML diff tool. So an SCC shouldn't treat XML files as text but as XML data, and should use the appropriate diff tool to merge them. After all, using an XML diff tool, merging XML is easy; with a text diff tool it can be a pain.
This goes further: merge conflicts aren't solvable in a text editor when the text is XML. Sure, simple name changes are, but if the XML differs a lot, especially when elements refer to other elements (a relation between entities based on a field which doesn't happen to be there in the merged data -> the relation has to go as well, but because that relation goes, the inheritance hierarchy has to go...), merge conflicts are only solvable with a tool made especially for XML merge conflicts, as it understands that the text at hand is XML, i.e. structured data, not random ASCII.
@Chris: I don't know - haven't installed it yet but I have the DVD ready for a rainy day. It's possible this doesn't affect PC developers but we're doing builds for multiple WinCE devices (using different SDKs) and it's a major pain. I also found a bug which corrupts the .vcproj file when you edit the preprocessor settings but Microsoft can't recreate it - odd how it happens to me on a daily basis!
The easy way to look at this problem is that there is an impedance mismatch between the visual designer, the XML representation of the visual design, and the text-file view that a source control system has of the XML. You can either put some smarts in the visual -> XML layer that preserves ordering within what is nominally non-order-specific data, or put some smarts in the XML -> source control layer. Both have their weaknesses. The contract for a visual designer's persistence is that it can correctly save and reload changes, not generally that it needs to preserve whatever changes have occurred external to the designer. Adding a merge-friendly contract to the persistence routines for a visual designer isn't a particularly easy thing to achieve (in my mind at least, and this is backed up by anecdotal evidence of SSIS, SSAS, ...). Using a flat file to represent structured data is the first impedance mismatch that needs to be addressed. Perhaps there is a data + delta format that would make more sense here?
Smarts in the source control is another way to handle this. Let's say in some source code, I make a change to a line in a method (for instance I correct an off-by-1 bug) and check in v2. Next I move that method before the previous method in the file and check in v3. Diffing v1 to v3 doesn't give me a real indication of the v1 to v2 change, as the v2 to v3 changes are more obvious and noticeable in the diff. Smarts in the source control's interpretation of the 'source' being stored would make that kind of situation more transparent. Do any source control systems do this at present?
It happens the same with VisualStudio .sln files :(
These people seem to be working with Visual SourceSafe + exclusive checkouts, damn...
Frans,
I am fine with XML files and merging them. And I am not saying that you'll never have merge conflicts.
The issue is what type of merge conflicts.
If changing a single property results in 70% of the file being changed, that is not mergable.
If changing a single property results in a single line being changed, that is mergable.
Ayende - do you have any ideas about the XML file merging problem?
Generate all XML files from plain text with NAnt, and keep only that plain text in SVN instead of the XML?
Or you could write a mega-super 2-way processor: XML -> plain text -> merge the plain texts from the "our" and "their" versions -> convert the merge back to XML?
How do you work with .csproj and .sln files, which are XML, during merging?
And thank you for that post; I'll never use XAML, due to this XML merging problem.
In a way XML helped a lot (even though version control tools cannot catch up), because the only remaining "comfortable" option for most vendors would be custom binary formats.
Another way again of looking at this is that the XML really stores two kinds of relevant information: data, in the form of elements, attribs etc., and presentation, in the form of whitespace and the ordering of elements, attributes etc. It is the presentation that is, by XML's very definition, open to interpretation. Given a particular schema and a particular set of data it is possible to find infinite ways to present that data in XML. Source control could take advantage of this to store a normative version of the data (e.g. all non-relevant whitespace stripped, attribs ordered alphabetically by name, elements similarly ordered when appropriate). Attached to this would be some form of transform (XSLT perhaps?) which would get us to the presentation that was last saved by the designer. This would be good for diffs between versions, and could be extensible enough to even solve code style arguments (K&R vs Allman).
I can't agree that XML helped a lot because the only remaining format is binary.
Ruby on Rails has the perfectly good YAML format, which is really simple and mergable.
Even the old [ini] format like
[section]
property=value
...
is more friendly to source control.
C# Programmer,
In what way is YAML more mergable than XML? If it's used for object graph serialization, it really doesn't matter whether you enclose strings in quotation marks or not.
INI is useless for anything more than simple app configuration. It cannot be used for complex data structures.
Joshua McKinney,
I agree. XML is more about data than about formatting. I really like your idea about employing additional formatting to support current text-based diff tools for XML data. XSLT is a nice way to achieve that. Just take the example of XML Schema: in an XML file with schema content, formatting is even less important. The ordering of type elements is ignored by the XML validator but very important for a text diff. So what if we sort all the types in a file in alphabetical order?
Would running an XML canonicalization transform over the XML file prior to check-in be a way to work around this issue? Naturally it will depend on the nature of the change (attribute re-ordering and whitespace noise canonicalization will handle, though re-ordered elements it will not). Either way it's unfortunate to be having to think about workarounds like this for something that is not out of the door yet.
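As a sketch of that workaround (the file contents below are invented), Python's standard library (3.8+) ships C14N 2.0 support: it normalizes attribute order and the whitespace between attributes, and drops comments by default, so two saves that differ only in those details compare as byte-identical. It does not reorder sibling elements, so a tool that shuffles elements would still defeat it.

```python
# Sketch: canonicalize before check-in so cosmetic differences vanish.
# C14N sorts attributes and normalizes whitespace between them, but does
# NOT reorder sibling elements.
from xml.etree.ElementTree import canonicalize

saved_by_machine_a = '<task timeout="30" id="extract"><!-- scratch --></task>'
saved_by_machine_b = '<task id="extract"    timeout="30"></task>'

norm_a = canonicalize(saved_by_machine_a)  # comments dropped by default
norm_b = canonicalize(saved_by_machine_b)

print(norm_a)
print(norm_a == norm_b)  # True: the two saves are now byte-identical
```

A pre-commit hook running such a normalization would at least neutralize the purely cosmetic churn, leaving real changes visible to an ordinary line-based diff.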
Can it be that we see an XML file as a text file and therefore want it to behave like one, while it is in fact an XML file and should be treated differently? In XML, the content hasn't changed if you put two spaces between two attributes instead of one, but as a text file it has. This could be seen as more of a problem of the tool performing the diff than a lack of "strictness" in the XML editor in use (note the "could" here ;) ). My point is that it's very easy to claim that a plain text file readable to us humans should be processed as a plain text file, period. But maybe we should rather use tools that tell us whether the content has really changed?
BTW: I already see the comments come flying :)