Academia in Real World Development
Frans Bouma has decided to unsubscribe from the alt.net mailing lists. I'll miss him there. I usually disagree with him, but the debates are interesting, educated and fun.
What I want to talk about today is something that he pointed out in the post:
What surprised me to no end was the total lack of any reference/debate about computer science research, papers etc. except perhaps pre/post conditions but only in the form of spec#, not in the form of CS research. Almost all the debates focused on tools and their direct techniques, not the computer science behind them. In general asking 'Why' wasn't answered with: "research has shown that..." but with replies which were pointing at techniques, tools and patterns, not the reasoning behind these tools, techniques and patterns.
I waited to respond to his post until I could formulate a coherent answer, and I think that this quote sums it up pretty well:
“Computer science education cannot make anybody an expert programmer any more than studying brushes and pigment can make somebody an expert painter.”
(Eric Raymond)
The problem is that there is a big gap between academia and real world development.
Finding a path in a graph? Designing a compiler? Analyzing an image? Choosing a search algorithm? Selecting an appropriate data structure for a task?
For each of those I would head for academia, directly. Those are technical issues, and I want the academic proofs and experience there. I want the mathematical foundation for the solution.
Designing a maintainable system? Building a usable framework? Creating the domain model?
For those I am not going to go to academia. I am going to go to the real world practitioners, the guys (and gals) who have been burned in the field and learned from their mistakes.
Building a highly scalable system? Designing for scalability?
For those I am going to head to the papers by Amazon, eBay, etc. The people who are actually dealing with this complexity and can share how things break down at a high enough scale.
You can take a look at Java's API issues if you want to see what happens if you listen too closely to academia. Hell, just take a look at SQL Server 2005's paging feature, to see just how complex an academic solution can make life.
For most real world situations, I want real world experience, because 90% of software development is not science, it is an art, and I am not interested in discussion about the chemical composition of the pigments when I examine a masterpiece.
Comments
Oren,
As they say, "In theory there is no difference between theory and practice."
However, I believe that field-burned practitioners are not always the best people to turn to when you need some complex system. The analogy here would be: you do not normally ask experienced war veterans about the best strategy to win the war. At most they can tell you about the tactics. Even a fresh graduate from the military academy could do better with the strategy, because they know the logic and have had a systematic education.
Obviously, an experienced strategist is better, when you can get one.
Rinat Abdullin
A veteran would know what doesn't work, however.
They would know to consider logistics and morale, to take into account the weather and Murphy.
Precisely.
They know “what does not work” since that’s what they’ve been burnt by. That’s development tactics. And the development strategists (academicians) hold the systematic and aggregated knowledge about “things that do work” and know that the weather generally does not matter.
There is a great difference between school room strategy and real experience.
And the best strategy is composed of a lot of tactics. If they do not work in the concrete case, you have lost.
I agree that you’d better have both worlds to win.
And that in order to maximize the probability of efficiently winning at the development of a framework or a complex maintainable system, one would need more than just real world practitioners.
Software Engineering is an important field of study, and has been for a long time. Pretty much everything you do as a programmer today has been invented by academia at some point (and sometimes then later “reinvented” in the field). The real problem, I’d say, is that so very many of today's programmers are ignorant and keep insisting on reinventing the square wheel (often foolishly believing they actually invented something), instead of acquiring already existing knowledge from academia. It’s almost as crazy as if 90% of all medical students conducted primitive experiments on patients in order to learn what works and what doesn’t. (Killing or hurting a computer program isn’t nearly as bad of course, but it’s still quite unnecessary, not to say expensive.)
What academic CS texts would you all recommend to a young programmer?
Argument based on opinion is bad kung-fu; alt.net is full of fanboys (not all though) that use this faulty logic to formulate their arguments about why a particular framework, idiom, or whatever is the best and why another isn't. I admit I read many of their posts....mostly it's for petty amusement (how many logical fallacies can you possibly fit in a single post?). It's kind of like reading a tabloid newspaper.
Real world practitioners possess a skill set that's different from pure academia...in their genre they are the experts. However, no paper that bears my name would ever cite a reference to a post unless the purpose was purely to express an opinionated view....not to cite a fact; for that I would cite a reference based on scientific fact (academia) and from a peer respected source.
CS does matter. I realized that after 7+ years of real time development experience. Without a solid foundation in the language, algorithms and even data structures, I cannot believe anyone can create a maintainable application. I know that basic stuff is not as "COOL" as Agile practice. Right now, powerful tools such as Visual Studio make it so easy to create a functional application. That lowers the bar to becoming a programmer, so people can skip a solid CS education. I may not agree with Frans all of the time, but I think he made a very good and important point here.
I am a self taught programmer. It's been ten years since I started and I consider myself considerably accomplished. However, I recognize there may be holes in my knowledge due to the fact that I never studied CS. However, I don't know what these holes are. So, for those who know about these things, in what areas do you find that most non-CS degree programmers fall short? What texts could they read to help them improve their programming in a practical sense?
Not having a background in CS does not imply that you cannot be an excellent developer; having one just ensures that you should have been exposed to the following: data structures, machine language, CPU architecture, algorithms, compilers and OS. Good developers make a concerted effort to separate themselves from the pack.
I have worked with remarkable developers that have no formal education and I have worked with horrible developers with amazing credentials. What separates the two seems to be related to the amount of work they (the good) put into continuous learning and striving to become better at their craft.
To be honest, I would say the things I see wrong in most developers are: a lack of problem solving skills, laziness, a lack of passion, non-professionalism, a lack of type system fundamentals, and the inability to do anything but code by Google. It makes me sick when people complain about having to keep their skillset up-to-date...or read for that matter.
"For most real world situations, I want real world experience, because 90% of software development is not science, it is an art, and I am not interested in discussion about the chemical composition of the pigments when I examine a masterpiece."
That's circular reasoning: the result seems also to be the cause.
Let me give you an example of how wrong your assumption is.
One of my best friends will soon start his M.Sc graduation research project. It's about finding formal definitions for ways to refactor code by AST graph (it's actually a tree) transformations. And not on the AST from C#, but on the AST from their own parser, which can parse ANY language into an AST (and back).
I'm not talking about refactorings like rename a field, but refactorings like 'find clones in my code and reformulate them into a generic method', or 'reform this switch/case with a strategy pattern implementation'.
The project this research project is part of is a long running project at the CWI of the university of Amsterdam where they are researching ways to transform 10 million lines of C code of ASML into another language by AST transformations. What they're doing is finding ways to understand code so they can transform it. And they make good progress.
What does this mean? Well, it will in the end affect how YOU write software. Big time.
Closing your eyes to this kind of stuff is IMHO naive: it means you believe that knowledge learned in the editor means wisdom of how software should be built. But that's not the case. It LOOKS like you don't need the academics but that's not true. The CWI defined the service bus in the early nineties, before 'SOA' even existed. But because most people in our craft ignore academics, it seems things like SOA are invented in the field by some clever bookwriter. :)
"However, I recognize there may be holes in my knowledge due to the fact that I never studied CS. However, I don't know what these holes are. So, for those who know about these things, in what areas do you find that most non-CS degree programmers fall short? What texts could they read to help them improve their programming in a practical sense?"
Most non-CS schooled developers have a bag of tricks to overcome daily problems. These tricks often work OK. The thing is that if you are faced with a problem to overcome, you shouldn't fall for the first thing that pops in your mind and which is very tailored towards the specific situation at hand: you should take a step back and analyse if what you're facing is unique or a form of a general problem. 10 to 1 it's the latter. And thus it has a general solution, with a general algorithm and general datastructures.
The advantage of that is that you can re-use those algorithms and datastructures, because they're generic: a Fibonacci heap works with any datastructure you dump into it. Developers accept this for a sort algorithm like quicksort, but they often fail to understand that the same holds for way more algorithms.
What I always advise people who haven't got any formal training in CS but have spent years writing software is to read one of the books by Sedgewick or Knuth. These books describe general algorithms, with code. The code should be seen as an illustration of how to implement such an algorithm. In general these books teach you how things are solved fundamentally, with generic ideas.
The beauty is: 1) you work with proven algorithms, so the only place you could fail is a bug in the implementation (which is often easy to spot) but not in the algorithm! 2) you can re-use implementations of these algorithms as they're generic.
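To make the reuse point concrete, here is a minimal Python sketch of a generic priority queue. It is only an illustration: it uses the standard library's `heapq` (a binary heap, not a Fibonacci heap), and the names `PriorityQueue` and `drain` are made up for this example. The point it tries to show is that the same implementation accepts any payload without change.

```python
import heapq
from itertools import count


class PriorityQueue:
    """Generic min-priority queue on top of heapq (a binary heap,
    swapped in here for the Fibonacci heap mentioned in the text)."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so payloads are never compared with each other,
        # which lets the queue hold dicts, tuples, strings, anything.
        self._order = count()

    def push(self, priority, item):
        heapq.heappush(self._heap, (priority, next(self._order), item))

    def pop(self):
        _priority, _order, item = heapq.heappop(self._heap)
        return item

    def __len__(self):
        return len(self._heap)


def drain(queue):
    """Pop everything, lowest priority value first."""
    items = []
    while len(queue):
        items.append(queue.pop())
    return items


if __name__ == "__main__":
    q = PriorityQueue()
    q.push(2, "a plain string")
    q.push(1, {"kind": "dict payload"})
    q.push(3, ("tuple", "payload"))
    print(drain(q))
```

The queue never inspects the payload, which is what makes it reusable: only the priority and the insertion order decide what comes out next.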
Frans,
A few days ago I built an analyzer to tell me which fields I am using in incoming messages, in order to dynamically optimize the load of the server based on the current usage.
I am well familiar with the concepts that you are talking about. A generic AST seems to me like a bad idea, you can't express the same concepts of a functional language with an AST meant for a procedural or OO one.
Hell, compare the AST of Java and C#, two distinctly similar languages. How do you translate anonymous methods and classes.
Regardless of this meta AST, once you have things in AST form, you can do transformations on it. You mention 10 million LOC in C.
That will make it much harder, since you need to take into account all the tricks that C/C++ programmers can and will play.
Interesting topic for research, most certainly.
Switch/case -> strategy is a fairly simple refactoring, btw.
Frans,
Just to be clear. Real world development should be based on a solid foundation.
Understanding data structures and algorithms is key to creating good software.
If you can't understand those, you are going to have a hard time creating good solutions.
"I am well familiar with the concepts that you are talking about. A generic AST seems to me like a bad idea, you can't express the same concepts of a functional language with an AST meant for a procedural or OO one.
Hell, compare the AST of Java and C#, two distinctly similar languages. How do you translate anonymous methods and classes."
I don't think you're as familiar with the concepts as you think you are, judging by the above statements. An AST is just a tree. However, if you can parse any language into an AST, you can do refactorings across languages. Which they can do: for example, refactor code across a complete call chain from Java to Python. All you need is grammar x -> generic grammar and you're done. Note: I'm not part of that project, nor have I been, so I don't know the inner details of what they're doing; however, once I saw it, it was amazing. I suspect that they're doing a more generic version of what Intentional Software is doing as well, namely parse languages to a generic language in the form of an AST, then do operations on that AST, and then convert that AST back into a language.
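The parse, transform, convert-back cycle can be sketched in a few lines for a single language. This is not the CWI tooling and says nothing about a cross-language generic AST; it is a small Python illustration using the standard `ast` module (`ast.unparse` needs Python 3.9 or later). The names `RenameName` and `rename_variable` are made up for this example, and the transformation deliberately ignores function parameters, attributes and scoping.

```python
import ast


class RenameName(ast.NodeTransformer):
    """Rewrite every plain identifier `old` into `new`.

    Only ast.Name nodes are touched; function parameters, attributes
    and scoping rules are ignored to keep the sketch small.
    """

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_Name(self, node):
        if node.id == self.old:
            renamed = ast.Name(id=self.new, ctx=node.ctx)
            return ast.copy_location(renamed, node)
        return node


def rename_variable(source, old, new):
    tree = ast.parse(source)                  # text -> tree
    tree = RenameName(old, new).visit(tree)   # operate on the tree
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)                  # tree -> text


if __name__ == "__main__":
    code = "total = price * qty\nresult = total + 1"
    print(rename_variable(code, "total", "amount"))
```

Even this toy shows where the hard part lives: the tree walk is trivial, while deciding which occurrences are semantically the same variable is not.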
"Switch/case -> strategy is a fairly simple refactoring, btw."
And this generic way of refactoring you are proposing works in _all_ cases? We're not talking about the steps you'll take to refactor it by hand, knowing the slices of your program you'll affect. We're talking about a way to refactor source code such that it is understood by the refactoring engine, so it can do it without any help. If that's so simple, please write a paper about it so we all benefit from it, but trust me, it's not that simple. :)
The C code is one real life example of the theory they're after: soon, there won't be enough software engineers to maintain all the software across the world. We need more advanced tools to maintain code which is perhaps very old and no-one knows how it works. With sophisticated tools which are guaranteed to work because they're based on proven theory, one should be able to convert a big pile of code in language ABC into another language which matches more closely the interests of the current group of engineers employed. And, for example you can weed out clones etc.
I'd like to point you to a paper which is related to this and then I'll stop bugging you about it ;). Skip the formal formulae and graphs, but I think it will interest you :) (I linked to a version which isn't behind the ACM wall)
http://prog.vub.ac.be/~ttourwe/articles/icsm2004.pdf
I forgot the URL to the environment I was talking about:
http://www.cwi.nl/htbin/sen1/twiki/bin/view/Meta-Environment
They've a couple of videos where you can see how the program text is represented in the tree.
I am usually uncomfortable around generic Xyz.
Either it is the lowest common denominator, which makes it a PITA to work with, or it is so broad as to be useless.
The example that I gave makes the case, I think. Translating between Java and C# when using anonymous classes and anonymous delegates.
Um, you need to capture the current context of the case clause, translate that to the Execute() method, and you are done.
Nothing really new or exciting here. It is taking the same semantic analysis that compilers do for closures and sticking the new Genclass(context).Execute() in the right place.
Doing this across all usages of a particular switch (say, an enum) would be more challenging, but not drastically so.
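As a hand-written illustration of the shape of that refactoring (not the output of any refactoring engine), here is a small Python sketch. The shipping example and every name in it (`shipping_cost_switch`, `StandardShipping`, `ExpressShipping`, `STRATEGIES`, `shipping_cost_strategy`) are hypothetical. Each branch body moves into a class whose constructor captures the context the branch used, and whose `execute()` method holds the branch code.

```python
def shipping_cost_switch(kind, weight):
    """Before: one branch per case, all sharing the local context."""
    if kind == "standard":
        return 5.0 + 0.5 * weight
    elif kind == "express":
        return 10.0 + 1.0 * weight
    raise ValueError(f"unknown shipping kind: {kind}")


class StandardShipping:
    def __init__(self, weight):
        self.weight = weight  # captured context of the case clause

    def execute(self):
        return 5.0 + 0.5 * self.weight


class ExpressShipping:
    def __init__(self, weight):
        self.weight = weight

    def execute(self):
        return 10.0 + 1.0 * self.weight


# The switch itself collapses into a lookup table.
STRATEGIES = {
    "standard": StandardShipping,
    "express": ExpressShipping,
}


def shipping_cost_strategy(kind, weight):
    """After: look up the strategy, hand it the context, execute it."""
    try:
        strategy_cls = STRATEGIES[kind]
    except KeyError:
        raise ValueError(f"unknown shipping kind: {kind}") from None
    return strategy_cls(weight).execute()


if __name__ == "__main__":
    for kind in ("standard", "express"):
        print(kind, shipping_cost_switch(kind, 4), shipping_cost_strategy(kind, 4))
```

Here the captured context is a single value that is only read. Branches that assign to locals, fall through, or return early from the enclosing method are where an automated version has more work to do.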
Allow me to be doubtful of the usability of such a scenario.
It is not a programming language issue. It is usually an environment issue.
As a simple example, let us say I have this C code that I want to convert to Java:
UBYTE b = c << 3;
Can you do this safely?
I need to consider signing issues, for a start.
What about libraries? The semantics of those libraries?
Sorry, doing the language translation is the easy part here. Doing translation that would actually work would be much harder.
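The signedness problem can be shown with a little arithmetic. The Python sketch below only emulates the two languages, under two assumptions: that the type in the C line above is an unsigned 8-bit byte, and that the naive Java translation stores the result in a Java `byte`, which is signed. The function names are made up for this example.

```python
def c_unsigned_byte_shift(c):
    """Emulate the C assignment with an unsigned 8-bit target (assumed):
    c is promoted to int, shifted, then truncated to the low 8 bits."""
    return (c << 3) & 0xFF


def java_byte_shift(c):
    """Emulate Java's `byte b = (byte) (c << 3);`.
    The same low 8 bits are kept, but byte is signed (-128..127)."""
    low = (c << 3) & 0xFF
    return low - 256 if low >= 128 else low


def java_byte_as_unsigned(b):
    """The usual Java workaround: mask with 0xFF when reading the byte."""
    return b & 0xFF


if __name__ == "__main__":
    for c in (1, 31):
        print(c, c_unsigned_byte_shift(c), java_byte_shift(c))
```

For an input of 1 both versions store 8. For an input of 31 the C version stores 248 while the Java byte holds -8: the bit pattern is identical, but any later comparison, division or widening to int behaves differently unless every read is masked.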
"What academic CS texts would you all recomend to a young programmer?"
Start with one of the best books ever written, fiction or non-fiction:
Structure and Interpretation of Computer Programs, 2nd Edition
by Harold Abelson and Gerald Jay Sussman
None other than Paul Graham (read his essay Hackers and Painters http://www.paulgraham.com/hp.html which is relevant to the current discussion) gives it 5 stars.
Victor.
This is a very unhappy debate you're having here. There's value in both CS and real world-based approaches, they just have their strengths on different levels. Being able to find patterns in problems that have already been solved by CS is certainly a good thing, but not all problems are algorithmic.
Where things get complex, CS approaches leave the path of exact science and start getting empirical. It's like moving from accounting to economics: no matter how much dead wood you produce, this stuff is too complex to be nailed down, and just like practitioners, different scientists will reach different, even opposed views. Important decisions are often informed by scientific models, but ultimately made by practitioners.
So we need both. Whether a certain problem requires understanding of CS, or just knowledge of the tools that CS-aware people produced, is not something that you can decide in general. Some problems benefit more from CS than others. Many real-world problems are best solved using real-world experience. If we discuss this on a general level, there's not much chance of any agreement. It will just go on forever, like ORM vs. stored procedures, dynamic vs. statically typed languages etc.
It's a good idea to involve CS experts in many discussions, but in some fields CS, like any science, is just as opinionated and trend-driven as today's blogosphere talk. Show me a mathematical analysis of an algorithm's performance, and I'll believe it. Show me a scientific study about what works in practice, and I'll immediately question your background, your methods, your values and your experiences.
Stefan, very good points.
This discussion is as if two people stand on the same bridge, and neither passes the other.
I see good points both in Frans' and in Ayende's arguments. You cannot neglect one or the other, both are needed and good.
Why are we subclassing people? If you're in the situation where you need to get a solution out, you work out how to do it and get it out. If you're in a situation where you need to research for a paper, you sit down and nut it out, then write the paper. This may be oversimplification, but who here doesn't back themselves under all conditions?? And if you don't want to do that, then why are you arguing over which is best? Because you'll probably never try anything different to what you're doing now anyway.
We're also assuming that at some point in the future our mindset won't shift. Real world experience will back up anyone who goes into a lab to do research, academic experience will assist anyone who goes out into the world and gets a contract...
That being said, if you want to do anything well, you need to learn the required techniques. Same is true in surfing, singing, guitar, programming... The more you practise the techniques, the better you'll get at the art. Some pick it up quicker than others, some need to work with it, some study it. Who cares, we all eventually adapt to wherever our passions lie.
because we have not yet learned to value composition over inheritance?
" > Why are we subclassing people?
because we have not yet learned to value composition over inheritance?"
Yeah...management by Frankenstein. "Team - today I will compose the perfect programmer! Oren, I'm gonna need your fingers...hey Frans, going anywhere with those eyes?"
/Mats
... and who said you need two complete bodies for pair programming? an extra head oughta do. now here we could use some serious science, frans!
Do you have any idea how many times I encountered code where people had used bubble sort? Working perfectly for the 20-50 items they were sorting while developing, and then crashing catastrophically when the items to sort (e.g. in a web GUI using VBScript) were already in the tens of thousands.
Someone with some clue about CS would have never thought to use bubble sort!
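The gap between the development data and the production data is easy to put in numbers. The sketch below is in Python rather than VBScript and is an illustration, not the code from any of those systems. It is a plain bubble sort that counts its comparisons, plus a hypothetical helper, `bubble_comparisons`, giving the closed form n(n-1)/2 for this variant.

```python
def bubble_sort(items):
    """Plain bubble sort. Returns (sorted_copy, number_of_comparisons)."""
    data = list(items)
    comparisons = 0
    n = len(data)
    for i in range(n - 1):
        # After pass i, the last i elements are already in place.
        for j in range(n - 1 - i):
            comparisons += 1
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return data, comparisons


def bubble_comparisons(n):
    """Comparisons made by the variant above on n items: n(n-1)/2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    small = list(range(50, 0, -1))
    result, counted = bubble_sort(small)
    print(result == sorted(small), counted, bubble_comparisons(50))
    print(bubble_comparisons(20_000))
```

At 50 items that is 1,225 comparisons, which nobody notices. At 20,000 items it is about 200 million, while an O(n log n) sort such as the one behind Python's built-in `sorted` needs on the order of a few hundred thousand.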