The war of dynamically typed languages (e.g. Python, Ruby, JavaScript) versus the statically typed languages (e.g. Java, C#) of the world has raged on and on for years; it's second in age and ferocity only to the war over runtime-vs-nonruntime languages (e.g. Java vs. C++). With the latter war fading into the background and each side choosing different parts of the battlefield to entrench themselves, the former has reignited recently in a few notable skirmishes.
Cedric Beust, a fellow Googler well known for his tireless work on TestNG, has written a blog article that points out something I think has been obvious to a lot of hardcore developers for a long time: that static typing isn't always such a bad thing.
The bottom line is actually fairly simple: nothing beats a dynamic language to write a prototype, but if you know that the code will have to be maintained and that it will be in use for several years, the initial cost of using a statically typed language is an investment that is very quickly amortized...
Fundamentally, Cedric raises what I think is one of the cornerstones that make strongly-typed languages so nice these days: refactoring. The ability to build syntax trees easily and perform predictable adjustments on those trees has been around for years; compilers have relied on what programmers now call refactoring for a very long time. A compiler can look at the structure of the written code, decide what depends on what, and perform a series of transformations on that code that produces exactly the same output when run, only more quickly. These same transforms are the cornerstone of modern refactoring - the ability for a developer to take a section of code, select it, and ask the editor to perform an operation on it: to extract it into a new class, or to inline it throughout the codebase. Rename it everywhere it appears, change its visibility, reorder its operands; whatever the developer wants, the editor can perform, because the editor has a way of verifying, with absolute and unquestionable certainty, that the code is syntactically sound both before and after the transformation, and that a purely syntactic change cannot introduce semantic changes in behaviour.
This ability - the ability to modify a code's structure through software, on command, without altering its meaning - is what allows developers to rapidly restructure existing code in preparation for adding new features, and to reorganise its structural representation without risking any change to the semantic meaning of the actions the code performs.
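To make the idea of a tree transformation concrete, here is a minimal sketch using Python's standard `ast` module. The function `total`, the names `acc` and `running_sum`, and the `RenameVariable` class are all made up for illustration; this is not how any particular IDE implements rename, just the general shape of parsing code into a tree, adjusting the tree, and checking that behaviour is unchanged.

```python
import ast

# Hypothetical snippet to refactor; "acc" is the name we want to change.
SOURCE = """
def total(prices):
    acc = 0
    for p in prices:
        acc = acc + p
    return acc
"""


class RenameVariable(ast.NodeTransformer):
    """Rename every plain-name occurrence of `old` to `new`.

    Deliberately naive: it ignores scoping, function parameters and
    attributes, which is exactly the information a real tool must have.
    """

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_Name(self, node):
        if node.id == self.old:
            return ast.copy_location(ast.Name(id=self.new, ctx=node.ctx), node)
        return node


def run_total(source, prices):
    """Execute the source text and call the `total` function it defines."""
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return namespace["total"](prices)


tree = ast.parse(SOURCE)
renamed_tree = ast.fix_missing_locations(RenameVariable("acc", "running_sum").visit(tree))
renamed_source = ast.unparse(renamed_tree)  # requires Python 3.9+

# Same inputs, same result: the structure changed, the behaviour did not.
before = run_total(SOURCE, [1, 2, 3])
after = run_total(renamed_source, [1, 2, 3])
print(renamed_source)
print(before, after)
```

The sketch only stays safe because the example is tiny. Once names can be shadowed, imported, or looked up on objects whose type is unknown until runtime, the transformer has no reliable way to tell which occurrences refer to the same thing, and that is where declared types start to earn their keep.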
One of the commenters, in opposition to his point of view, replies:
"A good test suite is way better than a compiler."
I'd posit that this is false, or at best a bad comparison. The two are fundamentally different things: static analysis finds syntactical errors - structural errors, discoverable at compile-time by compilers and other structural analysis systems - whereas a test suite looks for semantic errors, which exist only at the level of meaning.
As anyone who can read anything (in any human language) knows, it's fully possible to write something which is syntactically correct, and semantically garbage: perfectly formed code that does absolutely nothing, grammatically correct sentences that nonetheless say nothing intelligible. Static analysis - analysis of the structure, of the grammar - can never help you determine whether or not something is semantically correct, whether it means something. Meaning is not encoded in grammar.
Statically typed languages impose restrictions on developers; they demand that developers say not only that something exists, but also what that something is, structurally. Dynamically typed languages behave as though blind to the nature of the things they represent; they never ask you what the something you're referring to is, assuming that you know best what is in it because you put it there.
The problem I have with the commenter's statement is this: it is never the test suite's responsibility to find syntactical errors; a test suite can never help you test whether something is structurally sound. Technically, to be fair, it's the runtime exercising the code that discovers the "syntactical" errors when there is no compiler, when there is only runtime.
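A small illustration of that gap, with a made-up `average` function: the code below is well formed and its type annotations are consistent, so a structural check has nothing to complain about, yet it computes the wrong thing. Only a check on meaning, here the hypothetical `average_is_correct`, notices.

```python
def average(values: list[float]) -> float:
    """Intended to return the arithmetic mean of `values`."""
    # Semantic bug: multiplies by the count instead of dividing by it.
    # Nothing about the structure or the types reveals the mistake.
    return sum(values) * len(values)


def average_is_correct() -> bool:
    """A tiny semantic check: the mean of 2, 4 and 6 should be 4."""
    return average([2.0, 4.0, 6.0]) == 4.0


print(average([2.0, 4.0, 6.0]))
print(average_is_correct())
```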
What the commenter is really trying to point out is a situation where the code under test expected one thing and got another - an example of a situation that can't really arise in strongly typed languages. There is no syntactical error for this in the loosely-typed language; just a semantic one.
Thus, the problem: In strongly typed languages, these kinds of problems are simple syntactical issues, easily discoverable through static code analysis. A language with looser syntax converts these problems into semantic ones that can only be discovered through unit testing (or execution of the offending code).
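As a rough sketch of "expected one thing and got another", consider the following. The functions `apply_discount`, `checkout` and `apply_discount_typed` are invented for this example. The untyped version accepts a string without complaint, and the mistake only surfaces when the offending line actually executes.

```python
def apply_discount(price, percent):
    """Untyped: nothing states that `price` must be a number."""
    return price - price * percent / 100


def checkout(raw_price):
    """Hypothetical caller that forwards a value straight from a web form."""
    return apply_discount(raw_price, 10)


outcome = None
try:
    # "20" * 10 is legal (string repetition); dividing the result by 100 is not.
    checkout("20")
except TypeError as error:
    outcome = f"only found at runtime: {error}"

print(outcome)


def apply_discount_typed(price: float, percent: float) -> float:
    """Same logic with the expectation written into the signature."""
    return price - price * percent / 100


# Assumption: a static checker such as mypy, run over this file, would be
# expected to reject a call like apply_discount_typed("20", 10) without
# executing anything. Python itself does not enforce the annotations.
print(apply_discount_typed(20.0, 10.0))
```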
Now, that's slightly facetious, some will say. You can, for example, do complex static code analysis to find problems in dynamic languages. But to refer back to the older war between Java and C++, there will always be optimisations which can be easily performed in C++ which cannot be easily performed in Java because too many things have to pass through the runtime; C++ will always be easier to build optimisations for because it never has to deal with the runtime's typing system and open-ended constraints (late binding, realtime type introductions, etc.).
This argument also comes full circle into the newer war: Static code analysis tools will always be stronger, easier to write, and more informative for statically typed languages than for dynamic ones; for every leap that tools take in analysing dynamic ones, that same leap will have been taken sooner, and faster, in a strongly typed one. Whole subsets of analysis can be performed easily in strong typing environments that cannot be performed easily under dynamic typing. One problem is inherently easier than the other to solve; every leap dynamic languages make will be into a space already inhabited by their strongly-typed brethren, sometimes years beforehand. The battlefield may be new, but the same old arguments still work fine.
IMO, you may indeed end up writing more physical code to perform an action, or may be forced to do so less efficiently, but you do get a clear tradeoff - you get better static analysis, and a narrowing of the scope of semantic-issues-that-could-have-been-syntax-issues that now need additional test coverage. So to get back to the point of the comment: loosely typed languages demand larger test suites, and better test coverage, than strongly typed languages - because you now have to write test cases for something that everyone else gets for free through their syntax.
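To give a feel for what that extra coverage looks like, here is a sketch using the standard `unittest` module. The `count_words` function and both test cases are hypothetical. Neither test says anything about whether the word count is right; they only pin down what goes in and what comes out, which is the part a declared signature would otherwise state.

```python
import unittest


def count_words(text):
    """Hypothetical function under test: expects a string, returns an int."""
    return len(text.split())


class CountWordsContract(unittest.TestCase):
    """Tests that exist only to check the kinds of values involved."""

    def test_returns_an_int(self):
        self.assertIsInstance(count_words("a b c"), int)

    def test_rejects_a_non_string(self):
        # An int has no .split(), so the mistake shows up as AttributeError,
        # and only when this line is actually exercised.
        with self.assertRaises(AttributeError):
            count_words(42)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(CountWordsContract)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, result.wasSuccessful())
```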
Thus, at least when considering refactoring and optimisation (and other types of structural modifications to code made without altering semantic information), strongly typed languages will always have the lead. For that reason alone, they are better suited to long-lived applications; they encode the expectations and the knowledge of the developer in the code more effectively than their loose cousins.
How's that for controversy?
Comments (1)
Hi Gregory,
I think your first quote is dead on! And the more blogs, forums, etc. the more it's reinforced.
Basically for smaller projects, with smaller teams, dynamically typed languages give you a large initial burst in speed. It's almost like hitting the nitro button on a race car.
The disadvantage is that as the project grows, the nitro quickly fades and what's left is the core engine. With statically typed languages you get a stronger core engine. The IDEs can do more. But it's much more than that; it makes coding simpler in the long run. Sure, it's more verbose at first, but I suspect that within 1-2 years the core engine is going to be your main driver, and that's what's ultimately going to make or break your company.
Posted by Stephane Grenier | April 30, 2008 9:13 PM
Posted on April 30, 2008 21:13