Lines of Code is a Bad Metric, Either Way

by Krishna on October 7, 2012

The Dropbox team published a post explaining their decision to use CoffeeScript instead of JavaScript and, in particular, to rewrite their existing codebase in CoffeeScript. In case you are unfamiliar with CoffeeScript, it is a language that compiles down into JavaScript, so you have the option to do new development in CoffeeScript while retaining your previous code in JavaScript. It is more succinct and has language features that avoid some of the pitfalls of JavaScript. Recently, CoffeeScript has seen more competition with the release of Google’s Dart and Microsoft’s TypeScript, though each of these languages approaches things from a different angle. TypeScript tries to provide a static typing environment for programmers used to .NET, whereas CoffeeScript embraces the dynamic nature of JavaScript while providing idioms that simplify programming.

The search for a more convenient and powerful alternative to JavaScript has been going on for several years and includes Java applets, server-side code that outputs JavaScript, and so on. An important reason is the lack of cross-browser support for newer releases of JavaScript. Take a look at the JavaScript page on Mozilla or the features in JavaScript 1.8 and think about how many of those features you can (actually cannot!) use in a website open to the general public. Even Google Chrome, with its rapid releases, is far behind on supporting new language features. One way to get around this is the library approach, using tools such as jQuery and Underscore. But if you are interested in having more power at the language level itself, libraries are of limited help. Here, CoffeeScript offers a compelling solution.

If you have already been doing significant JavaScript coding and are aware of the good and bad JS parts, CoffeeScript can take some time to get used to. Unlike a language such as C#, which built many functional paradigms on top of existing syntax, CoffeeScript got rid of JavaScript tokens like semi-colons and curly brackets, introduced words to replace symbols as operators, and made whitespace significant, so the syntax looks foreign to a JS programmer. This “war on semi-colons” seems suspiciously more about making life easier for Ruby and Python web programmers than for existing JavaScript developers. Having said that, with the significant benefits that CoffeeScript offers over JavaScript (especially in avoiding gotchas), adjusting to the new syntax seems worth it.

I did find this from the Dropbox post simplistic:

In the process of converting, we shaved off more than 5000 lines of code, a 21% reduction. Granted, many of those lines looked like this:

[lots of lines with only brackets and semi-colons]

Regardless, fewer lines is beneficial for simple reasons — being able to fit more code into a single editor screen, for example.

Measuring reduction in code complexity is of course much harder, but we think the stats above, especially token count, are a good first-order approximation.

It is instructive that the only reason actually provided for fewer lines is being able to see more of the code (which may be important for you!). This is the reverse of the old project manager’s method of measuring programmer productivity by lines of code written. The argument against that method, rightly, was that LOC simply measures how much you typed, not how much quality code you wrote. A strict use of LOC as a metric could introduce dysfunctional dynamics: people inflate their LOC by not refactoring their code properly, creating future maintenance problems.

But this doesn’t mean that fewer LOC automatically translates into quality code. There is a general truth that one line not written translates into one line not having potential bugs. But when you are replacing multiple lines of code with a single line, you are sometimes not eliminating those bugs. You are just bringing all the potential bugs into one line. To give an example, if you replace an “if-else” statement with a “?” operator, you are reducing 5 lines of code to one statement. But you didn’t eliminate any bugs. You just folded them together.
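As a minimal sketch of that folding, consider the two hypothetical TypeScript functions below (the names and the shipping threshold are invented for illustration and are not taken from the Dropbox code). If the rule was meant to be “50 or more ships free”, both versions carry exactly the same boundary mistake; the one-line version just packs it into less space.

```typescript
// Hypothetical example: five-line if-else form.
function shippingFeeLong(total: number): number {
  let fee: number;
  if (total > 50) {
    fee = 0;
  } else {
    fee = 5;
  }
  return fee;
}

// Same logic folded into one conditional expression.
// Any mistake in the comparison (">" vs ">=") is still here.
function shippingFeeShort(total: number): number {
  return total > 50 ? 0 : 5;
}
```

The line count drops, but the number of decisions a reviewer has to check stays the same.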

Another case is where you eliminate intermediate variables and roll them into a final statement, which looks especially neat if you can get a fluent interface going on. The problem is that each section of a chained statement can fail at runtime and so you need to have many more lines simply to ensure that the code works as intended or fails gracefully. Any serious code base will always have a significant percentage of code (and libraries) for error-checking and resilience.
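To illustrate, here is a rough TypeScript sketch comparing a chained one-liner with a guarded version. The `User` shape and both function names are assumptions made up for this example. The chained form throws at runtime if `profile` is missing and silently returns the wrong value if the address has no “@”; the guarded form needs several more lines just to fail gracefully.

```typescript
// Hypothetical data shape, invented for illustration.
interface User {
  profile?: { email?: string };
}

// Chained: compact, but every link can fail at runtime.
function emailDomainChained(user: User): string {
  return user.profile!.email!.split("@").pop()!.toLowerCase();
}

// Guarded: intermediate variables return, along with the checks.
function emailDomainGuarded(user: User): string | null {
  const profile = user.profile;
  if (profile === undefined) return null;
  const email = profile.email;
  if (email === undefined) return null;
  const at = email.indexOf("@");
  if (at < 0 || at === email.length - 1) return null; // no domain part
  return email.slice(at + 1).toLowerCase();
}
```

Counting only lines, the first function looks better; counting the cases it handles, it does not.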

Another instance is when you reduce code by moving duplicated code to common classes or methods. This seems like a sure-fire way of eliminating huge chunks of code. But centralized code with global side effects can be dangerous because it can be called from any place in your code base. So you need to have a good state machine (and good scoping) to ensure that the code is not executed at inappropriate times.
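A small sketch of the scoping concern, in TypeScript. Everything here (the counter, the function, the class) is hypothetical and only meant to show the contrast: the first helper mutates module-level state that any caller anywhere can change, while the second keeps the state inside an instance so that unrelated callers cannot interfere with each other.

```typescript
// Shared helper with a global side effect: every caller in the
// code base reads and writes the same counter.
let uploadsInFlight = 0;

function beginUploadGlobal(): number {
  uploadsInFlight += 1;
  return uploadsInFlight;
}

// Scoped alternative: each tracker owns its own counter.
class UploadTracker {
  private inFlight = 0;

  begin(): number {
    this.inFlight += 1;
    return this.inFlight;
  }
}
```

Both versions remove the duplication; only the second limits who can change the state.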

People also tend to forget that a vast portion of the code is not in the code you write, but in the libraries that you use. One aspect is that the total LOC is way more than the LOC usually counted. But another aspect is that people shouldn’t be counting some of the LOC they write. For example, if you have written a bunch of code that is reusable and tested (both in test and production environments), then for all intents and purposes, that is code equivalent to an external library.

What I am trying to get at is that in the twenty thousand lines of Dropbox code, a huge percentage of code is already tested and working. Nobody even looks at much of the code, because it has a clear API. Unless there is a fundamental change to the architecture or there needs to be a significant improvement in its performance, that code won’t be touched. So why count it? The code you should count is the code that is in play. This figure should be kept small, not only by reducing what is written, but also by moving code into libraries that can be tested and forgotten.

