Code Size in Software Projects

by Krishna on December 27, 2007

In a recent post, Steve Yegge talked about a game he had written and how its code base (500,000 lines of code) had become too big for him to manage. He originally wrote the application in Java and has now decided to rewrite it for Rhino (Mozilla's JavaScript engine for the JVM) to reduce the code size to around 150,000 lines.

For managers, the topic of code size has many implications for software projects in the areas of software quality, resource allocation and effort. Let us look at these:

  1. Code is written to create product functionality. However, writing code requires resources and is expensive. The less code that can be written to accomplish the same functionality, the better in terms of cost. A concise language is better than a verbose one.
  2. The more lines written by a developer, the greater the potential for bugs. This is particularly true of boilerplate code, where repetition can introduce typos and other mistakes.
  3. Fewer lines of code do not always mean less time for overall development. Sometimes, terse code can be difficult to read and debug, resulting in greater costs for testing, debugging and maintenance.
  4. If software developers can reuse code or functionality in tested libraries, they can save a lot of development time. The best choice is the language framework itself, followed closely by open source libraries which have business-friendly licenses.
  5. Software developers are most efficient in the languages and tools they already know. Although another language may be more concise, they may take more time to use it effectively, or make more mistakes along the way, resulting in greater costs.
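
Points 1 and 2 can be made concrete with a small sketch. Python is used here purely for illustration, and both `Point` classes are hypothetical; nothing below comes from Yegge's game. The hand-written version repeats each field name several times, and each repetition is a chance for a typo. The concise version asks the standard library's `dataclasses` module to generate the same methods.

```python
from dataclasses import dataclass


class VerbosePoint:
    """Hand-written boilerplate: each field name appears several times."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, VerbosePoint):
            return NotImplemented
        return self.x == other.x and self.y == other.y

    def __repr__(self):
        return f"VerbosePoint(x={self.x!r}, y={self.y!r})"


@dataclass
class ConcisePoint:
    """Same behaviour; __init__, __eq__ and __repr__ are generated."""

    x: int
    y: int
```

Both classes compare and print the same way, but the second has far fewer lines in which a mistake can hide. Point 3 still applies: the shorter form only helps a team that knows what the decorator generates.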

The ideal technology choice for a software project would be a language that is known to all developers on the team and that has the best reusable libraries and an expressive syntax. The reduced code size and development effort translate into tangible benefits (less cost, less time, greater quality).

To be most effective, the decision should be made at the start of the project. The project should be staffed with the best people (available to you) in that technology. You should purchase the best tools that you can afford for working with the specific technology.

Code reviews, when done right, are very effective in identifying copy-and-paste or inefficient code. The people participating in the code reviews can suggest different ideas for making the code better, such as redesigning a class. By sharing their ideas, the developers become increasingly knowledgeable in code reduction techniques and their work automatically improves.
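
As a rough illustration of what a review might catch, the sketch below shows two copy-and-paste functions and the single function a reviewer could suggest instead. The function names, the dictionary keys and the shipping fees are all made up for this example.

```python
# Before: two near-identical functions, as a reviewer might find them.
def invoice_total_books(items):
    total = 0
    for item in items:
        total += item["price_cents"] * item["quantity"]
    return total + 300  # flat shipping fee for books, in cents


def invoice_total_games(items):
    total = 0
    for item in items:
        total += item["price_cents"] * item["quantity"]
    return total + 500  # flat shipping fee for games, in cents


# After: one function; the only real difference becomes a parameter.
def invoice_total(items, shipping_cents):
    subtotal = sum(item["price_cents"] * item["quantity"] for item in items)
    return subtotal + shipping_cents
```

A future bug fix in the subtotal logic now has to be made in one place rather than two.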

Now, what happens if you never paid attention to the size of your code base for a long period, or worse, you have just been handed a large existing project? Before you proceed, the first thing to understand is that your primary goal is not making the code size smaller or trying to understand the code base. Your foremost objective is to enhance the functionality of the application.

With that in mind, the first question is: How much of the code base do you need to be familiar with to enhance the functionality? If you don’t need to make any changes to some portions of the code, you could work as if their source code never existed, and just link to them.

Secondly, do you completely know the existing functionality and dependencies of the code you will be modifying? Usually, the existing code base will have some convoluted code written for bug fixes and change requests. Sometimes, a particular line of code may affect other modules.

In this situation, refactoring makes it much easier to change the code without affecting existing functionality. Refactoring may result in a larger number of lines, but a significant portion of them can be isolated away and never looked at again.

So, when you inherit a large code base, your objective should be to treat as much of the code base as possible as a black box, never to be tinkered with. This will reduce the amount of code that you need to learn, understand and modify. And most importantly, your goal of enhancing the application functionality will be met.
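
One way to picture the black-box approach is a thin wrapper around inherited code. In this minimal sketch, `_legacy_discount` is a hypothetical stand-in for a convoluted function you have inherited, and `discounted_price` is an equally hypothetical entry point that new code calls instead. The total line count goes up, but the part you have to read goes down.

```python
def _legacy_discount(code, amt, flag=0):
    # Hypothetical inherited code with cryptic arguments.
    # Treated as a black box: called, but not modified.
    if flag == 1:
        if code == "G":
            return amt - amt * 10 // 100
        return amt
    if code == "G":
        return amt - amt * 5 // 100
    return amt


def discounted_price(amount_cents, is_gold_member, is_sale_day):
    """Readable entry point that hides the legacy calling convention."""
    code = "G" if is_gold_member else "S"
    flag = 1 if is_sale_day else 0
    return _legacy_discount(code, amount_cents, flag)
```

New features are written against `discounted_price`, so only the wrapper needs to be understood; the legacy function is looked at again only if it turns out to cause a real problem.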

Still, developers will continue to worry about the large body of code that has now been isolated, usually citing performance, memory needs and maintainability. In several instances, there is no actual evidence of harm to the user; the concern is only an assumption on the developer's part.

However, if any part of the isolated code base does hurt performance or cause maintenance issues, then its status should be upgraded to “working source code” and it becomes a candidate for refactoring (or, in extreme cases, rewriting).
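
Whether isolated code actually hurts performance is something you can measure rather than assume. The sketch below times a call with the standard library's `time.perf_counter`; `isolated_legacy_step` and the half-second budget are hypothetical placeholders, and a real project would pick its own workload and threshold.

```python
import time

BUDGET_SECONDS = 0.5  # hypothetical budget; choose one that matters to users


def isolated_legacy_step(n):
    # Hypothetical stand-in for code that has been treated as a black box.
    return sum(i * i for i in range(n))


def measure(func, *args, repeats=5):
    """Return (result, best wall-clock seconds over `repeats` runs)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return result, best


result, elapsed = measure(isolated_legacy_step, 100_000)
# Only a measured overrun makes the code a candidate for refactoring.
needs_attention = elapsed > BUDGET_SECONDS
print(f"best of 5 runs: {elapsed:.4f}s, needs attention: {needs_attention}")
```

If the measurement stays within the budget, the isolated code can remain a black box.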

So, what about Yegge? I think he is making a huge mistake by committing vast amounts of his productive time to rewriting hundreds of thousands of lines of code. He could be adding more functionality to the application by spending significantly less time to understand those pieces. He could release the application as open source and start work on creating some other application.

A good developer has a penchant for order and organization. Unfortunately, it can be taken to excess at the cost of useful work. Keeping your house clean is a good thing. It just doesn’t make sense to tear it down when you cannot remember where you kept your photo albums.
