Down the Rabbit Hole | Identify Pain Points in Legacy Code

We’ve all been there. Working with a legacy code base that had overgrown with time, no documentation, thousands of lines of code and you have one simple task, find where a value is set and change it. As you dig through volumes of code, winding your way through the logic and returns that lead you through a multitude of files, figuring out what the original developer was thinking, only to come to a dead end. You find this path wasn’t where you should have gone at all, go back to the entry point and look again. As things get curiouser and curiouser, you resist the urge to drive nails into your forehead and forge on.

Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.

John Woods (unconfirmed)

This post is not about rantings of bad code (much,) it is more about how the rabbit holes we often find ourselves in are an opportunity to see things differently, assimilate another way of thinking through the code, and most importantly contribute by improving what’s there. As I dig through the code, it gives me an opportunity to view it through the mind of the original coder, to get into their head and digest their way of thinking. I have learned to avoid a couple things that are road blocks to that cognitive process.

It’s important to note I have been guilty of everything in this post, even defended my position in many cases, but eventually accepted them for the mistakes they are and learned from them.


    $arr = [
     "foo" => "bar",
     "baz" => "boz"
    ];
    extract($arr);

Code Obfuscation

Avoid extract()

Given the array shown at the right, extract() converts $arr to $foo and $baz with values bar and boz. One has to stop and think where $foo and $baz are set, and shouldn’t have to, especially if these variables are used 50 lines down in a 100 line function (the two often go together.) PhpStorm even hates it, it looks like this.

Some will say this is more of a symptom of IDE dependence than obfuscation, which has a little truth to it (see Why I Love My IDE). But if $foo is discovered 60, 100 lines down in a function, it’s just one more thing to reason about: Where does $foo come from?

The second reason I avoid extract() is that you now have two allocations in memory for the same value, one in the original array and one in the variable created by extract(). This may seem trivial, but multiply it by all the other (potentially thousands of) little inefficiencies in the code base, and you’re asking the business manager to crack their wallets for more server resources.

Anonymizing Elements (That Should Be Obvious)

This is an unintentional side effect of making the code "shorter" by using abbreviations or single character tokens for variables, functions/methods, array members, and other named elements. In a huge code base it takes a lot of time to digest functionality, and this gets even worse if the tokens actually represent something else. A developer reading through the code has to learn a new language, a mapping of token to actual intended value, on top of the app language.


   function someFunction($p, $opt, $i, $d) {
      // ...
   }

We can figure out that $p is actually $product_id and not a product option or price, but we shouldn’t have to. $product_id tells us immediately what the value represents. It gets worse if the intent changes, but the variable name does not (I have seen this.) When $product_id is no longer an integer value but is used as a reference to a class object, things get even more difficult to digest.

<?php
   function someFunction($some_var)
   {
     $is_okay = false;

     if (! empty ($somevar)) {
        $is_okay = true;
     }

     // Imagine 100 lines of code processed
     // here that has little to do with
     // the value of $is_okay

     return $is_okay;
   }

Avoiding the Single Point of Return Paradigm

Back in the COBOL/FORTRAN days, it was entirely possible to jump into the middle of a program and begin executing, which could skip over variables instantiated earlier and cause all sorts of side effects. If returned early, expected later mutations on a value would be missed. The programming rule that came from this is "single point of entry, single point of return." For some reason, I see a lot of code that that clings to this paradigm, but in most modern languages it is no longer a valid rule.

Imagine the simple example at the left, but with an additional 100 lines of code to process and $is_okay is carried all the way to the end of the function/method in the interest of upholding single point of return. We spend unnecessary resources processing and memory going all the way down though the logic to return the value was set at line 7. Why do the extra processing?

<?php
   function someFunction($some_var)
   {
     if (! empty ($somevar)) {
        return true;
     }

     // Same 100 lines of code, but if true,
     // no resources are wasted processing it.

     return false;
   }

Not only have we saved nanoseconds of processing whatever comes after, we have eliminated yet another variable that holds space in memory. This concept is called early return and often manifests itself as the guard pattern (If a value needed to process a function is not present, no reason to continue, return now.) The biggest argument I’ve heard against early return is that even if this value is true, we still need to process the other 100 lines of code. Doesn’t that say since those those hundred lines of code have no bearing on the return value, they should be executed elsewhere and this function is violating SRP by doing too much?

Unnecessary Variable Instantiation

The previous code sample demonstrates another item I try to eliminate. Note how there is no variable $is_okay in the second example. If you don’t need it, why do it? "I was thinking I may modify it and it’s handy to have it in a variable if I do." When I get to that point, fine, take the time to store it in a variable, but if it’s not needed, simplify the code and eliminate it.

Another developer once described this to me in a way that makes perfect sense. Look at every line of code you write, every variable you create, as a "tick." With every tick, PHP has to allocate memory for it. A large portion of resource usage can be eliminated by just removing all the ticks you don’t need. It gets down to the old cliché: "because you can doesn’t mean you should." Am I doing it this way just because that’s how I like to do it?

Doing too Many Things

My queues that an object, function, method, or even a static page is doing too much:

Functions/methods with too many params. I limit myself to three, four if I’m feeling lazy. If I need 20 params in a method it’s doing too much.
Long functions/methods. There is no established standard for method length. I try to keep my methods around 10 lines or so, if it goes beyond that I start getting nervous. Twitchy even.
Deeply nested control structures. If a code block is multiple levels of nested for/foreach/switch blocks, peppered with arbitrary if/else blocks, it’s a sure sign most of this can be extracted into their own single purpose objects.
Use of "else." Heresy you say! Else is one of the four pillars of programming! See I am a Recovered Else Addict.
Any mixture of logic, data, or presentation.
If I have a database query, operate on that query with business logic, then output something all in one object, it’s doing too much.

I avoid all of the above to 1) allow the code to be unit-testable and reusable, 2) follow SRP (Single Responsibility Principle) as religiously as possible, 3) isolate functionality so when bugs arise, they are more easily found, 4) allow my objects to be extendable so methods can be overridden, and 5) to make the logic as legible as possible even at the cost of being more verbose.

For years my thinking was, "but it is doing one thing. It’s getting everything I need for a product." I eventually recognized that "one thing" is actually a collection of things. I also got really weary of one little change breaking something I never expected due to all the code being tightly coupled.

See I am a Recovered Else Addict for examples of how I begin.

The proper use of comments is to compensate for our failure to express ourselves in code.

Redundant comments are just places to collect lies and misinformation.

Robert C. Martin

Out of Date Comments

I am forever mindful of both of these quotes by the author of Clean Code. Code should be so legible and comprehensible that it does not need supplementation by comments. If I have to explain what it’s doing in the comment docblock, I need take another look at my logic. The most useful purpose for docblocks are that a documenting software can parse them for clean documentation pages.

When I’m down the rabbit hole, I often turn to the comments (begging) for guidance. Desperate times call for desperate measures. Often they are sorely out of date, often with missing or outdated @param or @return statements. PhpStorm complains about this too!

Updating the comments is another aspect of acute attention to detail, without it this is how "comments quickly become lies." It’s a vital part of the code environment so I keep them current and updated in case my rabbit hole becomes curiouser and curiouser for anyone reading the code.

These are the highlights of what slows down development time, what I avoid in new code, and attempt to correct in existing code. I’ve learned a lot from the the rabbit holes I’ve gone down, and hopefully have made the journey a little easier for developers who follow.