PSR’s and Legibility

A few years ago I was working through a large code base, wheedling my way through a 900 line function in a 3,000 line file of legacy procedural code, put my head in my hands in frustration and stared down at the keyboard. Over the years of beating up a code base that was difficult to read and even more difficult to reason about, I’d seen the Shift key had been worn off to express exactly what I was feeling. I had to laugh, the rest of the day went a lot better.

In this post I’m going to discuss (most of) the Extended Coding Style Guide as it relates to code legibility and general formatting and how it helps me address the Big Ball of Mud. I’m going to intentionally skip a lot of the recommendations to stick to the topic, how adhering to standards improves the legibility of my code and in the long run makes me a (little?) better coder. It has helped me to start thinking differently about how I code to improve both legibility and coding skills. The inset quote says it better than I ever could.

The intent of this specification is to reduce cognitive friction when scanning code from different authors. It does so by enumerating a shared set of rules and expectations about how to format PHP code.
PHP-Fig PSR-2 & 12

Have a look at this "before" code . I put it together as an example of how we can take loosely formatted PHP code and improve legibility. (I hope it gave you a chuckle!) There are a bazillion things wrong with this code and doesn’t do anything useful, but we’re going to refactor it to demonstrate how it makes the code more legible and maintainable in the long run.

What is so bad about it? It works and runs (see it here,) it even validates, so what’s the problem? There are many, but were going to stick to the topic, only meandering a little for other improvements.

Here is the list of items I’ll address with this code, and quotes from the PSR’s:

  • There MUST NOT be a hard limit on line length. The soft limit on line length MUST be 120 characters. This one has caused arguments in every company I’ve ever worked with. The arguments are that long lines create less lines in a file, dude you need a bigger monitor, we no longer live in the 80’s, I’d rather see it this way or that, and more. The bottom line for me is the shorter the lines, the faster I can scan. I agree – set the IDE margin at 120 characters, stick to it whether it’s comments or code. If I have code that exceeds 120 characters, it’s telling me I have a bigger problem and need to re-think the code.
    "The product $id widget is lightweight and compatible. We′re not yet sure what it′s compatible with, we′re hoping you could answer that for us.",
  • Code MUST use an indent of 4 spaces for each indent level, and MUST NOT use tabs for indenting. The "before" example uses two spaces for indenting. Seems trivial, right? IMO the larger the indent, the easier the scan and differentiation between code blocks. It gets way worse when you are using version control and there is dissonance about how to format code. A one line code change can show hundreds or thousands of whitespace-only line changes in a git diff *.
  • All PHP reserved keywords and types MUST be in lower case. Class names MUST be declared in StudlyCaps. Class constants MUST be declared in all upper case with underscore separators. Method names MUST be declared in camelCase(). I’m lumping these in with how I name variables and conventions for file names as well, another topic of heated debate, which is explicitly avoided by the PSR’s because a consensus cannot be reached. Thinking logically, if class names are in StudlyCaps and method names are in camelCase, it stands to reason that variables like $MyVariable or $myVariable can only lead to visual confusion. For that reason I always prefer to use $lower_case for variables. The same is true when it comes to file naming. If a file contains a class and only a class, it makes sense to name the class file with StudlyCaps and "regular PHP files" in lower case. An immediate glance at the file structure tells you what is a class and what is not (sans the deprecated .class in the file name.)
    
      class psr_sample_before{
       // ...
         function Get_my_product ($productId){
    
  • Any closing brace MUST NOT be followed by any comment or statement on the same line. Visibility MUST be declared on all methods. Method names MUST NOT be prefixed with a single underscore to indicate protected or private visibility. Method and function names MUST NOT be declared with space after the method name. The opening brace MUST go on its own line, and the closing brace MUST go on the next line following the body. There MUST NOT be a space after the opening parenthesis, and there MUST NOT be a space before the closing parenthesis. The PSR also mentions the difference in logic blocks in which the opening brace does go on the same line. This is one of the subtle visual queues that make code more legible, it’s easier to spot logic blocks apart from functions/methods.
    <?php
    function _get_product ($id){ // Get the product data
        if (! (intval($id) < 0))
        { // bail if it′s not a valid id
            return ["Invalid id!"];
        }
    }
    

    The underscore used to be a "visual queue" for a private method, but I’ve seen it used (lots) in global functions in a file, where it makes no sense.
  • You can use anti-patterns to essentially obfuscate the code so much … that nobody else can maintain it.
    "The" Tech Lead, Software Anti-Patterns
    In the argument list, there MUST NOT be a space before each comma, and there MUST be one space after each comma. Method and function arguments with default values MUST go at the end of the argument list. Argument lists MAY be split across multiple lines, where each subsequent line is indented once. When doing so, the first item in the list MUST be on the next line, and there MUST be only one argument per line. We’re also going to address two other issues that are discussed in Down the Rabbit Hole. The first is code obfuscation. We can figure out that $P means price (but it could mean product, or position, or whatever,) but we shouldn’t have to. It should be immediately obvious that $price is the product price. The second fix is the number of parameters in a method/function. There is no standard for this, but I learned from other coders that if your params exceed a certain number, you have larger problems, mostly that your method doesn’t follow the Single Responsibility Principle and is doing too much. For myself if I see more than three params, I begin to question if I should re-think the code design. If it turns out I really do need 20 params, it’s time to look at an array.
    
      list($T,$D,$P,$PO,$CL,$S,$R,$I,$C,$css) = $this->_get_product($productId);
      ...
      function _productBody(
        $T,$D,$P,$PO,$CL,
        $S,$R,$I,$C
      ){
      

The After Code

Here is the final revised code after applying PSR standards and (quite) a few other changes (that I’m sure you don’t want me to ramble about.) It still has quite a few issues and could stand more extraction, and is still useless, but you can see here it’s the same as the original with a few textual changes due to flexibility of the refactor. Also contrary to my statements about comments in Down the Rabbit Hole, the comments are more verbose due to the context of this article. Aside from coding to standards the directives applied,

  • Effort was applied to Separation of Concerns, moving data, business logic, and rendering into different areas (although they should actually be different objects.)
  • Related to this, an effort to uncouple code. There are many one-line functions, this may seem excessive but it allows them to be overriden in extensions and removes the dependency from the calling functions ("O" in SOLID.)
  • The only injected params are in the constructor.
  • All functions except one are short, and are unit testable.
  • The data is now handled generically, putting control of the values in the hands of the data, not the script. This allows us to move to the "O" in SOLID, open for extension, closed for modification.
  • Usage of the guard pattern and other implementations of early return. If you don’t need to, stop processing.
  • Eliminate unnecessary variable instantiation. Read more about this.
  • You will notice not a single "else." See how that improves code.

Which would you rather maintain?

If the Functionality is the Same, Why Bother?

If the front end is the same and it doesn’t change our visits or sales, what benefit does it have? Simply put, a business has to look at the costs and long-term ROI of the training curve and code base maintenance. Developers today have cut their teeth on the Gang of Four and are well versed in standards and frameworks written to them. They can jump right into a Laravel, Drupal, or Symphony framework and be immediately productive. With many legacy code bases, figuring out how all the moving parts fit together can take months.

Too often refactoring or reformatting a code base is considered an all-or-nothing task, it doesn’t have to be that way. It also doesn’t have to break the application or the company bank, or create more work in code reviews. (Aside: time budget for format/whitespace only pull/merge requests.) The reduced training curve and less long term maintenance will more than pay for itself.

* You can always hide white space changes when reviewing code, but when you start seeing this it is a team management issue that should be addressed.