What I Learned From Perl

Early in 1995 I began scripting in Perl and, like all first loves, what that experience taught me remains with me to this day. Many of those lessons are little more than abstract concepts to developers who have never coded in Perl. Some of the tasks we perform daily are handled by built-in functions or filters; in Perl they just don’t exist. You have to create them.

If any ideology is so serious that you can’t have fun while you’re doing it, it’s probably too serious.
Larry Wall
  • Stay in Good Humor. I read the Camel Book twice in the first year, and I still browse back through it when needed. One of the things you pick up reading between the lines as Larry Wall describes his language is that you have to stop taking the world so seriously. Programming can be very frustrating; there are lots of rabbit holes, and some can drive you insane. Yes, it’s the clients’ time and money, and yes, it is highly valued, but that is not the point. See the inset; I’ll only add that if you’re not having fun with what you are doing, you should be doing something else.
  • User Input. In PHP, we have all sorts of filters and widgets to cleanse the data coming into our code. There is (almost) an industry-standard set of rules to apply when accepting input, so it’s something we don’t have to reason about nearly as much. In Perl, you don’t have those tools. You have to build it all from scratch.*
  • Learning how input is read. I was so relieved to discover the CGI module in the early days, because I no longer had to do stuff like this:
    
    
    sub query {
    
        my ($method,$query_string,$pair,%query_hash);
        $method = $ENV{'REQUEST_METHOD'};
    
        $query_string = $ENV{'QUERY_STRING'} if $method eq 'GET';
        $query_string = <STDIN> if $method eq 'POST';
    
        return undef unless $query_string;
    
        foreach $pair (split(/&/, $query_string)) {
            ($_qsname, $_qsvalue) = split(/=/, $pair);
            $_qsname  =~ s/[\+]/ /g;
            $_qsname  =~ s/%([\da-f]{2})/pack("C",hex($1))/ieg;
            $_qsvalue =~ s/[\+]/ /g;
            $_qsvalue =~ s/%([\da-f]{2})/pack("C",hex($1))/ieg;
            $query_hash{$_qsname} = $_qsvalue;
        }
    
        return %query_hash;
    }
    
    Once I discovered CGI.pm, it handled all the grunt work for me, along with a lot of things that were messy to implement by hand, like file uploads and multipart form data. But have a look at the code sample. It gives you an insight into what goes on under the hood of something like PHP’s $_POST superglobal. Perl taught me invaluable lessons about how servers process data sent through the CGI interface.
  • Treat all user input like the poison it is. Even with CGI.pm, the data is still not safe. The CGI module and the above code are an open book: post anything to them and they will do exactly what you tell them to, including buffer overruns and various injection attacks. A few lessons Perl taught me:
    • Accept what you need and throw everything else away. This eliminates a lot of the user input issues. "Throw everything else away" includes filtering your data for size. Perl has no built-in settings like PHP’s upload_max_filesize or post_max_size; you have to implement the limit yourself or use a module that does. Perl taught me how to do this without the safety net of language settings.
    • Use the built-in tools.
      
          #!/usr/bin/perl -wTd
      
      Among them are the -w (warnings) and -T (taint mode) switches, which show warnings and (mostly) prevent user input from executing anything insecure. I say mostly because -T only stops user input from passing through to eval, the shell, or any command that affects external files and processes. Data arriving from the web should already be clean and used only as intended in the first place. There is also a helpful -d option for debugging.
    • Client side validation is not enough. "Back in the day" JavaScript client-side scripting was still unstable; it even had a different name, LiveScript, and was largely supported only by Netscape Navigator. The go-to was always to validate data server-side, and even when JavaScript support grew and became a strong force on the Internet, security still demanded that data be cleansed and validated server-side. It’s easy enough to disable JavaScript and bypass all of its validation. A lot of developers today take the position that it’s doing double work, and that the application will fail anyway if JavaScript is disabled. While that has some truth to it in JavaScript-driven applications, I maintain and test server-side validation as a part of my development cycle.
    • Get good (better) with regexp. The strongest tool for cleansing input in Perl is the regular expression, combined with the above rule, "throw everything else away." If you expect an integer, build a regex that accepts only integers; if you expect word characters, accept only word characters. I learned that if I can pass <script>alert("Hello World");</script> into a form and get the alert on the response . . . I had a lot more to learn. So I did. Regular expressions were the foundation of input security at the time, and although I am still no regex guru and am humbled by them at times, they have become an integral part of my development cycle.**
    • Data type mapping. It didn’t take long, filtering each input field and piecing together regular expressions to adequately filter their values, before I discovered the value of data mapping. In this case it applies to form input, but the concept permeates all of my programming today. All I needed to do was detect a form input type and pass it through a subroutine (the Perl word for function, although Perl also has functions) that filtered that type. It may seem trivial, but this was a huge revelation for me that affects how I manage template and other data. Take, for example, this (oversimplified) logic block, which can be easily understood in any other language.
      
      
      # %qs is a global. Yuk, but go with it.
      
      sub useless_sub {
      
          my $value = '';
      
          if ($qs{'input'} eq 'a') {
              $value = 'It is an A.';
          } elsif ($qs{'input'} eq 'b') {
              $value = 'It is a B.';
          } elsif ($qs{'input'} eq 'c') {
              $value = 'It is a C.';
          } else  {
              $value = 'It is not defined.';
          }
      
          return $value;
      }
      
      Since discovering mapping, I tend to do things like this:
      
      
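      # %qs is still a global. Yuk X 2.
      
      sub useless_sub {
      
          # Lookup table: each expected input maps straight to its message.
          my %values = (
            'a' => 'It is an A.',
            'b' => 'It is a B.',
            'c' => 'It is a C.',
          );
      
          # Same key as the if/elsif version; fall back when it is not mapped.
          return defined($values{$qs{'input'}}) ?
             $values{$qs{'input'}} :
             'It is not defined.';
      }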
      
      By mapping to the hash, I’ve eliminated logic overhead. It may seem trivial, but mapping data to values permeates most of my programming, avoids a lot of elaborate logic, and leads to a policy of separating business logic from content. I didn’t know it at the time, but it was my first step toward OOP.
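      
      The same lookup idea extends from values to behavior. As a minimal sketch of the type-to-filter mapping described at the top of this item, a hash can hold one filtering subroutine per input type. The names %filters and filter_field, and the three types shown, are hypothetical and chosen only for illustration.
      
      ```perl
      use strict;
      use warnings;
      
      # Hypothetical filters: one subroutine per input type. Each returns the
      # cleaned value, or undef when the input is not what that type allows.
      my %filters = (
          integer => sub { $_[0] =~ /\A(-?[0-9]+)\z/ ? $1 : undef },
          word    => sub { $_[0] =~ /\A(\w+)\z/ ? $1 : undef },
          date    => sub { $_[0] =~ /\A([0-9]{4}-[0-9]{2}-[0-9]{2})\z/ ? $1 : undef },
      );
      
      # Look the filter up by type instead of walking an if/elsif chain.
      # Unknown types and missing values are rejected outright.
      sub filter_field {
          my ($type, $value) = @_;
          return unless defined $value && exists $filters{$type};
          return $filters{$type}->($value);
      }
      
      my $age = filter_field('integer', '42');
      print "$age\n";   # prints 42
      ```
      
      Supporting a new input type then means adding one entry to the hash rather than another branch of logic.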
  • D.R.Y. (and didn’t realize it.) The first time I copied a piece of code from one file to another, I asked, "why did I just do that?" I was adamant that there should be no reason at all to duplicate a piece of code in multiple locations in the same app. I began to build reusable subroutines and access them from any area of the app, and without realizing it I was practicing D.R.Y.: Don’t Repeat Yourself.
  • Share your code with peers. This was a particularly tough one for me, being a newbie to programming when a lot of experienced developers are quite brutal. By sharing the code, taking the heat, and actually listening to what they tell me, I make my code better and I learn. Perl taught me that the more I know, the less I really do know, and the more I have to learn.
    I think it’s a new feature. Don’t tell anyone it was an accident.
    Larry Wall
  • Database Interaction. It wasn’t long until I needed to learn databases, and my go-to has always been MySQL, although over the years I have worked with many other database types. This brought a whole new set of problems to solve in terms of security and best practices. Learning database interaction in Perl caused me to go back over what I thought I already knew, and realize I was doing other things incorrectly.
  • Bad Habits. I’m not going to list them all, but you can imagine. I think I’ve made about every mistake possible and come up with some new ones I’m sure no one has ever thought of. The worst are the ones that became easy and natural for me, the really bad habits. When I look at old code I wrote, I have to wonder what the heck I was thinking. Among them are variable variables, abbreviated token naming like $p instead of $process_id, creating subroutines/functions that do far too many things, and coding in super long lines. I partially blame Perl for that last one; there is a widespread belief that one-liners are cool, but that is supposed to mean one short line doing the work of 20 lines. I somehow missed that idea . . .
    At any rate, reviewing old code (like some on this page) and working with developers who slapped my hand have allowed me to unlearn a lot of the bad habits. I’m sure I still have more to (un)learn.
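
The "accept what you need and throw everything else away" rule from the list above, including the size check, can be sketched roughly as follows. This is an illustration under stated assumptions, not code from any of my old projects: $MAX_POST_BYTES, read_post_body, accept_fields, %expected and the field names are all hypothetical, and the 8 KB cap is an arbitrary number.

```perl
use strict;
use warnings;

# Hypothetical cap; the right number depends on the form being handled.
my $MAX_POST_BYTES = 8 * 1024;

# Read a POST body from $fh, refusing anything larger than the cap.
# Returns the body, or nothing when the declared length is missing,
# malformed, or too large.
sub read_post_body {
    my ($fh, $content_length) = @_;
    return unless defined $content_length
               && $content_length =~ /\A[0-9]+\z/
               && $content_length <= $MAX_POST_BYTES;
    my $body = '';
    my $got  = read($fh, $body, $content_length);
    return unless defined $got;
    return $body;
}

# Keep only the fields the form is supposed to send, and only when the
# value matches the pattern for that field. Everything else is dropped.
my %expected = (
    id   => qr/\A([0-9]{1,9})\z/,
    name => qr/\A([\w ]{1,40})\z/,
);

sub accept_fields {
    my (%input) = @_;
    my %clean;
    for my $field (keys %expected) {
        next unless defined $input{$field};
        $clean{$field} = $1 if $input{$field} =~ $expected{$field};
    }
    return %clean;
}

my %clean = accept_fields(
    id    => '17',
    name  => '<script>alert("Hello World");</script>',
    extra => 'not on the list',
);
print join(',', sort keys %clean), "\n";   # prints: id
```

In a CGI script the body would come from read_post_body(\*STDIN, $ENV{'CONTENT_LENGTH'}). Taking each value from a regex capture is also how data gets untainted when running under -T.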

Just writing this article has prompted me to look at some of my old Perl code, and now I want to revise it the first chance I get. Most of it is procedural code, and the first thing I see is how I could improve it by turning it all into objects and modules. Another day, maybe. The lessons Perl has taught me are invaluable; I wouldn’t trade that knowledge for anything. Well, maybe a good pizza, but nothing else.

* Perl has advanced a lot since I started coding in it, and today there are many more modules, libraries and extensions that invalidate this statement. But still, sometimes it’s good to know what a thing is doing, and you will always know what it’s doing if you write it yourself. The problem with that is the things you didn’t know about.

** I also learned not to turn immediately to regular expressions to solve a problem, and to use them only when appropriate, but that’s another story.