Paper Highlights Dangers of Legacy Code But Experience Shows Removing Legacy Code/Equipment/Services Difficult

A research paper published by the University of Pennsylvania in association with Stefan Frei of Secunia coins the term “honeymoon effect” to describe the delay between the release of a software product and the discovery and subsequent patching of its first vulnerability, and posits that the duration of this delay is a function of the team’s familiarity with the code. The most interesting revelation in the paper stems from the postulation that code reuse, often encouraged to decrease development time, appears to be a major contributor to the number (and perhaps, by inference, the severity) of vulnerabilities found.

It was at this point in the paper that I decided an opposing view published on my blog might be of benefit, if only to get this annoyance off my chest. I also noticed that the paper made an appearance today on Schneier’s blog, but he provided little discourse beyond paraphrasing its key concepts, something I found particularly strange given the glaring issues in a paper making such a potentially controversial claim.

Of course there is no doubt that legacy installations are responsible for many of the breakdowns in security policy we have seen of late. The Adobe breach, in which millions of passwords were leaked by an unidentified attacker, was said to have occurred on a legacy AAA system that was scheduled for decommissioning. Many an Internet provider has had an ancient MX used as a springboard to further infiltrate its network as a consequence of an old and forgotten installation of sendmail. A major police department in a European nation had millions of records, including vehicle and driver license data in addition to dossiers on the victims and perpetrators of crime, accessed (and presumably duplicated) thanks to a legacy information storage system that was entirely proprietary and thus could not be exported by electronic means. That necessitated time-consuming manual entry, which was scheduled to be completed shortly before the breach took place. I stress this to my clients regularly: if you no longer have a use for a piece of software, a service, a router or any other Internet-connected appliance, then decommission it and eliminate the risk of an unauthorized incursion.

Problems may exist where an alternative solution has replaced the legacy equipment and/or software but only a partial importation was successful thus requiring the use of two systems in tandem. While this is an inexpensive approach for mitigating such a problem it is most certainly the wrong way to address such a challenge.

Rather than touch on my issues with this paper, which are numerous, I would like to relate a personal story about replacing legacy equipment that had been untouched not for months, not even for years, but for decades. I will no doubt write a short post countering the paper’s claim that code reuse is necessarily a bad thing. Indeed, if you are unfamiliar with a certain function you are almost certainly better off using an already established library. Phil Zimmermann himself said many years ago that any encryption solution you design independently will be fundamentally flawed. It is far better to take a library written by those with detailed knowledge of the specific field than to improvise something that may or may not be superior. One pertinent example is OpenSSL. I personally don’t like it: its code is terse, poorly commented and generally a big mess. Despite this I would not even consider writing my own SSL library. Other excellent solutions exist, such as CyaSSL (tiny, perfect for embedded systems and with a much smaller attack surface as a consequence) or Mozilla’s NSS. There’s also GnuTLS and countless others. I am firmly of the opinion that you need not reinvent the wheel when there is no compelling reason to do so. That said, I think there is immeasurable value in auditing third-party code you intend to use in your own software. It is part of the due diligence that we, as developers, should be undertaking, and it improves the security of not just your own product but all of the other projects that happen to use the same library. Nevertheless feel free to disagree, or even better, comment on this post.

I once assisted a library that wished to upgrade its software and was presented with a “black box” about which the library staff knew very little. A preliminary investigation revealed that the ancient unit ran SunOS 4 and also interfaced with a sister library elsewhere in the town over a leased line (that’s being kind: it was literally a jumpered connection at the exchange providing a direct electrical path over a pair between the two sites). To make matters more complicated, the sister library had dissimilar hardware. All of the interaction with staff (at the borrowing desks) and library customers (at the enquiry workstations) was via VT100 dumb terminals.

The rear of the “black box” (the name the clueless customers gave the main SPARC server) had four RS232 connectors and an AUI connector that made its way to a 10base-T MAU, alongside the usual connections for a local monitor and keyboard. The second, smaller unit was in a slightly different style of case and had another four RS232 connectors and an external SCSI enclosure.

So you can imagine my enthusiasm when I discovered this. We eventually obtained root access to both machines thanks to a decades-old sticky note and started poking around. The company that supplied the library management software was long out of business and hardware for such machines would have been nearly impossible to find, which effectively meant they were sitting on a time bomb that could go off whenever the hardware chose to fail. (As an aside, I will one day tell you the story of a university that to this day relies upon a VAX to run its admissions and keeps a stockpile of old VAXen for parts. Despite my continual recommendations, the closest I came to convincing the board to modernize was the installation of a terminal server, which allowed staff to ssh in using PuTTY; the terminal server would then find a free serial port and establish a connection.)

Looking through the configuration files we discovered that the second box was effectively used to add more RS232 ports to support more concurrent dumb terminals. My first guess was that the SPARC boxes of the era had some kind of hardware limitation preventing someone from adding a heap of serial ports to a single host, but I doubt this hypothesis has much merit given that running large banks of serial equipment was relatively common in enterprise environments.

The second box had a very simple configuration. A customized getty was enabled on the first three ports. This sent 25 LFs (who knows, maybe they forgot that VT100s had a clear control code?), echoed the date in strictly 11 characters, e.g. “03 Jun 1983”, followed by 62 spaces and then the time, again always fixed width at 7 characters, e.g. “10:23am”, suffixed with 11 LFs. It then sent 23 spaces, echoed “NORTHERN DISTRICT LIBRARY SERVICE” followed by 3 LFs, then 29 spaces followed by “Press RETURN to begin”, finally suffixed by 8 LFs so the whole thing looked centered on the screen, as the read input prompt took up the bottom row. Imagine that, and now imagine it at 9600bps on an amber screen dumb terminal. Yep, some joker spent real time and effort getting it all to display just right. They should have gone the whole nine yards and put in some ASCII art.
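
The numbers in that greeting are not arbitrary: 11 + 62 + 7 is exactly 80, the width of a VT100 line, and the two text lines are padded so they sit in the middle of it. As a rough illustration only, here is a Python sketch that rebuilds the banner from the counts above. The function name `build_banner` is made up for this example; the original was a customized getty, not Python.

```python
COLS = 80  # width of a VT100 line


def build_banner(date_str: str, time_str: str) -> str:
    """Rebuild the greeting from the counts given in the post (hypothetical helper)."""
    if len(date_str) != 11 or len(time_str) != 7:
        raise ValueError("date must be 11 characters and time 7")
    return (
        "\n" * 25                              # crude screen clear
        + date_str + " " * 62 + time_str       # 11 + 62 + 7 = 80 columns
        + "\n" * 11
        + " " * 23 + "NORTHERN DISTRICT LIBRARY SERVICE"
        + "\n" * 3
        + " " * 29 + "Press RETURN to begin"
        + "\n" * 8
    )


banner = build_banner("03 Jun 1983", "10:23am")
# First non-empty line: date on the far left, time on the far right.
top_line = banner.lstrip("\n").split("\n")[0]
print(len(top_line) == COLS)
```

The 33-character title with 23 leading spaces leaves 24 columns to its right, and the 21-character prompt with 29 leading spaces leaves 30, which is why both look centered.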

So in theory the user presses enter. If they do, the little script opens an rlogin session to the other “black box”, and as they are already authenticated they do not get a repeat greeting. Normally the script would just sit there expecting input (ostensibly the enter key, but it could be anything followed by an LF) forever, but our intrepid coders thought of that: a cron job runs every 30 minutes, checks for idle gettys, flushes their tty with a heap of line feeds and then sends each one a SIGHUP to make it reset and pump out the issue file again. I was told that it was actually put in by a tech in the early 90s after angry librarians complained that kids weren’t pressing ENTER but rather typing expletives without hitting CR, thus leaving abuse for others to find. This limited the damage.

The other “black box” was where the magic happened. We found the software (no source unfortunately, just a SunOS executable) and also found that, despite this being the main box, it was saving all of its data on the other box’s external SCSI array, which was exported via NFS. Fun. The data files that we found looked unintelligible and weren’t in any format we’d encountered before.

On the other box, the one with the SCSI HDD array, we noticed that one of the serial ports was not running a getty and sought to figure out what was going on there. The answer was simple once we finally traced the wires. They led to a box that we now know as a PSTN emulator. It is installed at one end of one of these old school jumpered lines and essentially emulates the central office: it provides a dial tone, line voltage for ringing the other end, etc. Essentially it is like an 80s version of one of those FXS cards that we use nowadays with IP PBX software.

Anyway, the modem hooked up to the line simply picked up the line and got carrier, and the box on the other end did much the same thing, supplying four dumb terminals with access by transparently launching an rlogin session.

So we were pretty upset about how the hell we were going to get the data out of this antiquated system when a colleague suggested that instead of fighting with it we work with it. So we wrote a heap of shell scripts, got really comfortable with using expect and purchased a coax-capable (10BASE2) PCI Ethernet card on eBay for all of $8. As a bonus they threw in a T adaptor and an 8′ run of coax with BNCs on each end that turned out to be just the right length for us.

So we eventually got the LAN working and were finally able to access the system via rlogin. Our first script that succeeded was designed to log in, navigate through the text menus and then display the catalog. Unfortunately it wouldn’t show more than 1000 entries, so we ended up searching by two-character prefix and had our script work through aa to zz, and also tacked on 0 to 9 (which got a few titles we would otherwise have missed). Each result window had the ISBN, description, replacement price, etc. By running each expect script individually and then making liberal use of sed and egrep we were able to pull out all the data we needed and slowly populate our mysql database (via the command line mysql tool).
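
To give a feel for the size of that search space, here is a small Python sketch that generates the same list of prefixes. It is an illustration only: the real work was done with shell and expect, and the function name `search_prefixes` is invented for this example.

```python
import string
from itertools import product


def search_prefixes() -> list[str]:
    """Return the two-letter prefixes aa..zz followed by the single digits 0..9."""
    letters = ["".join(pair) for pair in product(string.ascii_lowercase, repeat=2)]
    digits = list(string.digits)
    return letters + digits


prefixes = search_prefixes()
# 26 * 26 letter pairs plus 10 digits
print(len(prefixes), prefixes[0], prefixes[675], prefixes[-1])
```

That is 686 separate catalog queries, each of which has to log in, walk the menus and scrape a result screen, which goes some way to explaining the run times described later.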

We then worked out scripts to scrape information from the borrower section. We were able to get not only borrowers’ names, barcode numbers and phone numbers but also a complete history of every book they had ever borrowed. This was invaluable, especially as we were able to use this information to compile a list of books that were currently out on loan. Nobody was spared: even banned and inactive users were exported so we could have the most complete data set.

We finally worked out the employee information and time clock history in addition to pretty much every note ever entered. Once we were satisfied our code was bug free we gave it a test run.

It worked beautifully but it became painfully apparent that this was not going to be a fast procedure, certainly not fast enough to do over a holiday weekend. We estimated it would take over two months to export everything. By that time the information would be stale and useless.

We considered importing the book information first and the customer data last, but a variety of factors conspired against that concept, which I believe would otherwise have been a reasonable option. It was then that I had my epiphany. Scripts are notoriously slow, but mine was unnecessarily slow for a variety of reasons and it was this inefficiency that needed to be rectified. The software worked sequentially, going through each record in order and carefully updating the table as the data was collected. A run that was terminated partway through could not be resumed, which also proved to be an issue.

I dramatically altered the architecture of my code. I added a journal to the database and created a supervisor process that invoked all of the scripts as required. I also carefully audited each script and removed any barriers to concurrent operation. For example, temporary files that had shared a fixed filename were renamed to include the PID and the current Unix time, to ensure that one process wouldn’t touch another’s temporary file. More importantly, for most purposes I eliminated the use of temporary files altogether, instead putting raw information into the database and performing the cleanup within SQL, which is far more efficient.
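
The temporary file trick is simple enough to show in a few lines. This Python sketch is only an illustration of the naming scheme described above (the originals were shell scripts), and `scratch_path` and its `/tmp` default are names chosen for the example.

```python
import os
import time


def scratch_path(job: str, directory: str = "/tmp") -> str:
    """Build a per-process scratch filename from the job name, PID and Unix time."""
    return os.path.join(directory, f"{job}.{os.getpid()}.{int(time.time())}")


path = scratch_path("aa")
print(path)
```

Two scripts running at the same moment have different PIDs, so their files cannot collide. In modern Python the standard library's `tempfile.mkstemp` is the safer way to get the same guarantee.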

The supervisor script queried the journal table to determine what tasks were outstanding. It might find that books beginning with aa-cd are completed, as are pa-sz, but that a previous attempt on ga-gz is listed as active even though its last check-in was four hours ago. It would then look up the PID for the ga-gz job and kill the script, remove the lock on the job and set a flag indicating that there had been a failure. Next it would make a new entry for ga-gb in the journal (errored tasks were retried in tiny chunks to narrow down the source of the error), mark it as locked for processing and enter the current time as both the start and the last check-in. Finally it would execute the appropriate script with the arguments to extract ga-gb, background the task and enter its PID in the journal. If system load was still low it would find another uncompleted task in the journal, check it out and execute the script.

As the journal and all output data were on a mysql server we were able to bring in four Linux boxes to assist in the effort. We would have added more but network bandwidth became the bottleneck, in addition to the software often having significant seek delays. The network was a very slow 10BASE2 coax segment, and as we only had one old NIC we cheated: we made a media converter out of an old PC, put our only card in it and bridged the two interfaces, so that we could connect our other PCs via standard UTP, albeit at a very slow speed. As for the seek delays, you could hear the SCSI arrays churning, so I suspect the cause was decades of fragmentation in a database that had never been reindexed. The journal system worked very well and eliminated duplication.
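
For readers who want something more concrete, the journal logic can be sketched as below. This is a minimal Python illustration under loud assumptions, not the original code: it uses an in-memory SQLite database as a stand-in for the mysql server, the table and column names are invented, and the four hour cutoff is simply borrowed from the example above rather than being the real timeout.

```python
import sqlite3
import time

STALE_AFTER = 4 * 60 * 60  # assumed cutoff: seconds without a check-in

SCHEMA = """
CREATE TABLE journal (
    id           INTEGER PRIMARY KEY,
    first_prefix TEXT NOT NULL,
    last_prefix  TEXT NOT NULL,
    status       TEXT NOT NULL,   -- 'active', 'done' or 'failed'
    pid          INTEGER,
    started      REAL,
    last_checkin REAL
)
"""


def reap_stale(db, now):
    """Flag active jobs with an old check-in as failed and return them."""
    rows = db.execute(
        "SELECT id, first_prefix, last_prefix, pid FROM journal "
        "WHERE status = 'active' AND last_checkin < ?",
        (now - STALE_AFTER,),
    ).fetchall()
    for job_id, _first, _last, _pid in rows:
        # A real supervisor would also kill the stuck script via its PID here.
        db.execute(
            "UPDATE journal SET status = 'failed', pid = NULL WHERE id = ?",
            (job_id,),
        )
    return rows


def claim(db, first, last, pid, now):
    """Record a locked job covering first..last and return its journal id."""
    cur = db.execute(
        "INSERT INTO journal "
        "(first_prefix, last_prefix, status, pid, started, last_checkin) "
        "VALUES (?, ?, 'active', ?, ?, ?)",
        (first, last, pid, now, now),
    )
    return cur.lastrowid


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
now = time.time()
stuck = claim(db, "ga", "gz", pid=4242, now=now - 5 * 3600)  # silent for five hours
fresh = claim(db, "pa", "sz", pid=4243, now=now - 60)        # checked in a minute ago
failed = reap_stale(db, now)
retry = claim(db, "ga", "gb", pid=4244, now=now)             # retry a small slice
statuses = dict(db.execute("SELECT id, status FROM journal").fetchall())
print(statuses)
```

With several machines claiming work at once, the claim step would need to be atomic (a transaction or a conditional UPDATE) so that two supervisors cannot check out the same range; the sketch leaves that out for brevity.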

We managed to complete our final pre-production test in just four days and were able to convince the library staff to “look but don’t touch” (i.e. use the old dumb terminals to look up books but to write down people who return or loan out books on paper for later entry into the new system) when the big day (or should I say days) came.

We got the vendor for the new software to provide us with a template of exactly how he would like the data presented for importation. He went for old school CSV, which suited us just fine. We had to clean up a lot of the data; we first exported it as tab-delimited text and worked on it using sed, awk and a few cheeky scripts to sort out date formats, rogue whitespace and other minor things that can ruin your day. In the end we only had a handful of rejections, and two of them were the result of accented characters being mishandled by the old software and pumped out as random extended ASCII characters, which the vendor’s import tool didn’t like very much.

Once we pulled out all the old hardware and put in the Windows Server machine it was (almost) smooth sailing from that point on. We enabled Terminal Services and installed some lovely thin clients that were only marginally larger than a can of soda. We actually got away without rewiring much of the library, as the RS232 had been delivered through the building over UTP; I imagine it is similar to how Cisco console cables function. A standard patch cable came from the wall into an adaptor that turned the RJ45 into a DB9, which then seated into the serial port of the dumb terminal. I sincerely doubt that the cable would be of sufficient quality to carry 1000BASE-T, but it was sufficient for our purposes and there was no noticeable lag or latency at 100BASE-TX speeds.

The final hurdle we encountered was the second library on the campus about ¾ mile up the road. We installed a bunch of thin clients and a VPN terminator and thus linked them to the primary site over their existing DSL connection but their experience was unacceptable with significant lag. Of course they didn’t think anything was wrong as they were used to the screen redraw time of a 9600bps dumb terminal. It was at this point that I considered that the jumpered line could be put to good use.

We removed the old equipment and instead installed a pair of AT-MT605 units. These little beauties are great value for money (we got ours for about $350 each) and are simply connected at either end of your private line. The only configuration you need to do is flick a DIP switch to select one as the central office and the other as the subscriber. We were able to achieve well over 30 Mbit/s (a bit better than the 9600bps modem they had interconnecting the place before!). The connection was so good that the campus library (the smaller satellite one) has canceled its slow and unreliable DSL connection and now shares the high speed Internet from the main library via the link. We have used QoS to ensure that the thin clients receive priority bandwidth to preserve the user experience.

I hope you’ve enjoyed this rather long winded explanation about my encounter with legacy hardware and how we were able to successfully migrate and modernize their entire operations through a combination of good hardware, a keen understanding and a broad base of knowledge – not to mention the mandatory abundance of patience.
