Log4J shows we must all learn to understand our software

This week brought yet another major vulnerability in widely used corporate software stacks, setting off a massive effort to discover, patch, and update. I remember the first time I used Log4J on a software development project. It was such an improvement over having to code my own logging library. At the time, I was really just learning Java, and Log4J was one of the first open source toolsets of practical value I could incorporate into my work. Now we learn that a (probably) honest error has created a cybersecurity crisis all over the world. If you follow the news cycle, it’s only going to get worse, and stories like this one in Wired make that seem plausible.

Hype is endemic in the world of cybersecurity, as is the spread of fear, uncertainty, and doubt. Lots of software has flaws; they can’t all be so bad. By all accounts, though, the Log4j vulnerability—also known as Log4Shell—lives up to the hype for a host of reasons. First is the ubiquity of Log4j itself. As a logging framework, it helps developers keep track of whatever goes on inside their apps. Because it’s open source and reliable, plugging in Log4j instead of building your own logging library from scratch has become standard practice. Moreover, so much of modern software is cobbled together from various vendors and products that it may be difficult, if not impossible, for many potential victims to even know the full extent of their exposure. If your code’s innermost Matryoshka doll runs Log4j, good luck finding it.

Wired Magazine, The Next Wave of Log4J Attacks Will Be Brutal
TAILS OS Secure Download

What really struck me about this article was the last sentence above: “…good luck finding it.” It reminded me of when I was building the AMENDMENT4. On one hand, the whole point of the product was to put something powerful into the hands of everyday users. On the other hand, how would any of those potential users and customers be assured they could trust my product? The AMENDMENT4 relies on the TAILS operating system to create a secure software environment (once the hardware PIN code is bypassed). So I naturally decided to see how *I* could know I could trust TAILS, because I knew TAILS had been compromised in the past. How was it compromised, you ask? A zero-day inserted into upstream dependencies that no one noticed. Sounds familiar, right?

Is a Signature Trustworthy?

It turns out that the means by which you “verify” TAILS is a signed digital certificate (HTTPS/DNS) and a signed checksum. Now, if you dig enough, you’ll learn that the TAILS team is essentially a group of super-secret super-coders who can only be accessed directly by arcane cybersecurity incantations. You’ll also learn that TAILS guarantees nothing and points out (in the end) that they are trustworthy (and a lot of other people tell you the same). My favorite little irony is the idea that a PGP signature and checksum are designed to protect against an “adversary” who potentially has the power to alter file requests and URL destinations in transit (for example, by intercepting Internet traffic and sending a compromised download) but who somehow wouldn’t think to alter the transmitted checksum value as well. That strikes me as a little short on logic, but in the end everyone seems to fall back on an acceptable-risk calculus. I agree with that calculus regarding TAILS, but I would have told you the same thing about Log4J a short time ago!

Getting to Know Your Software

In most cases an “acceptable risk” comes down to Linus’ Law, the notion formulated by Eric S. Raymond and named for Linus Torvalds: “given enough eyeballs, all bugs are shallow.” But Log4J (and many other examples) show that maybe that’s not true often enough to be acceptable any more. So what can we do about it? I propose the answer is getting to know our software in a more practical way. In general, we all (users, practitioners, and developers) need an easy way to validate, review, and compare our software at the “stack” level. Isn’t that what a signature or checksum is supposed to do? It is, but a single value lacks general clarity and at the same time lacks specificity: if it turns out there is an issue, where is it and what do you do?

Making software public knowledge

What if there was a detailed manifest of all the files (ideally, source code included) for all major software stacks? It could be a large online repository (blockchain, anyone?), a GitHub account, or just basic practice between professionals in various arenas that is then shared any time work is handed off or incorporated as a dependency. Astute readers might be tempted to say, “but isn’t that what happens every day when coders work on software and use a source code version management system?” Indeed it is (supposed to be), but for the obvious rebuttal I refer to the above: that’s not really working, is it? More specifically, the problem may just be that there is no broad vantage point over all the software that comprises a stack: integrators are looking at integration points between packages, while solution developers are concerned with their own code. But what if there was a generally educated populace where everyone could understand, at a certain level, what is right and what is (likely) wrong? When we think of software (and code in particular) this might sound overwhelming and too expansive, but doesn’t our legal system actually work in a very similar manner? Very few of us are legal practitioners, yet somehow we (mostly) seem to get through the day without breaking the law. At the very least, we all seem to get through the day without breaking the law in ignorance.

A Simple Solution

Let’s create a quick fingerprint of the IPFire OS running on the Raspberry Pi 3B+. Here is a quick way to do it, with a general explanation of what the command below does: list all files under root recursively, including most hidden files; filter out directories that hold only runtime representations of resources; format by column so it looks presentable; write to a file called “fingerprint.txt”. With a little reading of the “man” pages, it is easy to change the level of scrutiny, the directories, and the reporting details of the fingerprint report.

sudo ls -RlA / | grep -vE "^/(proc|dev|sys|boot|run)[/:]" | awk '{print $1" - "$2" - "$5" - "$9}' > fingerprint.txt
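One caveat with the one-liner above: in `ls -R` output only the directory header lines carry a path, so the grep filter drops those headers rather than the entries listed beneath them. A minimal sketch of an alternative follows, assuming GNU find (for `-printf`). The function name `fingerprint` is hypothetical, and the list of skipped directories simply mirrors the one-liner. It prunes the runtime directories entirely, records the full path of every entry, and sorts by path so two reports line up when compared.

```shell
# fingerprint ROOT: print "mode - links - size - path" for every entry under ROOT.
# Hypothetical helper; assumes GNU find. Skips the same top-level directories
# that the one-liner tries to filter out.
fingerprint() {
  root="${1%/}"                      # "/" becomes "", so "$root/proc" is "/proc"
  find "${root:-/}" \
    \( -path "$root/proc" -o -path "$root/dev" -o -path "$root/sys" \
       -o -path "$root/boot" -o -path "$root/run" \) -prune \
    -o -printf '%M - %n - %s - %p\n' |
    LC_ALL=C sort -k7                # field 7 onward is the path
}
```

From a root shell, `fingerprint / > fingerprint.txt` would produce a report in roughly the same column layout as before, with one full path per line.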

Once that fingerprint is generated, it can be compared line by line, side by side, with another similar “fingerprint” file using the following command.

sudo diff -iwy -W 200 --suppress-blank-empty ./fingerprint1.txt ./fingerprint2.txt | less

In the following example, I added the text “#test” to the torrc file just to demonstrate how sensitive this approach is to alterations in files, even across an expansive list of files, documents, and binaries in a system that is brand new to me as a user. And for the record, even in this exercise I learned that /etc/tor/torrc is actually a symlink into another directory used to run and configure Tor, so I was not even changing the actual file I thought I was (in this example).
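Two things in this exercise suggest a possible refinement. A size-only fingerprint catches the added “#test” because the file grows, but it would miss an edit that leaves the size unchanged, and it does not say where a symlink points. The sketch below is one way to cover both, assuming GNU find and coreutils `sha256sum`; the function name `content_fingerprint` is hypothetical. It records a SHA-256 digest for each regular file and the target of each symlink.

```shell
# content_fingerprint DIR: one line per regular file ("<sha256>  <path>")
# and one line per symlink ("symlink  <path> -> <target>"), sorted by path.
# Hypothetical helper; assumes GNU find and sha256sum.
content_fingerprint() {
  dir="${1%/}"
  {
    # -xdev keeps find on one filesystem
    find "${dir:-/}" -xdev -type f -exec sha256sum {} +
    # %l prints where a symlink points
    find "${dir:-/}" -xdev -type l -printf 'symlink  %p -> %l\n'
  } | LC_ALL=C sort -k2
}
```

For a single file, `readlink -f /etc/tor/torrc` prints the fully resolved path, which is a quick way to check which file an edit will really touch.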

What problem would this solve, again?

The most basic problem this approach would solve is to give everyday users (of any operating system, really) the ability to see at a glance what is on their computer, where it is located, and a basic “signature” of size in bytes. That signature could then be easily compared with those of other users of the same software, with all differences highlighted quickly. The simple example is that a user of CognitiveMetropolis software could compare the signature of their live, deployed system with our signature, and even with other users’ signatures. Curious users will naturally start to ask questions that highlight how configuration files might differ among systems; some users might start to note where source code is stored (or not present); and yet others might wonder why source code signatures are the same but compiled versions on two similar systems seem to vary for some binaries but not others. It starts to address the gap between a checksum and the areas where a problem might actually exist within a software stack. Similarly, it would smooth out the logic of needing a checksum to validate software integrity in a world where an adversary might be able to inject malicious binaries in transit, in upstream code repositories, or by taking over an Internet resource entirely. It might even turn out that Linus’ Law was right all along and collectively we just haven’t added as many eyeballs relative to the added bugs.
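As a rough sketch of what comparing signatures between users could look like, the function below summarizes the differences between two fingerprint files. It assumes each line has the form `mode - links - size - path`, with a full path in the last field. That format is an assumption: `ls -R` prints bare file names, so its output would need full paths added first. The name `compare_fingerprints` is hypothetical as well.

```shell
# compare_fingerprints MINE THEIRS: list paths added, changed, or removed in
# THEIRS relative to MINE. Hypothetical helper; assumes "mode - links - size - path"
# lines with a full path in the last field and a non-empty MINE file.
compare_fingerprints() {
  awk -F' - ' '
    # Rebuild the path in case it contains " - " itself.
    { path = $4; for (i = 5; i <= NF; i++) path = path " - " $i }
    NR == FNR        { seen[path] = $0; next }         # first file: remember each line
    !(path in seen)  { print "added:   " path; next }  # only in the second file
    seen[path] != $0 { print "changed: " path }        # same path, different details
                     { delete seen[path] }             # handled, so not "removed"
    END              { for (p in seen) print "removed: " p }
  ' "$1" "$2" | LC_ALL=C sort
}
```

Calling `compare_fingerprints mine.txt theirs.txt` (both file names are placeholders) prints one line per differing path, which is a shorter starting point for questions than a full side-by-side diff.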

How can you trust me?

As for the AMENDMENT4 discussed above, the simple answer is another question: how can we trust any person or business? How can we trust Tesla that their car’s Autopilot won’t kill us? How can we trust the airbags in our Toyota cars will work right? How can we be sure we won’t die when we fly on a commercial aircraft? How can we trust we won’t drown when we go on a commercial ocean cruise? The simple answer is calculated risk. CognitiveMetropolis is not in the business of selling to spies and criminals, who by nature are so paranoid they (hopefully) would never buy our products. But for everyday folks who just want to not be tracked, packaged, and productized, it’s worth the risk, because in the worst-case scenario they are simply no better off than they would be using commercial products. Which is kind of funny, because Log4J demonstrates that the same risk calculus applies to corporations themselves!
