We all know the cybersecurity industry is massive. As it should be. Our personal lives and global economic growth are increasingly based on data. Protecting the integrity of that data is paramount to the safety and well-being of individuals, nation-states and the global economy.
Some of the smartest people in the world work on cybersecurity. All those smart people continually come up with new techniques and technologies to protect our digital assets, causing the bad guys to continually come up with ways to circumvent those protections.
But what if all those smart people are thinking about cybersecurity in a legacy context? What if there were a new way to approach the problem? As an old strategy professor of mine used to say, “sometimes you can’t read the label when you’re inside the jar.” As it relates to cybersecurity, are we all inside the jar and not reading the label?
What Do We Think Is the Label on the Jar?
What exactly are we trying to “secure” in cybersecurity? It’s not really all things cyber, it’s really data. Sure, rogue programming code can do bad things, but programming code mostly exists to change data and even the programming code itself is just data at its most basic level. So, let’s think about it as “data security” instead of “cybersecurity.”
And then when we call it “security” we imply that data is a non-intelligent “thing” that must be guarded, like dollar bills in a bank vault. But what if guarding the data was only a means to an end? Are there other means to reach the same end? (Hint: Yes…keep reading.)
What we really care about is having data that we can trust (or in the bank analogy, having access to the dollar bills we can trust and spend as legal tender). Another old professor of mine — this one a marketing professor — liked to say that when you get a car wash you aren’t actually paying for a car wash. You are paying to receive a clean car. To put it in marketing terms, the car wash is the feature of the product and the clean car is the benefit of using the product.
So in the case of data security, the benefit we want is data that we can inherently trust. A better term for this — the label that should be on the jar — is “data integrity.” Protecting the data, i.e. data security, is one method of achieving data integrity. But what if new technologies allowed for other methods?
How Else Can We Achieve Data Integrity? Some Ideas As Old As Money Itself
Going back to the analogy of dollar bills in a vault…all dollar bills have serial numbers (otherwise we wouldn’t be able to play Liar’s Poker!). The term “marked bills” heard in TV shows and movies is a misnomer, because all bills are marked with a serial number. It simply refers to bills that have sequential serial numbers, so that if that range of bills is stolen, the serial numbers can be broadcast far and wide and flagged whenever any of them are spent. To complete the analogy, those ranges of serial numbers can no longer be trusted and are no longer legal tender.
And sometimes, banks put radio-controlled exploding dye packages into bundled packs of dollar bills, so if they are stolen the dye package explodes and physically marks the bills. Everyone who sees them (no serial number lookup needed) then knows they cannot be trusted and are not legal tender. In effect, the dollar bills have been equipped with a mechanism to “defend themselves” from bad actors.
In these analogies, when the guarding “feature” of the bank vault failed, the “benefit” of knowing whether the dollar bills could be trusted — dollar bill integrity — was still maintained.
So how do we improve upon data integrity? Certainly better, faster, cheaper ways to guard the data help. But what if we also looked at solutions that reside in the data itself, not just the security that surrounds the data?
What if each instance of data modification in a database had a unique identifier melded into it? Like a serial number on a dollar bill. We could then use those sequential serial numbers for data integrity purposes, as we’ll see in a real-world healthcare application below.
And what if the data elements themselves could be equipped with intelligent mechanisms to defend themselves for when security has been breached and serial number lookups are not viable? (Another hint: Yes…keep reading for a real-world example of how this feature is being used to help ensure America’s national security).
These are interesting data integrity concepts, especially in light of blockchain-based data management tools: blockchain technology has both of these features natively embedded.
Blockchain? Yes, Blockchain.
The entire architecture of blockchain is to maintain the complete history of all data modification events going back to its day of creation, secured with cryptographic protection. This is a very big, foundational difference between databases and blockchains.
Remember, databases are designed to natively store only the current value of a data element, and old values are overwritten. When bad actors are doing the overwriting, the database itself doesn’t know the difference between good and bad actors, and therefore the correct information is gone forever. Way back in days of yore when databases were first designed, the value of software came from programming logic that performed process automation, the data itself was a second-class citizen, and data storage was expensive, so databases weren’t designed to keep all historical values. And the original architects of the computer industry didn’t think about malicious intent and data integrity and all the other things that drive data management use cases today. All modern-day use cases are force-fit onto database architectures that were never designed with these use cases in mind (Security and Analytics and APIs, Oh My!).
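To make the contrast above concrete, here is a minimal Python sketch of the two storage styles: a plain key-value store that overwrites in place, and a toy append-only ledger in which each block carries a hash of the block before it. The names `ToyLedger` and `block_hash` are invented for this illustration, and the sketch makes no claim about how Bitcoin, Ethereum or any commercial product actually implements its chain.

```python
import hashlib
import json


def block_hash(index, payload, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "payload": payload, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class ToyLedger:
    """Append-only list of change events; nothing is ever overwritten."""

    def __init__(self):
        self.blocks = []

    def append(self, payload):
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        index = len(self.blocks)
        block = {
            "index": index,
            "payload": payload,
            "prev": prev_hash,
            "hash": block_hash(index, payload, prev_hash),
        }
        self.blocks.append(block)
        return block

    def verify(self):
        """True only if every block matches its hash and links to its predecessor."""
        prev_hash = "0" * 64
        for block in self.blocks:
            if block["prev"] != prev_hash:
                return False
            expected = block_hash(block["index"], block["payload"], block["prev"])
            if block["hash"] != expected:
                return False
            prev_hash = block["hash"]
        return True


# A plain key-value store: after the second write, the old value is gone.
database = {"balance": 100}
database["balance"] = 999999

# The ledger keeps both events, and a later edit to history is detectable.
ledger = ToyLedger()
ledger.append({"balance": 100})
ledger.append({"balance": 120})
intact_before = ledger.verify()
ledger.blocks[0]["payload"]["balance"] = 999999  # simulated tampering
intact_after = ledger.verify()
```

In this toy version the overwritten dictionary leaves no trace of the earlier balance, while the ledger check passes before the simulated tampering and fails after it. A real blockchain adds consensus, signatures and replication on top of this basic linking idea.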
And just as important as the storage architecture, the actual data being stored was always assumed to be a dumb, inert thing. But the smart contracts popularized with Ethereum give the data intelligent mechanisms to trigger programmatic actions when read/write actions are performed on it, giving data stored in blockchains the ability to “defend itself” when it comes under attack.
So why have blockchain-based solutions not been used for enterprise data integrity purposes? Bitcoin’s 10-year history of never having been compromised, despite the lure of untold riches for anyone who could compromise it, is strong evidence of blockchain’s data integrity bona fides.
Is it because blockchain solutions consume too many resources and operate too slowly for enterprise usage? Yes and no. If it’s a public, permission-less blockchain, then the answer is “yes”. As an example, it costs about $5M to store 1GB of data on Ethereum. That is because blockchains operate as load balancers in reverse: they duplicate every piece of data and code on every single node, so the whole system consumes more resources as more nodes come online (“permission-less” means anyone can light up a node, no prior permission needed). But when talking about private, permissioned blockchain networks, the answer is a resounding “no”. Private blockchain networks can operate at costs and speeds that are completely feasible for scaled enterprise usage.
Is it because existing blockchain solutions do not have the query languages that applications require to do the things that they do, such as perform transactions and update/analyze data? Bingo. That’s the real reason. To power an enterprise application with diverse data management needs, the data would have to be stripped off the chain and loaded into a database that has a query language, completely voiding the data integrity features of blockchain as if they had never existed in the first place. Obviously, this is nonsensical, so it’s not done. The current use of blockchains in enterprise applications is to power decentralized asset-ownership ledger “services” running alongside applications that use traditional data management products in traditional ways, and therefore the core application data is subject to traditional cybersecurity threats and exposures.
Enter a New Species of Data Management Product
What if there were a new data management product that used private permissioned blockchains as its data storage method, and it contained a robust query language capable of meeting the diverse data management needs of enterprise applications? That would be an entirely new species released into the ecosystem. One that is focused on data integrity, not overlapping or replacing existing cybersecurity measures, but complementing them. It’s a rudimentary analogy, but it’s similar to adding serial numbers and exploding dye packs to dollar bills that are already held in a vault.
A Little Backstory…And A Deeper Dive
My partners and I at 4490 Ventures were thinking about blockchain applications beyond cryptocurrencies and came across a small startup called Fluree. The team there was light years ahead of us in terms of their thinking of how blockchain could affect the data management industry.
The co-founders had started several companies in data management and software, and one of them was a true data management industry OG, having founded and built what was at the time the largest provider of data warehousing tools and solutions in the world. That company grew from zero to $1B in revenue in 12 years, operated as a profitable public company and sold for nearly $4B in an all-cash M&A transaction, all after having raised only $5M of venture capital pre-IPO.
They had been tracking the cryptocurrency sector since its inception, but more as technologists than investors, exploring how the technology worked and what its other implications could be for the broader software industry. Based on their explorations they decided to build Fluree, and spent three years and their own capital building the initial version of the software.
In full disclosure, after we understood what Fluree is and the founding team’s vision, my firm eagerly partnered with the team as their Seed round lead investor and I now sit on the board. So the following commentary is that of a #ProudPapa, but we do believe the concepts that Fluree is introducing to the market should change the way we think about cybersecurity, and data management more broadly. The actual companies that successfully commercialize these concepts at scale are dependent on management skill, timing, luck and all the normal things that determine the winners in a market…but we feel that this is a market worth winning and that Fluree has the pole position.
Fluree is a semantic graph-style database, called FlureeDB, that stores its data in enterprise-defined private permissioned blockchains (a.k.a. distributed ledgers, or DLs) called FlureeDL. The fact that it uses permissioned blockchains makes it very different in terms of speed, resource consumption and programmatic flexibility from permission-less blockchains like Bitcoin and Ethereum, while still maintaining the data integrity features inherent in all blockchains.
Like all blockchains, FlureeDL creates a new block every time data elements change; these new blocks consist of Fluree Flakes (like snowflakes, every one is unique). Fluree Flakes are the equivalent of the serial numbers on the dollar bills, and use the industry standard RDF format (which future-proofs applications built on Fluree, as RDF is a W3C standard underpinning the Semantic Web, sometimes called Web 3.0…very important concept and the subject of a future post).
Fluree also has Turing-complete Fluree SmartFunctions that give the data the intelligence to take an action if some action has been taken on it. These SmartFunctions are stored as data in the blockchain itself and have many different uses, one of them being the ability to ensure data integrity by taking an action if a user is trying to access or modify data that they are not pre-approved for. This is equivalent to exploding dye packs if dollar bills are removed from the vault by someone not authorized to do so.
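The “dye pack” idea can be illustrated with a small Python sketch in which a record carries its own rule functions and checks them on every read or write. This is not Fluree’s SmartFunction syntax or API. `GuardedRecord`, `only_listed_writers` and the user names are hypothetical, and the rules here live in ordinary program memory rather than in a blockchain.

```python
class GuardedRecord:
    """A value bundled with rule functions that run on every read or write."""

    def __init__(self, value, rules):
        self._value = value
        self._rules = rules  # each rule is a function (user, action) -> bool
        self.alerts = []     # the "dye": a visible trace of rejected attempts

    def _allowed(self, user, action):
        if all(rule(user, action) for rule in self._rules):
            return True
        self.alerts.append((user, action))
        return False

    def read(self, user):
        if not self._allowed(user, "read"):
            raise PermissionError(f"{user} may not read this record")
        return self._value

    def write(self, user, new_value):
        if not self._allowed(user, "write"):
            raise PermissionError(f"{user} may not modify this record")
        self._value = new_value


def only_listed_writers(writers):
    """Build a rule that lets anyone read but only listed users write."""
    def rule(user, action):
        return action != "write" or user in writers
    return rule


record = GuardedRecord("status: green", [only_listed_writers({"alice"})])
record.write("alice", "status: amber")  # authorized change goes through

try:
    record.write("mallory", "status: red")  # unauthorized change is rejected
    blocked = False
except PermissionError:
    blocked = True
```

The design point is that the check travels with the data rather than sitting in a separate gateway. The rejected attempt by the hypothetical user `mallory` leaves the stored value untouched and is recorded in `record.alerts`.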
It’s early days, but let’s examine how this new species is behaving in the wild…
A leading academic medical center is using Fluree to store the drug-dosing instructions loaded into a drug-delivery machine that keeps preemies alive just after birth. This is a procedure that has life-or-death implications, and therefore has significant malpractice liability associated with it. The medical center chose Fluree because it wanted the native storage of sequential serial numbers for every data alteration event, to prove that the dosing instructions presented as part of a potential malpractice defense were the same instructions entered into the machine when the preemie first went on it. Importantly, this is a real-world application of blockchain technology that has nothing to do with decentralized sharing of information, which we believe is a first for the entire global IT industry.
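The audit described above boils down to asking what a value was as of a given change number. The following Python sketch shows that lookup under simplifying assumptions: `VersionedStore`, the key `pump-7/dose` and the dosing values are all made up for illustration, and nothing here reflects the medical center’s system or Fluree’s query language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    serial: int  # sequential number assigned when the change is recorded
    key: str
    value: str


class VersionedStore:
    """Keeps every change in order instead of overwriting the current value."""

    def __init__(self):
        self._changes = []

    def put(self, key, value):
        change = Change(serial=len(self._changes) + 1, key=key, value=value)
        self._changes.append(change)
        return change.serial

    def value_as_of(self, key, serial):
        """Return the value of `key` as it stood after change number `serial`."""
        result = None
        for change in self._changes:
            if change.serial > serial:
                break
            if change.key == key:
                result = change.value
        return result


store = VersionedStore()
first = store.put("pump-7/dose", "0.5 mL/h")   # instructions when treatment began
second = store.put("pump-7/dose", "0.8 mL/h")  # a later adjustment

at_start = store.value_as_of("pump-7/dose", first)
latest = store.value_as_of("pump-7/dose", second)
```

Because the earlier entry is never overwritten, `at_start` still holds the original instruction after the later adjustment. In a blockchain-backed store the change list would also be cryptographically linked, so the history itself could be shown to be unaltered.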
A Defense Department branch of the government is using Fluree to maintain a single shared dataset that different military units update in real time so they know where everyone is, including non-U.S. allied forces. Sharing data with non-U.S. allied forces is particularly complicated from a cybersecurity perspective, and was one of the driving reasons for choosing Fluree. The SmartFunctions feature is being used as blockchain-stored logic, which at its atomic level is blockchain-stored data, that controls access to view and modify other blockchain-stored data. This allows the data 1) to not have to be behind a firewall, and 2) to be accessed and modified by authorized users only, and to do so without needing APIs. The blockchain-resident SmartFunctions eliminate both of these traditionally costly and security-risk-prone components of legacy architectures. To put it in the dollar bill analogy, the dollars are being left out on the sidewalk but they’re loaded with exploding dye packs to render them useless to anyone other than authorized users.
These two use cases, while niche themselves, illustrate the potentially powerful application of Fluree’s blockchain-based technology to ensure data integrity, for both centralized and decentralized data, in ways that existing solutions cannot. Existing cybersecurity solutions are not even of the same species.
Abacus vs. Algebra: Zooming Back Out to the Big Picture
One last analogy to leave you with (totally borrowed from Bryce Merkl Sasaki of Neo4j) is the thousands-of-years-old history of the abacus and the Hindu-Arabic numeral system (numbers as we know them today: 0, 1, 2, 3…9). Both were very good at the use case of counting and simple arithmetic, which was the original purpose for both, and the abacus had arguably better adoption potential because it didn’t require as much education and mental gymnastics. But the reason the abacus ended up in the dustbin of history is not that it wasn’t good at its job (it was); it’s that it was only good at that one job, whereas the Hindu-Arabic numeral system could do so much more, like algebra and calculus, which became the foundation for so many other things from statistics to particle physics.
I would postulate that blockchains that do not have sophisticated data manipulation capabilities are like an abacus. Really good at the simple use case best illustrated with a cryptocurrency: I own this asset, now you own this asset. That’s what the original architects of blockchain designed it to do, and they did a good job of that.
While cryptocurrencies are the proof that blockchain technology can “change the game” in terms of cybersecurity, the data integrity needs of the world go far beyond the simple asset ownership ledgers of cryptocurrencies. As the World Economic Forum has termed it, the Fourth Industrial Revolution is upon us and our continued development as a global society is dependent on data integrity. Data management tools that meld sophisticated graph-style data manipulation capabilities with blockchain data storage are like the Hindu-Arabic numeral system: they can be used for the straightforward use case of asset ownership ledgers, but they can also be used for far, far more advanced data management use cases than that.
As we move beyond the feature-centric “guard the data” mindset of existing cybersecurity solutions to the benefit-centric mindset of holistic data integrity, tools like Fluree will help us achieve the data-driven future by providing data that we can, finally, fully and inherently trust.