The analysis and management of data from multiple sources may introduce issues of trust, quality control and stewardship. If you want to study someone else’s data, can you trust that it’s accurate? Can they trust you with their data? How would they know if you changed it? As data analytics use cases get more sophisticated and multi-party, such questions cease to be abstractions. The use of blockchain in data & analytics presents a robust solution to data trust and stewardship, so much so that Gartner listed it among the “Top 10 Data and Analytics Technology Trends That Will Change Your Business” in their latest report.
2019 Market Guide for Data and Analytics by Gartner
Background: Data Ownership and Trust
We are now in an era where data & analytics can reach their full potential when applied across multiple entities. Examples include fraud detection spanning multiple banks or patient analytics that encompasses data from hospitals, payers, pharmacies and wearables. However, putting data to work with more than one data owner means running headlong into one of the greatest intellectual and policy conflicts of the information age. On one side, you have people who believe that data should be universally accessible, free and “ownerless”. To them, the mere suggestion that data should be proprietary or protected is offensive. On the other side, you have real-world cases where data is private property. It contains secrets that need to remain secret. It’s valuable, and not for sharing.
When you want to analyze data from more than one source, you encounter tension. If the collaboration is between a corporation and, say, a university, trust issues can lead to conflict: the corporate side clutches its data close to the chest, and letters arrive from attorneys who don’t usually know much about data in the first place. Data analysis between non-profits trends more toward Lebowski-esque pleas of, “Look man, let’s be cool and not pollute each other’s data, you dig?”
However the script plays out, the core issues remain the same. Entities want and need to share data with others for analytical purposes. The process creates a risk of data theft, accidental or deliberate data breaches and data corruption. Blockchain, which uses cryptography, full traceability and immutability to expose tampering with data sets, can address this problem.
“Blockchain” is one of those unfortunate, over-hyped terms people talk about without fully understanding. The “block” in blockchain refers to a batch of data records. Most commonly, a block holds cryptocurrency transactions, though in reality it can hold any kind of data transaction. The “chain” is the cryptographic link between blocks: each block carries a hash of the one before it, and the resulting ledger is replicated peer to peer.
The architecture of blockchain makes it practically impossible for anyone in the chain to modify a block of data without everyone else knowing. The open, distributed blockchain ledger records transactions between parties in an efficient, verifiable and permanent way. No one actually knows for sure who invented blockchain. It appeared on the world scene in 2008, in the Bitcoin white paper published under the pseudonym Satoshi Nakamoto.
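To make the tamper-evidence idea concrete, here is a minimal sketch in Python of a hash-linked ledger. It is an illustration under simplifying assumptions, not a real blockchain: there is a single in-memory copy, no network of peers and no consensus step. The function names (block_hash, append_block, first_invalid_block) and the sample records are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """Return the SHA-256 digest of a block's contents, including the previous hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a block that commits to the hash of the block before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append(
        {
            "index": index,
            "data": data,
            "prev_hash": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
    )
    return chain


def first_invalid_block(chain):
    """Return the index of the first block that fails verification, or None."""
    prev_hash = GENESIS_HASH
    for block in chain:
        # The link to the previous block must match what we just verified.
        if block["prev_hash"] != prev_hash:
            return block["index"]
        # The stored hash must match a fresh hash of the block's contents.
        expected = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["hash"] != expected:
            return block["index"]
        prev_hash = block["hash"]
    return None


chain = []
append_block(chain, {"from": "bank_a", "record": "txn-001"})
append_block(chain, {"from": "bank_b", "record": "txn-002"})
append_block(chain, {"from": "bank_a", "record": "txn-003"})
print(first_invalid_block(chain))  # None: the chain verifies

chain[1]["data"]["record"] = "txn-999"  # someone quietly edits a record
print(first_invalid_block(chain))  # 1: the edited block no longer matches its hash
```

In a real deployment every participant holds a copy of the ledger and runs this kind of check independently, which is why a change made by one party does not go unnoticed by the others.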
How Blockchain Works in Data Analytics
Gartner foresees a promising future for blockchain in data management and analytics. They call it “cryptographically supported data immutability, shared across a network of participants.” The analysts like blockchain because it provides “decentralized trust across a network of largely untrusted participants.”
This capability lets all stakeholders in a multi-party analytics process see the full lineage of data assets and transactions. It brings transparency to shared data analytics, even across complex networks of participants. Everyone can see who is interacting with whom, and it is visible when one interaction triggers others. Done right, the net effect is to increase participants’ trust in the data they work with.
Blockchain Use Cases
Gartner cites three potential use cases for blockchain in data management and analytics. The first is auditing and product lineage, where they envision blockchain building an irrevocable data set that can be shared across, for example, all supply chain participants. The blockchain increases visibility throughout the network, increasing the monitorable surface area. This translates into improved traceability and understanding of product provenance. Applications might emerge in tracking costly assets, pharmaceutical drugs or medical claims across the fulfillment steps.
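As a rough illustration of what traceability could look like to an analyst, the Python sketch below treats the shared ledger as an append-only list of custody events and reconstructs the history of a single asset. The CustodyEvent fields, the function names and the sample lots are all hypothetical, and the sketch assumes events are already stored in the order they occurred.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustodyEvent:
    """One hand-off of an asset between two supply chain participants."""
    asset_id: str
    sender: str
    receiver: str
    step: str


def lineage(events, asset_id):
    """Return the custody events recorded for one asset, in ledger order."""
    return [e for e in events if e.asset_id == asset_id]


def custody_gaps(events, asset_id):
    """Return consecutive event pairs where the receiver of one is not the sender of the next."""
    history = lineage(events, asset_id)
    return [(a, b) for a, b in zip(history, history[1:]) if a.receiver != b.sender]


ledger = [
    CustodyEvent("lot-42", "manufacturer", "distributor", "shipped"),
    CustodyEvent("lot-77", "manufacturer", "distributor", "shipped"),
    CustodyEvent("lot-42", "distributor", "pharmacy", "delivered"),
    CustodyEvent("lot-77", "wholesaler", "pharmacy", "delivered"),
]

# Print the provenance trail for one lot.
for event in lineage(ledger, "lot-42"):
    print(f"{event.sender} -> {event.receiver} ({event.step})")

# lot-77 was received by the distributor but sent on by a wholesaler,
# so the trail has a break that an auditor would want to investigate.
print(len(custody_gaps(ledger, "lot-77")))
```

Because every participant reads the same event history, a break in the chain of custody is visible to all of them rather than only to the party that caused it.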
Fraud analytics is another use case mentioned by Gartner. In their conception, a public distributed ledger could enhance risk management and fraud analytics by giving all the parties the same data set. Fraudulent activity detected by one party could be flagged and sent out on the network, notifying other participants before they are affected by the fraud.
Data sharing and collaboration is a use case where external data is introduced into an internal process. By default, such data needs to be treated as completely unverified and untrusted. With blockchain, however, the origin of external data and any later changes to it can be checked against the shared ledger, so it becomes possible to treat that data as if it originated internally.
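One way to picture this check: the data owner records a cryptographic fingerprint of the data set on the shared ledger, and the recipient recomputes the fingerprint before using the data. The Python sketch below assumes the ledger can be read as a simple mapping from a data set identifier to a SHA-256 digest. The names fingerprint, matches_ledger and clinic-visits-v1 are hypothetical, and a plain dictionary stands in for the ledger.

```python
import hashlib


def fingerprint(rows):
    """Return a SHA-256 digest over the rows, in order, one row per line."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(row.encode("utf-8"))
        digest.update(b"\n")  # assumes rows themselves contain no newlines
    return digest.hexdigest()


def matches_ledger(rows, ledger, dataset_id):
    """Return True only if the ledger has an entry for dataset_id and it matches the rows."""
    recorded = ledger.get(dataset_id)
    return recorded is not None and recorded == fingerprint(rows)


# The data owner publishes the data set and records its fingerprint.
published = ["patient_id,visit_date", "1001,2019-03-02", "1002,2019-03-05"]
shared_ledger = {"clinic-visits-v1": fingerprint(published)}

# The recipient checks the copy it received before analyzing it.
received = list(published)
print(matches_ledger(received, shared_ledger, "clinic-visits-v1"))  # True

received[1] = "1001,2019-03-09"  # a single altered value
print(matches_ledger(received, shared_ledger, "clinic-visits-v1"))  # False
```

Only the fingerprint needs to live on the ledger. The data itself can stay with its owner and be shared through whatever channel the parties already use.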
Overcoming Challenges in the Use of Blockchain in Data Management and Analytics
Appealing as blockchain may seem for data management and analytics, its implementation is likely to introduce complexity. Gartner foresees challenges in integrating blockchain technologies into existing data analytics infrastructure. The best approach today is to position blockchain as a supplement to existing data management solutions. It is then possible to explore use cases that are driven by business needs.
We understand blockchain and its potential role in data management and analytics. If you are curious about how blockchain could help your business do more with its data, and with data belonging to others, let’s talk.