Sample Input Data from EPCIS .xml files
Example Masking Table using 24-hour life masking codes
Example hashing function for data, using boolean inputs to indicate whether hashes and/or masks are needed
The function that performs SHA-256 hashing in C#
part 5: big company grants green company access to their query interface
part 4: green company proves to big company that it was part of the chain of custody
part 2: green company accesses sanitised file
part 1: EPCIS xml file is sanitised and uploaded
part 3: green company requests desanitised data (note that neither green company nor VeChain knows the identity of "big company")
View of the user interface when opened
Choose the EPCIS file to be processed
View the unprocessed file, click to process
View of the obfuscated file before uploading, note the hashing and much smaller file size (also it's JSON)
Transaction on VeChain - note that the blue arrow highlights the hash pointing to the data in IPFS
Supply chains are complex and non-deterministic: it is not always possible to know where a product has come from or where it might be going (the essence of traceability). Just as the fisherman who catches the tuna does not know whether it will end up in a high-end restaurant, a can or pre-made sushi, the sushi maker does not necessarily know who caught their fish. Discovery of the chain of custody of the product is a key feature in supporting EPCIS and fish supply chain traceability. When the supply chain is not clear, the discovery service needs to be public or semi-public; however, the raw data in EPCIS Visibility Events is highly sensitive. Our solution demonstrates how data can be sanitised, shared and discovered without exposing commercially sensitive data.
We propose an architecture of a supply chain discovery service and provide the key program that interoperates between EPCIS repositories and the public discovery service. Our solution aims to be technology and choreography agnostic to the greatest extent possible, thus allowing maximum standardisation and interoperability.
Data Truncation
Not all the data in the EPCIS Visibility Event is needed for the discovery service. The service requires only enough information for any actor in the chain of custody to be able to locate the event in their own EPCIS repository and to prove that they were indeed part of the chain of custody. Therefore, our program first strips away all the Master Data and Instance Lot Master Data, as well as information such as Quantity and Unit of Measure, from the Visibility Event Data. We have adopted a cautious approach to truncation; industry consultation could lead to further truncation of the file. Truncation is important because it reduces the load on the file sharing mechanism that we will discuss shortly.
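The truncation step can be sketched as follows. This is a minimal Python illustration, not the program's C# code. The field names in `KEEP_FIELDS`, the flat dictionary layout and the use of a whitelist (rather than a list of fields to strip) are all assumptions made for the example.

```python
# Hypothetical whitelist of event fields the discovery service needs.
# The real selection is a design decision; these names are illustrative only.
KEEP_FIELDS = {"eventTime", "action", "bizStep", "epcList", "readPoint", "bizLocation"}


def truncate_event(event: dict) -> dict:
    """Return a copy of the event containing only whitelisted fields."""
    return {key: value for key, value in event.items() if key in KEEP_FIELDS}


# Example event with made-up values (not real supply chain data).
example_event = {
    "eventTime": "2019-02-23T10:15:00Z",
    "action": "OBSERVE",
    "epcList": ["urn:epc:id:sgtin:0614141.107346.2017"],
    "readPoint": "urn:epc:id:sgln:0614141.07346.1234",
    "quantity": 200,                           # dropped: not needed for discovery
    "uom": "KGM",                              # dropped: not needed for discovery
    "masterData": {"name": "Example Vessel"},  # dropped: master data
}

truncated = truncate_event(example_event)
```

A whitelist fails safe: a field that nobody anticipated is dropped rather than published, which fits the cautious approach described above.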
Data Sanitisation: Non-Static Data
Data must be obfuscated to remove any sensitive information that could be used to identify the parties and locations in the chain of custody. Most of this data is what we call 'non-static', indicating that it is different for every product (such as the SGTIN). This data is relatively simple to obfuscate. Having evaluated both key-based encryption and hashing, we selected hashing because, unlike encryption, it cannot be undone with a leaked key (see footnotes). Further, because a hashing function always produces the same output for the same input, our proposed verification method (below) works. We chose SHA-256, which produces a 256-bit output hash, as the hashing function based on its wide use in cryptocurrency and blockchain.
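The determinism that the verification method relies on can be illustrated with a short Python sketch using the standard `hashlib` module. The function name `sha256_hex` and the example SGTIN are placeholders for this illustration and are not taken from the program.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hexadecimal characters."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


sgtin = "urn:epc:id:sgtin:0614141.107346.2017"  # example identifier, not real data
first = sha256_hex(sgtin)
second = sha256_hex(sgtin)                      # identical to `first`
other = sha256_hex(sgtin + "8")                 # a different input gives a different hash
```

Anyone who already holds the original SGTIN can reproduce `first`, while the hash on its own does not reveal the identifier.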
Data Sanitisation: Static Data
One challenge we were presented with is the case of so-called 'static' data. This data, such as Readpoint GLN, is static over many EPCIS events and so could be used by a malicious party to map and reproduce the original data. To avoid this we have developed a time-dependent masking method that uses private lookup tables of mask values, each valid only for a specific actor and a specific time window (see image in attachments). For the sake of simplicity, our program uses 24-hour periods that end at 23:59:59.99 every night. The masking string is concatenated with the static data string and hashed, producing hashes that vary enough over time to make mapping of the data impractical. It is critical that these lookup tables remain private to their owners and are stored permanently, so that they can be referred back to.
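A minimal Python sketch of the time-dependent masking idea follows. The table contents, the mask strings, the function name `masked_hash` and the choice to place the mask before the static value are assumptions for illustration; the text above specifies only that the two strings are concatenated and hashed.

```python
import hashlib
from datetime import datetime

# Hypothetical private lookup table: one mask per calendar day.
# In practice the masks would be random secrets kept permanently by the owner.
MASK_TABLE = {
    "2019-02-23": "mask-A7f3",
    "2019-02-24": "mask-Q9x1",
}


def masked_hash(static_value: str, event_time: datetime) -> str:
    """Hash a static value together with the mask valid on the event's day."""
    mask = MASK_TABLE[event_time.date().isoformat()]  # KeyError if no mask exists
    return hashlib.sha256((mask + static_value).encode("utf-8")).hexdigest()


read_point = "urn:epc:id:sgln:0614141.07346.1234"  # example identifier, not real data
day_one = masked_hash(read_point, datetime(2019, 2, 23, 10, 15))
same_day = masked_hash(read_point, datetime(2019, 2, 23, 23, 59, 59))  # equals day_one
day_two = masked_hash(read_point, datetime(2019, 2, 24, 0, 0, 1))      # differs
```

Because the owner keeps the table, they can recompute the hash for any past event, whereas an observer sees a different value for the same read point each day.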
We attach an indicator of the form of the input string to all our output hashes so that it is clear whether the input was, for example, a GTIN or an SGTIN.
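One possible way to attach such an indicator is sketched below in Python. The `type:digest` layout and both function names are assumptions for illustration, since the exact output format is not specified here.

```python
import hashlib


def tagged_hash(value: str, value_type: str) -> str:
    """Hash the value and prepend a plain-text indicator of its type."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return f"{value_type}:{digest}"


def split_tagged_hash(tagged: str) -> tuple:
    """Recover the (type, digest) pair from a tagged hash."""
    value_type, digest = tagged.split(":", 1)
    return value_type, digest


tagged = tagged_hash("urn:epc:id:sgtin:0614141.107346.2017", "sgtin")
kind, digest = split_tagged_hash(tagged)  # kind is "sgtin"
```

The indicator reveals only the kind of identifier that was hashed, which tells a verifier which of their own values to hash when looking for a match.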
Output & Compression
We convert the output file to the lightweight JSON format, reducing the load on the storage and sharing systems. JSON allows key-value pairs to be indicated with minimal notation. The output file is a combination of hashed sensitive data, as described above, and non-sensitive data, such as action and datetime.
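The format conversion can be sketched in Python with the standard `xml.etree.ElementTree` and `json` modules. The sample XML is a simplified, hypothetical event (real EPCIS documents are namespaced and much richer), and hashing is left out to keep the focus on the change of format.

```python
import json
import xml.etree.ElementTree as ET

# Simplified, hypothetical event; not a complete or valid EPCIS document.
SAMPLE_XML = """
<ObjectEvent>
  <eventTime>2019-02-23T10:15:00Z</eventTime>
  <action>OBSERVE</action>
  <epcList>
    <epc>urn:epc:id:sgtin:0614141.107346.2017</epc>
  </epcList>
</ObjectEvent>
"""


def event_to_json(xml_text: str) -> str:
    """Convert a simplified ObjectEvent element into a compact JSON string."""
    root = ET.fromstring(xml_text.strip())
    event = {
        "eventTime": root.findtext("eventTime"),
        "action": root.findtext("action"),
        "epcList": [epc.text for epc in root.iter("epc")],
    }
    # separators=(",", ":") removes the optional whitespace from the output
    return json.dumps(event, separators=(",", ":"))


compact = event_to_json(SAMPLE_XML)
```

Even for this small event the JSON string is shorter than the XML, because closing tags and indentation disappear.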
Storage and Sharing
It is envisaged that Sanitised Fish will use an API to upload the output file to the VeChain IPFS. However, the IPFS is still a work in progress by VeChain, so for the purposes of this hackathon we are using local storage while replicating the interaction with IPFS as closely as possible. Once processed, the file is uploaded to the IPFS and a hashed pointer to the file is attached to a VeChain transaction. A transaction hash, essentially a receipt of the process, is returned to the party that uploaded the data into Sanitised Fish. In this format, the sanitised file is publicly available and the supply chain and blockchain are linked. This linking becomes important when proof of custody verification is required.
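The upload flow can be mimicked with local storage, as in the Python sketch below. Nothing here calls IPFS or VeChain: `local_store` and `ledger` are in-memory stand-ins, and the pointer and receipt formats are assumptions that do not reflect real IPFS content identifiers or VeChain transaction hashes.

```python
import hashlib
import json

local_store = {}  # stands in for IPFS: content hash -> file bytes
ledger = []       # stands in for the blockchain: list of transaction records


def upload(sanitised_json: str) -> str:
    """Store the file, record a pointer on the mock ledger, return a receipt."""
    data = sanitised_json.encode("utf-8")
    pointer = hashlib.sha256(data).hexdigest()  # content-derived address
    local_store[pointer] = data
    transaction = {"index": len(ledger), "pointer": pointer}
    # Hypothetical receipt: hash of the serialised transaction record.
    serialised = json.dumps(transaction, sort_keys=True).encode("utf-8")
    tx_hash = hashlib.sha256(serialised).hexdigest()
    ledger.append({**transaction, "tx_hash": tx_hash})
    return tx_hash


def fetch(tx_hash: str) -> str:
    """Follow a receipt to its pointer and return the stored file."""
    record = next(tx for tx in ledger if tx["tx_hash"] == tx_hash)
    return local_store[record["pointer"]].decode("utf-8")
```

Only the short pointer is written to the ledger, while the file itself lives in the store. This is the property that keeps on-chain data, and therefore cost, small.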
Use of the Discovery Service
To demonstrate the use of our system, consider two companies in the fish supply chain: Captain Hook's, an owner of a fleet of fishing vessels in Thailand and a supplier of fresh fish, and Fishy Dishy, who processes fish into 'Fish Fingers' for retail in supermarkets in the UK. Let us assume that Fishy Dishy would like to obtain traceability information on the products they are supplied with. Using the publicly known product identification information, they are able to locate the sanitised file on the IPFS. Using this file, Fishy Dishy can request access to the desanitised data on VeChain. The request is pulled from VeChain by Captain Hook's, who monitor the transaction on VeChain. Now, Captain Hook's can see that Fishy Dishy is requesting access to desanitised information. Fishy Dishy can prove they are part of the chain of custody by recreating the hashed EPC of the product. Captain Hook's can then provide Fishy Dishy with access to the query interface of their EPCIS repository. Thus, the discovery service has been used to discover the origin of products in the supply chain, without the need for a pre-determined flow of goods.
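The proof-of-custody check described above reduces to comparing hashes, as in this minimal Python sketch. The function names and the example EPCs are hypothetical, and the request and response messages exchanged over VeChain are not modelled.

```python
import hashlib


def hash_epc(epc: str) -> str:
    """Hash an EPC the same way the sanitiser does (plain SHA-256 here)."""
    return hashlib.sha256(epc.encode("utf-8")).hexdigest()


def proves_custody(claimed_hash: str, published_hashes: list) -> bool:
    """Check whether the requester's recreated hash appears in the sanitised file."""
    return claimed_hash in published_hashes


# Hashes as they would appear in the publicly available sanitised file.
published = [hash_epc("urn:epc:id:sgtin:0614141.107346.2017")]

# A party that handled the product knows its EPC and can recreate the hash.
genuine_claim = hash_epc("urn:epc:id:sgtin:0614141.107346.2017")

# A party that never held the product has no matching EPC to hash.
bogus_claim = hash_epc("urn:epc:id:sgtin:0614141.107346.9999")

accepted = proves_custody(genuine_claim, published)  # True
rejected = proves_custody(bogus_claim, published)    # False
```

The check works only because hashing is deterministic: both parties must derive the hash from exactly the same EPC string.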
One of the options for the future we are most excited about is the use of smart contracts to automate the proof of custody process. Although built in the context of the GDST hackathon, Sanitised Fish can be used with any supply chain. Further, we hope to integrate with other hackathon projects to provide truncation and sanitisation for their data. For example, the project working with GPS coordinates faces the challenge that fishermen may not want to reveal their "best and top-secret" fishing locations. Our program could be used to hide the GPS coordinates and return only a hash indicating that the fishing took place in a legal location. The same applies to the DNA verification project. For commercial reasons, actors in the supply chain may not want to release the full details of the DNA analysis. Our obfuscation methods can be applied here to hide the exact results of the test and return only an aggregate analysis.
Interoperability
Sanitised Fish is built from the ground up to be technology and solution provider agnostic. VeChain was used for demonstration purposes, but it is trivial to change the details of the API calls in the source code to operate with a different provider. The function of a discovery service is to maximise the potential for interoperability in the supply chain. We believe our solution fulfils this potential whilst remaining lightweight and simple in nature.
Innovation
To our knowledge, Sanitised Fish is the only existing implementation of an EPCIS truncation, sanitisation, compression, sharing and discovery program. We have utilised blockchain to support the distributed nature of complex supply chain discovery mechanisms.
Impact
Sanitised Fish offers the potential to revolutionise business communication regarding product provenance and traceability. Previously sensitive data can now be shared publicly without risk to businesses thanks to the secure hashing and masking deployed in the program. Further, a distributed discovery service will allow the automation of verification of the chain of custody, reducing business effort and increasing efficiency.
Interface
Sanitised Fish handles the hashing, API calls and blockchain interaction internally. Users interact with the app through a simple user interface that offers them the opportunity to understand what the app does to their data. It is anticipated that a final version of the software would be web-based and interact with an IPFS; this was not possible due to time and technical constraints.
Feasibility
Sanitised Fish offers a low/zero cost solution to the discovery problem. By using an IPFS and blockchain pointers, we upload as little data as possible to the blockchain and therefore reduce our transaction costs to the lowest possible level.
This has not been a simple build. Here are a few of the barriers we were presented with:
- Conversion of XML format to lightweight JSON format
- Use of time-dependent masks to hide static EPCIS data
- Selection of data to keep/remove in truncation
- Extraction of data from complex XML input files
- Attaching file input type data to hashes
- API interaction with VeChain to share files
In the attachments, you will find images of input and output files, and slides explaining how the discovery service works.
Key-based Encryption vs. Hashing - We selected hashing over key-based encryption (symmetric or asymmetric). The main motive for this was that, should the key be leaked somehow, it would be relatively easy for a malicious party to undo the encryption. As we do not require two-way encryption and decryption of data, hashing was much more suitable.