Client UI that lets users specify query parameters based on the services provided.
Service Provider UI that lets users easily register and manage their microservices with the gateway.
Diagram representing the data exchange protocol we implemented for Private-Set-Intersection
Diagram showcasing the overall architecture of our gateway and how components interact with each other
Who Are We:
We are Team Waste Washers, a team of computing students from the National University of Singapore, with an interest in backend engineering and microservice architectures!
Boh Jie Qi :
We are working on Problem Statement 2: Privacy Innovation!
What is the GodPSIlla Gateway?
It aims to provide external untrusted organisations a means to interact with and query TikTok's Kitex microservices for data, and find the intersection size between the datasets of the microservices and their own. The data exchange protocol design is partially inspired by OpenMined's implementation of PSI-CA.
Pronounced [god-si-lla] (a combination of "Godzilla" and "PSI"), it is built upon an API Gateway prototype that we made to support microservices built in Kitex called the Godzilla Gateway (which was in turn inspired by the Kong API Gateway).
Features of the GodPSIlla Gateway
- Client Library featuring UI and GRPC Client that can be downloaded and used to interact with the server side gateway
- Server Library featuring a UI and code generator, that allows users to easily customise and generate a PSI gateway that supports their microservices.
- Uses the ChaCha20 stream cipher for commutative encryption and decryption
- Supports querying intersection size between own dataset and multiple microservices (Similar to Multi-ID PSI)
How do I use it?
As The Service Provider
Imagine you are a large organisation that implements microservices in Kitex. Your services implement getter RPC methods that return their data. You would like to expose an API for external parties to query your data and find the proportion that matches their own, without revealing the identities of the elements in either data set, as per the rules of PSI-CA.
- Download our server project files
- Start up the GodPSIlla Server
- Through our interactive UI, register your microservices with the GodPSIlla Server and provide basic information about the Gateway.
- Generate and start the gateway through the UI.
- Your services are now able to be used for PSI with external organisations that download and use our client.
As The Client
Imagine you are a company, and would like to find out how many of your users post videos on TikTok. If TikTok has the PSI gateway set up with a microservice that manages video posters that exposes a getter method to return the dataset of posters, you can find out the proportion of your user base that posts videos on TikTok.
- Download our client project files
- Start up the client
- Through our client UI, specify the microservices and method names you want to call, provide your data, and enter the upstream URL of the gateway server.
- Our client does all the necessary calls and computations and returns the intersection size.
How does it work?
High-Level Overview of our PSI implementation
Client Data: A,B,C; Client Secret Key: x; Server Data: C,D,E; Server Secret Key: y
We relied on a commutative encryption scheme for our information exchange, using the ChaCha20 stream cipher. Commutative means that data encrypted twice under two different keys yields the same ciphertext regardless of the order of encryption, and can be correctly decrypted by applying the two keys in either order.
Through this data exchange protocol, we ensure that neither the server nor the client can find out the identities of elements, whether inside or outside the intersection, and that the client learns only the cardinality of the intersection.
Additionally, we were able to structure the exchange of information in a request-response exchange format, which made implementation easier on our end.
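The exchange described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our gateway code. A SHAKE-256 keystream stands in for ChaCha20 with a fixed nonce, elements are hashed to a fixed length before encryption, and the client removes its own layer from the doubly encrypted client data before comparing against the server's singly encrypted data. The function names are hypothetical.

```python
import hashlib
from typing import List

def keystream(key: bytes, length: int) -> bytes:
    # Stand-in keystream (assumption): SHAKE-256 of the key.
    # The project itself uses ChaCha20 instead.
    return hashlib.shake_256(key).digest(length)

def apply_layer(key: bytes, block: bytes) -> bytes:
    # XOR with the keystream. XOR is its own inverse, so the same
    # function adds a layer (encrypt) and removes it (decrypt).
    ks = keystream(key, len(block))
    return bytes(b ^ k for b, k in zip(block, ks))

def digest(element: str) -> bytes:
    # Hash every element to 32 bytes so all blocks have equal length.
    return hashlib.sha256(element.encode()).digest()

def intersection_size(client_data: List[str], server_data: List[str],
                      x: bytes, y: bytes) -> int:
    # Client: encrypt its own data with x and send it to the server.
    client_once = [apply_layer(x, digest(e)) for e in client_data]
    # Server: add a second layer with y, and encrypt its own data with y.
    client_twice = [apply_layer(y, c) for c in client_once]
    server_once = {apply_layer(y, digest(e)) for e in server_data}
    # Client: remove its own layer, leaving client data under y only,
    # then count matches against the server's data under y.
    client_under_y = {apply_layer(x, c) for c in client_twice}
    return len(client_under_y & server_once)

x = b"\x01" * 32  # client secret key (fixed here for reproducibility)
y = b"\x02" * 32  # server secret key
print(intersection_size(["A", "B", "C"], ["C", "D", "E"], x, y))  # prints 1
```

Because every element is XORed with the same keystream, this sketch illustrates commutativity only and inherits the weakness of keystream reuse. It should not be read as a secure construction.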
High-Level Overview of our Gateway Architecture
The numbers and arrows denote steps and flow of data respectively.
1. The client makes an HTTP call to the HTTP server where the GRPC Client is located, containing the following:
- Upstream URL of the GRPC Middleman
- Information regarding the services it would like to use in the PSI
- A CSV file containing the client data to be used in the PSI
2. The GRPC Client encrypts the client data, then makes a GRPC call to the server-side GRPC Middleman with a request containing the encrypted client data, as well as the microservices to be queried for the PSI.
3. The GRPC Middleman encrypts the already encrypted client data, and makes an HTTP request to the API Gateway, via the NGINX proxy, asking for the intersection of the datasets of the microservices the client requested.
4. The NGINX proxy, which load balances between the 3 identical gateway instances, routes the HTTP request to 1 of the gateway instances.
5. The gateway instance validates the requested microservice names and methods, and calls the Kitex Thrift microservices to request their data.
6. The microservices respond with the requested data.
7. Once the data has been received from all requested microservices, the gateway instance computes their intersection in plaintext, then sends back the HTTP response, which encapsulates the list of strings encoded in Protobuf format.
8. The NGINX proxy routes the HTTP response back to the GRPC Middleman.
9. The GRPC Middleman now has the intersection of the data from the microservices. It encrypts this intersection and sends it, together with the doubly encrypted user data, back to the GRPC Client.
10. The GRPC Client decrypts both sets of data once with its own key, and compares them to find the common elements, before returning the result in the HTTP response.
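As a rough illustration of the gateway-side aggregation in the steps above, the sketch below intersects the datasets returned by several microservices. The function name and the sample data are hypothetical. Only the service names ViewerService and PosterService come from our example gateway.

```python
from typing import Dict, List

def intersect_services(datasets: Dict[str, List[str]]) -> List[str]:
    """Return the elements present in every service's dataset."""
    if not datasets:
        return []
    sets = [set(values) for values in datasets.values()]
    # set.intersection accepts any number of sets, so this works for
    # one, two, or many requested microservices.
    common = set.intersection(*sets)
    # Sort for a stable order in the response.
    return sorted(common)

# Hypothetical responses from two registered microservices.
responses = {
    "ViewerService": ["alice", "bob", "carol"],
    "PosterService": ["bob", "carol", "dave"],
}
print(intersect_services(responses))  # prints ['bob', 'carol']
```

Only this combined list is passed on to the GRPC Middleman for encryption, which keeps the amount of data sent onward small.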
Components of our project
|Client Side Application||- User Interface to simplify process of submitting queries
- Local HTTP Server and GRPC Client to manage data encryption, decryption, and computations, as well as gRPC calls to server
Client Side Repository
|Server Side Application||- API gateway UI to register and manage microservices
- Gateway generator
- API gateway to:
a. Route incoming queries to specified services
b. Compute intersection between data queried from microservices
- GRPC Server that does encryption of data and sends response to client
Server Side Repository
|Demo Services||- Microservices we used for testing and demonstration purposes.
Demo Services Repository
|Example API Gateway||- Basic API gateway registered with ViewerService and PosterService, and GRPC Middleman implementation
Example gateway Repository
As with all products developed for use with microservices, performance metrics and load testing are crucial for assessing scalability and viability for commercial applications.
In the case of Private Set Intersection, we saw that most examples varied the sizes of the datasets of the 2 parties, as well as the size of the actual intersection.
We then developed a simple tool that integrates both the client and the server, and measured the end-to-end time taken to compute the intersections.
|Client-Server Dataset Size||Client Dataset Size||Service 1 Dataset Size||Service 2 Dataset Size||Client-Server Intersect Count||Time Taken (ms)|
As a control, we kept the 2 microservices fully intersecting, i.e. both services hold exactly the same set of elements.
In order to generate large datasets of seemingly random elements deterministically, we created 2 pseudorandom functions denoted by PRF1 and PRF2 that took a seed value, and generated a sequence of seemingly random numbers, which we then fed into a function to generate random strings of a certain length.
Data set composition for the above test cases (in order):
|Client Dataset||Service 1 / Service 2 Dataset|
|PRF1(100)||PRF1(20) + PRF2(80)|
|PRF1(100)||PRF1(80) + PRF2(20)|
|PRF1(100)||PRF1(20) + PRF2(99,980)|
|PRF1(100)||PRF1(80) + PRF2(99,920)|
|PRF1(100,000)||PRF1(20) + PRF2(80)|
|PRF1(100,000)||PRF1(80) + PRF2(20)|
|PRF1(100,000)||PRF1(20,000) + PRF2(80,000)|
|PRF1(100,000)||PRF1(80,000) + PRF2(20,000)|
We ran each test 100 times and computed the average time in milliseconds to return the value.
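The dataset construction and timing loop can be sketched as follows. This is a minimal sketch, not our actual test tool. It assumes that PRF1(n) and PRF2(n) denote the first n strings produced by two differently seeded generators. The seeds, string length, and function names are hypothetical, and Python's `random.Random` stands in for our pseudorandom functions.

```python
import random
import string
import time
from typing import Callable, List

ALPHABET = string.ascii_letters + string.digits

def prf_strings(seed: int, count: int, length: int = 16) -> List[str]:
    # A seeded generator yields the same sequence on every run, so a
    # shorter request is always a prefix of a longer one.
    rng = random.Random(seed)
    return ["".join(rng.choice(ALPHABET) for _ in range(length))
            for _ in range(count)]

def prf1(count: int) -> List[str]:
    return prf_strings(1, count)  # seed 1 is an arbitrary choice

def prf2(count: int) -> List[str]:
    return prf_strings(2, count)  # seed 2 is an arbitrary choice

def average_ms(fn: Callable[[], object], runs: int = 100) -> float:
    # Average wall-clock time per call, in milliseconds.
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000 / runs

# First row of the composition table: the 20 elements from PRF1 overlap
# with the client, while the 80 elements from PRF2 do not.
client = prf1(100)
service = prf1(20) + prf2(80)
print(len(set(client) & set(service)))  # prints 20
print(average_ms(lambda: set(client) & set(service)))
```

Here the timed function is a plain set intersection. In the real measurement it would be the full end-to-end call through the client and gateway.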
How we built it
Here, we detail the implementation of the components shown above in the high-level gateway architecture.
Client User Interface: Built using NextJS. It makes HTTP requests to the GRPC Client, which in turn makes a GRPC call to the GRPC Middleman with the user input.
GRPC Client: Built using Golang, the gRPC client also contains the library with the methods to encrypt and decrypt data using ChaCha20, as well as .proto files used to make calls to the gRPC Middleman.
GRPC Middleman: This is an intermediate gRPC server. We chose GRPC as the communication protocol because transmitting data in Protobuf format is efficient, which matters given the potentially vast amounts of data sent to the server. It handles data encryption, using the same encryption and decryption implementation as the gRPC Client, and makes an HTTP request to the API Gateway to interact with the microservices.
NGINX Reverse Proxy: The NGINX reverse proxy is used so that we can run multiple instances of our gateway at the same time to improve fault tolerance and provide additional load balancing. It also allows us to update our gateway while it is still running, by regenerating instances one by one.
API Gateway: We modified an API Gateway that we had built previously. It was originally designed only to accept HTTP calls and translate them into generic calls to the corresponding microservices. We added a new route, /PSI, and the ability to aggregate microservice data and compute the intersection before responding.
- TikTok's Kitex framework is used to build test microservices in order to simulate the microservice structures used in large companies like TikTok.
- TikTok's Hertz framework is used to build the HTTP gateway server for its high performance, and its ability to synergise with Kitex-built microservices.
Challenges we ran into
System Architecture Decisions
There was the issue of deciding where to locate computationally-intensive tasks such as data encryption. We originally wanted to send all the aggregated data from microservices through the API gateway, to the Middleman to compute their intersection as well, but figured that would incur unnecessary overhead by transmitting vast amounts of data over the wire. We decided on computing the intersection on the API Gateway instead, before transmitting the intersection to the Middleman. This would help distribute the computational workload between the different components as well.
Implementation of the ChaCha20 Cipher
We are aware of some flaws in how we use the cipher, and are actively working on solutions. The security of ChaCha20 depends on never reusing a nonce with the same key. However, having to encrypt an entire list of strings while keeping encryption and decryption commutative between the 2 parties meant that nonce reuse was required.
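To make the risk of nonce reuse concrete, the sketch below shows what any stream cipher leaks when two messages are encrypted with the same key and nonce. A SHAKE-256 keystream stands in for ChaCha20 here, which is an assumption made to keep the example dependency-free. The key, nonce, and sample plaintexts are hypothetical.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Stand-in for the ChaCha20 keystream (assumption): it depends only
    # on (key, nonce), just like a real stream cipher keystream.
    return hashlib.shake_256(key + nonce).digest(length)

def encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    ks = keystream(key, nonce, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(i ^ j for i, j in zip(a, b))

key = b"k" * 32
nonce = b"n" * 12  # reused for both messages

p1 = b"user-0001"
p2 = b"user-0042"
c1 = encrypt(key, nonce, p1)
c2 = encrypt(key, nonce, p2)

# The keystream cancels out: XOR of ciphertexts equals XOR of plaintexts.
leak = xor(c1, c2)
print(leak == xor(p1, p2))  # prints True

# Anyone who knows or guesses one plaintext recovers the other.
print(xor(leak, p1))  # prints b'user-0042'
```

With a fresh nonce per message the keystreams differ and this cancellation no longer happens, but equal plaintexts would then no longer produce equal ciphertexts, which is the property our comparison step depends on.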
Implementation of PETAce-SetOps
We wanted to integrate the PETAce-SetOps library into our solution, but didn't think we could fit its PSI implementations into our data exchange design. Additionally, we wanted our application to support MacOS and Windows, so it wasn't viable to use libraries that required Linux. Nonetheless, we read through the PrivacyGo repositories for inspiration on the types of cryptographic primitives available and their use cases, and that was what inspired us to research PSI and eventually try to use it.
Dependency Issues With Windows
During our testing, we encountered problems on Windows that we hadn't seen before. These were traced to a combination of kitex and registry-etcd versions that led to connection issues. We changed the dependency versions and rebuilt the project so that the current binaries in the files work on Windows, and also raised an issue in the cloudwego/kitex repo.
What we learned
A big takeaway from this project was our newfound understanding of the various PETs, and their importance in cross organisation data sharing and computations today, especially with vast swathes of sensitive user data being exchanged all the time.
We also used this opportunity to dive deep into the various implementations of private set intersection. Studying them was an effective way to gain a clearer picture of our own system design. We also learnt about some potential pitfalls of PSI, such as inefficient computation of the intersection, and how revealing the intersection size can leak information. We believe that this knowledge will serve us well in improving future implementations of the gateway and in future projects.
What's next for GodPSIlla Gateway
Incorporating elements of DPCA-PSI
We recognise that a fusion of PETs is crucial to accomplish a balance in privacy, utility and efficiency. In particular, PrivacyGo's implementation of DPCA-PSI tries to prevent privacy leakage when one party knows the exact size of the intersection or the number of elements in a dataset, which is a possibility with our implementation. For future improvements, we could inject noise in the datasets exchanged between the client and server to hide the exact size of the intersection, or try to incorporate PrivacyGo's implementation of cryptographic primitives into our computations.
Supporting other PETs
Due to the highly modular nature of our gateway, we believe that we can add support for other privacy-preserving techniques such as Private Information Retrieval and Private Join and Compute, which also have highly relevant use cases in microservice architectures.