Devpost

Inspiration

Over 50% of enterprise data goes unused due to legal, security, and other hurdles. Big data applications involving financial, medical, or municipal data require collecting and aggregating large amounts of data from many clients and sources, which creates security risks and raises questions about user data privacy. We propose a platform that lets users aggregate insights from their data while keeping that data private.

What it does

FedUp is a platform where clients, each holding their own private data, can connect to one another over a peer-to-peer network and train a collective machine learning model without ever sharing their training or testing data. Data privacy is protected by sharing only gradient updates, each computed over a sufficient number of epochs and training samples at a time, rather than raw records.
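
As a rough illustration of what sharing only gradient updates can look like, the sketch below uses plain logistic regression as a stand-in for the PyTorch model. The names `make_update` and `MIN_SAMPLES` and the threshold of 32 samples are hypothetical, not FedUp's actual values, and the sketch covers only the sample-count side of the idea, not the epoch count. A node averages the gradient over a batch of its private samples and refuses to release an update computed from too few of them; only the averaged gradient and a sample count leave the node. Averaging is a heuristic that reduces what one update reveals about a single record, not a formal guarantee such as differential privacy.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical threshold: the smallest batch a node will build an update from.
MIN_SAMPLES = 32

Sample = Tuple[Sequence[float], int]  # (feature vector, label in {0, 1})


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def averaged_gradient(weights: Sequence[float], batch: Sequence[Sample]) -> List[float]:
    """Gradient of the mean binary cross-entropy loss for logistic regression."""
    grad = [0.0] * len(weights)
    for features, label in batch:
        error = sigmoid(sum(w * x for w, x in zip(weights, features))) - label
        for i, x in enumerate(features):
            grad[i] += error * x
    return [g / len(batch) for g in grad]


def make_update(weights: Sequence[float], batch: Sequence[Sample],
                min_samples: int = MIN_SAMPLES) -> Dict[str, object]:
    """Build the only message a node sends: an averaged gradient, never raw data."""
    if len(batch) < min_samples:
        raise ValueError(
            f"refusing to share an update computed from {len(batch)} samples "
            f"(minimum is {min_samples})"
        )
    return {"gradient": averaged_gradient(weights, batch), "num_samples": len(batch)}
```

For example, `make_update([0.0, 0.0], [((1.0, 2.0), 1)] * 4, min_samples=4)` returns `{'gradient': [-0.5, -1.0], 'num_samples': 4}`, while the same call with only three samples raises `ValueError`.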

First, nodes in the network coordinate a graph topology that defines how model updates are propagated through the network.
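
One simple way to coordinate such a topology is sketched below. It assumes a ring (the ring DFL setup mentioned under What's next), ordered by the time nodes register with the peer discovery server, with the first registrant acting as initiator. The `DiscoveryServer` class, its methods, and the message fields are hypothetical; a real server would broadcast this message to every node over TCP rather than return it.

```python
from typing import Dict, List


class DiscoveryServer:
    """In-memory stand-in for a peer discovery server (hypothetical API)."""

    def __init__(self) -> None:
        # node id -> "host:port"; dicts keep insertion (join) order in Python 3.7+
        self._nodes: Dict[str, str] = {}

    def register(self, node_id: str, address: str) -> None:
        """Record a node's network identifier; duplicate ids are rejected."""
        if node_id in self._nodes:
            raise ValueError(f"{node_id} is already registered")
        self._nodes[node_id] = address

    def build_topology(self) -> Dict[str, object]:
        """Arrange nodes in a ring by join order; the first to join initiates."""
        ids: List[str] = list(self._nodes)
        if len(ids) < 2:
            raise ValueError("a ring needs at least two nodes")
        next_peer = {node: ids[(i + 1) % len(ids)] for i, node in enumerate(ids)}
        return {
            "initiator": ids[0],
            "next_peer": next_peer,          # directed edges of the ring
            "addresses": dict(self._nodes),  # where each node forwards its update
        }
```

With nodes `a`, `b`, and `c` registered in that order, `build_topology()` names `a` as initiator and produces the edges `a → b → c → a`. A ring keeps the hand-off strictly sequential, which matches the previous-peer/next-peer flow described under How we built it.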

We demonstrated a start-to-finish training run on three of our team members' devices. On the task of predicting loan defaults, our approach outperformed traditional machine learning in both the number of epochs to convergence and validation loss.

How we built it

Each node first opts into a network by registering its network identifier with a peer discovery server. Once all participants have joined, the peer discovery server broadcasts the topology graph to every participant node and designates the initiator node that starts training. From this point on, training is fully decentralized: each node receives the model update from the previous peer, trains on its local training data, and forwards its updated model to the next peer. This process continues, propagating updates to the global model around the network until convergence is reached. Model training is implemented with PyTorch, and the network operates over TCP/IP.
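
To make the sequential hand-off concrete, the sketch below simulates the whole ring inside one Python process. It rests on several assumptions: a toy linear model `y ≈ w*x + b` stands in for the PyTorch model, a function call stands in for the TCP transfer to the next peer, and "convergence" means that one full trip around the ring changes no weight by more than `tol`. The function names, learning rate, epoch count, and stopping rule are all illustrative.

```python
from typing import Dict, List, Tuple

Weights = Tuple[float, float]        # (w, b) for the toy model y ≈ w*x + b
Dataset = List[Tuple[float, float]]  # private (x, y) pairs held by one node


def local_train(weights: Weights, data: Dataset,
                epochs: int = 20, lr: float = 0.05) -> Weights:
    """Run full-batch gradient descent on mean squared error over local data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)


def ring_train(ring: List[str], datasets: Dict[str, Dataset],
               tol: float = 1e-6, max_rounds: int = 500) -> Tuple[Weights, int]:
    """Pass the model around the ring, starting at ring[0], until it stops changing."""
    weights: Weights = (0.0, 0.0)  # initial model held by the initiator
    for round_idx in range(1, max_rounds + 1):
        before = weights
        for node in ring:
            # Receive from the previous peer, train on local data, forward to the next.
            weights = local_train(weights, datasets[node])
        if max(abs(new - old) for new, old in zip(weights, before)) < tol:
            return weights, round_idx
    return weights, max_rounds


if __name__ == "__main__":
    # Three nodes whose private data all follow y = 3x - 1 over different x ranges.
    datasets = {
        "node-a": [(x, 3 * x - 1) for x in (0.0, 0.5, 1.0)],
        "node-b": [(x, 3 * x - 1) for x in (1.5, 2.0)],
        "node-c": [(x, 3 * x - 1) for x in (-1.0, -0.5)],
    }
    (w, b), rounds = ring_train(["node-a", "node-b", "node-c"], datasets)
    print(f"w={w:.4f}, b={b:.4f} after {rounds} rounds")
```

No single node's inputs span the whole range, yet the passed-around model settles near `w = 3, b = -1` because each node corrects the directions its neighbours fit poorly. A real deployment would replace the inner call with a receive, a local PyTorch training pass, and a send to the next peer.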

Challenges we ran into

  • Ordering asynchronous server requests so that each client trains the model in the correct sequence.
  • Building the graph for the network (its edge structure).
  • Sending model weights over the network: how to pack up the data and route it correctly through server requests.
  • Passing control smoothly from node to node without interruptions.
  • Preparing synthetic and randomized data in a way that approximates a real-world circumstance as closely as possible.
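
One common answer to the "pack up the data" problem is length-prefixed framing. TCP delivers a byte stream with no message boundaries, so the sender writes a fixed-size length header before each serialized payload and the receiver reads exactly that many bytes back. The sketch below is an assumption, not FedUp's actual protocol: it serializes weights as JSON lists of floats using only Python's standard library. With PyTorch, the lists could come from calling `tolist()` on each tensor in `model.state_dict()`, or the whole state dict could be written with `torch.save` into an `io.BytesIO` buffer and sent as raw bytes.

```python
import json
import socket
import struct
from typing import Dict, List

# 4-byte unsigned big-endian ("network order") length header before each message.
HEADER = struct.Struct("!I")

WeightMap = Dict[str, List[float]]  # parameter name -> flattened values


def send_weights(sock: socket.socket, weights: WeightMap) -> None:
    """Serialize the weights to JSON and send them as one length-prefixed frame."""
    payload = json.dumps(weights).encode("utf-8")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a single recv() may return fewer than requested."""
    chunks: List[bytes] = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_weights(sock: socket.socket) -> WeightMap:
    """Read one length-prefixed frame and decode it back into a weight map."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))
```

For example, with a connected pair from `socket.socketpair()`, calling `send_weights(a, {"fc.weight": [0.1, -2.5], "fc.bias": [1.0]})` on one end and `recv_weights(b)` on the other returns an equal dictionary. Back-to-back messages stay separate because each one carries its own length. JSON keeps the sketch readable; a binary format would be more compact for large models.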

Accomplishments that we're proud of

  • It works: we ran decentralized training end to end across multiple devices.
  • We present a novel infrastructure for organizations and users to create network topologies and orchestration plans that are easy to implement and quick to use.
  • We are proud to demonstrate the capabilities of decentralized federated learning to the next generation of engineers who will determine the distribution of intelligence.

What we learned

  • Implementing a decentralized network over IP, with no prior experience.
  • Implementing novel research ideas in short timeframes

What’s next for FedUp

  • Securing the network and the model transmission protocol with encryption
  • Creating new network architectures, such as semi-centralized or hybrid designs (e.g., bringing together multiple ring decentralized federated learning (DFL) networks to train one model)
  • Implementing new optimization algorithms, such as Federated Batched Gradient Descent, to let multiple nodes compute in parallel.
  • Applying FedUp to more datasets and use cases as proofs of concept in other fields.
  • Adding more personalization through independent final layers to customize the model to each node's dataset.
