ETH 2.0 Validator Coordinator

2 Validators running concurrently
Validator Coordinator Architecture
Visualization of 2 validators, and a coordinator choosing

Inspiration

Most proof-of-stake blockchains have a slashing mechanism to deter validators from behaving maliciously. Because of this, people have built software that helps validators reduce their risk of being slashed, for example Tendermint KMS. In Ethereum 2.0, a slashing event could cost a validator up to its 32 ETH stake, which could be worth tens of thousands of dollars... or more ;)

No comparable slashing-protection tool existed for ETH 2.0, so we decided to build one.

What it does

Many node operators run two or more instances of a validator for redundancy: if one goes offline, the other can still validate blocks and earn rewards. However, this greatly increases the risk of slashing. The instances share the same validator key, and the protocol does not allow a validator to publish two different blocks at the same height.

We have built a validator coordinator which, for every block height, picks ONE validator instance to sign the block, even if the operator is running multiple instances. Before signing a block, each instance asks the coordinator whether it is safe to sign. If the coordinator has already seen an instance try to sign a block at this height, it tells the current instance that it is UNSAFE to sign.

Example of coordinator in action

How I built it

We used Prysmatic Labs' ETH 2.0 client (Prysm) and modified it so that the validator makes a request to the coordinator service before it signs a block. If the coordinator service returns true, the validator proceeds to sign and broadcast the block. Otherwise, the validator does not sign and just shows an error. Here is a screenshot from running two validators at the same time.
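The check-before-sign flow described above could be sketched roughly as follows. This is only an illustration, not the actual change made to Prysm: the `Coordinator` interface, `signIfSafe`, `ErrUnsafeToSign`, and `staticCoordinator` are all hypothetical names, and refusing to sign when the coordinator cannot be reached is an assumption about how a failure should be handled.

```go
package main

import (
	"errors"
	"fmt"
)

// Coordinator is a hypothetical interface for the coordinator service.
// It is not part of Prysm.
type Coordinator interface {
	SafeToSign(instanceID string, height uint64) (bool, error)
}

// ErrUnsafeToSign is returned when the coordinator refuses permission.
var ErrUnsafeToSign = errors.New("coordinator reports it is unsafe to sign")

// signIfSafe asks the coordinator first and calls sign only on approval.
// Assumption: if the check itself fails, the validator does not sign.
func signIfSafe(c Coordinator, instanceID string, height uint64, sign func() error) error {
	safe, err := c.SafeToSign(instanceID, height)
	if err != nil {
		return fmt.Errorf("coordinator check failed, not signing height %d: %w", height, err)
	}
	if !safe {
		return ErrUnsafeToSign
	}
	return sign()
}

// staticCoordinator is a stand-in that always gives the same answer,
// used here only to exercise signIfSafe without a real service.
type staticCoordinator struct {
	safe bool
}

func (s staticCoordinator) SafeToSign(string, uint64) (bool, error) {
	return s.safe, nil
}

func main() {
	sign := func() error {
		fmt.Println("block signed and broadcast")
		return nil
	}
	// The first instance is approved, the second is refused.
	fmt.Println(signIfSafe(staticCoordinator{safe: true}, "validator-a", 7, sign))
	fmt.Println(signIfSafe(staticCoordinator{safe: false}, "validator-b", 7, sign))
}
```

Putting the check behind a small interface keeps the signing path independent of how the coordinator is reached (HTTP, gRPC, or something else), which the writeup does not specify.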

The coordinator uses an in-memory data store to keep track of all the blocks it has "seen". Whenever a validator asks the coordinator if it is allowed to sign a block, the coordinator checks the Ethereum 2.0 consensus rules and decides whether or not the validator is allowed to proceed.
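A minimal sketch of such an in-memory store, assuming the only rule enforced is "one signer per block height" (the real coordinator checks more of the consensus rules than this). The type and method names are hypothetical. The first request for a height is approved, and every later request for that height is refused, even from the same instance, which is the conservative choice.

```go
package main

import (
	"fmt"
	"sync"
)

// Coordinator remembers, for each block height, which validator instance
// was first given permission to sign. State lives only in memory.
type Coordinator struct {
	mu     sync.Mutex
	signer map[uint64]string // height -> instance ID that claimed it
}

// NewCoordinator returns a coordinator with an empty store.
func NewCoordinator() *Coordinator {
	return &Coordinator{signer: make(map[uint64]string)}
}

// SafeToSign reports true only for the first request at a given height.
// The mutex makes the check-and-record step atomic, so two instances
// asking at the same moment cannot both be approved.
func (c *Coordinator) SafeToSign(instanceID string, height uint64) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, seen := c.signer[height]; seen {
		return false
	}
	c.signer[height] = instanceID
	return true
}

func main() {
	c := NewCoordinator()
	fmt.Println(c.SafeToSign("validator-a", 100)) // true: first request at height 100
	fmt.Println(c.SafeToSign("validator-b", 100)) // false: height 100 already claimed
	fmt.Println(c.SafeToSign("validator-b", 101)) // true: first request at height 101
}
```

Because the store is in memory, a coordinator restart forgets every claimed height. A persistent store would be needed to stay safe across restarts.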

If a new instance of the validator is being spun up, it can communicate with the coordinator to immediately come up to speed about the blocks it is allowed to sign. This lets node operators upgrade their validators without downtime, since they do not need to tear down their old validators before spinning up their new ones.
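One way a new instance might come up to speed is to copy the coordinator's record of already-claimed heights when it joins. The sketch below assumes that design. `ClaimedHeights`, `Join`, and `MayAttempt` are hypothetical names, and the writeup does not say what data is actually exchanged.

```go
package main

import (
	"fmt"
	"sync"
)

// Coordinator tracks which instance claimed each block height.
type Coordinator struct {
	mu      sync.Mutex
	claimed map[uint64]string
}

func NewCoordinator() *Coordinator {
	return &Coordinator{claimed: make(map[uint64]string)}
}

// Claim approves only the first request at a given height.
func (c *Coordinator) Claim(instanceID string, height uint64) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, taken := c.claimed[height]; taken {
		return false
	}
	c.claimed[height] = instanceID
	return true
}

// ClaimedHeights returns a copy of the current state for a joining instance.
func (c *Coordinator) ClaimedHeights() map[uint64]string {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make(map[uint64]string, len(c.claimed))
	for h, id := range c.claimed {
		out[h] = id
	}
	return out
}

// Instance is a validator process with a local view of claimed heights.
type Instance struct {
	ID    string
	known map[uint64]string
}

// Join creates a new instance that starts with the coordinator's state.
func Join(c *Coordinator, id string) *Instance {
	return &Instance{ID: id, known: c.ClaimedHeights()}
}

// MayAttempt reports whether this instance should even ask to sign.
// It is only a shortcut; the coordinator still has the final say.
func (i *Instance) MayAttempt(height uint64) bool {
	_, taken := i.known[height]
	return !taken
}

func main() {
	c := NewCoordinator()
	c.Claim("old-validator", 100)
	c.Claim("old-validator", 101)

	// The upgraded instance joins while the old one is still running.
	fresh := Join(c, "new-validator")
	fmt.Println(fresh.MayAttempt(101))          // false: already claimed
	fmt.Println(fresh.MayAttempt(102))          // true: not yet claimed
	fmt.Println(c.Claim("new-validator", 102)) // true: new instance wins 102
	fmt.Println(c.Claim("old-validator", 102)) // false: old instance is refused
}
```

In this sketch the old and new instances can overlap safely during an upgrade, because every signature still goes through `Claim`.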

We also built a visualization tool to see the validators and the coordinator in action. Here it is.

Challenges I ran into

It was somewhat difficult to get a local ETH 2.0 testnet running on our computers. We tried both the Lighthouse and Prysm clients, and eventually (with lots of help from their team) we were able to run a local testnet where we could experiment. Syncing to Prysmatic's testnet was also an option, but it would have taken more than 10 hours.