Inspiration
To provide a rock-solid template that lets as many people as possible become validators across the various Cosmos ecosystem networks.
What it does
Provides what is essentially a "one-click" experience for running production-ready validator infrastructure that is proxied via sentries. All of this is achieved through infrastructure as code, which launches the necessary servers and sets everything up for the user, saving days of work and producing reproducible setups.
How I built it
HashiCorp's Terraform handles the Infrastructure as Code (IaC) portion and does the heavy lifting. VPNCloud provides a P2P mesh network between the validator and all sentries. Croc is used to share the node IDs of the sentries with the validator. Much of the rest is Linux sysadmin work within the IaC.
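To make the sentry wiring a bit more concrete, here is a minimal Python sketch of the peer settings that a sentry layout like this typically ends up with. It is an illustration under assumptions, not the project's actual provisioning code: the `Sentry` class, the function names, and the mesh IP addresses are all hypothetical. The keys `pex`, `persistent_peers`, and `private_peer_ids` are settings from the `[p2p]` section of Tendermint's `config.toml`, and 26656 is its default P2P port.

```python
from dataclasses import dataclass

P2P_PORT = 26656  # Tendermint's default P2P port


@dataclass(frozen=True)
class Sentry:
    """Hypothetical record for one sentry node."""
    node_id: str  # node ID as shared by the sentry (e.g. passed along via croc)
    vpn_ip: str   # assumed address of the sentry on the private mesh network


def peer_address(node_id: str, ip: str, port: int = P2P_PORT) -> str:
    """Format one peer the way Tendermint expects: id@host:port."""
    return f"{node_id}@{ip}:{port}"


def validator_p2p_settings(sentries: list[Sentry]) -> dict:
    """Settings for the validator: talk only to its own sentries."""
    if not sentries:
        raise ValueError("a proxied validator needs at least one sentry")
    peers = ",".join(peer_address(s.node_id, s.vpn_ip) for s in sentries)
    return {
        "pex": False,               # do not gossip or discover peers
        "persistent_peers": peers,  # keep connections to the sentries only
    }


def sentry_p2p_settings(validator_node_id: str, validator_vpn_ip: str) -> dict:
    """Settings for a sentry: face the public network, hide the validator."""
    return {
        "pex": True,  # sentries discover and serve public peers
        "persistent_peers": peer_address(validator_node_id, validator_vpn_ip),
        "private_peer_ids": validator_node_id,  # never advertise the validator
    }


if __name__ == "__main__":
    # Placeholder node IDs and mesh addresses, for illustration only.
    sentries = [Sentry("a" * 40, "10.0.0.2"), Sentry("b" * 40, "10.0.0.3")]
    print(validator_p2p_settings(sentries))
    print(sentry_p2p_settings("c" * 40, "10.0.0.1"))
```

In a setup like the one described, values of this kind would be written into each node's `config.toml` by the provisioning scripts once the node IDs have been exchanged.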
Challenges I ran into
Finding decent P2P mesh VPN software that was secure and easy to set up. I spent a lot of time tinkering with tinc (no pun intended), trying to come up with an elegant way to connect the infrastructure pieces while allowing for some self-healing characteristics. In the end, I found vpncloud a much greater joy to work with, and it apparently has much higher throughput, although tinc still has some interesting use cases and capabilities.
The initial plan was in fact to make it super robust, highly available (HA), and self-healing through the use of Consul and Nomad. However, initial setups showed that the "UNIX philosophy" preached by this stack is not that great in a networking context: instead of having to worry about split-brain at a single layer, you now have multiple levels of failure potential, which you definitely don't want with a validator.
The next step was working with Kubernetes, and many experiments were done on that front with decent success. However, because of time constraints (I was working with it until the last couple of days), its complexity, and the risk of favouring liveness over safety, I decided to go with the simpler static model as submitted. It is more understandable, serves as a great template, and is still faster for getting a validator running in production than anything else.
Accomplishments that I'm proud of
Achieving the vision that was aimed for, despite there being multiple avenues to it, and having probably iterated through most of the good ones. Everything used is also open source, and things have been set up to be vendor agnostic, with no cloud provider lock-in. That should further aid the reliability of Cosmos networks, instead of having everyone run on AWS and turning it into a single point of failure.
What I've learned
Liveness isn't the be-all and end-all. Cosmos has made a good decision in favouring safety.
What's next for Press Enter to Run a Validator...
If there is interest, hopefully a self-healing Kubernetes evolution that offers top liveness while also satisfying the safety needs of running a validator. Also, making the IaC config flexible and easy to switch between the different Cosmos networks.
Built With
- croc
- kubernetes
- terraform
- vpncloud