Inspiration

  • The idea was to build an application that detects when a website goes down.
  • Specifically, the check had to be region-based.
    • e.g. a website can be down in Singapore but working perfectly fine in Europe.

What it does

  • This application checks whether a site is accessible from locations around the world.
  • The user specifies an application and the interval at which its health is checked.
  • A task runs in different regions of the world, checking whether that application is available.
  • Users can also view the history of these checks from each region.

How we built it

  • The application is designed to work in an environment that spans multiple regions (much like AWS).
  • Several goals shaped the architecture:
    • Scaling should be straightforward, both within a region and across regions.
    • Writing execution results to the database should be as fast as possible.
  • The application has four main components:
    • Executor
    • Webui
    • PostgreSQL
    • Couchbase
  • Executor and webui are services written in Go, while PostgreSQL and Couchbase are popular databases.
  • PostgreSQL stores information about the monitored applications.
  • Couchbase stores the results of the executions.
  • Every region we want to support needs at least one Couchbase instance.
  • Couchbase instances in different regions should be kept in sync with XDCR (cross data center replication).
  • An executor simply writes its results to the Couchbase instance in its own region.
  • The webui reads the results (including data replicated from other regions) from the Couchbase instance in its own region.

Challenges we ran into

  • The biggest challenge I ran into was finding a database that fits these requirements.
  • I wanted a database that can do the following:
    • Distributed writes (a write can go to any instance in the cluster)
    • Good replication performance, even across regions
    • Storing time-series data
  • Because of the points above, I chose Couchbase.
  • Another major challenge was making sure that both the database setup and the application itself were easy to scale.
    • I used Terraform to help with that.

Accomplishments that we're proud of

  • The performance of the application
    • Though there are no metrics for this yet, in my observation the application performs better than other similar products.
    • This is due to several factors, a major one being that the executor is very lightweight and does only one thing: a very simple network request.
    • Written in Go, the executor performs really well, and I believe more performance can be extracted from it with simple optimizations.
    • This simple design, coupled with Go's optimizations for the arm64 architecture, gives really good performance on Graviton CPUs.
    • I am able to check the health of 100 applications using one t4g.micro instance running this executor.

What we learned

  • One big thing I learned is that building software that scales well is significantly different from building single-instance solutions.
    • I had to rework the architecture several times before I was sure it would work.
  • Another big thing I realized is that in most databases, replicas are read-only (you write to the primary node and can read from any node).
    • This database architecture would not fit my design, and finding a good database that supports distributed writes was difficult.
    • In the end, I settled on Couchbase. After working with it, I found that it is not perfect and has a learning curve, but it gets the job done.

What's next for Alshain: Application Health Check

  • Performance optimization
    • More performance can clearly be extracted from the executor, which I look forward to doing.
  • Better database
    • Couchbase has some performance issues with aggregation queries. I look forward to solving them within Couchbase itself or replacing the database layer with something else entirely.