TestButler: IoT testing made easier

Enabling your software to be directly tested on your hardware in real-life conditions with GitLab CI/CD pipelines

When you build an application to run on edge hardware, testing it on workstations or emulators is just not enough. The factors affecting production deployments of IoT fleets range from sensor inputs and GPU variants to electromagnetic interference, internet connectivity, thermal constraints, and other environmental conditions. The only test feedback you can truly trust in your AI/IoT/Automotive development workflow comes from tests run on actual devices.

That's what TestButler can enable for you! The TestButler CLI takes your existing tests, runs them against a specific device target, and reports the test results back. This proof-of-concept GitLab CI/CD component does just that. Later, we would like to build this into an AutoDevOps feature that uses GitLab Runners to target devices and run tests against them.
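To make this concrete, here is a minimal sketch of what including such a CI/CD component in a project's `.gitlab-ci.yml` could look like. The component path, version, and input names below are hypothetical, illustrating GitLab's standard `include: component:` syntax rather than a published component:

```yaml
# Hypothetical usage of a TestButler CI/CD component.
# The component path, version, and inputs are illustrative only.
include:
  - component: gitlab.com/my-group/testbutler/hardware-test@1.0.0
    inputs:
      device_target: raspberrypi5     # which device under test to run against
      test_command: "pytest tests/"   # the project's existing test suite
```

The idea is that the project keeps its existing tests; only the component include and a device target are added.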

Does it solve a real problem for GitLab users?

Does the project clearly articulate the problem it aims to address for GitLab users? Effective solutions are not solely technical but can also drive social impact. Explain why this problem is significant and how your solution effectively resolves it.

GitLab's motto is "Software. Faster."
This project delivers exactly that for GitLab users building IoT/AI/Automotive applications for edge hardware: software shipped faster and far more reliably.

How, you ask? By automatically testing your software stack on actual hardware with a Hardware-in-the-Loop (HIL) CI/CD pipeline.

This isn't a new problem by any stretch of the imagination. Testing software that runs on hardware is a notoriously manual, slow, and error-prone process with constantly shifting goalposts. A typical hardware test cycle involves preparing the OS image, flashing it onto a Single Board Computer (SBC), setting up the software, running the tests on the SBC, collecting the results, and then repeating the whole loop for the next change.

Current solutions that test software on workstations or emulators are just not enough. The factors affecting production deployments of IoT fleets are far more varied and cannot be recreated in the lab: sensor inputs, GPU variants, electromagnetic interference, internet connectivity, thermal constraints, humidity, external vibrations, storage failures, and even humans. How can we build a pipeline that replicates and tests these conditions to make absolutely sure the software in a merge request runs on the hardware?

The solution: Automated software tests executed on actual hardware without human intervention at scale.

A few companies, organizations, and individuals have each reinvented this "automated testing" loop. But in doing so, they have also locked the test harness (the test hardware) behind proprietary walls, made it too specific to their own use case, and charged exorbitant prices for its usage. This is not the open-source way. Hence, I present to you TestButler.

This proof of concept solves the problem by giving every GitLab user an end-to-end pipeline for testing their IoT applications on real hardware, no matter the use case or scenario: an open-source test runner, TestButler, that abstracts away hardware complexity by pairing open, non-custom test hardware with the power of GitLab Runners.

Idea: Integrating a standardized testing solution for IoT/AI applications on the edge will significantly boost GitLab's already well-rounded CI/CD and AutoDevOps offering.

Innovativeness

Does the project highlight the unique features of your solution and how it differs from existing solutions? Show the creativity and originality in your approach.

USP: TestButler's fire-and-forget approach automates hardware testing regardless of whether you are testing on a Raspberry Pi 5 or building an AI application in a network-constrained environment. You bring the scenario; we test it.

GitLab USP: Built on the existing GitLab Runners architecture to give existing customers increased functionality and new paths to enable testing. No new components or developer friction are introduced.

Unique Features:

  1. Open source: All components (software/hardware) are open source. Enabling anyone to contribute, extend, and customize for their needs. No vendor lock-in.
  2. Plug and play test harness: Autokit uses a USB interface to automate hardware.
  3. Batteries and BYOB included: Supports flashing, power on/off, serial console access, file system access, GPIO, I2C, and more out of the box for commonly used SBCs. You can add integrations for new device types to benefit the community.
  4. Built to be flexible: Supports device finders (Jumpstarter), fleet management platforms (Balena), multiple test frameworks and languages, and varied Device under Test (DUT) hardware (unlike other solutions built for specific hardware and internal use only).
  5. Standard tools: GitLab Runners + Docker provide the CI/CD pipeline orchestration, runtime, and reproducible environments.
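On the "standard tools" point: in GitLab CI/CD, runner `tags` are the usual mechanism for routing a job to a specific machine, and `image` pins the Docker environment. A hedged sketch of how a hardware-test job could be routed to a runner attached to a test rig follows; the job name, tag names, and `testbutler` CLI invocation are assumptions for illustration:

```yaml
# Hypothetical job routing tests to a hardware-attached runner.
# The tag names and the `testbutler` CLI flags are illustrative only.
hardware-tests:
  image: debian:bookworm-slim    # reproducible containerized test environment
  tags:
    - testbutler                 # only runners wired to a test rig pick this up
    - raspberrypi5               # identifies the device under test
  script:
    - testbutler run --device raspberrypi5 -- ./run-tests.sh
```

Because this reuses runner registration and tag matching as-is, no new scheduling infrastructure is needed on the GitLab side.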

All in all, TestButler is open source, scalable, and integrates easily with GitLab's existing ecosystem.

Overall Quality

Did you demonstrate excellence in the execution of your idea, including design, user experience, and technical implementation?

"The complexity bias is a cognitive bias that refers to our tendency to prefer complex solutions over simple ones, even when the simple solution is more effective."

While executing this idea, the scope kept growing more complex than the pitch required. But, with time, I plan to make this project more accessible to build with and contribute back to. Working in IoT, I have learned that adoption of developer tools is directly proportional to how much friction they eliminate. TestButler is built on exactly that.

The user experience of the pipeline has been kept to "set up, fire, and forget," with a focus on reliability and low maintenance. TestButler gives you maximum freedom in writing your own tests and doesn't stop you from experimenting or exploring new ideas. Nor do we hold your hand. Once the pipeline is created, each merge request will run the tests, and the hardware in the loop will post its feedback back to the merge request. Autokit, which handles the hardware automation, is used for its focus on reducing friction in processes like flashing and power on/off procedures.

As a fleet owner or application developer, you don't need to learn how to flash a Jetson Nano or how to power on an Intel NUC board. TestButler abstracts all of that for you behind single commands: device.on() and device.flash().
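The `device.on()` / `device.flash()` abstraction described above could be sketched as follows. This is a hypothetical illustration, not TestButler's actual implementation: the class names, driver methods, and return strings are invented to show the pattern of hiding board-specific automation behind a uniform interface.

```python
# Hypothetical sketch of the device abstraction described above.
# Board-specific details (how a Jetson Nano is flashed, how a NUC is
# powered on) live in per-board drivers; callers only ever see
# device.on() and device.flash(). All names here are illustrative.

class BoardDriver:
    """Base class for board-specific automation."""
    def power_on(self) -> str:
        raise NotImplementedError
    def flash_image(self, image_path: str) -> str:
        raise NotImplementedError

class JetsonNanoDriver(BoardDriver):
    """One hypothetical driver; a real one would talk to relays/USB."""
    def power_on(self) -> str:
        return "jetson-nano: power relay toggled, board is on"
    def flash_image(self, image_path: str) -> str:
        return f"jetson-nano: flashed {image_path} via recovery-mode USB"

class Device:
    """The uniform interface a fleet owner or app developer uses."""
    def __init__(self, driver: BoardDriver):
        self._driver = driver
    def on(self) -> str:
        return self._driver.power_on()
    def flash(self, image_path: str) -> str:
        return self._driver.flash_image(image_path)

device = Device(JetsonNanoDriver())
print(device.on())
print(device.flash("my-os.img"))
```

Supporting a new board type then means contributing one new driver class, which matches the community-extension model described earlier.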

Scalability

Can your solution grow in terms of customer base, revenue, and operations without a significant drop in performance or quality?

Yes. Since most of the solution runs on customer-provisioned test hardware and standard tools such as GitLab Runners and Docker containers, it is highly scalable. The solution scales horizontally by adding more test hardware as the use case demands.

TestButler can support multiple use cases and testing scenarios, which helps customers scale vertically as well. TestButler and its test harness, Autokit, can be extended to support new hardware through community contributions. With each new device supported, we expand our total addressable market, keeping us nimble to what the industry wants.

Better documentation on creating these drivers would be vital in building this community base. Good thing I am a documentarian.

Total Addressable Market (TAM)

What is the potential impact opportunity if your solution was to capture the entire market segment it's targeting?

TestButler reduces release timelines in the IoT development lifecycle from months to hours, eliminating the need for manual testing and freeing up engineering resources. Standardizing the testing process could save up to 80% on testing, support, and DevOps operations, making it economically feasible for smaller developers and startups to build on GitLab. With the IoT market expected to keep growing rapidly year over year, GitLab could become the market leader in enabling continuous testing and deployment for billions of connected devices worldwide.

The market segment includes:

  1. Linux kernel developers building testing pipelines
  2. Embedded Linux / embedded IoT teams (e.g., how Balena tests releases of BalenaOS for 100+ supported boards)
  3. Automotive Linux (ELISA Project using GitLab Runner + CodeQA Testing tool)
  4. Hardware manufacturers
  5. Companies building IoT applications on SBC hardware
  6. Enthusiasts building projects on SBC hardware

And so many more.

Feasibility

How practical is your solution in terms of available resources, technology, and time? Consider any potential barriers to success.

Extremely feasible: it is already implemented at multiple levels. With time, I would like to work on standardized execution with any IoT backend (AWS, Azure IoT, Ubuntu Core). Building this standardized solution will require community adoption, documentation, and feedback from hardware manufacturers, kernel developers, and customers, not to mention support from a diverse, dedicated team willing to solve this hard problem while staying true to open source, like GitLab.

Raising awareness of the tool in the right communities would boost adoption too. Progress would be slow, since complete adoption requires migrating from existing test frameworks and hardware. So the value proposition and migration paths we provide must be clear and impactful.

Much like Rspack carving out its place as a webpack alternative.

Mini Case Study

  • The BalenaOS team used to spend a month releasing a new version of BalenaOS for a single device type due to manual testing.
  • After building AutoKit, the testing process was automated and integrated into their CI/CD pipelines.
  • Balena now releases BalenaOS for 90+ device types. That's 90 distinct releases of an operating system, each tested on real hardware, with over 3,000 releases of BalenaOS built and tested each month.

The impact is clear once the pipelines are automated.
