Inception

About a year and a half ago, I was moving into a new apartment and was very focused on making it actually feel like my home. One of the most important things to me in that regard was my audio setup; music is a powerful catalyst for my mood, and I wanted my apartment to be very music "accessible". Unfortunately, I wasn't financially prepared to commit to a solution costing hundreds of dollars, so I purchased a Sony soundbar that let me connect via Bluetooth. I was under the impression that I'd be able to come home from work and, with a few taps on my phone, have music playing through my speakers. While I wasn't completely wrong, there was still much to be desired; every second spent dealing with Bluetooth connectivity was a second not spent jamming out.

Around the same time, my mentor introduced me to the Raspberry Pi, a fanless computer with a solid 512MB of RAM that goes for around $45. These tiny little heroes also come with a 3.5mm analog output and an Ethernet port, making them able both to connect to the internet and to play audio. Since the majority of my professional career has been spent working on web applications, I immediately knew what I was going to use mine for: a web application that plays audio. I was going to build a web application that I could log into and push a button, which would in turn play audio. And so it began!

Early versions and weird problems

My experience in the web application domain had already exposed me to modern MVC frameworks like Rails and Laravel, so I started there. The plan was to have the Raspberry Pi run one of these with a clean UI and handle everything: managing the audio files, the database, and the actual audio output. I quickly ran into two fundamental problems with this approach:

1. Playing audio files using PHP or Ruby

The first thing I wanted to accomplish as a proof of concept was to play an MP3 audio file from within a PHP or Ruby runtime. I scoured Packagist and RubyGems for libraries that would help but was not able to find anything that did exactly what I wanted, or that was widely supported by the community. I quickly realized that what I was looking to build was something scripting languages weren't really intended for in the first place.

2. Audio playback + Web Server

With whatever MVC framework I chose, there was the requirement that I put Nginx or Apache in front of it; this is what I was familiar with. Aside from the actual overhead of needing either of those to run the application, I quickly realized that I would run into problems controlling audio playback from inside my web application's runtime. Let's say, for example, I were to use this bit of naughty code to start audio playback inside some PlaybackController's interface:

...

class PlaybackController extends Controller {

  ...

  public function start($track_id) {
    $track = Flight::findOrFail($track_id);

    // escapeshellarg guards the file path against shell injection
    $command = 'aplay ' . escapeshellarg($track->file_path);

    // exec blocks until aplay exits
    exec($command);

    return response("", 200);
  }

  ...

}

My concern was that the exec call in that handler would be a blocking procedure; the HTTP request sent into the application would not complete until the aplay command had finished.

3. What next?

Aside from having to handle synchronous playback blocking the completion of HTTP requests in a REST API, this approach was going to leave me stranded after the track finished playing. How would the web application continue playing the next track in a playlist? Once the first track finished, would it just run another exec command, making the request hang for even longer? What happens if a user makes a request to the Playback::start handler while a track is already playing? There were too many challenges with this approach that I was not prepared to solve; I needed to attack it from another direction.

Version 0.2

The biggest problem I faced during the early web application iterations was state persistence between the application and the web server. I needed my PHP or Rails application to live closer to where Apache or Nginx lived, and vice versa. For a while I considered writing an extension for Apache or Nginx, but the learning curve I faced just to start writing an extension was daunting. So, I scrapped everything and started down the path that would eventually lead me to the first breakthrough in the project and version 0.2: I started a C++ application.

My first thought was to find a C++ port (or as close to one as I could find) of the popular MVC pattern, giving me a familiar structure in an unfamiliar programming language - the last time I had done any programming in C++ was in college. It wasn't long before I came across libmicrohttpd, an open source HTTP server library that handles the TCP and HTTP layers, allowing developers to focus solely on handling requests and creating responses. With this library, I was able to build a basic C++ application that responded to requests using a router/controller mechanism and could play random sounds using libao. This first leap came in May 2014. Soon after, I was working on managing a persistent playback thread, solving the problems faced in earlier iterations, as well as adding support for MP3 playback using mpg123. Everything was starting to come together, but much less quickly than I had hoped. Was I really prepared to write an entire web application (including the HTML, JavaScript, and CSS) this way? How would I handle file uploads? How would I save information to a database? These were all questions that led me to one of the most important decisions in the lifetime of the project: I wouldn't do it all on the Raspberry Pi.

Version 0.5

At this point the codebase split in two - the core C++ application I had been working on, and a new PHP application. It was this PHP application that would serve the static assets (HTML, CSS, JavaScript), as well as handle the heavy-lifting business logic like user authentication and file uploads. I could deploy this PHP application anywhere: my shared hosting account with Site5, a micro EC2 instance on AWS... anywhere. It was this application that users would log into, and it was this application that would then communicate with the Raspberry Pi, which now only needed to concern itself with the most basic of requests and the most basic of responses. It no longer needed to serve any HTML, and with rapidjson, serving intuitive JSON responses was trivial. By wrapping the business logic in a web application, I had inadvertently given myself (out of laziness, mind you) some of the greatest benefits of the project to date:

  • I was able to build and release features at a much faster pace
  • Moving the business logic to the server meant the core application could be refactored later
  • A centralized place for user login meant a single user could control multiple devices

This web application eventually split into two separate codebases itself - the API and the UI. You can see some of the early schema design here.

It was at this point that the application took a very social turn - I could give you permission to control my device. All of a sudden, this project that had started as a way for me to play music for myself had become something much, much larger. Now I was able to play music for friends and family living on the other side of the country, and vice versa. I could tell my brother to put "something on" for me while I washed dishes, or force him to listen to that song I had just fallen in love with.

Unfortunately, having the API communicate with the device meant that users would need to set up port forwarding rules on their router, which is not very user-friendly. With the forwarding set up, a user would log into the application and configure their device (represented by a device table in the schema), providing their public IP address and the port exposed by the core application.

Version 1.x

After months of cranking out features on the UI and API side, I knew it was time to revisit the C++ application. I needed to find a solution that would not require users to log into their routers and poke around with unfamiliar techno-mumbo-jumbo. At the same time, I was running into serious problems with real-time device state communication - the only way to know if a device was still up and running was to send a request to it. I had attempted to solve this by adding a "state client" to the core library, which would periodically update its own status on the API by sending a POST /devicestate request with a JSON body containing the current play state. This solution had its own shortcomings, though, and I knew I would already be pulling wires with the next iteration. For a few months, I stopped all work on the platform as a whole and focused my attention on soaking in as much C++ knowledge as I could. I read books by Andrei Alexandrescu and James O. Coplien. I prowled Reddit's C++ subreddit. On February 20, 2015, the first code was dropped for version 1.0 of the core application.

Unlike the previous versions, I would not run a web server on the device at all. Instead, the device would open a TCP socket to an API endpoint, which the API would then "hang onto", using it to send any further commands down. I could then use the presence of that socket to tell the user whether the device was connected, a process that had now been updated to use WebSockets between the UI and the API. The commands themselves would be very simple - CMD audio:start, CMD audio:stop. Once the core application received one of these commands, it was then in its power to decide what to do next - fetching the correct track to play, or stopping its playback thread.

The code

At this point, all of the source code is hosted on GitHub:

  1. the ui - browser-side code made up of CoffeeScript, Jade, and Sass
  2. the api - server-side REST API code built on Node.js using the Sails.js framework
  3. the core - C++ application code that runs on supported devices

Ideally, developers contributing to the platform should be familiar with each of these repositories and have them checked out and building on their systems.

Moving forward

As it stands today, users with a Raspberry Pi can follow the installation guide, sign up for an account, register their device using a serial number, and start using the web application to control their audio. The API was recently integrated with SoundCloud, so aside from uploading audio files they have the rights to, users now have the SoundCloud library at their disposal. Up until this point I have been the only development resource on this project, and I would like to see some community involvement now that it has reached a certain level of maturity. With the codebase split into three separate repositories, developers interested in contributing can do so in whichever codebase they feel most comfortable. Demos of the application can be found on the loftili YouTube channel.
