What it does
Column Swarm Reinforcement Learning (CSRL) is a swarm intelligence model composed of a bi-directional hierarchy of "swarmling" minicolumns, based on HTM technology. Each column can solve many simple tasks on its own, but together they can solve large-scale problems.
In other words, CSRL is a lot of little reinforcement learners that work together to create one big reinforcement learner, and the resulting hierarchical model looks astoundingly like HTM.
How I built it
I came up with the idea of using a swarm intelligence model with HTM after seeing Jeff's 6-layer model of the neocortex. I thought about ways such a hierarchy could be constructed automatically. I had previously had some success using swarms of simpler learning automata, and with the development of SDRRL (an ultra-efficient "swarmling") and its similarity to the cortical minicolumns used in HTM, I decided to build a full-blown hierarchy with them.
I wrote it in C++, first on the CPU. I will move it to the GPU using OpenCL soon.
CSRL is composed of a lot of small reinforcement learners that work together. I wrote a tutorial for a single one of those reinforcement learners (called SDRRL, sparse distributed representation reinforcement learning). At least, this is what it will be composed of in the future; currently it is composed of simple OLPOMDP learners.
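For readers unfamiliar with OLPOMDP, here is a rough sketch of what one such learner could look like. It is not the CSRL source: the `OlpomdpUnit` name, the logistic on/off policy, and the `alpha` and `beta` defaults are all assumptions made for illustration. It only shows the general OLPOMDP idea of keeping an eligibility trace of policy gradients and nudging the weights along that trace in proportion to the reward.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single learner with a Bernoulli (on/off) action.
// All names and constants here are illustrative assumptions.
struct OlpomdpUnit {
    std::vector<float> weights; // policy parameters (theta)
    std::vector<float> traces;  // eligibility trace (z), same size as weights
    float alpha = 0.1f;         // step size (assumed value)
    float beta = 0.9f;          // trace decay in [0, 1) (assumed value)

    explicit OlpomdpUnit(std::size_t numInputs)
        : weights(numInputs, 0.0f), traces(numInputs, 0.0f) {}

    // Probability of choosing action 1 under a logistic policy.
    float probability(const std::vector<float>& inputs) const {
        assert(inputs.size() == weights.size());
        float sum = 0.0f;
        for (std::size_t i = 0; i < inputs.size(); ++i)
            sum += weights[i] * inputs[i];
        return 1.0f / (1.0f + std::exp(-sum));
    }

    // One update step: decay the trace, add grad log pi(action | inputs),
    // then move the weights along the trace scaled by the reward.
    void update(const std::vector<float>& inputs, int action, float reward) {
        const float p = probability(inputs);
        for (std::size_t i = 0; i < weights.size(); ++i)
            traces[i] = beta * traces[i] + (static_cast<float>(action) - p) * inputs[i];
        for (std::size_t i = 0; i < weights.size(); ++i)
            weights[i] += alpha * reward * traces[i];
    }
};
```

A caller would sample an action from `probability(inputs)`, apply it, observe the reward, and then call `update` with that action and reward.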
Here is a tutorial on how to create SDRRL: link. A tutorial on the full swarm will come soon.
These SDRRL units are organized into a hierarchy, where the actions of a unit are fed in as inputs to other units. Each unit optimizes locally, but the end result is that the whole swarm optimizes globally as well.
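The wiring described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the CSRL implementation: `Unit`, `Layer`, and `forward` are hypothetical names, the unit is a fixed weighted sum squashed with `tanh` standing in for a real learner, and only the bottom-up direction is shown (no feedback path and no local learning).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for one swarmling; not the real SDRRL or OLPOMDP learner.
struct Unit {
    std::vector<float> weights; // one weight per input

    // Maps the unit's inputs to a single action in [-1, 1].
    float act(const std::vector<float>& inputs) const {
        assert(inputs.size() == weights.size());
        float sum = 0.0f;
        for (std::size_t i = 0; i < inputs.size(); ++i)
            sum += weights[i] * inputs[i];
        return std::tanh(sum);
    }
};

using Layer = std::vector<Unit>;

// Runs the hierarchy bottom-up: each layer's actions become the next layer's inputs.
std::vector<float> forward(const std::vector<Layer>& layers, std::vector<float> inputs) {
    for (const Layer& layer : layers) {
        std::vector<float> actions;
        actions.reserve(layer.size());
        for (const Unit& unit : layer)
            actions.push_back(unit.act(inputs));
        inputs = std::move(actions);
    }
    return inputs;
}
```

In this sketch every unit in a layer sees all actions from the layer below. A real hierarchy would more likely give each unit a local receptive field.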
Challenges I ran into
Optimizing SDRRL. SDRRL is insanely fast by default, but I need to run potentially millions of SDRRL swarmlings at once. So I came up with several mechanisms that exploit SDRRL's sparsity to reduce computational costs.
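The specific mechanisms are not described here, but one common way to exploit sparsity is to store a binary sparse state as the list of its active indices and loop only over those. The sketch below assumes a binary state and uses hypothetical function names. It illustrates the general idea rather than the optimizations actually used in SDRRL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense baseline: touches every weight, even where the state bit is 0.
float denseValue(const std::vector<float>& weights, const std::vector<float>& state) {
    assert(weights.size() == state.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < weights.size(); ++i)
        sum += weights[i] * state[i];
    return sum;
}

// Sparse version: the state is stored as the indices of its active (1) bits,
// so the cost scales with the number of active bits rather than the state size.
float sparseValue(const std::vector<float>& weights,
                  const std::vector<std::size_t>& activeIndices) {
    float sum = 0.0f;
    for (std::size_t index : activeIndices) {
        assert(index < weights.size());
        sum += weights[index];
    }
    return sum;
}
```

With, say, 2% of bits active, the sparse loop visits roughly one fiftieth of the weights that the dense loop does, and a weight update can be restricted to the same active indices.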
What's next for Column Swarm Reinforcement Learning
GPU version, more demos. Performance improvements. Integrate SDRRL into CSRL (instead of OLPOMDP).