Above All

I'm here to learn. I've written this project up because it seems like a reasonable thing to do. But there's a good chance that people who know better will say:

  • This is a waste of time because...
  • Numenta has already done this extensively, here's what we know...
  • You're not going to be able to see what you want to see because...

Let me know!

NuPIC Hackathon 2015 Project Proposal

Let's build a framework for performing sensitivity analysis of the NuPIC platform.

You have a training set of sequences and their classifications, and you want to train a NuPIC algorithm to classify novel sequences. The sequences could be EEG readings, where you want to say whether a person is asleep, or price series, where you want to say whether the next day will likely be up or down.

You want to use NuPIC to perform the classification. But there are questions to answer.

  • What should my sampling frequency be relative to signal frequency?
  • How much noise will the algorithm handle (s/n sensitivity)?
  • How sensitive is the algorithm to frequency differences? (If we train on a waveform at frequency X, will it classify a 2X waveform?)
  • How does a classifier looking at an entire input series at once compare to the temporal pooler classifying a sequence? (This is about the structure of the classifier, not about attributes of the input.)
  • How much training do I have to do? 10 sequences? 100? 100,000? (This is about training process, also not about input.)
  • Sensitivity to preprocessing - what if we use sample deltas as input? How about FFT preprocessing?
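To make the preprocessing question concrete, here is a minimal sketch (hypothetical function name, assuming NumPy) of the three input variants the last bullet describes: raw samples, sample deltas, and an FFT magnitude spectrum.

```python
import numpy as np

def preprocess(signal, mode="raw"):
    """Hypothetical preprocessing step applied before encoding.

    "deltas" feeds first differences of the samples; "fft" feeds the
    magnitude spectrum instead of the raw time series.
    """
    signal = np.asarray(signal, dtype=float)
    if mode == "raw":
        return signal
    if mode == "deltas":
        return np.diff(signal)
    if mode == "fft":
        return np.abs(np.fft.rfft(signal))
    raise ValueError("unknown mode: %s" % mode)
```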

The framework will run classification experiments while varying properties of the input signal in order to see how sensitive the algorithm is to changes in that input. The process is akin to swarming, but rather than optimizing algorithm parameters for a test set, it assumes a static set of algorithm parameters and tests performance over varying input.
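The core loop might look like the following sketch (hypothetical names; `run_experiment` stands in for whatever trains and scores a fixed-parameter NuPIC model): hold the model parameters static, step one input property through a range, and record accuracy at each value.

```python
def sensitivity_sweep(run_experiment, static_params, variable, start, stop, step):
    """Vary one input property while model parameters stay fixed.

    run_experiment(params) -> accuracy score for that configuration.
    Returns a list of (value, accuracy) pairs for plotting.
    """
    results = []
    value = start
    while value <= stop:
        params = dict(static_params)          # copy so runs stay independent
        params[variable] = value
        results.append((value, run_experiment(params)))
        value += step
    return results
```

The 2D result set maps directly onto the basic-sensitivity visualization; a nested sweep over two variables would produce the 3D cross-sensitivity set.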

Components

A first sketch of the system would include the following components.

  • Parameterized NuPIC implementation - single layer SP+TP with classifier
  • Process for determining training & test parameters that will remain fixed for all runs
  • Signal generator - generate(waveform, frequency, sampling frequency, amplitude, linear drift, randomness)
  • Test Controller - Test(static params, independent variable, range, step) and TestCross(static params, ind1, range, step, ind2, range, step).
  • Visualization - Take 2D (basic sensitivity) and 3D (cross sensitivity) result sets and create graphs. We want to be able to browse visualizations quickly and tag some as favorites for presentation and discussion or followup. Use whatever is easiest: D3, Tableau, Wolfram Cloud, ???.
  • Batch Controller - Testing multiple independent variables will likely be compute-intensive. The default plan is to spin up a new AWS instance for each separate test, passing the test parameters and an S3 output bucket directory to each instance, then waiting until timeout or completion before killing any instances that have not terminated themselves. Or is there better pre-built infrastructure for this?
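A minimal sketch of the signal generator component, assuming NumPy (the parameter names follow the generate() signature above; `n_samples` and `seed` are additions for reproducibility):

```python
import numpy as np

def generate(waveform, frequency, sampling_frequency, amplitude=1.0,
             linear_drift=0.0, randomness=0.0, n_samples=1000, seed=None):
    """Generate a synthetic test signal: base waveform, scaled, with
    optional linear drift and additive Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) / sampling_frequency
    phase = 2 * np.pi * frequency * t
    if waveform == "sine":
        base = np.sin(phase)
    elif waveform == "square":
        base = np.sign(np.sin(phase))
    elif waveform == "sawtooth":
        base = 2.0 * (frequency * t % 1.0) - 1.0
    else:
        raise ValueError("unknown waveform: %s" % waveform)
    signal = amplitude * base + linear_drift * t
    return signal + randomness * rng.standard_normal(n_samples)
```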

Notes

Jeff H: Re: Boosting

The phrase "with artificial data sets" is what made this relevant. I take it to mean we need to stay aware of encoding width relative to SP size.
