
## Clustering with the Knowm API

This article outlines a method for clustering spike patterns on kT-RAM using our AHaH instruction set. Clustering on a neuromorphic chip offers several advantages to a computing system. First, algorithms realized with AHaH computing will likely benefit from improvements in speed and power efficiency, making them well suited to large datasets and to iteratively adapting algorithms. Second, the low energy cost of clustering on kT-RAM lets us implement a learning procedure in energy-constrained environments such as onboard satellite systems, remote sensors, and large-scale data centers.

Like the other tutorials in this series, this article uses our chip emulator, the Knowm API, to introduce the method in code, which should help solidify understanding. For those interested, the full code base is available by signing up for the Knowm Developer Community.

## Spike Representation

The first step in clustering generic data sets on kT-RAM is to convert all values into a representation called a spike pattern.

$x \rightarrow \{ 1, 0, 0, 1, 1, 1, \ldots, 0, 0, 1 \}$

Spike conversion is a translation into a discrete representation. Clustering depends heavily on our choice of representation, so it is important to understand this translation: what information is lost, and what structure does the conversion impose? In the following examples, we explore the effect of encoding schemes on clustering.
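To make the encoding question concrete, here is a minimal sketch of one possible scheme, a thermometer code over evenly spaced thresholds. The class name and parameters are ours for illustration; this is not the Knowm API's encoder.

```java
/** Toy thermometer spike encoder -- illustrative only, not the Knowm API's encoder. */
public class ThermometerEncoder {

  /**
   * Encodes a value on [min, max] as a spike pattern of `bins` bits.
   * Bit i fires (1) if x meets or exceeds the i-th evenly spaced threshold.
   */
  public static int[] encode(double x, double min, double max, int bins) {
    int[] spikes = new int[bins];
    double step = (max - min) / bins;
    for (int i = 0; i < bins; i++) {
      spikes[i] = (x >= min + i * step) ? 1 : 0;
    }
    return spikes;
  }

  public static void main(String[] args) {
    // 0.5 on [0, 1] with 8 bins fires the five lowest thresholds
    System.out.println(java.util.Arrays.toString(encode(0.5, 0.0, 1.0, 8)));
  }
}
```

The data loss here is explicit: any two values that fall between the same pair of thresholds produce identical spike patterns, so the bin width sets the resolution the clusterer can ever see.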

## Spike Clustering

A kT-RAM clustering is performed by spiking a set of AHaH nodes, each accepting the same pattern, and reading their combined output as a cluster label.

$\text{Clusterer} = \{ A_1, A_2, A_3, \ldots, A_{N-1}, A_N \}$

While iterating through each input spike pattern from a training set or continuous stream, these AHaH nodes are adapted with the unsupervised Hebbian instruction pair FF-RU.

The FF-RU instruction pairing forces each AHaH node into a randomly determined attractor state that represents a decision boundary in the spike space.

We showed in our PLOS paper that these attractor states were maximum-margin decision boundaries and that they were formed between the Independent Components of our spike patterns.

*Figure: Maximum-margin decision boundaries achieved by AHaH attractor states on a four-state problem.*
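To give a feel for the attractor dynamic, here is a toy software stand-in for an AHaH node under FF-RU. The class name, constants, and update rule are our simplification for illustration; the real instruction pair operates on memristor pairs in kT-RAM, not floating-point weights.

```java
import java.util.Random;

/** Toy AHaH node -- an illustrative sketch of FF-RU dynamics, not the kT-RAM implementation. */
public class ToyAhahNode {
  private final double[] weights;
  private static final double ALPHA = 0.05;  // Hebbian step size (illustrative)
  private static final double BETA = 0.001;  // weight decay (illustrative)

  public ToyAhahNode(int numInputs, Random rng) {
    weights = new double[numInputs];
    for (int i = 0; i < numInputs; i++) {
      weights[i] = rng.nextGaussian() * 0.01; // small random init breaks symmetry
    }
  }

  /** FF: sum the weights of the active spikes (given as active indices). */
  public double feedForward(int[] activeSpikes) {
    double y = 0;
    for (int s : activeSpikes) y += weights[s];
    return y;
  }

  /** FF then RU: nudge each active weight toward the sign of the output, with decay. */
  public double ffRu(int[] activeSpikes) {
    double y = feedForward(activeSpikes);
    double dir = Math.signum(y);
    if (dir == 0) dir = 1; // break the dead-zero tie
    for (int s : activeSpikes) {
      weights[s] += ALPHA * dir - BETA * weights[s];
    }
    return y;
  }
}
```

Because the update reinforces whichever sign the output happens to take, repeated FF-RU cycles drive the node into one of two stable states, i.e. a randomly selected decision boundary, which is the attractor behavior described above.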

Once trained, spike patterns are clustered on kT-RAM by linking those patterns which activate the same set of AHaH nodes during a Feed-Forward (FF) operation. A unique cluster label can then be associated to each.

If we had N AHaH nodes on kT-RAM:

$kT\text{-}RAM = \{ A_1, A_2, A_3, \ldots, A_{N-1}, A_N \}$

Each AHaH node has created a linear decision boundary in its spike space. Thus, when a Feed-Forward operation is executed on a particular pattern, the signed output of these AHaH nodes identifies a cluster. For instance, the activation vector

$activations = \{ 0.0, -0.1, 0.2, \ldots, 0.1, 0.3 \}$

is thresholded with the Heaviside step function $H()$ to obtain a new spike pattern $P$:

$P = H(activations) = \{ 0, 0, 1, \ldots, 1, 1 \}$

$P$ can be read as a binary number to obtain a single integer cluster label.
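The thresholding and label-reading steps can be sketched as follows (a plain-Java illustration, not the Knowm API itself):

```java
/** Illustrative sketch: convert AHaH node activations into an integer cluster label. */
public class ClusterLabel {

  /** Heaviside step: 1 for positive activations, 0 otherwise. */
  public static int[] heaviside(double[] activations) {
    int[] p = new int[activations.length];
    for (int i = 0; i < activations.length; i++) {
      p[i] = activations[i] > 0 ? 1 : 0;
    }
    return p;
  }

  /** Read the spike pattern as a binary number: N nodes give up to 2^N labels. */
  public static int toLabel(int[] pattern) {
    int label = 0;
    for (int bit : pattern) {
      label = (label << 1) | bit;
    }
    return label;
  }

  public static void main(String[] args) {
    double[] activations = {0.0, -0.1, 0.2, 0.1, 0.3};
    int[] p = heaviside(activations); // {0, 0, 1, 1, 1}
    System.out.println(toLabel(p));   // prints 7 (binary 00111)
  }
}
```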


## Discussion

The number of clusters is not predefined in this clustering method. Instead, the total number of possible output labels from the AHaH collective is $2^N$, where $N$ is the number of AHaH nodes in the collective. This does not guarantee that the collective outputs $2^N$ unique labels, since some AHaH nodes are likely to pick up the same independent components and converge to the same decision boundary. This is more likely when the number of independent components in the data is small and/or the number of patterns, $F$, is high. However, as the number of AHaH nodes increases, the probability of such a collision drops exponentially.

We’ve found this method to be quite flexible. By decoupling the clustering mechanism from the encoding scheme, we separate the notion of “similarity” from the clustering method itself. A practitioner can focus on creating descriptive spike encodings before applying clustering on kT-RAM.

A KNN encoding method in conjunction with the above clustering method produces results similar to the k-means algorithm. Likewise, a density-based encoding can be used to create clusterings similar to a density-based method like DBSCAN.

*Figure: KNN Encoding – Clustering.*
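As a rough sketch of what a KNN-style encoding can look like (the class and details are ours for illustration, not the Knowm API's encoder), a sample's spike pattern can be taken to be the indices of its k nearest prototype points:

```java
import java.util.Arrays;

/** Toy k-nearest-prototype spike encoder -- illustrative, not the Knowm API's KNN encoder. */
public class KnnSpikeEncoder {
  private final double[][] prototypes; // reference points in the input space
  private final int k;

  public KnnSpikeEncoder(double[][] prototypes, int k) {
    this.prototypes = prototypes;
    this.k = k;
  }

  /** Spike pattern = indices of the k prototypes nearest to x. */
  public int[] encode(double[] x) {
    Integer[] idx = new Integer[prototypes.length];
    for (int i = 0; i < idx.length; i++) idx[i] = i;
    Arrays.sort(idx, (a, b) -> Double.compare(dist(x, prototypes[a]), dist(x, prototypes[b])));
    int[] spikes = new int[k];
    for (int i = 0; i < k; i++) spikes[i] = idx[i];
    Arrays.sort(spikes); // canonical order for the pattern
    return spikes;
  }

  /** Squared Euclidean distance (monotone in distance, so fine for ranking). */
  private static double dist(double[] a, double[] b) {
    double d = 0;
    for (int i = 0; i < a.length; i++) d += (a[i] - b[i]) * (a[i] - b[i]);
    return d;
  }
}
```

Nearby samples share nearby prototypes, so their spike patterns overlap; that overlap is the notion of similarity the clusterer then exploits, which is why this combination behaves like k-means.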

## The Partitioner

A simple implementation of the above method can be achieved with the Knowm API. Our implementation, contained in Partitioner.java, includes the bare-bones operations for creating and training the AHaH nodes we need for kT-RAM clustering. Once you have the code, you can use it to cluster spike patterns directly.
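Since the Partitioner itself lives in the Knowm code base, here is a self-contained toy version of the whole loop, illustrative only, with all names and constants ours rather than the real class's API:

```java
import java.util.Random;

/** Self-contained toy partitioner: N toy AHaH nodes trained with FF-RU, output read as a label. */
public class ToyPartitioner {
  private final double[][] weights; // weights[node][spike index]
  private static final double ALPHA = 0.05, BETA = 0.001; // illustrative constants

  public ToyPartitioner(int numNodes, int spikeSpaceSize) {
    Random rng = new Random(0);
    weights = new double[numNodes][spikeSpaceSize];
    for (double[] w : weights) {
      for (int i = 0; i < w.length; i++) w[i] = rng.nextGaussian() * 0.01;
    }
  }

  /** One FF-RU pass over a spike pattern (active spike indices); returns the cluster label. */
  public int update(int[] spikes) {
    int label = 0;
    for (double[] w : weights) {
      double y = 0;
      for (int s : spikes) y += w[s];                       // FF: read the node
      double dir = y > 0 ? 1 : -1;
      for (int s : spikes) w[s] += ALPHA * dir - BETA * w[s]; // RU: unsupervised Hebbian nudge
      label = (label << 1) | (y > 0 ? 1 : 0);               // threshold and pack into the label
    }
    return label;
  }
}
```

After a few passes over a stream of patterns, each node settles on a fixed sign for each region of the spike space, so patterns activating the same node set receive the same stable integer label.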

The class is available here: org.knowm.knowmj.module.encoder._spikes.

If you look at the class heading, you’ll notice that the Partitioner extends our generic encoder class. It just so happens that the spikes it emits represent our clusters.

One way of reading this output is as group labels in binary format. We could also feed the outputs of the Partitioner into another machine-learning module or into another Partitioner.
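Chaining works because the thresholded output is itself a spike pattern. A minimal sketch of the glue step (our names, for illustration): the active bit positions of one stage's output become the spike indices fed to the next stage.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative: read a partitioner's thresholded output as spikes for a downstream module. */
public class SpikeChaining {

  /** Active bit positions of a binary pattern become the spike indices of the next stage. */
  public static int[] toSpikeIndices(int[] binaryPattern) {
    List<Integer> active = new ArrayList<>();
    for (int i = 0; i < binaryPattern.length; i++) {
      if (binaryPattern[i] == 1) active.add(i);
    }
    int[] spikes = new int[active.size()];
    for (int i = 0; i < spikes.length; i++) spikes[i] = active.get(i);
    return spikes;
  }
}
```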

Our Partitioner is composed of a number of AHaH nodes. During clustering, each of these nodes accepts the same spike patterns and performs an FF-RU instruction. The Partitioner thresholds the node outputs and joins them together into a single label.

We’ve gone over kT-RAM clustering at a high level and introduced the Knowm API code. The following article uses this class and a k-nearest-neighbors encoder to demonstrate clustering on a 2D distribution.
