We introduce a technology stack, or specification, describing the multiple levels of abstraction and specialization needed to implement a neuromorphic processor (NPU) based on the previously described concept of AHaH Computing and to integrate it into today’s digital computing systems. The general-purpose NPU implementation described here is called Thermodynamic-RAM (kT-RAM) and is just one of many possible architectures, each with varying advantages and trade-offs. Bringing us closer to brain-like neural computation, kT-RAM will provide a general-purpose adaptive hardware resource to existing computing platforms, enabling fast and low-power machine learning capabilities that are currently hampered by the separation of memory and processing, a.k.a. the von Neumann bottleneck. Because a processor based on non-traditional principles can be difficult to understand, we present the various levels of the stack from the bottom up, layer by layer, which makes explaining kT-RAM a much easier task. The levels of the Thermodynamic-RAM technology stack include the memristor, synapse, AHaH node, kT-RAM, instruction set, sparse spike encoding, kT-RAM emulator, and SENSE server.
Machine learning applications span a very diverse landscape. Some areas include motor control, combinatorial search and optimization, clustering, prediction, anomaly detection, classification, regression, natural language processing, planning and inference. A common thread is that a system learns the patterns and structure of the data in its environment, builds a model, and uses that model to make predictions of subsequent events and take action. The models which emerge contain hundreds to trillions of continuously adaptive parameters. Human brains contain on the order of 10^15 adaptive synapses. How the adaptive weights are exactly implemented in an algorithm varies, and established methods include support vector machines, decision trees, artificial neural networks and deep learning, to name a few. Intuition tells us learning and modeling the environment is a valid approach in general, as the biological brain also appears to operate in this manner. The unfortunate limitation with the algorithmic approach, however, is that it runs on traditional digital hardware. In such a computer, calculations and memory updates must necessarily be performed in different physical locations, often separated by a significant distance. The power required to adapt parameters grows impractically large as the number of parameters increases, owing to the tremendous energy consumed shuttling digital bits back and forth. In a biological brain (and all of nature), the processor and memory are the same physical substrate, and many computations and memory adaptations are performed in parallel. Recent progress has been made with multi-core processors and specialized parallel processing hardware like GPGPUs and FPGAs, but for machine learning applications that intend to achieve the ultra-low power dissipation of biological nervous systems, it is a dead-end approach.
The low-power solution to machine learning occurs when the memory-processor distance goes to zero, and this can only be achieved through intrinsically adaptive hardware, such as memristors.
Given the success of recent advancements in machine learning algorithms combined with the hardware power dilemma, an immense pressure exists for the development of neuromorphic computer hardware. The Human Brain Project and the BRAIN Initiative, with funding of over EUR 1.19 billion and USD 3 billion respectively, partly aim to reverse engineer the brain in order to build brain-like hardware [5, 6]. DARPA’s recent SyNAPSE program funded two large American tech companies, IBM and HP, as well as research giant HRL Laboratories, and aimed to develop a new type of cognitive computer similar in form and function to a mammalian brain. The recent Nanotechnology-Inspired Grand Challenge for Future Computing in the United States was formed to “Create a new type of computer that can proactively interpret and learn from data, solve unfamiliar problems using what it has learned, and operate with the energy efficiency of the human brain.” CogniMem is commercializing a k-nearest neighbor application-specific integrated circuit (ASIC) for pattern classification, a common machine learning task found in diverse applications. Stanford’s Neurogrid, a computer board using mixed digital and analog computation to simulate a network, is yet another approach to neuromorphic hardware. Manchester University’s SpiNNaker is another hardware platform, utilizing parallel cores to simulate biologically realistic spiking neural networks. IBM’s neurosynaptic core and TrueNorth cognitive computing system resulted from the SyNAPSE program. All these platforms have yet to prove utility along the path towards mass adoption, and none have yet solved the foundational problem of memory-processor separation.
More rigorous theoretical frameworks are also being developed for the neuromorphic computing field. Recently, Traversa and Di Ventra have introduced the idea of ‘universal memcomputing machines’, a general-purpose computing machine that has the same computational power as a non-deterministic Universal Turing Machine, showing intrinsic parallelization and functional polymorphism. Their system and other similar proposals employ a relatively new electronic component, the memristor, whose instantaneous state is a function of its past states. In other words, it has memory, and like a biological synapse, it can be used as a subcomponent for computation while at the same time storing a unit of data. A previous study by Thomas et al. demonstrated that the memristor can be used to implement neuromorphic hardware better than traditional CMOS electronics.
Our attempt to develop neuromorphic hardware takes a unique approach inspired by life, and more generally, natural self-organization. We call the theoretical result of our efforts ‘AHaH Computing’ and have previously provided a thorough and rigorous quantitative description. Rather than trying to reverse engineer the brain or transfer existing machine learning algorithms to new hardware and blindly hope to end up with an elegant power-efficient chip, AHaH Computing was designed from the beginning with a few key constraints: (1) it must result in a hardware solution where memory and computation are combined, (2) it must enable most or all machine learning applications, (3) it must be simple enough to build chips with existing manufacturing technology and to emulate with existing computational platforms for verification of methods, and (4) it must be understandable and adoptable by application developers across all manufacturing sectors. This initial motivation led us to utilize physics to create a technological framework for a neuromorphic processor satisfying the above constraints.
In trying to understand how nature computes, we stumbled upon a fundamental structure found not only in the brain but almost everywhere one looks – a self-organizing energy-dissipating fractal. We find it in rivers, trees, lightning and fungus, but we also find it deep within us. The air that we breathe is coupled to our blood through thousands of bifurcating flow channels that form our lungs. Our brain is coupled to our blood through thousands of bifurcating flow channels that form our arteries and veins. The neurons in our brains are built of thousands of bifurcating flow channels that form our axons and dendrites. At all scales of organization we see the same fractal built from the same simple building block: a simple structure formed of competing energy dissipation pathways. We call this building block ‘nature’s transistor’, as it appears to represent a foundational adaptive building block from which higher-order self-organized structures are built, much like the transistor is a building block for modern computing.
When multiple conduction pathways compete to dissipate energy through an adaptive container, the container will adapt in a particular way that leads to the maximization of energy dissipation. We call this mechanism Anti-Hebbian and Hebbian (AHaH) plasticity. It is computationally universal, but perhaps more importantly and interestingly, it also leads to general-purpose solutions in machine learning. Because the AHaH rule describes a physical process, we can create efficient and dense analog AHaH synaptic circuits with memristive components. One version of these mixed-signal (digital and analog) circuits forms a generic adaptive computing resource we call Thermodynamic Random Access Memory or Thermodynamic-RAM, described herein. Thermodynamics is the branch of physics that describes the temporal evolution of matter as it flows from ordered to disordered states, and nature’s transistor is an energy-dissipation flow structure, hence ‘thermodynamic’.
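As an illustration of the differential-pair synapse concept, the following Python sketch encodes a signed weight as the conductance difference of two memristors and applies a simple Hebbian-style feedback update. The class name, learning rate, and update magnitudes are illustrative assumptions, not the circuit-level model described in our prior work.

```python
class DifferentialSynapse:
    """Toy sketch of a differential-pair memristive synapse (illustrative)."""

    def __init__(self, ga=0.5, gb=0.5):
        self.ga = ga  # conductance of memristor A (arbitrary units in [0, 1])
        self.gb = gb  # conductance of memristor B

    @property
    def weight(self):
        # The signed synaptic weight is encoded as a conductance difference.
        return self.ga - self.gb

    def read(self):
        # Evaluate phase: the sign of the weight is the synapse's "vote".
        return 1 if self.weight >= 0 else -1

    def update(self, feedback, lr=0.01):
        # Feedback phase: Hebbian reinforcement toward the supervised
        # (or self-generated) state; conductances stay bounded in [0, 1].
        if feedback > 0:
            self.ga = min(1.0, self.ga + lr)
            self.gb = max(0.0, self.gb - lr)
        else:
            self.ga = max(0.0, self.ga - lr)
            self.gb = min(1.0, self.gb + lr)
```

The essential point is that the weight is never stored as a digital number; it is the physical state of the memristor pair, read and adapted in place.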
In neural systems, the algorithm is specified by two things: the network topology and the plasticity of the interconnections or synapses. Any general-purpose neural processor must contend with the problem that hard-wired neural topology will restrict the available neural algorithms that can be run on the processor. It is also crucial that the NPU interface merge easily with modern methods of computing. A ‘Random Access Synapse’ structure satisfies these constraints.
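To make the ‘Random Access Synapse’ idea concrete, a minimal sketch follows: physical synapses live in a flat addressable array, like RAM cells, while the neural topology is just a software-held mapping from node identifiers to synapse addresses. All names here (`synapse_array`, `topology`, `activate`) are hypothetical, introduced only for illustration.

```python
# A flat, addressable pool of synaptic weights, one per address.
synapse_array = [0.0] * 1024

# Two logical neural nodes sharing the same physical array; each is
# defined purely by which addresses it owns, so topology is software-
# reconfigurable without rewiring hardware.
topology = {
    "node0": [3, 17, 42],
    "node1": [42, 100, 101],  # addresses may be shared across nodes
}

def activate(node_id, active_addresses):
    # Sum the weights of the addressed synapses that are both owned by
    # this node and currently active.
    owned = set(topology[node_id])
    return sum(synapse_array[a] for a in active_addresses if a in owned)
```

Because the topology lives in software while the weights live in hardware, the same physical substrate can host arbitrary network structures, which is what keeps the processor general-purpose.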
Thermodynamic-RAM is the first attempt at realizing a working neuromorphic processor implementing the theory of AHaH Computing. While several alternative designs, such as dual crossbars, are feasible and may offer specific advantages over others, this first design aims to be a general computing substrate geared towards reconfigurable network topologies and the entire spectrum of the machine learning application space. In the following sections, we break down the entire design specification into various levels, from ideal memristors to integrating the finished product into existing technology. Defining the individual levels of this ‘technology stack’ helps to introduce the technology step by step and to group the necessary pieces into tasks with focused objectives. This allows separate groups to specialize in one or more levels of the stack where their strengths and interests lie. Improvements at various levels can propagate throughout the whole technology ecosystem, from materials to markets, without any single participant having to bridge the whole stack. In a way, the technology stack is an industry specification.
Fig. 1. Our generalized memristor model captures both the memory and exponential diode characteristics via metastable switches (MSS) and parallel Schottky diodes and provides an excellent model for a wide range of memristive devices. Here we show a hysteresis plot for a Ag-chalcogenide device from Boise State University along with a fitted model.
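The metastable-switch (MSS) picture can be sketched in simulation: the device is a population of two-state switches whose voltage-dependent transition probabilities follow a logistic function, so the net conductance drifts with applied bias and retains its state, i.e. it has memory. All parameter names and values below are illustrative assumptions, not fitted device data, and the parallel Schottky diode term of the full model is omitted for brevity.

```python
import math
import random

class MSSMemristor:
    """Toy metastable-switch memristor (illustrative parameters only)."""

    def __init__(self, n=100, g_on=1e-3, g_off=1e-6,
                 v_on=0.27, v_off=0.27, beta=20.0, rng=None):
        self.n = n            # number of metastable switches
        self.n_on = 0         # switches currently in the high-conductance state
        self.g_on = g_on      # per-switch ON conductance (S)
        self.g_off = g_off    # per-switch OFF conductance (S)
        self.v_on = v_on      # forward switching threshold (V)
        self.v_off = v_off    # reverse switching threshold (V)
        self.beta = beta      # steepness of the logistic transition (1/V)
        self.rng = rng or random.Random(0)

    def conductance(self):
        # Total conductance is the population-weighted sum of both states.
        return self.n_on * self.g_on + (self.n - self.n_on) * self.g_off

    def step(self, v):
        # Per-time-step probability that an OFF switch turns ON under
        # forward bias, and that an ON switch turns OFF under reverse bias.
        p_on = 1.0 / (1.0 + math.exp(-self.beta * (v - self.v_on)))
        p_off = 1.0 / (1.0 + math.exp(-self.beta * (-v - self.v_off)))
        up = sum(self.rng.random() < p_on for _ in range(self.n - self.n_on))
        down = sum(self.rng.random() < p_off for _ in range(self.n_on))
        self.n_on += up - down
```

Driving such a device with a sinusoidal voltage and plotting current against voltage yields the pinched hysteresis loop characteristic of memristors, qualitatively like the fitted plot in Fig. 1.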
Fig. 2. A) A self-organizing energy-dissipating fractal, found throughout nature, is composed of a simple repeating structure formed of competing energy dissipation pathways. B) The simple bifurcating dissipative pathway is what we call nature’s transistor or synapse. C) A differential pair of memristors provides a means for implementing a synapse in our electronics.
Fig. 4. A spike-based system such as kT-RAM requires spike encoders (sensors), spike streams (wire bundles), spike channels (a wire), spike space (number of wires), spike patterns (active spike channels) and finally spikes (the state of being active). A spike encoding is, surprisingly, nothing more than a list of encoders that directly address synapses on a kT-RAM core.
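A sparse spike encoding can be sketched as follows: each encoder maps raw input into a small set of active integer spike channels, and the union of channels across encoders, each writing into its own region of the shared spike space, is the spike pattern that directly addresses synapses. The simple binning scheme below is a stand-in for illustration, not part of the kT-RAM specification.

```python
def bin_encoder(value, lo, hi, n_channels, offset=0):
    """Map a scalar in [lo, hi] to one active spike channel ID.

    The offset shifts this encoder's channels into its own region of the
    shared spike space, so multiple encoders never collide.
    """
    frac = (value - lo) / (hi - lo)
    channel = min(n_channels - 1, max(0, int(frac * n_channels)))
    return {offset + channel}

# Two encoders writing into disjoint regions of one spike space; the
# resulting spike pattern is just a set of integers (active channels).
pattern = (bin_encoder(0.25, 0.0, 1.0, 8, offset=0)
           | bin_encoder(7.0, 0.0, 10.0, 8, offset=8))
```

Note that the pattern is sparse: out of 16 channels in this toy spike space, only two are active, and those two integers are exactly the synapse addresses a core would activate.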
Fig. 5. A) Spikes (integers) in a spike pattern (integer set) are used to address synaptic elements in a core, which become selectively coupled to drive circuitry (AHaH Controller). B) During the read and write phases, the activated synapses (memristor pairs) are coupled to the triple H-tree electrodes A, B and y. C) By coupling several cores together, via either analog or digital methods, large collections of cores (kT-RAM) can be created and specialized for tasks such as high-dimensional inference (analog coupling) or compositional learning (digital coupling). D) kT-RAM can borrow from existing RAM architecture to integrate into existing digital computing platforms.
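The read-then-write cycle of a single core, as it might look inside an emulator, can be sketched as follows: a spike pattern selects synapses, the read phase sums their weights into an activation, and the write phase applies feedback to the selected synapses only. The class, method names, and scalar update rule are hypothetical simplifications for illustration, not the kT-RAM instruction set itself.

```python
class CoreEmulator:
    """Hedged sketch of one emulated core cycle (illustrative only)."""

    def __init__(self, size=256, lr=0.01):
        self.weights = [0.0] * size  # one weight per synapse address
        self.lr = lr                 # illustrative learning rate

    def read(self, spike_pattern):
        # Read phase: activation is the sum over the addressed synapses.
        return sum(self.weights[s] for s in spike_pattern)

    def write(self, spike_pattern, feedback):
        # Write phase: nudge only the addressed synapses toward feedback.
        for s in spike_pattern:
            self.weights[s] += self.lr * feedback

    def cycle(self, spike_pattern, supervised=None):
        y = self.read(spike_pattern)
        # Unsupervised operation reinforces the core's own decision
        # (the sign of y); supervised operation reinforces the label.
        f = supervised if supervised is not None else (1 if y >= 0 else -1)
        self.write(spike_pattern, f)
        return y
```

Because only the addressed synapses participate in each phase, the cost of a cycle scales with the sparsity of the spike pattern rather than with the size of the core.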