“What you can imagine depends on what you know.” –Daniel C. Dennett

Additional Educational Resources

Reddit Forum

I’ve read your PLOS Paper but I’m lost. Could you distill AHaH Computing into a sentence or two?

There exists a computational building block or primitive that is formed of ‘energy-dissipating pathways competing for conduction resources’. It manifests at all scales of Nature, from neurons to river basins. You can realize this building block efficiently in electronic systems with memristors. Unlike traditional computing, this building block combines or ‘mixes’ memory and processing. We call the building block an “AHaH Node”. Just as the transistor is the basis of many circuits, AHaH Nodes are the basis of many circuits. The paper shows how AHaH Nodes can be used as the basis of general-purpose computing and machine learning. Since AHaH Nodes mix memory and processing, they are very efficient for memory-processing intensive operations like learning.

Is Knowm technology available for license?

Yes. Please contact us to discuss your ideas or needs.

You can obviously call your project whatever you want but please don’t use the term ‘knowm’ to refer to the same phenomenon occurring in Nature. Please just use some descriptive term. I don’t think you guys deserve naming rights for this phenomenon.

We are truly sorry if how we have named things is upsetting. There are two issues. First, there is the name for the “thing in Nature”. Second, there is the name of our organization.

In regards to the ‘thing in Nature’, there are two problems with using a descriptive term. First, it is cumbersome in dialog. This leads to acronyms, which are also cumbersome and very confusing to newcomers. Second, when you are trying to build a technology/science around something like ‘Knowm’, descriptive terms are usually not quite right. We mean something specific with the word “Knowm”, and we believe ‘it’ needs a new name to distinguish ‘it’ from all the other terms people throw around which describe part, but not all, of what we are talking about. ‘It’ is not just a fractal, or something that resembles a tree or a neuron or X, Y or Z. A quick definition could perhaps be something like “the adaptive energy dissipation (living) structure found throughout Nature that is the result of vascularization”, but even this is not quite right. As we gain more knowledge of Knowm, we will ascribe more meaning to it. However, for this to happen, ‘it’ needs a name. Ideally a name with little to no scientific baggage.

We chose the word “Knowm” for the following reasons: (1) It’s short, new and easy to remember, with no pre-existing scientific associations. (2) It denotes ‘a thing’. Since we believe that Knowms are alive (yes, even rivers and lightning), it makes sense to talk about it like that. (3) The word rhymes with Ohm, which is the unit of resistance. Knowm is a flow-structure, and our work deals with utilizing memristors to mimic Knowm’s building block, something we call Knowm’s Synapse or Nature’s Transistor. (4) ‘Know’ denotes knowledge. Our work primarily deals with building adaptive learning machines, so this fits. We also believe there is an intrinsic physical intelligence in Knowms, as they actively search to find energy-dissipating pathways. We believe knowledge is the product of intelligence and related to this process. (5) Knowm sounds like the friendly fairy-tale characters called ‘Gnomes’, which are the protectors of Nature that live in trees. (6) It rhymes with the mantra ‘Om’, which is said to be “the sound of the universe” and has a significant spiritual meaning to many people. We have also found that many ascribe spiritual significance to Knowms. Since we see Knowms everywhere in Nature, to the point where it appears as if Nature may itself be built of them, we feel the name honors this spiritual or ‘holistic’ sense. To be clear, we are not in search of or proponents of spirituality. We are scientists and engineers in search of knowledge and technology who have a deep respect for both Nature and people.

As for naming our company after it, we felt that it made sense. We coined the name ‘Knowm’ long before we formed the company. Our affiliated company “KnowmTech” was formed in 2001, for example. Since our mission is to facilitate a greater understanding of Knowm (or whatever you want to call it!) and its applications to technology, particularly in neuromemristive processors, we feel the name is appropriate and we do not mean any disrespect.

Most of us at Knowm Inc agree with Nobel laureate physicist Richard Feynman: “I learned very early the difference between knowing the name of something and knowing something.” That is, a name has no real meaning in itself and ultimately does not matter much to understanding. It’s the meaning we ascribe to the name that matters. What really matters to us is how things work, not what things are called. We would very much like “it” (Knowm) to have a unique name so that we can ascribe meaning to it because “it”, in our opinion, is its own thing. Until ‘it’ has a unique name, it’s going to be hard to study ‘it’ because folks will not understand specifically what we are talking about. Hence we made up a name so we could get to work.

KT-RAM is in essence a memory storage system utilizing memristors to vary the signal response from a spike encoder. In what ways is KT-RAM different than normal RAM? More specifically, what makes KT-RAM so special?

It’s an analog synaptic processor. It reduces synaptic integration and adaptation to analog operations on memristors, thus saving the considerable amount of energy required to shuttle multiple bits back and forth between memory and processing. Each “bit” in kT-RAM is a multi-bit analog synaptic weight thanks to the differential pair of memristors.

For any spike pattern, why are both forward and backward instructions necessary? (Over-saturation? Does this mean excess voltage is left in the network?)

It has to do with saturation of the differential memristors. The synapse is encoded as the difference in conductance between the two memristors that form the pair: Gs=Ga-Gb. If you only ever apply a positive bias, both memristors will saturate and your state is lost. Same thing if you only apply negative voltage. So pairing the instructions keeps things working. You could also utilize natural decay if the memristors are ‘volatile’. That is, you could drive the conductance higher and then wait while their conductance comes back down or normalizes. One way or another, you have to prevent saturation in the differential pair.
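The saturation argument can be sketched numerically. This is a toy model, not the actual device physics: the normalized conductance bounds, the fixed step per pulse and the starting values are all assumptions made for illustration, and the pair is treated as if every pulse moves both memristors in the same direction.

```python
G_MIN, G_MAX = 0.0, 1.0  # normalized conductance bounds (assumed)
STEP = 0.05              # conductance change per pulse (assumed)

def pulse(pair, direction):
    """Apply one pulse to both memristors: +1 is forward, -1 is reverse."""
    return tuple(min(G_MAX, max(G_MIN, g + direction * STEP)) for g in pair)

def weight(pair):
    """Synaptic weight as the conductance difference, Gs = Ga - Gb."""
    ga, gb = pair
    return ga - gb

start = (0.6, 0.4)  # hypothetical initial state, Gs = 0.2

# Forward pulses only: both memristors climb until they hit G_MAX.
forward_only = start
for _ in range(20):
    forward_only = pulse(forward_only, +1)

# Forward and reverse pulses paired: the pair stays inside its range.
paired = start
for _ in range(20):
    paired = pulse(paired, +1)
    paired = pulse(paired, -1)

print("forward only:", weight(forward_only))
print("paired:      ", round(weight(paired), 6))
```

In this sketch the forward-only run ends with both conductances pinned at the upper bound, so the difference collapses to zero, while the paired run keeps the original difference.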

For a set of classifiers, the output seems to be a confidence level of 0 to 100%. Can the output provide multiple classifiers with each having a confidence level? (i.e. blue-green color or a mixed breed of dog.)

Absolutely. Given some spike stream (coming from feature learners), you can spin up an AHaH node (equal in size to the spike stream space) for each label. You can do this serially or in parallel, depending on the size and quantity of the cores.
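As a minimal sketch of the "one AHaH node per label" idea, the snippet below evaluates several hypothetical nodes against the same spike pattern and keeps every label whose node responds positively. The weights, label names and the zero threshold are made up for illustration; this is not the KnowmAPI.

```python
def activate(weights, spikes):
    """Node output: sum of the weights on the lines that spiked."""
    return sum(weights[i] for i in spikes)

# One hypothetical node per label, each spanning the same 4-line spike space.
nodes = {
    "blue":  [0.9, -0.2, 0.1, -0.5],
    "green": [0.4, 0.3, -0.6, 0.2],
    "red":   [-0.8, -0.1, 0.2, -0.3],
}

spikes = {0, 1}  # indices of the active input lines

# Evaluated serially here; separate cores could do this in parallel.
scores = {label: activate(w, spikes) for label, w in nodes.items()}
labels = sorted(label for label, score in scores.items() if score > 0)

print(labels)
```

Because each label has its own node, more than one label can come back for the same input, as in the blue-green example from the question.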

I’ve been led to believe the process is read-write only. Does every memory retrieval cause a change in memristor conductivity? I thought a small enough voltage wouldn’t alter the resistance.

A small enough voltage will not alter resistance, depending on the physics of the specific memristors. The low-power solution to adaptive learning involves understanding how to build a system where the parts break, because if your voltage is very low (and hence you are consuming low power) and you want to adapt at the same voltage, then your synapses will become volatile because the barrier potential between states will be of the same order as random thermal energy. If you can repair this constant damage, you get the low-power adaptive learning solution. Like your brain right now, which is basically a big hunk of volatile pudding. Our current memristor technology does provide for a non-destructive read, but our methodology (AHaH Computing) solves for the more general case and gives us a scaling path to much higher levels of adaptive efficiency. (Nobody appears to understand this, BTW.) So we could set the core voltage below the forward adaptation voltage of our BSAFW memristors, say 0.1 V, and execute FF instructions to read without having to worry.

How computationally intensive is the emulator? I imagine for small AHaH nodes, any decent pc would be okay. Is there any benefit to using a large multicore super computer?

We have ‘interchangeable cores’. One core is for detailed memristor simulations (MEMRISTOR). The others are for efficient deployments on applications (NIBBLE, BYTE, FLOAT). Each module (like the classifier or feature learners) maps to the kT-RAM instruction set. So we can develop on the BYTE core because it’s fast, then check that it will work with our memristor (swap in the MEMRISTOR core), then deploy applications on NIBBLE or BYTE. Our current BYTE and NIBBLE emulator is very efficient, and comparable to (and in some ways surpasses) the efficiency of existing machine learning methods operating on existing digital computing platforms. Lots of caveats here, of course. Until we have kT-RAM we are under the same constraints as everybody else. So we have developed a path whereby we can commercialize on existing digital platforms. Yes, a large multicore supercomputer would help. That’s why we are developing the SENSE Server. It’s a cloud-scalable compute resource with hooks to kT-RAM emulators optimized for FPGAs, GPUs, multi-core CPUs, etc. We plan on turning anything that we can into a kT-RAM emulator.

What is the relationship between KT-RAM and neural networks?

Neural Networks (the algorithms) are collections of linear neurons with non-linear activation functions that take real-valued inputs and multiply by real-valued weights. kT-RAM is a generic synapse resource that takes spike inputs (x=0 or x=1), and multiplies by real-valued weights to produce a real-valued output. (The spike-code is a hardware constraint, but AHaH nodes can in principle work with non-spike inputs as well.)
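To make the contrast concrete, here is a small sketch (plain Python, hypothetical weights, no activation function) showing that with spike inputs the multiply-accumulate of a conventional neuron reduces to summing the weights of the active lines.

```python
def dense_output(weights, x):
    """Conventional weighted sum over real-valued inputs."""
    return sum(w * xi for w, xi in zip(weights, x))

def spike_output(weights, active):
    """Spike-coded weighted sum: only the active lines contribute."""
    return sum(weights[i] for i in active)

weights = [0.5, -0.25, 0.75, -1.0]  # hypothetical synaptic weights
x = [1, 0, 1, 0]                    # spike pattern written as a 0/1 vector
active = [i for i, xi in enumerate(x) if xi == 1]  # same pattern as indices

print(dense_output(weights, x), spike_output(weights, active))
```

Both calls return the same value; the spike form never performs a multiplication, it only needs to know which lines are active.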

I understand how spike encoding works with 1 as a positive pulse and 0 as a negative pulse…

That’s not how spike encoding works! There is no such thing as “0 as a negative pulse”. That’s a binary code. An input line carries either a spike or no spike (z). z means it is electrically floating, which cannot transmit a state. Note that AHaH nodes can work with binary codes, as well as continuous inputs. Spike codes are nice for a number of reasons, including that they map to address spaces perfectly and (so far as we know) are universal and can achieve the same results as other encodings.

I can understand how the universal logic can be created by NAND and XOR gates. I don’t know what memristors would do differently though.

The same AHaH circuit can come to represent any logic gate, except the XORs since they are non-linear and an AHaH node is a linear discriminator. However, you can combine multiple other logic gates together to achieve XOR and the same is true for AHaH attractor states. So while you would have had to hard-wire a logic gate in circuitry to be a NAND or XOR gate (or whatever else), you can make a memristor circuit that can become whatever gate you want via learning or programming. Unsupervised AHaH attractor states are universal logic functions and can thus be used as the basis of a computing fabric. If the logic function must be learned or later programmed, or if the input space is large (for example pattern recognition or inference), then AHaH nodes are a good option. If the logic function is set and never needs to be changed, then there is no reason for them; just use a dedicated circuit. The key features of AHaH nodes are reconfigurability and learning.

How can you create logic statements with AHaH Nodes? How does logic work?

Please see the “AHaH Attractors Support Universal Algorithms” section of the AHaH Computing paper. The procedure is:

  1. Take some input state, let’s call it “zero” or “false” or “state 1”, and assign it a spike pattern. This could be whatever you want. Do the same for another input state that you call “one” or “true” or “state 2”. The example used in the PLOS paper is true–>[1,z], false–>[z,1]. Note we could have other options, like true–>[1,z], false–>[1,1], or true–>[1,1,z], false–>[z,1,1], etc.
  2. Combine the resulting spike patterns so they can be processed by an AHaH node. In our first example, we would need four inputs since each logic input requires two lines. Giving the AHaH node the logic input “true-false” on logic input lines 1 and 2, respectively, would yield the spike pattern [1,z,z,1].
  3. Measure the output state (voltage) of the AHaH node. If it is a positive voltage, call that “true”. If it is a negative voltage, call that “false”.

The AHaH Node is now a logic gate. It takes as input two logic states and returns a logic state. To find out what logic gate it is, you measure the output of the AHaH node for different input patterns. You can build up a truth table that shows the output for each input pattern. You will find that it obeys one of the 16 possible logic functions (for a binary two-input, one-output gate). You could also have a non-binary logic system, and construct spike encodings for those states, make a truth table, and do everything the same. The attractor points of unsupervised AHaH plasticity are logic functions.
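The truth-table procedure can be tried out in a few lines. The sketch below models the node as a plain sum of the weights on the spiking lines, which is an idealization; the weight values and the small weight grid are hypothetical, and a floating line (z) is written as 0.

```python
from itertools import product

# true -> [1, z], false -> [z, 1]; 0 stands in for a floating line (z).
ENCODING = {True: (1, 0), False: (0, 1)}
INPUTS = list(product([True, False], repeat=2))  # (T,T), (T,F), (F,T), (F,F)

def spike_pattern(a, b):
    """Concatenate the two-line codes of both logic inputs."""
    return ENCODING[a] + ENCODING[b]

def node_output(weights, pattern):
    """Sum the weights of the lines that carry a spike."""
    return sum(w for w, s in zip(weights, pattern) if s)

def truth_table(weights):
    """Read the node as a gate: positive output is true, otherwise false."""
    return tuple(node_output(weights, spike_pattern(a, b)) > 0 for a, b in INPUTS)

# Hypothetical weights, chosen by hand for illustration.
and_like = truth_table([0.5, -1.0, 0.5, -1.0])

# Sweep a small grid of weights and collect every gate that shows up.
reachable = {truth_table(w) for w in product([-2, -1, 1, 2], repeat=4)}
xor = (False, True, True, False)

print(and_like, len(reachable), xor in reachable)
```

With these hand-picked weights the table is that of an AND gate. The grid sweep turns up 14 of the 16 two-input functions; XOR and its complement are the two that a single node cannot reach under this encoding, because the four outputs would have to satisfy contradictory inequalities.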

How exactly do you spike a node? Specifically how do you target an individual memristor instead of causing a current through the whole circuit?

kT-RAM has a RAM (SRAM) interface. Each synapse (memristor pair) is coupled to the driver and read-out electrodes via pass-gates. These pass gates are controlled via the state of the RAM bits. Once the synapse (or synapses) is/are coupled to the driver electrodes, the AHaH Controller can read out the state and apply feedback. All the other cells remain decoupled.

Is it correct to state that kT-RAM is limited to linear regression? Since the calculations are being made by a passive linear network of conductances, this seems to me to be the case. Or am I missing something here?

kT-RAM has a spike interface, so depending on what you mean by linear regression the answer is either yes or no. kT-RAM is an AHaH Node substrate (with a spike interface) and an AHaH Node is similar to a linear neuron with a sigmoid/tanh activation function (but not exactly). You can threshold the activation voltage of an AHaH Node, compare it to other node activation voltages (i.e. sort) or digitize it (which is expensive). You can do many things with AHaH Nodes, including non-linear classification, if you string them together in multiple stages. This is no different than standard neural networks. Read the PLOS Paper for a list of examples. An AHaH Node is a memory-processing primitive, and kT-RAM is a computing substrate. It is not an algorithm. Via the KnowmAPI we have been learning, and continue to learn, new ways to use it.

Power scaling in kT-RAM is worse than linear in the number of AHaH nodes and must be restricted to small configurations. How can this scale?

It really depends on what you mean by “large” and what you mean by “scale”. It is likely you are attempting to see kT-RAM as a whole solution rather than as a part of a solution. A 512 x 512 core could be used to emulate neurons with up to 262,144 synapses, which is about at the max end of biology if you look at Purkinje cells. However, that same core could be used to emulate 20 neurons with 13,107 synapses or 100 neurons with 2,621 synapses. You could have neurons with various sizes, or you could go all the way down to the individual synapse. This flexibility has a trade-off in energy due to capacitive losses in the H-Tree. Simple optimizations for larger cores such as chokes help to reduce this capacitive loss, but this is an engineering problem just like everything else. The purpose of kT-RAM is to provide for a more flexible adaptive synaptic resource at the hardware level, embedded into larger architectures of various types, just like SRAM or various types of logic blocks are embedded into chip designs of various types. As for scaling, there is usually a trade-off in terms of flexibility and power when it comes to computing architectures. The modern CPU is a great example. It is a “jack of all trades and a master of none”, and it has come to dominate the computing world. It is important to keep in mind two things. First, kT-RAM is an “adaptive synaptic resource” intended to be used as a co-processor within a variety of large-scale architectures, like a mesh-grid of CPU cores. Second, kT-RAM is just one possible implementation of AHaH nodes! Crossbars, for example, are another. Each has its own advantages and disadvantages, and depending on what you want to achieve, you should go with the best solution. Remember, it’s not possible to beat physics. But it is possible to clearly define what you want and then explore the space of possibilities that give you the best solution. For an interesting and entertaining read on some of this, see the Gordon-Panthana dialog.
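The partitioning figures quoted above follow from simple integer division of the core's synapse count. A quick check, assuming equal-sized neurons and ignoring any overhead:

```python
ROWS, COLS = 512, 512
total_synapses = ROWS * COLS  # synapses available in one core

def synapses_per_neuron(neurons):
    """Whole synapses each neuron gets if the core is split evenly."""
    return total_synapses // neurons

for neurons in (1, 20, 100):
    print(neurons, "neuron(s):", synapses_per_neuron(neurons), "synapses each")
```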

There’s no evidence that your single learning law combined with limited precision synapses and “spiking” neurons could get you anywhere close to state-of-the-art performance on benchmarks like Imagenet.

To correct the false assumption, a chip capable of AHaH plasticity is not necessarily a chip constrained to learning in only one way. We have only come to this understanding through our work with AHaH Computing. Learning algorithms become specific instruction set sequences or routines, where each operation results in Anti-Hebbian or Hebbian learning. At the lowest level, Anti-Hebbian just means “move the synapse toward zero” and Hebbian means “move it away from zero”. People have come to the misconception that we are only working with a ‘local’ or ‘fixed’ learning rule. On the contrary, we have defined an instruction set from which the local unsupervised rule (FF-RU for example) is just one possibility, which does not preclude global computations. We hold ML in very high regard, and it is the work of folks like Yann LeCun (and the others) that is pushing the boundaries, using the tools available to them. They are the current undisputed champions in primary performance benchmarking. We think the field of neuromorphics should re-align with primary performance benchmarking like ML has. As far as evidence that spikes and low-resolution synapses can work, it does raise the question of how our brains do it, seeing that they are (ultra efficient) spike-based networks with low-resolution synapses. From a hardware perspective, spikes and limited precision synapses make the most sense. From the algorithmic perspective, work like this, which is motivated by attempts to map algorithms to more efficient hardware, demonstrates that limited precision synapses can work better than full precision. Our results with the KnowmAPI support this. Our goal is to achieve ‘primary performance parity’ with state-of-the-art machine learning. We do not want to re-invent the field of ML; we want to achieve what it has already achieved in a much more efficient learning substrate.
The massive strides being made in machine vision (and almost everything else) are wonderful, and we are watching closely what algorithms and approaches are working best. We will take what we can and port it over to AHaH Computing, and we will invent new approaches if needed. Rather than ignore the adaptive power problem, like others do, we are working on a solution.
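The lowest-level meaning of the two operations (“move the synapse toward zero” versus “move it away from zero”) can be written down directly. This is only a sketch of that one sentence: the fixed step size, the function names and the handling of a weight that is exactly zero are assumptions, not the kT-RAM instruction set.

```python
def hebbian(w, step):
    """Move the weight away from zero (a zero weight is pushed positive here)."""
    return w + step if w >= 0 else w - step

def anti_hebbian(w, step):
    """Move the weight toward zero without overshooting it."""
    if abs(w) <= step:
        return 0.0
    return w - step if w > 0 else w + step

w = 0.5
print(hebbian(w, 0.25), anti_hebbian(w, 0.25))
```

A learning routine, in this picture, is a sequence of such operations chosen per synapse and per step, which is why one substrate is not tied to a single learning rule.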

How does kT-RAM compare to IBM’s SyNAPSE chip?

They are not actually the same thing. True-North is a mesh of programmable digital cores that pass spikes around. kT-RAM is a specification for an adaptive synaptic resource intended to be used as a co-processor within a variety of large-scale architectures. For example, the SRAM inside the True-North core could be replaced with a kT-RAM core (or cores), and the result would be on-chip learning, more synapses, lower power and more flexibility at the core level, constrained to the specific large-scale topology of True-North (grid of cores). IBM would have to ditch most of their software and methodologies, so it’s arguable whether it would be worth it instead of just building new large-scale architectures from scratch.

You talk about merging memory and processing, and that Nature does not do this. What about the summation/integration of the synaptically-scaled input spikes in a neuron? Doesn’t that have to be done in the dendrites and soma? Isn’t that a case of biological processing that’s spatially separated from memory?

Not really. That is a mathematical idealization of a neuron. It’s actually more like that in kT-RAM than it is in a real neuron. Although I would say that this separation you are referring to is not like the separation in a digital computer. Also, you left out learning in this idealization. Just zoom into a (real) dendrite a little, or a (real) neuron, or anything (real). At some point you will see that the idealization of memory and processing being separate does not hold. To describe anything you will need both state information (physical objects or ‘memory’) and transformations (laws of physics acting on those objects). A neuron does not separate memory and processing and shuttle bits back and forth. It is a merging of memory and processing. A synapse is not memory and it is not processing; it is a merging of the two. A soma is not memory and it is not processing. It’s a merging of the two. And so on.

The pre-processing step shown for the MNIST problem in the AHaH Computing Paper is projecting onto a random frame. That preserves information, but the filters in that projection are not going to be attractors. Thus I would think that reading them during the processing of successive inputs would cause them to drift, upsetting the learning of the classifier.

There are a few answers to this, ranging from specific to general. First, take a look at an image from some of our early work:

Figure: AHaH Attractors. Unsupervised AHaH attractors, shown as decision boundaries alongside a clustered data distribution. Random initialization left, converged attractor right.

  1. The attractor states of unsupervised AHaH plasticity are a function of the structure of the data. If the data is random and uniform, then the decision boundary will drift. If the data has a bi-modal projection, this will result in an (unsupervised AHaH) attractor that maximizes the margin between opposing distributions. So it will start random but get trapped in whatever attractor is nearby. These attractors are related to ICA. Perhaps read the “clustering” section of the PLOS ONE paper.
  2. AHaH Computing strives to have no such thing as a non-destructive read for the reason that eliminating this requirement enables very low thresholds of adaptation, which in turn enables very low operating voltages, which is required if we are going to get to biological-scale levels of efficiency. It’s clear that brains could also “drift” for the simple reason that they are alive. That is, the atoms in a brain are recycled over a few months and are composed of structures that are constantly repaired (the act of being alive). And yet your memories are stable and you appear to work fine. Why is that? The implication of course being that a solution to the repair problem could also be a solution to the “how is a brain built” problem, since repairing and building are related.
  3. The classifier can get upset if its input is shifting, so one has to be careful about this. One also has to be careful that the classifier itself (the nodes that make it up) is not affected by repeated FF-RF operation. Repeated application of the FF-RF operation coupled to deactivation of input lines (like dropout) results in better performance. See the cortical computing paper for this result.
  4. This question gets to the heart of what AHaH Computing is about, and why we went to great lengths to show that AHaH attractors can be used to do many things. In a situation where “the parts are constantly breaking” (all of living systems and brains, but also ‘nano electronics’), you need to find a way to get the parts to constantly ‘heal’. The only way I know how to do this is with attractors. In the case of unsupervised AHaH, the attractors are a reflection of the data structure–and it just so happens that they are logic functions, they maximize classification margin and extract independent components.
  5. Another key to this is sparse spike encoding. Sparsity has a strong effect on the stability of AHaH attractor states, and non-sparse spike streams can cause problems including likely occupation of the null state. This can be remedied through ‘bias’ synapses that receive the FF-RA instruction, at the expense of power and time. We believe sparse spike coding was a step forward in AHaH Computing developments.

I have heard AHaH Computing being compared to Quantum Computing. How?

This confusion stems from the following quote from Alex Nugent, Knowm CEO:

“Knowm’s AHaH computing approach combines the best of machine learning and quantum computing via memristors”

Note the “best of” modifier. Quantum computing is about exploiting physics to solve problems (what Alex believes is the good part), using a state of nature that requires conditions of isolation not even seen in deep space (the bad part). We are taking (what we believe is) the good part. The intention of Quantum Computing is great, and it looks amazing on paper. You take a property of Nature and find a way to exploit it to accelerate computations. Attempting to build quantum entangled states for computing leads to incredibly complex machines with little in the way of practical value (so far) compared to the investment of time and money (30 years and billions in research funding). In AHaH Computing we are trying to understand how natural self-organization works, and to exploit these mechanisms to accelerate solutions to some problems. We do not need, nor do we want, to rely on quantum superpositions. We want to deploy practical technology that can do what brains can do by building, rather than calculating, adaptive neural circuits.

How many bits of information, on average, do you anticipate storing in an AHaH synapse (memristor pair) given the “noise” induced by device variation plus adaptation and decay processes?

Great question. We would love you to order some memristors and test them yourself, and ideally publish the results. We will make sure Dr. Campbell gets the feedback to further optimize. The question we have been more focused on is similar: how many [bits of information] are required? We have built the Knowm API around interchangeable core types, which enable one to plug in different memristor models to see how they work on real-world benchmark problems. 4 bits is good for many things, but 8 bits is generally enough for everything. There is also the issue of how you measure resolution. In AHaH Computing we are primarily interested in the incremental response. Given an upper and lower conductance bound, how many pulses on average does it take to move the conductance from high to low and low to high? This is actually different than attempting to ‘set’ or ‘program’ the device in one step with an applied voltage or compliance current. As the voltage-time product (“voltage flux”) is lowered, the incremental response becomes more fine-grained. Another issue is time. A memristor’s conductance state may decay over time, which under some measurements would manifest as low resolution. However, keep in mind that a continuous learning system is also constantly repairing the state, or stated differently, performing constant error-correction. Based on the preliminary data we have seen coming from Dr. Campbell’s lab, we are guessing between 6 and 9 bits. However, this could change as we get more data or refine how it’s measured. In terms of device variation, the memristors are adaptive over a conductance range. So long as this range overlaps, then the differential pair will ‘auto-tune’ via learning. If a memristor-pair becomes non-adaptive, then the synapse is no longer capable of learning. Depending on the application, this may or may not matter. If the synapse was part of a classifier and the data statistics do not change, then it does not matter.
If the data statistics do change, then other synapses can adapt to compensate. Exploiting this adaptive behavior to repair around faulty synapses was some of the first work we did with the (unsupervised) AHaH rule so many years ago. See Reliable computing with unreliable components: Using separable environments to stabilize long-term information storage.
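One way to relate the “how many pulses” view of resolution to a bit count is to treat each pulse as one distinguishable increment, so that the resolution is the base-2 logarithm of the pulse count. That mapping is an assumption made here for illustration, not a measurement procedure from the source, and the pulse counts below are not device data.

```python
import math

def bits_of_resolution(pulses):
    """Bits implied by the number of pulses needed to traverse the range."""
    return math.log2(pulses)

for pulses in (16, 64, 256, 512):
    print(pulses, "pulses ->", bits_of_resolution(pulses), "bits")
```

Under this reading, 4 bits corresponds to 16 increments and 8 bits to 256, and the 6 to 9 bit guess corresponds to somewhere between 64 and 512 increments across the conductance range.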

Is there any method of saving and duplicating state with this technology other than retraining the network?

There are three methods. First, our memristors are programmable and can be set to defined resistance values. This does, however, complicate the driver circuitry in the AHaH Controller. Second, there is something called ‘current injection’ combined with the FF-RA instruction. This can also be used for A2D operations (converting analog signals into a spike code). Third, it’s fast to sync a neural state. Given an original kT-RAM and a new kT-RAM, spike patterns are generated and the original response is recorded. The same spike pattern is loaded onto the new kT-RAM and supervised learning (FF-RH or FF-RL instructions) syncs the states. This is fast because multiple synapses can be modified at the same time.

How does this technology handle power loss?

Memristors are non-volatile and hold their state. The SRAM interface is used only to temporarily couple memristors to driver electrodes, i.e. to “load spike patterns”.

What courses/subjects would you recommend for an undergrad with an interest in neuromorphic computing and AHaH?

AHaH Computing “crosses the technology stack”, so my advice is to learn as much as you can about the whole process of computing, from silicon wafers to real-world application development. I would then focus on one or two levels of the stack.

Physics: Classical, including Electricity & Magnetism and Thermodynamics. If you want to go into memristor device fabrication you will need Quantum and Physical Chemistry. In terms of Thermodynamics, be careful here. Be sure to check out some “fringe” stuff like Constructal Theory. The goal is to understand how life works as a physical process, and modern physics educations have very little, if anything, to say about it. They can, however, give you solid foundations.

Electronics: You could specialize here or just get a good overview. Introduction to electronics is a must. After that you should know the basics of VLSI and how chips are made. Circuit design, both analog and digital.

CS: Foundations in computing. You need to really understand how modern processors work, and how code gets turned into instructions that execute on processors. If you specialize here, do not forget about what is occurring at a physical level! Get really good at Linux, and understand how the operating system connects to low-level peripherals and co-processors.

Machine Learning: Overview of existing methods (decision trees/forests, SVM, Neural Networks, Bayesian Models, etc.) and domains (perception, planning, control). Unsupervised learning. Focus on methods that have been commercialized in the real world and try to understand why. If you specialize here, do not forget about what is occurring at a physical level in the computers! The machine learning community is notorious for simultaneously exploiting hardware accelerators while also deemphasizing their importance. They (for the most part) see hardware as something that should support their algorithms, not something that is intertwined with algorithms and physics.

Neuroscience/Computational Neuroscience: Be very careful here! Use (biological) neuroscience as inspiration, but try not to fall into the “let’s mimic” camp as a route to understanding learning algorithms. The brain is far too complex to focus on the minute details if your goal is useful technology. Many in the neuromorphic community have large blinders on in regards to real-world problem solving and the results are chips that are only useful in a narrow context. The ML community is better focused in this regard, but they in turn have a “hardware blindspot”.

Rhetoric & Debate: Learn how to communicate effectively with other people. Get good at recognizing logical fallacies. Technology is intertwined with money and egos. For whatever reason, folks tend to get really stubborn and angry around ideas that threaten the status quo. They will react with arguments that on the surface may seem reasonable but after inspection are illogical. While technology is the foundation, our economy is driven by the interactions of people and human communications.

Your “open source” license on your PLOS ONE code basically limits any use to exact replication and threatens legal action for anything else.

The PLOS ONE code is not the code we want people to develop with and is there for read-only methods verification. The PLOS ONE paper code used a functional (mathematical) model. The problem with the functional model is that there are ways to use it that do not map to hardware. The KnowmAPI is built on a modular kT-RAM emulator, allowing users to test various “core types” with different memristor models. That is, it’s a simulation of an actual circuit, with each core type offering different levels of resolution. The KnowmAPI is now available to individuals or organizations under an NDA as part of the KDC.

There have been dozens of attempts to produce “machine learning hardware” with FPGAs or ASICs, they always end up obsolete upon release thanks to Moore’s law.

Moore’s law does not resolve the memory-processing duality. It’s going to be us or somebody else, but the problem will be solved, because a tremendous amount of energy is currently wasted shuttling information back and forth and this will not be resolved except through fundamental changes in computing architectures. At this point in time, much larger performance (speed, efficiency) increases can be achieved through code optimization and changes in processor architecture than through smaller transistors. Innovations are now being driven by alternative architectures. We are not aware of a more efficient architecture for synaptic integration and learning than one that eliminates the memory-processing duality and enables low adaptation voltages.

The PLOS ONE article does a horrible job of explaining what sort of models you are aiming for!

AHaH nodes are building blocks. You use them to build models, and we have only just begun doing that. We are aiming for all sorts of models across perception, planning and control. We have shown decision trees, linear classifiers, fully-connected layers, combinatorial solvers, robotic arm actuators, reconfigurable logic and more. Our goal with the paper was not to show a model. It was to show a generic adaptive building block circuit that can be used to make many models.

Backprop or gradient descent is already known to be the best!

Backprop is a wonderfully useful algorithm and works well with current methods of computing. However, it is clear that backprop is not operating in the brain (at least in the way it is currently formulated mathematically) and hence there are clearly other methods available. When constraining solutions based on physics and circuits, so as to side-step the adaptive power problem, backprop becomes problematic. It is important to understand how intertwined machine learning is with the computational platforms that enable it. When the constraints of those platforms change, the optimal machine learning models change. Large GPU-trained backprop models today contain perhaps 100 million to 1 billion weights. The human cortex contains about 150,000 billion synapses and consumes about a billion times less energy and space. That said, we’ve seen a few studies showing how to attain backprop operating in interesting ways that may be easy to port to kT-RAM and memristors in general, for example here. Here is one of backpropagation’s inventors, Geoffrey Hinton, talking about backprop and STDP. Once we develop and perfect AHaH-based methods that achieve primary performance parity with existing methods like backprop, Knowm will dominate in secondary metrics such as power efficiency, speed, and cost. At that point, backprop as it currently exists will be obsolete. Until then, memristors are likely still going to be the most efficient path to implementing the backpropagation algorithm in hardware.

If AHaH is a local rule, how can it be used for anything useful?

AHaH is not just a ‘local rule’. kT-RAM instructions are paired, but they do not have to occur one right after the other. For example: FF-RH or FF-RL. In the first instruction (FF) you can read the device. You can then compare with other node activations and do whatever you want. You can then send an RH or RL or whatever command at a later time. If you want unsupervised learning you would send a compound instruction like FF-RU. By creating conditionals of instructions, the space of possible routines grows very large. This gets even bigger with “split-conditionals”: groups of synapses can be read together and sub-groups can get different instructions. When you pair kT-RAM instructions with programmable routines, the space of available learning algorithms is truly enormous.
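
As a rough illustration of the read-compare-then-adapt pattern described above, here is a minimal sketch in Python. Everything in it is hypothetical: `ToyNode`, its `ff`/`rh`/`rl` methods, the `step` size, and the `winner_take_all` routine are toy stand-ins invented for this example, not the KnowmAPI or a model of a real kT-RAM circuit. It assumes only that FF reads a node and that RH and RL push the active synapses in opposite directions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for one AHaH node (not the KnowmAPI)."""
    weights: dict = field(default_factory=dict)
    step: float = 0.1  # assumed size of one adaptation pulse

    def ff(self, spikes):
        """FF: read the node by summing the weights of the active synapses."""
        return sum(self.weights.get(s, 0.0) for s in spikes)

    def rh(self, spikes):
        """RH: assumed to nudge the active synapses toward a positive output."""
        for s in spikes:
            self.weights[s] = self.weights.get(s, 0.0) + self.step

    def rl(self, spikes):
        """RL: assumed to nudge the active synapses toward a negative output."""
        for s in spikes:
            self.weights[s] = self.weights.get(s, 0.0) - self.step


def winner_take_all(nodes, spikes):
    """Read every node first (FF), compare, then send RH or RL afterwards."""
    outputs = [node.ff(spikes) for node in nodes]   # first half of each pair
    winner = max(range(len(nodes)), key=outputs.__getitem__)
    for i, node in enumerate(nodes):                # second half, sent later
        if i == winner:
            node.rh(spikes)   # completes an FF-RH pair for the winner
        else:
            node.rl(spikes)   # completes an FF-RL pair for everyone else
    return winner


nodes = [ToyNode({"a": 0.2}), ToyNode({"a": 0.5}), ToyNode({"b": 0.9})]
winner = winner_take_all(nodes, ["a"])
```

The second instruction of each pair is chosen only after all the reads have been compared, which is the sense in which the routine is more than a purely local rule.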

Am I wrong in thinking that the von Neumann bottleneck is the exact problem that memristors would solve?

You are correct. For spreadsheets and email apps, that architecture is fine. But for processing the vast amounts of data needed for real-time machine learning apps, it’s a serious problem. CPUs are being designed with memory closer to the processor, they are making multi-core processors, and there are new chips like Adapteva’s Epiphany. All of these tricks reduce the distance between processor and memory. Nature, on the other hand, uses a different approach to computing. It’s a system where the processor is the memory. The distance is zero. With memristors, the same can be achieved. One way to do it would be our proposed Thermodynamic-RAM. We write about it in a paper, which has a historical background section to give some perspective.

Have you or your colleagues seen the Terminator movies?

Intelligence, like any tool, can be used for various purposes, both good and evil. We are well aware of the dangers and have put considerable thought into it. Many are thinking about it. A good recent book on the topic is Superintelligence. One common fear is that just one group or organization gains control of AI and uses it for its own selfish gain at the expense of others. To guard against that scenario we have published our work in an international open-access journal. We are committed to making the technology generally available so all can benefit, and we hold ourselves to high moral standards. That said, if you are fearful about the potential misuse of a technology, the best thing you can do is learn about it! Intelligent machines are inevitable. The benefits are enormous, and there is an international race afoot. The single most important thing to do right now is to learn, so you can become part of the solution or help spot problems and enact proper regulation.

Do you work in close contact with AI programmers aside engineers?

Our goal is to port successful machine learning techniques and algorithms to the AHaH computing paradigm. Currently there are only a few people who know enough about AHaH Computing to do this. We have first focused on acquiring basic machine learning building blocks an AI programmer needs. We are open-sourcing the KnowmAPI under the KDC, including tutorials, video lectures, etc. We also develop the SENSE Platform, a horizontally scalable ‘cloud computing’ platform (server network) with hooks to hardware accelerators like FPGAs, GPUs, Epiphanies, Automata Processors, and eventually kT-RAM. Most developers will not have to worry about the very low-level (hardware) or very high-level (network) details. Rather, they can focus on application and algorithm development.

Just how efficient do you expect AHaH to be, for say, 100 trillion synapses? Our brains do it at about 20 watts, supercomputers would have it at several hundred megawatts, or even a gigawatt, what’s kT-RAM expected to be at?

Great question! There is a lot to this topic, but let me try to give you a short and simple answer. We are using differential memristors as synapses. The energy dissipated in a synapse is the power dissipated over both memristors for the duration of the read/write pulses: P = IV = V^2 / R, where V is the voltage and R the resistance. A typical memristor on-state resistance is 500 kOhm and a typical voltage is 0.5 V, so P = 0.5^2 / 500E3 = 5E-7 watts. The energy is only dissipated during read and write events, which occur in pulses of ~50 ns (nanoseconds) or less. The energy per memristor per synaptic read or write event is then 5E-7 W x 50E-9 s = 2.5E-14 joules. Since the kT-RAM instruction set requires paired read-write instructions, and since a synapse is two memristors, we multiply that answer by four: 1E-13 joules. This is 0.1 picojoules per adaptive synaptic event. Note we could lower the voltage and pulse width to achieve even lower power for synaptic integration operations (i.e. no learning). Also note that capacitance plays a big role, so if your AHaH node is very large it will dissipate more energy as electrons get soaked up on the wires.

If we say a human brain has 100 billion neurons, each with 1000 synapses, that fire on average once per second, that is 1E14 adaptive synaptic events per second. At about 20 watts, the energy consumed in one second is 20 joules. So if we put all of that energy into synaptic events we get 20 / 1E14 = 2E-13 joules per event, the same order of magnitude as the kT-RAM figure.

The actual deployed power consumption of kT-RAM is dependent on what sort of computing architecture it’s embedded in. Very small cores are more efficient than larger cores for processing smaller spike streams. The purpose of the AHaH circuit is to remove the memory-processing bottleneck for adaptive learning operations. If you knew exactly the connection topology of a brain and made a custom ASIC of AHaH nodes, you would be looking at efficiencies comparable to and possibly even exceeding biology (eventually). The reason for this is that our modern methods of chip communication can be quite a bit more efficient (and faster) than biology in some respects. However, if you have a more generic architecture that enables you to explore more connection topologies, for example a mesh grid of little CPUs with local RAM and kT-RAM, you would expend more energy but get something more flexible: the ability to emulate any brain, not just one specific type.
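
The arithmetic in this answer can be reproduced in a few lines of Python. This is only a back-of-the-envelope sketch using the figures quoted above; the variable names are made up for the example, and it ignores wire capacitance and everything outside the synapse itself.

```python
# Figures quoted in the answer above.
voltage = 0.5            # volts across a memristor
resistance = 500e3       # ohms, typical on-state resistance
pulse_width = 50e-9      # seconds per read or write pulse

power = voltage ** 2 / resistance           # P = V^2 / R, in watts
energy_per_pulse = power * pulse_width      # joules per memristor per pulse
# Two memristors per synapse, two pulses (read + write) per paired instruction.
energy_per_event = 4 * energy_per_pulse

# Brain-side estimate, again with the numbers from the answer.
neurons = 100e9
synapses_per_neuron = 1000
firing_rate = 1.0                           # average spikes per second
brain_power = 20.0                          # watts
events_per_second = neurons * synapses_per_neuron * firing_rate
brain_energy_per_event = brain_power / events_per_second

print(f"kT-RAM synapse: {energy_per_event:.1e} J per adaptive event")
print(f"brain budget:   {brain_energy_per_event:.1e} J per adaptive event")
```

Changing `voltage` or `pulse_width` shows how strongly the per-event energy depends on the read/write pulse, since energy scales with the square of the voltage and linearly with the pulse duration.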

Would the sort of devices you’re working on be usable with existing off the shelf software? For instance, we already have “smart” programs like Google Now and Siri that “learn” about us and respond accordingly. Would the sort of physically adapting hardware you’re developing be able to improve the performance of existing adaptive software or would an entirely new software infrastructure have to be created?

Great question. As with any hardware that is integrated into an existing system, you need both a physical connection and some software “glue” between the components. Most people have heard of a driver and that’s one possible glue layer we are referring to. In our design of kT-RAM, we’ve developed the hardware interface and defined an instruction set. We describe it here in section C. Thermodynamic-RAM Instruction Set and Operation. While new ‘software stacks’ must be created, this does not mean we cannot easily transfer knowledge from one system to another. It is always possible to use the output of one learning system as a supervised input to another.

From your point of view, do you think there could be a sudden leap in computing technology that will render the current “top-of-the-line” consumer-level computing (high-end gaming graphics cards like the GTX980, new i7 processors, etc) obsolete in a very short period of time?

No. GPUs and CPUs are absolutely great at what they were designed to do. G stands for ‘Graphics’, after all! They will likely be around for a long time. There will be new types of hardware available that will dominate applications in machine learning and give the everyday consumer access to an entire new range of applications and abilities that are not possible right now. We recently proposed one such device that would plug into existing computer platforms and allow for many exciting new possibilities. It’s called Thermodynamic-RAM.

Books about Brains & Neuroscience

What is your opinion on the Human Brain Project? A large chunk of the science community has implied that working on a comprehensive simulation of the brain is currently premature.

Many researchers take the “let’s decode the brain!” approach and want to simulate the brain on massive computer clusters. While this will most certainly provide insights into how the brain works, they will eventually be faced with the reality that their simulations will not be able to compete on power, density and speed with other approaches that have addressed the issue directly. We’ve taken a different approach, which is to build a chip that doesn’t necessarily emulate a brain but instead provides adaptive-learning functions at a foundational ‘physical’ level and consequently beats other approaches on power and density. Our goal is to then match primary performance benchmarks.

What do you think about Action Selection?

There are various ways to go about it, and it’s a very significant problem with major consequences for effective solutions. We currently like prediction of future reward states. If you can predict the consequence of actions then you can search over possible options and select the best one. Think about how you can quickly envision or imagine far into the future. This is an interactive process over a collection of prior memories or experiences, i.e. it’s ‘reflective’. Other, simpler action-reward approaches can also be useful. If you hook a bunch of AHaH nodes to the muscles of a robotic arm, it’s not hard to get them to actuate it.

What do you think about Consciousness?

We really don’t know. It’s an important question, and we have theories about what attention is, but our main focus is on solving well-specified real-world problems.

How closely does your approach mimic biology?

We are not specifically focused on biology. Rather, we are interested in natural self-organization and how to harness it. We believe that biology has ‘hijacked’ or ‘co-opted’ a more fundamental process of energy attempting to dissipate itself through adaptive containers. Ultimately we are trying to solve problems, not build brains. Our primary concern is solving real-world problems, and although biology is a big source of inspiration, our technology is not going to look much like biology.

The Knowm fractal seems to be modeled after structure in the brain, but memristors don’t quite correspond to neurons as I understand them.

Don’t think about the Knowm fractal, think about what the Knowm fractal is built of, what we call “Knowm’s Synapse”. Knowm’s Synapse is a universal adaptive building block formed of two energy dissipation pathways competing for conduction resources. We can emulate Knowm’s Synapse with differential pairs of memristors.

kT-Synapse

Knowm’s Synapse can be emulated with two memristors.

What do you mean by kT-RAM being capable of ‘healing’?

AHaH Nodes are always adaptive. Unsupervised AHaH plasticity creates attractor states that repair state information as they are used. The act of using kT-RAM heals it. Some synapses could stop working, inputs could turn off or become chaotic, etc. Unsupervised AHaH plasticity will constantly tune the functioning synapses. Unsupervised AHaH plasticity is attempting to find a bimodal projection and maximize the margin between opposing distributions. So it will track the data structure. This also leads to unsupervised (or semi-supervised) learning. Once the node is in the attractor, via one or a few supervised examples, the AHaH node will continue to converge and performance will get better without additional training labels. See figure 17 in this paper or figure 4 in this paper. This is how we first discovered AHaH. We trained an SVM and then applied unsupervised AHaH plasticity and it got better because it was still converging on the test set.
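
To make the idea of an attractor that repairs state a little more concrete, here is a toy Python sketch. It assumes a simplified functional update of the form Δw = α·sign(y) − β·y applied to a node with a single active synapse. That rule, the parameters `alpha` and `beta`, and the `settle` helper are assumptions made for this illustration; the FAQ does not state the rule, and this is not a simulation of kT-RAM.

```python
def sign(value):
    """Return +1.0 for non-negative values and -1.0 otherwise."""
    return 1.0 if value >= 0 else -1.0


def settle(weight, alpha=0.1, beta=0.05, steps=200):
    """Repeatedly read a one-synapse node and apply the assumed update."""
    for _ in range(steps):
        y = weight                            # one active input, so y is the weight
        weight += alpha * sign(y) - beta * y  # assumed rule: push outward, decay large y
    return weight


healthy = settle(0.3)         # a positive state
damaged = settle(0.3 * 0.2)   # the same state after losing most of its magnitude
flipped = settle(-0.3)        # the opposite state
```

Under these assumptions the weight settles near +α/β or −α/β depending only on its starting sign, so a degraded but not sign-flipped state is pulled back to the same attractor simply by being used.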

Will the transition from simulated kT-RAM to actual kT-RAM do anything other than increase efficiency?

A star-gate will open up and confetti will fall from the sky. But seriously, physical kT-RAM is all about power and space efficiency. However, keep in mind that kT-RAM is a general-purpose AHaH processor. While you can create more efficient AHaH processors for specific things you give up generality and hence utility. We feel kT-RAM offers a good balance. However, improvements are inevitable!

How is your current machine learning approach on kT-RAM different from a traditional neural net?

kT-RAM is an “AHaH Node resource”. AHaH nodes can be used for things like feature learning, classification and combinatorial optimization. You can think of an AHaH Node as a ‘neuron’, you can make networks out of them, and you can solve the same sorts of problems that neural networks solve. But you can also do more basic stuff like logic, memory, random number generation and set iteration. Our current approach is all unsupervised feature learning combined with multi-label semi-supervised classification. AHaH Nodes can be used in many ways and we have just started trying various things out. We believe AHaH nodes are universal adaptive resources, and our success to date has given us confidence that a great number of effective and efficient solutions can be attained with them.

How is kT-RAM different from an FPGA?

kT-RAM is best understood as a learning co-processor. It excels at things like inference and classification. We are not aware of other computing substrates that provide access to higher power efficiency or synaptic density than kT-RAM. FPGAs are very useful generic hardware accelerators built with modern CMOS technology. Before physical production of kT-RAM, we are using FPGAs (and other hardware accelerators) as kT-RAM emulators in our development platforms.

Neurogrid can already do a million neurons & 6 billion synapses in 5 watts.

First let me say that we admire Dr. Boahen and what he has done! We are all working toward similar goals and face difficult constraints. Neurogrid is a fantastic demonstration of how dedicated hardware can improve efficiency, but it has problems from a practical perspective. A big problem with the neuromorphic community is that they go for mimicry instead of functionality. For example: “Neurogrid simulates one million neurons with two subcellular compartments each, a choice motivated by neurophysiological studies.” Note the choice was not motivated by practical utility or real-world problem solving. Consequently, it is hard to take this platform and solve real-world problems. It is a research platform for computational neuroscientists, not industry folks with real-world machine learning needs. Topologies are restricted to structured patterns motivated again by “neuroanatomical studies”, and synaptic adaptation is limited to STDP. So you have to take those numbers and put them into context. This is why we are aligning with primary performance benchmarking.

How does kT-RAM differ from Micron’s Automata Processor?

Micron’s Automata Processor is a parallel and scalable regular expression matcher with limited dynamic reconfigurability. kT-RAM is a learning co-processor. kT-RAM can do things like unsupervised feature learning, inference, classification, prediction and anomaly detection. As an example, if you wanted to search Wikipedia for all occurrences of some word or word pattern that you specify, the Automata Processor would be great. If you wanted to learn a representation of that word and how it relates to other words (its meaning), you would be better off using kT-RAM.

Why is Java the implementation for Knowm’s API?

A few reasons.

  1. Java is not the only language used. For example, we are working with RIT and they have implemented kT-RAM emulators (and spike encoders) in C, which we access via a JNI wrapper. We are also working with FPGA implementations. Java ties it all together.
  2. It ties in nicely with distributed clustering software like Storm, which we are using on the SENSE Server. Web apps too.
  3. Many developers know Java (for example, from Android).
  4. We do not anticipate Java is the only language we will use, and we welcome developer participation in porting to other languages. That is why we created the Knowm Development Community (KDC). If you write a C/C++ emulator, for example, then you will receive royalties every time it is deployed in an application. If C/C++ is better for certain tasks, it will eventually be used.

10 Comments

    • Simon Reichlin

      Hi dear Knowm Team,

      I’m a tech freak from switzerland and read an article about your work today. I guess your on the right path to develop human future. Please submit your PayPal account, because i really would like to support you and your work.

      Best Regards
      Simon Reichlin

      • Alex Nugent

        Simon–Thank you for your support! We have set up a donations page. Please let us know if you have any problems.

    • Noah Gundotra

      Hi Knowm,
      I’m a high school student really intrigued by your work, and I was wondering if you could set me on a path for learning more about AHaH computing, like books or videos.

      I’m currently enrolled in a Stanford online Machine Learning course, and I’d like to learn more about the implications of your tech.

      Thanks!
      Noah

      • Alex Nugent

        Noah, thanks for your interest! AHaH Computing is so new that there are no public textbooks! We currently have a series of online lectures for members of the KDC, which we are expanding. We offer bounties in place of the fee. Basically, if you do something to help the community, you can get in. If you are interested in Machine Learning and currently taking the Stanford Machine Learning course, I would say that is a good place to start! Our goal with AHaH Computing and kT-RAM is to achieve “primary performance parity” with state-of-the-art machine learning, which we have done already in some areas. The implications are extremely efficient learning processors and a path to biological-scale efficiency.


    • Rolf Kickuth

      Hi Alex, I am a scientific journalist in Germany, following AI et al. since more than 25 years. What still is not clear to me: Do the chips that you sell really work with memristive materials, or do they simulate memristors with standard cmos-technology? If they consist of memristive materials: How long is their operation time? I think as one is working more with ions instead of only electrons there are side effects which will lead to a malfunction after some time. If they do not consist: When will the memristive material be ready? HP estimates concerning its own materials about 2019… By the way your algorithm resembles me to Simulated Annealing ANNs. Is that right? Best regards Rolf

      • Alex Nugent

        Do the chips that you sell really work with memristive materials, or do they simulate memristors with standard cmos-technology?

        The chips are actual memristors with a memristive material. You can read the datasheet here.

        How long is their operation time?

        Only time will tell! What we can say is that they can be electrically cycled billions of times, they can hold resistive state and be cycled at high temps (>140C). Early devices I received from Dr. Campbell over two years ago still work just fine. The physical structure/operational theory of the devices give me confidence that they will age well. But of course, the only experiment to resolve that question is time itself. Dr. Campbell reports that there is “No reason to assume it wouldn’t work in ten years. Different devices that I’ve tested after thirteen years still work fine.”

        When will the memristive material be ready?

        Depends on your use case. I am far more interested in incremental synapses than bits, for example, and the required operational characteristics of the devices for each use case are different. If you want to know if the devices are ready for your particular application (memory, logic, learning, filters, oscillators, etc.), we offer packaged devices, research die, and data. We offer a BEOL service that works with CMOS fabs to add memristors to client designs, and we are currently working with clients.

        By the way your algorithm resembles me to Simulated Annealing ANNs. Is that right?

        Not sure what algorithm you are talking about, so I can’t say. AHaH Nodes are computing primitives. They can be bits, logic functions, classifiers, feature learners or even random number generators. kT-RAM is an AHaH node substrate. I have explored many ways of using AHaH nodes, and you could call each method “an algorithm”. I suspect that, like logic, there are an endless number of algorithms that could be built from AHaH nodes. That said, there are elements of what I have published that definitely resemble simulated annealing.

    • Rolf Kickuth

      Hello Alex, the Knowm memristor chips contain 8 memristors per chip. One advance of memristors shall be that they can be produced VERY small. Will there be memristor chips from Knowm in the near future which contain millions of memristors?

      • Alex Nugent

        Will there be memristor chips from Knowm in the near future which contain millions of memristors?

        Yup.
