“What you can imagine depends on what you know.” –Daniel C. Dennett
There exists a computational building block, or primitive, formed of ‘energy-dissipating pathways competing for conduction resources’. It manifests at all scales of Nature, from neurons to river basins, and it can be realized efficiently in electronic systems with memristors. Unlike traditional computing elements, this building block combines, or ‘mixes’, memory and processing. We call it an “AHaH Node”. Just as the transistor is the basic element of many circuits, AHaH Nodes are the basic elements of many adaptive circuits. Our AHaH Computing paper shows how AHaH Nodes can serve as the basis of general-purpose computing and machine learning. Because AHaH Nodes mix memory and processing, they are very efficient for memory-and-processing-intensive operations like learning.
We are truly sorry if how we have named things is upsetting. There are two issues. First, there is the name for the “thing in Nature”. Second, there is the name of our organization.
With regard to the ‘thing in Nature’, there are two problems with using a descriptive term. First, it is cumbersome in dialog. This leads to acronyms, which are also cumbersome and very confusing to newcomers. Second, when you are trying to build a technology and science around something like ‘Knowm’, descriptive terms are usually not quite right. We mean something specific by the word “Knowm”, and we believe ‘it’ needs a new name to distinguish ‘it’ from all the other terms people throw around that describe part, but not all, of what we are talking about. ‘It’ is not just a fractal, or something that resembles a tree or a neuron or X, Y or Z. A quick definition could perhaps be something like “the adaptive energy-dissipation (living) structure found throughout Nature that is the result of vascularization”, but even this is not quite right. As we gain more knowledge of Knowm, we will ascribe more meaning to it. However, for this to happen, ‘it’ needs a name, ideally one with little to no scientific baggage.
We chose the word “Knowm” for the following reasons:
(1) It's short, new, and easy to remember, with no pre-existing scientific associations.
(2) It denotes ‘a thing’. Since we believe that Knowms are alive (yes, even rivers and lightning), it makes sense to talk about them that way.
(3) The word rhymes with Ohm, the unit of resistance. Knowm is a flow-structure, and our work deals with using memristors to mimic Knowm's building block, something we call Knowm's Synapse or Nature's Transistor.
(4) ‘Know’ denotes knowledge. Our work primarily deals with building adaptive learning machines, so this fits. We also believe there is an intrinsic physical intelligence in Knowms, as they actively search to find energy-dissipating pathways. We believe knowledge is the product of intelligence and related to this process.
(5) Knowm sounds like the friendly fairy-tale characters called ‘Gnomes’, the protectors of Nature who live in trees.
(6) It rhymes with the mantra ‘Om’, which is said to be “the sound of the universe” and has significant spiritual meaning to many people. We have also found that many ascribe spiritual significance to Knowms. Since we see Knowms everywhere in Nature, to the point where it appears as if Nature may itself be built of them, we feel the name honors this spiritual or ‘holistic’ sense. To be clear, we are not in search of or proponents of spirituality. We are scientists and engineers in search of knowledge and technology who have a deep respect for both Nature and people.
As for naming our company after it, we felt that it made sense. We coined the name ‘Knowm’ long before we formed the company. Our affiliated company “KnowmTech” was formed in 2001, for example. Since our mission is to facilitate a greater understanding of Knowm (or whatever you want to call it!) and its applications to technology, particularly in neuromemristive processors, we feel the name is appropriate and we do not mean any disrespect.
Most of us at Knowm Inc agree with Nobel laureate physicist Richard Feynman: “I learned very early the difference between knowing the name of something and knowing something.” That is, a name has no real meaning in itself and ultimately does not matter much to understanding; it's the meaning we ascribe to the name that matters. What really matters to us is how things work, not what things are called. We would very much like “it” (Knowm) to have a unique name so that we can ascribe meaning to it, because “it”, in our opinion, is its own thing. Until ‘it’ has a unique name, it is going to be hard to study ‘it’, because folks will not understand specifically what we are talking about. Hence we made up a name so we could get to work.
It's an analog synaptic processor. It reduces synaptic integration and adaptation to analog operations on memristors, thus saving the considerable energy otherwise required to shuttle multiple bits back and forth between memory and processing. Each “bit” in kT-RAM is a multi-bit analog synaptic weight, thanks to its differential pair of memristors.
It has to do with saturation of the differential memristors. The synapse is encoded as the difference in conductance between the two memristors that form the pair: Gs = Ga - Gb. If you only ever apply a positive bias, both memristors will saturate and your state is lost. The same thing happens if you only apply negative voltage. Pairing the instructions keeps things working. You could also exploit natural decay if the memristors are ‘volatile’: drive the conductances higher and then wait while they come back down, or normalize. One way or another, you have to prevent saturation of the differential pair. A toy illustration follows.
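To make the failure mode concrete, here is a minimal toy model in Python. It is our simplification for illustration only (the bounds, step sizes, and update symmetry are all assumptions, not the actual device physics), but it shows why an unpaired bias destroys the differential state while paired pulses preserve it:

```python
# Toy model, for illustration only: a synapse is a differential pair of
# conductances, Gs = Ga - Gb, each hard-bounded by the device.
G_MIN, G_MAX = 1e-6, 2e-6                # hypothetical bounds (siemens)

def clamp(g):
    return max(G_MIN, min(G_MAX, g))

# Only-positive bias: both devices ratchet upward and pin at G_MAX,
# so the differential state collapses to zero and the weight is lost.
Ga, Gb = 1.8e-6, 1.2e-6                  # stored state: Gs = 6e-7
for _ in range(1000):
    Ga, Gb = clamp(Ga + 5e-9), clamp(Gb + 5e-9)
print(Ga - Gb)                           # ~0.0: saturated, state lost

# Pairing each forward pulse with a reverse pulse keeps both devices off
# their rails, so the differential state survives.
Ga, Gb = 1.8e-6, 1.2e-6
for _ in range(1000):
    Ga, Gb = clamp(Ga + 5e-9), clamp(Gb + 5e-9)   # forward half
    Ga, Gb = clamp(Ga - 5e-9), clamp(Gb - 5e-9)   # reverse half
print(Ga - Gb)                           # ~6e-7: state preserved
```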
Absolutely. Given some spike stream (coming from feature learners), you can spin up an AHaH node (equal in size to the spike stream space) for each label. You can do this serially or in parallel, depending on the size and quantity of the cores.
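As a concrete sketch, here is one hypothetical way to organize that in code: one node per label sharing a spike space, evaluated serially. The `AHaHNode` class and its weights are illustrative stand-ins, not the KnowmAPI:

```python
from collections import defaultdict

class AHaHNode:                           # illustrative stand-in, not the KnowmAPI
    def __init__(self):
        self.w = defaultdict(float)       # one weight per spike line

    def activate(self, spikes):           # spikes: set of active line indices
        return sum(self.w[i] for i in spikes)

# One node per label, all reading the same spike stream space.
nodes = {label: AHaHNode() for label in ("cat", "dog", "bird")}
nodes["cat"].w.update({3: 0.9, 17: 0.4}) # weights would normally be learned

spikes = {3, 17, 42}                      # one pattern from a feature learner
scores = {label: n.activate(spikes) for label, n in nodes.items()}
print(max(scores, key=scores.get))       # highest activation wins: "cat"
```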
A small enough voltage will not alter resistance, depending on the physics of the specific memristors. The low-power solution to adaptive learning involves understanding how to build a system where the parts break. If your voltage is very low (and hence you are consuming little power) and you want to adapt at that same voltage, then your synapses will become volatile, because the barrier potential between states will be of the same order as random thermal energy. If you can repair this constant damage, you get the low-power adaptive learning solution. Like your brain right now, which is basically a big hunk of volatile pudding. Our current memristor technology does provide for a non-destructive read, but our methodology (AHaH Computing) solves the more general case and gives us a scaling path to much higher levels of adaptive efficiency. (Nobody appears to understand this, BTW.) So we could set the core voltage below the forward adaptation voltage of our BSAFW memristors, say 0.1 V, and execute FF instructions to read without having to worry.
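A toy model of the non-destructive read, under the assumption of a sharp adaptation threshold (the threshold value and step size below are made up for illustration):

```python
V_ADAPT = 0.2          # hypothetical forward adaptation threshold (volts)

def pulse(G, V, dG=1e-8):
    """Return the conductance after one voltage pulse (toy threshold model)."""
    if abs(V) < V_ADAPT:
        return G                          # below threshold: non-destructive read
    return G + dG if V > 0 else G - dG    # above threshold: state adapts

G = 1.5e-6
G = pulse(G, 0.1)      # FF read at 0.1 V leaves the state untouched
G = pulse(G, 0.5)      # write pulse above threshold adapts the state
print(G)               # 1.51e-6
```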
Neural Networks (the algorithms) are collections of linear neurons with non-linear activation functions that take real-valued inputs and multiply by real-valued weights. kT-RAM is a generic synapse resource that takes spike inputs (x=0 or x=1), and multiplies by real-valued weights to produce a real-valued output. (The spike-code is a hardware constraint, but AHaH nodes can in principle work with non-spike inputs as well.)
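The contrast is easy to see in code. A minimal sketch, assuming a toy linear model for both: the classic neuron does a dense multiply-accumulate over real inputs, while the spike interface reduces multiplication by x in {0, 1} to a sparse sum over the active lines:

```python
import math

def classic_neuron(x, w):                 # x: real-valued inputs
    return math.tanh(sum(xi * wi for xi, wi in zip(x, w)))

def ahah_node(spikes, w):                 # spikes: indices where x = 1
    return sum(w[i] for i in spikes)      # multiply-by-{0,1} becomes a sum

w = [0.3, -0.1, 0.7, 0.2]
print(classic_neuron([0.5, -1.0, 0.25, 0.0], w))
print(ahah_node({0, 2}, w))               # only lines 0 and 2 spiked: 1.0
```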
That's not how spike encoding works! There is no such thing as “0 as a negative pulse”; that's a binary code. An input line either carries a spike or no spike (z). z means it is electrically floating, which cannot transmit a state. Note that AHaH nodes can work with binary codes as well as continuous inputs. Spike codes are nice for a number of reasons, including that they map to address spaces perfectly and (so far as we know) are universal, achieving the same results as other encodings.
The same AHaH circuit can come to represent any logic gate except the two XOR gates (XOR and XNOR), since those are not linearly separable and an AHaH node is a linear discriminator. However, just as you can combine other logic gates to achieve XOR, the same is true for AHaH attractor states. So while you would once have had to hard-wire a logic gate in circuitry to be a NAND or XOR gate (or whatever else), you can make a memristor circuit that becomes whatever gate you want via learning or programming. Unsupervised AHaH attractor states are universal logic functions and can thus be used as the basis of a computing fabric. If the logic function must be learned or later reprogrammed, or if the input space is large (for example, pattern recognition or inference), then AHaH nodes are a good option. If the logic function is fixed and never needs to change, then there is no reason for them; just use a dedicated circuit. The key features of AHaH nodes are reconfigurability and learning.
Please see the “AHaH Attractors Support Universal Algorithms” section of the AHaH Computing paper. Take some input state, let's call it “zero” or “false” or “state 1”, and assign it a spike pattern. This could be whatever you want. Do the same for another input state that you call “one” or “true” or “state 2”. The example used in the PLOS paper maps each logic state to a spike pattern:

true –> [1, z]
false –> [z, 1]

Note we could have chosen other options, like:

true –> [1, z]
false –> [1, 1]

or:

true –> [1, 1, z]
false –> [z, 1, 1]

and so on. Combine the resulting spike patterns so they can be processed by an AHaH node. In the first example we would need four inputs, since each logic input requires two lines. Giving the AHaH node the logic input “true-false” on logic input lines 1 and 2, respectively, would yield the spike pattern [1, z, z, 1]. Measure the output state (voltage) of the AHaH node. If it is a positive voltage, call that “true”; if it is a negative voltage, call that “false”. The AHaH node is now a logic gate: it takes two logic states as input and returns a logic state. To find out which gate it is, measure the output of the AHaH node for each input pattern and build up a truth table showing the output for every input. You will find that it obeys one of the 16 possible logic functions (for a binary two-input, one-output gate). You could also have a non-binary logic system, construct spike encodings for those states, make a truth table, and do everything the same way. The attractor points of unsupervised AHaH plasticity are logic functions.
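Here is the whole procedure as a minimal sketch, modeling the AHaH node as a simple signed sum over spiking lines (an idealization; real nodes are analog circuits, and the weights shown are one hypothetical attractor state):

```python
# Encoding per logic input: true -> lines [1, z], false -> [z, 1].
# Input a uses spike lines 0-1, input b uses lines 2-3.
def spike_lines(a, b):
    return {0 if a else 1, 2 if b else 3}

def gate(w, a, b):
    # Positive output voltage reads as "true", negative as "false".
    return sum(w[i] for i in spike_lines(a, b)) > 0

w = [0.5, -0.2, 0.5, -0.2]               # one hypothetical attractor state
for a in (True, False):
    for b in (True, False):
        print(a, b, "->", gate(w, a, b)) # the truth table: this state is OR
```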
kT-RAM has a RAM (SRAM) interface. Each synapse (memristor pair) is coupled to the driver and read-out electrodes via pass-gates, and these pass-gates are controlled by the state of the RAM bits. Once a synapse (or set of synapses) is coupled to the driver electrodes, the AHaH Controller can read out the state and apply feedback. All the other cells remain decoupled.
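A toy software model of the addressing scheme, for illustration only (the real mechanism is pass-gate transistors driving shared electrodes, not list indexing):

```python
class Core:                                # toy model of one kT-RAM core
    def __init__(self, n):
        self.Gs = [0.0] * n                # differential state per synapse
        self.ram = [False] * n             # RAM bit = pass-gate control per cell

    def select(self, addresses):           # write the RAM bits
        self.ram = [i in addresses for i in range(len(self.ram))]

    def read(self):
        # Only coupled cells contribute; all others stay decoupled.
        return sum(g for g, sel in zip(self.Gs, self.ram) if sel)

core = Core(8)
core.Gs[2], core.Gs[5] = 0.4, -0.1
core.select({2, 5})
print(core.read())                         # ~0.3: only the coupled synapses
```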
kT-RAM has a spike interface, so depending on what you mean by linear regression, the answer is either yes or no. kT-RAM is an AHaH Node substrate (with a spike interface), and an AHaH Node is similar to a linear neuron with a sigmoid/tanh activation function (but not exactly). You can threshold the activation voltage of an AHaH Node, compare it to other node activation voltages (i.e. sort), or digitize it (which is expensive). You can do many things with AHaH Nodes, including non-linear classification, if you string them together in multiple stages; this is no different from standard neural networks. Read the PLOS paper for a list of examples. An AHaH Node is a memory-processing primitive, and kT-RAM is a computing substrate; it is not an algorithm. Via the KnowmAPI we have been learning, and continue to learn, new ways to use it.
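As a sketch of multi-stage composition: here is XOR, which no single linear node can compute, built from three thresholded nodes using the true –> [1, z], false –> [z, 1] encoding. The weights are hand-picked hypothetical attractor states, not learned ones:

```python
def node(w, spikes):                       # thresholded AHaH node (toy model)
    return sum(w[i] for i in spikes) > 0

def enc(*bits):                            # true -> [1, z], false -> [z, 1]
    return {2 * i + (0 if b else 1) for i, b in enumerate(bits)}

def xor(a, b):
    h1 = node([0.5, -0.6, -0.6, 0.5], enc(a, b))       # a AND NOT b
    h2 = node([-0.6, 0.5, 0.5, -0.6], enc(a, b))       # b AND NOT a
    return node([0.5, -0.2, 0.5, -0.2], enc(h1, h2))   # h1 OR h2

for a in (False, True):
    for b in (False, True):
        print(a, b, "->", xor(a, b))       # true only when the inputs differ
```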
It really depends on what you mean by “large” and what you mean by “scale”. It is likely you are attempting to see kT-RAM as a whole solution rather than as part of a solution. A 512 x 512 core could be used to emulate one neuron with up to 262,144 synapses, which is about the maximum end of biology if you look at Purkinje cells. However, that same core could be used to emulate 20 neurons with 13,107 synapses each, or 100 neurons with 2,621 synapses each. You could have neurons of various sizes, or you could go all the way down to the individual synapse. This flexibility has a trade-off in energy due to capacitive losses in the H-Tree. Simple optimizations for larger cores, such as chokes, help to reduce this capacitive loss, but this is an engineering problem just like everything else.

The purpose of kT-RAM is to provide a more flexible adaptive synaptic resource at the hardware level, embedded into larger architectures of various types, just as SRAM or various types of logic blocks are embedded into chip designs of various types. As for scaling, there is usually a trade-off between flexibility and power when it comes to computing architectures. The modern CPU is a great example: it is a jack of all trades and a master of none, and it has come to dominate the computing world. It is important to keep two things in mind. First, kT-RAM is an “adaptive synaptic resource” intended to be used as a co-processor within a variety of large-scale architectures, like a mesh-grid of CPU cores. Second, kT-RAM is just one possible implementation of AHaH nodes! Crossbars, for example, are another. Each has its own advantages and disadvantages, and depending on what you want to achieve, you should go with the best solution. Remember, it is not possible to beat physics. But it is possible to clearly define what you want and then explore the space of possibilities that gives you the best solution. The partitioning arithmetic is spelled out below.
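For reference, the partitioning arithmetic above as a quick check:

```python
core = 512 * 512        # synapses per core
print(core)             # 262,144: one Purkinje-scale neuron
print(core // 20)       # 13,107 synapses each across 20 neurons
print(core // 100)      # 2,621 synapses each across 100 neurons
```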
To correct the false assumption: a chip capable of AHaH plasticity is not necessarily a chip constrained to learning in only one way. We have only come to this understanding through our work with AHaH Computing. Learning algorithms become specific instruction-set sequences or routines, where each operation results in Anti-Hebbian or Hebbian learning. At the lowest level, Anti-Hebbian just means “move the synapse toward zero” and Hebbian means “move it away from zero”. People have come to the misconception that we are only working with a ‘local’ or ‘fixed’ learning rule. On the contrary, we have defined an instruction set in which the local unsupervised rule (FF-RU, for example) is just one possibility, and which does not preclude global computations. We hold ML in very high regard, and it is the work of folks like Yann LeCun (and others) that is pushing the boundaries, using the tools available to them. They are the current undisputed champions in primary performance benchmarking, and we think the field of neuromorphics should re-align with primary performance benchmarking as ML has. As for evidence that spikes and low-resolution synapses can work, it raises the question of how our brains do it, seeing that they are (ultra-efficient) spike-based networks with low-resolution synapses. From a hardware perspective, spikes and limited-precision synapses make the most sense. From the algorithmic perspective, work like this, which is motivated by attempts to map algorithms to more efficient hardware, demonstrates that limited-precision synapses can work better than full precision. Our results with the KnowmAPI support this. Our goal is to achieve ‘primary performance parity’ with state-of-the-art machine learning. We do not want to re-invent the field of ML; we want to achieve what it has already achieved in a much more efficient learning substrate. The massive strides being made in machine vision (and almost everything else) are wonderful, and we are watching closely which algorithms and approaches work best. We will take what we can and port it over to AHaH Computing, and we will invent new approaches if needed. Rather than ignore the adaptive power problem, as others do, we are working on a solution.
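At the lowest level the two primitives are simple enough to state in a few lines. A sketch with hypothetical step sizes (the real operations are voltage pulses on memristor pairs, not floating-point updates):

```python
def anti_hebbian(w, step=0.01):
    return w - step if w > 0 else w + step   # move the synapse toward zero

def hebbian(w, step=0.01):
    return w + step if w > 0 else w - step   # move it away from zero

w = 0.3
w = hebbian(w)        # e.g. the "H" half of an FF-RU style routine
w = anti_hebbian(w)   # the "AH" half; sequencing these composes learning rules
print(w)              # back near 0.3
```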
They are not actually the same thing. TrueNorth is a mesh of programmable digital cores that pass spikes around. kT-RAM is a specification for an adaptive synaptic resource intended to be used as a co-processor within a variety of large-scale architectures. For example, the SRAM inside a TrueNorth core could be replaced with a kT-RAM core (or cores), and the result would be on-chip learning, more synapses, lower power, and more flexibility at the core level, constrained to the specific large-scale topology of TrueNorth (a grid of cores). IBM would have to ditch most of their software and methodologies, though, so it's arguable whether that would be worth it versus building new large-scale architectures from scratch.
Not really. That is a mathematical idealization of a neuron; it's actually more like that in kT-RAM than in a real neuron. I would say, though, that the separation you are referring to is not like the separation in a digital computer. Also, you left learning out of this idealization. Just zoom into a (real) dendrite a little, or a (real) neuron, or anything (real). At some point you will see that the idealization of memory and processing being separate does not hold. To describe anything you need both state information (physical objects, or ‘memory’) and transformations (laws of physics acting on those objects). A neuron does not separate memory and processing and shuttle bits back and forth; it is a merging of memory and processing. A synapse is not memory and it is not processing: it is a merging of the two. A soma is not memory and it is not processing: it is a merging of the two. And so on.
Our memristors are non-volatile, so state is already saved. In terms of state duplication or ‘programming’, there are multiple ways, falling into two general categories. The first is to use various techniques to program each memristor into a defined resistance state. This is fairly time-consuming if done electrically, but a recent breakthrough in optical interface technology may greatly speed it up. The second is supervised learning: a ‘master’ acts as a teacher to a ‘student’, the result being a copy of the neural state function. This method is faster than one might expect and has side benefits. First, it does not require any additional interfacing circuitry. Second, since the student is learning, it can adapt around faults in the hardware. A sketch of the idea follows.
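A minimal sketch of the teacher-student idea, assuming a toy linear node and a simple error-driven update (the class, learning rate, and spike streams are illustrative, not the actual copying protocol):

```python
from collections import defaultdict

class Node:                                # minimal stand-in for an AHaH node
    def __init__(self):
        self.w = defaultdict(float)
    def activate(self, spikes):
        return sum(self.w[i] for i in spikes)

def train_student(master, student, streams, lr=0.05):
    for spikes in streams:
        err = master.activate(spikes) - student.activate(spikes)
        for i in spikes:                   # only the active lines adapt
            student.w[i] += lr * err       # nudge toward the teacher's output

master, student = Node(), Node()
master.w.update({0: 0.6, 3: -0.4})         # the state to be copied
train_student(master, student, [{0, 3}, {0}, {3}] * 100)
print(master.activate({0, 3}), student.activate({0, 3}))   # outputs converge
```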
Knowm Memristors are non-volatile and hold their state.
AHaH Computing “crosses the technology stack”, so my advice is to learn as much as you can about the whole process of computing, from silicon wafers to real-world application development. I would then focus on one or two levels of the stack.

Physics: Classical physics, including Electricity & Magnetism and Thermodynamics. If you want to go into memristor device fabrication you will need Quantum Mechanics and Physical Chemistry. In terms of Thermodynamics, be careful here: be sure to check out some “fringe” material like Constructal Theory. The goal is to understand how life works as a physical process, and modern physics educations have little if anything to say about it. They can, however, give you solid foundations.

Electronics: You could specialize here or just get a good overview. An introduction to electronics is a must. After that you should know the basics of VLSI and how chips are made, plus circuit design, both analog and digital.

CS: Foundations in computing. You need to really understand how modern processors work and how code gets turned into instructions that execute on processors. If you specialize here, do not forget about what is occurring at a physical level! Get really good at Linux, and understand how the operating system connects to low-level peripherals and co-processors.

Machine Learning: An overview of existing methods (decision trees/forests, SVMs, Neural Networks, Bayesian models, etc.) and domains (perception, planning, control). Unsupervised learning. Focus on methods that have been commercialized in the real world and try to understand why. If you specialize here, do not forget about what is occurring at a physical level in the computers! The machine learning community is notorious for simultaneously exploiting hardware accelerators while de-emphasizing their importance. They (for the most part) see hardware as something that should support their algorithms, not something that is intertwined with algorithms and physics.

Neuroscience/Computational Neuroscience: Be very careful here! Use (biological) neuroscience as inspiration, but try not to fall into the “let's mimic” camp as a route to understanding learning algorithms. The brain is far too complex to focus on the minute details if your goal is useful technology. Many in the neuromorphic community have large blinders on in regards to real-world problem solving, and the results are chips that are only useful in a narrow context. The ML community is better focused in this regard, but they in turn have a “hardware blindspot”.

Rhetoric & Debate: Learn how to communicate effectively with other people. Get good at recognizing logical fallacies. Technology is heavily intertwined with money and egos. For whatever reason, folks tend to get really stubborn and angry around ideas that threaten the status quo. They will react with arguments that on the surface may seem reasonable but after inspection are illogical. While technology is the foundation, our economy is driven by the interactions of people and human communication.
Moore's law does not resolve the memory-processing duality. It is going to be us or somebody else, but the problem will be solved, because a tremendous amount of energy is currently wasted shuttling information back and forth, and this will not be resolved except through fundamental changes in computing architectures. At this point in time, much larger performance (speed, efficiency) increases can be achieved through code optimization and changes in processor architecture than through smaller transistors. Innovations are now being driven by alternative architectures. We are not aware of a more efficient architecture for synaptic integration and learning than one that eliminates the memory-processing duality and enables low adaptation voltages. Also, Moore's law as we have all come to know it is unambiguously dead.
AHaH nodes are building blocks. You use them to build models, and we have only just begun doing that. We are aiming for all sorts of models across perception, planning and control. We have shown decision trees, linear classifiers, fully-connected layers, combinatorial solvers, robotic arm actuators, reconfigurable logic and more. Our goal with the paper was not to show a model. It was to show a generic adaptive building block circuit that can be used to make many models.
Backprop is a wonderfully useful algorithm and works well with current methods of computing. However, it is clear that backprop is not operating in the brain (at least not as it is currently formulated mathematically), and hence there are clearly other methods available. When constraining solutions based on physics and circuits, so as to side-step the adaptive power problem, backprop becomes problematic. It is important to understand how intertwined machine learning is with the computational platforms that enable it: when the constraints of those platforms change, the optimal machine learning models change. Large GPU backprop-trained models today contain perhaps 100 million to 1 billion weights. The human cortex contains about 150,000 billion synapses and consumes about a billion times less energy and space. That said, we have seen a few studies showing how backprop can operate in interesting ways that may be easy to port to kT-RAM and memristors in general, for example here. A number of groups are working on neuromemristive backprop. Here is one of backpropagation's inventors, Geoffrey Hinton, talking about backprop and STDP. Once we develop and perfect AHaH-based methods that achieve primary performance parity with existing methods like backprop, Knowm will dominate in secondary metrics such as power efficiency, speed, and cost. At that point, backprop as it currently exists will be obsolete. Until then, memristors are likely still the most efficient path to implementing the backpropagation algorithm in hardware.
You are correct. For spreadsheets and email apps, that architecture is fine. But for processing the vast amounts of data needed for real-time machine learning apps, it’s a serious problem. CPUs are being designed with memory closer to the processor, they are making multi-core processors, and there are new chips like Adapteva’s Epiphany. All of these tricks reduce the distance between processor and memory. Nature, on the other hand, uses a different approach to computing. It’s a system where the processor is the memory. The distance is zero. With memristors, the same can be achieved. One way to do it would be our proposed Thermodynamic-RAM. We write about it in a paper, which has a historical background section to give some perspective.
Great question! There is a lot to this topic, but let me try to give you a short and simple answer. We are using differential memristors as synapses. The energy dissipated in a synapse is the power dissipated over both memristors for the duration of the read/write pulses: P = V^2 / R, where V is the voltage and R the resistance. Typical on-state resistance of a memristor is 500 kOhm and typical voltage is 0.5 V, so P = 0.5^2 / 500E3 = 5E-7 watts. The energy is only dissipated during read and write events, which occur in pulses of ~50 ns (nanoseconds) or less. The energy per memristor per synaptic read or write event is then 5E-7 W x 50E-9 s = 2.5E-14 Joules. Since the kT-RAM instruction set requires paired read-write instructions, and since a synapse is two memristors, we multiply that answer by four: 1E-13 Joules, or 0.1 picojoules per adaptive synaptic event. Note that we could lower the voltage and pulse width to achieve even lower power for synaptic integration operations (i.e. no learning). Also note that capacitance plays a big role, so if your AHaH node is very large it will dissipate more energy as electrons get soaked up on the wires.

If we say a human brain has 100 billion neurons, each with 1,000 synapses, that fire on average once per second, that is 1E14 adaptive synaptic events per second. The energy consumed in one second is 20 Joules, so if we put all of that energy into synaptic events we get 2E-13 Joules per event, within a factor of two of the kT-RAM estimate above. The actual deployed power consumption of kT-RAM depends on what sort of computing architecture it is embedded in; very small cores are more efficient than larger cores for processing smaller spike streams. The purpose of the AHaH circuit is to remove the memory-processing bottleneck for adaptive learning operations. If you knew the exact connection topology of a brain and made a custom ASIC of AHaH nodes, you are looking at efficiencies comparable to, and possibly even exceeding, biology (eventually). The reason is that our modern methods of chip communication can be quite a bit more efficient (and faster) than biology in some respects. However, if you have a more generic architecture that lets you explore more connection topologies, for example a mesh grid of little CPUs with local RAM and kT-RAM, you would expend more energy but get something more flexible: the ability to emulate any brain, not just one specific type. The arithmetic is reproduced below.
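The arithmetic, reproduced as a short script:

```python
V, R, t = 0.5, 500e3, 50e-9     # volts, ohms, pulse width (seconds)
P = V**2 / R                    # 5e-7 W dissipated per memristor
E_mem = P * t                   # 2.5e-14 J per memristor per pulse
E_syn = 4 * E_mem               # paired read-write x two memristors: 1e-13 J
events = 100e9 * 1000 * 1       # brain: 1e14 synaptic events per second
print(E_syn, 20 / events)       # ~0.1 pJ vs ~0.2 pJ per biological event
```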
No. GPUs and CPUs are absolutely great at what they were designed to do. G stands for ‘Graphics’, after all! They will likely be around for a long time. There will be new types of hardware available that will dominate applications in machine learning and give the everyday consumer access to an entire new range of applications and abilities that are not possible right now. We recently proposed one such device that would plug into existing computer platforms and allow for many exciting new possibilities. It’s called Thermodynamic-RAM.
We really don't know. It's an important question, and we have theories about what attention is, but our main focus is on solving well-specified real-world problems.
We are not specifically focused on biology. Rather, we are interested in natural self-organization and how to harness it. We believe that biology has ‘hijacked’ or ‘co-opted’ a more fundamental process of energy attempting to dissipate itself through adaptive containers. Ultimately we are trying to solve problems, not build brains. Our primary concern is solving real-world problems, and although biology is a big source of inspiration, our technology is not going to look much like biology.
AHaH Nodes are adaptive. The act of learning creates attractor states that repair state information. The act of using kT-RAM heals it, because kT-RAM can learn around faults. Some synapses could stop working, inputs could turn off or become chaotic, etc. Learning will constantly tune the functioning synapses.
A star-gate will open up and confetti will fall from the sky. But seriously, physical kT-RAM is all about power and space efficiency.
kT-RAM is an “AHaH Node resource”. Anti-Hebbian and Hebbian (AHaH) nodes can be used for things like feature learning, classification, and combinatorial optimization. You can think of an AHaH Node as a ‘neuron’: you can make networks out of them and solve the same sorts of problems that neural networks solve. But you can also do more basic things like logic, memory, random number generation, and set iteration. AHaH Nodes can be used in many ways, and we have just started trying various things out. We believe AHaH nodes are universal adaptive resources, and our success to date has given us confidence that a great number of effective and efficient solutions can be attained with them.