“What you can imagine depends on what you know.” –Daniel C. Dennett
There exists a computational building block, or primitive, that is formed of ‘energy-dissipating pathways competing for conduction resources’. It manifests at all scales of Nature, from neurons to river basins. You can realize this building block efficiently in electronic systems with memristors. Unlike traditional computing, this building block combines or ‘mixes’ memory and processing. We call the building block an “AHaH Node”. Just as transistors are the basis of many circuits, so are AHaH Nodes. The paper shows how AHaH Nodes can be used as the basis of general-purpose computing and machine learning. Since AHaH Nodes mix memory and processing, they are very efficient for memory- and processing-intensive operations like learning.
Yes. Please contact us to discuss your ideas or needs.
We are truly sorry if how we have named things is upsetting. There are two issues. First, there is the name for the “thing in Nature”. Second, there is the name of our organization.
With regard to the ‘thing in Nature’, there are two problems with using a descriptive term. First, it is cumbersome in dialog. This leads to acronyms, which are also cumbersome and very confusing to newcomers. Second, when you are trying to build a technology/science around something like ‘Knowm’, descriptive terms are usually not quite right. We mean something specific with the word “Knowm”, and we believe ‘it’ needs a new name to distinguish ‘it’ from all the other terms people throw around that describe part, but not all, of what we are talking about. ‘It’ is not just a fractal, or something that resembles a tree or a neuron or X, Y or Z. A quick definition could be something like “the adaptive energy dissipation (living) structure found throughout Nature that is the result of vascularization”, but even this is not quite right. As we gain more knowledge of Knowm, we will ascribe more meaning to it. For this to happen, however, ‘it’ needs a name, ideally one with little to no scientific baggage.
We chose the word “Knowm” for the following reasons: (1) It's short, new and easy to remember, with no pre-existing scientific associations. (2) It denotes ‘a thing’. Since we believe that Knowms are alive (yes, even rivers and lightning), it makes sense to talk about them that way. (3) The word rhymes with Ohm, the unit of resistance. Knowm is a flow-structure, and our work deals with utilizing memristors to mimic Knowm’s building block, something we call Knowm’s Synapse or Nature’s Transistor. (4) ‘Know’ denotes knowledge. Our work primarily deals with building adaptive learning machines, so this fits. We also believe there is an intrinsic physical intelligence in Knowms, as they actively search for energy-dissipating pathways. We believe knowledge is the product of intelligence and is related to this process. (5) Knowm sounds like the friendly fairy-tale characters called ‘Gnomes’, the protectors of Nature that live in trees. (6) It rhymes with the mantra ‘Om’, which is said to be “the sound of the universe” and has significant spiritual meaning to many people. We have also found that many ascribe spiritual significance to Knowms. Since we see Knowms everywhere in Nature, to the point where it appears as if Nature may itself be built of them, we feel the name honors this spiritual or ‘holistic’ sense. To be clear, we are not in search of, or proponents of, spirituality. We are scientists and engineers in search of knowledge and technology who have a deep respect for both Nature and people.
As for naming our company after it, we felt that it made sense. We coined the name ‘Knowm’ long before we formed the company. Our affiliated company “KnowmTech” was formed in 2001, for example. Since our mission is to facilitate a greater understanding of Knowm (or whatever you want to call it!) and its applications to technology, particularly in neuromemristive processors, we feel the name is appropriate and we do not mean any disrespect.
Most of us at Knowm Inc agree with Nobel laureate physicist Richard Feynman: “I learned very early the difference between knowing the name of something and knowing something.” That is, a name has no real meaning in itself and ultimately does not matter much to understanding. It’s the meaning we ascribe to the name that matters. What really matters to us is how things work, not what things are called. We would very much like “it” (Knowm) to have a unique name so that we can ascribe meaning to it, because “it”, in our opinion, is its own thing. Until ‘it’ has a unique name, it’s going to be hard to study ‘it’, because folks will not understand specifically what we are talking about. Hence we made up a name so we could get to work.
It's an analog synaptic processor. It reduces synaptic integration and adaptation to analog operations on memristors, thus saving the considerable energy required to shuttle multiple bits back and forth between memory and processing. Each “bit” in kT-RAM is a multi-bit analog synaptic weight, thanks to the differential pair of memristors.
It has to do with saturation of the differential memristors. The synapse is encoded as the difference in conductance between the two memristors that form the pair: Gs = Ga - Gb. If you only ever apply a positive bias, both memristors will saturate and your state is lost. The same thing happens if you only ever apply a negative bias. Pairing the instructions keeps things working. You could also utilize natural decay if the memristors are ‘volatile’; that is, you could drive the conductance higher and then wait while it comes back down or normalizes. One way or another, you have to prevent saturation in the differential pair.
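To make the saturation argument concrete, here is a minimal Python sketch (not Knowm code). It assumes a toy device in which every bias pulse moves both memristors of the pair by the same fixed amount until they hit conductance bounds; G_MIN, G_MAX and STEP are made-up values. With one polarity only, both devices pin at G_MAX and the state Gs = Ga - Gb collapses to zero, while pairing each positive pulse with a negative one preserves it.

```python
from dataclasses import dataclass

# Hypothetical device parameters, chosen only for illustration.
G_MIN, G_MAX = 1e-6, 1e-5   # conductance bounds (siemens)
STEP = 5e-7                 # conductance change per bias pulse (siemens)


def clip(g: float) -> float:
    """Keep a conductance inside the device's bounds (saturation)."""
    return max(G_MIN, min(G_MAX, g))


@dataclass
class DifferentialPair:
    ga: float
    gb: float

    @property
    def state(self) -> float:
        """Synaptic state Gs = Ga - Gb."""
        return self.ga - self.gb

    def pulse(self, polarity: int) -> None:
        """Toy assumption: a bias pulse moves both memristors the same way."""
        self.ga = clip(self.ga + polarity * STEP)
        self.gb = clip(self.gb + polarity * STEP)


def run(polarities, cycles: int) -> float:
    pair = DifferentialPair(ga=6e-6, gb=3e-6)
    for _ in range(cycles):
        for p in polarities:
            pair.pulse(p)
    return pair.state


unpaired = run([+1], cycles=20)      # positive bias only
paired = run([+1, -1], cycles=20)    # each positive pulse followed by a negative one
print(f"unpaired: {unpaired:.2e} S, paired: {paired:.2e} S")
```

In this toy model the pair only loses information when one device hits a bound, which is why keeping both devices away from the bounds, by pairing or by decay, is what matters.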
Absolutely. Given some spike stream (coming from feature learners), you can spin up an AHaH node (equal in size to the spike stream space) for each label. You can do this serially or in parallel, depending on the size and quantity of the cores.
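A minimal sketch of the one-node-per-label idea, assuming the spike stream is represented as a set of active channel indices and using a simple supervised nudge with a hypothetical margin. LabelNodes and its update rule are illustrative only, not the KnowmAPI classifier.

```python
from typing import Dict, List, Set


class LabelNodes:
    """One toy AHaH-style node per label; each node holds one weight per spike channel."""

    def __init__(self, labels: List[str], n_channels: int, rate: float = 0.1) -> None:
        self.weights: Dict[str, List[float]] = {lab: [0.0] * n_channels for lab in labels}
        self.rate = rate

    def activation(self, label: str, spikes: Set[int]) -> float:
        # Only spiking channels couple their synapses to the node.
        w = self.weights[label]
        return sum(w[i] for i in spikes)

    def predict(self, spikes: Set[int]) -> str:
        # Compare node activations and pick the largest (winner-take-all).
        return max(self.weights, key=lambda lab: self.activation(lab, spikes))

    def train(self, spikes: Set[int], label: str) -> None:
        # Serial loop over nodes; hardware could evaluate them in parallel.
        for lab, w in self.weights.items():
            target = 1.0 if lab == label else -1.0
            # Hypothetical margin of 1.0: only nudge nodes that are not yet confidently right.
            if target * self.activation(lab, spikes) < 1.0:
                for i in spikes:
                    w[i] += self.rate * target


clf = LabelNodes(["cat", "dog"], n_channels=8)
examples = [({0, 1, 2}, "cat"), ({5, 6, 7}, "dog")]
for _ in range(5):
    for spikes, label in examples:
        clf.train(spikes, label)
print(clf.predict({0, 1, 2}), clf.predict({5, 6, 7}))
```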
A small enough voltage will not alter resistance, depending on the physics of the specific memristors. The low-power solution to adaptive learning involves understanding how to build a system where the parts break. If your voltage is very low (and hence you are consuming little power) and you want to adapt at that same voltage, your synapses will become volatile, because the barrier potential between states will be of the same order as random thermal energy. If you can repair this constant damage, you get the low-power adaptive learning solution. Like your brain right now, which is basically a big hunk of volatile pudding. Our current memristor technology does provide for a non-destructive read, but our methodology (AHaH Computing) solves for the more general case and gives us a scaling path to much higher levels of adaptive efficiency. (Nobody appears to understand this, BTW.) So we could set the core voltage below the forward adaptation voltage of our BSAFW memristors, say 0.1V, and execute FF instructions to read without having to worry.
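The non-destructive read can be illustrated with a toy threshold model. The threshold (V_THRESHOLD = 0.3 V), rate and bounds below are assumptions for illustration, not measured BSAFW parameters, and the model deliberately ignores thermal volatility, which is exactly what becomes important at very low adaptation voltages.

```python
# Hypothetical threshold model of one memristor; parameters are illustrative only.
V_THRESHOLD = 0.3   # assumed adaptation threshold (volts)
RATE = 1e-6         # assumed conductance change per volt above threshold, per pulse (siemens)
G_MIN, G_MAX = 1e-6, 1e-5


def apply_pulse(g: float, v: float) -> float:
    """Return the conductance after one pulse of voltage v."""
    excess = abs(v) - V_THRESHOLD
    if excess <= 0:
        return g  # below threshold: reading does not disturb the state
    dg = RATE * excess if v > 0 else -RATE * excess
    return min(G_MAX, max(G_MIN, g + dg))


g = 5e-6
after_reads = g
for _ in range(1000):
    after_reads = apply_pulse(after_reads, 0.1)  # 0.1 V read pulses
after_write = apply_pulse(g, 0.5)                # one 0.5 V adaptation pulse
print(after_reads == g, after_write > g)
```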
We have ‘interchangeable cores’. One core is for detailed memristor simulations (MEMRISTOR). The others are for efficient deployment in applications (NIBBLE, BYTE, FLOAT). Each module (like the classifier or feature learners) maps to the kT-RAM instruction set. So we can develop on the BYTE core because it's fast, then check that it will work with our memristors (swap in the MEMRISTOR core), then deploy applications on NIBBLE or BYTE. Our current BYTE and NIBBLE emulator is very efficient, comparable to (and in some ways surpassing) the efficiency of existing machine learning methods operating on existing digital computing platforms. Lots of caveats here, of course. Until we have kT-RAM we are under the same constraints as everybody else, so we have developed a path whereby we can commercialize on existing digital platforms. Yes, a large multicore supercomputer would help. That's why we are developing the SENSE Server. It's a cloud-scalable compute resource with hooks to kT-RAM emulators optimized for FPGAs, GPUs, multi-core CPUs, etc. We plan on turning anything we can into a kT-RAM emulator.
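The interchangeable-core idea can be sketched as one interface with several weight representations. This is a hypothetical Python illustration, not the KnowmAPI: the class names are made up, and it assumes NIBBLE means 4-bit and BYTE means 8-bit signed fixed-point weights. A MEMRISTOR core would implement the same read/adapt interface on top of a device model.

```python
from abc import ABC, abstractmethod
from typing import List, Set


class Core(ABC):
    """Hypothetical core interface; these class names are illustrative, not the KnowmAPI."""

    def __init__(self, n: int) -> None:
        self.w: List[float] = [0.0] * n

    @abstractmethod
    def quantize(self, x: float) -> float:
        """Map a weight onto the values this core can represent."""

    def read(self, spikes: Set[int]) -> float:
        return sum(self.w[i] for i in spikes)

    def adapt(self, spikes: Set[int], delta: float) -> None:
        for i in spikes:
            self.w[i] = self.quantize(self.w[i] + delta)


class FloatCore(Core):
    def quantize(self, x: float) -> float:
        return x


class FixedPointCore(Core):
    bits = 8

    def quantize(self, x: float) -> float:
        # Signed fixed point on [-1, 1): integer levels -2^(b-1) .. 2^(b-1) - 1.
        scale = 2 ** (self.bits - 1)
        level = max(-scale, min(scale - 1, round(x * scale)))
        return level / scale


class ByteCore(FixedPointCore):
    bits = 8


class NibbleCore(FixedPointCore):
    bits = 4


def routine(core: Core) -> float:
    """The same 'learning routine' runs unchanged on any core type."""
    for _ in range(4):
        core.adapt({0, 1}, 0.25)
    return core.read({0, 1})


results = {cls.__name__: routine(cls(8)) for cls in (FloatCore, ByteCore, NibbleCore)}
print(results)
```

The same routine saturates differently on each core, which is the kind of difference you would want to check before choosing a deployment core.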
Neural Networks (the algorithms) are collections of linear neurons with non-linear activation functions that take real-valued inputs and multiply by real-valued weights. kT-RAM is a generic synapse resource that takes spike inputs (x=0 or x=1), and multiplies by real-valued weights to produce a real-valued output. (The spike-code is a hardware constraint, but AHaH nodes can in principle work with non-spike inputs as well.)
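The difference between real-valued and spike inputs can be shown in a few lines. In this plain Python sketch (no hardware model assumed), a spike pattern is just the set of addresses of the lines that spike, so the activation is the sum of the selected weights, which equals the dot product with a 0/1 input vector.

```python
from typing import Sequence, Set


def dense_activation(x: Sequence[int], w: Sequence[float]) -> float:
    """Conventional neural-network style: y = sum_i x_i * w_i."""
    return sum(xi * wi for xi, wi in zip(x, w))


def to_spikes(x: Sequence[int]) -> Set[int]:
    """Spike code as a set of addresses: only lines with x_i = 1 spike; the rest float (z)."""
    return {i for i, xi in enumerate(x) if xi == 1}


def spike_activation(spikes: Set[int], w: Sequence[float]) -> float:
    """Spike-input style: sum the weights of the coupled (spiking) synapses."""
    return sum(w[i] for i in spikes)


w = [0.5, -0.25, 0.75, 0.125]
x = [1, 0, 1, 1]
print(dense_activation(x, w), spike_activation(to_spikes(x), w))
```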
That's not how spike encoding works! There is no such thing as “0 as a negative pulse”; that's a binary code. An input line carries either a spike (1) or no spike (z). z means the line is electrically floating, so it cannot transmit a state. Note that AHaH nodes can work with binary codes as well as continuous inputs. Spike codes are nice for a number of reasons, including that they map to address spaces perfectly and (so far as we know) are universal and can achieve the same results as other encodings.
The same AHaH circuit can come to represent any two-input logic gate except XOR and XNOR, since those are not linearly separable and an AHaH node is a linear discriminator. However, you can combine multiple other logic gates to achieve XOR, and the same is true for AHaH attractor states. So while you would have had to hard-wire a logic gate in circuitry to be a NAND or XOR gate (or whatever else), you can make a memristor circuit that can become whatever gate you want via learning or programming. Unsupervised AHaH attractor states are universal logic functions and can thus be used as the basis of a computing fabric. If the logic function must be learned or later programmed, or if the input space is large (for example in pattern recognition or inference), then AHaH nodes are a good option. If the logic function is set and never needs to change, there is no reason for them: just use a dedicated circuit. The key features of AHaH nodes are reconfigurability and learning.
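Here is a sketch of the composition argument, using a two-line spike encoding per logic input (true -> [1,z], false -> [z,1], as in the PLOS paper example). The weights are hand-picked, hypothetical values, not learned attractor states: OR and NAND nodes form a first stage, and their outputs are re-encoded as spikes for an AND node.

```python
from itertools import product
from typing import Sequence, Set


def encode(a: bool, b: bool) -> Set[int]:
    """Two logic inputs on four lines: true -> [1, z], false -> [z, 1].
    Returns the indices of the lines that spike; floating (z) lines are omitted."""
    return {0 if a else 1, 2 if b else 3}


def node(weights: Sequence[float], a: bool, b: bool) -> bool:
    """Positive output voltage reads as true, otherwise false."""
    return sum(weights[i] for i in encode(a, b)) > 0


# Hand-picked, hypothetical weights for three linearly separable gates.
OR_W = (2.0, -1.0, 2.0, -1.0)
NAND_W = (-1.0, 2.0, -1.0, 2.0)
AND_W = (1.0, -2.0, 1.0, -2.0)


def xor(a: bool, b: bool) -> bool:
    """Two stages: the OR and NAND node outputs are re-encoded as spikes for an AND node."""
    return node(AND_W, node(OR_W, a, b), node(NAND_W, a, b))


for a, b in product([False, True], repeat=2):
    print(a, b, xor(a, b))
```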
Please see the “AHaH Attractors Support Universal Algorithms” section of the AHaH Computing paper. Take some input state, let's call it “zero” or “false” or “state 1”, and assign it a spike pattern. This could be whatever you want. Do the same for another input state that you call “one” or “true” or “state 2”. The example used in the PLOS paper:

Logic state –> spike pattern
true –> [1,z]
false –> [z,1]

Note we could have other options, like:

true –> [1,z]
false –> [1,1]

or:

true –> [1,1,z]
false –> [z,1,1]

and so on. Combine the resulting spike patterns so they can be processed by an AHaH node. In our first example, we would need four inputs, since each logic input requires two lines. Giving the AHaH node the logic input “true-false” on logic input lines 1 and 2, respectively, would yield the spike pattern:

[1,z,z,1]

Measure the output state (voltage) of the AHaH node. If it is a positive voltage, call that “true”. If it is a negative voltage, call that “false”. The AHaH node is now a logic gate: it takes as input two logic states and returns a logic state. To find out what logic gate it is, measure the output of the AHaH node for different input patterns. You can build up a truth table that shows the output for each input pattern. You will find that it obeys one of the 16 possible logic functions (for a binary two-input, one-output gate). You could also have a non-binary logic system, construct spike encodings for those states, make a truth table, and do everything the same way. The attractor points of unsupervised AHaH plasticity are logic functions.
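The truth-table procedure can be sketched in Python. This is an idealized model, assuming the node output is simply the sum of the weights on the spiking lines, with a positive sum read as “true” (a sum of exactly zero is treated as “false” here). The weight grid is arbitrary. Because y(T,T) + y(F,F) and y(T,F) + y(F,T) both equal the sum of all four weights, no weight setting can produce XOR or XNOR, so the sweep finds 14 of the 16 functions.

```python
from itertools import product
from typing import Sequence, Set, Tuple

# Input order for every truth table: (A, B) = TT, TF, FT, FF.
INPUTS = [(True, True), (True, False), (False, True), (False, False)]
XOR = (False, True, True, False)
XNOR = (True, False, False, True)


def encode(a: bool, b: bool) -> Set[int]:
    """true -> [1, z], false -> [z, 1] on each logic input line; returns the spiking lines."""
    return {0 if a else 1, 2 if b else 3}


def truth_table(weights: Sequence[float]) -> Tuple[bool, ...]:
    """Measure the node output for every input pattern; positive voltage reads as true."""
    return tuple(sum(weights[i] for i in encode(a, b)) > 0 for a, b in INPUTS)


# Sweep a small, arbitrary grid of weight settings and collect the distinct gates.
reachable = {truth_table(w) for w in product([-2.0, -1.0, 1.0, 2.0], repeat=4)}
print(len(reachable), XOR in reachable, XNOR in reachable)
```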
See the Knowm Library for more books
kT-RAM has a RAM (SRAM) interface. Each synapse (memristor pair) is coupled to the driver and read-out electrodes via pass-gates. These pass gates are controlled via the state of the RAM bits. Once the synapse (or synapses) is/are coupled to the driver electrodes, the AHaH Controller can read out the state and apply feedback. All the other cells remain decoupled.
kT-RAM has a spike interface, so depending on what you mean by linear regression, the answer is either yes or no. kT-RAM is an AHaH Node substrate (with a spike interface), and an AHaH Node is similar (but not identical) to a linear neuron with a sigmoid/tanh activation function. You can threshold the activation voltage of an AHaH Node, compare it to other node activation voltages (i.e. sort), or digitize it (which is expensive). You can do many things with AHaH Nodes, including non-linear classification, if you string them together in multiple stages. This is no different than standard neural networks. Read the PLOS paper for a list of examples. An AHaH Node is a memory-processing primitive, and kT-RAM is a computing substrate; neither is an algorithm. Via the KnowmAPI we have been learning, and continue to learn, new ways to use it.
It really depends on what you mean by “large” and what you mean by “scale”. It is likely you are seeing kT-RAM as a whole solution rather than as part of a solution. A 512 x 512 core could be used to emulate a neuron with up to 262,144 synapses, which is about the maximum found in biology if you look at Purkinje cells. However, that same core could be used to emulate 20 neurons with 13,107 synapses each, or 100 neurons with 2,621 synapses each. You could have neurons of various sizes, or you could go all the way down to the individual synapse. This flexibility has a trade-off in energy due to capacitive losses in the H-Tree. Simple optimizations for larger cores, such as chokes, help to reduce this capacitive loss, but this is an engineering problem just like everything else. The purpose of kT-RAM is to provide a more flexible adaptive synaptic resource at the hardware level, embedded into larger architectures of various types, just as SRAM or various logic blocks are embedded into chip designs of various types. As for scaling, there is usually a trade-off between flexibility and power when it comes to computing architectures. The modern CPU is a great example: it is a “jack of all trades and a master of none”, and it has come to dominate the computing world. It is important to keep two things in mind. First, kT-RAM is an “adaptive synaptic resource” intended to be used as a co-processor within a variety of large-scale architectures, like a mesh-grid of CPU cores. Second, kT-RAM is just one possible implementation of AHaH nodes! Crossbars, for example, are another. Each has its own advantages and disadvantages, and depending on what you want to achieve, you should go with the best solution. Remember, it's not possible to beat physics. But it is possible to clearly define what you want and then explore the space of possibilities that gives you the best solution. For an interesting and entertaining read on some of this, see the Gordon-Panthana dialog.
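The core-partitioning arithmetic above, as a trivial sketch assuming one memristor pair per cell and an even split by integer division. The fits helper for neurons of mixed sizes is hypothetical and ignores H-Tree routing and capacitance costs.

```python
from typing import Sequence


def synapses_per_neuron(rows: int, cols: int, neurons: int) -> int:
    """Evenly split a rows x cols core (one memristor pair per cell) among neurons."""
    return (rows * cols) // neurons


def fits(rows: int, cols: int, sizes: Sequence[int]) -> bool:
    """Hypothetical check that neurons of mixed sizes fit in one core."""
    return sum(sizes) <= rows * cols


for n in (1, 20, 100):
    print(f"{n} neuron(s): {synapses_per_neuron(512, 512, n):,} synapses each")
```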
To correct the false assumption: a chip capable of AHaH plasticity is not necessarily a chip constrained to learning in only one way. We have only come to this understanding through our work with AHaH Computing. Learning algorithms become specific instruction set sequences or routines, where each operation results in Anti-Hebbian or Hebbian learning. At the lowest level, Anti-Hebbian just means “move the synapse toward zero” and Hebbian means “move it away from zero”. People have come to the misconception that we are only working with a ‘local’ or ‘fixed’ learning rule. On the contrary, we have defined an instruction set in which the local unsupervised rule (FF-RU, for example) is just one possibility, and which does not preclude global computations. We hold ML in very high regard, and it is the work of folks like Yann LeCun (and others) that is pushing the boundaries, using the tools available to them. They are the current undisputed champions in primary performance benchmarking. We think the field of neuromorphics should re-align with primary performance benchmarking as ML has. As for evidence that spikes and low-resolution synapses can work, it raises the question of how our brains do it, seeing that they are (ultra-efficient) spike-based networks with low-resolution synapses. From a hardware perspective, spikes and limited-precision synapses make the most sense. From the algorithmic perspective, work like this, which is motivated by attempts to map algorithms to more efficient hardware, demonstrates that limited-precision synapses can work better than full precision. Our results with the KnowmAPI support this. Our goal is to achieve ‘primary performance parity’ with state-of-the-art machine learning. We do not want to re-invent the field of ML; we want to achieve what it has already achieved in a much more efficient learning substrate.
The massive strides being made in machine vision (and almost everything else) are wonderful, and we are watching closely which algorithms and approaches are working best. We will take what we can and port it over to AHaH Computing, and we will invent new approaches if needed. Rather than ignore the adaptive power problem, like others do, we are working on a solution.
They are not actually the same thing. True-North is a mesh of programmable digital cores that pass spikes around. kT-RAM is a specification for an adaptive synaptic resource intended to be used as a co-processor within a variety of large-scale architectures. For example, the SRAM inside the True-North core could be replaced with a kT-RAM core (or cores), and the result would be on-chip learning, more synapses, lower power and more flexibility at the core level, constrained to the specific large-scale topology of True-North (a grid of cores). IBM would have to ditch most of their software and methodologies, so it's arguable whether it would be worth it compared to just building new large-scale architectures from scratch.
Not really. That is a mathematical idealization of a neuron. It's actually more like that in kT-RAM than it is in a real neuron, although I would say that the separation you are referring to is not like the separation in a digital computer. Also, you left learning out of this idealization. Just zoom into a (real) dendrite a little, or a (real) neuron, or anything (real). At some point you will see that the idealization of memory and processing being separate does not hold. To describe anything you need both state information (physical objects or ‘memory’) and transformations (laws of physics acting on those objects). A neuron does not separate memory and processing and shuttle bits back and forth; it is a merging of memory and processing. A synapse is not memory and it is not processing: it is a merging of the two. A soma is not memory and it is not processing; it's a merging of the two. And so on.
There are a few answers to this, ranging from specific to general. First, take a look at an image from some of our early work:
This confusion stems from the following quote from Alex Nugent, Knowm CEO:
“Knowm’s AHaH computing approach combines the best of machine learning and quantum computing via memristors”
Note the “best of” modifier. Quantum computing is about exploiting physics to solve problems (what Alex believes is the good part), using a state of nature that requires conditions of isolation not even seen in deep space (the bad part). We are taking (what we believe is) the good part. The intention of Quantum Computing is great, and it looks amazing on paper. You take a property of Nature and find a way to exploit it to accelerate computations. Attempting to build quantum entangled states for computing leads to incredibly complex machines with little in the way of practical value (so far) compared to the investment of time and money (30 years and billions in research funding). In AHaH Computing we are trying to understand how natural self-organization works, and to exploit these mechanisms to accelerate solutions to some problems. We do not need, nor do we want, to rely on quantum superpositions. We want to deploy practical technology that can do what brains can do by building, rather than calculating, adaptive neural circuits.
Great question. We would love for you to order some memristors, test them yourself, and ideally publish the results. We will make sure Dr. Campbell gets the feedback to further optimize. The question we have been more focused on is similar: how much resolution (in bits of information) is required? We have built the Knowm API around interchangeable core types, which enable one to plug in different memristor models to see how they work on real-world benchmark problems. 4 bits is good for many things, but 8 bits is generally enough for everything. There is also the issue of how you measure resolution. In AHaH Computing we are primarily interested in the incremental response. Given an upper and lower conductance bound, how many pulses on average does it take to move the conductance from high to low and from low to high? This is actually different from attempting to ‘set’ or ‘program’ the device in one step with an applied voltage or compliance current. As the voltage-time product (“voltage flux”) is lowered, the incremental response becomes more fine-grained. Another issue is time. A memristor’s conductance state may decay over time, which under some measurements would manifest as low resolution. However, keep in mind that a continuous learning system is also constantly repairing the state, or stated differently, performing constant error-correction. Based on the preliminary data we have seen coming from Dr. Campbell’s lab, we are guessing between 6 and 9 bits. However, this could change as we get more data or refine how it's measured. In terms of device variation, the memristors are adaptive over a conductance range. So long as these ranges overlap, the differential pair will ‘auto-tune’ via learning. If a memristor pair becomes non-adaptive, then the synapse is no longer capable of learning. Depending on the application, this may or may not matter. If the synapse was part of a classifier and the data statistics do not change, then it does not matter.
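One way to turn the incremental response into a bit count is sketched below. Both the linear device model (each pulse adds k * V * t of conductance) and the figure of merit (log2 of the number of pulses needed to cross the range) are assumptions for illustration, not Dr. Campbell's measurement method; real devices respond nonlinearly. In this model, halving the pulse width roughly doubles the pulse count, adding about one bit.

```python
import math


def pulses_to_traverse(g_min: float, g_max: float, volts: float, seconds: float, k: float) -> int:
    """Count pulses to drive conductance from g_min to g_max.
    Hypothetical linear device: each pulse adds k * volts * seconds (the 'voltage flux')."""
    step = k * volts * seconds
    g, n = g_min, 0
    while g < g_max:
        g += step
        n += 1
    return n


def resolution_bits(n_pulses: int) -> float:
    """Assumed figure of merit: log2 of the number of incremental steps across the range."""
    return math.log2(n_pulses)


coarse = pulses_to_traverse(1e-6, 1e-5, volts=0.5, seconds=50e-9, k=5.0)
fine = pulses_to_traverse(1e-6, 1e-5, volts=0.5, seconds=25e-9, k=5.0)
print(coarse, fine, round(resolution_bits(coarse), 2), round(resolution_bits(fine), 2))
```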
If the data statistics do change, then other synapses can adapt to compensate. Exploiting this adaptive behavior to repair around faulty synapses was some of the first work we did with the (unsupervised) AHaH rule, many years ago. See Reliable computing with unreliable components: Using separable environments to stabilize long-term information storage.
There are three methods. First, our memristors are programmable and can be set to defined resistance values. This does, however, complicate the driver circuitry in the AHaH Controller. Second, there is something called ‘current injection’ combined with the FF-RA instruction. This can also be used for A2D operations (converting analog signals into a spike code). Third, it's fast to sync a neural state. Given an original kT-RAM and a new kT-RAM, spike patterns are generated and the original's responses are recorded. The same spike patterns are loaded onto the new kT-RAM, and supervised learning (FF-RH or FF-RL instructions) syncs the states. This is fast because multiple synapses can be modified at the same time.
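A sketch of the third method, under stated assumptions: a node's response is the sum of the weights on its spiking lines, and “supervised learning” is modeled as a perceptron-style correction (an RH-like nudge when the copy should read high, RL-like when low) applied only when the copy disagrees. The margin filter, rates and sizes are illustrative, and this is not how the AHaH Controller actually schedules instructions.

```python
import random
from typing import List, Sequence, Tuple


def activation(w: Sequence[float], spikes: Sequence[int]) -> float:
    return sum(w[i] for i in spikes)


def sync(original: Sequence[float], n_patterns: int = 200, active: int = 8,
         rate: float = 0.05, max_epochs: int = 2000, seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    n = len(original)
    # 1. Generate spike patterns and record the original's response (sign of its output).
    patterns = []
    while len(patterns) < n_patterns:
        spikes = rng.sample(range(n), active)
        y = activation(original, spikes)
        if abs(y) > 0.5:  # assumption: keep only clear responses
            patterns.append((spikes, y > 0))
    # 2. Present the same patterns to a blank copy and correct it when it disagrees.
    copy = [0.0] * n
    for _ in range(max_epochs):
        mistakes = 0
        for spikes, target in patterns:
            if (activation(copy, spikes) > 0) != target:
                mistakes += 1
                delta = rate if target else -rate  # RH-like if target high, RL-like if low
                for i in spikes:  # every active synapse is nudged in the same step
                    copy[i] += delta
        if mistakes == 0:
            break
    agreement = sum((activation(copy, s) > 0) == t for s, t in patterns) / len(patterns)
    return copy, agreement


rng = random.Random(1)
original = [rng.uniform(-1.0, 1.0) for _ in range(32)]
copy, agreement = sync(original)
print(f"agreement on recorded patterns: {agreement:.2f}")
```

Note that the copy reproduces the original's recorded responses, not necessarily its exact weights.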
Memristors are non-volatile and hold their state. SRAM interface is used only to temporarily couple memristors to driver electrodes. I.e. to “load spike patterns”.
AHaH Computing “crosses the technology stack”, so my advice is to learn as much as you can about the whole process of computing, from silicon wafers to real-world application development. I would then focus on one or two levels of the stack.

Physics: Classical, including Electricity & Magnetism and Thermodynamics. If you want to go into memristor device fabrication you will need Quantum and Physical Chemistry. In terms of Thermodynamics, be careful here. Be sure to check out some “fringe” stuff like Constructal Theory. The goal is to understand how life works as a physical process, and modern physics education has little if anything to say about it. It can, however, give you solid foundations.

Electronics: You could specialize here, or just get a good overview. Introduction to electronics is a must. After that you should know the basics of VLSI and how chips are made, and circuit design, both analog and digital.

CS: Foundations in computing. You need to really understand how modern processors work, and how code gets turned into instructions that execute on processors. If you specialize here, do not forget about what is occurring at a physical level! Get really good at Linux, and understand how the operating system connects to low-level peripherals and co-processors.

Machine Learning: Overview of existing methods (decision trees/forests, SVMs, Neural Networks, Bayesian Models, etc.) and domains (perception, planning, control). Unsupervised learning. Focus on methods that have been commercialized in the real world and try to understand why. If you specialize here, do not forget about what is occurring at a physical level in the computers! The machine learning community is notorious for simultaneously exploiting hardware accelerators while de-emphasizing their importance. They (for the most part) see hardware as something that should support their algorithms, not something that is intertwined with algorithms and physics.
Neuroscience/Computational Neuroscience: Be very careful here! Use (biological) neuroscience as inspiration, but try not to fall into the “let's mimic” camp as a route to understanding learning algorithms. The brain is far too complex to focus on the minute details if your goal is useful technology. Many in the neuromorphic community have large blinders on with regard to real-world problem solving, and the results are chips that are only useful in a narrow context. The ML community is better focused in this regard, but they in turn have a “hardware blindspot”.

Rhetoric & Debate: Learn how to communicate effectively with other people. Get good at recognizing logical fallacies. Technology is intertwined with money and egos. For whatever reason, folks tend to get really stubborn and angry around ideas that threaten the status quo. They will react with arguments that may seem reasonable on the surface but are illogical on inspection. While technology is the foundation, our economy is driven by the interactions of people and human communications.
The PLOS ONE code is not the code we want people to develop with; it is there for read-only methods verification. The PLOS ONE paper code used a functional (mathematical) model. The problem with the functional model is that there are ways to use it that do not map to hardware. The KnowmAPI is built on a modular kT-RAM emulator, allowing users to test various “core types” with different memristor models. That is, it's a simulation of an actual circuit, with each core type offering a different level of resolution. The Knowm API is now available to individuals or organizations under an NDA as part of the KDC.
Moore’s law does not resolve the memory-processing duality. It's going to be us or somebody else, but the problem will be solved, because a tremendous amount of energy is currently wasted shuttling information back and forth, and this will not be resolved except through fundamental changes in computing architecture. At this point in time, much larger performance (speed, efficiency) increases can be achieved through code optimization and changes in processor architecture than through smaller transistors. Innovations are now being driven by alternative architectures. We are not aware of a more efficient architecture for synaptic integration and learning than one that eliminates the memory-processing duality and enables low adaptation voltages.
AHaH nodes are building blocks. You use them to build models, and we have only just begun doing that. We are aiming for all sorts of models across perception, planning and control. We have shown decision trees, linear classifiers, fully-connected layers, combinatorial solvers, robotic arm actuators, reconfigurable logic and more. Our goal with the paper was not to show a model. It was to show a generic adaptive building block circuit that can be used to make many models.
Backprop is a wonderfully useful algorithm and works well with current methods of computing. However, it is clear that backprop is not operating in the brain (at least not in the way it is currently formulated mathematically), and hence there are clearly other methods available. When constraining solutions based on physics and circuits, so as to side-step the adaptive power problem, backprop becomes problematic. It is important to understand how intertwined machine learning is with the computational platforms that enable it. When the constraints of those platforms change, the optimal machine learning models change. Large GPU backprop-trained models today contain perhaps 100 million to 1 billion weights. The human cortex contains about 150,000 billion synapses and consumes about a billion times less energy and space. That said, we’ve seen a few studies showing backprop operating in interesting ways that may be easy to port to kT-RAM and memristors in general, for example here. Here is one of backpropagation’s inventors, Geoffrey Hinton, talking about backprop and STDP. Once we develop and perfect AHaH-based methods that achieve primary performance parity with existing methods like backprop, Knowm will dominate in secondary metrics such as power efficiency, speed, and cost. At that point, backprop as it currently exists will be obsolete. Until then, memristors are likely still going to be the most efficient path to implementing the backpropagation algorithm in hardware.
AHaH is not just a ‘local rule’. kT-RAM instructions are paired, but the two halves do not have to occur one right after the other. Take FF-RH or FF-RL, for example. In the first instruction (FF) you read the device. You can then compare with other node activations and do whatever you want. You can then send an RH or RL (or whatever) command at a later time. If you want unsupervised learning you would send a compound instruction like FF-RU. By creating conditionals of instructions, the space of possible routines grows very large. It gets even bigger with “split-conditionals”: groups of synapses can be read together and sub-groups can get different instructions. When you pair kT-RAM instructions with programmable routines, the space of available learning algorithms is truly enormous.
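The deferred-feedback and split-conditional ideas can be modeled with a toy node in Python. This is a simplified, hypothetical model: the method names mirror the FF, RH, RL and RU instructions mentioned above, but the fixed-size nudges stand in for whatever the real memristor dynamics do, and RU is modeled as a purely Hebbian step (away from zero) for brevity.

```python
from typing import List, Set


class ToyAHaHNode:
    """Simplified, hypothetical model of paired read/feedback instructions.
    It is not the kT-RAM specification; only the instruction names come from the text."""

    def __init__(self, n: int, rate: float = 0.1) -> None:
        self.w: List[float] = [0.0] * n
        self.rate = rate

    def ff(self, spikes: Set[int]) -> float:
        """Read: activation of the synapses selected by the spike pattern."""
        return sum(self.w[i] for i in spikes)

    def rh(self, spikes: Set[int]) -> None:
        """Feedback that pushes the output of these synapses higher."""
        self._nudge(spikes, self.rate)

    def rl(self, spikes: Set[int]) -> None:
        """Feedback that pushes the output of these synapses lower."""
        self._nudge(spikes, -self.rate)

    def ru(self, spikes: Set[int], y: float) -> None:
        """Unsupervised feedback: reinforce the sign of the last read (Hebbian, away from zero)."""
        self._nudge(spikes, self.rate if y >= 0 else -self.rate)

    def _nudge(self, spikes: Set[int], delta: float) -> None:
        for i in spikes:
            self.w[i] += delta


node = ToyAHaHNode(4)
y = node.ff({0, 1, 2, 3})   # FF now...
# ...compare y with other nodes, run arbitrary logic, then send feedback later.
node.rh({0, 1})             # split-conditional: one sub-group gets RH
node.rl({2, 3})             # another sub-group gets RL
y2 = node.ff({0, 1})
node.ru({0, 1}, y2)         # FF-RU: unsupervised step
print(node.w)
```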
You are correct. For spreadsheets and email apps, that architecture is fine. But for processing the vast amounts of data needed for real-time machine learning apps, it’s a serious problem. CPUs are being designed with memory closer to the processor, they are making multi-core processors, and there are new chips like Adapteva’s Epiphany. All of these tricks reduce the distance between processor and memory. Nature, on the other hand, uses a different approach to computing. It’s a system where the processor is the memory. The distance is zero. With memristors, the same can be achieved. One way to do it would be our proposed Thermodynamic-RAM. We write about it in a paper, which has a historical background section to give some perspective.
Intelligence, like any tool, can be used for various purposes, both good and evil. We are well aware of the dangers and have put considerable thought into them. Many are thinking about it. A good recent book on the topic is Superintelligence. One common fear is that a single group or organization gains control of AI and uses it for selfish gain at the expense of others. With that scenario in mind, we have published our work in an international open-access journal. We are committed to making the technology generally available so all can benefit, and we hold ourselves to high moral standards. That said, if you are fearful about the potential misuse of a technology, the best thing you can do is learn about it! Intelligent machines are inevitable. The benefits are enormous, and there is an international race afoot. The single most important thing to do right now is to learn, so you can become part of the solution or help spot problems and enact proper regulation.
Our goal is to port successful machine learning techniques and algorithms to the AHaH computing paradigm. Currently only a few people know enough about AHaH Computing to do this, so we have focused first on developing the basic machine learning building blocks an AI programmer needs. We are open-sourcing the KnowmAPI under the KDC, including tutorials, video lectures, etc. We are also developing the SENSE Platform, a horizontally scalable ‘cloud computing’ platform (server network) with hooks to hardware accelerators like FPGAs, GPUs, Epiphanies, Automata Processors, and eventually kT-RAM. Most developers will not have to worry about the very low-level (hardware) or very high-level (network) details. Rather, they can focus on application and algorithm development.
Great question! There is a lot to this topic, but let me try to give you a short and simple answer. We are using differential memristors as synapses. The energy dissipated in a synapse is the power dissipated over both memristors for the duration of the read/write pulses. The power dissipated in one memristor is $P = V^2/R$, where V is the voltage and R the resistance. A typical on-state resistance of a memristor is 500 kOhm and a typical voltage is 0.5 V, so P = 0.5^2 / 500E3 = 5E-7 watts. The energy is only dissipated during read and write events, which occur in pulses of ~50 ns (nanoseconds) or less. The energy per memristor per synaptic read or write event is then 5E-7 W x 50E-9 s = 2.5E-14 Joules. Since the kT-RAM instruction set requires paired read-write instructions, and since a synapse is two memristors, we multiply that answer by four: 1E-13 Joules. This is 0.1 picojoules per adaptive synaptic event. Note that we could lower the voltage and pulse width to achieve even lower power for synaptic integration operations (i.e. no learning). Also note that capacitance plays a big role: a very large AHaH node will dissipate more energy as charge gets soaked up on the wires. If we say a human brain has 100 billion neurons, each with 1000 synapses, that fire on average once per second, that is 1E14 adaptive synaptic events per second. The brain consumes about 20 Joules in one second, so if we put all of that energy into synaptic events we get 20 / 1E14 = 2E-13 Joules (0.2 picojoules) per event. The actual deployed power consumption of kT-RAM depends on what sort of computing architecture it is embedded in. Very small cores are more efficient than larger cores for processing smaller spike streams. The purpose of an AHaH circuit is to remove the memory-processing bottleneck for adaptive learning operations. If you knew exactly the connection topology of a brain and made a custom ASIC of AHaH nodes, you would be looking at efficiencies comparable to, and possibly even exceeding, biology (eventually).
The reason for this is that our modern methods of chip communication can be considerably more efficient (and faster) than biology in some respects. However, if you have a more generic architecture that lets you explore more connection topologies, for example a mesh grid of small CPUs with local RAM and kT-RAM, you would expend more energy but get something more flexible: the ability to emulate any brain, not just one specific type.
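The back-of-envelope energy arithmetic above can be checked with a few lines of Python. Every input value is taken from the text; only the variable names are ours. The factor of four combines the two memristors in each synapse with the paired read and write of each adaptive event.

```python
# Inputs taken from the energy estimate in the text.
V = 0.5            # read/write voltage (volts)
R_ON = 500e3       # typical memristor on-state resistance (ohms)
PULSE_S = 50e-9    # read/write pulse width (seconds)

power_w = V ** 2 / R_ON                  # P = V^2 / R for one memristor
energy_per_pulse_j = power_w * PULSE_S   # one memristor, one read or one write
# Two memristors per synapse, and each adaptive event is a read plus a write.
energy_per_event_j = 2 * 2 * energy_per_pulse_j

# Biological comparison, using the figures quoted in the text.
NEURONS = 100e9
SYNAPSES_PER_NEURON = 1000
MEAN_RATE_HZ = 1.0
BRAIN_POWER_W = 20.0   # joules consumed per second

events_per_s = NEURONS * SYNAPSES_PER_NEURON * MEAN_RATE_HZ
bio_energy_per_event_j = BRAIN_POWER_W / events_per_s

print(f"kT-RAM estimate: {energy_per_event_j:.1e} J per adaptive synaptic event")
print(f"brain estimate:  {bio_energy_per_event_j:.1e} J per synaptic event")
```

It prints 1.0e-13 J and 2.0e-13 J, matching the two figures in the text. Wire capacitance, which the text notes can dominate for large nodes, is deliberately left out.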
Great question. As with any hardware that is integrated into an existing system, you need both a physical connection and some software “glue” between the components. A driver, which most people have heard of, is one example of such a glue layer. In our design of kT-RAM, we have developed the hardware interface and defined an instruction set. We describe it here in section C, Thermodynamic-RAM Instruction Set and Operation. While new ‘software stacks’ must be created, this does not mean we cannot easily transfer knowledge from one system to another. It is always possible to use the output of one learning system as a supervised input to another.
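Using one learning system’s output as the supervised input to another is essentially teacher-student training. The sketch below is a generic illustration, not a Knowm API: a fixed rule named `teacher` stands in for an already-trained system, and a plain perceptron stands in for the second system, which only ever sees the teacher’s outputs as labels and never any ground truth.

```python
import random


def teacher(x):
    """Hypothetical stand-in for an already-trained system on other hardware."""
    return 1 if 2.0 * x[0] - x[1] > 0 else -1


def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1


def train_student(samples, epochs=20, lr=0.1):
    """Perceptron trained only on the teacher's outputs."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x in samples:
            label = teacher(x)           # teacher output becomes the supervised target
            if predict(w, b, x) != label:
                w[0] += lr * label * x[0]
                w[1] += lr * label * x[1]
                b += lr * label
    return w, b


rng = random.Random(0)
samples = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
w, b = train_student(samples)
agreement = sum(predict(w, b, x) == teacher(x) for x in samples) / len(samples)
print(f"student agrees with teacher on {agreement:.0%} of samples")
```

The student does not need to know how the teacher works internally; only its input-output behaviour crosses the boundary, which is what makes this kind of transfer possible between different software stacks.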
No. GPUs and CPUs are absolutely great at what they were designed to do. G stands for ‘Graphics’, after all! They will likely be around for a long time. New types of hardware will become available that dominate machine learning applications and give everyday consumers access to an entirely new range of applications and abilities that are not possible right now. We recently proposed one such device that would plug into existing computer platforms and allow for many exciting new possibilities. It’s called Thermodynamic-RAM.
Many researchers take the “let’s decode the brain!” approach and want to simulate the brain on massive computer clusters. While this will most certainly provide insights into how the brain works, they will eventually be faced with the reality that their simulations cannot compete on power, density and speed with approaches that have addressed the issue directly. We have taken a different approach: build a chip that doesn’t necessarily emulate a brain but instead provides adaptive-learning functions at a foundational ‘physical’ level, and consequently beats other approaches on power and density. Our goal is then to match primary performance benchmarks.
There are various ways to go about it, and it’s a very significant problem with major consequences for effective solutions. We currently like prediction of future reward states. If you can predict the consequences of actions, then you can search over possible options and select the best one. Think about how quickly you can envision or imagine far into the future. This is an interactive process over a collection of prior memories or experiences, i.e. it’s ‘reflective’. Other, simpler action-reward approaches can also be useful. If you hook a bunch of AHaH nodes to the muscles of a robotic arm, it’s not hard to get them to actuate the arm.
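Prediction-based action selection can be sketched as exhaustive search over short imagined action sequences. This is a generic illustration under stated assumptions, not Knowm’s method: `predict_next_state` and `predict_reward` are hypothetical hand-written stand-ins for learned predictors, and the 1-D world is invented for the example.

```python
from itertools import product


def predict_next_state(state, action):
    """Hypothetical learned forward model; here trivial 1-D dynamics."""
    return state + action


def predict_reward(state, goal):
    """Hypothetical learned reward predictor: closer to the goal is better."""
    return -abs(goal - state)


def plan(state, goal, actions=(-1, 0, 1), horizon=3):
    """Imagine every action sequence of length `horizon` and keep the best one."""
    best_seq, best_total = None, float("-inf")
    for seq in product(actions, repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            s = predict_next_state(s, a)       # imagined consequence
            total += predict_reward(s, goal)   # predicted reward
        if total > best_total:
            best_seq, best_total = seq, total
    return best_seq[0], best_seq               # act on the first step only


action, imagined = plan(state=0, goal=5)
```

Only the first action of the best imagined sequence is executed; re-planning after each step lets the predictions be corrected by what actually happens. Exhaustive search grows exponentially with the horizon, so a practical system would need a smarter search.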
We really don’t know. It’s an important question, and we have theories about what attention is, but our main focus is on solving well-specified real-world problems.
We are not specifically focused on biology. Rather, we are interested in natural self-organization and how to harness it. We believe that biology has ‘hijacked’ or ‘co-opted’ a more fundamental process: energy attempting to dissipate itself through adaptive containers. Ultimately we are trying to solve real-world problems, not build brains, and although biology is a big source of inspiration, our technology is not going to look much like biology.
Don’t think about the Knowm fractal, think about what the Knowm fractal is built of, what we call “Knowm’s Synapse”. Knowm’s Synapse is a universal adaptive building block formed of two energy dissipation pathways competing for conduction resources. We can emulate Knowm’s Synapse with differential pairs of memristors.
AHaH Nodes are always adaptive. Unsupervised AHaH plasticity creates attractor states that repair state information as they are used: the act of using kT-RAM heals it. Some synapses could stop working, inputs could turn off or become chaotic, etc., and unsupervised AHaH plasticity will constantly retune the functioning synapses. Unsupervised AHaH plasticity attempts to find a bimodal projection that maximizes the margin between opposing distributions, so it tracks the structure of the data. This also leads to unsupervised (or semi-supervised) learning: once one or a few supervised examples have placed the node in the right attractor, it will continue to converge and its performance will improve without additional training labels. See figure 17 in this paper or figure 4 in this paper. This is how we first discovered AHaH: we trained an SVM, then applied unsupervised AHaH plasticity, and it got better because it was still converging on the test set.
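A deliberately tiny, deterministic sketch can illustrate the attractor-and-repair behaviour. It assumes a simplified functional update, Δw = α·sign(y) − β·y applied to every active, working synapse, which is not a device model; the patterns, constants, and function names are hypothetical. One supervised step per pattern selects the basin, unlabeled FF-RU use then drives the outputs to ±α/β, and after a synapse dies the remaining synapses are retuned back to the same output. It does not attempt to reproduce the margin-maximization or SVM experiments.

```python
ALPHA, BETA = 0.1, 0.1   # assumed constants; attractor at |y| = ALPHA / BETA = 1.0
PATTERN_A = (0, 1, 2)    # hypothetical input patterns driving disjoint synapses
PATTERN_B = (3, 4, 5)


def ff(w, active, dead=frozenset()):
    """Read: sum the weights of the active, working synapses."""
    return sum(w[i] for i in active if i not in dead)


def ff_ru(w, active, dead=frozenset()):
    """Read, then apply the assumed unsupervised update to working synapses."""
    y = ff(w, active, dead)
    sign = 1.0 if y >= 0 else -1.0
    delta = ALPHA * sign - BETA * y   # Hebbian term minus anti-Hebbian term
    for i in active:
        if i not in dead:
            w[i] += delta
    return y


w = [0.0] * 6
# One supervised example per pattern places the node in the right basin.
for i in PATTERN_A:
    w[i] += ALPHA   # like RH on pattern A
for i in PATTERN_B:
    w[i] -= ALPHA   # like RL on pattern B

# Unlabeled use: FF-RU only. Outputs converge toward +1 and -1.
for _ in range(200):
    ff_ru(w, PATTERN_A)
    ff_ru(w, PATTERN_B)
y_a_trained, y_b_trained = ff(w, PATTERN_A), ff(w, PATTERN_B)

# Damage: synapse 0 stops working, so pattern A's output drops.
dead = frozenset({0})
y_a_damaged = ff(w, PATTERN_A, dead)

# Continued unlabeled use retunes the two remaining synapses.
for _ in range(200):
    ff_ru(w, PATTERN_A, dead)
y_a_repaired = ff(w, PATTERN_A, dead)
```

With three working synapses the output obeys y ← 0.7·y + 0.3, and with two it obeys y ← 0.8·y + 0.2, so in both cases it returns to the same attractor at 1.0 without any label.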
A star-gate will open up and confetti will fall from the sky. But seriously, physical kT-RAM is all about power and space efficiency. Keep in mind, however, that kT-RAM is a general-purpose AHaH processor. While you can create more efficient AHaH processors for specific tasks, you give up generality and hence utility. We feel kT-RAM offers a good balance. However, improvements are inevitable!
kT-RAM is an “AHaH Node resource”. AHaH nodes can be used for things like feature learning, classification and combinatorial optimization. You can think of an AHaH Node as a ‘neuron’: you can make networks out of them and solve the same sorts of problems that neural networks solve. But you can also do more basic things like logic, memory, random number generation and set iteration. Our current approach combines unsupervised feature learning with multi-label semi-supervised classification. AHaH Nodes can be used in many ways, and we have only just started trying various things out. We believe AHaH nodes are universal adaptive resources, and our success to date has given us confidence that a great number of effective and efficient solutions can be attained with them.
kT-RAM is best understood as a learning co-processor. It excels at things like inference and classification. We are not aware of other computing substrates that provide access to higher power efficiency or synaptic density than kT-RAM. FPGAs are very useful generic hardware accelerators built with modern CMOS technology. Before physical production of kT-RAM, we are using FPGAs (and other hardware accelerators) as kT-RAM emulators in our development platforms.
First let me say that we admire Dr. Boahen and what he has done! We are all working toward similar goals and face difficult constraints. Neurogrid is a fantastic demonstration of how dedicated hardware can improve efficiency, but it has problems from a practical perspective. A big problem with the neuromorphic community is that they go for mimicry instead of functionality. For example: “Neurogrid simulates one million neurons with two subcellular compartments each, a choice motivated by neurophysiological studies.” Note the choice was not motivated by practical utility or real-world problem solving. Consequently, it is hard to take this platform and solve real-world problems. It is a research platform for computational neuroscientists, not industry folks with real-world machine learning needs. Topologies are restricted to structured patterns motivated again by “neuroanatomical studies”, and synaptic adaptation is limited to STDP. So you have to take those numbers and put them into context. This is why we are aligning with primary performance benchmarking.
Micron’s Automata Processor is a parallel and scalable regular expression matcher with limited dynamic reconfigurability. kT-RAM is a learning co-processor. kT-RAM can do things like unsupervised feature learning, inference, classification, prediction and anomaly detection. As an example, if you wanted to search Wikipedia for all occurrences of some word or word pattern that you specify, the Automata Processor would be great. If you wanted to learn a representation of that word and how it relates to other words (its meaning), you would be better off using kT-RAM.
A few reasons.