Context
Knowm Inc is a very small company in a field of giants, and because of this we face very difficult constraints above and beyond the (already extremely difficult) technical problems we are trying to solve. Our goal is to build an adaptive learning processor and advance the state of machine intelligence, and we have been at this a long time, bootstrapping all the way. We have exposed some of our work to the public, and we have a Reddit forum where we gather news and articles and answer questions when we feel we can. Our approach is different from others. Oftentimes people simply write us off as crazy or incompetent without taking the time to ask questions. In a lot of cases, we are dismissed for being small without ever being asked a single question! Every now and then we provoke a dialog with folks who are clearly intelligent, a bit angry or threatened, but also considerate enough to ask us questions and seek clarification. It’s only through these (sometimes rather uncomfortable) interactions that we can get feedback and help correct miscommunications. An anonymous internet stranger calling himself/herself Gordon-Panthana and identifying as an academic appears to have created an anonymous account for the sole purpose of testing/probing/trolling us. He/she has taken a great deal of time and effort to ask questions. While the tone of the questions was not always friendly, and the selective reading of our work and statements irritating, the discussion was nonetheless informative. Below is an accounting of these discussions, which we feel are helpful to readers trying to understand our work. We have committed the discussion to the Knowm.org blog in the event that Gordon-Panthana edits or deletes his/her anonymous account.
Follow Up: after another more aggressive round of trolling, we decided to track Gordon-Panthana’s IP address. This revealed that the account was likely coming from a major US technology company in California.
How to Read
The discussion is arranged into alternating sections where either Alex Nugent or Gordon-Panthana is speaking. In the sections where Alex is speaking, he will quote Gordon-Panthana, and vice versa. A quote looks like this:
"This is a quote"
No capacitive losses?
Gordon-Panthana:
Alex Nugent, in his talk at RIT on April 9th, claims that brains suffer no capacitive losses since the computation and memory is done the same place. Between 14:00 and 14:17 he says “d is zero” which, from his equation, implies that CV², the capacitive losses, must also be zero. How is that possible? Neuronal computation, even in the simplest models, is distributed across synapses, dendrites and the soma, not concentrated in a single point. Since the computation uses charged particles moving through space, capacitive losses would seem to be unavoidable. Explanation?
Alex Nugent, CEO Knowm Inc:
"claims that brains suffer no capacitive losses since the computation and memory is done the same place."
That is not correct. The capacitive losses associated with shuttling information back and forth between memory and processing for synaptic integration and adaption operations is reduced to zero. You still have the power associated with communication of spike patterns. That does not go away. Computers expend additional energy computing synaptic integration and adaptation, while brains (and all of nature) do not maintain a separation between memory and processing, and hence this loss (which is unique to computers and of large magnitude) is not present.
Gordon-Panthana:
"You still have the power associated with communication of spike patterns."
Yes, I agree that communication losses can be factored out.
"The capacitive losses...for synaptic integration and adaption operations is reduced to zero"
I don’t follow. Are you suggesting that biological synapses and memristors have no parasitic capacitance?
"brains...[do] not maintain a separation between memory and processing"
What about the summation/integration of the synaptically-scaled input spikes in a neuron? Doesn’t that have to be done in the dendrites and soma? Isn’t that a case of biological processing that’s spatially separated from memory?
Alex Nugent, CEO Knowm Inc:
"Yes, I agree that communication losses can be factored out."
Definitely did not say that. How can you factor communication losses out?
"I don't follow. Are you suggesting that biological synapses and memristors have no parasitic capacitance?"
No. I am saying that a biological neuron does not store memory as bits in one place and shuttle those bits over to a processor to be added together and weights computed only to be written back again. It’s all part of an integrated physical system that is both representing state and processing at the same time as information flows through it and energy is dissipated.
"What about the summation/integration of the synaptically-scaled input spikes in a neuron? Doesn't that have to be done in the dendrites and soma? Isn't that a case of biological processing that's spatially separated from memory?"
Not really. That is a mathematical idealization of a neuron. It’s actually more like that in kT-RAM than it is in a real neuron. Although I would say that this separation you are referring to is not like the separation in a digital computer. Also, you left out learning in this idealization. Just zoom into a (real) dendrite a little, or a (real) neuron, or anything (real). At some point you will see that the idealization of memory and processing being separate does not hold. To describe anything you will need both state information (physical objects or ‘memory’) and transformations (laws of physics acting on those objects). A neuron does not separate memory and processing and shuttle bits back and forth. It is a merging of memory and processing. A synapse is not memory and it’s not processing; it’s a merging of the two. A soma is not memory and it’s not processing. It’s a merging of the two. And so on.
Gordon-Panthana:
"[the neuron is an] integrated physical system that is both representing state and processing at the same time"
That model still has capacitive losses, and your comment “…as information flows through it and energy is dissipated [italics mine]” suggests you might agree. A neuron holds charge (we can measure it with a voltage probe) and thus it possesses capacitance. The neuron charge changes over time as input spikes are accumulated and output spikes are generated. That temporal evolution of charge will induce CV² losses. Thus:
Neurons have capacitive losses resulting from internal processing, adaptation, and spike communication. Physics demands it. And biology does its best to minimize it, for example with the myelin sheath found on some axons.
The confusion in the RIT talk comes from the slide around the 14:00 mark. That slide is dominated by the equation:
“E = 1/2 CV² = 1/2 d sigma V²”
The talk blames the high power of a von Neumann architecture overwhelmingly on the value of d. It’s claimed that simulating a human on a von Neumann machine would require 160 TW at 1 volt, but an actual human requires only 100W. Why the disparity? The slide is very clear about this: “In brains, all life, and everywhere in Nature, d is 0” Sticking the value d = 0 into the above equation yields
E = 1/2 CV² = 0 (!)
In other words, brains have no capacitive losses at all! That is not consistent with the biology of the brain and the laws of physics. My take-away was that Alex was saying that the 20W or so dissipated by a brain is due to resistive, inductive, and metabolic processes alone. Perhaps Alex didn’t mean that and was oversimplifying his presentation for dramatic effect, but that is what the slide and the talk clearly imply, and it is the reason for the original post (which, you’ll note, is titled “No capacitive losses?”).
Alex Nugent, CEO Knowm Inc:
We agree, and you have interpreted the talk in a way that is physical nonsense, so I can see why you are concerned. Of course energy is dissipated in a brain, and of course there are capacitive losses. There are just way more losses in a digital computer trying to calculate than in the real thing. It is extremely wasteful to calculate (via the separation of memory and processing) adaptation versus building intrinsically adaptive circuits. Energy in a physical neuron is not dissipated shuttling information back and forth between memory and processing as state variables are calculated, but this does not mean it does not dissipate energy (capacitive or otherwise).
"The talk blames the high power of a von Neumann architecture overwhelmingly on the value of d. It's claimed that simulating a human on a von Neumann machine would require 160 TW at 1 volt, but an actual human requires only 100W. Why the disparity? The slide is very clear about this: "In brains, all life, and everywhere in Nature, d is 0""
Yes. d is the memory-processing distance. There is no distinction between the two in a real physical system and hence the energy associated with this act of separation goes to zero. That does not mean this is the only contribution to the total energy dissipation, and it of course does not mean you can magically make a circuit that consumes zero energy. It means that calculation of very large numbers of interacting adaptive variables via the separation of memory and processing is overwhelmingly less efficient than building a very large interacting adaptive system directly. The example was meant to show just how significant this contribution can be, and why having an intrinsically adaptive element (the memristor) is exciting and useful.
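To make the role of d concrete, here is a minimal sketch of the slide's relation, E = 1/2 C V^2 with C = d * sigma, covering only the memory-to-processor shuttling term. The capacitance per unit length (about 0.2 fF per micron) and the distances are illustrative assumptions, not numbers from the talk, and the function name is hypothetical.

```python
def shuttle_energy_joules(distance_m, cap_per_m, voltage):
    """Energy to charge a wire of length d once: E = 1/2 * (d * sigma) * V^2."""
    capacitance = distance_m * cap_per_m  # C = d * sigma
    return 0.5 * capacitance * voltage ** 2

# Assumed wire capacitance per unit length (~0.2 fF/um); illustrative only.
SIGMA_F_PER_M = 2e-10
VOLTAGE = 1.0

cases = [("merged (d = 0)", 0.0), ("1 um", 1e-6), ("1 mm", 1e-3), ("1 cm", 1e-2)]
for label, d in cases:
    energy = shuttle_energy_joules(d, SIGMA_F_PER_M, VOLTAGE)
    print(f"{label:>15}: {energy:.3e} J per bit moved")
```

Setting d to zero removes only this term. Other sources of dissipation, capacitive or otherwise, are outside the sketch.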
Misquote about quantum computing and AHaH?
Gordon-Panthana:
This could be a misquote, but an article in EE Times (http://www.eetimes.com/document.asp?doc_id=1327068) quotes Alex Nugent, CEO of Knowm, as saying: “Knowm’s AHaH computing approach combines the best of machine learning and quantum computing via memristors.” Quantum? Their approach appears to be purely classical. What part of quantum computing is incorporated in AHaH?
Alex Nugent, CEO Knowm Inc:
There is no quantum computing in kT-RAM, and this is not a misquote, although I can see why it’s confusing. AHaH computing is ‘classical’ from a physics perspective. Like Quantum Computing we exploit physics to accelerate certain functions. Specifically we use the physics of memristors to eliminate the power and time normally associated with synaptic integration and adaptation. Unlike quantum computing, AHaH Computing’s ‘ahbits’ are easy to build with currently available fabrication technology; indeed, much of Nature appears to be built out of them. Unlike machine learning but similar to quantum computing, AHaH Computing embraces hardware as part of the solution and designs from the bottom-up (hardware) and top-down (AHaH compatible algorithms). The result is a broadly commercially viable machine learning system that can scale to biological levels of efficiency.
Gordon-Panthana:
"...like Quantum Computing we exploit physics to accelerate certain functions."
Quantum computing exploits quantum physics (superposition and entanglement) to accelerate computation at the system / architectural level. We agree that AHaH does not do that.
I understand you’re trying to draw an analogy here, but “quantum computing” in that quote is not only confusing, it’s easily perceived as overstating the capabilities of the architecture. That’s risky since it will cause some readers (those familiar with quantum computing) to immediately dismiss your approach out of hand.
"...but similar to quantum computing AHaH Computing embraces hardware as part of the solution"
I have no argument with embracing hardware as part of the solution, but that’s not unique to AHaH and quantum computing. Many neuromorphic architectures based on memristors have been proposed over the last several years. Every one embraces the memristor hardware in its architecture. AHaH belongs in that camp, not in the quantum computing camp.
Bottom line: We agree that AHaH does classical computing. We perhaps disagree about the sociological and public relations risks of using the phrase “quantum computing” when describing AHaH.
Alex Nugent, CEO Knowm Inc:
"That's risky since it will cause some readers (those familiar with quantum computing) to immediately dismiss your approach out of hand"
Many people are incapable of asking questions and presume quite a lot about the capability (or incapability) of others. It’s too bad. It’s also a tad hypocritical considering the vast discrepancy between QC theory and practice. But in the end it’s those people who operate on their assumptions and fail to ask questions that get left out when the party starts. We are not looking to convert QC folks, and it’s OK if they dismiss AHaH Computing, although of course we would like to work with everybody.
"easily perceived as overstating the capabilities of the architecture."
One big problem is how over-hyped QC is. If anything a rational comparison to QC would be negative. Yes, QC exploits quantum physics, but it’s still physics, and its capabilities are still limited to certain problem classes and (severely) constrained by practical considerations, not the least of which being that Nature abhors qubits. 30 years and billions in funding and what do we have? The promise of the QC? That, to me, is a pretty big overstatement of capabilities.
"Many neuromorphic architectures based on memristors have been proposed over the last several years."
Absolutely. We started in 2001 and generalized the 1-2, 2-1 and 2-2 AHaH circuits to all memristors by 2005, and have proposed numerous architectures before and since, started and advised programs like DARPA SyNAPSE and PI, and in general been part of the effort since before HP’s marketing engine did us all a favor. If others were not looking at this stuff, it would not be nearly as exciting! I actually remember a time when nobody was, and it sucked (to be blunt). Many designs are good for one thing or another, but not really a general-purpose architecture like kT-RAM, which is why we like it. This is one reason we are offering our memristors (both discrete and BEOL) for others to use, not just for kT-RAM and AHaH Computing, but whatever they want. There are so many cool circuits memristors enable, it’s going to be very neat to see what people do with them. What would you do?
As for neuromorphic designs, many are based on a programming mindset and side-step learning, or they presume perfect non-volatility or other ideals, or specific memristor properties like intrinsic diode to get around cross-bar problems like sneak-paths, or they forget about memristor forming and fail to design a circuit that offers an escape route, or they introduce topology constraints (crossbars again, although most are blind to the topology problem), or hard-wire very specific algorithms like WTA or kNN, or throw a massive amount of complexity at neuron circuits with dozens of tunable parameters but no general theory on what they are doing as a system, or fail to test against simple machine learning benchmarks before diving head-first into a chip design, or get obsessed with STDP without a good idea of how that would integrate into existing methods of computing and machine learning.
"We agree that AHaH does classical computing."
Classical in the sense of physics, yes. Although QM effects (tunneling) are exploited in memristor operation, we do not rely on QM superpositions (thankfully!).
Gordon-Panthana:
So, my bottom line summary was correct 🙂
I agree with many of your points, particularly about the vast amounts of money that have been wasted on premature neuromorphic hardware designs. And, yes, I’m very aware of the constrained potential of QC, though I’m more optimistic than you about it long term.
My post was not an attack on AHaH. Rather it’s questioning the way you were selling it in that particular quote. When I read the part about AHaH containing the “best of … quantum computing,” my gut reaction was to dismiss it as a statement by an overanxious car salesman. It just doesn’t make a good first impression.
Alex Nugent, CEO Knowm Inc:
"particularly about the vast amounts of money that have been wasted on premature neuromorphic hardware designs."
Hah! So I take it you are a quantum guy? 🙂
There is something other than traditional computing and quantum computing. It’s called “life”. Personally I am bored with physics focusing on the things you can’t touch (qubits) and the places you can’t see or go (black holes). WTF is life?
Gordon-Panthana:
"Hah! So I take it you are a quantum guy? :)"
Ouch, the ultimate insult 🙂
Alex Nugent, CEO Knowm Inc:
Is a brain a classical computer in your definition?
Gordon-Panthana:
Yes. Quantum coherence in a hot, wet brain seems like too big of a stretch.
Alex Nugent, CEO Knowm Inc:
Ok, so “classical computing” is relegated only to quantum computing. Funny how that line is drawn in computing when there are all sorts of analog, distributed, stochastic variations possible without going to quantum.
Gordon-Panthana:
There’s nothing wrong with analog or distributed or stochastic computing, I’ve done them myself. But they’re still classical since there are no system-level quantum interactions involved in the computation.
Analog architectures are tricky, though, since it’s easy for a designer to “sweep complexity under the rug” without realizing it. A common error is to implicitly assume that analog signal is a synonym for real-valued signal which can lead to absurdities in the analysis. A recent example of this is the Memcomputing fiasco. Memcomputing is a brain-inspired analog architecture which, the authors claim, can solve NP-complete problems with polynomial resources in polynomial time. It can’t, at least not in a physically realizable way, but it’s instructive to understand the “rug sweeping” errors that led them astray. Igor Markov has written an excellent summary of most of the errors here, and a more extensive discussion can be found in Scott Aaronson’s blog here.
Alex Nugent, CEO Knowm Inc:
"A common error is to implicitly assume that analog signal is a synonym for real-valued signal which can lead to absurdities in the analysis."
A typical problem is the assumption of zero noise, or perfect measurements, and the subsequent need for error-correction codes (something shared with QC). Unsupervised AHaH is, interestingly, doing essentially nothing but error-correction.
"But they're still classical since there are no system-level quantum interactions involved in the computation."
I get your definition. I’m commenting on how the line is drawn with quantum on the one side and everything else on the other. Think classical music or classical art. Why is quantum special such that everything else is “classical”? I get it from the physics perspective (or at least that’s how I was indoctrinated when I majored in it), but not really from the computing perspective. The word “classical” does not (or at least should not) imply “quantum” outside of physics.
"A recent example of this is the Memcomputing fiasco."
Well on the one hand I think anybody trying to push the boundaries is great, although I knew when I first read his claim about NP-complete he would be crucified by the quantum and computer science community.
"It can't, at least not in a physically realizable way, but it's instructive to understand the "rug sweeping" errors that led them astray."
Can you explain to me how quantum computing does not also do the same thing? Not so much errors in math, but rather sweeping the complexity off on the experimentalists. Why are you confident about quantum computing and the ability to create reliable qubits?
Gordon-Panthana:
"Can you explain to me how quantum computing does not also [sweep complexity under the rug]?"
Quantum computing sweeps the complexity under the Hilbert space rug. How and why does that work? Beats me. I think that’s the big mystery of quantum mechanics.
"Why are you confident about quantum computing and the ability to create reliable qubits?...Why is quantum special...from the computing perspective?"
I’m not confident at all, only more optimistic than you. Here are some great slides on these topics.
IBM’s ‘Rodent Brain’ Chip Could Make Our Phones Hyper-Smart
Neuromorphics:
“Once a model is trained in a massive computer data center, the chip helps you execute the model.” Hm, so no on-chip learning. I can see why they did that, but it would be so much cooler if it could learn in the field at hardware speeds.
Alex Nugent, CEO Knowm Inc:
Yeah, the ‘a’ and ‘p’ in SyNAPSE stood for ‘adaptive’ and ‘plastic’. That’s why I left my advisory role to develop AHaH Computing and kT-RAM. Two orders of magnitude lower power and 10x synaptic density and it learns. IBM memristor technology was non-existent at the time (still appears to be like that), so the direction they went was somewhat unavoidable.
Gordon-Panthana:
The IBM TrueNorth chip has 256 M synapses with a power density of 20 mW / cm². Since the chip uses 70 mW in operation, this suggests a size of roughly 2 cm × 2 cm, or ~ 64 M synapses / cm².
"...kT-RAM. Two orders of magnitude lower power and 10x synaptic density and it learns."
So a kT-RAM chip would hold ~ 640 M synapses / cm² with a power density of ~ 2 mW / cm². Correct?
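The arithmetic behind these estimates can be checked with a short sketch. It assumes, as one reading that reproduces the figure of roughly 2 mW per square centimeter, that “two orders of magnitude lower power” applies per synapse rather than per unit area. The variable names are illustrative.

```python
# TrueNorth figures as quoted in the discussion.
synapses = 256e6              # binary synapses on the chip
power_density_mw_cm2 = 20.0   # mW per cm^2
chip_power_mw = 70.0          # mW in operation

# Area implied by the two power figures, then the rounded 2 cm x 2 cm estimate.
area_cm2 = chip_power_mw / power_density_mw_cm2          # 3.5 cm^2
rounded_area_cm2 = 2 * 2                                 # "roughly 2 cm x 2 cm"
density_per_cm2 = synapses / rounded_area_cm2            # 64 M synapses per cm^2
power_per_synapse_mw = power_density_mw_cm2 / density_per_cm2

# Claimed kT-RAM scaling: 10x density, 100x lower power (assumed per synapse).
kt_density_per_cm2 = 10 * density_per_cm2
kt_power_density_mw_cm2 = kt_density_per_cm2 * power_per_synapse_mw / 100

print(f"implied TrueNorth area: {area_cm2:.1f} cm^2")
print(f"TrueNorth density: {density_per_cm2 / 1e6:.0f} M synapses per cm^2")
print(f"kT-RAM density: {kt_density_per_cm2 / 1e6:.0f} M synapses per cm^2")
print(f"kT-RAM power density: {kt_power_density_mw_cm2:.1f} mW per cm^2")
```

If the 100x factor were instead applied directly to power per unit area, the result would be 0.2 mW per square centimeter, so the per-synapse reading matters.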
Alex Nugent, CEO Knowm Inc:
"So a kT-RAM chip would hold ~ 640 M synapses / cm²"
It depends on the technology node of course. With SRAM interface, the kT-RAM cell is two memristors, an SRAM cell (6T), and a few pass-gates (2 or 3), or ~12T + 2M. TrueNorth is 9 SRAM bits per synapse (6 × 9 = 54). It’s possible to replace the SRAM with something resembling DRAM since it’s not required to hold a charge long, but SRAM is a good place to start. So at the 28 nm node, with SRAM interface, you’re looking at something more like 400-450 M/cm². Depending on thresholds of memristors you may be able to dispense with a pass-gate.
As for power density, you can’t directly compare total power density of TrueNorth. We look at the power for synaptic integration. Specifically, as stated in IBM’s report:
“Within a core, the average energy required to move a bit from local memory (SRAM) to the Controller was ~47 fJ (at 1kHz and 0.775V, reading all 428 million bits was 20mW).” “Specifically, a synaptic operation constitutes adding a 9 bit signed integer SGi to the membrane potential, which is a 20 bit signed integer, when axon Ai(t) = 1 and synaptic connection wi,j = 1.”
Source: A Million Spiking-Neuron Integrated Circuit with a Scalable Communication Network and Interface, Merolla et al. 2015
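As a quick consistency check on the quoted figures, the sketch below multiplies the per-bit energy by the number of bits and the tick rate. It assumes every one of the 428 million bits is read once per 1 kHz tick, which is how the quoted passage reads. The variable names are illustrative.

```python
bits_read_per_tick = 428e6   # bits read per tick, from the quoted passage
tick_rate_hz = 1e3           # 1 kHz
energy_per_bit_j = 47e-15    # ~47 fJ to move one bit from SRAM to the Controller

# Power = bits per second * energy per bit.
power_w = bits_read_per_tick * tick_rate_hz * energy_per_bit_j
print(f"memory read power: {power_w * 1e3:.1f} mW")  # about 20.1 mW
```

The result lands on the 20 mW stated in the quote, so the two numbers describe the same measurement.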
If this is confusing, Dharmendra Modha puts it nicely in a new blog:
“There are two factors: technology and architecture. Unlike today’s inorganic silicon technology, the brain uses biophysical, biochemical, organic wetware. While future enabling nanotechnology is underway, we focused on the second factor: architecture innovation…” –Dharmendra Modha
You can think of kT-RAM as the “future enabling nanotechnology” that is underway.
Gordon-Panthana:
"TrueNorth is 9 SRAM bits per synapse"
There’s only 1 SRAM bit per synapse in TrueNorth–the synapses are binary. The 9 bits of synaptic weight range comes from a configurable multiplier, which they call SjGi in the Supplementary text. But that multiplier is configured per-axon, not per-synapse. A non-binary, integral weight in TrueNorth requires the use of multiple binary synapses and multiple axons.
Thus a fair comparison requires a scaling down of the 256 M synapses in TrueNorth. For example, for 4-bit synaptic weights, TrueNorth has effectively only 64 M synapses. This, of course, works in kT-RAM’s favor.
So, the first question: How many bits of information, on average, do you anticipate storing in an AHaH synapse (memristor pair) given the “noise” induced by device variation plus adaptation and decay processes?
"As for power density... We look at the power for synaptic integration."
In the Knowm FAQ, you estimate the resistive losses in the memristors at 0.1 pJ per synaptic operation. To evaluate a node (neuron), the memristor-pairs (synapses) for that node are enabled (driven with voltage sources) to sum currents in the H-tree. That summation will require additional energy expenditure.
So, the second question: For a 4 cm² kT-RAM chip, approximately how much energy will be dissipated in the H-tree when evaluating a node with, say, 64 synapses?
Alex Nugent, CEO Knowm Inc:
"There's only 1 SRAM bit per synapse in TrueNorth--the synapses are binary. The 9 bits of synaptic weight range comes from a configurable multiplier, which they call SjGi in the Supplementary text."
Crazy eh? Also, nice catch. I didn’t realize they kept the 1 bit per synapse. I figured they used that to meet the DARPA metrics but then fixed it in later cores.
"So, the first question: How many bits of information, on average, do you anticipate storing in an AHaH synapse (memristor pair) given the "noise" induced by device variation plus adaptation and decay processes?"
Keep in mind that we are dealing with attractors that constantly adapt and heal, and not a typical “set/read”. Each time you use a synapse, you fix it. We have built the Knowm API around interchangeable core types, which enable one to plug in different memristor models to test what is required for various problem types. As for our actual devices, you can purchase some memristors (or raw die soon) and test this yourself. Best order soon, they are going fast.
"to sum currents in the H-tree. That summation will require additional energy expenditure."
Correct, and for that we have performed SPICE simulations, which we have verified with independent SPICE simulations by RIT.
"For a 4 cm² kT-RAM chip, approximately how much energy will be dissipated in the H-tree when evaluating a node with, say, 64 synapses?"
Why would you want to build a 4 cm² kT-RAM chip and then only access 64 synapses? It would be much better to embed it within a multi-core architecture like TrueNorth or Epiphany. The whole point of this game is to minimize communication distance, although I suppose at some point on-chip optics will play into this equation. As for numbers, you are more than welcome to perform your own circuit simulations. If you join the KDC, you could take a look at our code for generating the SPICE netlists and help us advance the technology.
Who are you and who do you work for?
Gordon-Panthana:
"How many bits of information...in an AHaH synapse?"
"you can...test this yourself"
The memristors you’re selling (which, from the graphs on the sales page, have on-resistance in the ~ k-ohm range) don’t match the characteristics you anticipate in the Knowm FAQ: “Typical on-state resistance of memristor are 500kOhm.” So testing this myself would be pointless.
"Why would you want to build a 4 cm² kT-RAM chip and then only access 64 synapses"
You wouldn’t, but you would certainly want to execute large neural circuits. TrueNorth claims 256 M synapses and 1 M neurons in a 4 cm² chip at less than 100 mW, so kT-RAM would have to at least match that to be competitive.
"It would be much better to embed it within a multi-core architecture like TrueNorth or Epiphany."
Your technology stack paper says that kT-RAM is a “general computing substrate geared towards reconfigurable network topologies and the entire spectrum of machine learning application range.”
So if a customer can only afford one AHaH chip as a coprocessor, wouldn’t kT-RAM be the one they want?
Your comments suggest that you are, perhaps, backing away from kT-RAM.
"you are more than welcome to perform your own circuit simulations."
I already have, a parametric SPICE model including parasitic capacitance. That’s why I asked the question about energy dissipation in the H-tree.
Alex Nugent, CEO Knowm Inc:
"So testing this myself would be pointless."
Actually you appear to be avoiding it. Replace “typical on-state resistance” with “average resistance” (which is what was meant) and we are pretty much on target. That said, memristors are constantly being developed to optimize a number of properties, with resistance being just one. And seriously, you can buy them and test them! We would love the feedback.
"TrueNorth claims 256 M synapses and 1 M neurons in a 4 cm² chip at less than 100 mW, so kT-RAM would have to at least match that to be competitive."
Being “competitive” is about both primary and secondary metrics. That said, of course. You also appear to want to directly compare TrueNorth with kT-RAM. As I have said before, they are not exactly equivalent. You could put, for example, a kT-RAM core into a TrueNorth core to increase its efficiency and density. Or you could have a big monolithic kT-RAM. Or other options.
"So if a customer can only afford one AHaH chip as a coprocessor, wouldn't kT-RAM be the one they want?"
What are they trying to accomplish? I presume you are talking synaptic integration and learning, and if so, then yes.
"Your comments suggest that you are, perhaps, backing away from kT-RAM."
What about my comments suggest that?
"I already have, a parametric SPICE model including parasitic capacitance. That's why I asked the question about energy dissipation in the H-tree."
Cool, what parameters did you use? (core size, memristor resistances, interconnect resistance and capacitances, etc). Did you use any form of optimization in the H-Tree (choking, buffers, etc?). What electrode configuration did you use? What driving pattern? What synaptic sparsity?
Gordon-Panthana:
"[kT-RAM] SPICE model"
"Cool, what parameters did you use?"
The usual electrical properties for CMOS (you mentioned the 28 nm node, but I can plug in whatever). The architectural parameters are from your kT-RAM papers, but those are configurable as well (the model is a Python program that synthesizes a SPICE deck). The memristor device parameters are from the Knowm FAQ and your comments about memristor density above. H-tree optimizations were generous in that they favored kT-RAM power performance even when they would significantly increase cost. For my first pass, I skipped over some of the overhead processes (enabling memristor drivers, executing kT-RAM instructions) but those are easy to add later.
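Gordon-Panthana's model is not shown in the thread. Purely as a hypothetical illustration of what “a Python program that synthesizes a SPICE deck” can look like, the sketch below emits a netlist for a pulsed source driving an RC ladder (standing in for a wire) into a resistive load. The function name, node names, and all component values are placeholders, apart from the 500 kOhm load, which borrows the on-state figure quoted from the Knowm FAQ.

```python
def rc_ladder_deck(segments, r_seg_ohm, c_seg_farad, r_load_ohm, v_drive=1.0):
    """Return SPICE deck text: pulsed source -> RC ladder -> resistive load."""
    # The first line of a SPICE deck is treated as the title.
    lines = ["* hypothetical RC-ladder wire model (illustrative values only)"]
    # PULSE(v_low v_high delay rise fall width period)
    lines.append(f"V1 n0 0 PULSE(0 {v_drive} 0 1n 1n 50n 100n)")
    for i in range(segments):
        # One series resistor and one shunt capacitor per wire segment.
        lines.append(f"R{i + 1} n{i} n{i + 1} {r_seg_ohm}")
        lines.append(f"C{i + 1} n{i + 1} 0 {c_seg_farad}")
    lines.append(f"RLOAD n{segments} 0 {r_load_ohm}")
    lines.append(".tran 0.1n 200n")  # transient analysis: step, stop time
    lines.append(".end")
    return "\n".join(lines)

deck = rc_ladder_deck(segments=4, r_seg_ohm=50, c_seg_farad=2e-15, r_load_ohm=500000)
print(deck)
```

Dropping the capacitor lines from such a deck is the kind of change that separates a resistive-only power estimate from one that includes charging losses.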
My inspiration for building this was this quote from your Cortical Processing paper:
"RC delays and capacitive effects of switching have been ignored for now"
When I ignore capacitance, I get the same power results you do. When I add in capacitance, I get radically different results.
Care to guess how they differ?
Alex Nugent, CEO Knowm Inc:
"I get the same power results you do."
What “results” are those? Are you using the FAQ power number?
"RC delays and capacitive effects of switching have been ignored for now"
That is in reference to our digital emulator within the Knowm API, which is focused on learning and primary performance benchmarking. That is not the same as SPICE simulations. They are solving different problems.
"When I ignore capacitance, I get the same power results you do. When I add in capacitance, I get radically different results."
If you are trying to compare a kT-RAM SPICE simulation with the dissipative power of an AHaH synapse, then of course. What “results” of ours are you talking about? Given that we have not published our SPICE simulations, and we have corroborated with an independent party, I am pretty curious what you are attempting to compare.
While you’re at it, could you run a simulation of a 256×256 crossbar with a 2-1 AHaH configuration? If you want to more directly compare with TrueNorth, that would be the way to go since TrueNorth is topologically constrained. Otherwise you should probably use 16×16 kT-Core sizes. We would love it if you could publish your results, and especially the code. We offer bounties in exchange for KDC fees, and a nice blog article (to be published on knowm.org) summarizing your work could constitute such a bounty.
Gordon-Panthana:
"…256×256 crossbar with 2-1 AHaH configuration… 16×16 kT-Core sizes."
Those are really, really tiny configurations, nowhere close to the 256 M synapses in TrueNorth, so those are uninteresting to me.
I realize that a direct comparison between TrueNorth and kT-RAM is not possible. I’m interested only in a “ballpark” comparison. You ballparked it yourself when you said above that kT-RAM had “Two orders of magnitude lower power and 10x synaptic density” compared to TrueNorth. That’s what I’m interested in verifying.
We would love it if you could publish your results
I would consider this. If I write such a report, it would be with the intent of being easily reproducible by, say, a junior EE student. So the report would completely specify all electrical and architectural parameters, SPICE simulation methodology (e.g. wire model, optimizations), and additional assumptions (neuron fan-in, fan-out, …) I used to build the model. It would not cover learning, only inference energy and throughput.
The model kT-RAM would be based on the two kT-RAM papers in arXiv: “Thermodynamic-RAM Technology Stack” (submitted 21 Jun 2014) and “Cortical Processing with Thermodynamic-RAM” (submitted 14 Aug 2014). It would also use parameters from the Knowm FAQ and from comments you have made in this thread.
The kT-RAM size would be 8K x 8K, yielding 64 M AHaH synapses. This is roughly equivalent to the 256 M TrueNorth binary synapses if we assume that each AHaH synapse can hold, on average, four bits of information. This “big monolithic kT-RAM” (as you put it) is by far the most interesting architecturally since, as you point out, it is “geared towards reconfigurable network topologies and the entire spectrum of machine learning application range.” That sounds like an awesome chip–who wouldn’t want that?
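The equivalence claimed above is simple arithmetic and can be checked with a short sketch. It assumes "K" and "M" are the binary prefixes (1024 and 1024²), which the text does not state explicitly, and it takes the four-bits-per-synapse figure as the stated assumption rather than an established property of AHaH synapses.

```python
K = 1024          # assumed binary prefix
M = K * K


def synapse_count(rows, cols):
    """Number of crosspoint synapses in a rows x cols array."""
    return rows * cols


ahah_synapses = synapse_count(8 * K, 8 * K)
bits_per_synapse = 4                      # assumption made in the text
equivalent_binary_synapses = ahah_synapses * bits_per_synapse

print(ahah_synapses // M, "M AHaH synapses")
print(equivalent_binary_synapses // M, "M binary-synapse equivalents")
```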
The logical place to post that report would be in the hardware subreddit. That audience would be most likely to give informative feedback which would allow me to correct mistakes or faulty assumptions in my model and rerun it. It would also be a neutral forum, unlike your blog.
But there is one condition: that you post a similar report on your simulation to the hardware subreddit. No need to publish code, just the report. That would allow us to converge our models and eliminate any of my misunderstandings of your papers. The big winners, of course, would be your customers, who would then see transparent simulations of total system power and performance, not just the resistive losses in the memristors.
Interested?
Alex Nugent, CEO Knowm Inc:
Those are really, really tiny configurations, nowhere close to the 256 M synapses in TrueNorth, so those are uninteresting to me.
But that would be the only way to compare honestly the synaptic integration efficiency and density, which is what I was referring to. kT-RAM is an adaptive synaptic resource, like SRAM is a memory resource, and it’s intended to be embedded in a number of ways into a number of larger-scale architectures, just like SRAM is embedded into True-North cores.
I'm interested only in a "ballpark" comparison.
No you’re not. You are interested in building a simulation of a massive kT-RAM core that will introduce massive capacitive losses in a hope to make True North seem better, or kT-RAM seem worse. The size of a kT-RAM Core should be only as big as the largest expected neuron. A 512X512 Core would give you 262,144 max synapses per neuron, which is about as big as they get in brains but also about as big as we have ever found a need for. How you tile the cores and communicate between them plays a big role, of course, but I get the feeling you know that.
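The core-size figure quoted here, and how it relates to the 8K x 8K array proposed earlier, can be checked with a minimal sketch. It assumes "8K" means 8192 and only counts synapses. It says nothing about how tiled cores would communicate, which the text notes plays a big role.

```python
def core_synapses(side):
    """Maximum synapses per neuron in a square side x side core."""
    return side * side


def cores_needed(total_synapses, side):
    """Number of side x side cores needed to hold total_synapses (rounded up)."""
    per_core = core_synapses(side)
    return -(-total_synapses // per_core)   # ceiling division


per_core = core_synapses(512)
monolithic = core_synapses(8192)            # the 8K x 8K array, assuming K = 1024
tiles = cores_needed(monolithic, 512)

print(per_core, "synapses per 512x512 core")
print(tiles, "such cores hold the same synapse count as one 8K x 8K array")
```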
If I write such a report, it would be with the intent of being easily reproducible by, say, a junior EE student.
Sounds great. All reports should be like that! I would of course prefer you do this within the KDC where those EE students could know who you are and ask you questions.
It would not cover learning, only inference energy and throughput.
To say you want to build an 8kX8k kT-RAM core and then compare this to a whole True North chip on energy and throughput is, from an EE perspective, completely ridiculous. Capacitive losses in kT-RAM would be very high in this case, and throughput would be low. Of course, this massive kT-RAM would be able to learn, and when compared honestly with True-North would need to include the supercomputers used to train it. But as you say, you will not concern yourself with that detail. Really?
The kT-RAM size would be 8K x 8K, yielding 64 M AHaH synapses. This is roughly equivalent to the 256 M TrueNorth binary synapses if we assume that each AHaH synapse can hold, on average, four bits of information.
This is not even remotely equivalent to the 256 M True North synapses, which (1) can’t learn and (2) are constrained to a fixed topology of 256 synapses per neuron and 256 neurons per core. If you want an honest comparison to True North, you would look at the synaptic power and density within each True-North core and compare that to a kT-Core of equivalent size. For completeness, you should also compare to a memristive cross-bar that matches true-north core topology, as I have said.
This "big monolithic kT-RAM" (as you put it) is by far the most interesting architecturally
If it's interesting to you, then by all means go for it. As I mentioned before, the whole point of the neuromorphic game is to reduce the total communication distance. By building such large cores, you do gain a lot of flexibility but you pay for it in capacitive losses. Are you aware of any machine learning algorithm that requires neurons with 64M synapses?
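The trade-off between core size and capacitive loss can be made concrete with a toy wire model. This is a sketch under loud assumptions, not either party's SPICE model: wire capacitance is taken to grow linearly with the number of cells a wire spans, an access is assumed to charge one full row wire and one full column wire, and the per-cell capacitance and voltage are hypothetical placeholders. H-tree routing, drivers, and leakage are ignored.

```python
def wire_energy(cells_spanned, cap_per_cell, voltage):
    """C * V^2 energy (J) to charge a wire whose capacitance scales with its length."""
    return cells_spanned * cap_per_cell * voltage ** 2


def access_energy(core_side, cap_per_cell, voltage):
    """Toy energy (J) to reach one crosspoint: one row wire plus one column wire."""
    return 2 * wire_energy(core_side, cap_per_cell, voltage)


if __name__ == "__main__":
    cap_per_cell = 1e-15   # hypothetical: 1 fF of wire capacitance per cell
    voltage = 0.5          # hypothetical drive voltage
    for side in (16, 256, 512, 8192):
        print(side, access_energy(side, cap_per_cell, voltage))
```

Under these assumptions the cost of reaching a single synapse grows in proportion to the side length of the core, so the same neuron costs more to evaluate in a large core than in a small one. A real layout would change the constants and possibly the exponent.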
But there is one condition: that you post a similar report on your simulation to the hardware subreddit.
All such data is openly available to members of the KDC, as I have said before. So no, we do not accept that condition.
That would allow us to converge our models and eliminate any of my misunderstandings of your papers.
That is the point of the KDC: to create a respectful, non-anonymous environment where we can all innovate. As for misunderstandings, we appear to have corrected quite a few in our comments. If you publish results and take everything out of context, I’m sure others will correct you and, if we have time, we will as well.
The big winners, of course, would be your customers, who would then see transparent simulations of total system power and performance, not just the resistive losses in the memristors.
You are free to work on whatever you like, and publish whatever you like (of course!). And if you are published, we’d love to see your work. To say that the big winner here is our customers is pretty disingenuous, and you know that. I only ask that if you do draw a comparison in published work to True North that you are honest about what each platform is, and is not, capable of. You should also emphasize that True North and kT-RAM are not incompatible in the sense that a kT-RAM core could be embedded into a True-North core, replacing its current 256 neuron ‘digital cross bar’ with something less topologically restricted, denser, and more efficient.
As before, we would welcome your work published on Knowm.org, and please contact us if you would like to be part of the KDC. Despite the fact that you are being highly selective in your interpretation of our published work and appear to have a hidden agenda, your questions are actually really great. It's been hard to not answer them! However, to be frank, if you really want openness then you need to step up and identify yourself and your interests. You are being rather cowardly in that respect. From here on out I will only respond to you within the r/knowm forum.
Gordon-Panthana:
...an 8kX8k kT-RAM core... is, from an EE perspective, completely ridiculous. Capacitive losses in kT-RAM would be very high in this case, and throughput would be low.
We agree. kT-RAM scales horribly. Power scaling is much worse than linear in the number of AHaH nodes and must be restricted to small configurations. A large kT-RAM would indeed be ridiculous!
When I asked you above about kT-RAM being competitive with TrueNorth, you said that I “could have a big monolithic kT-RAM” without mentioning that it would be ridiculous. Why is that?
Your papers also suggest a scalability that is not realizable. Figure 4c of your Technology Stack paper shows how kT-RAM cores can be tiled to create “very large kT-RAM,” and section II.D reinforces that by noting that “Cores can be electrically coupled to form a larger combined core.” You even write:
“While at first glance it appears that this architecture leads to one giant AHaH Node per chip or core, the core can be partitioned into smaller AHaH nodes of arbitrary size by temporally partitioning… [italics mine]”
Yes, you can do everything you claimed in those quotes, but there’s not a hint in the paper that tiling like that would cause power to blow up in a super-linear way and lead to a power-hungry chip.
massive kT-RAM would be able to learn... But as you say, you will not concern yourself with that detail. Really?
Yes, really. There’s no evidence that your single learning law combined with limited precision synapses (6 to 9 bits in this latest claim) and “spiking” neurons could get you anywhere close to state-of-the-art performance on benchmarks like Imagenet. There are a lot of reasons to believe it can’t.
The size of a kT-RAM Core should be only as big as the largest expected neuron. A 512X512 Core...as big as we have ever found a need for.
So you’re saying that a 512 x 512 kT-RAM core is a reasonable size?
you need to step up and identify yourself and your interests. You are being rather cowardly in that respect.
You’re forgetting that this is Reddit. See core value 3: Respect anonymity and privacy
But I’ve already told you in the knowm subreddit that I’m independent and that my interests are academic. I don’t work for anyone who could possibly be a competitor to you.
From here on out I will only respond to you within the r/knowm forum.
I stopped posting there because you told me discussions were conditioned on my identifying myself to you, in violation of Reddit core value 3. It sounds like you’ve lifted that ban.
Alex Nugent, CEO Knowm Inc:
I really do want to respond to your technical questions, because I can see you are an intelligent and passionate person and they are great questions. However, you are being highly selective in how you interpret our work, and you appear to be driven by anger rather than curiosity and the excitement of creating new things. You are treating kT-RAM as a whole solution, rather than a part of a solution, and you are picking apart every statement we make as if this is some ultimate truth or claim, attempting to compare it to some other statement or claim completely out of context. If you would like to have technical discussions and participate in constructive science and engineering, which we are all about, then you should step up and identify yourself. Not to the world, but to us. Just send a private message and let us know who you are. And again, you should join the KDC if/when you think what we are doing has merit. Perhaps you never will, but the offer is open.
As for ‘evidence’ on image recognition (where did that come from?), I want you to know that we hold ML in very high regard, and it is the work of folks like Yann LeCun (and the others) that is pushing the boundaries, using the tools available to them. They are the current undisputed champions! We also think the field of neuromorphics should re-align with primary performance benchmarking like ML has. As for spikes and low-resolution synapses, it does beg the question of how your brain does it, seeing that it’s an (ultra efficient) spike-based network with low-resolution synapses. But perhaps your own brain is not good evidence for you. Kind of ironic when you think about it!
I also want to remind you that, in terms of our memristors and their properties, you can actually buy them and test them yourself. To achieve higher resolutions you will need small pulse widths (500ns or less) and low voltages, and will likely need to order raw dies and use a probe station. We really appreciate feedback. We are a very small team with limited resources, and we appreciate all the help we can get.
Finally, in response to your link to Yann LeCun's Facebook post, you may be interested in our review of the Karles Invitation Conference, specifically day two and the presentation of Dr. Jackson from IBM’s True-North team.
Startup Knowm combines machine learning, quantum computing via memristors
Alex Nugent, CEO Knowm Inc:
The title of the article is a misrepresentation of the original press-release. There Is No Quantum Computing occurring in kT-RAM or AHaH Computing. It’s a new form of computing made possible with memristors, comparable to QC due to its exploitation of physics.
Gordon-Panthana:
The title of the article is a misrepresentation of the original press-release.
The article is about Knowm, not the press release, and the title is an accurate statement of claims made by Knowm. Compare the title with the following statement by Alex Nugent, CEO of Knowm, in EE Times and quoted in the second paragraph of this article: “Knowm’s AHaH computing approach combines the best of machine learning and quantum computing via memristors.”
The title is just a compressed version of that quote.
There Is No Quantum Computing occurring in kT-RAM or AHaH Computing.
Agreed. But you can’t fault the author for quoting you from a published source. If there’s misrepresentation here, it was in the original EE Times quote, not this article.
Alex Nugent, CEO Knowm Inc:
Good grief. Quantum computing is about exploiting physics to solve problems (the good part), using a state of nature that requires conditions of isolation not even seen in deep space (the bad part). We are taking (what we feel is) the good part. Just to clarify, the game of “telephone” starts with a press-release, where one or more reporters pick it up. That gets translated into text, which others then take and change and twist. At least folks like you seek clarification, although you are certainly selective in how you interpret what we have said.
Gordon-Panthana:
the game of "telephone" starts with a press-release... which others then take and change and twist.
The author changed and twisted nothing. The title was an accurate paraphrasing of a quote you gave to EE Times which you have acknowledged was not a misquote. To now accuse the article of “misrepresentation” is disingenuous. You owe the author an apology.
Quantum computing is about exploiting physics to solve problems (the good part)...
To say that the “good part” of quantum computing is the non-quantum part is absurd. If you take out quantum, it’s not quantum computing. It’s like saying that the “good part” of string theory is the non-string stuff. By your re-definition, a television incorporates the “best of quantum computing.”
although you are certainly selective in how you interpret what we have said
No, I am literal in interpreting what you said, as was the author of this article. Your statement that AHaH contains “the best…of quantum computing” has a straightforward interpretation. I know of no one who would interpret it to mean that AHaH “contains no quantum computing whatsoever.” The “exploiting physics” part is trivial–all computing systems do that, not just AHaH.
Alex Nugent, CEO Knowm Inc:
The author changed and twisted nothing.
I'm not saying it was intentional, but going from “Knowm’s AHaH computing approach combines the best of machine learning and quantum computing via memristors,” to “Startup Knowm combines machine learning, quantum computing via memristors” does change the meaning.
To say that the "good part" of quantum computing is the non-quantum part is absurd.
No it's not.
If you take out quantum, it's not quantum computing.
Correct.
It's like saying that the "good part" of string theory is the non-string stuff.
That's funny, because I could see that. The good part of string theory is that it is an attempt to unite the physical forces and find a common cause to the laws of the universe (its intention). The bad part is that those strings lead to a complex description of multiple dimensions and an astronomical number of possible universes, with little in the way of practical value (so far). The intention of Quantum Computing is rather good, and it looks amazing on paper. You take a property of Nature and find a way to exploit it to accelerate computations. Attempting to do so leads to incredibly complex machines with little in the way of practical value (so far) compared to the investment of time and money.
The "exploiting physics" part is trivial--all computing systems do that, not just AHaH.
Yes and no. Quantum computing exploits superpositions to accelerate some types of problems beyond what would be possible if everything was explicitly calculated. AHaH Computing exploits adaptation at the hardware level to accelerate some types of problems beyond what would be possible if everything was explicitly calculated. They are similar in that regard, and that is the comparison that was being made. I’m sorry you can’t see or understand that.
Gordon-Panthana:
OK, I see. The author was negligent for not understanding that when you said “best…of quantum computing” that you really meant exploit physics, but using absolutely no quantum computing. It’s so obvious when you put it that way. I’m surprised the author missed it.
Alex Nugent, CEO Knowm Inc:
Dude, calm down. I did not say “negligent” I said “mis-representation of the original”. I am thrilled the author wrote about Knowm and AHaH Computing, and (thanks in large part to you pestering me about it), I want to make sure folks understand what was intended. What do you expect? Why are you so caught up with this? Didn’t you say you were moving on to other things?
Gordon-Panthana:
I think you missed the sarcasm…
Alex Nugent, CEO Knowm Inc:
Correct.