Overview

The goal of the ICRC conference is to “discover and foster novel methodologies to reinvent computing technology, including new materials and physics, devices and circuits, system and network architectures, and algorithms and software”. The conference took place on the 5th floor of the Ritz-Carlton in McLean, VA. It was a relatively small conference, with about 200 in attendance, two simultaneous sessions, and talks focusing mostly on neuromorphics, quantum computing, and photonics. Naturally, Knowm Inc. was there and taking notes.

Wednesday Nov. 8

DARPA’s Vision for the Future of Computing

Dr. Hava Siegelmann (DARPA MTO)

DARPA Lifelong Learning Program (L2M), Dr. Hava Siegelmann

  1. DoD considers AI/ML a powerful tool. Countries are rushing to acquire ML; future security depends on it.
  2. Must think all the way through including fundamentals: software, hardware, materials, etc.
  3. Self-driving cars: emphasized that the accidents involving Google, Uber, and Tesla were not the cars’ fault.
  4. AI is successful but brittle: we want the rigor of automation with the flexibility of a human.
    1. Catastrophic forgetting is a big problem.
  5. Lifelong Learning Program: develop fundamentally new ML mechanisms that enable systems to improve their performance over their lifetimes.
  6. Adapting to new conditions is the big goal.
    1. “Develop with data from Afghanistan…deploy in Syria.”
  7. Current AI has two parts: (1) programs and rules, (2) parameter learning (ML)
  8. “It is not the strongest that survives, but…the one that is able best to adapt…to the changing environment.” (L.C. Megginson, paraphrasing “On the Origin of Species”)
  9. Nature’s mechanisms for change beyond preloaded programs:
    1. reconsolidation in the brain
    2. epigenetics
  10. Promoted her theoretical computer science book: “Neural Networks and Analog Computation”.
  11. Quoted Turing and pointed out that he did not see Turing machines as a basis for intelligent machines; instead he pointed to “unorganized machines”.
    1. “Electronic computers are intended to carry out any definite rule of thumb process…working in a disciplined but unintelligent manner” (emphasis hers)
    2. “My contention is that machines can be constructed that will simulate the behavior of the human mind”
  12. Nature combines Turing machines with super-Turing computation.
    1. adapt as needed, changing their Turing parameters.
  13. Lifelong Learning Program Plan
    1. Continual learning
    2. Adaptation to new tasks and circumstances
    3. Goal-driven perception
    4. Selective plasticity
    5. Safety and monitoring
  14. “We have already picked teams, but the way to join us is the ‘Center Group.’”
    1. Led by Government with one technical representative from each performer.
    2. Unclear how others can participate.
DARPA Lifelong Learning Program (L2M), Dr. Hava Siegelmann, Center Group

Continuously Learning Systems with High Biological Realism

Karlheinz Meier (Heidelberg University)

  1. “Focus on finding the principles”
  2. “The ability to test models” is missing from neuroscience.
  3. Two fundamentally different modeling approaches:
    1. numerical models: params stored as binary numbers
    2. physical model: params as physical quantities (voltage, current, charge, etc)
  4. Emphasized the structural organization of the brain.
    1. local spatial integration–directionality–connectivity–hierarchy
  5. Emphasized time and temporal integration (STDP, sparse information coding via timing and time correlations, etc)
  6. Why use spikes: energy efficiency, scalability, computational advantages (to be proven)
  7. Emphasized that there are 7 to 11 orders of magnitude in space and time across various structures and temporal operation of brains.
  8. Comparison of digital and analog and mixed signal
    1. Mixed Signal: local analog computation
    2. Binary communication by spikes
    3. Signal restored in neuron (implies scalability)
  9. Neuromorphic implementations:
    1. In increasing biological realism: SpiNNaker –> IBM –> BrainScaleS
  10. Review of the BrainScaleS processing unit: circuit, unit, and whole system
  11. Attempted deep neural networks; illustrated the back-propagation cycle.
Prof. Karlheinz Meier (Heidelberg University), Continuously Learning Neuromorphic Systems with High Biological Realism, Back-Propagation Cycle

Referenced arXiv paper: Neuromorphic Hardware In The Loop: Training a Deep Spiking Network on the BrainScaleS Wafer-Scale System

  1. Spiking Boltzmann machines: learn an internal stochastic model of the input space; can generate or discriminate
    1. Where does the stochasticity come from?
    2. An external noise source?
    3. Noisy components?
    4. Ongoing network spiking activity!
    5. A running functional network serves as noise for another network.
    6. arXiv paper: Deterministic networks for probabilistic computing
  2. Device variability: good or bad? reviewed study where spiking neural network parameters were varied.
    1. use or reduce?
    2. ignore or calibrate?
    3. important to understand for very deep sub-micron and nano devices!
  3. Biological self-organization and learning introduce useful variability
  4. Next: BrainScaleS-2: local learning, homeostasis, dendritic computation
  5. The HICANN-X chip (next-generation BrainScaleS)
    1. Embedded SIMD plasticity processing unit (PPU) (see slide pic)
    2. example shown to learning a target firing rate
    3. parallel adjustment of neuron parameters.
BrainScaleS Next Generation: The HICANN-X Chip

  1. Backpropagation activated calcium (BAC) firing for feature binding and recognition
  2. Active dendrites examples
  3. Design of a 5000 wafer system underway

A Spike-Timing Neuromorphic Architecture

Aaron J. Hill (Sandia National Laboratories)

Aaron J. Hill (Sandia National Laboratories), A Spike-Timing Neuromorphic Architecture

  1. Challenges for neuromorphic hardware:
    1. Low-power computation
    2. Large populations of neurons
    3. Real-time learning
    4. High fan-in
    5. Temporally coded information
  2. Spiking temporal processing unit
    1. Support high fidelity spike timing dynamics
    2. Simple leaky integrate and fire neuron model with 3 parameters
    3. leak rate
    4. Temporal buffer for synaptic delays
    5. Supports arbitrarily connected networks with configurable weights
  3. Large off chip memory contains all synaptic information
    1. pre synaptic neuron designator
    2. synaptic weight
    3. synaptic delay
    4. post-synaptic neuron designator
  4. Spike transfer structure “the heart of the system”
    1. spike transfer structure communicates directly to the neuronal processing units
    2. “LIF” neuron equations, taking the temporal buffer into consideration (see the sketch after this list)
  5. Output spike consolidation
    1. three-stage consolidator
    2. pipelined efficiency
    3. each stage operates in parallel
    4. number of stages is a parameter
  6. Hardware development
    1. Nallatech 385A FPGA Accelerator Card
    2. 2048 neurons in first test bed
    3. think they can get 4096, will take manual place and route
    4. 32 deep parallel buffer
    5. 16MB of synapse memory
      1. 18 bits per weight
      2. 6 bits per delay
      3. 8 bits post synaptic neuron
      4. 2048 X 2048 total synapses
  7. Power measurements on Nallatech 385A
    1. ~21 nJ per event for 2048 to 32 neurons
    2. (only looked at dynamic power, wanted to remove FPGA overhead)
  8. Software:
    1. uses home grown neural modeling tool: “neurons to algorithms”
    2. object oriented
    3. declarative, not procedural
    4. parts are inheritable and extensible
    5. back end designed for the STPU
  9. Liquid state machine was tested
    1. “can increase accuracy by expanding spike information across time”: synaptic response functions
    2. reviewed effects of performance on various transfer functions
    3. STPU results on LSM
    4. audio signal dataset post-processed into cepstral coefficients
      1. 87.3% accuracy on zeros
      2. 84.6% accuracy on ones
    5. Particle Image Velocimetry
      1. Computing cross-correlation
    6. Spike Optimization
      1. Implement fundamental mathematical operations through temporal coding
        1. spikesort
        2. spikemin, spikemax, spikemedian
        3. spikeOpt(median)
      2. implemented on STPU hardware
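
A minimal sketch of the combination described in items 2 and 4 above: a simple leaky integrate-and-fire neuron paired with a circular temporal buffer that models per-synapse delivery delays. This is only an illustration of the idea, not the STPU's actual equations; the parameter names, the leak formulation, and the 32-deep buffer are assumptions taken from the notes.

```python
import numpy as np

class DelayedLIFNeuron:
    """Leaky integrate-and-fire neuron with a circular buffer of future
    synaptic input, indexed by delivery delay (a sketch, not the STPU spec)."""

    def __init__(self, leak=0.9, threshold=1.0, reset=0.0, buffer_depth=32):
        self.leak = leak                      # membrane leak factor per step
        self.threshold = threshold            # firing threshold
        self.reset = reset                    # potential after a spike
        self.v = 0.0                          # membrane potential
        self.buffer = np.zeros(buffer_depth)  # scheduled synaptic input
        self.t = 0                            # current slot in the buffer

    def receive(self, weight, delay):
        """Schedule a weighted input spike to arrive `delay` steps from now."""
        self.buffer[(self.t + delay) % len(self.buffer)] += weight

    def step(self):
        """Advance one time step; return True if the neuron fires."""
        self.v = self.leak * self.v + self.buffer[self.t]
        self.buffer[self.t] = 0.0                      # consume this slot
        self.t = (self.t + 1) % len(self.buffer)
        if self.v >= self.threshold:
            self.v = self.reset
            return True
        return False

# Two input spikes with different synaptic delays accumulate and trigger a spike
n = DelayedLIFNeuron()
n.receive(weight=0.6, delay=2)
n.receive(weight=0.6, delay=3)
print([n.step() for _ in range(6)])   # [False, False, False, True, False, False]
```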

Q: how did you decide how deep to make the temporal buffer?
A: easy, not a lot of thinking; 32 and 64 come directly from hardware constraints.

Q: what are the prospects for encoding something like an FFT?
A: (speaker) I don’t know. (attendee) They are good; we are working on it.

Q: in the neural compiler flow, can you map to something other than an FPGA? Can the representation be standardized?
A: not sure where we are in this process.

Feature Learning Using Synaptic Competition in a Dynamically-Sized Neuromorphic Architecture

Stanislaw Wozniak (IBM Research, Zurich)

Stanislaw Wozniak (IBM Research, Zurich), Feature Learning Using Synaptic Competition in a Dynamically-Sized Neuromorphic Architecture

  1. “the era of data-centric cognitive computing”
    1. Challenges: power due to von Neumann architecture
    2. Goal: rethink computing
    3. neuromorphic
    4. phase-change memristors
  2. Talked about abstraction levels across scale, from molecular to whole brain
    1. most popular abstraction: artificial neural network
    2. this work focuses on biologically realistic networks: stateful models that work in time.
  3. Use the model as a blueprint for hardware.
    1. spiking neurons, integrate and fire.
  4. Learning feedback through STDP
    1. using phase change memory to implement STDP (see slide pic)
    2. use for mixed analog-digital networks
    3. digital communication
    4. simple and reliable (binary spikes)
    5. analog processing
    6. efficient using memristors
  5. Prototype uses crossbar array
  6. Use single device to store weight
    1. Use Kirchhoff’s law for operations
  7. Talked about going from one layer to two or more layers on MNIST
  8. Basic overview of learning features using “weather glyphs”
    1. Synaptic Feature Learning
    2. WTA/lateral inhibition
    3. Synaptic competition (see the sketch after this list)
      1. Limited resources related to plasticity
      2. WTA dynamics at the synapses
    4. Handles “representation overflow” by using an “overflow neuron” that has equal low-magnitude weights.
    5. Limitations: by design it limits activation to a single neuron, resulting in a low F-score.
      1. we use WTA for learning, but report activity with WTA disabled
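
The winner-take-all plus synaptic-competition idea in item 8 can be sketched in a few lines of plain Python: the neuron that responds most strongly to an input updates its weights toward that input, and a fixed per-neuron weight budget forces its synapses to compete for the update. This is a software caricature of the concept, not IBM's STDP rule or its phase-change-memory implementation, and the toy patterns below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def wta_feature_learning(patterns, n_neurons=4, lr=0.1, epochs=20, budget=1.0):
    """Competitive (winner-take-all) feature learning with a fixed synaptic
    'resource' budget per neuron -- a sketch of synaptic competition, not the
    STDP/phase-change rule from the talk."""
    dim = patterns.shape[1]
    w = rng.random((n_neurons, dim))
    w *= budget / w.sum(axis=1, keepdims=True)       # enforce the resource budget
    for _ in range(epochs):
        for x in patterns:
            winner = np.argmax(w @ x)                 # lateral inhibition: one winner
            w[winner] += lr * x                       # strengthen synapses driven by x
            w[winner] *= budget / w[winner].sum()     # competition: renormalize
    return w

# Toy binary patterns standing in for the "weather glyph" features
patterns = np.array([[1, 1, 0, 0],
                     [0, 0, 1, 1],
                     [1, 0, 1, 0]], dtype=float)
print(np.round(wta_feature_learning(patterns), 2))   # winning rows drift toward their patterns
```
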
Stanislaw Wozniak (IBM Research, Zurich), Feature Learning Using Synaptic Competition in a Dynamically-Sized Neuromorphic Architecture

Q: is the phase reversible?
A: can go between two phases, but high conductance to low conductance is abrupt

Q: how many bits can be stored in memristors?
A: I think 4 bits can be stored.

Q: how faithfully can you reproduce intermediate states?
A: we care more about the logic of execution; we can read back with sufficient accuracy.

Q: how do you stop learning?
A: you are never sure what new patterns will arrive; ideally you never stop learning. (Note: have they tested the method on real-world noisy data?)

Achieving Swarm Intelligence with Spiking Neural Oscillators

Yan Fang (University of Pittsburgh)

Yan Fang (University of Pittsburgh), Achieving Swarm Intelligence with Spiking Neural Oscillators

  1. motivation: neuromorphic computing
    1. computational model
    2. processor architecture
  2. Motivation: swarm intelligence
    1. SI algorithm
    2. ant colony, firefly, particle swarm, bees
    3. applications
      1. optimization
      2. scheduling
      3. path planning
  3. Can we bridge the two models?
  4. first step: neuromorphic computing for SI algorithm
  5. generalized swarm intelligence algorithms (see slide pic)
  6. use leaky integrate and fire model
  7. prepare an m-by-n array of neurons to represent each parameter of every agent (a generic swarm-optimization sketch follows below)
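
For reference, the swarm-intelligence side of this mapping is just a population-based optimizer. Below is plain particle swarm optimization, one of the SI algorithms listed above; the talk's step of representing each agent parameter with an array of spiking neural oscillators is not reproduced here, and the quadratic objective is only a stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)

def pso(objective, dim=2, n_agents=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Plain particle swarm optimization: each agent is pulled toward its own
    best position and the swarm's best position."""
    pos = rng.uniform(-5, 5, (n_agents, dim))
    vel = np.zeros((n_agents, dim))
    best_pos = pos.copy()
    best_val = np.array([objective(p) for p in pos])
    g_best = best_pos[best_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_agents, dim))
        vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g_best - pos)
        pos = pos + vel
        vals = np.array([objective(p) for p in pos])
        improved = vals < best_val
        best_pos[improved], best_val[improved] = pos[improved], vals[improved]
        g_best = best_pos[best_val.argmin()].copy()
    return g_best, best_val.min()

# Minimize a simple quadratic bowl as a stand-in for the optimization tasks mentioned
print(pso(lambda p: float(np.sum(p ** 2))))
```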

Q: any target optimization problem?
A: looking at traveling salesman

Q: have you compared to traditional algorithms?
A: Sort of, compared to other swarm models

Energy-Efficient Single Flux Quantum Based Neuromorphic Computing

Mike Schneider

Michael Schneider (National Institute of Standards and Technology), Energy-Efficient Single Flux Quantum Based Neuromorphic Computing

  1. Review of the Josephson junction (JJ)
  2. Review of the JJ circuit model
  3. Explored how ferromagnet (FM) thickness affects the critical current density.
  4. Magnetic nanoclusters in a Josephson junction
  5. “nice analog tunable moment change”
  6. Claims that a large-scale “ImageNet”-class system would reduce to 1 watt, including cooling.
Michael Schneider (National Institute of Standards and Technology), Energy-Efficient Single Flux Quantum Based Neuromorphic Computing

“takes a kilowatt to cool a watt”

During Q&A:
1. Admitted that driving information into the network would be a major challenge that is not at all a solved problem.
2. Did not clearly answer how this would be better than a memristive processor.

Improved Deep Neural Network Hardware Accelerators Based on Non-Volatile Memory: The Local Gains Technique

Geoffrey Burr (IBM Research, Almaden)

  1. AI as driven by DNNs
    1. Image recognition, speech recognition, machine translation
  2. Multiply-Accumulate is key operation for DNN
  3. Approach uses pairs of memristor devices for weights/synapses.
  4. want to do both inference and learning via backprop on-chip
    1. need highly parallel on-chip circuitry
  5. “our existing phase-change memristors are not symmetrical enough”
    1. need to invent new RRAM materials
    2. we need “tricks”. That’s my job: to come up with tricks.
  6. We are going to have a grid that sort of looks like TrueNorth.
  7. We can now get hardware results that are exactly equivalent to software.
  8. Local Gains Technique
    1. Attempted to redo previous work and duplicate findings, found it did not match.
    2. Looked into various reasons why, like node activation function, could not find reason
    3. Found the reason:
    4. “I had succeeded with my experiment with a particular configuration, I tried it twenty times and one of those times it worked. I had one weekend left to write the paper, and so I wrote this simulator that wrapped around that one particular configuration and learning rate and hyperbolic tangent space. When the students came back they optimized both of those.”
  9. Local Gains Technique
    1. An old technique, related to momentum; he found out about it from a Hinton video.
    2. Encourage weights that have a consistent ‘direction’.
    3. Discourage weights that can’t make up their mind (see the sketch after this list).
  10. Local Gains does not work for typical computer scientists
    1. too hard to tune parameters.
    2. may work for a PCM engineer, since PCM has a non-linear incremental response.
    3. The issue is that many weights are being pulled up and down, and if those do not balance then you end up in trouble.
  11. We did a nice study of coefficients, seems tolerant to many settings.

  12. We have a thing called a “safety margin”.
    1. (appears related to classification margin)
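
The "local gains" idea in items 9 and 10 sounds like the classic per-weight adaptive gains Hinton describes: a weight whose gradient keeps the same sign gets a larger effective step, one whose gradient flip-flops gets a smaller one. A minimal sketch of that rule follows, with additive increase and multiplicative decrease of the gains assumed; the exact update, the PCM-specific behavior, and the "safety margin" from the talk are not reproduced.

```python
import numpy as np

def sgd_with_local_gains(grad_fn, w, lr=0.01, steps=200,
                         gain_up=0.05, gain_down=0.95,
                         gain_min=0.1, gain_max=10.0):
    """Gradient descent with per-weight 'local gains': a weight whose gradient
    keeps its sign gets an additively larger gain, a flip-flopping weight gets
    a multiplicatively smaller one. A sketch of the idea, not Burr's rule."""
    gains = np.ones_like(w)
    prev_grad = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        same_sign = g * prev_grad > 0
        gains = np.where(same_sign, gains + gain_up, gains * gain_down)
        gains = np.clip(gains, gain_min, gain_max)
        w = w - lr * gains * g
        prev_grad = g
    return w

# Toy quadratic objective with very different curvature per dimension
curvature = np.array([10.0, 0.1])
w_final = sgd_with_local_gains(lambda w: 2 * curvature * w, np.array([1.0, 1.0]))
print(w_final)   # the flat direction makes much faster progress than with a fixed step
```
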
Geoffrey Burr (IBM Research, Almaden), Improved Deep Neural Network Hardware Accelerators Based on Non-Volatile Memory: The Local Gains Technique

A Comparison Between Single Purpose and Flexible Neuromorphic Processor Designs

David Mountain (US Department of Defense)

David Mountain (US Department of Defense), A Comparison Between Single Purpose and Flexible Neuromorphic Processor Designs

  1. Benchmarks:
    1. mnist
    2. CSlite: malware-detection application
    3. very digital, nothing “neuromorphic”
    4. back end detector like a classifier
    5. 5800 neurons in 6 layers
    6. AES-256
    7. pure digital
    8. 12500 neurons in 8 layers
  2. memristor array architecture
    1. Neuron type is a multiply accumulate with threshold gate neuron.
    2. Uses differential pair synapses for weights.
David Mountain (US Department of Defense), A Comparison Between Single Purpose and Flexible Neuromorphic Processor Designs

  1. Tile concept, bypass comparators and pass to next tile (’tile feature’).
  2. Tile design is flexible, since array size is a limiting factor for the application
  3. Also tested hierarchical routing.
    1. All-to-All (A2A) in a tree architecture (based on functional ASIC design in 90nm)
  4. Evaluation Methodology
    1. Calculate the area for each design
    2. Calculate worst-case timing for each design
    3. calculate power for each design
    4. calculate T/W (throughput/watt) and T/A (throughput/area)
  5. Conclusions
    1. Flexible design using tiles and A2A switch is practical compared to special-purpose design.
    2. the tile feature clearly provides value
    3. A2A network is superior to 2D mesh.

In-Memory Execution of Compute Kernels Using Flow-Based Memristive Crossbar Computing

Dwaipayan Chakraborty

  1. Objective: automated synthesis of C program, translate to crossbar execution.
    1. restricted subset of the language, because you can’t do everything with crossbars
  2. problem: “every crossbar is different”; you have to make a specialized design for every function
  3. utilizing sneak paths for computation
    1. any boolean formula can be mapped to a crossbar using flow-based computing
    2. Advantages
    3. Exploits non-volatility of memristors
    4. leverages flow through the crossbar structure
    5. fast & energy efficient in many cases
    6. Disadvantages
    7. designing flow-based computing circuits may be computationally hard.
    8. intuitive understanding of flow-based computing is difficult.
Sumit Kumar Jha (University of Central Florida), Flow-based Non-volatile Memory Crossbar Accelerators for Parallel Computations

For more information see this video

VoiceHD: Hyperdimensional Computing for Efficient Speech Recognition

Mohsen Imani (University of California, San Diego)

  1. Deep learning is changing our lives
  2. power consumption is out of control
    1. Lee Sedol is 50,000X more efficient than AlphaGo
  3. Energy efficiency
    1. mobile: battery
    2. cloud: cost
  4. Hardware scalability
  5. Hyperdimensional Computing
    1. take all images of cats (or any other datatype) and “encode a hypervector”
    2. each hypervector is a model of cat or dog
    3. look for similarity between hypervectors (see the sketch after this list)
    4. any data is encoded and compressed
    5. encoding is reversible (can reconstruct input from encoded vector)
    6. can be implemented in DRAM
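
In its simplest form, hyperdimensional computing represents each symbol with a fixed random high-dimensional vector, bundles (sums) the vectors of a training example into a class hypervector, and classifies queries by similarity. A minimal sketch with random bipolar vectors and made-up symbol sequences; VoiceHD's actual encoder for speech features is more involved, and the reversibility mentioned above is not demonstrated here.

```python
import numpy as np

rng = np.random.default_rng(42)
D = 10_000                                 # hypervector dimensionality

symbols = {s: rng.choice([-1, 1], size=D)  # item memory: one random bipolar
           for s in "abcdefgh"}            # hypervector per (hypothetical) symbol

def encode(sequence):
    """Bundle (sum and binarize) the symbol hypervectors of a sequence."""
    return np.sign(np.sum([symbols[s] for s in sequence], axis=0))

def similarity(a, b):
    """Normalized dot-product similarity between two hypervectors."""
    return float(a @ b) / D

# Build class prototypes from toy training sequences, then classify a query
class_hv = {"cat": encode("aabb"), "dog": encode("ggh")}
query = encode("aab")
print(max(class_hv, key=lambda c: similarity(class_hv[c], query)))   # -> cat
```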

Embedding in Neural Networks: A-priori Design of Hybrid Computers for Prediction

Bicky Marquez (Institute FEMTO-ST)

  1. Classification using delayed photonic systems
  2. generally, going to higher dimensions allows linear separation
  3. if you can satisfy these, you have a reservoir computer:
    1. approximation property
    2. separation property
    3. fading memory property
  4. Instead of using a random network, they use a delayed (ring) system
  5. spoken digit recognition, TI-46 corpus, spoken digits from zero to nine
    1. only 1000 neurons, but achieves a performance of 1 million digit recognitions per second
  6. predicting a chaotic system (the Mackey-Glass system)
    1. predictable up to a time, unpredictable after
  7. compared to random recurrent neural networks, their system worked better (predicted further out); a generic reservoir-computing sketch follows below
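
The delay-based photonic reservoir is a hardware instance of reservoir computing; the same ingredients (a fixed random recurrent network plus a trained linear readout) can be sketched in software as an echo state network and tested on the Mackey-Glass task mentioned above. This is a generic software sketch, not the FEMTO-ST delay-ring system, and the reservoir size, spectral radius, and washout length are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def mackey_glass(n, tau=17, beta=0.2, gamma=0.1, p=10):
    """Discrete (unit time step) Mackey-Glass series, a standard chaotic benchmark."""
    x = np.full(n + tau, 1.2)
    for t in range(tau, n + tau - 1):
        x[t + 1] = x[t] + beta * x[t - tau] / (1 + x[t - tau] ** p) - gamma * x[t]
    return x[tau:]

series = mackey_glass(3000)

# Fixed random reservoir with spectral radius < 1 (echo state property)
N = 300
W_in = rng.uniform(-0.5, 0.5, N)
W = rng.uniform(-0.5, 0.5, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

def run_reservoir(u):
    states, x = np.zeros((len(u), N)), np.zeros(N)
    for t, ut in enumerate(u):
        x = np.tanh(W_in * ut + W @ x)
        states[t] = x
    return states

train, test = series[:2000], series[2000:]
X, y = run_reservoir(train[:-1])[200:], train[1:][200:]      # drop a washout period
W_out = np.linalg.lstsq(X, y, rcond=None)[0]                 # train only the readout

pred = run_reservoir(test[:-1]) @ W_out                      # one-step-ahead prediction
nrmse = np.sqrt(np.mean((pred - test[1:]) ** 2)) / np.std(test[1:])
print(f"one-step NRMSE: {nrmse:.3f}")
```

Only the linear readout is trained; the recurrent weights stay fixed, which is what makes reservoir approaches cheap to train compared with fully trained recurrent networks.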

Convolutional Drift Networks for Spatio-Temporal Processing

Dillon Graham (Rochester Institute of Technology)

  1. large volume of video data is being generated
  2. current approaches are very task-specific, costly to train, or both
    1. hand-crafted features are commonly used
    2. Our approach: develop a new neural-net architecture with the following properties:
    3. capable of video activity classification
    4. general architecture
    5. minimal training cost
    6. classification performance competitive with SoTA
  3. Experiment
    1. can our new approach perform competitively?
    2. task: video level activity classification
    3. data: two first person video datasets
  4. Combine deep learning and reservoir computing to build a powerful and efficient NN architecture for spatiotemporal data
    1. observation: all reservoirs converged to the same accuracy, but bigger reservoirs converged faster

A New Approach for Multi-Valued Computing Using Machine Learning

Wafi Danesh (University of Missouri, Kansas City)

  1. moving beyond CMOS
  2. emergence of an array of devices to continue scaling
    1. need to leverage intrinsic properties
    2. beyond-CMOS devices have unique intrinsic characteristics
      1. more than a switch
      2. analog/multi state
      3. ultra low power
      4. inherently non volatile
      5. different domains (magnetic/electric/magnetoelectric)
  3. Proposed MVL (multi-valued logic) synthesis approach
    1. Random MVL function decomposed to a set of linear equations
    2. Adaptable to any technology
    3. Scaled with circuit size
  4. New MVL algorithm
    1. 3 steps
      1. domain selection
      2. linear regression
      3. pattern matching
  5. Quaternary Multiplier example

Thursday Nov. 9

On Thermodynamics and the Future of Computing

Todd Hylton (University of California, San Diego)

Todd Hylton (University of California, San Diego), On Thermodynamics and the Future of Computing

  1. A different way to think about what we do in the field
    1. how we might really reboot computing in a different way
  2. The primary problem is that computers cannot organize themselves
    1. they have narrow focus, rudimentary AI capabilities
  3. Our mechanistic approach to the problem
    1. machines are the sum of their parts
    2. machines are disconnected from the world except through us
  4. The world is not a machine
  5. Thermodynamic Computing Hypothesis
  6. Why Thermodynamics?
    1. it’s universal
    2. it’s temporal
    3. it’s efficient
  7. Thermodynamics drives the evolution of everything
  8. Thermodynamic evolution is the missing, unifying concept in computing (and many other domains)
  9. Electronic systems are well suited for thermodynamic evolution
  10. Thermodynamics should be the principal concept in future computing systems
  11. Review of thermodynamics in closed and open systems
  12. Examples of self organization in nature (Alex Nugent’s slide from DARPA days)
  13. Arbortron video from Stanford
  14. Life and modern electronic systems comparison
Todd Hylton (University of California, San Diego), On Thermodynamics and the Future of Computing

  1. Thermodynamic Evolution Hypothesis
    1. Thermodynamic evolution supposes that all organization spontaneously emerges in order to use sources of free energy in the universe and that there is competition for this energy.
    2. Thermodynamic evolution is the second law, except that it adds the idea that in order for entropy to increase, an organization must emerge that makes it possible to access free energy.
    3. The first law of thermodynamics implies that there is competition for free energy
  2. Basic proposed demonstration architecture
    1. evolvable cores in a network that evolve to move energy from source to sink
  3. Thermodynamic Bit review
  4. Related ideas
    1. Free energy principle
    2. Thermodynamics of Prediction
    3. Causal Entropic Forcing

More information on Thermodynamic Computing

A Unified Hardware/Software Co-Design Framework for Neuromorphic Computing Devices and Applications

James Plank (University of Tennessee, Knoxville)

James Plank (University of Tennessee, Knoxville), A Unified Hardware/Software Co-Design Framework for Neuromorphic Computing Devices and Applications
  1. Software Core
    1. spiking, highly recurrent NN
    2. implements common functionality or provides an interface for other components to implement
  2. Application Device operations
    1. load network
    2. serialize network
    3. apply input spikes
    4. read output spikes
    5. run
  3. “Applications want to use numerical events, devices want to use spikes”
    1. the core provides support for converting application<->device
  4. Devices/Architectures
    1. NIDA
    2. DANNA
    3. mrDANNA
  5. Applications
    1. Control
    2. Classification
    3. Security
    4. Micro-Applications
  6. mrDANNA
    1. Hafnium oxide devices
    2. Differential Pair memristors (2-1 configuration)
    3. Cadence-Spectre simulations
    4. 8 synapse neurons

Impact of Linearity and Write Noise of Analog Resistive Memory Devices in a Neural Algorithm Accelerator

Robin Jacobs-Gedrim (Sandia National Laboratories)

Robin Jacobs-Gedrim (Sandia National Laboratories), Impact of Linearity and Write Noise of Analog Resistive Memory Devices in a Neural Algorithm Accelerator

  1. Example of Google deep learning study
  2. energy/space comparison
  3. Review of matrix vector multiply with crossbar
  4. Symmetry/Asymmetry of pulsing incrementation of memristors
    1. How much effect does this have on accuracy?
Robin Jacobs-Gedrim (Sandia National Laboratories), Impact of Linearity and Write Noise of Analog Resistive Memory Devices in a Neural Algorithm Accelerator

Paper: Resistive memory device requirements for a neural algorithm accelerator

  1. Comparison of various types of memristors
    1. SiO2-Cu
    2. TaOx
    3. Ag-chalcogenide variant (likely from ASU?)
  2. All devices had a nonlinear response that negatively impacted accuracy
    1. TaOx has lowest noise
  3. High resistance devices fabricated with an “Al2O3 Bilayer”
    1. integrated tunneling barrier to increase resistance
    2. devices needed to be formed
    3. poor/noisy incrementation response
  4. Differential pair memristor representation (see the sketch below)
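
The differential-pair representation and the crossbar matrix-vector multiply reviewed earlier fit in a few lines: each signed weight is stored as the difference of two conductances, input voltages are applied to the rows, and Kirchhoff's current law sums the column currents into dot products. This is an idealized model for illustration only; the device nonlinearity and write noise that the talk actually studies are not modeled, and the conductance range is made up.

```python
import numpy as np

rng = np.random.default_rng(3)

def crossbar_mvm(weights, v_in, g_min=1e-6, g_max=1e-4):
    """Idealized differential-pair memristor crossbar: each signed weight is
    encoded as G+ - G-, and Ohm's/Kirchhoff's laws turn applied row voltages
    into column currents that realize the matrix-vector product."""
    scale = (g_max - g_min) / np.max(np.abs(weights))
    g_pos = g_min + scale * np.clip(weights, 0, None)    # conductances for w > 0
    g_neg = g_min + scale * np.clip(-weights, 0, None)   # conductances for w < 0
    i_pos = v_in @ g_pos                                 # column currents, "+" array
    i_neg = v_in @ g_neg                                 # column currents, "-" array
    return (i_pos - i_neg) / scale                       # back to weight units

W = rng.standard_normal((4, 3))
v = rng.standard_normal(4)
print(np.allclose(crossbar_mvm(W, v), v @ W))   # True: the ideal crossbar is a dot product
```
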
Robin Jacobs-Gedrim (Sandia National Laboratories), Impact of Linearity and Write Noise of Analog Resistive Memory Devices in a Neural Algorithm Accelerator

An Energy-Efficient Mixed-Signal Neuron for Inherently Error-Resilient Neuromorphic Systems

Baibhab Chatterjee (Purdue University)

Baibhab Chatterjee (Purdue University), An Energy-Efficient Mixed-Signal Neuron for Inherently Error-Resilient Neuromorphic Systems

  1. Can we reduce Multiply Accumulate (MAC) energy by 100X?
  2. what are the Bottlenecks? Could Be:
    1. energy for computation
    2. energy for communication
    3. memory fetch energy
    4. architecture/algorithms
  3. Digital MAC Review
    1. the number of transistors increases quickly as the number of bits increases
    2. power at low frequency is dominated by leakage
    3. at high frequency it is dominated by dynamic power
  4. Mixed signal MAC
    1. 1000X better at low frequency, 100X at higher frequency
    2. higher noise and non-linearities are the tradeoffs
Baibhab Chatterjee (Purdue University), An Energy-Efficient Mixed-Signal Neuron for Inherently Error-Resilient Neuromorphic Systems

  1. “Noise Factor” analysis developed and used in simulations against multiple benchmarks, relating noise factor to performance degradation.

Borrowing from Nature to Build Better Computers

Prof. Luis Ceze (University of Washington)

  1. Molecular Data Storage
  2. Stored the “This Too Shall Pass” video in DNA, plus other work related to DNA as a storage medium.

Rebooting the Data Access Hierarchy in Computing Systems

Wen-mei Hwu (University of Illinois, Urbana-Champaign)

  1. data access challenges
  2. IBM Illinois Erudite project
  3. The reason some apps can’t be run on GPUs is a data access problem.
  4. Volta/HBM2: 900 GB/s bandwidth, 225 giga SP operands/s (the arithmetic is worked below, after this list)
    1. each operand must be used 62.3 times once fetched to achieve the peak FLOPS rate
    2. sustains <1.6% of peak without data reuse
  5. Volta-host DDR3
    1. operands must be used 700 times once fetched to achieve peak FLOPS
    2. sustains <0.14% of peak without data reuse
  6. Volta-FLASH, 16 GB/s PCIe3
    1. operands must be used 3507 times once fetched to achieve peak FLOPS
    2. sustains ~0.03% of peak without data reuse
  7. Large Problem challenge
    1. solving larger problems motivates continued growth of computing capability
      1. inverse solvers for science and engineering apps
      2. matrix factorization and graph traversal for analytics
    2. as problem size grows
      1. fast, low-complexity algorithms win
      2. sparsity increases, iterative methods win
      3. data reuse diminishes
  8. Erudite Project
    1. Work done at IBM Illinois C3SR
  9. computation types
    1. low-complexity iterative solver algorithms
    2. graphs analytics
      1. inference, search, counting
    3. large cognitive application
      1. large multi-model classifiers
  10. to achieve performance:
    1. elimination of file-system software overhead for engaging in large datasets
    2. placement of computation appropriately in the memory and storage hierarchy
    3. highly optimized kernel synthesis
    4. collaborative heterogeneous acceleration
  11. Step 1: remove the file system from the data access path, get rid of storage
  12. Step 2: place NMA compute inside memory system, 100+ GFLOPS NMA compute into DDR/Flash memory system (~10TBs)
    1. Erudite NMA board 1.0
      1. develop a principled methodology for acceleration
      2. throughput proportional to capacity
        1. 1 GFLOPS/10GB sustained
        2. 100 GFLOPS sustained
  13. Step 3: collaborative heterogeneous computing (Chai)
  14. Unacceptable latency moving data between CPU and GPU.
  15. Research Agenda
    1. Package-level integration
      1. optical interconnects in package?
      2. collaboration support for heterogeneous devices
      3. virtual address translation
    2. System software revolution
      1. persistent objects for multi-language environments
      2. directory and mapping of very large persistent objects
    3. power consumption in memory
      1. much higher memory level parallelism needed for flash based memories
      2. latency vs. throughput oriented memories
  16. Conclusion and Outlook
    1. drivers for computing capabilities
      1. large-scale inverse problems with natural data inputs
      2. machine learning based applications
    2. Erudite cognitive computing systems project
      1. removing file system bottleneck from access paths to large datasets
      2. placing compute into the appropriate levels of the memory system hierarchy
      3. memory parallelism proportional to the data capacity
      4. collaborative NMA execution with CPUs and GPUs
      5. >100x improvement in power efficiency and performance
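
The operand-reuse numbers in items 4-6 follow from a one-line calculation: divide the peak FLOP rate by the rate at which each memory level can deliver operands. A quick check, assuming roughly 14 TFLOPS single-precision peak for Volta (not stated in the talk) and a host-DDR bandwidth inferred from the stated 700x reuse factor:

```python
# Rough operand-reuse arithmetic for a Volta-class GPU
peak_flops = 14e12            # assumed SP peak; not stated in the talk
bytes_per_operand = 4         # single precision

levels = [("HBM2", 900e9),    # 900 GB/s, as stated in the talk
          ("host DDR", 80e9), # assumed; consistent with the stated 700x reuse
          ("flash/PCIe3", 16e9)]

for name, bandwidth in levels:
    operands_per_s = bandwidth / bytes_per_operand
    reuse = peak_flops / operands_per_s
    print(f"{name:12s} {operands_per_s / 1e9:6.0f} Goperands/s  "
          f"reuse ~{reuse:5.0f}x  no-reuse peak ~{1 / reuse:.2%}")
```

Running this reproduces the figures in the notes: about 62x reuse (1.6% of peak) for HBM2, 700x (0.14%) for host DDR, and roughly 3500x (0.03%) for flash over PCIe3.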

“sparse is more of our target than dense”

The Superstrider Architecture: Integrating Logic and Memory towards non-von Neumann Computing

Sriseshan Srikanth (Georgia Institute of Technology)

Sriseshan Srikanth (Georgia Institute of Technology), The Superstrider Architecture: Integrating Logic and Memory towards non-von Neumann Computing

  1. superstrider: geared toward processing sparse data streams
  2. moving data is expensive, time and energy
  3. logic-memory integration,
    1. vertical integration with 3D stacking
    2. Micron Hybrid Memory Cube
Sriseshan Srikanth (Georgia Institute of Technology), The Superstrider Architecture: Integrating Logic and Memory towards non-von Neumann Computing

  1. superstrider
    1. accelerated reduction of sparse data streams using logic-memory 3D integration
    2. accumulation phase of SpGEMM
      1. near-perfect cache miss rate for sparse data
    3. data organized in binary tree
    4. key principles
      1. memory rows organized as a binary tree, with k records (sorted by key) and a pivot key per row
      2. memory access and computation granularity:
        1. 1 memory row of sorted records
        2. SIMD style operation tightly integrated with wide memory words
      3. Sorted invariant
        1. two pre-sorted vectors of length N/2 can be merged in log2(N) stages (see the sketch after this list)
        2. novel algorithms
    5. Long examples of storing sparse data pairs given
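
The claim that two pre-sorted vectors of length N/2 can be merged in log2(N) stages is the bitonic merge: reverse one sorted half, concatenate to get a bitonic sequence, then apply log2(N) rounds of fixed compare-exchange operations, each of which is embarrassingly parallel (the SIMD-on-a-memory-row style the talk describes). A plain-Python sketch of the network, not Superstrider's hardware implementation:

```python
def bitonic_merge(a, b):
    """Merge two sorted lists of equal power-of-two length using log2(N)
    compare-exchange stages, N = len(a) + len(b)."""
    seq = list(a) + list(b)[::-1]        # ascending + descending = bitonic
    n = len(seq)
    stride = n // 2
    while stride > 0:                    # exactly log2(N) stages
        for i in range(n):
            if (i // stride) % 2 == 0 and seq[i] > seq[i + stride]:
                seq[i], seq[i + stride] = seq[i + stride], seq[i]
        stride //= 2
    return seq

print(bitonic_merge([1, 4, 6, 9], [2, 3, 7, 8]))   # [1, 2, 3, 4, 6, 7, 8, 9]
```

Within a stage all of the compare-exchanges are independent, which is why a wide memory row can process them in one SIMD step.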

NNgine: Ultra-Efficient Nearest Neighbor Accelerator Based on In-Memory Computing

Mohsen Imani (University of California, San Diego)

Mohsen Imani (University of California, San Diego), NNgine: Ultra-Efficient Nearest Neighbor Accelerator Based on In-Memory Computing

  1. Big data processing with general purpose processing
    1. can today’s systems process big data?
    2. [cores/memory with separation shown]
  2. Cost of memory access is much higher than computation
    1. DRAM read: 640pJ
    2. 8b add: .03pJ
    3. DRAM access consumes 170x more energy than an FPU multiply
  3. Processing in memory (PIM)
    1. perform a part of computation tasks inside the memory
  4. Supporting in-memory operations
    1. bitwise
      1. OR, AND, XOR
    2. Search operation
      1. nearest search
      2. clustering
      3. classification
      4. database
    3. Addition Multiplication
      1. matrix multiplications
      2. deep learning
Mohsen Imani (University of California, San Diego), NNgine: Ultra-Efficient Nearest Neighbor Accelerator Based on In-Memory Computing

  1. Related publications
Mohsen Imani (University of California, San Diego), NNgine: Ultra-Efficient Nearest Neighbor Accelerator Based on In-Memory Computing

  1. Crossbar NOR operation example
    1. voltage divider with the lower elements encoding each input; if any element is on, it pulls down the voltage and you get the OR logic operation.
  2. NOR-based addition
  3. Fast in-memory addition
    1. add multiple numbers in four stages
      1. additions in the same stage of execution are independent and can occur in parallel
      2. the speed-up comes at the cost of increased energy consumption and number of writes in memory
      3. the last stage dominates the addition latency.
  4. NNgine: KNN accelerator
    1. nearest neighbor search accelerator
    2. performing the search operation inside the memory, next to DRAM
  5. kNN review
  6. kNN in GPU performance review, execution time, cache hit, dataset size
  7. Content addressable memory (CAM)
    1. Nearest Neighbor search
      1. find similar rows in parallel using a Hamming-distance criterion (see the sketch after this list)
      2. count HDs based on timing characteristics of discharging current
  8. NNAM architecture overview
    1. DVS: applies voltage overscaling to the block; any row with more matching bits discharges first
    2. NN detector: senses the number of matched lines and notifies the controller
    3. Controller: dynamically adjusts the voltage based on the number of discharged rows
  9. Use a simple analog design to sample match lines, accumulate currents, compare to thresholds
  10. NNgine energy improvement: 349X
  11. AdaBoost acceleration
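
Functionally, the CAM-based nearest-neighbor search in item 7 is just "return the stored row with the smallest Hamming distance to the query"; the hardware senses that distance through match-line discharge timing instead of counting bits. A software model with made-up data:

```python
import numpy as np

rng = np.random.default_rng(7)

def cam_nearest_neighbor(stored_rows, query):
    """Return (index, distance) of the stored row with minimum Hamming
    distance to the query -- a functional stand-in for the NNAM search,
    which senses the distance via match-line discharge timing."""
    hamming = np.count_nonzero(stored_rows != query, axis=1)
    return int(hamming.argmin()), int(hamming.min())

rows = rng.integers(0, 2, size=(1024, 64))   # 1024 stored 64-bit patterns
query = rows[123].copy()
query[:3] ^= 1                               # corrupt 3 bits of a stored row
print(cam_nearest_neighbor(rows, query))     # (123, 3), with overwhelming probability
```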

Socrates-D: Multicore Architecture for On-line Learning

Tarek Taha (University of Dayton)

  1. Goal: a system that can learn continuously at low power. Uses:
    1. robotics
    2. personal devices
    3. low power deployed systems
  2. what is done now: Continuous training on cloud
    1. requires network access
    2. privacy risk
    3. slow and high energy
  3. Multicore near memory computations
  4. both learning and inference capability
  5. overview of backprop
  6. on the forward pass the weight matrix is read normally; on the backward pass it must be accessed in transposed form (see the sketch after this list)
    1. to handle the transposed matrix, there are two weight-matrix memories, one for the forward and one for the transposed form
    2. the dual-matrix approach is not strictly necessary, but then runtime is much longer and power consumption is lower.
  7. Static Routing and Dynamic Routing implemented in simulations.
    1. Static Routing is significantly more efficient, but there is a problem:
    2. connections are dedicated, which will block routing through cores.
    3. solved this problem with time multiplexing
  8. distribute large networks across cores by partitioning network and using another core to add partial sums from partitioned cores.
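
Item 6 is the reason the architecture stores each weight matrix twice: in a standard backpropagation step the forward pass reads the weight matrix in its stored orientation, while the backward pass needs the same matrix transposed. A generic one-hidden-layer sketch (not the Socrates-D design) that shows where the transposed access occurs:

```python
import numpy as np

rng = np.random.default_rng(5)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

W1 = rng.standard_normal((4, 8)) * 0.5     # input -> hidden
W2 = rng.standard_normal((8, 2)) * 0.5     # hidden -> output
x, target, lr = rng.standard_normal(4), np.array([1.0, 0.0]), 0.1

# Forward pass: matrices are read in their stored orientation
h = sigmoid(x @ W1)
y = sigmoid(h @ W2)

# Backward pass: the output error is propagated through W2 *transposed*,
# which is the access pattern the second weight memory exists to serve
delta_out = (y - target) * y * (1 - y)
delta_hidden = (delta_out @ W2.T) * h * (1 - h)

W2 -= lr * np.outer(h, delta_out)
W1 -= lr * np.outer(x, delta_hidden)
```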

Computing Based on Material Training: Application to Binary Classification Problems

Eleonore Vissol-Gaudin (Durham University)

Eleonore Vissol-Gaudin (Durham University), Computing Based on Material Training: Application to Binary Classification Problems

  1. Use evolutionary algorithms
  2. explore and exploit unconfigured materials
  3. perform a computation
  4. Chain: computer – hardware interface (custom made) – material
  5. materials have a non-linear input output behavior
  6. non-biological material is considered better
  7. used carbon nanotubes dispersed in liquid crystal
  8. treat training as an optimization function
    1. define an objective function
    2. define a set of decision variables
  9. supervised learning approach
    1. divide dataset into training and verification
    2. send the training set to black box
    3. modify configuration signals
  10. signal are controlled by an evolutionary algorithm
  11. use custom motherboard “evolvable motherboard”
  12. liquid crystal provides a non-conductive medium for the carbon nanotubes, which form percolation paths between electrodes during application of a voltage across the electrodes
Eleonore Vissol-Gaudin (Durham University), Computing Based on Material Training: Application to Binary Classification Problems

  1. evolutionary algorithm
    1. stochastic
    2. derivative-free
    3. iterative
  2. Differential Evolution
    1. selection, crossover, mutation (see the sketch after this list)
  3. Questions they want to answer:
    1. can the material still classify a dataset after being retrained on another?
    2. can it provide solutions comparable to ML techniques?
  4. use artificial binary 2D datasets
    1. linear
    2. non-linear
  5. used the Fisher criterion to gauge the complexity of the classification
  6. Main observations
    1. the contribution of the material state to the classifier is non-negligible
    2. the same sample can be retrained for at least two problems
    3. modifications of the state do not fully destroy the original solution
  7. applied to two data sets
    1. worse on the mammographic mass dataset
    2. BUPA liver disorder results are comparable to a NN
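
Differential evolution, the optimizer named in item 2, is simple enough to sketch in full; the canonical DE/rand/1/bin variant is shown below. In the talk the "objective" is the classification error of the physical carbon-nanotube/liquid-crystal material under the evolved configuration voltages; here it is replaced with an ordinary software function.

```python
import numpy as np

rng = np.random.default_rng(11)

def differential_evolution(objective, dim, pop_size=20, F=0.8, CR=0.9,
                           iters=200, lower=-5.0, upper=5.0):
    """Canonical DE/rand/1/bin: mutate from three distinct random members,
    apply binomial crossover, keep the trial vector only if it improves."""
    pop = rng.uniform(lower, upper, (pop_size, dim))
    fitness = np.array([objective(p) for p in pop])
    for _ in range(iters):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lower, upper)     # mutation
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True                     # keep at least one gene
            trial = np.where(cross, mutant, pop[i])             # crossover
            f_trial = objective(trial)
            if f_trial < fitness[i]:                            # greedy selection
                pop[i], fitness[i] = trial, f_trial
    return pop[fitness.argmin()], float(fitness.min())

# Toy objective standing in for the material-in-the-loop classification error
print(differential_evolution(lambda p: float(np.sum(p ** 2)), dim=4))
```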

Nonlinear Dynamics and Chaos for Flexible, Reconfigurable Computing

Behnam Kia (North Carolina State University)

  1. New reconfigurable hardware that can be instantly reprogrammed to implement many different functions
  2. nonlinearity as a source of variability
    1. simple nonlinear system can exhibit diverse, complex behaviors
  3. Example: songbird
    1. reference songbird paper: M. Fee, et al, Nature 395 1998
    2. the vocal organ of birds is a nonlinear cavity that produces complex songs.
    3. by changing the input parameters, the circuit produces different songs
  4. Chaos is not random
  5. Chaos=infinite number of unstable “modes” without any stable condition
  6. Paper title: “a simple nonlinear circuit contains an infinite number of functions”
    1. reference table 1 in paper.
  7. “a dynamic system is nothing more than an embodiment of a function”
  8. Used Mosis to fabricate a series of circuits (four generations), starting in 2014.
  9. It can implement a different instruction at each clock cycle thanks to instant reprogrammability
    1. enabled application: analog to digital conversion and noise filtering
    2. analog input–>(control)–>digital sequence
  10. Focus application:
    1. extend typical signal processing chain to:
      1. filter noise from analog signal
      2. convert the analog signal to digital
      3. perform reconfigurable computing
      4. implement multiplication efficiently
      5. evolve or adapt
      6. all with minimal power and silicon area

A Thermodynamic Treatment of Intelligent Systems

Natesh Ganesh (University of Massachusetts, Amherst)

  1. what are the thermodynamic conditions under which physical systems learn?
  2. discussed the difference between ‘self assembly’ (non-dissipative) vs ‘self-organized’ (dissipative, non-equilibrium)
  3. if you remove the energy source, the structure decays in a self-organized system
  4. review of fluctuation theorems for non-equilibrium thermodynamics and dissipation in finite-state automata
  5. what is the relationship between dissipation and intelligence?
    1. intelligence: can use past inputs to predict future inputs
  6. average change in total entropy of system and bath = fluctuations about the mean
  7. under the right conditions: learning is synonymous with the energy efficient dynamics of the system
  8. thermodynamic computing
    1. a new engineering paradigm that will combine thermodynamics and information theory
  9. highlighted UCLA silver nanowire work.
