Overview

The goal of the ICRC conference is to “discover and foster novel methodologies to reinvent computing technology, including new materials and physics, devices and circuits, system and network architectures, and algorithms and software”. The conference took place on the 5th floor of the Ritz-Carlton in McLean, VA. It was a relatively small conference, with about 200 in attendance, two simultaneous sessions, and talks focusing mostly on neuromorphics, quantum computing, and photonics. Naturally, Knowm Inc. was there and taking notes.

Wednesday Nov. 8

DARPA’s Vision for the Future of Computing

Dr. Hava Siegelmann (DARPA MTO)

DARPA Lifelong Learning Program (L2M), Dr. Hava Siegelmann

  1. DoD considers AI/ML a powerful tool. Countries are rushing to acquire ML; future security depends on it.
  2. Must think all the way through including fundamentals: software, hardware, materials, etc.
  3. Self-driving cars: emphasized that the accidents involving Google, Uber, and Tesla were not the cars’ fault.
  4. AI is successful but brittle: we want the rigor of automation with the flexibility of a human.
    1. Catastrophic forgetting is a big problem.
  5. Lifelong Learning Program: develop fundamentally new ML mechanisms that enable systems to improve their performance over their lifetimes.
  6. Adapting to new conditions is the big goal.
    1. “Develop with data from Afghanistan…deploy in Syria.”
  7. Current AI has two parts: (1) programs and rules, (2) parameter learning (ML)
  8. “It is not the strongest that survives, but…the one that is able best to adapt…to the changing environment.” (L.C. Megginson, paraphrasing “On the Origin of Species”)
  9. Nature’s mechanisms for change beyond preloaded programs:
    1. reconsolidation in the brain
    2. epigenetics
  10. Promoted her theoretical computer science book: “Neural Networks and Analog Computation”.
  11. Quoted Turing and pointed out that he did not see Turing machines as a basis for intelligent machines; instead he pointed to “unorganized machines”.
    1. “Electronic computers are intended to carry out any definite rule of thumb process…working in a disciplined but unintelligent manner” (emphasis hers)
    2. “My contention is that machines can be constructed that will simulate the behavior of the human mind”
  12. Nature combines Turing machines with super-Turing computation.
    1. adapt as needed, changing their Turing parameters.
  13. Lifelong Learning Program Plan
    1. Continual learning
    2. Adaptation to new tasks and circumstances
    3. Goal-driven perception
    4. Selective plasticity
    5. Safety and monitoring
  14. “We have already picked teams, but the way to join us is the ‘Center Group.’”
    1. Led by Government with one technical representative from each performer.
    2. Unclear how others can participate.
DARPA Lifelong Learning Program (L2M), Dr. Hava Siegelmann, Center Group

Continuously Learning Systems with High Biological Realism

Karlheinz Meier (Heidelberg University)

  1. “Focus on finding the principles”
  2. “The ability to test models” is missing from neuroscience.
  3. Two fundamentally different modeling approaches:
    1. numerical models: params stored as binary numbers
    2. physical model: params as physical quantities (voltage, current, charge, etc)
  4. Emphasized the structural organization of the brain.
    1. local spatial integration–directionality–connectivity–hierarchy
  5. Emphasized time and temporal integration (STDP, sparse information coding via timing and time correlations, etc)
  6. Why use spikes: energy efficiency, scalability, computational advantages (to be proven)
  7. Emphasized that there are 7 to 11 orders of magnitude in space and time across various structures and temporal operation of brains.
  8. Comparison of digital and analog and mixed signal
    1. Mixed Signal: local analog computation
    2. Binary communication by spikes
    3. Signal restored in neuron (implies scalability)
  9. Neuromorphic implementations:
    1. In increasing biological realism: SpiNNaker –> IBM –> BrainScaleS
  10. Review of the BrainScaleS processing unit: circuit, unit, and whole system
  11. Attempted deep neural networks; illustrated the back-propagation cycle.
Prof. Karlheinz Meier (Heidelberg University), Continuously Learning Neuromorphic Systems with High Biological Realism, Back-Propagation Cycle

Referenced arXiv paper: Neuromorphic Hardware In The Loop: Training a Deep Spiking Network on the BrainScaleS Wafer-Scale System

  1. Spiking Boltzmann machines: learn an internal stochastic model of the input space; can generate or discriminate
    1. Where does the stochasticity come from?
    2. An external noise source?
    3. Noisy components?
    4. Ongoing network spiking activity!
    5. A running functional network serves as noise for another network.
    6. arXiv paper: Deterministic networks for probabilistic computing
  2. Device variability: good or bad? reviewed study where spiking neural network parameters were varied.
    1. use or reduce?
    2. ignore or calibrate?
    3. important to understand for very deep sub-micron and nano devices!
  3. Biological self-organization and learning introduce useful variability
  4. Next: BrainScaleS-2: local learning, homeostasis, dendritic computation
  5. The HICANN-X chip (next-generation BrainScaleS)
    1. Embedded SIMD plasticity processing unit (PPU) (see slide pic)
    2. example shown to learning a target firing rate
    3. parallel adjustment of neuron parameters.
BrainScaleS Next Generation: The HICANN-X Chip

  1. Backpropagation activated calcium (BAC) firing for feature binding and recognition
  2. Active dendrites examples
  3. Design of a 5000 wafer system underway

A Spike-Timing Neuromorphic Architecture

Aaron J. Hill (Sandia National Laboratories)

Aaron J. Hill (Sandia National Laboratories), A Spike-Timing Neuromorphic Architecture

  1. Challenges for neuromorphic hardware:
    1. Low-power computation
    2. Large populations of neurons
    3. Real-time learning
    4. High fan-in
    5. Temporally coded information
  2. Spiking temporal processing unit
    1. Support high fidelity spike timing dynamics
    2. Simple leaky integrate and fire neuron model with 3 parameters
    3. leak rate
    4. Temporal buffer for synaptic delays
    5. Supports arbitrarily connected networks with configurable weights
  3. Large off chip memory contains all synaptic information
    1. pre synaptic neuron designator
    2. synaptic weight
    3. synaptic delay
    4. post-synaptic neuron designator
  4. Spike transfer structure “the heart of the system”
    1. spike transfer structure communicates directly to the neuronal processing units
    2. “LIF” neuron equations, taking the temporal buffer into consideration (see the sketch after this list)
  5. Output spike consolidation
    1. three-stage consolidator
    2. pipelined efficiency
    3. each stage operates in parallel
    4. number of stages is a parameter
  6. Hardware development
    1. Nallatech 385A FPGA Accelerator Card
    2. 2048 neurons in first test bed
    3. think they can get 4096, will take manual place and route
    4. 32 deep parallel buffer
    5. 16MB of synapse memory
      1. 18 bits per weight
      2. 6 bits per delay
      3. 8 bits post synaptic neuron
      4. 2048 X 2048 total synapses
  7. Power measurements on Nallatech 385A
    1. ~21 nJ per event for 2048 to 32 neurons
    2. (only looked at dynamic power, wanted to remove FPGA overhead)
  8. Software:
    1. uses home grown neural modeling tool: “neurons to algorithms”
    2. object oriented
    3. declarative, not procedural
    4. parts are inheritable and extensible
    5. back end designed for the STPU
  9. Liquid state machine was tested
    1. “can increase accuracy by expanding spike information across time”: synaptic response functions
    2. reviewed effects of performance on various transfer functions
    3. STPU results on LSM
    4. audio signal dataset post-processed into cepstral coefficients
      1. 87.3% accuracy on zeros
      2. 84.6% accuracy on ones
    5. Particle Image Velocimetry
      1. Computing cross-correlation
    6. Spike Optimization
      1. Implement fundamental mathematical operations through temporal coding
        1. spikesort
        2. spikemin, spikemax, spikemedian
        3. spikeOpt(median)
      2. implemented on STPU hardware
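
A minimal sketch of the combination described in items 2 and 4 above: a simple leaky integrate-and-fire neuron paired with a circular temporal buffer that models per-synapse delivery delays. This is only an illustration of the idea, not the STPU's actual equations; the parameter names, the leak formulation, and the 32-deep buffer are assumptions taken from the notes.

```python
import numpy as np

class DelayedLIFNeuron:
    """Leaky integrate-and-fire neuron with a circular buffer of future
    synaptic input, indexed by delivery delay (a sketch, not the STPU spec)."""

    def __init__(self, leak=0.9, threshold=1.0, reset=0.0, buffer_depth=32):
        self.leak = leak                      # membrane leak factor per step
        self.threshold = threshold            # firing threshold
        self.reset = reset                    # potential after a spike
        self.v = 0.0                          # membrane potential
        self.buffer = np.zeros(buffer_depth)  # scheduled synaptic input
        self.t = 0                            # current slot in the buffer

    def receive(self, weight, delay):
        """Schedule a weighted input spike to arrive `delay` steps from now."""
        self.buffer[(self.t + delay) % len(self.buffer)] += weight

    def step(self):
        """Advance one time step; return True if the neuron fires."""
        self.v = self.leak * self.v + self.buffer[self.t]
        self.buffer[self.t] = 0.0                      # consume this slot
        self.t = (self.t + 1) % len(self.buffer)
        if self.v >= self.threshold:
            self.v = self.reset
            return True
        return False

# Two input spikes with different synaptic delays accumulate and trigger a spike
n = DelayedLIFNeuron()
n.receive(weight=0.6, delay=2)
n.receive(weight=0.6, delay=3)
print([n.step() for _ in range(6)])   # [False, False, False, True, False, False]
```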

Q: how did you decide how deep to make the temporal buffer?
A: easy, not a lot of thinking; 32 and 64 come directly from hardware constraints.

Q: what are the prospects for encoding something like an FFT?
A: (speaker) I don’t know. (attendee) They are good; we are working on it.

Q: in the neural compiler flow, can you map to something other than an FPGA? Can the representation be standardized?
A: not sure where we are in this process.

Feature Learning Using Synaptic Competition in a Dynamically-Sized Neuromorphic Architecture

Stanislaw Wozniak (IBM Research, Zurich)

Stanislaw Wozniak (IBM Research, Zurich), Feature Learning Using Synaptic Competition in a Dynamically-Sized Neuromorphic Architecture

  1. “the era of data-centric cognitive computing”
    1. Challenges: power due to von Neumann architecture
    2. Goal: rethink computing
    3. neuromorphic
    4. phase-change memristors
  2. Talked about abstraction levels across scale, from molecular to whole brain
    1. most popular abstraction: artificial neural network
    2. this work focuses on biologically realistic networks: stateful models that work in time.
  3. Use the model as a blueprint for hardware.
    1. spiking neurons, integrate and fire.
  4. Learning feedback through STDP
    1. using phase change memory to implement STDP (see slide pic)
    2. use for mixed analog-digital networks
    3. digital communication
    4. simple and reliable (binary spikes)
    5. analog processing
    6. efficient using memristors
  5. Prototype uses crossbar array
  6. Use single device to store weight
    1. Use Kirchhoff’s law for operations
  7. Talked about going from one layer to two or more layers on MNIST
  8. Basic overview of learning features using “weather glyphs”
    1. Synaptic Feature Learning
    2. WTA/lateral inhibition
    3. Synaptic competition (see the sketch after this list)
      1. Limited resources related to plasticity
      2. WTA dynamics at the synapses
    4. Handles “representation overflow” by using an “overflow neuron” that has equal low-magnitude weights.
    5. Limitations: by design it limits activation to a single neuron, resulting in a low F-score.
      1. we use WTA for learning, but report activity with WTA disabled
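
The winner-take-all plus synaptic-competition idea in item 8 can be sketched in a few lines of plain Python: the neuron that responds most strongly to an input updates its weights toward that input, and a fixed per-neuron weight budget forces its synapses to compete for the update. This is a software caricature of the concept, not IBM's STDP rule or its phase-change-memory implementation, and the toy patterns below are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def wta_feature_learning(patterns, n_neurons=4, lr=0.1, epochs=20, budget=1.0):
    """Competitive (winner-take-all) feature learning with a fixed synaptic
    'resource' budget per neuron -- a sketch of synaptic competition, not the
    STDP/phase-change rule from the talk."""
    dim = patterns.shape[1]
    w = rng.random((n_neurons, dim))
    w *= budget / w.sum(axis=1, keepdims=True)       # enforce the resource budget
    for _ in range(epochs):
        for x in patterns:
            winner = np.argmax(w @ x)                 # lateral inhibition: one winner
            w[winner] += lr * x                       # strengthen synapses driven by x
            w[winner] *= budget / w[winner].sum()     # competition: renormalize
    return w

# Toy binary patterns standing in for the "weather glyph" features
patterns = np.array([[1, 1, 0, 0],
                     [0, 0, 1, 1],
                     [1, 0, 1, 0]], dtype=float)
print(np.round(wta_feature_learning(patterns), 2))   # winning rows drift toward their patterns
```
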
Stanislaw Wozniak (IBM Research, Zurich), Feature Learning Using Synaptic Competition in a Dynamically-Sized Neuromorphic Architecture

Q: is the phase reversible?
A: can go between two phases, but high conductance to low conductance is abrupt

Q: how many bits can be stored in memristors?
A: I think 4 bits can be stored.

Q: how faithfully can you reproduce intermediate states?
A: we care more about the logic of execution; we can read back with sufficient accuracy.

Q: how do you stop learning?
A: you are never sure what new patterns will arrive; ideally you never stop learning. (Note: have they tested the method on real-world noisy data?)

Achieving Swarm Intelligence with Spiking Neural Oscillators

Yan Fang (University of Pittsburgh)

Yan Fang (University of Pittsburgh), Achieving Swarm Intelligence with Spiking Neural Oscillators

  1. motivation: neuromorphic computing
    1. computational model
    2. processor architecture
  2. Motivation: swarm intelligence
    1. SI algorithm
    2. ant colony, firefly, particle swarm, bees
    3. applications
      1. optimization
      2. scheduling
      3. path planning
  3. Can we bridge the two models?
  4. first step: neuromorphic computing for SI algorithm
  5. generalized swarm intelligence algorithms (see slide pic)
  6. use leaky integrate and fire model
  7. prepare an m-by-n array of neurons to represent each parameter of every agent (a generic swarm-optimization sketch follows below)
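
For reference, the swarm-intelligence side of this mapping is just a population-based optimizer. Below is plain particle swarm optimization, one of the SI algorithms listed above; the talk's step of representing each agent parameter with an array of spiking neural oscillators is not reproduced here, and the quadratic objective is only a stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)

def pso(objective, dim=2, n_agents=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Plain particle swarm optimization: each agent is pulled toward its own
    best position and the swarm's best position."""
    pos = rng.uniform(-5, 5, (n_agents, dim))
    vel = np.zeros((n_agents, dim))
    best_pos = pos.copy()
    best_val = np.array([objective(p) for p in pos])
    g_best = best_pos[best_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_agents, dim))
        vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g_best - pos)
        pos = pos + vel
        vals = np.array([objective(p) for p in pos])
        improved = vals < best_val
        best_pos[improved], best_val[improved] = pos[improved], vals[improved]
        g_best = best_pos[best_val.argmin()].copy()
    return g_best, best_val.min()

# Minimize a simple quadratic bowl as a stand-in for the optimization tasks mentioned
print(pso(lambda p: float(np.sum(p ** 2))))
```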

Q: any target optimization problem?
A: looking at traveling salesman

Q: have you compared to traditional algorithms?
A: Sort of, compared to other swarm models

Energy-Efficient Single Flux Quantum Based Neuromorphic Computing

Mike Schneider

Michael Schneider (National Institute of Standards and Technology), Energy-Efficient Single Flux Quantum Based Neuromorphic Computing

  1. Review of the Josephson junction (JJ)
  2. Review of the JJ circuit model
  3. Explored how ferromagnet (FM) thickness affects the critical current density.
  4. Magnetic nanoclusters in a Josephson junction
  5. “nice analog tunable moment change”
  6. Claims that a large-scale “ImageNet”-class system would reduce to 1 watt, including cooling.
Michael Schneider (National Institute of Standards and Technology), Energy-Efficient Single Flux Quantum Based Neuromorphic Computing

“takes a kilowatt to cool a watt”

During Q&A:
1. Admitted that driving information into the network would be a major challenge that is not at all a solved problem.
2. Did not clearly answer how this would be better than a memristive processor.

Improved Deep Neural Network Hardware Accelerators Based on Non-Volatile Memory: The Local Gains Technique

Geoffrey Burr (IBM Research, Almaden)

  1. AI as driven by DNNs
    1. Image recognition, speech recognition, machine translation
  2. Multiply-Accumulate is key operation for DNN
  3. Approach uses pairs of memristor devices for weights/synapses.
  4. want to do both inference and learning via backprop on-chip
    1. need highly parallel on-chip circuitry
  5. “our existing phase-change memristors are not symmetrical enough”
    1. need to invent new RRAM materials
    2. we need “tricks”. That’s my job: to come up with tricks.
  6. We are going to have a grid that sort of looks like TrueNorth.
  7. We can now get hardware results that are exactly equivalent to software.
  8. Local Gains Technique
    1. Attempted to redo previous work and duplicate findings, found it did not match.
    2. Looked into various reasons why, like node activation function, could not find reason
    3. Found the reason:
    4. “I had succeeded with my experiment with a particular configuration, I tried it twenty times and one of those times it worked. I had one weekend left to write the paper, and so I wrote this simulator that wrapped around that one particular configuration and learning rate and hyperbolic tangent space. When the students came back they optimized both of those.”
  9. Local Gains Technique
    1. An old technique, related to momentum; he found out about it from a Hinton video.
    2. Encourage weights that have a consistent ‘direction’.
    3. Discourage weights that can’t make up their mind (see the sketch after this list).
  10. Local Gains does not work for typical computer scientists
    1. too hard to tune parameters.
    2. may work for a PCM engineer, since PCM has a non-linear incremental response.
    3. The issue is that many weights are being pulled up and down, and if those do not balance then you end up in trouble.
  11. We did a nice study of coefficients, seems tolerant to many settings.

  12. We have a thing called a “safety margin”.
    1. (appears related to classification margin)
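
The "local gains" idea in items 9 and 10 sounds like the classic per-weight adaptive gains Hinton describes: a weight whose gradient keeps the same sign gets a larger effective step, one whose gradient flip-flops gets a smaller one. A minimal sketch of that rule follows, with additive increase and multiplicative decrease of the gains assumed; the exact update, the PCM-specific behavior, and the "safety margin" from the talk are not reproduced.

```python
import numpy as np

def sgd_with_local_gains(grad_fn, w, lr=0.01, steps=200,
                         gain_up=0.05, gain_down=0.95,
                         gain_min=0.1, gain_max=10.0):
    """Gradient descent with per-weight 'local gains': a weight whose gradient
    keeps its sign gets an additively larger gain, a flip-flopping weight gets
    a multiplicatively smaller one. A sketch of the idea, not Burr's rule."""
    gains = np.ones_like(w)
    prev_grad = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        same_sign = g * prev_grad > 0
        gains = np.where(same_sign, gains + gain_up, gains * gain_down)
        gains = np.clip(gains, gain_min, gain_max)
        w = w - lr * gains * g
        prev_grad = g
    return w

# Toy quadratic objective with very different curvature per dimension
curvature = np.array([10.0, 0.1])
w_final = sgd_with_local_gains(lambda w: 2 * curvature * w, np.array([1.0, 1.0]))
print(w_final)   # the flat direction makes much faster progress than with a fixed step
```
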
Geoffrey Burr (IBM Research, Almaden), Improved Deep Neural Network Hardware Accelerators Based on Non-Volatile Memory: The Local Gains Technique

A Comparison Between Single Purpose and Flexible Neuromorphic Processor Designs

David Mountain (US Department of Defense)

David Mountain (US Department of Defense), A Comparison Between Single Purpose and Flexible Neuromorphic Processor Designs

  1. Benchmarks:
    1. mnist
    2. CSlite: malware-detection application
    3. very digital, nothing “neuromorphic”
    4. back end detector like a classifier
    5. 5800 neurons in 6 layers
    6. AES-256
    7. pure digital
    8. 12500 neurons in 8 layers
  2. memristor array architecture
    1. Neuron type is a multiply accumulate with threshold gate neuron.
    2. Uses differential pair synapses for weights.
David Mountain (US Department of Defense), A Comparison Between Single Purpose and Flexible Neuromorphic Processor Designs

  1. Tile concept, bypass comparators and pass to next tile (’tile feature’).
  2. Tile design is flexible, since array size is a limiting factor for the application
  3. Also tested hierarchical routing.
    1. All-to-All (A2A) in a tree architecture (based on functional ASIC design in 90nm)
  4. Evaluation Methodology
    1. Calculate the area for each design
    2. Calculate worst-case timing for each design
    3. calculate power for each design
    4. calculate T/W (throughput/watt) and T/A (throughput/area)
  5. Conclusions
    1. Flexible design using tiles and A2A switch is practical compared to special-purpose design.
    2. the tile feature clearly provides value
    3. A2A network is superior to 2D mesh.

In-Memory Execution of Compute Kernels Using Flow-Based Memristive Crossbar Computing

Dwaipayan Chakraborty

  1. Objective: automated synthesis of C program, translate to crossbar execution.
    1. restricted subset of the language, because you can’t do everything with crossbars
  2. problem: “every crossbar is different”; you have to make a specialized design for every function
  3. utilizing sneak paths for computation
    1. any boolean formula can be mapped to a crossbar using flow-based computing
    2. Advantages
    3. Exploits non-volatility of memristors
    4. leverages flow through the crossbar structure
    5. fast & energy efficient in many cases
    6. Disadvantages
    7. designing flow-based computing circuits may be computationally hard.
    8. intuitive understanding of flow-based computing is difficult.
Sumit Kumar Jha (University of Central Florida), Flow-based Non-volatile Memory Crossbar Accelerators for Parallel Computations

For more information see this video

VoiceHD: Hyperdimensional Computing for Efficient Speech Recognition

Mohsen Imani (University of California, San Diego)

  1. Deep learning is changing our lives
  2. power consumption is out of control
    1. Lee Sedol is 50,000X more efficient than AlphaGo
  3. Energy efficiency
    1. mobile: battery
    2. cloud: cost
  4. Hardware scalability
  5. Hyperdimensional Computing
    1. take all images of cats (or any other datatype) and “encode a hypervector”
    2. each hypervector is a model of cat or dog
    3. look for similarity between hypervectors (see the sketch after this list)
    4. any data is encoded and compressed
    5. encoding is reversible (can reconstruct input from encoded vector)
    6. can be implemented in DRAM
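
In its simplest form, hyperdimensional computing represents each symbol with a fixed random high-dimensional vector, bundles (sums) the vectors of a training example into a class hypervector, and classifies queries by similarity. A minimal sketch with random bipolar vectors and made-up symbol sequences; VoiceHD's actual encoder for speech features is more involved, and the reversibility mentioned above is not demonstrated here.

```python
import numpy as np

rng = np.random.default_rng(42)
D = 10_000                                 # hypervector dimensionality

symbols = {s: rng.choice([-1, 1], size=D)  # item memory: one random bipolar
           for s in "abcdefgh"}            # hypervector per (hypothetical) symbol

def encode(sequence):
    """Bundle (sum and binarize) the symbol hypervectors of a sequence."""
    return np.sign(np.sum([symbols[s] for s in sequence], axis=0))

def similarity(a, b):
    """Normalized dot-product similarity between two hypervectors."""
    return float(a @ b) / D

# Build class prototypes from toy training sequences, then classify a query
class_hv = {"cat": encode("aabb"), "dog": encode("ggh")}
query = encode("aab")
print(max(class_hv, key=lambda c: similarity(class_hv[c], query)))   # -> cat
```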

Embedding in Neural Networks: A-priori Design of Hybrid Computers for Prediction

Bicky Marquez (Institute FEMTO-ST)

  1. Classification using delayed photonic systems
  2. generally, going to higher dimensions allows linear separation
  3. if you can satisfy these, you have a reservoir computer:
    1. approximation property
    2. separation property
    3. fading memory property
  4. Instead of using a random network, they use a delayed (ring) system
  5. spoken digit recognition, TI-46 corpus, spoken digits from zero to nine
    1. only 1000 neurons, but achieves a performance of 1 million digit recognitions per second
  6. predicting a chaotic system (the Mackey-Glass system)
    1. predictable up to a time, unpredictable after
  7. compared to random recurrent neural networks, their system worked better (predicted further out); a generic reservoir-computing sketch follows below
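
The delay-based photonic reservoir is a hardware instance of reservoir computing; the same ingredients (a fixed random recurrent network plus a trained linear readout) can be sketched in software as an echo state network and tested on the Mackey-Glass task mentioned above. This is a generic software sketch, not the FEMTO-ST delay-ring system, and the reservoir size, spectral radius, and washout length are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def mackey_glass(n, tau=17, beta=0.2, gamma=0.1, p=10):
    """Discrete (unit time step) Mackey-Glass series, a standard chaotic benchmark."""
    x = np.full(n + tau, 1.2)
    for t in range(tau, n + tau - 1):
        x[t + 1] = x[t] + beta * x[t - tau] / (1 + x[t - tau] ** p) - gamma * x[t]
    return x[tau:]

series = mackey_glass(3000)

# Fixed random reservoir with spectral radius < 1 (echo state property)
N = 300
W_in = rng.uniform(-0.5, 0.5, N)
W = rng.uniform(-0.5, 0.5, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

def run_reservoir(u):
    states, x = np.zeros((len(u), N)), np.zeros(N)
    for t, ut in enumerate(u):
        x = np.tanh(W_in * ut + W @ x)
        states[t] = x
    return states

train, test = series[:2000], series[2000:]
X, y = run_reservoir(train[:-1])[200:], train[1:][200:]      # drop a washout period
W_out = np.linalg.lstsq(X, y, rcond=None)[0]                 # train only the readout

pred = run_reservoir(test[:-1]) @ W_out                      # one-step-ahead prediction
nrmse = np.sqrt(np.mean((pred - test[1:]) ** 2)) / np.std(test[1:])
print(f"one-step NRMSE: {nrmse:.3f}")
```

Only the linear readout is trained; the recurrent weights stay fixed, which is what makes reservoir approaches cheap to train compared with fully trained recurrent networks.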

Convolutional Drift Networks for Spatio-Temporal Processing

Dillon Graham (Rochester Institute of Technology)

  1. large volume of video data is being generated
  2. current approaches are very task-specific, costly to train, or both
    1. hand-crafted features are commonly used
    2. Our approach: develop a new neural-net architecture with the following properties:
    3. capable of video activity classification
    4. general architecture
    5. minimal training cost
    6. classification performance competitive with SoTA
  3. Experiment
    1. can our new approach perform competitively?
    2. task: video level activity classification
    3. data: two first person video datasets
  4. Combine deep learning and reservoir computing to build a powerful and efficient NN architecture for spatiotemporal data
    1. observation: all reservoirs converged to the same accuracy, but bigger reservoirs converged faster

A New Approach for Multi-Valued Computing Using Machine Learning

Wafi Danesh (University of Missouri, Kansas City)

  1. moving beyond CMOS
  2. emergence of an array of devices to continue scaling
    1. need to leverage intrinsic properties
    2. beyond-CMOS devices have unique intrinsic characteristics
      1. more than a switch
      2. analog/multi state
      3. ultra low power
      4. inherently non volatile
      5. different domains (magnetic/electric/magnetoelectric)
  3. Proposed MVL (multi-valued logic) synthesis approach
    1. Random MVL function decomposed to a set of linear equations
    2. Adaptable to any technology
    3. Scaled with circuit size
  4. New MVL algorithm
    1. 3 steps
      1. domain selection
      2. linear regression
      3. pattern matching
  5. Quaternary Multiplier example

Thursday Nov. 9

On Thermodynamics and the Future of Computing

Todd Hylton (University of California, San Diego)

Todd Hylton (University of California, San Diego), On Thermodynamics and the Future of Computing

  1. A different way to think about what we do in the field
    1. how we might really reboot computing in a different way
  2. The primary problem is that computers cannot organize themselves
    1. they have narrow focus, rudimentary AI capabilities
  3. Our mechanistic approach to the problem
    1. machines are the sum of their parts
    2. machines are disconnected from the world except through us
  4. The world is not a machine
  5. Thermodynamic Computing Hypothesis
  6. Why Thermodynamics?
    1. it’s universal
    2. it’s temporal
    3. it’s efficient
  7. Thermodynamics drives the evolution of everything
  8. Thermodynamic evolution is the missing, unifying concept in computing (and many other domains)
  9. Electronic systems are well suited for thermodynamic evolution
  10. Thermodynamics should be the principal concept in future computing systems
  11. Review of thermodynamics in closed and open systems
  12. Examples of self organization in nature (Alex Nugent’s slide from DARPA days)
  13. Arbortron video from Stanford
  14. Life and modern electronic systems comparison
Todd Hylton (University of California, San Diego), On Thermodynamics and the Future of Computing

  1. Thermodynamic Evolution Hypothesis
    1. Thermodynamic evolution supposes that all organization spontaneously emerges in order to use sources of free energy in the universe and that there is competition for this energy.
    2. Thermodynamic evolution is the second law, except that it adds the idea that in order for entropy to increase, an organization must emerge that makes it possible to access free energy.
    3. The first law of thermodynamics implies that there is competition for free energy
  2. Basic proposed demonstration architecture
    1. evolvable cores in a network that evolve to move energy from source to sink
  3. Thermodynamic Bit review
  4. Related ideas
    1. Free energy principle
    2. Thermodynamics of Prediction
    3. Causal Entropic Forcing

More information on Thermodynamic Computing

A Unified Hardware/Software Co-Design Framework for Neuromorphic Computing Devices and Applications

James Plank (University of Tennessee, Knoxville)

James Plank (University of Tennessee, Knoxville), A Unified Hardware/Software Co-Design Framework for Neuromorphic Computing Devices and Applications
  1. Software Core
    1. spiking, highly recurrent NN
    2. implements common functionality or provides an interface for other components to implement
  2. Application Device operations
    1. load network
    2. serialize network
    3. apply input spikes
    4. read output spikes
    5. run
  3. “Applications want to use numerical events, devices want to use spikes”
    1. the core provides support for converting application<->device
  4. Devices/Architectures
    1. NIDA
    2. DANNA
    3. mrDANNA
  5. Applications
    1. Control
    2. Classification
    3. Security
    4. Micro-Applications
  6. mrDANNA
    1. Hafnium oxide devices
    2. Differential Pair memristors (2-1 configuration)
    3. Cadence-Spectre simulations
    4. 8 synapse neurons

Impact of Linearity and Write Noise of Analog Resistive Memory Devices in a Neural Algorithm Accelerator

Robin Jacobs-Gedrim (Sandia National Laboratories)

Robin Jacobs-Gedrim (Sandia National Laboratories), Impact of Linearity and Write Noise of Analog Resistive Memory Devices in a Neural Algorithm Accelerator

  1. Example of Google deep learning study
  2. energy/space comparison
  3. Review of matrix vector multiply with crossbar
  4. Symmetry/Asymmetry of pulsing incrementation of memristors
    1. How much effect does this have on accuracy?
Robin Jacobs-Gedrim (Sandia National Laboratories), Impact of Linearity and Write Noise of Analog Resistive Memory Devices in a Neural Algorithm Accelerator

Paper: Resistive memory device requirements for a neural algorithm accelerator

  1. Comparison of various types of memristors
    1. SiO2-Cu
    2. TaOx
    3. Ag-chalcogenide variant (likely from ASU?)
  2. All devices had a nonlinear response that negatively impacted accuracy
    1. TaOx has lowest noise
  3. High resistance devices fabricated with an “Al2O3 Bilayer”
    1. integrated tunneling barrier to increase resistance
    2. devices needed to be formed
    3. poor/noisy incrementation response
  4. Differential pair memristor representation (see the sketch below)
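
The differential-pair representation and the crossbar matrix-vector multiply reviewed earlier fit in a few lines: each signed weight is stored as the difference of two conductances, input voltages are applied to the rows, and Kirchhoff's current law sums the column currents into dot products. This is an idealized model for illustration only; the device nonlinearity and write noise that the talk actually studies are not modeled, and the conductance range is made up.

```python
import numpy as np

rng = np.random.default_rng(3)

def crossbar_mvm(weights, v_in, g_min=1e-6, g_max=1e-4):
    """Idealized differential-pair memristor crossbar: each signed weight is
    encoded as G+ - G-, and Ohm's/Kirchhoff's laws turn applied row voltages
    into column currents that realize the matrix-vector product."""
    scale = (g_max - g_min) / np.max(np.abs(weights))
    g_pos = g_min + scale * np.clip(weights, 0, None)    # conductances for w > 0
    g_neg = g_min + scale * np.clip(-weights, 0, None)   # conductances for w < 0
    i_pos = v_in @ g_pos                                 # column currents, "+" array
    i_neg = v_in @ g_neg                                 # column currents, "-" array
    return (i_pos - i_neg) / scale                       # back to weight units

W = rng.standard_normal((4, 3))
v = rng.standard_normal(4)
print(np.allclose(crossbar_mvm(W, v), v @ W))   # True: the ideal crossbar is a dot product
```
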
Robin Jacobs-Gedrim (Sandia National Laboratories), Impact of Linearity and Write Noise of Analog Resistive Memory Devices in a Neural Algorithm Accelerator

An Energy-Efficient Mixed-Signal Neuron for Inherently Error-Resilient Neuromorphic Systems

Baibhab Chatterjee (Purdue University)

Baibhab Chatterjee (Purdue University), An Energy-Efficient Mixed-Signal Neuron for Inherently Error-Resilient Neuromorphic Systems

  1. Can we reduce Multiply Accumulate (MAC) energy by 100X?
  2. what are the Bottlenecks? Could Be:
    1. energy for computation
    2. energy for communication
    3. memory fetch energy
    4. architecture/algorithms
  3. Digital MAC Review
    1. the number of transistors increases quickly as the number of bits increases
    2. power at low frequency is dominated by leakage
    3. at high frequency it is dominated by dynamic power
  4. Mixed signal MAC
    1. 1000X better at low frequency, 100X at higher frequency
    2. higher noise and non-linearities are the tradeoffs
Baibhab Chatterjee (Purdue University), An Energy-Efficient Mixed-Signal Neuron for Inherently Error-Resilient Neuromorphic Systems

  1. “Noise Factor” analysis developed and used in simulations against multiple benchmarks, relating noise factor to performance degradation.

Borrowing from Nature to Build Better Computers

Prof. Luis Ceze (University of Washington)

  1. Molecular Data Storage
  2. Stored the “This Too Shall Pass” video in DNA, plus other work related to DNA as a storage medium.

Rebooting the Data Access Hierarchy in Computing Systems

Wen-mei Hwu (University of Illinois, Urbana-Champaign)

  1. data access challenges
  2. IBM Illinois Erudite project
  3. The reason some apps can’t be run on GPUs is a data access problem.
  4. Volta/HBM2: 900 GB/s bandwidth, 225 giga SP operands/s (the arithmetic is worked below, after this list)
    1. each operand must be used 62.3 times once fetched to achieve the peak FLOPS rate
    2. sustains <1.6% of peak without data reuse
  5. Volta-host DDR3
    1. operands must be used 700 times once fetched to achieve peak FLOPS
    2. sustains <0.14% of peak without data reuse
  6. Volta-FLASH, 16 GB/s PCIe3
    1. operands must be used 3507 times once fetched to achieve peak FLOPS
    2. sustains ~0.03% of peak without data reuse
  7. Large Problem challenge
    1. solving larger problems motivates continued growth of computing capability
      1. inverse solvers for science and engineering apps
      2. matrix factorization and graph traversal for analytics
    2. as problem size grows
      1. fast, low-complexity algorithms win
      2. sparsity increases, iterative methods win
      3. data reuse diminishes
  8. Erudite Project
    1. Work done at IBM Illinois C3SR
  9. computation types
    1. low-complexity iterative solver algorithms
    2. graphs analytics
      1. inference, search, counting
    3. large cognitive application
      1. large multi-model classifiers
  10. to achieve performance:
    1. elimination of file-system software overhead for engaging in large datasets
    2. placement of computation appropriately in the memory and storage hierarchy
    3. highly optimized kernel synthesis
    4. collaborative heterogeneous acceleration
  11. Step 1: remove the file system from the data access path, get rid of storage
  12. Step 2: place NMA compute inside memory system, 100+ GFLOPS NMA compute into DDR/Flash memory system (~10TBs)
    1. Erudite NMA board 1.0
      1. develop a principled methodology for acceleration
      2. throughput proportional to capacity
        1. 1 GFLOPS/10GB sustained
        2. 100 GFLOPS sustained
  13. Step 3: collaborative heterogeneous computing (Chai)
  14. Unacceptable latency moving data between CPU and GPU.
  15. Research Agenda
    1. Package-level integration
      1. optical interconnects in package?
      2. collaboration support for heterogeneous devices
      3. virtual address translation
    2. System software revolution
      1. persistent objects for multi-language environments
      2. directory and mapping of very large persistent objects
    3. power consumption in memory
      1. much higher memory level parallelism needed for flash based memories
      2. latency vs. throughput oriented memories
  16. Conclusion and Outlook
    1. drivers for computing capabilities
      1. large-scale inverse problems with natural data inputs
      2. machine learning based applications
    2. Erudite cognitive computing systems project
      1. removing file system bottleneck from access paths to large datasets
      2. placing compute into the appropriate levels of the memory system hierarchy
      3. memory parallelism proportional to the data capacity
      4. collaborative NMA execution with CPUs and GPUs
      5. >100x improvement in power efficiency and performance
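
The operand-reuse numbers in items 4-6 follow from a one-line calculation: divide the peak FLOP rate by the rate at which each memory level can deliver operands. A quick check, assuming roughly 14 TFLOPS single-precision peak for Volta (not stated in the talk) and a host-DDR bandwidth inferred from the stated 700x reuse factor:

```python
# Rough operand-reuse arithmetic for a Volta-class GPU
peak_flops = 14e12            # assumed SP peak; not stated in the talk
bytes_per_operand = 4         # single precision

levels = [("HBM2", 900e9),    # 900 GB/s, as stated in the talk
          ("host DDR", 80e9), # assumed; consistent with the stated 700x reuse
          ("flash/PCIe3", 16e9)]

for name, bandwidth in levels:
    operands_per_s = bandwidth / bytes_per_operand
    reuse = peak_flops / operands_per_s
    print(f"{name:12s} {operands_per_s / 1e9:6.0f} Goperands/s  "
          f"reuse ~{reuse:5.0f}x  no-reuse peak ~{1 / reuse:.2%}")
```

Running this reproduces the figures in the notes: about 62x reuse (1.6% of peak) for HBM2, 700x (0.14%) for host DDR, and roughly 3500x (0.03%) for flash over PCIe3.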

“sparse is more of our target than dense”

The Superstrider Architecture: Integrating Logic and Memory towards non-von Neumann Computing

Sriseshan Srikanth (Georgia Institute of Technology)

Sriseshan Srikanth (Georgia Institute of Technology), The Superstrider Architecture: Integrating Logic and Memory towards non-von Neumann Computing

  1. superstrider: geared toward processing sparse data streams
  2. moving data is expensive, time and energy
  3. logic-memory integration,
    1. vertical integration with 3D stacking
    2. Micron Hybrid Memory Cube
Sriseshan Srikanth (Georgia Institute of Technology), The Superstrider Architecture: Integrating Logic and Memory towards non-von Neumann Computing

  1. superstrider
    1. accelerated reduction of sparse data streams using logic-memory 3D integration
    2. accumulation phase of SpGEMM
      1. near-perfect cache miss rate for sparse data
    3. data organized in binary tree
    4. key principles
      1. memory rows organized as a binary tree, with k records (sorted by key) and a pivot key per row
      2. memory access and computation granularity:
        1. 1 memory row of sorted records
        2. SIMD style operation tightly integrated with wide memory words
      3. Sorted invariant
        1. two pre-sorted vectors of length N/2 can be merged in log2(N) stages (see the sketch after this list)
        2. novel algorithms
    5. Long examples of storing sparse data pairs given
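
The claim that two pre-sorted vectors of length N/2 can be merged in log2(N) stages is the bitonic merge: reverse one sorted half, concatenate to get a bitonic sequence, then apply log2(N) rounds of fixed compare-exchange operations, each of which is embarrassingly parallel (the SIMD-on-a-memory-row style the talk describes). A plain-Python sketch of the network, not Superstrider's hardware implementation:

```python
def bitonic_merge(a, b):
    """Merge two sorted lists of equal power-of-two length using log2(N)
    compare-exchange stages, N = len(a) + len(b)."""
    seq = list(a) + list(b)[::-1]        # ascending + descending = bitonic
    n = len(seq)
    stride = n // 2
    while stride > 0:                    # exactly log2(N) stages
        for i in range(n):
            if (i // stride) % 2 == 0 and seq[i] > seq[i + stride]:
                seq[i], seq[i + stride] = seq[i + stride], seq[i]
        stride //= 2
    return seq

print(bitonic_merge([1, 4, 6, 9], [2, 3, 7, 8]))   # [1, 2, 3, 4, 6, 7, 8, 9]
```

Within a stage all of the compare-exchanges are independent, which is why a wide memory row can process them in one SIMD step.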

NNgine: Ultra-Efficient Nearest Neighbor Accelerator Based on In-Memory Computing

Mohsen Imani (University of California, San Diego)

Mohsen Imani (University of California, San Diego), NNgine: Ultra-Efficient Nearest Neighbor Accelerator Based on In-Memory Computing

  1. Big data processing with general purpose processing
    1. can today’s systems process big data?
    2. [cores/memory with separation shown]
  2. Cost of memory access is much higher than computation
    1. DRAM read: 640pJ
    2. 8b add: .03pJ
    3. DRAM access consumes 170x more energy than an FPU multiply
  3. Processing in memory (PIM)
    1. perform a part of computation tasks inside the memory
  4. Supporting in-memory operations
    1. bitwise
      1. OR, AND, XOR
    2. Search operation
      1. nearest search
      2. clustering
      3. classification
      4. database
    3. Addition Multiplication
      1. matrix multiplications
      2. deep learning
Mohsen Imani (University of California, San Diego), NNgine: Ultra-Efficient Nearest Neighbor Accelerator Based on In-Memory Computing

  1. Related publications
Mohsen Imani (University of California, San Diego), NNgine: Ultra-Efficient Nearest Neighbor Accelerator Based on In-Memory Computing

  1. Crossbar NOR operation example
    1. voltage divider with the lower elements encoding each input; if any element is on, it pulls down the voltage and you get the OR logic operation.
  2. NOR-based addition
  3. Fast in-memory addition
    1. add multiple numbers in four stages
      1. additions in the same stage of execution are independent and can occur in parallel
      2. the speed-up comes at the cost of increased energy consumption and number of writes in memory
      3. the last stage dominates the addition latency.
  4. NNgine: KNN accelerator
    1. nearest neighbor search accelerator
    2. performing the search operation inside the memory, next to DRAM
  5. kNN review
  6. kNN in GPU performance review, execution time, cache hit, dataset size
  7. Content addressable memory (CAM)
    1. Nearest Neighbor search
      1. find similar rows in parallel using a Hamming-distance criterion (see the sketch after this list)
      2. count HDs based on timing characteristics of discharging current
  8. NNAM architecture overview
    1. DVS: applies voltage overscaling to the block; any row with more matching bits discharges first
    2. NN detector: senses the number of matched lines and notifies the controller
    3. Controller: dynamically adjusts the voltage based on the number of discharged rows
  9. Use a simple analog design to sample match lines, accumulate currents, compare to thresholds
  10. NNgine energy improvement: 349X
  11. AdaBoost acceleration
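
Functionally, the CAM-based nearest-neighbor search in item 7 is just "return the stored row with the smallest Hamming distance to the query"; the hardware senses that distance through match-line discharge timing instead of counting bits. A software model with made-up data:

```python
import numpy as np

rng = np.random.default_rng(7)

def cam_nearest_neighbor(stored_rows, query):
    """Return (index, distance) of the stored row with minimum Hamming
    distance to the query -- a functional stand-in for the NNAM search,
    which senses the distance via match-line discharge timing."""
    hamming = np.count_nonzero(stored_rows != query, axis=1)
    return int(hamming.argmin()), int(hamming.min())

rows = rng.integers(0, 2, size=(1024, 64))   # 1024 stored 64-bit patterns
query = rows[123].copy()
query[:3] ^= 1                               # corrupt 3 bits of a stored row
print(cam_nearest_neighbor(rows, query))     # (123, 3), with overwhelming probability
```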

Socrates-D: Multicore Architecture for On-line Learning

Tarek Taha (University of Dayton)

  1. Goal: a system that can learn continuously at low power. Uses:
    1. robotics
    2. personal devices
    3. low power deployed systems
  2. what is done now: Continuous training on cloud
    1. requires network access
    2. privacy risk
    3. slow and high energy
  3. Multicore near memory computations
  4. both learning and inference capability
  5. overview of backprop
  6. on the forward pass the weight matrix is read normally; on the backward pass it must be accessed in transposed form (see the sketch after this list)
    1. to handle the transposed matrix, there are two weight-matrix memories, one for the forward and one for the transposed form
    2. the dual-matrix approach is not strictly necessary, but then runtime is much longer and power consumption is lower.
  7. Static Routing and Dynamic Routing implemented in simulations.
    1. Static Routing is significantly more efficient, but there is a problem:
    2. connections are dedicated, which will block routing through cores.
    3. solved this problem with time multiplexing
  8. distribute large networks across cores by partitioning network and using another core to add partial sums from partitioned cores.
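
Item 6 is the reason the architecture stores each weight matrix twice: in a standard backpropagation step the forward pass reads the weight matrix in its stored orientation, while the backward pass needs the same matrix transposed. A generic one-hidden-layer sketch (not the Socrates-D design) that shows where the transposed access occurs:

```python
import numpy as np

rng = np.random.default_rng(5)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

W1 = rng.standard_normal((4, 8)) * 0.5     # input -> hidden
W2 = rng.standard_normal((8, 2)) * 0.5     # hidden -> output
x, target, lr = rng.standard_normal(4), np.array([1.0, 0.0]), 0.1

# Forward pass: matrices are read in their stored orientation
h = sigmoid(x @ W1)
y = sigmoid(h @ W2)

# Backward pass: the output error is propagated through W2 *transposed*,
# which is the access pattern the second weight memory exists to serve
delta_out = (y - target) * y * (1 - y)
delta_hidden = (delta_out @ W2.T) * h * (1 - h)

W2 -= lr * np.outer(h, delta_out)
W1 -= lr * np.outer(x, delta_hidden)
```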

Computing Based on Material Training: Application to Binary Classification Problems

Eleonore Vissol-Gaudin (Durham University)

Eleonore Vissol-Gaudin (Durham University), Computing Based on Material Training: Application to Binary Classification Problems

  1. Use evolutionary algorithms
  2. explore and exploit unconfigured materials
  3. perform a computation
  4. Chain: computer – hardware interface (custom made) – material
  5. materials have a non-linear input output behavior
  6. non-biological material is considered better
  7. used carbon nanotubes dispersed in liquid crystal
  8. treat training as an optimization function
    1. define an objective function
    2. define a set of decision variables
  9. supervised learning approach
    1. divide dataset into training and verification
    2. send the training set to black box
    3. modify configuration signals
  10. signal are controlled by an evolutionary algorithm
  11. use custom motherboard “evolvable motherboard”
  12. liquid crystal provides a non-conductive medium for the carbon nanotubes, which form percolation paths between electrodes during application of a voltage across the electrodes
Eleonore Vissol-Gaudin (Durham University), Computing Based on Material Training: Application to Binary Classification Problems

  1. evolutionary algorithm
    1. stochastic
    2. derivative-free
    3. iterative
  2. Differential Evolution
    1. selection, crossover, mutation (see the sketch after this list)
  3. Questions they want to answer:
    1. can the material still classify a dataset after being retrained on another?
    2. can it provide solutions comparable to ML techniques?
  4. use artificial binary 2D datasets
    1. linear
    2. non-linear
  5. used the Fisher criterion to gauge the complexity of the classification
  6. Main observations
    1. the contribution of the material state to the classifier is non-negligible
    2. the same sample can be retrained for at least two problems
    3. modifications of the state do not fully destroy the original solution
  7. applied to two data sets
    1. worse on the mammographic mass dataset
    2. BUPA liver disorder results are comparable to a NN
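
Differential evolution, the optimizer named in item 2, is simple enough to sketch in full; the canonical DE/rand/1/bin variant is shown below. In the talk the "objective" is the classification error of the physical carbon-nanotube/liquid-crystal material under the evolved configuration voltages; here it is replaced with an ordinary software function.

```python
import numpy as np

rng = np.random.default_rng(11)

def differential_evolution(objective, dim, pop_size=20, F=0.8, CR=0.9,
                           iters=200, lower=-5.0, upper=5.0):
    """Canonical DE/rand/1/bin: mutate from three distinct random members,
    apply binomial crossover, keep the trial vector only if it improves."""
    pop = rng.uniform(lower, upper, (pop_size, dim))
    fitness = np.array([objective(p) for p in pop])
    for _ in range(iters):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lower, upper)     # mutation
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True                     # keep at least one gene
            trial = np.where(cross, mutant, pop[i])             # crossover
            f_trial = objective(trial)
            if f_trial < fitness[i]:                            # greedy selection
                pop[i], fitness[i] = trial, f_trial
    return pop[fitness.argmin()], float(fitness.min())

# Toy objective standing in for the material-in-the-loop classification error
print(differential_evolution(lambda p: float(np.sum(p ** 2)), dim=4))
```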

Nonlinear Dynamics and Chaos for Flexible, Reconfigurable Computing

Behnam Kia (North Carolina State University)

  1. New reconfigurable hardware that can be instantly reprogrammed to implement many different functions
  2. nonlinearity as a source of variability
    1. simple nonlinear system can exhibit diverse, complex behaviors
  3. Example: songbird
    1. reference songbird paper: M. Fee, et al, Nature 395 1998
    2. the vocal organ of birds is a nonlinear cavity that produces complex songs.
    3. by changing the input parameters, the circuit produces different songs
  4. Chaos is not random
  5. Chaos=infinite number of unstable “modes” without any stable condition
  6. Paper title: “a simple nonlinear circuit contains an infinite number of functions”
    1. reference table 1 in paper.
  7. “a dynamic system is nothing more than an embodiment of a function”
  8. Used Mosis to fabricate a series of circuits (four generations), starting in 2014.
  9. It can implement a different instruction at each clock cycle thanks to instant reprogrammability
    1. enabled application: analog to digital conversion and noise filtering
    2. analog input–>(control)–>digital sequence
  10. Focus application:
    1. extend typical signal processing chain to:
      1. filter noise from analog signal
      2. convert the analog signal to digital
      3. perform reconfigurable computing
      4. implement multiplication efficiently
      5. evolve or adapt
      6. all with minimal power and silicon area

A Thermodynamic Treatment of Intelligent Systems

Natesh Ganesh (University of Massachusetts, Amherst)

  1. what are the thermodynamic conditions under which physical systems learn?
  2. discussed the difference between ‘self assembly’ (non-dissipative) vs ‘self-organized’ (dissipative, non-equilibrium)
  3. if you remove the energy source, the structure decays in a self-organized system
  4. review of fluctuation theorems for non-equilibrium thermodynamics and dissipation in finite-state automata
  5. what is the relationship between dissipation and intelligence?
    1. intelligence: can use past inputs to predict future inputs
  6. average change in total entropy of system and bath = fluctuations about the mean
  7. under the right conditions: learning is synonymous with the energy efficient dynamics of the system
  8. thermodynamic computing
    1. a new engineering paradigm that will combine thermodynamics and information theory
  9. highlighted UCLA silver nanowire work.
