At Knowm, we are building a new and exciting type of computer processor to accelerate machine learning (ML) and artificial intelligence applications. The goal of Thermodynamic-RAM (kT-RAM) is to move general ML operations, traditionally deployed to CPUs and GPUs, onto a physically adaptive analog processor based on memristors that unites memory and processing. If you haven’t heard yet, we call this new way of computing “AHaH Computing,” which stands for Anti-Hebbian and Hebbian Computing; it provides a universal computing framework for in-memory reconfigurable logic, memory, and ML. While we showed some time ago that AHaH Computing can solve problems across many domains of ML, we only recently figured out how to use the kT-RAM instruction set and low-precision/noisy memristors to build supervised and unsupervised compositional (deep) ML systems. Our method does not require the backpropagation-of-error algorithm (backprop) and is attainable with realistic analog hardware, including but not limited to memristors. This blog post and the research behind it are motivated by the need to compare our new approach apples-to-apples with existing deep learning approaches, looking at both primary metrics (accuracy, error, etc.) and secondary metrics (power, time, size).
Problems with Deep Neural Networks
Today’s deep learning models are neural networks: multiple layers of parameterized, differentiable, nonlinear modules trained by backpropagation of error (a minimal backprop sketch follows the list below). They suffer from several well-known problems:
- Require massive amounts of labeled training data.
- Require extreme compute environments, limiting serious work primarily to behemoth companies and governments.
- Models are complicated and have a large number of hyperparameters, making training an art rather than an engineering discipline.
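To make “parameterized, differentiable modules trained by backpropagation” concrete, here is a minimal, self-contained NumPy sketch of a two-layer network trained by backprop on a toy task. It is illustrative only; it is not our AHaH method and not any framework’s code.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(100, 2)                                   # 100 samples, 2 features
y = (X[:, 0] + X[:, 1] > 1.0).astype(float).reshape(-1, 1)

W1, b1 = rng.randn(2, 8) * 0.1, np.zeros(8)            # layer 1 parameters
W2, b2 = rng.randn(8, 1) * 0.1, np.zeros(1)            # layer 2 parameters

for step in range(2000):
    # Forward pass: stacked differentiable nonlinear modules
    h = np.tanh(X @ W1 + b1)                           # hidden layer
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))           # sigmoid output

    # Backward pass: propagate the error back through each layer
    grad_out = (p - y) / len(X)                        # d(cross-entropy)/d(logit)
    grad_W2 = h.T @ grad_out
    grad_b2 = grad_out.sum(axis=0)
    grad_h = (grad_out @ W2.T) * (1.0 - h ** 2)        # back through tanh
    grad_W1 = X.T @ grad_h
    grad_b1 = grad_h.sum(axis=0)

    # Gradient-descent update of every parameter
    for param, grad in ((W1, grad_W1), (b1, grad_b1), (W2, grad_W2), (b2, grad_b2)):
        param -= 1.0 * grad

print("train accuracy:", ((p > 0.5) == (y > 0.5)).mean())
```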
Geoffrey Hinton, ML Pioneer, Says We Need Another Approach
It’s always reassuring to hear other people in the ML community make statements that echo what we’ve been saying from the beginning!
My view is throw it all away and start again
In 1986, Geoffrey Hinton co-authored a paper that, three decades later, is central to the explosion of artificial intelligence. But Hinton says his breakthrough method should be dispensed with, and a new path to AI found.
Speaking with Axios on the sidelines of an AI conference in Toronto on Wednesday, Hinton, a professor emeritus at the University of Toronto and a Google researcher, said he is now “deeply suspicious” of back-propagation, the workhorse method that underlies most of the advances we are seeing in the AI field today, including the capacity to sort through photos and talk to Siri. “My view is throw it all away and start again,” he said.
The bottom line: Other scientists at the conference said back-propagation still has a core role in AI’s future. But Hinton said that, to push materially ahead, entirely new methods will probably have to be invented. “Max Planck said, ‘Science progresses one funeral at a time.’ The future depends on some graduate student who is deeply suspicious of everything I have said.”
How it works: In back propagation, labels or “weights” are used to represent a photo or voice within a brain-like neural layer. The weights are then adjusted and readjusted, layer by layer, until the network can perform an intelligent function with the fewest possible errors.
But Hinton suggested that, to get to where neural networks are able to become intelligent on their own, what is known as “unsupervised learning,” “I suspect that means getting rid of back-propagation.”
“I don’t think it’s how the brain works,” he said. “We clearly don’t need all the labeled data.”
Deep Learning Frameworks Review
Here, we reviewed most, if not all, of the currently available deep learning frameworks as potential candidates for extension as well as for comparison with our approach.
Framework | Language | Founder/Backer | Description | Github | License | CuDNN |
---|---|---|---|---|---|---|
TensorFlow | Python | Google | Computation using data flow graphs for scalable machine learning | github | Apache | Y |
chainer | Python | Preferred Networks | A flexible framework of neural networks for deep learning | github | MIT | Y |
Paddle | C++/Python | Baidu | PArallel Distributed Deep LEarning | github | Apache | Y |
dsstne | C++ | Amazon | Deep Scalable Sparse Tensor Network Engine (DSSTNE) is a library for building Deep Learning (DL) machine learning (ML) models | github | Apache | N |
CNTK | C++/Python | Microsoft | Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit | github | MIT | Y |
Theano | Python | University of Montreal | Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. | github | BSD | Y |
keras | Python | – | Deep Learning library for Python. Runs on TensorFlow, Theano, or CNTK. | github | MIT | Y |
Lasagne | Python | – | Lightweight library to build and train neural networks in Theano | github | MIT | Y |
blocks | Python | – | A Theano framework for building and training neural networks | github | MIT | Y |
h2o-3 | Java/Python | H2O.ai | Open Source Fast Scalable Machine Learning API For Smarter Applications (Deep Learning, Gradient Boosting, Random Forest, Generalized Linear Modeling (Logistic Regression, Elastic Net), K-Means, PCA, Stacked Ensembles…) | github | Apache | Y |
marvin | C++ | Princeton University | A Minimalist GPU-only N-Dimensional ConvNets Framework | github | MIT | Y |
caffe | C++ | UC Berkeley | a fast open framework for deep learning. | github | ? | Y |
mxnet | Python/C++ | Apache | Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more | github | Apache | Y |
neon | Python | Intel | Intel Nervana reference deep learning framework committed to best performance on all hardware | github | Apache | Y |
torch7 | Lua | – | Torch is a scientific computing framework with wide support for machine learning algorithms that puts GPUs first. | github | BSD | Y |
pytorch | Python | – | Tensors and Dynamic neural networks in Python with strong GPU acceleration | github | BSD | Y |
caffe2 | C++/Python | – | Caffe2 is a lightweight, modular, and scalable deep learning framework. | github | BSD | N |
dynet | C++ | – | The Dynamic Neural Network Toolkit | github | Apache | N |
BigDL | Scala | Intel | BigDL: Distributed Deep Learning Library for Apache Spark | github | Apache | ? |
systemml | Java | IBM | SystemML is a flexible, scalable machine learning system. | github | Apache | ? |
mahout | Scala | – | The Apache Mahout™ project’s goal is to build an environment for quickly creating scalable performant machine learning applications. | github | Apache | ? |
scikit-learn | Python | – | machine learning in Python | github | BSD | N |
leaf | Rust | Autumn | Open Machine Intelligence Framework for Hackers. (GPU/CPU) | github | MIT and Apache | Y |
deeplearning4j | Java | Skymind | Deep Learning for Java, Scala & Clojure on Hadoop & Spark With GPUs | github | Apache | Y |
jubatus | C++/Python | – | Framework and Library for Distributed Online Machine Learning | github | LGPL | N |
MNIST on Many Frameworks Comparison
After broadly reviewing all of the frameworks, we narrowed our focus down to a short list:
- Neural Networks and Deep Learning
- TensorFlow
- DL4J
- PyTorch
- CNTK
- Caffe2
- Torch7
To get a rough feel for the various frameworks we are interested in leveraging for our own deep learning framework, we decided to get to know each framework on our short list by running the MNIST benchmark. We chose this benchmark because it is one of the very first benchmarks most people run as an introduction to machine learning, the “hello world” of machine learning, and there are many tutorials and plenty of help available. We will look at the primary and secondary performance metrics, take additional notes along the way, and rate each framework. We will run them on a MacBook Pro, and also on a Linux system with a GPU.
Neural Networks and Deep Learning by Michael Nielsen
This book does a wonderful job of teaching the concepts of neural networks and the backpropagation-of-error algorithm, and in later chapters it goes into deep learning. For most of the chapters there is code that you can look at and run. We used an updated version of the source code, adapted for Python 3.
Pre-requisites
Python et al.
```
brew install git
brew install python3
pip3 install numpy
pip3 install theano
```
Run
```
cd ~/MNIST
git clone git@github.com:MichalDanielDobrzanski/DeepLearningPython35.git
cd DeepLearningPython35
python3 test.py
```
By default, the third network, `network3.py`, is run. This is the convolutional deep neural network described in the book. To run the other networks, you’ll have to comment/uncomment the relevant sections in `test.py`.
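For orientation, the relevant section of `test.py` builds the network with the book’s `network3` API roughly like the sketch below. This is our abbreviated paraphrase; the exact hyperparameters in the repository may differ slightly.

```python
import network3
from network3 import Network, ConvPoolLayer, FullyConnectedLayer, SoftmaxLayer

training_data, validation_data, test_data = network3.load_data_shared()
mini_batch_size = 10

net = Network([
    ConvPoolLayer(image_shape=(mini_batch_size, 1, 28, 28),
                  filter_shape=(20, 1, 5, 5), poolsize=(2, 2)),
    ConvPoolLayer(image_shape=(mini_batch_size, 20, 12, 12),
                  filter_shape=(40, 20, 5, 5), poolsize=(2, 2)),
    FullyConnectedLayer(n_in=40 * 4 * 4, n_out=100),
    SoftmaxLayer(n_in=100, n_out=10)], mini_batch_size)

# Stochastic gradient descent: 60 epochs, mini-batches of 10, learning rate 0.1
net.SGD(training_data, 60, mini_batch_size, 0.1, validation_data, test_data)
```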
Results
Deep Convolutional network: Input(28×28) ==> ConvPool(5×5,2×2) ==> ConvPool(5×5,2×2) ==> FullyConnected(100) ==> Softmax(10)
Time | Accuracy | Epochs | Suffer Score |
---|---|---|---|
2 Hours | 99.13 % | 59 | 2/5 |
Suffer Score comment: Some Python-related errors needed to be dealt with.
```
...
Epoch 56: validation accuracy 99.03%
Training mini-batch number 285000
Training mini-batch number 286000
Training mini-batch number 287000
Training mini-batch number 288000
Training mini-batch number 289000
Epoch 57: validation accuracy 99.03%
Training mini-batch number 290000
Training mini-batch number 291000
Training mini-batch number 292000
Training mini-batch number 293000
Training mini-batch number 294000
Epoch 58: validation accuracy 99.02%
Training mini-batch number 295000
Training mini-batch number 296000
Training mini-batch number 297000
Training mini-batch number 298000
Training mini-batch number 299000
Epoch 59: validation accuracy 99.04%
Finished training network.
Best validation accuracy of 99.07% obtained at iteration 199999
Corresponding test accuracy of 99.13%
```
Deeplearning4j (98.42%)
Deeplearning4j is a domain-specific language to configure deep neural networks, which are made of multiple layers. Everything starts with a MultiLayerConfiguration, which organizes those layers and their hyperparameters. Hyperparameters are variables that determine how a neural network learns. They include how many times to update the weights of the model, how to initialize those weights, which activation function to attach to the nodes, which optimization algorithm to use, and how fast the model should learn.
Pre-requisites
Java, Maven
```
brew install git
brew cask install java
brew install maven
```
Run
```
cd ~/MNIST
git clone git@github.com:deeplearning4j/dl4j-examples.git
cd dl4j-examples
nano dl4j-examples/src/main/java/org/deeplearning4j/examples/convolution/LenetMnistExample.java
mvn clean package -Djavacpp.platform=macosx-x86_64
./runexamples.sh
Enter a number for the example to run: 4
```
Results
Deep Convolutional network: Input(28×28) ==> ConvPool(5×5,2×2) ==> ConvPool(5×5,2×2) ==> FullyConnected(500) ==> Softmax(10)
Time | Accuracy | Epochs | Suffer Score |
---|---|---|---|
30 Minutes | 98.42 % | 58 | 2/5 |
Suffer Score comment: Maven took a long time to download dependencies, and it was inconvenient to have to change the number of epochs.
```
...
o.d.o.l.ScoreIterationListener - Score at iteration 54403 is 0.0033962750003584327
o.d.e.c.LenetMnistExample - *** Completed epoch 0 ***
o.d.e.c.LenetMnistExample - Evaluate model....
o.d.e.c.LenetMnistExample - Examples labeled as 0 classified by model as 0: 973 times
Examples labeled as 0 classified by model as 1: 1 times
Examples labeled as 0 classified by model as 2: 2 times
Examples labeled as 0 classified by model as 6: 1 times
Examples labeled as 0 classified by model as 7: 1 times
Examples labeled as 0 classified by model as 8: 2 times
Examples labeled as 1 classified by model as 1: 1126 times
Examples labeled as 1 classified by model as 2: 1 times
Examples labeled as 1 classified by model as 3: 1 times
Examples labeled as 1 classified by model as 5: 1 times
Examples labeled as 1 classified by model as 6: 2 times
Examples labeled as 1 classified by model as 7: 1 times
Examples labeled as 1 classified by model as 8: 3 times
Examples labeled as 2 classified by model as 0: 2 times
Examples labeled as 2 classified by model as 1: 1 times
Examples labeled as 2 classified by model as 2: 1016 times
Examples labeled as 2 classified by model as 3: 1 times
Examples labeled as 2 classified by model as 4: 1 times
Examples labeled as 2 classified by model as 6: 4 times
Examples labeled as 2 classified by model as 7: 4 times
Examples labeled as 2 classified by model as 8: 3 times
Examples labeled as 3 classified by model as 2: 3 times
Examples labeled as 3 classified by model as 3: 982 times
Examples labeled as 3 classified by model as 5: 10 times
Examples labeled as 3 classified by model as 7: 5 times
Examples labeled as 3 classified by model as 8: 10 times
Examples labeled as 4 classified by model as 4: 979 times
Examples labeled as 4 classified by model as 6: 1 times
Examples labeled as 4 classified by model as 7: 1 times
Examples labeled as 4 classified by model as 9: 1 times
Examples labeled as 5 classified by model as 0: 2 times
Examples labeled as 5 classified by model as 3: 3 times
Examples labeled as 5 classified by model as 5: 881 times
Examples labeled as 5 classified by model as 6: 2 times
Examples labeled as 5 classified by model as 7: 1 times
Examples labeled as 5 classified by model as 8: 3 times
Examples labeled as 6 classified by model as 0: 10 times
Examples labeled as 6 classified by model as 1: 3 times
Examples labeled as 6 classified by model as 3: 1 times
Examples labeled as 6 classified by model as 4: 2 times
Examples labeled as 6 classified by model as 5: 4 times
Examples labeled as 6 classified by model as 6: 938 times
Examples labeled as 7 classified by model as 1: 5 times
Examples labeled as 7 classified by model as 2: 11 times
Examples labeled as 7 classified by model as 3: 1 times
Examples labeled as 7 classified by model as 4: 2 times
Examples labeled as 7 classified by model as 7: 1007 times
Examples labeled as 7 classified by model as 8: 1 times
Examples labeled as 7 classified by model as 9: 1 times
Examples labeled as 8 classified by model as 0: 2 times
Examples labeled as 8 classified by model as 2: 1 times
Examples labeled as 8 classified by model as 3: 1 times
Examples labeled as 8 classified by model as 4: 1 times
Examples labeled as 8 classified by model as 5: 1 times
Examples labeled as 8 classified by model as 6: 1 times
Examples labeled as 8 classified by model as 7: 4 times
Examples labeled as 8 classified by model as 8: 960 times
Examples labeled as 8 classified by model as 9: 3 times
Examples labeled as 9 classified by model as 0: 5 times
Examples labeled as 9 classified by model as 1: 3 times
Examples labeled as 9 classified by model as 3: 2 times
Examples labeled as 9 classified by model as 4: 7 times
Examples labeled as 9 classified by model as 5: 2 times
Examples labeled as 9 classified by model as 7: 7 times
Examples labeled as 9 classified by model as 8: 3 times
Examples labeled as 9 classified by model as 9: 980 times

==========================Scores========================================
 # of classes:    10
 Accuracy:        0.9842
 Precision:       0.9842
 Recall:          0.9842
 F1 Score:        0.9841
Precision, recall & F1: macro-averaged (equally weighted avg. of 10 classes)
========================================================================
o.d.e.c.LenetMnistExample - ****************Example finished********************
```
TensorFlow (99.31%)
TensorFlow is an open source software library for numerical computation using data flow graphs. The graph nodes represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture lets you deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code. TensorFlow also includes TensorBoard, a data visualization toolkit.
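To make the dataflow-graph idea concrete, here is a minimal sketch in the TensorFlow 1.x style that was current when this post was written: the graph is declared first, then a session runs it. This is an illustrative toy we wrote, not code from the tutorial benchmarked below.

```python
import numpy as np
import tensorflow as tf

# Build the graph: nodes are operations, edges carry tensors.
x = tf.placeholder(tf.float32, shape=[None, 784], name="images")
W = tf.Variable(tf.zeros([784, 10]), name="weights")
b = tf.Variable(tf.zeros([10]), name="biases")
y = tf.nn.softmax(tf.matmul(x, W) + b, name="predictions")

# Nothing has executed yet; a session runs the graph on a CPU or GPU.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.zeros((2, 784), dtype=np.float32)   # a dummy batch of two "images"
    print(sess.run(y, feed_dict={x: batch}))       # two uniform softmax rows
```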
Pre-requisites
Python 3, TensorFlow, etc.
```
brew install git
brew install python3
pip3 install --upgrade tensorflow
pip3 install --upgrade matplotlib
```
Run
```
cd ~/MNIST
git clone git@github.com:martin-gorner/tensorflow-mnist-tutorial.git
cd tensorflow-mnist-tutorial
python3 mnist_3.1_convolutional_bigger_dropout.py
```
Here, we run `mnist_3.1_convolutional_bigger_dropout.py`, a convolutional deep neural network.
Results
Deep Convolutional network: Input(28×28) ==> Conv(5×5) ==> Conv(5×5) ==> Conv(4×4) ==> FullyConnected(200) ==> Softmax(10)
Time | Accuracy | Epochs | Suffer Score |
---|---|---|---|
20 Minutes | 99.31 % | 20 | 2/5 |
Suffer Score comment: Had to fix one error related to Python.
```
9880: accuracy:1.0 loss: 0.0204867 (lr:0.00012074833527971228)
9900: accuracy:1.0 loss: 0.0218605 (lr:0.00012054188589425116)
9900: ********* epoch 17 ********* test accuracy:0.9924 test loss: 3.07196
9920: accuracy:1.0 loss: 0.000798124 (lr:0.00012033749071449773)
9940: accuracy:1.0 loss: 0.0190406 (lr:0.00012013512930076374)
9960: accuracy:1.0 loss: 0.0180958 (lr:0.00011993478141673913)
9980: accuracy:1.0 loss: 0.028748 (lr:0.00011973642702746858)
10001: accuracy:1.0 loss: 0.0719834 (lr:0.00011953027871629794)
10001: ********* epoch 17 ********* test accuracy:0.9928 test loss: 3.18631
```
Microsoft CNTK (?)
CNTK, the Microsoft Cognitive Toolkit, is a framework for deep learning. A computational network defines the function to be learned as a directed graph where each leaf node is an input value or parameter and each non-leaf node represents a matrix or tensor operation applied to its children. The beauty of CNTK is that once a computational network has been described, all the computation required to learn the network parameters is taken care of automatically: there is no need to derive gradients analytically or to hand-code the backpropagation interactions between variables.
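Since we never got CNTK running (see below), here is only a rough sketch of this describe-the-graph-and-let-it-differentiate style, pieced together from the CNTK 2.x Python tutorials; the function names and signatures are our best guess and should be double-checked against the version you install.

```python
import cntk as C

x = C.input_variable(784)   # leaf node: input features
y = C.input_variable(10)    # leaf node: one-hot labels

# Non-leaf nodes: tensor operations applied to their children
with C.layers.default_options(init=C.glorot_uniform()):
    h = C.layers.Dense(96, activation=C.relu)(x / 255.0)
    z = C.layers.Dense(10, activation=None)(h)

loss = C.cross_entropy_with_softmax(z, y)
error = C.classification_error(z, y)

# Gradients are derived automatically from the graph; no hand-coded backprop
lr = C.learning_rate_schedule(0.01, C.UnitType.minibatch)
trainer = C.Trainer(z, (loss, error), [C.sgd(z.parameters, lr)])
```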
Pre-requisites
These instructions are for Linux because CNTK apparently doesn’t work on macOS.
Python 3, etc.
```
sudo apt-get install python3
```
Run
```
cd ~/MNIST
git clone git@github.com:Microsoft/CNTK.git
sudo apt-get install openmpi-bin
(Download https://repo.continuum.io/archive/Anaconda3-4.1.1-Linux-x86_64.sh)
bash Anaconda3-4.1.1-Linux-x86_64.sh
pip3 install https://cntk.ai/PythonWheel/CPU-Only/cntk-2.2-cp35-cp35m-linux_x86_64.whl
python3 CNTK/Examples/Image/Classification/MLP/Python/SimpleMNIST.py
```
Results
MacOS: “ModuleNotFoundError: No module named ‘cntk’”.
Kubuntu LTS 16.04: “ImportError: libpng12.so.0: cannot open shared object file: No such file or directory”.
OK, this is turning into a PITA. It didn’t work on the Mac, and now there are issues on Linux.
Deep Convolutional network: Input(28×28) ==> ConvPool(5×5,3×3) ==> ConvPool(3×3,3×3) ==> Conv(3×3) ==> FullyConnected(96) ==> Softmax(10)
Time | Accuracy | Epochs | Suffer Score |
---|---|---|---|
?? | ?? % | 40 | 5/5 |
Suffer Score comment: Git clone took forever. It turns out you cannot run CNTK on a Mac; there is a workaround involving running a Linux container with Docker. The installation instructions for Linux are not straightforward, and the `pip` command is supposed to be `pip3` for Python 3. Why does it need to be so complicated? In the end it didn’t work, so I gave up.
Torch7 (??)
At the heart of Torch are the popular neural network and optimization libraries which are simple to use, while having maximum flexibility in implementing complex neural network topologies. You can build arbitrary graphs of neural networks, and parallelize them over CPUs and GPUs in an efficient manner.
After at least one hour of googling, I was unable to find a tutorial or coherent instructions on how to install Torch7 and run a CNN MNIST demo. I opened an issue on Torch’s Google Group: https://groups.google.com/forum/#!topic/torch7/5K_yS8Q2LIA.
Results
Time | Accuracy | Epochs | Suffer Score |
---|---|---|---|
?? | ?? % | 40 | 5/5 |
Caffe2 (??)
Caffe2 is a lightweight, modular, and scalable deep learning framework. Building on the original Caffe, Caffe2 is designed with expression, speed, and modularity in mind.
After at least one hour of googling, I was unable to find a tutorial or coherent instructions on how to install Caffe2 and run a CNN MNIST demo. The best I could find was installation instructions and a separate tutorial with lots of code but no instructions on how to download or run it.
Results
Time | Accuracy | Epochs | Suffer Score |
---|---|---|---|
?? | ?? % | 40 | 5/5 |
PyTorch (99.09%)
PyTorch is a Python-based scientific computing package targeted at two audiences: 1) a replacement for NumPy that uses the power of GPUs, and 2) a deep learning research platform that provides maximum flexibility and speed.
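As a quick taste of the NumPy-replacement side, here is a tiny sketch we wrote for illustration (not part of the benchmark script):

```python
import torch

# NumPy-like tensor math on the CPU
a = torch.rand(3, 3)
b = torch.rand(3, 3)
c = a.mm(b) + 1.0          # matrix multiply, then add a scalar

# The same computation on a GPU, if one is available
if torch.cuda.is_available():
    print(a.cuda().mm(b.cuda()))
print(c)
```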
Pre-requisites
Python 3, PyTorch, etc.
```
brew install git
brew install python3
pip3 install http://download.pytorch.org/whl/torch-0.2.0.post3-cp36-cp36m-macosx_10_7_x86_64.whl
pip3 install torchvision
# OSX Binaries dont support CUDA, install from source if CUDA is needed
```
Run
```
cd ~/MNIST
git clone git@github.com:pytorch/examples.git
cd examples
python3 mnist/main.py --epochs 40
```
Here, we run `mnist/main.py`, a convolutional deep neural network.
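For reference, the model in the example is approximately the following `torch.nn` module. This is our paraphrase of `mnist/main.py` as it looked at the time of writing; check the repository for the exact, current definition.

```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)   # 28x28 -> 24x24, pooled to 12x12
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)  # 12x12 -> 8x8, pooled to 4x4
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)                  # 20 * 4 * 4 = 320 inputs
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        return F.log_softmax(self.fc2(x), dim=1)
```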
Results
Deep Convolutional network: Input(28×28) ==> ConvPool(5×5,2×2) ==> ConvPool(5×5,2×2) ==> FullyConnected(320) ==> FullyConnected(50) ==> Softmax(10)
Time | Accuracy | Epochs | Suffer Score |
---|---|---|---|
30 Minutes | 99.09 % | 58 | 1/5 |
Suffer Score comment: The absolute least effort of all the frameworks.
```
Train Epoch: 58 [48000/60000 (80%)] Loss: 0.022173
Train Epoch: 58 [48640/60000 (81%)] Loss: 0.141494
Train Epoch: 58 [49280/60000 (82%)] Loss: 0.078569
Train Epoch: 58 [49920/60000 (83%)] Loss: 0.162332
Train Epoch: 58 [50560/60000 (84%)] Loss: 0.081903
Train Epoch: 58 [51200/60000 (85%)] Loss: 0.129946
Train Epoch: 58 [51840/60000 (86%)] Loss: 0.138492
Train Epoch: 58 [52480/60000 (87%)] Loss: 0.158638
Train Epoch: 58 [53120/60000 (88%)] Loss: 0.098830
Train Epoch: 58 [53760/60000 (90%)] Loss: 0.079310
Train Epoch: 58 [54400/60000 (91%)] Loss: 0.049244
Train Epoch: 58 [55040/60000 (92%)] Loss: 0.045119
Train Epoch: 58 [55680/60000 (93%)] Loss: 0.064007
Train Epoch: 58 [56320/60000 (94%)] Loss: 0.107020
Train Epoch: 58 [56960/60000 (95%)] Loss: 0.048211
Train Epoch: 58 [57600/60000 (96%)] Loss: 0.099237
Train Epoch: 58 [58240/60000 (97%)] Loss: 0.037267
Train Epoch: 58 [58880/60000 (98%)] Loss: 0.090165
Train Epoch: 58 [59520/60000 (99%)] Loss: 0.093787

Test set: Average loss: 0.0286, Accuracy: 9909/10000 (99%)
```
MNIST Experiment Summary
Below is a summary of the short-list MNIST experiments, including time to run, accuracy, and suffer score.
Framework | Time | Accuracy | Epochs | Suffer Score |
---|---|---|---|---|
neuralnetworksanddeeplearning | 2 Hours | 99.13 % | 59 | 2/5 |
TensorFlow | 20 Minutes | 99.31 % | 20 | 2/5 |
DL4J | 30 Minutes | 98.42 % | 58 | 2/5 |
PyTorch | 30 Minutes | 99.09 % | 58 | 1/5 |
CNTK | — | — | — | 5/5 |
Caffe2 | — | — | — | 5/5 |
Torch7 | — | — | — | 5/5 |
After at least an hour of trying, I completely gave up on CNTK, Caffe2, and Torch7. I’m sure other people with more experience in the technologies related to those frameworks could have gotten them running more easily than I did, but this experiment is from my perspective as a relative beginner with deep learning frameworks and a limited background in Python, Lua, etc. My success or lack thereof with each framework reflects not only the code, but also the documentation, cross-platform compatibility, and the availability of beginner MNIST tutorials to follow. PyTorch turned out to be the absolute simplest to run, working right out of the box.
The model accuracies were, as expected, more or less the same. Future SWaP (size, weight, and power) comparisons will probably be done against TensorFlow, DL4J, and PyTorch.