Wisconsin Breast Cancer Classification Benchmark

One of the most promising and applicable uses of Machine Learning is medical diagnostics. For instance, many of the thousands of people who die each year in the US from breast cancer could potentially have been saved if they’d had better access to cheap, intelligent diagnostic tools. This article is a tutorial showing how such a diagnostic tool could be built with the Knowm API.

The Dataset

The Wisconsin Breast Cancer dataset is a canonical benchmark for training and testing machine learning classifiers. The task is to predict whether a patient’s tumor is malignant or benign from 9 attributes:

Attribute Name                 Data Type   Values
Sample code number             Integer     id number
Clump Thickness                Integer     1-10
Uniformity of Cell Size        Integer     1-10
Uniformity of Cell Shape       Integer     1-10
Marginal Adhesion              Integer     1-10
Single Epithelial Cell Size    Integer     1-10
Bare Nuclei                    Integer     1-10
Bland Chromatin                Integer     1-10
Normal Nucleoli                Integer     1-10
Mitoses                        Integer     1-10
Class                          Integer     2 (benign), 4 (malignant)

Each row of the original raw data CSV file has the form id, [feature vector], class.

Just like in the previous Census Income tutorial, we will be using the open source Java Datasets project to access the raw data, as it provides an extremely convenient way to query the data in the form of POJOs (Plain ol’ Java Objects). Each BreastCancer object contains the relevant information along with the necessary getters and setters.
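The BreastCancer POJO itself is not reproduced in this article, so the following is a minimal, hypothetical stand-in: the class name BreastCancerRecord, its fields and its parse() method are our own assumptions, not the Datasets project’s API. It shows how one row of the raw UCI file (id, nine feature values, class) could be turned into an object with getters; for simplicity it is immutable, so it has no setters. The raw file marks a few missing values with ?, and mapping them to -1 is an arbitrary choice made only for this sketch.

```java
import java.util.Arrays;

// Hypothetical stand-in for the Datasets project's BreastCancer POJO; the real
// class name, fields and accessors may differ.
final class BreastCancerRecord {

  // Sentinel for "?" entries in the raw UCI file (an arbitrary choice for this sketch).
  static final int MISSING = -1;

  private final long id;
  private final int[] features; // the nine attributes, each normally 1-10
  private final int classCode;  // 2 = benign, 4 = malignant

  BreastCancerRecord(long id, int[] features, int classCode) {
    this.id = id;
    this.features = features.clone();
    this.classCode = classCode;
  }

  // Parses one CSV row laid out as: id, nine feature values, class.
  static BreastCancerRecord parse(String line) {
    String[] cols = line.trim().split(",");
    if (cols.length != 11) {
      throw new IllegalArgumentException("expected 11 columns, got " + cols.length);
    }
    int[] features = new int[9];
    for (int i = 0; i < features.length; i++) {
      String value = cols[i + 1].trim();
      features[i] = value.equals("?") ? MISSING : Integer.parseInt(value);
    }
    long id = Long.parseLong(cols[0].trim());
    int classCode = Integer.parseInt(cols[10].trim());
    return new BreastCancerRecord(id, features, classCode);
  }

  long getId() { return id; }

  int[] getFeatures() { return features.clone(); }

  int getClassCode() { return classCode; }

  boolean isMalignant() { return classCode == 4; }

  @Override
  public String toString() {
    return id + " " + Arrays.toString(features) + (isMalignant() ? " malignant" : " benign");
  }
}
```

For example, BreastCancerRecord.parse("1000025,5,1,1,1,2,1,3,1,1,2") returns a benign record with id 1000025 and a Clump Thickness of 5; the row is only illustrative of the format.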

If you haven’t already, you can access this classifier example by signing up for the Knowm Developer Community and downloading the Java code. If not, you can still follow along to see how it’s done.

Building the Classifier with the Knowm API

Just like before, we will build a classifier using the Knowm API’s LinearClassifier class, which means everything we write with it should benefit from drastic improvements in speed and efficiency once ported to a physical neuromorphic chip. Let’s get to it.

Open up the BreastCancerApp class.

This class extends ClassifierApp. As discussed in the previous tutorial, this wrapper lets us define our classifier while abstracting away much of what is common to many classification problems.


Let’s first look at how we implemented the learn() method.

This method is straightforward. It loops through all of the training data, spike-encodes each record, and finally passes the spikes, the labels and an evaluation flag to the classify() method.

Setting the evaluation flag to false tells the parent ClassifierApp class not to evaluate the classifier’s output, i.e. not to compute recall, precision and the other metrics while it is learning.
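The learn() code itself is not reproduced here. As a hedged sketch of the same control flow, the following uses hypothetical SpikeEncoder and SpikeClassifier interfaces and a LearnLoop helper (our names, not the Knowm API’s) to show the loop: encode each training record, then hand the spikes and the record’s true labels to classify() with evaluation switched off.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical interfaces standing in for the roles described in the text;
// the real Knowm API types and signatures may differ.
interface SpikeEncoder<T> {
  // Turns one record into the set of active spike ids.
  Set<Integer> encode(T record);
}

interface SpikeClassifier {
  // With evaluate == false the classifier only learns from the labels and
  // skips computing precision, recall and the other metrics.
  void classify(Set<Integer> spikes, Set<String> labels, boolean evaluate);
}

final class LearnLoop {

  private LearnLoop() {}

  // Same control flow as learn(): spike-encode every training record, then pass
  // the spikes and the record's true labels to classify() without evaluation.
  static <T> void learn(List<T> trainingData, SpikeEncoder<T> encoder,
      Function<T, Set<String>> labelsOf, SpikeClassifier classifier) {
    for (T record : trainingData) {
      Set<Integer> spikes = encoder.encode(record);
      classifier.classify(spikes, labelsOf.apply(record), false);
    }
  }
}
```

Keeping the encoder and classifier behind small interfaces mirrors how the article swaps encoders via getNewEncoder() without touching the training loop.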


The encoder is also an important part of every classifier. As you will find while using the Knowm API, the choice of encoding can drastically affect performance. For this tutorial we use a special encoder we’ve created called BreastCancerSpikeEncoder, which we specify in the getNewEncoder() method.


This encoder has two fields: an array of A2D_Integer encoders and a SpikeStreamJoiner. The A2D encoder is a spatially adaptive encoder; you can learn more about it in our article Understanding the A2D Encoder.

The encode() method is actually quite simple: it clears the SpikeStreamJoiner and then loads it with the spikes from each encoder in the array, where each encoder is assigned to a specific field of the BreastCancer class.
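As a hedged illustration of that structure, here is a self-contained sketch built from hypothetical classes. OneHotIntEncoder is a deliberately simple stand-in for A2D_Integer (it emits one spike per integer value and does not adapt to the data), and SpikeJoiner reflects our assumption of what joining spike streams means: each field’s spikes are shifted past the ids reserved for earlier fields, so no two fields share a spike id.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.ToIntFunction;

// Hypothetical per-field encoder: one spike per integer value in [min, max].
// A deliberately simple stand-in for A2D_Integer, which adapts to the data.
final class OneHotIntEncoder {
  private final int min;
  private final int max;

  OneHotIntEncoder(int min, int max) {
    this.min = min;
    this.max = max;
  }

  // Number of distinct spike ids this encoder can emit.
  int width() {
    return max - min + 1;
  }

  // Out-of-range values (e.g. a missing-value sentinel) are clamped into range.
  Set<Integer> encode(int value) {
    int clamped = Math.max(min, Math.min(max, value));
    return Collections.singleton(clamped - min);
  }
}

// Hypothetical joiner: concatenates several spike streams into one by shifting
// each stream's ids past the ids reserved for the streams added before it.
final class SpikeJoiner {
  private final Set<Integer> joined = new TreeSet<>();
  private int offset = 0;

  void clear() {
    joined.clear();
    offset = 0;
  }

  void add(Set<Integer> spikes, int width) {
    for (int spike : spikes) {
      joined.add(offset + spike);
    }
    offset += width;
  }

  Set<Integer> getSpikes() {
    return new TreeSet<>(joined); // defensive copy
  }
}

// Same structure as described for BreastCancerSpikeEncoder: one encoder per
// field, and encode() clears the joiner, then loads it field by field.
final class RecordSpikeEncoder<T> {
  private final List<ToIntFunction<T>> fields;
  private final OneHotIntEncoder[] encoders;
  private final SpikeJoiner joiner = new SpikeJoiner();

  RecordSpikeEncoder(List<ToIntFunction<T>> fields, int min, int max) {
    this.fields = fields;
    this.encoders = new OneHotIntEncoder[fields.size()];
    for (int i = 0; i < encoders.length; i++) {
      encoders[i] = new OneHotIntEncoder(min, max);
    }
  }

  Set<Integer> encode(T record) {
    joiner.clear();
    for (int i = 0; i < encoders.length; i++) {
      int value = fields.get(i).applyAsInt(record);
      joiner.add(encoders[i].encode(value), encoders[i].width());
    }
    return joiner.getSpikes();
  }
}
```

Encoding a record whose first three fields are 5, 1 and 10, with each field’s range set to 1-10, yields the spikes {4, 10, 29}: exactly one active spike inside each field’s block of 10 ids.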

Great! We’ve built all the necessary parts of our classifier; now let’s test it.

Testing Phase and Primary Performance

Before we run BreastCancerApp, let’s point out a few things. First, we specify the kT-RAM core type in the getCoreType() method and the synaptic initialization type in the getSynapticInitType() method.

You may want to experiment with these methods to see how they affect the performance of our classifier. One of the key features of the Knowm API is the notion of interchangeable core types. Interchangeable cores are our bridge between today’s digital kT-RAM emulators and tomorrow’s physical kT-RAM: they let us simulate each memristor with different degrees of accuracy. With the NIBBLE core enabled and 1 epoch of training, you should see results like this:

[Figure: Knowm API Breast Cancer Performance]

The performance chart shows that the classifier performed perfectly at a sufficiently low confidence threshold, meaning it correctly diagnosed every patient in the test set. This is a very promising result for our classifier and for machine-assisted medical diagnostics in general.
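The article does not spell out how the confidence threshold is applied, so here is a minimal sketch under one assumption: a test sample counts as a positive prediction for a label when the classifier’s confidence for that label is at or above the threshold. ThresholdMetrics and its methods are hypothetical names, and the scores used below are toy values, not the article’s results.

```java
import java.util.ArrayList;
import java.util.List;

// Precision, recall and accuracy for one label at one confidence threshold.
// Hypothetical helper; the semantics of the Knowm threshold are assumed.
final class ThresholdMetrics {
  final double threshold;
  final double precision;
  final double recall;
  final double accuracy;

  private ThresholdMetrics(double threshold, double precision, double recall, double accuracy) {
    this.threshold = threshold;
    this.precision = precision;
    this.recall = recall;
    this.accuracy = accuracy;
  }

  // A sample counts as a positive prediction when its score is >= threshold.
  static ThresholdMetrics at(double[] scores, boolean[] actual, double threshold) {
    if (scores.length == 0 || scores.length != actual.length) {
      throw new IllegalArgumentException("scores and labels must be non-empty and aligned");
    }
    int tp = 0, fp = 0, tn = 0, fn = 0;
    for (int i = 0; i < scores.length; i++) {
      boolean predicted = scores[i] >= threshold;
      if (predicted && actual[i]) tp++;
      else if (predicted) fp++;
      else if (actual[i]) fn++;
      else tn++;
    }
    // Convention used here: a metric with an empty denominator is reported as 0.0.
    double precision = (tp + fp) == 0 ? 0.0 : (double) tp / (tp + fp);
    double recall = (tp + fn) == 0 ? 0.0 : (double) tp / (tp + fn);
    double accuracy = (double) (tp + tn) / scores.length;
    return new ThresholdMetrics(threshold, precision, recall, accuracy);
  }

  // Evaluates the same scores at several thresholds, e.g. to plot a performance chart.
  static List<ThresholdMetrics> sweep(double[] scores, boolean[] actual, double... thresholds) {
    List<ThresholdMetrics> results = new ArrayList<>();
    for (double t : thresholds) {
      results.add(at(scores, actual, t));
    }
    return results;
  }

  @Override
  public String toString() {
    return String.format("threshold=%.2f precision=%.3f recall=%.3f accuracy=%.3f",
        threshold, precision, recall, accuracy);
  }
}
```

For instance, with toy scores {0.9, 0.8, 0.3, 0.1} for samples whose true labels are {malignant, malignant, benign, benign}, a threshold of 0.5 gives precision, recall and accuracy of 1.0, while raising it to 0.85 drops recall to 0.5 because one malignant sample no longer clears the threshold.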

The evaluation metrics were explained previously on the page Primary and Secondary Performance Metrics. As expected, the metrics change as the confidence threshold is varied. In this run we attained our best result with the memristor core type, but this is not always the case: sometimes the non-continuous precision of our cores increases accuracy, and sometimes it does not.
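The article does not describe the internals of the NIBBLE core. Assuming only what its name suggests (a nibble is 4 bits, so 16 distinct states per synapse), the hypothetical NibbleQuantizer below illustrates what simulating a synapse with reduced, non-continuous precision means: every continuous value is snapped to the nearest of 16 evenly spaced levels. It is an illustration of quantization in general, not the kT-RAM core implementation.

```java
// Illustrative 4-bit quantizer: maps a continuous synaptic value in [min, max]
// onto 16 evenly spaced levels and back. Hypothetical, not the Knowm core.
final class NibbleQuantizer {
  static final int LEVELS = 16; // 2^4 states in one nibble

  private final double min;
  private final double max;

  NibbleQuantizer(double min, double max) {
    if (!(max > min)) {
      throw new IllegalArgumentException("max must exceed min");
    }
    this.min = min;
    this.max = max;
  }

  // Distance between adjacent representable values.
  double step() {
    return (max - min) / (LEVELS - 1);
  }

  // Nearest level index in [0, LEVELS - 1]; out-of-range inputs are clamped.
  int toLevel(double value) {
    double clamped = Math.max(min, Math.min(max, value));
    return (int) Math.round((clamped - min) / step());
  }

  // The continuous value represented by a level index.
  double fromLevel(int level) {
    return min + (max - min) * level / (LEVELS - 1);
  }

  // The value a 4-bit synapse would actually hold instead of `value`.
  double quantize(double value) {
    return fromLevel(toLevel(value));
  }
}
```

Over the range -1 to 1 the step between levels is 2/15 (about 0.133), so a stored value can be off by up to about 0.067. Whether that rounding helps or hurts accuracy depends on the problem, as noted above.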


Let’s see how our results compare against other algorithms:

Algorithm              Accuracy
RS-SVM                 1.0
Knowm API (NIBBLE)     1.0
SVM                    0.972
C4.5                   0.9474

The benchmark results for the other algorithms were taken from the UCI Machine Learning Repository Wisconsin Breast Cancer Data Set page.

ROC Curves

Looking at the ROC curves for both the benign and malignant labels, we see that the equal error rate, or crossover error rate (EER or CER), which is the rate at which acceptance and rejection errors are equal, is 0.0. This is a rare example of a perfect ROC curve!

[Figure: Knowm API Breast Cancer Benign ROC]

[Figure: Knowm API Breast Cancer Malignant ROC]
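The article does not show how its ROC curves and EER were computed. As a minimal, generic sketch (the RocEer class and its methods are hypothetical names), the code below sweeps a threshold over per-sample confidence scores, records one ROC point per threshold, and approximates the EER as the mean of the false acceptance and false rejection rates at the threshold where the two are closest. Perfectly separated scores give an EER of 0.0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class RocEer {

  private RocEer() {}

  // One ROC point {falsePositiveRate, truePositiveRate} per candidate threshold,
  // from "accept everything" (1, 1) down to "accept nothing" (0, 0).
  static List<double[]> rocPoints(double[] scores, boolean[] actual) {
    List<double[]> points = new ArrayList<>();
    for (double t : candidateThresholds(scores)) {
      double[] r = rates(scores, actual, t);
      points.add(new double[] {r[0], 1.0 - r[1]});
    }
    return points;
  }

  // Approximate EER: at the threshold where the false acceptance rate (FPR) and
  // false rejection rate (FNR) are closest, return their mean.
  static double equalErrorRate(double[] scores, boolean[] actual) {
    double bestGap = Double.POSITIVE_INFINITY;
    double eer = Double.NaN;
    for (double t : candidateThresholds(scores)) {
      double[] r = rates(scores, actual, t);
      double gap = Math.abs(r[0] - r[1]);
      if (gap < bestGap) {
        bestGap = gap;
        eer = (r[0] + r[1]) / 2.0;
      }
    }
    return eer;
  }

  // Every distinct score in ascending order, plus +infinity ("accept nothing").
  private static TreeSet<Double> candidateThresholds(double[] scores) {
    TreeSet<Double> thresholds = new TreeSet<>();
    for (double s : scores) {
      thresholds.add(s);
    }
    thresholds.add(Double.POSITIVE_INFINITY);
    return thresholds;
  }

  // Returns {FPR, FNR} when samples with score >= t are accepted as positive.
  private static double[] rates(double[] scores, boolean[] actual, double t) {
    int pos = 0, neg = 0, fp = 0, fn = 0;
    for (int i = 0; i < scores.length; i++) {
      boolean accepted = scores[i] >= t;
      if (actual[i]) {
        pos++;
        if (!accepted) fn++;
      } else {
        neg++;
        if (accepted) fp++;
      }
    }
    if (pos == 0 || neg == 0) {
      throw new IllegalArgumentException("need at least one positive and one negative sample");
    }
    return new double[] {(double) fp / neg, (double) fn / pos};
  }
}
```

For example, positive scores {0.9, 0.8} and negative scores {0.3, 0.1} give an EER of 0.0, while positive scores {0.9, 0.4} and negative scores {0.6, 0.1} overlap and give 0.5.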


Secondary Metrics

As stated in Primary and Secondary Performance Metrics, most machine learning benchmark studies report only primary performance metrics. Here we also report the secondary metrics, measured on a 2015 MacBook Pro Retina. The wattage is a rough estimate obtained from the iStat Menus app.

Measurement            Value
Power Consumption      15.1 watts
Speed (run time)       1 second
Volume                 600 cubic centimeters
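Because the table reports a power draw and a run time, a rough energy figure for the whole run follows from multiplying the two, assuming the laptop drew roughly 15.1 W throughout the 1-second run:

```latex
E = P \cdot t \approx 15.1\,\mathrm{W} \times 1\,\mathrm{s} = 15.1\,\mathrm{J}
```

Since the wattage is a whole-machine estimate, this is an upper bound on the energy attributable to the classifier itself.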


In this article we stepped through the Wisconsin Breast Cancer dataset, the kT-RAM linear classifier, spike encoding, and primary and secondary performance metrics. We saw that the Knowm API is very well suited for this task. If you have any comments or questions, please leave them in the comment section below!

Further Reading

TOC: Table of Contents
Previous: Census Income Classification Benchmark
Next: Reuters-21578 Classification Benchmark
