Reuters-21578 Classification Benchmark with the Knowm API

The goal of text categorization is the classification of documents into a number of predefined categories. Each document can have zero, one, or more labeled categories. In a world where much text is digitized, a text categorizer has many roles: uncovering trends, understanding sentiment, indexing documents, and serving as a stage in natural language processing (NLP) systems.


Since building text classifiers by hand is difficult and time-consuming, it is advantageous to learn classifiers from examples. This is a supervised learning task, which is well suited to neuromorphic chips such as kT-RAM. This article is a tutorial for building a text categorizer with the Knowm API.

To get started using the Knowm API, you'll need to sign up for our developer community and download the Java code. This example is contained in two Java classes, which you can access from here:

If you haven't signed up, you can still follow along to see how it's done.

The Dataset

The Reuters-21578 dataset is a standard and widely distributed collection of hand-labeled articles from the Reuters newswire. It's a very well-known benchmark that has been a considerable aid in the development of text categorization algorithms.

We'll be using Reuters-21578 to train a classifier that predicts the categories of an article from a variety of its attributes. These attributes are strings and dates extracted from each article.

Attribute | Description | Type
Date | Article date | Date
Title | Article title | String
Places | List of places, e.g. “Australia” | String
Organizations | List of organizations, e.g. “gatt” | String
People | List of people, e.g. “perez-de-cuellar” | String
Companies | List of companies, e.g. “IBM” | String
Exchanges | List of exchanges, e.g. “NASDAQ” | String
Body | Body of article | String

The categories (labels) we are trying to predict were assigned to these articles by Reuters. There are 118 categories, including topics like “interest”, “money-fx”, “crude”, and “trade”.

Just like in the previous tutorials we will be using the open source Java Datasets project to access the raw data, as it provides a convenient way to query the data in the form of POJOs (Plain Ol’ Java Objects). In this case, this is extremely useful since raw Reuters examples are stored in a collection of 22 files in SGM format. You will need to download the dataset from and place it in: /usr/local/Datasets/.

Once built, each Reuters object will contain all of its attribute values along with getters and setters which will allow us to conveniently query example attributes.

Spike Encoding

The first step in text categorization is to transform our documents, which consist of strings and dates, into a spike representation suitable for a kT-RAM classifier. This entails converting all data types into a set of spikes through the use of a spike encoder. As in our previous tutorials, we will be using our pre-built classifier, Linear, which is designed as a general-purpose multi-label classifier, and we will be writing our own spike encoder. Although most spike encoding can be automated with the GenericBeanEncoder, it is helpful to understand the process.

Reuters objects are composed of Strings and Dates, so Reuters21578SpikeEncoder uses two sub-encoders for these data types: a String_BagOfWords encoder and a Date_Encoder.

String_BagOfWords Encoder

The String_BagOfWords encoder splits a bag of words up into individual words and then creates a one-to-one mapping between spike channels and unique strings.

Space is limited because each spike channel maps to a unique synapse on kT-RAM. To limit the size of our one-to-one mapping, we instantiated our String_BagOfWords encoder with a maximum cache size. In our case, we’ve allocated space for 65536 unique words.
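To make the mapping concrete, here is a minimal sketch of a capped bag-of-words encoder in plain Java. It is not the Knowm API's String_BagOfWords class: the class name BagOfWordsSketch, the tokenizing regex, and the choice to silently drop new words once the cap is reached are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical bag-of-words spike encoder sketch (not the Knowm API):
 * each unique word gets its own spike channel, up to maxWords channels.
 */
class BagOfWordsSketch {

  private final int maxWords;
  private final Map<String, Integer> channelOfWord = new HashMap<>();

  BagOfWordsSketch(int maxWords) {
    this.maxWords = maxWords;
  }

  /** Splits text into lower-case words and returns the set of active spike channels. */
  Set<Integer> encode(String text) {
    Set<Integer> spikes = new TreeSet<>();
    // Letters, digits and hyphens form a word, so "perez-de-cuellar" stays one token.
    for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9\\-]+")) {
      if (word.isEmpty()) {
        continue; // a leading delimiter produces an empty first token
      }
      Integer channel = channelOfWord.get(word);
      if (channel == null && channelOfWord.size() < maxWords) {
        channel = channelOfWord.size(); // assign the next free channel
        channelOfWord.put(word, channel);
      }
      if (channel != null) { // assumption: words beyond the cap are dropped
        spikes.add(channel);
      }
    }
    return spikes;
  }
}
```

With the cap described above, the encoder would be created as `new BagOfWordsSketch(65536)`. Repeated words map back to the same channel, so each document becomes a set of active channel indices.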

Date Encoder

We encode dates by assigning spike channels to the individual months, weeks and days. The java.util.Calendar class conveniently exposes these as integer fields.
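A minimal sketch of such a date encoder follows. It assumes a hypothetical channel layout of 12 month channels, 53 week-of-year channels and 31 day-of-month channels; the real Date_Encoder's layout, and which "day" field it uses, may differ. It relies only on java.util.Calendar's MONTH (0-based), WEEK_OF_YEAR and DAY_OF_MONTH (1-based) fields.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Set;
import java.util.TimeZone;
import java.util.TreeSet;

/** Hypothetical date encoder sketch (not the Knowm API's Date_Encoder). */
class DateEncoderSketch {

  // Assumed channel layout: 12 month, 53 week-of-year, 31 day-of-month channels.
  static final int MONTH_OFFSET = 0;
  static final int WEEK_OFFSET = MONTH_OFFSET + 12;
  static final int DAY_OFFSET = WEEK_OFFSET + 53;
  static final int NUM_CHANNELS = DAY_OFFSET + 31;

  /** Returns three active channels: one month, one week of the year, one day of the month. */
  static Set<Integer> encode(Date date) {
    Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
    cal.setTime(date);
    Set<Integer> spikes = new TreeSet<>();
    spikes.add(MONTH_OFFSET + cal.get(Calendar.MONTH));           // MONTH is 0-based: 0..11
    spikes.add(WEEK_OFFSET + cal.get(Calendar.WEEK_OF_YEAR) - 1); // WEEK_OF_YEAR is 1-based: 1..53
    spikes.add(DAY_OFFSET + cal.get(Calendar.DAY_OF_MONTH) - 1);  // DAY_OF_MONTH is 1-based: 1..31
    return spikes;
  }
}
```

Giving each field its own block of channels means that, for example, "month 3" and "day 3" never share a spike channel.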

We implement the full example encoding in the Reuters21578SpikeEncoder.encode() method. Here we encode the various attributes and join them into a single spike pattern.

Each example is individually encoded attribute by attribute and then joined into a single set.
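One simple way to join per-attribute spike sets is to give each attribute its own fixed-size block of channels and shift its spikes by that block's offset, so that, for example, the word "oil" in the title and in the body land on different channels. The sketch below assumes that scheme; the helper SpikeJoinSketch.join is hypothetical and not part of the Knowm API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: join per-attribute spike sets into one spike pattern. */
class SpikeJoinSketch {

  /** Shifts each attribute's channels by (attribute index * blockSize) and unions the results. */
  static Set<Integer> join(Map<String, Set<Integer>> spikesByAttribute, int blockSize) {
    Set<Integer> joined = new TreeSet<>();
    int offset = 0;
    for (Set<Integer> spikes : spikesByAttribute.values()) {
      for (int channel : spikes) {
        if (channel < 0 || channel >= blockSize) {
          throw new IllegalArgumentException("channel out of range: " + channel);
        }
        joined.add(offset + channel);
      }
      offset += blockSize; // next attribute starts in a fresh block
    }
    return joined;
  }

  public static void main(String[] args) {
    Map<String, Set<Integer>> byAttribute = new LinkedHashMap<>(); // keeps attribute order stable
    byAttribute.put("title", Set.of(0, 3));
    byAttribute.put("body", Set.of(3));
    System.out.println(join(byAttribute, 10)); // [0, 3, 13]
  }
}
```

The attribute order must be the same for every example, which is why the usage example uses a LinkedHashMap.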

Our spike patterns for this encoder have length 131072 after encoding each attribute. We use a StreamReducer to reduce this size.

The StreamReducer is instantiated with a maximum spike space, MAX_SPIKE_SPACE.

This Integer defines the maximum size of the spike space we allow. Once this size is reached, the least-recently-used spike channel is reassigned.
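The least-recently-used policy can be sketched with java.util.LinkedHashMap in access order, whose iteration starts at the least-recently-accessed entry. The class below is a hypothetical illustration of the idea, not the Knowm API's StreamReducer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical LRU reducer sketch: maps a large input spike space onto at most
 * maxSpikeSpace output channels. When all output channels are taken, the channel
 * of the least-recently-used input spike is reassigned to the new input.
 */
class LruStreamReducerSketch {

  private final int maxSpikeSpace;
  // accessOrder = true: iteration order runs from least- to most-recently used
  private final LinkedHashMap<Integer, Integer> outputOf = new LinkedHashMap<>(16, 0.75f, true);

  LruStreamReducerSketch(int maxSpikeSpace) {
    this.maxSpikeSpace = maxSpikeSpace;
  }

  /** Note: if a pattern has at least maxSpikeSpace spikes, two inputs in it can share a channel. */
  Set<Integer> reduce(Set<Integer> inputSpikes) {
    Set<Integer> out = new TreeSet<>();
    for (int in : inputSpikes) {
      Integer channel = outputOf.get(in); // get() also marks the entry as recently used
      if (channel == null) {
        if (outputOf.size() < maxSpikeSpace) {
          channel = outputOf.size(); // a never-used output channel
        } else {
          Map.Entry<Integer, Integer> eldest = outputOf.entrySet().iterator().next();
          channel = eldest.getValue(); // reuse the LRU input's channel
          outputOf.remove(eldest.getKey());
        }
        outputOf.put(in, channel);
      }
      out.add(channel);
    }
    return out;
  }
}
```

For example, with maxSpikeSpace = 2 the input stream {100}, {200}, {100}, {300} maps to output channels {0}, {1}, {0}, {1}: channel 1 is taken from input 200 because input 100 was used more recently.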

Building a Classifier with the Knowm API

The second class will handle training the kT-RAM chip, testing the performance of our classifier and displaying the results. Just like the previous tutorials, we will be extending our classifier from ClassifierApp.

This superclass handles the generic code that calls the training and testing loops and plots the results in a presentable way.

For multi-label classification we set the Evaluation type to All_ABOVE_THRESHOLD. Every label is assigned to its own AHaH node, which acts as a binary classifier separating its assigned class from the others. Thus, when a spike pattern is presented to all the AHaH nodes that make up the classifier, we ask it to report every label whose node output exceeds our confidence threshold.
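The "all above threshold" evaluation can be sketched in a few lines. This is a simplification under stated assumptions: each label's node output is modeled as a plain sum of weights over the active spike channels (a real AHaH node is built from differential memristor pairs with its own learning rule), and the helper names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of one-node-per-label, all-above-threshold evaluation. */
class AboveThresholdSketch {

  /** Simplified linear node output: sum of the weights on the active spike channels. */
  static double nodeOutput(double[] weights, Set<Integer> spikes) {
    double y = 0.0;
    for (int s : spikes) {
      y += weights[s];
    }
    return y;
  }

  /** Returns every label whose node output exceeds the threshold; the result may be empty. */
  static Set<String> labelsAboveThreshold(Map<String, Double> nodeOutputs, double threshold) {
    Set<String> labels = new TreeSet<>();
    for (Map.Entry<String, Double> e : nodeOutputs.entrySet()) {
      if (e.getValue() > threshold) {
        labels.add(e.getKey());
      }
    }
    return labels;
  }
}
```

Because each node is judged independently against the same threshold, one article can receive several labels, or none at all.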

We set the emulator core type by overriding getCoreType. Here we have used the SHORT (16-bit) core, but note the sweep below, where we report on the BYTE and NIBBLE cores. The ability to swap out cores is essential to the development of memristive processors, as it tells us how much resolution is actually needed for various types of problems.

We also need to tell the ClassifierApp to use our custom-made encoder class.

Before we can train our classifier we need to load our dataset and build the classifier. This is accomplished in our main() method.

Here we load the dataset into local memory, instantiate our ClassifierApp and finally call the go() method. Note: loading the dataset might fail if you haven't downloaded the Reuters-21578 dataset from our project.


Our Reuters classifier follows the usual training loop we've introduced. The only difference is that our classification problem is multi-label, which means we may pass more than one truth label to our learner during training.
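As a rough illustration of multi-label training, the sketch below keeps one linear node per label and, for each example, gives every node a target of +1 if its label is among the truth labels and -1 otherwise. The update is a standard perceptron rule used as a stand-in; it is not the AHaH plasticity rule kT-RAM uses, and the class is hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical multi-label trainer sketch: one linear node per label, trained with
 * a perceptron rule as a stand-in for kT-RAM's AHaH learning.
 */
class MultiLabelTrainingSketch {

  final Map<String, double[]> weightsByLabel = new TreeMap<>();
  private final double learningRate;

  MultiLabelTrainingSketch(Set<String> labels, int numChannels, double learningRate) {
    this.learningRate = learningRate;
    for (String label : labels) {
      weightsByLabel.put(label, new double[numChannels]);
    }
  }

  /** Node output: sum of the label's weights over the active spike channels. */
  double output(String label, Set<Integer> spikes) {
    double[] w = weightsByLabel.get(label);
    double y = 0.0;
    for (int s : spikes) {
      y += w[s];
    }
    return y;
  }

  /** One step: every node learns; the truth set may hold zero, one or several labels. */
  void train(Set<Integer> spikes, Set<String> truthLabels) {
    for (Map.Entry<String, double[]> node : weightsByLabel.entrySet()) {
      double target = truthLabels.contains(node.getKey()) ? 1.0 : -1.0;
      if (target * output(node.getKey(), spikes) <= 0) { // update only on a mistake (or a tie at 0)
        for (int s : spikes) {
          node.getValue()[s] += learningRate * target;
        }
      }
    }
  }
}
```

An epoch is then a loop over the training set that calls train(spikes, truthLabels) once per example.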

Testing and Performance

Once we’ve gone through all of our epochs of training, the classifier will call the testing method.

This function iterates through all the unseen test data and collects metrics on its performance. Let's run the app and see what results we get.
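As an illustration of the kind of metric a multi-label testing loop collects, the sketch below computes a micro-averaged F1 score, which pools true positives, false positives and false negatives over all documents and labels. The tutorial does not say whether its F1 is micro- or macro-averaged, so micro-averaging is an assumption here, and F1Sketch is a hypothetical helper.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: micro-averaged F1 for multi-label predictions. */
class F1Sketch {

  static double microF1(List<Set<String>> predicted, List<Set<String>> truth) {
    long tp = 0, fp = 0, fn = 0;
    for (int i = 0; i < predicted.size(); i++) {
      for (String label : predicted.get(i)) {
        if (truth.get(i).contains(label)) {
          tp++; // predicted and correct
        } else {
          fp++; // predicted but wrong
        }
      }
      for (String label : truth.get(i)) {
        if (!predicted.get(i).contains(label)) {
          fn++; // correct label that was missed
        }
      }
    }
    double precision = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    double recall = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    // F1 is the harmonic mean of precision and recall.
    return precision + recall == 0 ? 0.0 : 2 * precision * recall / (precision + recall);
  }
}
```

For example, predicting {earn} and {crude, ship} when the truth is {earn} and {crude} gives precision 2/3, recall 1 and F1 = 0.8.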


We trained over 10 epochs, during which the linear classifier trains on the full Reuters-21578 training set. Using a SHORT (16-bit) core, we attained a peak F1 score (the harmonic mean of precision and recall) of 91.5% with a confidence threshold of 0.17. The results below show the performance of the classifier as a function of the confidence threshold.

Knowm API Reuters-21578 Performance chart - 16 bit core - 10 epochs


Initially, with a low confidence threshold, our classifier labeled each article under every category. A low confidence threshold gave us a very high recall (very few false negatives) and a very low precision (many false positives). As expected, as we increased the confidence threshold, precision increased while recall deteriorated.
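The precision/recall trade-off can be reproduced with a threshold sweep over pooled (document, label) scores. The scores and relevance flags below are made-up toy values, not Reuters results, and reporting precision (or recall) as 1.0 when its denominator is zero is just one common convention.

```java
/** Hypothetical sketch: precision and recall at several confidence thresholds. */
class ThresholdSweepSketch {

  /** Returns {precision, recall} when every pair scoring above the threshold is predicted positive. */
  static double[] precisionRecall(double[] scores, boolean[] relevant, double threshold) {
    int tp = 0, fp = 0, fn = 0;
    for (int i = 0; i < scores.length; i++) {
      boolean predicted = scores[i] > threshold;
      if (predicted && relevant[i]) {
        tp++;
      } else if (predicted) {
        fp++;
      } else if (relevant[i]) {
        fn++;
      }
    }
    // Convention (an assumption): an empty denominator is reported as 1.0.
    double precision = tp + fp == 0 ? 1.0 : (double) tp / (tp + fp);
    double recall = tp + fn == 0 ? 1.0 : (double) tp / (tp + fn);
    return new double[] {precision, recall};
  }

  public static void main(String[] args) {
    double[] scores = {0.9, 0.6, 0.4, 0.3, 0.1, 0.05};            // made-up node outputs
    boolean[] relevant = {true, true, false, true, false, false}; // made-up truth flags
    for (int i = 0; i < 4; i++) {
      double t = i * 0.25;
      double[] pr = precisionRecall(scores, relevant, t);
      System.out.printf("threshold=%.2f precision=%.2f recall=%.2f%n", t, pr[0], pr[1]);
    }
  }
}
```

As the threshold goes from 0.00 to 0.75 in this toy data, precision rises from 0.50 to 1.00 while recall falls from 1.00 to 0.33, the same qualitative trend described above.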

The following table lists the precision/recall breakeven point on the ten most frequent Reuters categories. The results in columns one, two and four were copied from [1], and column three from [2]. The last column displays scores attained by the kT-RAM classifier we designed in this tutorial.

NBayes BayesNets SVM rbf (gamma = 1.0) LinearSVM Knowm API – SHORT
earn 95.9% 98.4% 98.4% 97.7%
acq 64.7% 95.3% 89.7% 93.1%
money-fx 56.6% 76.3% 73.9% 80.3%
grain 78.8% 91.9% 94.2% 86.0%
crude 79.5% 88.9% 88.3% 85.6%
trade 63.9% 77.8% 73.5% 81.7%
interest 64.9% 76.2% 75.8% 72.5%
ship 85.4% 87.6% 78.0% 82.2%
wheat 69.7% 85.9% 89.7% 87.4%
corn 65.3% 91.8% 91.1% 78.1%
Avg 75.2% 86.3% 85.5% 84.5%

The best overall precision/recall breakeven point is achieved by an RBF-kernel SVM with gamma = 1.0.

The following experiments compare the F1 performance of our classifier across kT-RAM core types and 1, 5, 10, and 20 training epochs. Each configuration was run once. Each core type represents a different degree of synapse emulator accuracy: the NIBBLE core treats memristors as having 4 bits (16 distinct states or levels) of resistance, the BYTE core 8 bits (256 levels), and the SHORT core 16 bits (65,536 levels). Depending on the task and encoding, the classifier may need more resolution to achieve state-of-the-art (SOTA) performance.

The Knowm API enables us to develop real-world applications for kT-RAM and map them to the physical constraints of memristors. With interchangeable core types, we can quickly test various resolutions and memristor models (not shown). As we can see, the BYTE core is about all we need, which implies a resolution of 8 bits. Also note that while we used the SHORT core, this does not mean that all 16 bits of resolution were needed to achieve the 0.915 result. It only tells us that 256 levels were slightly insufficient to match SOTA algorithms on this particular classification task with this particular spike encoding. Making the transition to neuromemristive processors requires a careful assessment of a number of details, and weight precision is one of the most important.

Emulator Core Type \ Epoch 1 5 10 20
Short (16 bits) 0.880 0.905 0.909 0.915
Byte (8 bits) 0.874 0.898 0.898 0.891
Nibble (4 bits) 0.804 0.824 0.848 0.845
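To get a feel for what these resolutions mean, the sketch below rounds a weight in [-1, 1] to the nearest of 2^bits evenly spaced levels. Uniform quantization over [-1, 1] is an assumption made purely for illustration; it is not how the Knowm emulator cores are implemented, and real memristor states need not be evenly spaced.

```java
/** Hypothetical sketch: uniform quantization of a weight to a fixed number of bits. */
class QuantizationSketch {

  /** Rounds w (clamped to [-1, 1]) to the nearest of 2^bits evenly spaced levels. */
  static double quantize(double w, int bits) {
    int levels = 1 << bits;             // 16 (NIBBLE), 256 (BYTE), 65,536 (SHORT)
    double clamped = Math.max(-1.0, Math.min(1.0, w));
    double step = 2.0 / (levels - 1);   // spacing between adjacent levels
    long index = Math.round((clamped + 1.0) / step);
    return -1.0 + index * step;
  }

  public static void main(String[] args) {
    double w = 0.123456;
    for (int bits : new int[] {4, 8, 16}) {
      double q = quantize(w, bits);
      System.out.printf("%2d bits -> %.6f (error %.2e)%n", bits, q, Math.abs(q - w));
    }
  }
}
```

Under this model the worst-case rounding error is half a step: about 0.067 at 4 bits, 0.0039 at 8 bits and 0.000015 at 16 bits.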

Secondary Metrics

As stated in Primary and Secondary Performance Metrics, most machine learning benchmark studies only report primary performance metrics. Here, we also report secondary metrics for a run on a 2015 MacBook Pro Retina. Wattage is a rough estimate acquired from the iStat Menus app.

Measurement Value
Power Consumption 15.0 Watts
Speed 27 Seconds
Volume 600 cubic centimeters
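Because watts measure power rather than energy, the energy used by a run is the estimated power draw multiplied by the runtime. Using the figures in the table above:

```latex
E = P \cdot t \approx 15.0\,\mathrm{W} \times 27\,\mathrm{s} = 405\,\mathrm{J} \approx 0.11\,\mathrm{Wh}
```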


In this article, we introduced a linear classifier built on kT-RAM for doing text classification. We used this classifier to categorize articles from the Reuters-21578 dataset and achieved a peak F1 score of 0.915. This result shows that a basic kT-RAM linear classifier acting on a simple bag-of-words encoder has comparable accuracy to an SVM on the same task. Changes in how we spike-encode the data affect the performance of our system. We may employ feature engineering or feature learning to further improve the results.

We also reported the secondary performance metrics of the chip emulator running on a MacBook Pro. We found that training and testing took 66 seconds and drew about 15 Watts. These results reflect the performance of AHaH computing in emulator software. We predict drastic improvements in speed and efficiency on physical kT-RAM hardware.

If you have any comments or questions, please leave them in the comment section below!

Further Reading

TOC: Table of Contents
Previous: Wisconsin Breast Cancer Classification Benchmark
Next: MNIST Hand Written Digits Classification Benchmark


  1. Dumais, S. T., Platt, J., Heckerman, D., and Sahami, M. Inductive learning algorithms and representations for text categorization. Submitted for publication, 1998.

  2. Joachims, T. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. Universität Dortmund.
