## Robotic Arm Motor Control

Motor control is the process by which sensory information about the world and the current state of the body is used to execute actions that generate movement. Reinforcement learning (RL) can be used to train an autonomous motor-controlling agent. Our approach is unusual in that the learner is a collection of motor-controlling AHaH nodes. The nodes are trained by applying delayed Hebbian feedback during an on-policy search in a virtual environment. The original results on this task were reported in our PLOS paper, "AHaH Computing–From Metastable Switches to Attractors to Machine Learning". This article is intended as a guide for KDC members who want to understand our approach in more detail and build their own RL applications.

One class of robotic actuation problem is teaching an autonomous robot to actuate multiple joints while attempting to accomplish a task. Our learner attempts to catch a target as quickly as possible. It has no prior information about its body or environment. Instead, it learns the task by making observations about the world and by using the feedback it receives after taking actions.

A virtual environment is created in which the robot can attempt the task. In this environment, the robotic arm’s motor controller sets the angles between the connected fixed-length rods of a robotic arm located at the center of the environment.

Knowm API – Robotic Arm

The arm rests on a plane with its base anchored at the center, and every joint can rotate through a full 360 degrees. The environment starts the task by dropping a target at a random location within the robotic arm’s reach radius, and it drops a new target each time the arm captures one.

The robotic arm virtual environment is part of an open source project called Proprioceptron, which builds upon a 3D gaming library and offers virtual worlds and challenges for testing motor control algorithms.

As the simulation progresses, the environment and task slowly change: targets start out stationary, and their lateral speed increases with each level.

## Sensing the Environment

The robotic arm is built by connecting a number of joints, each of which is actuated by two opposing “muscles”. Each muscle is formed of many “fibers”, and a single AHaH node controls each fiber. Additionally, the robotic arm has three “sensors”: a “head” sensor and two “eyes” located on its side.

The robotic arm uses these sensors in two ways. First, the distance from a sensor to the target is used to calculate a reward value. Although other reward functions are possible, we’ve found that the inverse distance from the head to the target works well as a state value function:

$Value = 1 / ( 1 + d )$
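As a minimal sketch, the value function above can be written as a small Java helper. The class and method names here (`StateValue`, `of`) are hypothetical and not part of the Knowm API; the only thing taken from the article is the formula itself, with $d$ assumed to be the non-negative head-to-target distance.

```java
// Hypothetical helper for illustration; not a class from the Knowm API.
final class StateValue {

    private StateValue() {}

    /**
     * Inverse-distance state value: 1 / (1 + d).
     * Returns 1.0 when the head touches the target (d = 0)
     * and decays toward 0 as the distance grows.
     */
    static double of(double headToTargetDistance) {
        if (headToTargetDistance < 0.0) {
            throw new IllegalArgumentException("distance must be non-negative");
        }
        return 1.0 / (1.0 + headToTargetDistance);
    }
}
```

Adding 1 to the denominator keeps the value finite at the moment of capture and bounds it to the interval (0, 1].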

Second, the state observations are converted into a sparse spiking representation by RoboticArmSpikeEncoder.java, which the motor-controlling AHaH nodes use as feature vectors while learning the task. To do this, RoboticArmSpikeEncoder.java uses three Float_MaxMin encoders to create a spike representation of each distance value and then joins these together with a bias term. The Float_MaxMin encoders spatially bin each distance value and assign a unique spike to each bin.
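The idea of binning three distances and joining them with a bias spike can be sketched as follows. This is a simplified stand-in that uses uniform, fixed-width bins; it is not the actual Float_MaxMin algorithm or the RoboticArmSpikeEncoder.java source, and the class name, constructor parameters, and spike numbering are all assumptions made for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch; uniform binning is an assumption, not the Float_MaxMin method.
final class DistanceSpikeEncoderSketch {

    private final int binsPerSensor;
    private final double maxDistance;

    DistanceSpikeEncoderSketch(int binsPerSensor, double maxDistance) {
        if (binsPerSensor < 1 || maxDistance <= 0.0) {
            throw new IllegalArgumentException("need at least one bin and a positive range");
        }
        this.binsPerSensor = binsPerSensor;
        this.maxDistance = maxDistance;
    }

    /** Maps a distance to a bin index in [0, binsPerSensor - 1], clamping out-of-range values. */
    int bin(double distance) {
        double clamped = Math.max(0.0, Math.min(distance, maxDistance));
        int index = (int) (clamped / maxDistance * binsPerSensor);
        return Math.min(index, binsPerSensor - 1);
    }

    /** Returns the active spike IDs: one bias spike plus one spike per sensor. */
    Set<Integer> encode(double head, double leftEye, double rightEye) {
        Set<Integer> spikes = new LinkedHashSet<>();
        spikes.add(0);                                   // bias spike, always active
        spikes.add(1 + bin(head));                       // head sensor block
        spikes.add(1 + binsPerSensor + bin(leftEye));    // first eye block
        spikes.add(1 + 2 * binsPerSensor + bin(rightEye)); // second eye block
        return spikes;
    }
}
```

Each sensor gets its own block of spike IDs, so the same distance seen by the head and by an eye produces different spikes. The resulting pattern is sparse: exactly four spikes are active out of `1 + 3 * binsPerSensor` possible.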

## Actuation

Encoded values are sent to an Actuator.java object, which uses the observed state signals to generate angular movements for the robotic arm. For each fiber-controlling AHaH node, the Actuator computes the post-synaptic activation produced by the spike-encoded environment observation. A noise term is added to simulate the variance found in real-world robotics.

From these activations, the actuator computes the number of discrete angular steps, $dj$, that each joint moves:

$dj = \sum\limits_{i=0}^N ( H( y_{i}^0 ) - H( y_{i}^1 ) )$

where $N$ is the number of muscle fibers, $y_i^0$ is the post-synaptic activation of the AHaH node controlling the $i$th muscle fiber of the primary muscle, $y_i^1$ is the post-synaptic activation of the AHaH node controlling the $i$th muscle fiber of the opposing muscle, and $H$ is the Heaviside step function. The number of discrete angular steps moved in each joint at each time step is therefore the number of active fibers in the primary muscle minus the number of active fibers in the opposing muscle.
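The step computation can be sketched in a few lines of Java. This is an illustrative sketch rather than the Actuator.java source: the class and method names are hypothetical, the noise term is omitted, and the convention $H(0) = 0$ is an assumption, since the article does not say how a zero activation is treated.

```java
// Hypothetical sketch of the joint step formula; not the Actuator.java source.
final class JointActuationSketch {

    private JointActuationSketch() {}

    /** Heaviside step function, assuming H(0) = 0. */
    static int heaviside(double y) {
        return y > 0.0 ? 1 : 0;
    }

    /**
     * Number of discrete angular steps for one joint: the count of active
     * fibers in the primary muscle minus the count in the opposing muscle.
     * A positive result pulls one way, a negative result the other.
     */
    static int angularSteps(double[] primaryActivations, double[] opposingActivations) {
        if (primaryActivations.length != opposingActivations.length) {
            throw new IllegalArgumentException("both muscles need the same number of fibers");
        }
        int steps = 0;
        for (int i = 0; i < primaryActivations.length; i++) {
            steps += heaviside(primaryActivations[i]) - heaviside(opposingActivations[i]);
        }
        return steps;
    }
}
```

With three fibers per muscle, primary activations of `{0.4, -0.1, 0.9}` and opposing activations of `{-0.2, 0.3, -0.5}` give two active primary fibers against one active opposing fiber, so the joint moves one step in the primary direction.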

In the reinforcement learning framework, we can treat the activations of these nodes as an action-value function, Q(a,s), where the available actions are to pull left or pull right given the observed state.

## Learning

Given a movement, we can say whether a fiber (AHaH node) acted for or against it. Additionally, by comparing the value signal from before the action with the value signal some time after the movement, we can determine whether the fiber (AHaH node) increased or decreased the signal. This is accomplished by buffering actions and value signals, letting the consequences of actions mature over the length of the buffer.

If, at this later time, the value has increased, then each fiber responsible for the movement receives rewarding Hebbian feedback.

Likewise, if the fiber acted in support of a movement and the value signal later dropped, then the fiber is denied a Hebbian update and an anti-Hebbian update is applied instead.

Finally, if the fiber acted neither for nor against the movement, then the node is left to float at its current value.

As the time between movement and reward increases, so does the difficulty of the problem, since many movements can be taken during the interval and the value of each action becomes ambiguous. A reinforcement scheme can be implemented in a number of ways and over a number of different timescales, and several timescales can even be combined. For example, we may integrate over a number of timescales to determine whether the value increased or decreased.
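The buffering scheme described above might be sketched as follows. This is a hedged illustration, not the Knowm implementation: the class name, the `Feedback` enum, and the per-fiber `supported` flags are hypothetical, and the sketch assumes that fibers which did not support the buffered movement, or steps where the value is unchanged, are left to float.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of delayed Hebbian feedback over a fixed-length buffer.
final class DelayedFeedbackSketch {

    enum Feedback { HEBBIAN, ANTI_HEBBIAN, FLOAT }

    /** One buffered time step: which fibers supported the movement, and the value before it. */
    private static final class Step {
        final boolean[] supported;
        final double value;

        Step(boolean[] supported, double value) {
            this.supported = supported.clone();
            this.value = value;
        }
    }

    private final Deque<Step> buffer = new ArrayDeque<>();
    private final int delay;

    DelayedFeedbackSketch(int delay) {
        if (delay < 1) {
            throw new IllegalArgumentException("delay must be at least 1");
        }
        this.delay = delay;
    }

    /**
     * Records the current step and returns feedback for the step taken
     * `delay` steps earlier, or null while the buffer is still filling.
     */
    Feedback[] record(boolean[] supported, double valueNow) {
        Feedback[] result = null;
        if (buffer.size() == delay) {
            Step old = buffer.removeFirst();
            result = new Feedback[old.supported.length];
            for (int i = 0; i < result.length; i++) {
                if (!old.supported[i] || valueNow == old.value) {
                    result[i] = Feedback.FLOAT;        // no update for this fiber
                } else if (valueNow > old.value) {
                    result[i] = Feedback.HEBBIAN;      // value rose: reinforce
                } else {
                    result[i] = Feedback.ANTI_HEBBIAN; // value fell: penalize
                }
            }
        }
        buffer.addLast(new Step(supported, valueNow));
        return result;
    }
}
```

The `delay` parameter plays the role of the buffer size: a larger delay gives the consequences of a movement more time to show up in the value signal, at the cost of crediting the movement for changes caused by the steps that followed it.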

## Performance

We measured the robotic arm’s efficiency in catching targets by summing the total number of discrete angular joint actuations from the time the target was placed until capture. As a control, the same challenge was carried out using a simple random actuator. The difference in the number of discrete angular steps between the two is shown below.

## The App

KDC members can run the application from RoboticArmAppKtRam.java.

Here we can set the following parameters for the robotic arm simulation:

This constructor is also where we instantiate our RoboticArmAHaHBrainKtRAM object.

We can modify this class to plug in new encoding schemes and a different actuator.

The brain object is also responsible for passing the observation signals to the spike encoder.

We can modify the value calculation to change the behavior of the arm, for instance to make it avoid the ball at all costs:

Or go to the target, but never let the head get within a single unit’s distance.
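As a rough illustration of these two variations, the value function could be reshaped as below. These formulas are assumptions chosen for this sketch, not the code from the application: the class and method names are hypothetical, and other functions with the same qualitative shape would serve equally well.

```java
// Hypothetical alternative value functions; not taken from the Knowm application code.
final class AlternativeValues {

    private AlternativeValues() {}

    /**
     * Avoidance: the complement of the inverse-distance value.
     * It is 0 at the target and rises toward 1 as the head moves away.
     */
    static double avoidTarget(double distance) {
        return 1.0 - 1.0 / (1.0 + distance);
    }

    /**
     * Approach but keep clear: the value peaks when the head is exactly
     * one unit from the target and falls off on either side of that ring.
     */
    static double keepOneUnitAway(double distance) {
        return 1.0 / (1.0 + Math.abs(distance - 1.0));
    }
}
```

Because the learner only reacts to whether the value rose or fell after a movement, any function that increases in the desired direction should steer the arm the same way; the exact magnitudes matter less than the sign of the change.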

## Results

Running the application should generate an environment and run the simulation over a number of trials. If you don’t have the code you can see a number of demonstrations on our

## Conclusion

We outlined a very basic RL framework for training a robotic arm on a simple motor control task. To do this, we applied a Hebbian feedback reinforcement scheme to buffered data, which allowed our learner to learn motor actuation for catching targets in a simulated environment. The application we built allows us to plug in different versions of the actuator, spike encoder and value function.

The videos displayed in the Results section show this framework under a single setup. We are confident that more advanced methods for applying feedback would change the performance of the robotic arm, and the generality of our approach makes such methods easy to try. We believe this is an advantage this architecture has over some of the more mathematically focused approaches we saw in the previous article. We also hope to realize the other potential AHaH computing has for robotics in further Knowm API examples, and eventually in examples that use physical kT-RAM itself.

