Running an Apache Storm cluster on EC2 is greatly simplified by the open source project storm-deploy-alternative. With this handy project, you define your Storm cluster in a simple configuration file, and creating the running instances as well as installing and running all necessary software components such as Zookeeper, Storm, and Nimbus is handled for you. Beneath the hood, the jclouds library is used. This is a 100% Java-based solution. We have also previously published another how-to article, How to Install a Distributed Apache Storm Cluster, about running Apache Storm on your own hardware. Please check out that article for more background information on the mechanics of a Storm cluster. Deploying to EC2 is more or less the same, but on someone else’s hardware.

This guide aims to help you get up and running with Apache Storm on Amazon EC2 as fast as possible. The example targets the t2.micro machines available during the AWS free trial period.

Set Up Individual Users

This step is optional. You could use your root account, but it is highly recommended to set up individual users.

Log in to your main Amazon AWS account. In the upper right hand corner, select Your Name ==> Security Credentials and select “Get Started with IAM users”. You can also reach this IAM dashboard by clicking on Services in the upper left section of the upper menu and then clicking IAM. On the left hand side menu of the IAM dashboard, select Groups and create a new group. Add the AmazonEC2FullAccess policy to it and add users to it. For each new user, note the Access Key Id and Secret Access Key.
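If you prefer the command line, the same group and user setup can be sketched with the AWS CLI. This is a swapped-in alternative to the console steps, not something this walkthrough depends on. It assumes the AWS CLI is installed and configured with credentials that are allowed to manage IAM, and the group and user names (storm-deployers, storm-user) are hypothetical. By default the function only prints the commands; set DRY_RUN=0 to actually run them.

```shell
# Sketch: create an IAM group with EC2 full access and add one user to it.
# Group and user names are hypothetical placeholders.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_storm_iam_user() {
  group="$1"
  user="$2"
  run aws iam create-group --group-name "$group"
  run aws iam attach-group-policy --group-name "$group" \
    --policy-arn arn:aws:iam::aws:policy/AmazonEC2FullAccess
  run aws iam create-user --user-name "$user"
  run aws iam add-user-to-group --group-name "$group" --user-name "$user"
  # When executed for real, this returns the Access Key Id and Secret Access Key.
  run aws iam create-access-key --user-name "$user"
}

create_storm_iam_user storm-deployers storm-user
```

The console password and MFA setup described next are left to the IAM dashboard in this sketch.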

Set up a password for each user and download the credentials file. The file contains the user’s password and access URL. Set up multi-factor authentication (MFA) as well if you’d like.

From this point forward, we log out of the main account and log back in as one of the users created above.

Set up SSH Key Pair

Create an SSH key pair on your local dev machine, naming the files ec2 and ec2.pub, with no passphrase (when asked, just hit enter).
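A minimal sketch of the key generation, assuming OpenSSH’s ssh-keygen is available. The RSA key type is my assumption; the -f ~/.ssh/ec2 path matches the file names used in this guide. Passing -N "" sets the empty passphrase up front, which has the same effect as hitting enter at the prompt.

```shell
# Sketch: generate ~/.ssh/ec2 and ~/.ssh/ec2.pub with an empty passphrase.
# -t rsa is an assumption; -N "" supplies the empty passphrase non-interactively.
KEY_FILE="$HOME/.ssh/ec2"
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"

# Skip generation if the key already exists so nothing gets overwritten.
if [ ! -f "$KEY_FILE" ]; then
  ssh-keygen -t rsa -f "$KEY_FILE" -N ""
fi
```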

This key pair will be used automatically to establish an SSH connection with the launched EC2 instances. You can leave off the -f ~/.ssh/ec2 part of the ssh-keygen command if you want, and a key pair with the default name, id_rsa, will be generated. If you already have an id_rsa key pair though, you’ll want to create a new one with a unique name, ec2 in this example.

Review So Far

So far, we have set up the basic essentials for deploying an Apache Storm cluster to Amazon EC2 with the storm-deploy-alternative project. It involved creating a group and user(s) for this action, logging in as one of the users and creating a new empty-passphrase ssh key pair on your dev machine.

Next, we deploy a Storm cluster using the user’s Access Key Id, Secret Access Key and ec2 ssh keys.

Using storm-deploy-alternative

There are at least two ways to use storm-deploy-alternative. One of them, building a jar with Maven, is already described in the README file of the project. Here, I will describe how to run it from within the Eclipse IDE. You may prefer this method if you want to look at and alter the source code, or perhaps contribute back to the project by submitting a pull request. Both options require cloning the source code from GitHub and hence require that Git is installed on your dev machine.

Both methods also require that you define which Amazon Machine Image (AMI) your running instances will be created from. Note that you do not need to create your own AMIs under your account. Instead, you pick one from the Ubuntu Cloud Images. A handy website for finding the appropriate AMI from Ubuntu is here: https://cloud-images.ubuntu.com/locator/ec2/. At the bottom of the page you can filter the choices based on Zone, Version, Arch, etc. After you have found the AMI you need, take note of the AMI-ID.

Config Files

You’re going to need both a credential.yaml and a configuration.yaml file. They are more or less self-explanatory, and you’ll need to copy and paste in the SSH key pair name (if it is not the default), the Access Key Id and the Secret Access Key. There are template configuration files in the conf directory of the storm-deploy-alternative project.

Cluster config

Ensure the AMI you picked resides in the same region as the one specified in the configuration. Choose a mirror download URL for Storm and Zookeeper, or put the files in your own S3 bucket and use those URLs.

Credentials

Run storm-deploy-alternative in Eclipse

This requires the m2e Eclipse plugin to be installed first.

Import the project into Eclipse: File ==> Import ==> Maven ==> Existing Maven Projects … After the project is imported into Eclipse, the first thing to do is create two files, storm-deploy-alternative-local/conf/credential.yaml and storm-deploy-alternative-local/conf/configuration.yaml. You can just copy the provided storm-deploy-alternative-local/conf/credential_example.yaml and storm-deploy-alternative-local/conf/configuration_example.yaml files to get started. Update the information as needed and as described above.
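If you’d rather create the two files from a terminal than inside Eclipse, a small sketch like the following does the copy. The function name copy_config_templates is hypothetical; the file names are the ones given above. It leaves an existing credential.yaml or configuration.yaml untouched so that edits are not lost.

```shell
# Sketch: create credential.yaml and configuration.yaml from the provided templates.
# Existing files are kept as-is so that local edits are never overwritten.
copy_config_templates() {
  conf_dir="$1"
  for name in credential configuration; do
    if [ ! -f "$conf_dir/$name.yaml" ]; then
      cp "$conf_dir/${name}_example.yaml" "$conf_dir/$name.yaml"
    fi
  done
}

# Run from the directory that contains the cloned project.
if [ -d storm-deploy-alternative-local/conf ]; then
  copy_config_templates storm-deploy-alternative-local/conf
fi
```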

Optionally, you might want to change the logging level, although the default is pretty good. To do that, find and open up the simplelogger.properties file and set your preferred logging level.

Build and Upload Cloud Component

The cloud component of this project, storm-deploy-alternative-cloud.jar, is downloaded by each cluster instance and is mainly used to run Storm and Zookeeper under supervision. To build it in Eclipse, right-click on the parent project ==> Run As ==> Maven install. Better would be to run mvn clean package, because the artifact doesn’t actually need to be installed in your local Maven repo; you can do that by choosing Maven build… instead. Then upload storm-deploy-alternative-cloud/target/storm-deploy-alternative-cloud.jar to some location on the web that your cluster instances can reach via wget. Your own S3 bucket would be a logical location. Update the configuration.yaml entry, storm-deploy-alternative-cloud-jar-url, accordingly.
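The same build can be run outside Eclipse. The sketch below assumes Maven is on your PATH; the helper name build_cloud_jar is hypothetical. It runs mvn clean package in the parent project and then checks that the cloud jar landed at the path mentioned above. The upload itself is left out, since it depends on where you host the file.

```shell
# Sketch: build the project with Maven and confirm the cloud jar was produced.
# Pass the path of the parent project directory as the first argument.
build_cloud_jar() {
  project_dir="$1"
  jar="storm-deploy-alternative-cloud/target/storm-deploy-alternative-cloud.jar"
  ( cd "$project_dir" && mvn clean package ) || return 1
  if [ -f "$project_dir/$jar" ]; then
    echo "built: $project_dir/$jar"
  else
    echo "jar not found: $project_dir/$jar" >&2
    return 1
  fi
}
```

For example, build_cloud_jar . run from the parent project directory builds everything and prints the jar path on success.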

[Figure: Maven Eclipse Build Cloud]

Next, find the class StormDeployAlternative, Right-Click ==> Run As ==> Java Application. The first time, it will complain about missing program args. Open up the Run Configurations (Right-Click ==> Run As ==> Run Configurations) and add the program args: deploy mycluster. Then run the program again.

[Figure: Eclipse Program Args]

This time, if everything works out OK, the Storm cluster will be deployed. Expect to wait about 5 to 10 minutes before it’s fully deployed. After deployment, you will see the relevant cluster information, including the StormUI and Ganglia web application URLs.

You can and should verify that the instances have been deployed by looking at your EC2 console.

[Figure: EC2 Instances]

SSH Into the MASTER Node

To test that you can SSH into the MASTER node, connect to it with ssh using the private key created above. Replace ec2 with your own key name if you chose a different one.
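A sketch of that connection follows. The helper name master_ssh is hypothetical, and the ubuntu login is an assumption based on the default user of the Ubuntu Cloud Images; use whatever user and host name your deployment output reports.

```shell
# Sketch: open an SSH session on the MASTER node with the ec2 key.
# Arguments: host name, optional private key path, optional login user.
# The default "ubuntu" user is an assumption; adjust it to your cluster.
master_ssh() {
  host="$1"
  key="${2:-$HOME/.ssh/ec2}"
  user="${3:-ubuntu}"
  ssh -i "$key" "$user@$host"
}
```

Calling master_ssh with the public DNS name of the MASTER instance, as shown in the EC2 console, is equivalent to running ssh -i ~/.ssh/ec2 with that user and host.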

Run a Test Topology

While SSH’ed into the MASTER node of the cluster, launch one of the test Storm topologies that come bundled with Storm.
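One way to do that is with the storm jar command and one of the storm-starter examples. Everything specific in this sketch is an assumption: the helper name submit_test_topology is hypothetical, the location of the storm-starter jar depends on the Storm version installed on your cluster, and the class name storm.starter.ExclamationTopology applies to older Storm releases (newer ones use the org.apache.storm.starter package).

```shell
# Sketch: submit a storm-starter example topology from the MASTER node.
# Arguments: path to the storm-starter jar, optional main class, optional topology name.
# The default class and topology name are assumptions; adjust them to your Storm version.
submit_test_topology() {
  jar="$1"
  class="${2:-storm.starter.ExclamationTopology}"
  name="${3:-exclamation-test}"
  storm jar "$jar" "$class" "$name"
}
```

Pass the path of the storm-starter jar found in your Storm installation as the first argument.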

If the submission succeeds, the running topology will be listed in the StormUI, at the URL given above after the cluster deploy.

[Figure: Storm UI ec2]

And also Ganglia.

[Figure: Ganglia]

Kill the Running Cluster

To terminate all EC2 cluster instances, run the StormDeployAlternative class again, but with different program args: kill mycluster. After waiting a minute or two, you can check the EC2 dashboard and verify that the running instances were terminated.
