Scaling Efficiency: Managing Multiple Instances of Cron Jobs in Elastic Beanstalk

Running cron jobs on a multi-instance Elastic Beanstalk environment can be challenging. The goal is often to have the jobs handled by Node.js and guaranteed to run on only one instance at a time. Sometimes a separate worker instance makes sense for this, but often we don’t want a separate server and codebase.

You can specify a cron job to run on Elastic Beanstalk using .ebextensions and a “leader_only” flag, but this is not reliable: the leader is selected at deployment time, so if the original leader instance gets terminated, the job may no longer run anywhere.

There are many task queue alternatives that can manage jobs, but these usually require a central data store such as Redis.

One simple and self-contained strategy is to have each instance check whether it is the “leader” and, if so, run the Node.js “cron” code. This can be accomplished with the aws-sdk by adding some permissions to the role that the Elastic Beanstalk environment uses.

The steps in the code below are:

  1. Get the current instance ID
  2. Get the environment ID
  3. Get the other instances in the environment
  4. Determine if the current instance is the “leader” based on list order

Node code:
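
The four steps above can be sketched roughly as follows. This is a minimal sketch, not the original snippet: it assumes the AWS SDK for JavaScript v2 (the aws-sdk package), a region supplied through an AWS_REGION environment variable, and that Elastic Beanstalk tags each instance with elasticbeanstalk:environment-id. The function names (isLeader, runIfLeader, and the step helpers) are hypothetical, and sorting the instance IDs before picking the first one is an added assumption so that every instance reaches the same answer regardless of the order the API returns.

```javascript
// Sketch only: assumes AWS SDK for JavaScript v2 ("aws-sdk") and an AWS_REGION env var.
// All function names here are illustrative, not part of any library.

// Step 4 (pure logic): treat the instance whose ID sorts first as the leader.
function isLeader(currentInstanceId, instanceIds) {
  if (!instanceIds || instanceIds.length === 0) return false;
  const sorted = [...instanceIds].sort(); // copy so the caller's array is untouched
  return sorted[0] === currentInstanceId;
}

// Step 1: ask the EC2 instance metadata service for this instance's ID.
function getInstanceId() {
  const AWS = require('aws-sdk'); // required lazily so isLeader works without the SDK
  const metadata = new AWS.MetadataService();
  return new Promise((resolve, reject) => {
    metadata.request('/latest/meta-data/instance-id', (err, id) =>
      err ? reject(err) : resolve(id)
    );
  });
}

// Step 2: read the environment ID from the instance's tags (needs ec2:DescribeTags).
async function getEnvironmentId(instanceId, region) {
  const AWS = require('aws-sdk');
  const ec2 = new AWS.EC2({ region });
  const data = await ec2.describeTags({
    Filters: [
      { Name: 'resource-id', Values: [instanceId] },
      { Name: 'key', Values: ['elasticbeanstalk:environment-id'] }, // assumed tag key
    ],
  }).promise();
  if (!data.Tags || data.Tags.length === 0) {
    throw new Error('No Elastic Beanstalk environment tag found on ' + instanceId);
  }
  return data.Tags[0].Value;
}

// Step 3: list the instances in the environment
// (needs elasticbeanstalk:DescribeEnvironmentResources).
async function listInstanceIds(environmentId, region) {
  const AWS = require('aws-sdk');
  const eb = new AWS.ElasticBeanstalk({ region });
  const data = await eb
    .describeEnvironmentResources({ EnvironmentId: environmentId })
    .promise();
  return data.EnvironmentResources.Instances.map((instance) => instance.Id);
}

// Run `job` only when this instance is the leader; resolves to true if it ran.
// `deps` can be replaced with stubs for local testing.
async function runIfLeader(
  job,
  deps = { getInstanceId, getEnvironmentId, listInstanceIds },
  region = process.env.AWS_REGION
) {
  const instanceId = await deps.getInstanceId();
  const environmentId = await deps.getEnvironmentId(instanceId, region);
  const instanceIds = await deps.listInstanceIds(environmentId, region);
  if (!isLeader(instanceId, instanceIds)) return false;
  await job();
  return true;
}
```

Each instance would call runIfLeader(job) from its own scheduler on every tick, so the check is repeated each time and a surviving instance takes over once the previous leader disappears from the list. There is no locking here, so a brief overlap while the instance list changes remains possible.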

IAM policy for the EB role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1415212833000",
      "Effect": "Allow",
      "Action": [
        "s3:*",
        "s3:Get*",
        "s3:List*",
        "s3:ListAllMyBuckets",
        "s3:PutObject",
        "sts:AssumeRole",
        "ec2:DescribeTags",
        "elasticbeanstalk:DescribeEnvironmentResources",
        "autoscaling:DescribeAutoScalingGroups",
        "cloudformation:ListStackResources",
        "codedeploy:*"
      ],
      "Resource": [
        "*",
        "arn:aws:s3:::*",
        "arn:aws:s3:::deploymentbucket/*",
        "arn:aws:s3:::aws-codedeploy-us-east-1/*"
      ]
    }
  ]
}

This code is based on work found online, including https://gist.github.com/ippeiukai/37e812e49f04ea0f26d84d380d050304