ELB Health Check With AWS Spot Fleet
In mid-2015, AWS announced Spot Fleet to make the EC2 Spot Instance model even more useful. With the addition of a new API, Spot Fleet allowed one to launch and manage an entire fleet of Spot Instances with just one request.
Spot Fleet comes with some great features:
- Automatically attaching new instances to ELB/ALB
- Replacing terminated instances to maintain the Target Capacity
- Instance Weighting
To better understand how Spot Fleet works, see How Spot Fleet Works.
As great as Spot Fleet is, it still has some gaps. One of them is the health check.
Spot Fleet checks the health status of the Spot Instances in the fleet every two minutes. The health status of an instance is either healthy or unhealthy. Spot Fleet determines the health status of an instance using the status checks provided by Amazon EC2.
As you can see, the Spot Fleet health check is based on the EC2 status checks, which means there is no way to have the fleet act on the ELB health check.
In some cases, once the ELB marks an instance as unhealthy, we would like to replace it, like an AWS Auto Scaling group does. Well, that is not an option in Spot Fleet.
But fear no more: in this blog post I will explain how it can be done with the help of a few more AWS services and some Python coding.
So, let's begin…
Architecture
We will use an AWS CloudFormation template to deploy the following:
- An Elastic Load Balancing load balancer (ELB)
- An AWS Lambda function
- An Amazon CloudWatch alarm
For the purpose of the example, I will use a Spot Fleet behind an Application Load Balancer (ALB), which uses a health check to determine whether an instance is ready for incoming traffic.
Once an instance is unhealthy, the ALB stops sending it traffic, but I want to give it time to recover before terminating it. This is the job for… the ELB.
When the CloudWatch alarm is triggered by the unhealthy instance count in the ELB, a Lambda function runs a Python script that de-registers the unhealthy instance and finally terminates it.
Once terminated, Spot Fleet will automatically launch a new one to maintain its Target Capacity.
The following diagram shows how the components work together.
Review the details
All the code and CloudFormation templates are minimal and show only what’s needed for this blog post.
Elastic Load Balancing
Because I want to give the instance time to recover from its Unhealthy state (as determined by the ALB), I add this ELB with the same health check path as configured in the ALB, but with a higher UnhealthyThreshold.
This gives the instance more time before the ELB moves it to the Unhealthy state and our Lambda finally terminates it.
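As a rough illustration of the idea, the sketch below builds a Classic ELB health check that mirrors the ALB's path but tolerates more failed probes. The path `/health`, the port, and every numeric value are assumptions made for the example, not values from the original template.

```python
def build_elb_health_check(path="/health", port=80, interval=30,
                           timeout=5, unhealthy_threshold=5,
                           healthy_threshold=2):
    """Return a Classic ELB HealthCheck dict (all defaults are illustrative)."""
    return {
        "Target": f"HTTP:{port}{path}",
        "Interval": interval,
        "Timeout": timeout,
        "UnhealthyThreshold": unhealthy_threshold,
        "HealthyThreshold": healthy_threshold,
    }


def seconds_until_out_of_service(health_check):
    """Approximate time a failing instance gets before it is marked unhealthy."""
    return health_check["Interval"] * health_check["UnhealthyThreshold"]


# Assumed ALB-like setting (2 failed probes) versus a more lenient ELB (5).
alb_like = build_elb_health_check(unhealthy_threshold=2)
elb_check = build_elb_health_check(unhealthy_threshold=5)

print(seconds_until_out_of_service(alb_like))   # 60
print(seconds_until_out_of_service(elb_check))  # 150
```

With boto3, a dictionary of this shape could be passed as the `HealthCheck` argument of the Classic ELB client's `configure_health_check` call. In the CloudFormation template the same fields would live on the load balancer resource.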
Lambda functions
The Lambda function does all the magic. The details of the CloudWatch Alarm are published to the Lambda function (through the SNS topic), which uses boto3 to make a couple of AWS API calls.
The first call describes the health of all instances in the ELB, filtering on instances that are OutOfService.
The instances that pass the filter are then de-registered from the ALB before being terminated.
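The flow just described might look roughly like the sketch below. It is not the original function: the helper names, the way clients are passed in, and the parameter names are all assumptions. The boto3 calls used are `describe_instance_health` (Classic ELB client), `deregister_targets` (ELBv2 client), and `terminate_instances` (EC2 client).

```python
def find_out_of_service(elb_client, load_balancer_name):
    """Return IDs of instances the Classic ELB reports as OutOfService."""
    response = elb_client.describe_instance_health(
        LoadBalancerName=load_balancer_name
    )
    return [
        state["InstanceId"]
        for state in response["InstanceStates"]
        if state["State"] == "OutOfService"
    ]


def replace_unhealthy(elb_client, elbv2_client, ec2_client,
                      load_balancer_name, target_group_arn):
    """De-register OutOfService instances from the ALB, then terminate them."""
    instance_ids = find_out_of_service(elb_client, load_balancer_name)
    if not instance_ids:
        return []  # nothing to do; avoid calling the APIs with empty lists
    elbv2_client.deregister_targets(
        TargetGroupArn=target_group_arn,
        Targets=[{"Id": instance_id} for instance_id in instance_ids],
    )
    ec2_client.terminate_instances(InstanceIds=instance_ids)
    return instance_ids
```

Inside a Lambda handler, the three clients would typically be created with `boto3.client("elb")`, `boto3.client("elbv2")`, and `boto3.client("ec2")`. Passing them in as arguments keeps the logic easy to test with stand-in objects.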
Pay attention to the reset_alarm function, which resets the alarm state to OK.
Because a CloudWatch Alarm has no option to repeat its action while the alarm stays raised, we could end up with failing instances and a Lambda that is never triggered again. Setting the alarm state back to OK causes the action to trigger once again when the state changes to ALARM.
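A minimal sketch of what such a reset could look like follows. Only the name `reset_alarm` comes from the post. The body, the `alarm_name_from_event` helper, and the reason text are assumptions. The sketch relies on the CloudWatch `set_alarm_state` call and on the alarm notification arriving as a JSON string in the SNS message.

```python
import json


def alarm_name_from_event(event):
    """Extract the alarm name from an SNS-delivered CloudWatch notification."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    return message["AlarmName"]


def reset_alarm(cloudwatch_client, alarm_name):
    """Force the alarm back to OK so the next OK -> ALARM change fires again."""
    cloudwatch_client.set_alarm_state(
        AlarmName=alarm_name,
        StateValue="OK",
        StateReason="Reset by Lambda after handling unhealthy instances",
    )
```

The forced OK state is temporary: on its next evaluation CloudWatch recomputes the state from the metric, so if unhealthy hosts remain the alarm moves to ALARM again and the action is re-sent.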
Next, we will give the Lambda function permissions to be invoked from the SNS topic.
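The post grants this permission in the CloudFormation template. As an alternative illustration only, the same grant can be expressed with boto3's Lambda `add_permission` call. The function name, topic ARN, and statement ID used here are hypothetical.

```python
def allow_sns_invoke(lambda_client, function_name, topic_arn):
    """Allow the given SNS topic to invoke the Lambda function."""
    return lambda_client.add_permission(
        FunctionName=function_name,
        StatementId="AllowSnsInvoke",      # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="sns.amazonaws.com",
        SourceArn=topic_arn,               # restrict the grant to this topic
    )
```

Restricting the grant with `SourceArn` means only this one topic can invoke the function, rather than any SNS topic.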
CloudWatch Alarm
The next step is to create the CloudWatch Alarm, which is raised once the ELB UnHealthyHostCount gets above 0.
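For reference, an alarm of this kind could be described with parameters like the ones below. This is a sketch, not the original template: the statistic, period, and number of evaluation periods are assumptions, and the alarm name, load balancer name, and topic ARN are placeholders. The namespace `AWS/ELB`, the metric `UnHealthyHostCount`, and the `LoadBalancerName` dimension are the ones Classic ELB publishes.

```python
def build_unhealthy_host_alarm(alarm_name, load_balancer_name, topic_arn):
    """Return put_metric_alarm parameters for 'UnHealthyHostCount > 0'."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/ELB",
        "MetricName": "UnHealthyHostCount",
        "Dimensions": [
            {"Name": "LoadBalancerName", "Value": load_balancer_name},
        ],
        "Statistic": "Maximum",        # assumption: react to any unhealthy host
        "Period": 60,                  # assumption: one-minute periods
        "EvaluationPeriods": 1,        # assumption: alarm on the first breach
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],   # publish to the SNS topic on ALARM
    }
```

With boto3, these could be applied as `boto3.client("cloudwatch").put_metric_alarm(**params)`.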
SNS Topic
And now, the SNS topic that connects the CloudWatch Alarm and the Lambda function.
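The wiring can also be sketched with boto3 instead of CloudFormation. The snippet below creates a topic and subscribes a Lambda function to it using the SNS `create_topic` and `subscribe` calls. The helper name and its arguments are hypothetical.

```python
def connect_topic_to_lambda(sns_client, topic_name, function_arn):
    """Create an SNS topic and subscribe the Lambda function to it."""
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]
    sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol="lambda",
        Endpoint=function_arn,
    )
    return topic_arn
```

The returned topic ARN is what the CloudWatch Alarm would list as its alarm action.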
Spot Fleet request
Finally, the Spot Fleet request, which uses an Application Load Balancer to send traffic to the connected instances and uses the Elastic Load Balancing load balancer for the health check. (I could use the ALB for that but, again, I wanted to give the instances time to recover.)
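To make the dual attachment concrete, here is a hedged sketch of a Spot Fleet request configuration that registers instances with both an ALB target group and a Classic ELB. The helper and its arguments are hypothetical, and the launch specifications are left to the caller. The `LoadBalancersConfig` structure follows the shape accepted by the EC2 `request_spot_fleet` call.

```python
def build_spot_fleet_config(fleet_role_arn, target_capacity, launch_specs,
                            target_group_arn, classic_elb_name):
    """Return a SpotFleetRequestConfig attaching instances to an ALB and an ELB."""
    return {
        "IamFleetRole": fleet_role_arn,
        "TargetCapacity": target_capacity,
        "LaunchSpecifications": launch_specs,
        "LoadBalancersConfig": {
            # ALB target group: receives the real traffic.
            "TargetGroupsConfig": {
                "TargetGroups": [{"Arn": target_group_arn}],
            },
            # Classic ELB: used only for its more lenient health check.
            "ClassicLoadBalancersConfig": {
                "ClassicLoadBalancers": [{"Name": classic_elb_name}],
            },
        },
    }
```

With boto3, the result could be submitted as `boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=config)`.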
Test
After launching the CloudFormation stack, I want to simulate an unhealthy instance. For that, I logged into a random instance from the Spot Fleet and stopped my application (the application that is supposed to answer on the health check path).
That causes the ALB to mark the instance as unhealthy and stop sending it traffic. Then the ELB marks the instance as OutOfService, which triggers the CloudWatch Alarm, which invokes the Lambda function, which de-registers the instance from the ALB and finally terminates it.