Before you start using this feature, make sure that you are working with the latest Spotinst Policy.
Currently, when you work with EC2 Spot Instances, you are only allowed to reboot or terminate an instance. This is not an issue for stateless applications, as they are designed to scale horizontally with ease. If you have a stateful application, or an application that is designed to withstand node failure such as a database cluster, you may have decided in the past that Spot Instances were not the best fit for you. Today we are excited to announce the launch of our new Stateful Spot service. This new service allows you to use Spot Instances and save up to 80% on your stateful EC2 environment.
How Stateful configuration works
Provision a new Stateful Spot instance from the Spotinst Console or API. This will be just like provisioning a new instance from the EC2 console. Spotinst will also take regular backup snapshots of your instances over time.
- If a Spot Interruption occurs, the instance will be shut down and terminated.
- The EBS volumes associated with the instance will become detached.
- The original root volume and data volumes will become available, and a final EBS snapshot will be taken. A new AMI will be created from this snapshot.
- A new Recovery clone instance will be launched to replace the previous instance.
- The same data, private-IP, security groups, load balancers, and other metadata will be available to you.
- The recovery time can take as little as three minutes.
- New EBS volumes will be created and attached from the newly created AMI.
- The new instance is launched and becomes healthy. From the standpoint of this single server, it will appear as if it were shut down for a period of time.
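The recovery sequence above can be sketched as a small in-memory simulation. This is only an illustration of the order of events, not Spotinst code: the `StatefulInstance` class, the `recover` function, and all IDs are hypothetical names made up for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StatefulInstance:
    """Illustrative model of a stateful Spot instance (hypothetical, not a Spotinst type)."""
    instance_id: str
    private_ip: str
    volumes: List[str]
    state: str = "running"
    log: List[str] = field(default_factory=list)


def recover(instance: StatefulInstance, new_instance_id: str) -> StatefulInstance:
    """Walk through the recovery steps in order and return the replacement instance."""
    log = list(instance.log)

    # 1. The Spot interruption shuts down and terminates the instance.
    instance.state = "terminated"
    log.append(f"{instance.instance_id}: interrupted and terminated")

    # 2. The EBS volumes are detached and become available.
    log.append(f"volumes detached and available: {', '.join(instance.volumes)}")

    # 3. A final snapshot is taken and an AMI is created from it.
    snapshots = [f"snap-of-{v}" for v in instance.volumes]
    log.append(f"final snapshots taken: {', '.join(snapshots)}")
    ami = f"ami-from-{snapshots[0]}"
    log.append(f"AMI created: {ami}")

    # 4. A clone is launched with new volumes and the same private IP.
    new_volumes = [f"{v}-restored" for v in instance.volumes]
    log.append(f"{new_instance_id}: launched with private IP {instance.private_ip}")
    return StatefulInstance(new_instance_id, instance.private_ip, new_volumes, "running", log)


old = StatefulInstance("i-old", "10.0.0.12", ["vol-root", "vol-data"])
new = recover(old, "i-new")
for line in new.log:
    print(line)
```

The point of the sketch is that the replacement keeps the identity that matters to the rest of the system (here, the private IP) while the instance ID and volume IDs change.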
Things to note
Keep the machine’s root volume – The same data (OS, configuration, etc.) will be maintained for your instance. To further increase the reliability of your instances, we also create periodic snapshots of your data volumes while your instance is running. The new instance will be created from a “final” snapshot that is taken only after the original instance is terminated and the EBS volumes change to an “available” state. For this to function properly, you must turn off the “delete on termination” flag when provisioning a new instance.
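As a minimal sketch of what turning off the “delete on termination” flag looks like, the helper below builds block device mappings in the shape used by the EC2 API (`DeviceName`, `Ebs`, `DeleteOnTermination`). The helper name, the device names, the sizes, and the `gp2` volume type are assumptions chosen for illustration; the Spotinst Console or API may expose the same flag under a different casing.

```python
from typing import Any, Dict, List


def persistent_block_device_mappings(devices: Dict[str, int]) -> List[Dict[str, Any]]:
    """Build EC2-style block device mappings whose volumes survive termination.

    `devices` maps a device name (for example "/dev/xvda") to a size in GiB.
    """
    return [
        {
            "DeviceName": name,
            "Ebs": {
                "VolumeSize": size_gib,
                "VolumeType": "gp2",  # assumption: any EBS volume type would do
                # Keep the volume after the instance is terminated so that a
                # final snapshot can be taken from it.
                "DeleteOnTermination": False,
            },
        }
        for name, size_gib in devices.items()
    ]


mappings = persistent_block_device_mappings({"/dev/xvda": 30, "/dev/xvdf": 200})
for mapping in mappings:
    print(mapping["DeviceName"], mapping["Ebs"]["DeleteOnTermination"])
```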
Keep the machine’s private IP – New instances will be provisioned with the same configuration as the old one. The instance will be an exact clone of the old one with the same private/public IPs (Elastic IP is required). For example, for a Cassandra node, it is necessary to use the same private IP of the replaced instance for the cluster to recognize the newly created clone.
Keep the machine’s data volumes – All data volumes that were attached at the time of the previous instance termination will be automatically re-attached using the same BlockDeviceMapping configuration upon instance replacement.
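The three “keep” options above can be thought of as three independent switches. The sketch below builds a JSON fragment with one flag per option. The field names (`strategy.persistence`, `shouldPersistRootDevice`, `shouldPersistBlockDevices`, `shouldPersistPrivateIp`) are assumptions for illustration and should be checked against the Spotinst API reference before use.

```python
import json
from typing import Any, Dict


def persistence_settings(
    keep_root: bool = True,
    keep_data: bool = True,
    keep_private_ip: bool = True,
) -> Dict[str, Any]:
    """Return a config fragment with one flag per stateful option.

    The key names are assumed, not taken from this document.
    """
    return {
        "strategy": {
            "persistence": {
                "shouldPersistRootDevice": keep_root,        # keep the root volume
                "shouldPersistBlockDevices": keep_data,      # keep the data volumes
                "shouldPersistPrivateIp": keep_private_ip,   # keep the private IP
            }
        }
    }


fragment = persistence_settings()
print(json.dumps(fragment, indent=2))
```

For a Cassandra-style node that must come back under the same address, all three flags would stay enabled; a workload that only needs its data back could pass `keep_private_ip=False`.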
Stateful Instance Actions
Pause - Equivalent to stopping the instance from AWS. All the data will be saved (according to the stateful configuration) and the actual Spot Instance will be terminated. When the instance is resumed, it will be in exactly the same state in terms of data and network interfaces.
Resume - This will be used to start the instance after pausing it.
Recycle - Equivalent to rebooting or restarting the machine. Of course, all the data will be saved.
Deallocate - Release all resources (Root volume, data volumes, snapshot and network interfaces) and terminate the instance. When deallocating a machine, its data will be lost.
Please note: When you downsize your Elastigroup's capacity, some resources will not be deleted even though the instance is terminated. To downsize the capacity properly, deallocate the relevant instances using the Instances tab.
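One way to reason about the four actions is as transitions in a small state machine. The sketch below is a hypothetical model only: the state names and the rule that a paused instance can be deallocated are assumptions, not documented Spotinst behavior.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PAUSED = "paused"
    DEALLOCATED = "deallocated"


# (current state, action) -> next state
TRANSITIONS = {
    (State.ACTIVE, "pause"): State.PAUSED,             # data kept, Spot Instance terminated
    (State.PAUSED, "resume"): State.ACTIVE,            # same data and network interfaces
    (State.ACTIVE, "recycle"): State.ACTIVE,           # like a reboot, data kept
    (State.ACTIVE, "deallocate"): State.DEALLOCATED,   # all resources released, data lost
    (State.PAUSED, "deallocate"): State.DEALLOCATED,   # assumption: allowed while paused
}


def apply_action(state: State, action: str) -> State:
    """Return the next state, or raise ValueError if the action is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} an instance that is {state.value}") from None


state = State.ACTIVE
for action in ("pause", "resume", "recycle", "deallocate"):
    state = apply_action(state, action)
    print(f"{action} -> {state.value}")
```

Deallocate is the only transition with no way back, which matches the warning that deallocating a machine loses its data.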
Cassandra – If your Cassandra node is replaced, we’ll clone the instance and bring it back. Your Cassandra cluster will behave as if the instance was down for some time. Bringing up a clone of the previous instance ensures that cluster IOPS are not wasted on bringing a new instance up.
Elastic.co – Elasticsearch node recovery will take a fraction of the time required to provision a brand new instance. From the standpoint of your Elasticsearch cluster the instance was only down for a period of time (depending on the size of the data volumes attached). No changes are necessary for your cluster to provision this as long as you have enough instances for quorum.
Single Server Database – If you have non-production environments, it is very likely that you do not require 100% uptime for your database instances. You can also create an RDB cluster with Spot Instances and use the Stateful Spot feature to ensure that you do not lose application availability.
Monolithic or COTS (commercial off-the-shelf) software – Any monolithic or off-the-shelf Windows application can be used with Stateful Elastigroup. Keep in mind that if a replacement is necessary, your instance will be down for a few minutes while the recovery process takes place.
Development instances – You can run non-production nodes on Spot Instances with occasional downtime. If an interruption occurs on your instance it will be brought back automatically within a few minutes.
Hadoop cluster – Support for “Stateful Spot” instances in Spotinst Elastigroups allows you to provision Spot Instances and automatically recover the full state of the instance, including the private IP. When a recovery occurs, we will automatically create a clone of the previous instance, and it will appear as if the instance was brought down for a restart. For instructions please see: Hadoop use case