Spot instances are among the most valuable tools for anyone looking to deploy lean, mean computing machines.
There are numerous trade-offs when comparing on-demand, reserved and spot instances, the most important being that spot instances lack persistence: AWS can reclaim them at any time.
However, wouldn’t it be great if we could get the cost-savings of spot instances, while being able to achieve a higher level of continuity?
The aim is to start a Linux-based spot instance with its home directory located on a persistent EBS volume. Whenever a spot instance is spun up, we mount the volume at /home, thereby maintaining user-specific files across spot instance restarts.
gather the building blocks
- Launch an on-demand instance from a trusted AMI
- ideally, the same instance type as your future spot
- a small root device (based on your needs) is fine
- this is just a temporary instance to develop our AMI
- Create an EBS volume of sufficient size for /home
- let’s say it’s called vol-e123456f
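If you prefer the CLI to the console, the two building blocks can be created as below. The AMI ID, instance type, key name and availability zone are placeholders - substitute your own; the 100G size matches the /home volume used later in this post.

```shell
# launch a temporary on-demand instance from a trusted AMI
# (the image, type and key below are placeholders)
$ aws ec2 run-instances \
    --image-id ami-0abcdef1234567890 \
    --instance-type p2.xlarge \
    --key-name my-key \
    --count 1

# create the persistent EBS volume that will hold /home;
# it must be in the same availability zone as the instance
$ aws ec2 create-volume \
    --size 100 \
    --volume-type gp2 \
    --availability-zone us-east-1a
```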
attach, prepare and mount your volume
- Attach vol-e123456f to your instance as /dev/sdp - it will probably appear as /dev/xvdp
- SSH into the instance to format and mount your volume
```shell
# partition the disk correctly
$ sudo fdisk /dev/xvdp
# followed by commands [n, p, 1, w]
# to create and save a new, primary partition as xvdp1

# build a filesystem
$ sudo mkfs -t ext4 /dev/xvdp1

# get the UUID for the device
$ sudo blkid | grep xvdp
>> /dev/xvdp1: UUID="12a3456b-7890-43de-b855-62b2bce28cd9" TYPE="ext4"

# set your volume UUID to auto-mount
# (note: `sudo echo ... >> /etc/fstab` would fail, because the
# redirection runs as your user - pipe through `sudo tee -a` instead)
$ echo 'UUID=12a3456b-7890-43de-b855-62b2bce28cd9 /media/home ext4 defaults 0 2' | sudo tee -a /etc/fstab
$ sudo mkdir /media/home
$ sudo mount -a
```
move your home directory
```shell
# copy /home onto the volume, preserving permissions and attributes
$ sudo rsync -aXS --progress /home/. /media/home/.

# go back to /etc/fstab
$ sudo vim /etc/fstab
# >> change /media/home to /home

# backup your old /home
$ cd / && sudo mv /home /old_home && sudo mkdir /home
```
reboot, verify and customize
```shell
$ sudo reboot
$ lsblk
# should look like this
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   10G  0 disk
└─xvda1 202:1    0   10G  0 part /
xvdb    202:16   0   65G  0 disk /mnt
xvdp    202:240  0  100G  0 disk
└─xvdp1 202:241  0  100G  0 part /home
```
Now is the time to customize your instance as you wish, e.g. install CUDA/cuDNN.
save your AMI and raise the spot-request
- Stop your on-demand instance and create an image
- only need to snapshot the root volume, not the /home volume
- let’s call this ami-12abcd34
- you can now terminate the on-demand instance
- Raise a spot request with AWS
- base image as ami-12abcd34
- max bid price of your preference
- request type to Persistent
- root volume with delete-on-termination set
- vol-e123456f as /home cannot be specified in the request
- let’s say this spot-request is sir-y1a23b4c
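The same two steps can be done from the CLI. The instance ID, image name, bid price and instance type below are placeholders - only ami-12abcd34 comes from the steps above:

```shell
# create the AMI from the stopped on-demand instance
$ aws ec2 create-image \
    --instance-id i-0123456789abcdef0 \
    --name "spot-home-base"

# raise a persistent spot request from that AMI
$ aws ec2 request-spot-instances \
    --spot-price "0.25" \
    --type "persistent" \
    --launch-specification '{"ImageId":"ami-12abcd34","InstanceType":"p2.xlarge"}'
```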
The problem is that the instance won’t boot cleanly without its home volume: our /etc/fstab references the volume’s UUID, and startup stalls when that device is missing.
So we need to monitor the spot request, and manually attach our persistent EBS volume as /dev/sdp every time a new instance is launched.
note: since the instance does not even finish starting, we can’t achieve this with AWS’s user-data script.
monitor, attach and mount
So here is a simple script with which we can monitor the spot request and complete the instance-volume setup (it assumes the AWS CLI and jq are installed and configured on your machine).
```shell
#!/bin/bash

SPOTREQUESTID="sir-y1a23b4c"
VOLUMEID="vol-e123456f"

REQUESTSTATUS="$(aws ec2 describe-spot-instance-requests --spot-instance-request-ids $SPOTREQUESTID | jq -r '.SpotInstanceRequests[0].Status.Code')"
if [[ "$REQUESTSTATUS" == 'fulfilled' ]]; then
    INSTANCEID="$(aws ec2 describe-spot-instance-requests --spot-instance-request-ids $SPOTREQUESTID | jq -r '.SpotInstanceRequests[0].InstanceId')"
    echo "Request fulfilled by instance: $INSTANCEID"
    # check both block-device mappings (root, and possibly /home)
    # to see whether our volume is already attached
    VOLUME0="$(aws ec2 describe-instance-attribute --instance-id $INSTANCEID --attribute blockDeviceMapping | jq -r '.BlockDeviceMappings[0].Ebs.VolumeId')"
    VOLUME1="$(aws ec2 describe-instance-attribute --instance-id $INSTANCEID --attribute blockDeviceMapping | jq -r '.BlockDeviceMappings[1].Ebs.VolumeId')"
    if [[ "$VOLUME0" != "$VOLUMEID" ]] && [[ "$VOLUME1" != "$VOLUMEID" ]]; then
        echo "$VOLUMEID is not connected"
        VOLUMESTATUS="$(aws ec2 describe-volumes --volume-id $VOLUMEID | jq -r '.Volumes[0].State')"
        if [[ "$VOLUMESTATUS" == 'available' ]]; then
            ATTACHRESPONSE="$(aws ec2 attach-volume --volume-id $VOLUMEID --instance-id $INSTANCEID --device /dev/sdp | jq -r '.State')"
            echo $ATTACHRESPONSE
            if [[ "$ATTACHRESPONSE" == "attaching" ]]; then
                echo "Got attaching. waiting for 10 seconds"
                sleep 10
                echo "Rebooting instance"
                aws ec2 reboot-instances --instance-ids $INSTANCEID | jq .
                sleep 10
                PUBLICIP="$(aws ec2 describe-instances --instance-id $INSTANCEID | jq -r '.Reservations[0].Instances[0].PublicIpAddress')"
                echo "$PUBLICIP my-spot.amazonaws.com" | sudo tee -a /etc/hosts
            else
                echo "Failed to attach $VOLUMEID to $INSTANCEID"
            fi
        else
            echo "$VOLUMEID is not in available state"
        fi
    else
        echo "$VOLUMEID is already connected"
    fi
else
    echo "Request is not fulfilled. $REQUESTSTATUS"
fi
```
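Note the array indexing in the jq paths: describe-spot-instance-requests (like most EC2 describe calls) wraps its results in arrays, so each path needs a `[0]`. A quick sanity check against a canned response (the IDs here are made up) looks like this:

```shell
# a minimal, canned describe-spot-instance-requests response;
# the [0] index is needed because the API returns an array of requests
RESPONSE='{"SpotInstanceRequests":[{"Status":{"Code":"fulfilled"},"InstanceId":"i-0abc123def456"}]}'
echo "$RESPONSE" | jq -r '.SpotInstanceRequests[0].Status.Code'   # -> fulfilled
echo "$RESPONSE" | jq -r '.SpotInstanceRequests[0].InstanceId'    # -> i-0abc123def456
```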
Read through the echo commands in the script, and it should be self-explanatory.
The script also appends the public IP of the spot instance to /etc/hosts - so you get a stable hostname (resolvable on your machine) for the instance.
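Since the script only acts when the request is fulfilled and the volume is still detached, it is safe to re-run; one way to keep the setup current is to schedule it from root’s crontab (`sudo crontab -e`) on your machine. The script path and log location below are assumptions:

```shell
# check the spot request every 2 minutes
*/2 * * * * /usr/local/bin/spot-monitor.sh >> /var/log/spot-monitor.log 2>&1
```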
After our not-insignificant efforts, we now have a persistent spot instance which maintains its home directory via an EBS volume.
So now, every time our spot instance dies, the next one to take its place picks up the existing data from the persistent volume.