Spot instances are among the most valuable tools for anyone looking to deploy lean, mean computing machines.

There are numerous trade-offs between on-demand, reserved and spot instances, the most important being that spot instances offer no persistence: AWS can reclaim them at any time.

However, wouldn’t it be great if we could get the cost savings of spot instances while achieving a higher level of continuity?

desired setup

The aim is to start a Linux-based spot instance with its home directory located on a persistent EBS volume. Whenever a spot instance is spun up, we mount the volume at /home - thereby maintaining user-specific files across spot instance restarts.

gather the building blocks

  • Launch an on-demand instance from a trusted AMI
    - ideally, the same instance type as your future spot
    - a small root device (based on your needs) is fine
    - this is just a temporary instance to develop our AMI
  • Create an EBS volume of sufficient size for /home
    - let’s say it’s called vol-e123456f
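
If you prefer the command line to the console, the volume can also be created with aws-cli. This is only a sketch: the size, zone and volume type are placeholder assumptions, and the DRY_RUN switch with its run helper is a hypothetical convenience that prints the command instead of calling AWS.

```shell
# Placeholder values (assumptions): adjust to your own setup.
SIZE_GB=100
ZONE="us-east-1a"   # the volume can only attach to instances in this zone
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the EBS volume that will later hold /home.
create_home_volume() {
  run aws ec2 create-volume --size "$SIZE_GB" --availability-zone "$ZONE" --volume-type gp2
}

create_home_volume
```

Run it with DRY_RUN=0 once the printed command looks right; the response then contains the new volume ID.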

attach, prepare and mount your volume

  • Attach vol-e123456f to your instance as /dev/sdp - it will probably appear as /dev/xvdp inside the instance
  • SSH into the instance to format and mount your volume
# partition the disk correctly
$ sudo fdisk /dev/xvdp
# followed by commands [n, p, 1, w], accepting the default first and last sectors
# to create a new, primary partition as xvdp1 and save

# build a filesystem
$ sudo mkfs -t ext4 /dev/xvdp1

# get the UUID for the device
$ sudo blkid | grep xvdp
>> /dev/xvdp1: UUID="12a3456b-7890-43de-b855-62b2bce28cd9" TYPE="ext4"

# set your volume UUID to auto-mount
# (sudo does not apply to a >> redirect, so pipe through tee instead)
$ echo 'UUID=12a3456b-7890-43de-b855-62b2bce28cd9 /media/home ext4 defaults 0 2' | sudo tee -a /etc/fstab
$ sudo mkdir /media/home
$ sudo mount -a
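
Running the fstab step twice would leave a duplicate entry. As a sketch of a repeat-safe variant, the hypothetical helper add_fstab_entry below (not part of the original steps) appends the line only when no entry for that UUID exists yet. The optional third argument is an assumption added so the function can be tried on a scratch file first.

```shell
# add_fstab_entry UUID MOUNTPOINT [FSTAB]
# Appends an ext4 entry for UUID unless one already exists.
# FSTAB defaults to /etc/fstab; pass another path to try it on a scratch file.
add_fstab_entry() {
  uuid="$1"; mountpoint="$2"; fstab="${3:-/etc/fstab}"
  if grep -q "^UUID=${uuid}[[:space:]]" "$fstab" 2>/dev/null; then
    echo "entry for $uuid already present"
    return 0
  fi
  echo "UUID=$uuid $mountpoint ext4 defaults 0 2" >> "$fstab"
}

# Usage on the instance (as root, since /etc/fstab is not user-writable):
#   add_fstab_entry 12a3456b-7890-43de-b855-62b2bce28cd9 /media/home
```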

move your home directory

$ sudo rsync -aXS --progress /home/. /media/home/.
# go back to /etc/fstab
$ sudo vim /etc/fstab
# >> change /media/home to /home
# backup your old /home
$ cd / && sudo mv /home /old_home && sudo mkdir /home
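
Before relying on the new volume, it may be worth checking that the copy is complete. A minimal sketch, assuming GNU diff as found on most Linux AMIs: the hypothetical helper trees_match compares two directory trees and ignores the lost+found directory that mkfs creates on a fresh ext4 filesystem.

```shell
# trees_match SRC DST
# Succeeds only if both trees hold the same files with the same contents.
# Note: diff -r compares names and contents, not ownership or permissions.
trees_match() {
  diff -r -x 'lost+found' "$1" "$2" > /dev/null 2>&1
}

# Usage on the instance, after the rsync and before the reboot
# (run as root so every file is readable; at that point the old copy is /old_home):
#   trees_match /old_home /media/home && echo "copy verified" || echo "copy differs"
```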

reboot, verify and customize

$ sudo reboot
# SSH back in once the instance is up again
$ lsblk
# should look like this
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
xvda    202:0    0    10G  0 disk
└─xvda1 202:1    0    10G  0 part /
xvdb    202:16   0    65G  0 disk /mnt
xvdp    202:240  0   100G  0 disk
└─xvdp1 202:241  0   100G  0 part /home

Now is the time to customize your instance as you wish, e.g. install CUDA/cuDNN.

save your AMI and raise the spot-request

  • Stop your on-demand instance and create an image
    - only need to snapshot the root volume, not the /home volume
    - let’s call this ami-12abcd34
    - you can now terminate the on-demand instance
  • Raise a spot request with AWS
    - base image as ami-12abcd34
    - max bid price of your preference
    - request type to Persistent
    - root volume with delete-on-termination set
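    - availability zone matching that of vol-e123456f, since an EBS volume can only attach to instances in its own zone
    - vol-e123456f cannot be specified as /home in the request itself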
    - let’s say this spot-request is sir-y1a23b4c
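
The same request can be raised from the command line. The sketch below writes a launch specification file and shows the matching aws-cli call in a comment. The instance type, key pair name, zone, price and file name are all hypothetical placeholders, and the helper write_launch_spec is not part of the original setup.

```shell
# Hypothetical file name for the launch specification.
SPEC_FILE="${SPEC_FILE:-spot-spec.json}"

# Write a minimal launch specification; every value except the AMI is a placeholder.
write_launch_spec() {
  cat > "$SPEC_FILE" <<'EOF'
{
  "ImageId": "ami-12abcd34",
  "InstanceType": "g2.2xlarge",
  "KeyName": "my-key",
  "Placement": { "AvailabilityZone": "us-east-1a" }
}
EOF
}

# Usage (left as comments so nothing is launched by accident):
#   write_launch_spec
#   aws ec2 request-spot-instances \
#     --spot-price "0.50" \
#     --instance-count 1 \
#     --type "persistent" \
#     --launch-specification "file://$SPEC_FILE"
```

The response contains the spot-request ID that the monitoring step needs.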

the catch

The problem is that the instance usually won’t finish booting when the /home entry in /etc/fstab points to a volume that is not attached.

So we need to monitor the spot-request, and attach our persistent EBS volume as /dev/sdp every time a new instance is launched.

note : Since the instance does not even finish booting, we can’t achieve this from inside it with AWS’s user-data script.

monitor, attach and mount

So here is a simple script that monitors the spot-request and completes the instance-volume setup.

I chose to go with a bash script (run via cron) along with some great tools like aws-cli and jq to perform this task.
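
As a sketch of the cron side, the snippet below builds a crontab line that runs the script every five minutes. The script path, log path and interval are hypothetical choices. Keep in mind that cron runs with a minimal environment, so aws and jq must be on its PATH and the AWS credentials must be available to the user that owns the crontab.

```shell
# Hypothetical paths: point these at wherever you saved the script and want the log.
SCRIPT_PATH="/usr/local/bin/spot-monitor.sh"
LOG_PATH="/var/log/spot-monitor.log"

# Build a crontab line that runs the script every five minutes and keeps its output.
cron_line() {
  echo "*/5 * * * * $SCRIPT_PATH >> $LOG_PATH 2>&1"
}

cron_line

# To install it for root (the script writes to /etc/hosts):
#   (sudo crontab -l 2>/dev/null; cron_line) | sudo crontab -
```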

#!/usr/bin/env bash
# IDs of the persistent spot-request and of the EBS volume that holds /home
SPOTREQUESTID="sir-y1a23b4c"
VOLUMEID="vol-e123456f"
REQUESTSTATUS="$(aws ec2 describe-spot-instance-requests --spot-instance-request-ids $SPOTREQUESTID | jq -r '.SpotInstanceRequests[0] | .Status | .Code')"
if [[ "$REQUESTSTATUS" == 'fulfilled' ]]; then
  INSTANCEID="$(aws ec2 describe-spot-instance-requests --spot-instance-request-ids $SPOTREQUESTID | jq -r '.SpotInstanceRequests[0] | .InstanceId')"
  echo "Request fulfilled by instance: $INSTANCEID"
  # Check every block device mapping of the instance, not just the first two
  ATTACHED="$(aws ec2 describe-instance-attribute --instance-id "$INSTANCEID" --attribute blockDeviceMapping | jq -r --arg vol "$VOLUMEID" 'any(.BlockDeviceMappings[]; .Ebs.VolumeId == $vol)')"
  if [[ "$ATTACHED" != 'true' ]]; then
    echo "$VOLUMEID is not connected"
    VOLUMESTATUS="$(aws ec2 describe-volumes --volume-ids "$VOLUMEID" | jq -r '.Volumes[0] | .State')"
    if [[ "$VOLUMESTATUS" == 'available' ]]; then
      ATTACHRESPONSE="$(aws ec2 attach-volume --volume-id $VOLUMEID --instance-id $INSTANCEID --device /dev/sdp | jq -r '.State')"
      echo $ATTACHRESPONSE
      if [[ "$ATTACHRESPONSE" == "attaching" ]]; then
        echo "Got attaching. waiting for 10 seconds"
        sleep 10
        echo "Rebooting instance"
        aws ec2 reboot-instances --instance-ids $INSTANCEID | jq .
        sleep 10
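        PUBLICIP="$(aws ec2 describe-instances --instance-ids "$INSTANCEID" | jq -r '.Reservations[0] | .Instances[0] | .PublicIpAddress')"
        # Drop the previous entry first, since /etc/hosts resolves to the first match (needs root)
        sed -i.bak '/ my-spot\.amazonaws\.com$/d' /etc/hosts
        echo "$PUBLICIP my-spot.amazonaws.com" >> /etc/hosts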
      else
        echo "Failed to attach $VOLUMEID to $INSTANCEID"
      fi
    else
      echo "$VOLUMEID is not in available state"
    fi
  else
    echo "$VOLUMEID is already connected"
  fi
else
  echo "Request is not fulfilled. $REQUESTSTATUS"
fi
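
The fixed sleep 10 calls assume that the attachment and the reboot finish within ten seconds. One possible refinement is to poll until a condition holds. The sketch below is not part of the original script: wait_for and volume_attached are hypothetical helpers, and the retry counts in the usage line are arbitrary.

```shell
# wait_for TRIES DELAY COMMAND...
# Runs COMMAND until it succeeds, at most TRIES times, sleeping DELAY seconds
# after each failed attempt. Returns 0 on success, 1 if every attempt failed.
wait_for() {
  tries="$1"; delay="$2"; shift 2
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}

# Example condition (assumes aws-cli and jq are installed and configured):
# succeeds once the first attachment of the volume reports "attached".
volume_attached() {
  state="$(aws ec2 describe-volumes --volume-ids "$1" | jq -r '.Volumes[0].Attachments[0].State')"
  [ "$state" = "attached" ]
}

# Usage inside the monitoring script, in place of the first sleep 10:
#   wait_for 12 5 volume_attached "$VOLUMEID" || echo "Timed out waiting for $VOLUMEID"
```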

Read through the echo commands in the script, and it should be self-explanatory.

The script also appends the public IP of the spot instance to /etc/hosts - so you get a static DNS name (resolvable on your host) for the instance.

in conclusion

After our not-insignificant efforts, we now have a persistent spot instance which maintains its home directory via an EBS volume.

So now, every time our spot instance dies, the next one to take its place picks up the existing data from the persistent volume as soon as the script has attached it.