24 May 2020

Having spent the better part of a day figuring this out, I thought it might be nice to save others some time.

My specific need was to configure STONITH for a Pacemaker cluster deployed on Google Cloud Platform, using fence_gce.

By far the trickiest part was working out how to use Google’s very fine-grained permissions system to meet this need as securely as possible. The goal: to allow each node in a two-node cluster to fence the other node (by reset or poweroff), while restricting each node so that it can fence only its peer, and nothing else.

Below I’ll post some pseudo-code that should give you an idea of how to do this for your particular needs. I deployed this solution using Terraform, but the concepts should be easily adaptable to other use cases.

In a nutshell:

  1. Create two custom IAM Roles (I called them fencing and list.instances):
    • fencing: permissions to reset or stop a server and to read its state
    • list.instances: permissions to list the available instances in a project (this was a small security sacrifice; I’d prefer that a node couldn’t see other nodes this way, but the fencing agent seems to require it and, as of this writing, there’s no clean way on GCP to restrict which instances the API lists)
  2. For each instance in the cluster, create a Service Account and configure the instance to run as that service account, granting all the default scopes plus the ‘compute-rw’ scope.
  3. At the project level, add an IAM policy binding that binds all the instance service accounts to the list.instances role.
  4. At the instance level, add an IAM policy binding that binds the fencing role to the service account of the instance that will fence the instance in question. For example, if you have data1 and data2 instances, then on the data1 instance you’ll bind data2’s service account to the fencing role, and vice versa. This allows each node to fence its peer but gives it no other fencing permissions, not even for itself.
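
To make the cross-wiring in steps 3 and 4 concrete, here is a minimal Python sketch that only computes which bindings a two-node cluster would need; it does not call any Google API. The project ID (my-project), the "-sa" service account naming and the helper names are my own assumptions for illustration. The role IDs (fencing and list.instances) are the ones from the list above, and the member and role strings follow GCP’s standard formats.

```python
# Sketch: work out the IAM bindings for a two-node fencing setup.
# Nothing here talks to GCP; it only builds the binding data.
# PROJECT and the "<node>-sa" naming are hypothetical choices.

PROJECT = "my-project"


def service_account_member(project, node):
    """IAM member string for the (assumed) per-node service account."""
    return f"serviceAccount:{node}-sa@{project}.iam.gserviceaccount.com"


def custom_role(project, role_id):
    """Full resource name of a project-level custom role."""
    return f"projects/{project}/roles/{role_id}"


def plan_bindings(project, nodes):
    """Return (project_binding, instance_bindings) for a two-node cluster.

    project_binding: every node's service account gets list.instances.
    instance_bindings: keyed by the instance being protected; the only
    member is the peer's service account, bound to the fencing role.
    """
    if len(nodes) != 2:
        raise ValueError("this sketch only handles two-node clusters")

    members = {node: service_account_member(project, node) for node in nodes}

    project_binding = {
        "role": custom_role(project, "list.instances"),
        "members": sorted(members.values()),
    }

    first, second = nodes
    fencing = custom_role(project, "fencing")
    instance_bindings = {
        # On each instance, grant fencing to the *other* node's account.
        first: {"role": fencing, "members": [members[second]]},
        second: {"role": fencing, "members": [members[first]]},
    }
    return project_binding, instance_bindings


if __name__ == "__main__":
    project_binding, instance_bindings = plan_bindings(PROJECT, ["data1", "data2"])
    print("project level:", project_binding)
    for instance, binding in instance_bindings.items():
        print(f"instance {instance}:", binding)
```

The thing to notice is that the policy on data1 never mentions data1’s own service account, which is what stops a node from fencing itself or anything other than its peer.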