By far the trickiest part was working out how to use Google’s extensive permissions
system to handle this case as securely as possible. The goal: to allow each node
in a two-node cluster to fence the other node (by reset or poweroff), while restricting
each node so that it can fence only its peer, and nothing else.
Below I’ll post some pseudo-code that should give you an idea of how to implement
this for your particular needs. I deployed this solution using
Terraform, but the concepts should be easily adaptable
to other use cases.
In a nutshell:
Create two custom IAM roles (I called them fencing and list.instances):
fencing: permissions to reset or stop an instance and to read its state
list.instances: permission to list the available instances in a project
(this was a small security sacrifice; I’d prefer that a node couldn’t see other
nodes this way, but the fencing agent seems to require it, and as of this
writing there’s no clean way on GCP to restrict which instances the API lists)
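A rough Terraform sketch of the two custom roles might look like this (the project ID, role IDs, and permission lists are my own choices here; adjust to your environment):

```hcl
# Hypothetical project ID -- substitute your own.
variable "project" {
  default = "my-cluster-project"
}

# Role allowing an instance to be reset or stopped, and its state read.
resource "google_project_iam_custom_role" "fencing" {
  project     = var.project
  role_id     = "fencing"
  title       = "Fencing"
  description = "Reset or stop an instance and read its state"
  permissions = [
    "compute.instances.reset",
    "compute.instances.stop",
    "compute.instances.get",
  ]
}

# Role allowing the instances in the project to be listed.
resource "google_project_iam_custom_role" "list_instances" {
  project     = var.project
  role_id     = "list.instances"
  title       = "List instances"
  description = "List the available instances in a project"
  permissions = [
    "compute.instances.list",
  ]
}
```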
For each instance in the cluster, create a Service Account, and configure the
instance to run as that service account, granting all the default scopes plus
the ‘compute-rw’ scope.
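Sketched in Terraform, that step could look like the following for one node (instance name, zone, machine type, and image are placeholders; the scope aliases are the provider’s shorthand for the default scopes plus compute-rw):

```hcl
# Service account that the data1 node will run as.
resource "google_service_account" "data1" {
  project      = var.project
  account_id   = "data1"
  display_name = "data1 cluster node"
}

resource "google_compute_instance" "data1" {
  project      = var.project
  name         = "data1"
  zone         = "us-central1-a"    # example zone
  machine_type = "n1-standard-2"    # example machine type

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"  # example image
    }
  }

  network_interface {
    network = "default"
  }

  # Run as the node's service account, with the default scopes
  # plus compute-rw.
  service_account {
    email = google_service_account.data1.email
    scopes = [
      "storage-ro",
      "logging-write",
      "monitoring-write",
      "service-control",
      "service-management",
      "trace",
      "compute-rw",
    ]
  }
}
```

Repeat for data2 (or generalize with for_each).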
At the project level, add an IAM policy binding that binds all the instance
service accounts to the list.instances role.
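Assuming service accounts and a list.instances custom role as described above, the project-level binding for one node is a one-resource affair; repeat it per node:

```hcl
# Bind data1's service account to the list.instances role,
# project-wide, so the fencing agent can enumerate instances.
resource "google_project_iam_member" "list_instances_data1" {
  project = var.project
  role    = google_project_iam_custom_role.list_instances.id
  member  = "serviceAccount:${google_service_account.data1.email}"
}
```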
At the instance level, add an IAM policy binding that binds the fencing
role to the service account of the instance that will fence the
instance in question. For example, if you have instances data1 and data2,
then on the data1 instance you’ll bind data2’s service account to the
fencing role, and vice versa. This allows each node to fence its peer,
but gives it no other fencing permissions, not even on itself.
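The crossed instance-level bindings can be sketched like so, again assuming the instances, service accounts, and fencing role described above:

```hcl
# data2's service account may fence data1...
resource "google_compute_instance_iam_member" "fence_data1" {
  project       = var.project
  zone          = google_compute_instance.data1.zone
  instance_name = google_compute_instance.data1.name
  role          = google_project_iam_custom_role.fencing.id
  member        = "serviceAccount:${google_service_account.data2.email}"
}

# ...and vice versa: data1's service account may fence data2.
resource "google_compute_instance_iam_member" "fence_data2" {
  project       = var.project
  zone          = google_compute_instance.data2.zone
  instance_name = google_compute_instance.data2.name
  role          = google_project_iam_custom_role.fencing.id
  member        = "serviceAccount:${google_service_account.data1.email}"
}
```

Note the asymmetry is the whole point: neither binding grants a node the fencing role on itself.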