Modeling entities & cells

Your entity types and cells are the foundation everything else sits on. Get them right and relationships, the cascade, and Actions fall out naturally. Get them wrong and you'll be reshaping a graph that real Terraform state is already projecting into — a much more expensive fix.

This page is opinionated about where to draw the lines. The full field reference lives in EntityType and Cell.

Model the cattle, not the pets

Do define an entity type for the thing you have many of and manage as a population — Tenant, TenantCluster, CustomerStack, EphemeralEnv.

Don't create an entity type for a one-off pet — the shared CI account, the single data warehouse, the bastion host.

Because Terrantula's value is making a population legible: counting, placing, capping, tearing down in order. A type with exactly one instance carries all the modeling overhead and none of the payoff. If you only ever have one, it's a pet — leave it in plain Terraform and review it line-by-line. See the cattle mindset for the dividing line.

Choosing the grain of an entity type

The single most consequential modeling decision is how coarse or fine an entity type is.

Do make the entity type the thing you place, count, and govern as a unit. If you ask "how many of these are active?" or "which one should the next thing land on?", that's an entity.

Don't model every Terraform resource as its own entity type. A tenant is one entity; the dozen AWS resources its module creates are not twelve entities.

Because entities are a projection of Terraform state, not a mirror of it. The graph should answer fleet questions, not re-render your terraform state list. Aim for the level at which placement and capacity decisions happen.

A quick grain test

If a field would have the same value across the whole population, it's probably a constant in your TF module, not an entity property. If it varies per instance and you'd query or filter on it (region, plan_tier, customer_id), it's a property. If it describes the link between two entities (a per-tenant namespace), it belongs on a relationship, not the entity.
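The grain test above is a small decision procedure. Here is a minimal Python sketch of it, assuming you can list a field's values across the population and already know whether anything reads it. `classify_field` and `Placement` are hypothetical helpers for illustration only. They are not part of Terrantula.

```python
# Hypothetical helper illustrating the grain test; not a Terrantula API.
from enum import Enum


class Placement(Enum):
    MODULE_CONSTANT = "constant in the Terraform module"
    ENTITY_PROPERTY = "property on the entity type"
    RELATIONSHIP_PROPERTY = "property on a relationship"
    PROBABLY_NOISE = "varies but nothing reads it; consider leaving it out"


def classify_field(values: list, describes_link: bool, queried: bool) -> Placement:
    """Apply the grain test to one candidate field.

    values: the field's value on each instance in the population.
    describes_link: True if the value describes the link between two entities.
    queried: True if you would query, filter, place, or interpolate on it.
    """
    if describes_link:
        # e.g. a per-tenant namespace belongs on the relationship, not the entity.
        return Placement.RELATIONSHIP_PROPERTY
    if len(set(values)) <= 1:
        # Same value everywhere: hard-code it in the module.
        return Placement.MODULE_CONSTANT
    if queried:
        return Placement.ENTITY_PROPERTY
    return Placement.PROBABLY_NOISE


print(classify_field(["us-east", "eu-west", "us-east"], describes_link=False, queried=True).value)
```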

Properties: typed, required, enumerated

Do declare properties with an explicit type, mark the ones every instance must have as required: true, and use enum for closed sets like plan_tier: [basic, premium].

Don't leave a load-bearing field optional "to be safe", and don't model a closed set as a free-text string.

Because required + enum is validation Terrantula enforces at apply and at Action trigger time — a typo'd plan_tier fails loudly instead of silently provisioning the wrong shape. Optional fields that are really required just defer the error to runtime.

Keep the property set lean. A property earns its place if you query, filter, place, or interpolate on it. Descriptive trivia that no Action or metric reads is noise the graph has to carry forever.
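Terrantula performs the required + enum check at apply and Action-trigger time. The sketch below shows what that check amounts to. It assumes a simple dict per instance, and the `PropertySpec` shape and `validate` function are hypothetical. They are not Terrantula's schema or validator.

```python
# Hypothetical sketch of required + enum validation; the PropertySpec shape is
# illustrative and is not Terrantula's schema.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertySpec:
    name: str
    py_type: type
    required: bool = False
    enum: Optional[tuple] = None  # closed set of allowed values, if any


def validate(instance: dict, specs: list) -> list:
    """Return every validation error for one instance; empty means valid."""
    errors = []
    for spec in specs:
        if spec.name not in instance:
            if spec.required:
                errors.append(f"{spec.name}: required but missing")
            continue
        value = instance[spec.name]
        if not isinstance(value, spec.py_type):
            errors.append(f"{spec.name}: expected {spec.py_type.__name__}, "
                          f"got {type(value).__name__}")
        elif spec.enum is not None and value not in spec.enum:
            errors.append(f"{spec.name}: {value!r} not in {list(spec.enum)}")
    return errors


TENANT_SPECS = [
    PropertySpec("customer_id", str, required=True),
    PropertySpec("region", str, required=True),
    PropertySpec("plan_tier", str, required=True, enum=("basic", "premium")),
]

# A typo'd plan_tier is reported instead of provisioning the wrong shape.
errors = validate({"customer_id": "c-42", "region": "us-east", "plan_tier": "premuim"},
                  TENANT_SPECS)
```

Returning every error at once, rather than stopping at the first, mirrors the "fail loudly" goal: one run tells you everything wrong with the instance.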

States: name the lifecycle deliberately

Do enumerate the real lifecycle an instance moves through, and set initialState to where a fresh instance begins:

states: [pending, provisioning, active, suspended, deprovisioning, failed]
initialState: pending

Don't collapse the lifecycle to active/inactive, and don't invent states no Action ever transitions into.

Because states are what let Terrantula answer "how many tenants are provisioning right now?", and they are what conditions gate on (SuspendTenant, for instance, should only apply to an active tenant). A lifecycle that's too flat can't express the guards you'll want, and states nobody uses are dead weight that confuse the graph view. active and failed are always available implicitly; list the rest. Order them in the sequence instances actually travel.
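The sketch below illustrates what a deliberate lifecycle buys you: each Action has a guard on the states it may start from, and "how many are provisioning?" becomes a simple count. It uses the six states from the example above. SuspendTenant comes from the text, while ResumeTenant, DeprovisionTenant, and the guard table itself are hypothetical illustrations, not Terrantula's condition syntax.

```python
# Hypothetical sketch: a lifecycle as an enum, with Actions gated on state.
from enum import Enum


class TenantState(Enum):
    PENDING = "pending"
    PROVISIONING = "provisioning"
    ACTIVE = "active"
    SUSPENDED = "suspended"
    DEPROVISIONING = "deprovisioning"
    FAILED = "failed"


# Action -> (states it may start from, state it moves the instance into).
# Only SuspendTenant comes from the text; the other Actions are illustrative.
ACTION_GUARDS = {
    "SuspendTenant": ({TenantState.ACTIVE}, TenantState.SUSPENDED),
    "ResumeTenant": ({TenantState.SUSPENDED}, TenantState.ACTIVE),
    "DeprovisionTenant": (
        {TenantState.ACTIVE, TenantState.SUSPENDED, TenantState.FAILED},
        TenantState.DEPROVISIONING,
    ),
}


def apply_action(action: str, current: TenantState) -> TenantState:
    """Return the next state, or raise if the guard rejects the Action."""
    allowed_from, target = ACTION_GUARDS[action]
    if current not in allowed_from:
        raise ValueError(f"{action} not allowed from {current.value}")
    return target


def count_in_state(states: list, state: TenantState) -> int:
    """Answer questions like "how many tenants are provisioning right now?"."""
    return sum(1 for s in states if s is state)
```

With only active/inactive, the guard on SuspendTenant could not tell a running tenant from one still provisioning.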

Derive metrics; never hand-maintain a count

Do define load and capacity metrics as derived, computed from the graph:

metrics:
  - name: tenant-count
    unit: integer
    source:
      type: derived
      derivedFrom: relationship-count
      filter:
        relationshipType: runs_on
        states: [active]

Don't add a writable tenant_count property that an Action bumps by hand.

Because a derived metric tracks reality automatically as relationships are created and removed — it can't drift out of sync with the fleet. A hand-maintained counter is a pet masquerading as a metric: the first failed Action or out-of-band change desyncs it, and every placement decision after that is wrong. Metrics are the input to placement and constraints, so a stale one is worse than none.
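This sketch shows what the derived tenant-count above computes: it counts runs_on relationships pointing at a cluster and filters on state. It assumes the states: [active] filter applies to the tenant at the source end of each relationship. That reading is an assumption, and the `Relationship` shape and `relationship_count` function are hypothetical.

```python
# Hypothetical sketch of a derived relationship-count metric. The Relationship
# shape and the `states` lookup are illustrative, not Terrantula's data model.
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    rel_type: str  # e.g. "runs_on"
    source: str    # tenant id
    target: str    # cluster id


def relationship_count(target, relationships, states, rel_type, allowed_states):
    """Count rel_type edges into `target` whose source entity is in allowed_states.

    Assumes the state filter applies to the source entity. The count is
    recomputed from the graph on every call, so there is no stored counter
    that a failed Action could leave out of sync.
    """
    return sum(
        1
        for r in relationships
        if r.target == target
        and r.rel_type == rel_type
        and states.get(r.source) in allowed_states
    )


RELATIONSHIPS = [
    Relationship("runs_on", "t1", "c1"),
    Relationship("runs_on", "t2", "c1"),
    Relationship("runs_on", "t3", "c2"),
]
STATES = {"t1": "active", "t2": "suspended", "t3": "active"}

# tenant-count for c1 counts only t1: t2 runs on c1 but is suspended.
c1_count = relationship_count("c1", RELATIONSHIPS, STATES, "runs_on", {"active"})
```

Removing a relationship from the list changes the count on the next call, with no separate bookkeeping step to forget.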

Put the capacity rule in a constraint, not a runbook

Do express hard limits as constraints against a metric, at the layer they belong:

# On the EntityType — per-instance ceiling.
constraints:
  - metric: tenant-count
    max: 50            # no single cluster over 50 tenants

Don't leave "we cap at fifty per cluster" in a Confluence page and trust everyone to check it.

Because deprovisioning rot and capacity-as-tribal-knowledge are the exact failures Terrantula exists to kill. A constraint is enforced before the placement that would breach it: the fifty-first tenant is rejected at onboarding, not discovered on the bill. A runbook is enforced by hope.
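"Enforced before the placement" means the check runs against the projected value, not the current one. A minimal sketch, assuming metric values are already computed. `Constraint`, `check_before_place`, and `ConstraintViolation` are hypothetical names, not Terrantula's API.

```python
# Hypothetical sketch of enforcing a constraint before placement. Names and
# shapes are illustrative, not Terrantula's API.
from dataclasses import dataclass


class ConstraintViolation(Exception):
    """Raised when a placement would push a metric past its max."""


@dataclass(frozen=True)
class Constraint:
    metric: str
    max: int


def check_before_place(metrics: dict, constraints: list, delta: dict) -> None:
    """Raise if applying `delta` to current metric values would breach a constraint.

    metrics[c.metric] raises KeyError for an undefined metric, so a constraint
    on a metric that doesn't exist fails loudly instead of silently passing.
    """
    for c in constraints:
        projected = metrics[c.metric] + delta.get(c.metric, 0)
        if projected > c.max:
            raise ConstraintViolation(
                f"{c.metric} would be {projected}, max is {c.max}")


CLUSTER_CONSTRAINTS = [Constraint("tenant-count", max=50)]

# The fiftieth tenant fits...
check_before_place({"tenant-count": 49}, CLUSTER_CONSTRAINTS, {"tenant-count": 1})
# ...the fifty-first is rejected before anything is provisioned.
try:
    check_before_place({"tenant-count": 50}, CLUSTER_CONSTRAINTS, {"tenant-count": 1})
except ConstraintViolation as exc:
    print(exc)  # tenant-count would be 51, max is 50
```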

Cells: draw the boundary around the placement decision

A cell groups entities of one type, ranks them for placement, and caps the group. The boundary question is: what set of candidates does "where does the next one go?" choose among?

Do draw a cell around a homogeneous, interchangeable pool — prod-clusters-us-east, prod-clusters-eu-west — where any member is a valid home for the next tenant.

Don't put dev and prod clusters in one cell, or mix us-east and eu-west if a tenant's region_preference means it can only land in one.

Because the cell is the candidate set for placement. If members aren't interchangeable, least-loaded will happily place a US tenant on an EU cluster. Split the cell along the axis that constrains placement — usually region, environment, or tier — so every member really is a valid target.
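The failure mode is easy to reproduce. Least-loaded compares load and nothing else, so in a mixed cell the emptiest cluster wins even when it sits in the wrong region. The sketch assumes a simplified `Cluster` record and a naming scheme that matches the cell names above. All helpers are hypothetical.

```python
# Hypothetical sketch of why a cell must be an interchangeable pool.
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    region: str
    environment: str
    tenant_count: int


def least_loaded(cell: list) -> Cluster:
    # Compares load only; it knows nothing about region or environment.
    return min(cell, key=lambda c: c.tenant_count)


def split_cells(clusters: list) -> dict:
    """Group clusters along the axes that constrain placement."""
    cells = {}
    for c in clusters:
        cells.setdefault(f"{c.environment}-clusters-{c.region}", []).append(c)
    return cells


CLUSTERS = [
    Cluster("use1-a", "us-east", "prod", 40),
    Cluster("use1-b", "us-east", "prod", 35),
    Cluster("euw1-a", "eu-west", "prod", 10),
]

# One mixed cell: a US-only tenant lands on the emptiest cluster, in the EU.
mixed_choice = least_loaded(CLUSTERS)

# Split cells: every candidate in the US cell is a valid home for a US tenant.
CELLS = split_cells(CLUSTERS)
split_choice = least_loaded(CELLS["prod-clusters-us-east"])
```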

Match the placement policy to the goal

| Policy | Use when |
|---|---|
| least-loaded (default) | You want to fill clusters evenly toward capacity. This is the usual cattle default. |
| round-robin | You want even distribution regardless of current load. |
| random | Placement is genuinely arbitrary and you want no hot spot. |

Do default to least-loaded and only change it for a specific reason.

Because least-loaded packs the fleet sensibly and surfaces capacity pressure where it actually is. The others are for narrower needs; reach for them deliberately.
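To make the differences concrete, here is a sketch of the three policies over a cell's per-member loads. These are straightforward textbook versions written for illustration. They are hypothetical, not Terrantula's implementation, and tie-breaking in particular is an assumption.

```python
# Hypothetical sketch of the three placement policies; function names are
# illustrative, not Terrantula's implementation.
import itertools
import random


def least_loaded(loads: dict) -> str:
    # Lowest current load wins; on a tie, min() keeps the first member in order.
    return min(loads, key=loads.get)


def round_robin(members: list):
    # Cycles through members in a fixed order, ignoring current load.
    return itertools.cycle(members)


def random_pick(members: list, rng: random.Random) -> str:
    # Uniform choice; pass a seeded Random for reproducible tests.
    return rng.choice(members)


def place_many(loads: dict, n: int) -> dict:
    """Place n tenants one at a time with least-loaded; return the new loads."""
    loads = dict(loads)
    for _ in range(n):
        loads[least_loaded(loads)] += 1
    return loads


# The emptiest cluster absorbs placements until it catches up with the others.
after = place_many({"c1": 30, "c2": 12, "c3": 45}, 20)
```

Round-robin would have sent a third of those twenty tenants to c3, the cluster already closest to its ceiling. Least-loaded avoids that because it reads the current metric.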

Two layers of limit, two layers of meaning

Do cap the individual on the entity type and the fleet on the cell:

# On the Cell — fleet-wide ceiling across all members.
constraints:
  - metric: tenant-count
    aggregate: sum
    max: 500           # ≤ 500 tenants summed across the whole cell

Don't rely on one layer to do both jobs.

Because the per-instance constraint (max: 50 per cluster) and the cell aggregate (sum max: 500 across the fleet) answer different questions — "is this cluster full?" versus "is the fleet full?". You usually want both: a fleet of clusters each well under its own ceiling can still hit a global budget, and vice versa.
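The sketch below checks both layers for a single placement, using the limits from the examples above (50 per cluster, 500 summed across the cell). The `can_place` function and the fleet data are hypothetical illustrations.

```python
# Hypothetical sketch: both constraint layers checked before one placement.
# The limits mirror the examples above; the function is illustrative.
PER_CLUSTER_MAX = 50  # EntityType constraint: tenant-count max per cluster
CELL_SUM_MAX = 500    # Cell constraint: tenant-count summed across members


def can_place(cell_loads: dict, target: str) -> tuple:
    """Return (allowed, reason) for adding one tenant to `target`."""
    if cell_loads[target] + 1 > PER_CLUSTER_MAX:
        return False, f"cluster {target} is full"
    if sum(cell_loads.values()) + 1 > CELL_SUM_MAX:
        return False, "cell is full"
    return True, "ok"


# Every cluster is under 50, but the fleet has already reached 500.
FLEET = {f"c{i}": 45 for i in range(11)}  # 11 x 45 = 495
FLEET["c11"] = 5                          # 495 + 5 = 500
```

In FLEET, c11 has room for 45 more tenants, yet the cell constraint rejects the next placement. The reverse case is a small cell with one cluster at 50: the per-cluster constraint rejects it while the fleet total is far under 500.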

Start with explicit membership

Do add members to a cell explicitly (you choose which clusters belong) until you have a clear, stable property to derive on.

Don't reach for derived membership before you need it.

Because explicit membership is predictable — you can see exactly what's in the cell — and it's how most fleets stay. Derived membership (computed from a property value) is powerful when cluster count grows or churns, but it's an optimization to adopt once the rule is obvious, not a starting point.
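The difference between the two membership styles shows up when the fleet changes. This is a sketch under simplifying assumptions: clusters are plain dicts of properties, and derived membership is an equality match on property values. The data and `derived_members` helper are hypothetical, not Terrantula's membership syntax.

```python
# Hypothetical sketch contrasting explicit and derived cell membership; the
# cluster records and helper are illustrative, not Terrantula's API.
CLUSTERS = {
    "use1-a": {"region": "us-east", "environment": "prod"},
    "use1-b": {"region": "us-east", "environment": "prod"},
    "use1-dev": {"region": "us-east", "environment": "dev"},
    "euw1-a": {"region": "eu-west", "environment": "prod"},
}

# Explicit: members are listed by hand, so the cell is exactly what you see here.
EXPLICIT_MEMBERS = ["use1-a", "use1-b"]


def derived_members(clusters: dict, **match) -> list:
    """Derived: recompute membership from property values on every call.

    New clusters that match join automatically; so does any cluster whose
    property is set wrong, which is why the rule should be stable first.
    """
    return sorted(
        name for name, props in clusters.items()
        if all(props.get(key) == value for key, value in match.items())
    )


# A new prod us-east cluster appears: derived membership picks it up, while
# the explicit list stays put until someone adds it.
grown = {**CLUSTERS, "use1-c": {"region": "us-east", "environment": "prod"}}
```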

Common mistakes (the anti-patterns)

  • One mega entity type with a kind enum. If kind: [tenant, cluster, database] lives inside one type, you've smeared three populations together. Split into three types — they have different lifecycles, metrics, and Actions.
  • Properties that duplicate relationship facts. A cluster_id string property on Tenant is a relationship pretending to be a property; you lose cardinality enforcement and the cascade. Model it as a runs_on relationship.
  • A constraint with no metric to enforce. Constraints reference a metric by name; define the metric (usually derived) first, or the limit enforces nothing.
  • A cell spanning environments. The fastest way to place prod cattle on a dev cluster. One cell per interchangeable pool.
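Two of these anti-patterns are mechanical enough to catch with a lint pass. The sketch assumes entity types and cell members are available as plain dicts shaped like the YAML above. The `lint` function and those dict shapes are hypothetical, not a Terrantula feature.

```python
# Hypothetical lint for two of the anti-patterns above. The dict shapes are
# illustrative stand-ins for entity type and cell definitions.
def lint(entity_type: dict, cell_members: list) -> list:
    problems = []
    # Every constraint must reference a metric defined on the entity type.
    metric_names = {m["name"] for m in entity_type.get("metrics", [])}
    for c in entity_type.get("constraints", []):
        if c["metric"] not in metric_names:
            problems.append(f"constraint references undefined metric {c['metric']!r}")
    # A cell's members must share one environment.
    environments = {m["environment"] for m in cell_members}
    if len(environments) > 1:
        problems.append(f"cell spans environments: {sorted(environments)}")
    return problems


GOOD_TYPE = {"metrics": [{"name": "tenant-count"}],
             "constraints": [{"metric": "tenant-count", "max": 50}]}
BAD_TYPE = {"metrics": [],
            "constraints": [{"metric": "tenant-count", "max": 50}]}
```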

Next: Relationship & cascade design → — connect the entities so teardown happens in the right order.