Using and Creating Policies

Policies are highly reusable as their inputs, thresholds and targets are customizable. Config key details for each policy can be found in the Catalog in the AMP UI.

HA/DR and Scaling Policies

AutoScaler Policy

  • org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy

Increases or decreases the size of a Resizable entity based on an aggregate sensor value, the current size of the entity, and customized high/low watermarks.

An AutoScaler policy can take any sensor as a metric, have its watermarks tuned live, and target any resizable entity - be it an application server managing how many instances it handles, or a tier managing global capacity.

e.g. if the average requests per second across a cluster of Tomcat servers goes over the high watermark, the policy will resize the cluster to bring the average back within the watermarks.

brooklyn.policies:
- type: org.apache.brooklyn.policy.autoscaling.AutoScalerPolicy
  brooklyn.config:
    metric: webapp.reqs.perSec.perNode
    metricUpperBound: 3
    metricLowerBound: 1
    resizeUpStabilizationDelay: 2s
    resizeDownStabilizationDelay: 1m
    maxPoolSize: 3

ServiceRestarter Policy

  • org.apache.brooklyn.policy.ha.ServiceRestarter

Attaches to a SoftwareProcess or to anything Startable which emits ha.entityFailed on failure (or other configurable sensor), and invokes restart on that failure. If there is a subsequent failure within a configurable time interval or if the restart fails, this gives up and emits ha.entityFailed.restart for other policies to act upon or for manual intervention.

brooklyn.policies:
- type: org.apache.brooklyn.policy.ha.ServiceRestarter
  brooklyn.config:
    failOnRecurringFailuresInThisDuration: 5m

Typically this is used in conjunction with the ServiceFailureDetector enricher to emit the trigger sensor. The introduction to policies shows a worked example of these working together.

ServiceReplacer Policy

  • org.apache.brooklyn.policy.ha.ServiceReplacer

The ServiceReplacer attaches to a DynamicCluster and replaces a failed member in response to ha.entityFailed (or other configurable sensor) as typically emitted by the ServiceFailureDetector enricher.
The introduction to policies shows a worked example of this policy in use.
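
A minimal sketch of attaching it is below; it assumes the failureSensorToMonitor config key (default ha.entityFailed) and listens instead for the ha.entityFailed.restart sensor which ServiceRestarter emits when it gives up:

brooklyn.policies:
- type: org.apache.brooklyn.policy.ha.ServiceReplacer
  brooklyn.config:
    # replace a member once ServiceRestarter has given up on restarting it
    failureSensorToMonitor: ha.entityFailed.restart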

ServiceFailureDetector Enricher

  • org.apache.brooklyn.policy.ha.ServiceFailureDetector

The ServiceFailureDetector enricher detects problems and fires an ha.entityFailed (or other configurable sensor) for use by ServiceRestarter and ServiceReplacer. The introduction to policies shows a worked example of this in use.
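
For example, attached as an enricher with a stabilization delay so that transient blips do not immediately count as failure (a sketch; entityFailed.stabilizationDelay is assumed to be the relevant config key):

brooklyn.enrichers:
- type: org.apache.brooklyn.policy.ha.ServiceFailureDetector
  brooklyn.config:
    # only emit ha.entityFailed if the entity stays failed for 30 seconds
    entityFailed.stabilizationDelay: 30s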

SshMachineFailureDetector Policy

  • org.apache.brooklyn.policy.ha.SshMachineFailureDetector

The SshMachineFailureDetector is an HA policy for monitoring an SshMachine, emitting an event if the connection is lost/restored.
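
A sketch of attaching it with defaults:

brooklyn.policies:
# emits failure/recovery events when the SSH connection to the machine is lost or restored
- type: org.apache.brooklyn.policy.ha.SshMachineFailureDetector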

ConnectionFailureDetector Policy

  • org.apache.brooklyn.policy.ha.ConnectionFailureDetector

The ConnectionFailureDetector is an HA policy for monitoring an HTTP connection, emitting an event if the connection is lost/restored.
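
A minimal sketch of its configuration; the connectionFailureDetector.endpoint key name is an assumption here, so confirm it against the Catalog entry:

brooklyn.policies:
- type: org.apache.brooklyn.policy.ha.ConnectionFailureDetector
  brooklyn.config:
    # host and port to poll; an event is emitted when the connection is lost or restored
    connectionFailureDetector.endpoint: www.example.com:80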

Primary Election / Failover Policies

There are a collection of policies, enrichers, and effectors to assist with common failover scenarios and more generally anything which requires the election and re-election of a primary member.

These can be used for:

  • Nominating one child among many to be noted as a primary via a sensor (simply add the ElectPrimaryPolicy)
  • Allowing preferences for such children to be specified (via ha.primary.weight)
  • Causing the primary to change if the current primary goes down or away
  • Causing promote and demote effectors to be invoked on the appropriate nodes when the primary is elected/changed (with the parent reset to STARTING while this occurs)
  • Mirroring sensors and optionally effectors from the primary to the parent

A simple example is as follows, deploying two item entities with one designated as primary and its main.uri sensor published at the root. If “Preferred Item” fails, “Failover Item” will become the primary. Any demote effector on “Preferred Item” and any promote effector on “Failover Item” will be invoked on failover.

brooklyn.policies:
- type: org.apache.brooklyn.policy.failover.ElectPrimaryPolicy
  brooklyn.config:
    # `best` will cause failback to occur automatically when possible; could use `failover` instead
    primary.selection.mode: best
    propagate.primary.sensors: [ main.uri ]

brooklyn.enrichers:
- # this enricher will cause the parent to report as failed if there is no primary
  type: org.apache.brooklyn.policy.failover.PrimaryRunningEnricher

services:
- type: item
  name: Preferred Item
  brooklyn.config:
    ha.primary.weight: 1
- type: item
  name: Failover Item

ElectPrimary Policy

  • org.apache.brooklyn.policy.failover.ElectPrimaryPolicy

The ElectPrimaryPolicy acts to keep exactly one of its children or members as primary, promoting and demoting them when required.

A simple use case is where we have two children, call them North and South, and we wish for North to be primary. If North fails, however, we want to promote and fail over to South. This can be done by:

  • adding this policy at the parent
  • setting ha.primary.weight on North
  • optionally defining promote on North and South (if action is required there to promote it)
  • observing the primary sensor to see which is primary
  • optionally setting propagate.primary.sensors: [ main.uri ] to publish main.uri from whichever of North or South is active
  • optionally setting primary.selection.mode: best to switch back to North if it comes back online

The policy works by listening for service-up changes in the target pool (children or members) and listening for ha.primary.weight sensor values from those elements. On any change, it invokes an effector to perform the primary election. By default, the effector invoked is electPrimary, but this can be changed with the primary.election.effector config key. If this effector does not exist, the policy will add a default behaviour using ElectPrimaryEffector. Details of the election are described in that effector, but to summarize, it will find an appropriate primary from the target pool and publish a sensor indicating who the new primary is. Optionally it can invoke promote and demote on the relevant entities.

All the primary.* parameters accepted by that effector can be defined on the policy and will be passed to the effector, along with an event parameter indicating the sensor which triggered the election.

The policy also accepts a propagate.primary.sensors list of strings or sensors. If present, this will add the PropagatePrimaryEnricher enricher with those sensors set to be propagated.

If no quorum.up or quorum.running is set on the entity, both will be set to a constant 1.

ElectPrimary Effector

  • org.apache.brooklyn.policy.failover.ElectPrimaryEffector

This effector will scan candidates among children or members to determine which should be noted as “primary”.
The primary is selected from service-up candidates based on a numeric weight, read as a sensor or config key on the candidates (ha.primary.weight, unless overridden); higher weights are preferred, and a negative weight marks a candidate as not permitted.
In the case of ties, or a new candidate emerging with a weight higher than a current healthy primary, behaviour can be configured with primary.selection.mode.

If there is a primary and it is unchanged, the effector will end.

If a new primary is detected, the effector will:

  • set the local entity to the STARTING state

  • clear any “primary-election” problem

  • publish the new primary in a sensor called primary (or the sensor set in primary.sensor.name)

  • set service up true

  • cancel any other ongoing promote calls, and if there is an ongoing demote call on the entity being promoted, cancel that also

  • in parallel

    • invoke promote (or the effector called primary.promote.effector.name) on the local entity or the entity being promoted

    • invoke demote (or the effector called primary.demote.effector.name) on the local entity or the entity being demoted, if an entity is being demoted

  • set the local entity to the RUNNING state

If no primary can be found, the effector will:

  • add a “primary-election” problem so that service state logic, if applicable, will know that the entity is unhealthy

  • demote any old primary

  • set service up false

  • if the local entity is expected to be RUNNING, it will set actual state to ON_FIRE

  • if the local entity has no expectation, it will set actual state to STOPPED

More details of behaviour in edge conditions can be seen and set via the following parameters on this effector:

  • primary.target.mode: where should the policy look for primary candidates; one of ‘children’, ‘members’, or ‘auto’ (members if it has members and no children)

  • primary.selection.mode: under what circumstances the primary should change: failover to change only if an existing primary is unhealthy; best to change so that the candidate with the highest weight is always selected; or strict to act like best but fail if several candidates advertise the highest weight (for use when the weight sensor is updated by the nodes themselves and should tell us unambiguously who was elected)

  • primary.stopped.wait.timeout: if the highest-ranking primary is stopped (but not failed), the effector will wait this long for it to be starting before picking a less highly-weighted primary; default 3s, typically long enough to avoid races where multiple children are started concurrently but they complete extremely quickly and one completes before a better one starts

  • primary.starting.wait.timeout: if the highest-ranking primary is starting, the effector will wait this long for it to be running before picking a less highly-weighted primary (or in the case of strict before failing if there are ties); default 5m, typically long enough to avoid races where multiple children are started and a sub-optimal one comes online before the best one

  • primary.sensor.name: name to publish, defaulting to primary

  • primary.weight.name: config key or sensor to scan from candidate nodes to determine who should be primary

  • primary.promote.effector.name: effector to invoke on promotion, default promote and with no error if not present (but if set explicitly it will cause an error if not present)

  • primary.demote.effector.name: effector to invoke on demotion, default demote and with no error if not present (but if set explicitly it will cause an error if not present)

PrimaryRunning Enricher

  • org.apache.brooklyn.policy.failover.PrimaryRunningEnricher

This adds service not-up and problems entries if the primary is not running, so that the parent will only be up/healthy if there is a healthy primary.

PropagatePrimary Enricher

  • org.apache.brooklyn.policy.failover.PropagatePrimaryEnricher

This allows selected sensors from the primary to be made available on the parent. As the primary changes, the indicated sensors are updated to reflect the values from the new primary.
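
A sketch of attaching it directly, assuming it accepts the same propagate.primary.sensors key that ElectPrimaryPolicy passes through to it:

brooklyn.enrichers:
- type: org.apache.brooklyn.policy.failover.PropagatePrimaryEnricher
  brooklyn.config:
    # republish these sensors from whichever entity is currently primary
    propagate.primary.sensors: [ main.uri ]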

Optimization Policies

PeriodicEffector Policy

  • org.apache.brooklyn.policy.action.PeriodicEffectorPolicy

The PeriodicEffectorPolicy calls an effector with a set of arguments at a specified time and repeats on a configured period. The policy monitors the sensor configured by start.sensor and will only start when this is set to true. The default sensor checked is service.isUp, so the policy will not invoke the effector until the entity is started. The following example calls a resize effector to resize a cluster up to 10 members at 8am and back down to 1 member at 6pm.

- type: org.apache.brooklyn.policy.action.PeriodicEffectorPolicy
  brooklyn.config:
    effector: resize
    args:
      desiredSize: 10
    period: 1 day
    time: 08:00:00
- type: org.apache.brooklyn.policy.action.PeriodicEffectorPolicy
  brooklyn.config:
    effector: resize
    args:
      desiredSize: 1
    period: 1 day
    time: 18:00:00

ScheduledEffector Policy

  • org.apache.brooklyn.policy.action.ScheduledEffectorPolicy

The ScheduledEffectorPolicy calls an effector at a specific time. The policy monitors the sensor configured by start.sensor and will only execute the effector at the specified time if this is set to true.

There are two modes of operation: one based solely on policy configuration, where the effector executes at the time set with the time key or after the duration set with the wait key; and one based on monitoring sensors. The policy monitors the scheduler.invoke.now sensor and executes the effector immediately when its value changes to true; when the scheduler.invoke.at sensor changes, the policy schedules the effector for the future time given by its value.

The following example calls a backup effector every night at midnight.

- type: org.apache.brooklyn.policy.action.ScheduledEffectorPolicy
  brooklyn.config:
    effector: backup
    time: 00:00:00
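
Alternatively, using the wait key described above, the effector can be scheduled a fixed duration after the policy becomes active (a sketch):

- type: org.apache.brooklyn.policy.action.ScheduledEffectorPolicy
  brooklyn.config:
    effector: backup
    # invoke backup six hours after the start sensor becomes true
    wait: 6h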

FollowTheSun Policy

  • org.apache.brooklyn.policy.followthesun.FollowTheSunPolicy

The FollowTheSunPolicy is for moving work around to follow demand. The work can be any Movable entity. This is currently available in YAML blueprints.

LoadBalancing Policy

  • org.apache.brooklyn.policy.loadbalancing.LoadBalancingPolicy

The LoadBalancingPolicy is attached to a pool of “containers”, each of which can host one or more migratable “items”. The policy monitors the workrates of the items and effects migrations in an attempt to ensure that the containers are all sufficiently utilized without any of them being overloaded.

Lifecycle and User Management Policies

StopAfterDuration Policy

  • org.apache.brooklyn.policy.action.StopAfterDurationPolicy

The StopAfterDurationPolicy can be used to limit the lifetime of an entity. After a configured time period expires, the entity will be stopped.
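
A minimal sketch, assuming the lifetime config key sets the allowed duration:

brooklyn.policies:
- type: org.apache.brooklyn.policy.action.StopAfterDurationPolicy
  brooklyn.config:
    # stop the entity one hour after it starts
    lifetime: 1h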

ConditionalSuspend Policy

  • org.apache.brooklyn.policy.ha.ConditionalSuspendPolicy

The ConditionalSuspendPolicy will suspend and resume a target policy based on configured suspend and resume sensors.
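
For illustration, a sketch pairing it with the ConnectionFailureDetector above; the target, suspendSensor and resumeSensor key names and the sensor values are assumptions, so check the Catalog entry for the exact keys:

brooklyn.policies:
- type: org.apache.brooklyn.policy.ha.ConditionalSuspendPolicy
  brooklyn.config:
    # unique tag or name of the policy to suspend and resume
    target: my-target-policy
    suspendSensor: connection.failed
    resumeSensor: connection.recovered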

CreateUser Policy

  • org.apache.brooklyn.policy.jclouds.os.CreateUserPolicy

The CreateUserPolicy attaches to an entity and monitors for the addition of a location to that entity. The policy then adds a new user to the VM with a randomly generated password, and publishes the SSH connection details on the entity as the createuser.vm.user.credentials sensor.
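
A minimal sketch of attaching it with defaults:

brooklyn.policies:
# on addition of a machine location, creates a user with a random password and
# publishes the SSH details via the createuser.vm.user.credentials sensor
- type: org.apache.brooklyn.policy.jclouds.os.CreateUserPolicy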

AdvertiseWinRMLogin Policy

  • org.apache.brooklyn.location.winrm.AdvertiseWinrmLoginPolicy

This is similar to the CreateUserPolicy. It will monitor the addition of WinRmMachineLocation to an entity and then create a sensor advertising the administrative user’s credentials.

Synchronization Policies

nginx-multi-upstream-sync Policy

The nginx-multi-upstream-sync policy is designed to be used in combination with the nginx-multi type to achieve blue-green deployments. Here is an example:

  1. Deploy a cluster of version 1 apps with the following blueprint:

     name: My Application
     services:
       - type: nginx-multi:1.1.0-SNAPSHOT
         id: my-nginx-multi
       - type: org.apache.brooklyn.entity.webapp.DynamicWebAppCluster
         id: my-app-cluster-v1
         name: App Cluster v1
         brooklyn.config:
           latch.launch: $brooklyn:component("my-nginx-multi").attributeWhenReady("service.isUp")
           dynamiccluster.quarantineFailedEntities: false
           dynamiccluster.memberspec:
             '$brooklyn:entitySpec':
               type: org.apache.brooklyn.entity.webapp.tomcat.TomcatServer
               brooklyn.config:
                 wars.root: https://repo1.maven.org/maven2/org/apache/brooklyn/example/brooklyn-example-hello-world-sql-webapp/0.8.0-incubating/brooklyn-example-hello-world-sql-webapp-0.8.0-incubating.war
         brooklyn.policies:
           - type: nginx-multi-upstream-sync
             brooklyn.config:
               group: $brooklyn:component("my-app-cluster-v1")
               sensorsToTrack:
                 - service.isUp
               nginxNode: $brooklyn:component("my-nginx-multi")
               groupName: v1
    
  2. Navigate to the NGINX effectors and render routing for v1.myapp.com pointing to the v1 app group. v1.myapp.com is an endpoint for testing the version 1 app cluster.
    • logicalName v1
    • hostName v1.myapp.com
    • groupName v1
  3. Find the IP address of the NGINX in its sensors and map it to v1.myapp.com in /etc/hosts (e.g. 11.22.33.44 v1.myapp.com), then load v1.myapp.com in a browser on that machine to test version 1.
  4. Render routing for myapp.com pointing to the v1 app group. In this example, myapp.com is the production endpoint.
    • logicalName production
    • hostName myapp.com
    • groupName v1
  5. Map the IP address of the NGINX to myapp.com in /etc/hosts and load myapp.com in the browser to verify that the production endpoint serves version 1.
  6. Deploy a cluster of version 2 apps alongside version 1 in the same application:

     services:
       - type: org.apache.brooklyn.entity.webapp.DynamicWebAppCluster
         id: my-app-cluster-v2
         name: App Cluster v2
         brooklyn.config:
           latch.launch: $brooklyn:component("my-nginx-multi").attributeWhenReady("service.isUp")
           dynamiccluster.quarantineFailedEntities: false
           dynamiccluster.memberspec:
             '$brooklyn:entitySpec':
               type: org.apache.brooklyn.entity.webapp.tomcat.TomcatServer
               brooklyn.config:
                 wars.root: https://repo1.maven.org/maven2/org/apache/brooklyn/example/brooklyn-example-hello-world-sql-webapp/1.0.0/brooklyn-example-hello-world-sql-webapp-1.0.0.war
         brooklyn.policies:
           - type: nginx-multi-upstream-sync
             brooklyn.config:
               group: $brooklyn:component("my-app-cluster-v2")
               sensorsToTrack:
                 - service.isUp
               nginxNode: $brooklyn:component("my-nginx-multi")
               groupName: v2
    

    Hint: add as a child to a deployed application.

  7. Render routing for v2.myapp.com pointing to the v2 app group. v2.myapp.com is an endpoint for testing the version 2 app cluster.
    • logicalName v2
    • hostName v2.myapp.com
    • groupName v2
  8. Map the IP address of the NGINX to v2.myapp.com in /etc/hosts and load v2.myapp.com in the browser on that machine to test version 2.
  9. Render routing for myapp.com pointing to the v2 app group to switch the app version at the production endpoint.
    • logicalName production
    • hostName myapp.com
    • groupName v2
  10. Refresh myapp.com in the browser to verify that the production endpoint now serves version 2.
  11. Resize the version 2 app cluster and observe the routing configuration being updated in the activities of the nginx-multi node.

Note: /etc/hosts is used here only to simplify the demonstration of the policy in a blue-green deployment.

Contention Management policy

The framework for Contention Management policies is described here.

Dashboard, Compliance Policies, and Aggregation

Cloudsoft AMP includes a number of compliance policies and techniques for gathering compliance data in real-time, and aggregating this across entities within an application, and presenting this with summary views and drill-down in the Dashboard.

These topics are documented here.

Writing a Policy

Your First Policy

Policies perform the active management enabled by AMP. Each policy instance is associated with an entity, and at runtime it will typically subscribe to sensors on that entity or its children, performing some computation and optionally some action when a subscribed sensor event occurs. This action might be invoking an effector or emitting a new sensor, depending on the desired behavior.

Writing a policy is straightforward. Simply extend AbstractPolicy, overriding the setEntity method to supply any subscriptions desired:

    @Override
    public void setEntity(EntityLocal entity) {
        super.setEntity(entity);
        subscribe(entity, TARGET_SENSOR, this);
    }

and supply the computation and/or activity desired whenever that event occurs:

    @Override
    public void onEvent(SensorEvent<Integer> event) {
        int val = event.getValue();
        if (val % 2 == 1)
            entity.sayYoureOdd();
    }

You’ll want to do more complicated things, no doubt, like access other entities, perform multiple subscriptions, and emit other sensors – and you can. See the best practices below and source code for some commonly used policies and enrichers, such as AutoScalerPolicy and RollingMeanEnricher.

One rule of thumb, to close on: try to keep policies simple, and compose them together at runtime; for instance, if a complex computation triggers an action, define one enricher policy to aggregate other sensors and emit a new sensor, then write a second policy to perform that action.

Best Practice

The following recommendations should be considered when designing policies:

Management should take place as “low” as possible in the hierarchy

  • place management responsibility in policies at the entity itself, as much as possible; ideally, management should run as a policy on the relevant entity

  • place escalated management responsibility at the parent entity. Where this is impractical, perhaps because two aspects of an entity are best handled in two different places, ensure that the separation of responsibilities is documented and there is a group membership relationship between secondary/aspect managers.

Policies should be small and composable

e.g. one policy which takes a sensor and emits a different, enriched sensor, and a second policy which responds to the enriched sensor of the first (e.g. a policy detects a process is maxed out and emits a TOO_HOT sensor; a second policy responds to this by scaling up the VM where it is running, requesting more CPU)

Where a policy cannot resolve a situation at an entity, the issue should be escalated to a manager with a compatible policy.

Typically escalation will go to the entity parent, and then cascade up: e.g. if the earlier VM's CPU cannot be increased, the TOO_HOT event may go to the parent, a cluster entity, which attempts to balance. If the cluster cannot balance, the event goes to another policy which attempts to scale out the cluster, and should the cluster be unable to scale, to a third policy which emits TOO_HOT for the cluster.

Management escalation should be carefully designed so that policies are not incompatible

Order policies carefully, and mark sensors as “handled” (or potentially “swallow” them locally), so that subsequent policies and parent entities do not take superfluous (or contradictory) corrective action.

Implementation Classes

Extend AbstractPolicy, or override an existing policy.