Logging

AMP uses the SLF4J logging facade, which allows use of many popular frameworks including logback, java.util.logging and log4j.

The convention for log levels is as follows:

  • ERROR and above: exceptional situations which indicate that something has unexpectedly failed or some other problem has occurred which the user is expected to attend to
  • WARN: exceptional situations which the user may wish to know about but which do not necessarily indicate failure or require a response
  • INFO: a synopsis of activity, but which should not generate large volumes of events nor overwhelm a human observer
  • DEBUG and lower: detail of activity which is not normally of interest, but which might merit closer inspection under certain circumstances.

Loggers follow the package.ClassName naming standard.

The default logging is to write INFO and above to the console. AMP also writes INFO+ messages to amp.info.log, and DEBUG+ to amp.debug.log. Each is a rolling log file, where the past 10 files will be kept.

Configuration

A logback.xml file is included in the conf/ directory of the AMP distro; this is read by brooklyn at launch time. Changes to the logging configuration, such as new appenders or different log levels, can be made directly in this file or in a new file included from it.

The default logback.xml file references a collection of other log configuration files included in the AMP jars. To override any of these, it is necessary to understand the source structure in the logback-includes project.

For example, to change the debug log inclusions, create a folder brooklyn under conf and create a file logback-debug.xml based on the brooklyn/logback-debug.xml from that project.

Logback is highly configurable. For example, the syslog appender can be used. This provides a simple way to integrate with tools such as logstash.

Log File Backup

This sub-section is a work in progress; feedback from the community is extremely welcome.

The default rolling log files can be backed up periodically, e.g. using a cron job.

Note however that the rolling log file naming scheme renames the historic zipped log files so that brooklyn.debug-1.log.zip is always the most recent zipped log file. When the current brooklyn.debug.log is to be zipped, the previous zip file is renamed brooklyn.debug-2.log.zip. Because the same file name refers to different content over time, this renaming can make rsync or other backups tricky.

An option is to convert/move the file to a name that includes the last-modified timestamp. For example (on mac):

LOG_FILE=brooklyn.debug-1.log.zip
# Last-modified time in seconds since the epoch (BSD/macOS stat syntax)
TIMESTAMP=$(stat -f '%Um' "$LOG_FILE")
mv "$LOG_FILE" "/path/to/archive/brooklyn.debug-$TIMESTAMP.log.zip"
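
The stat syntax above is specific to macOS/BSD. As a rough, platform-independent sketch of the same idea, the Python function below moves a rotated log into an archive directory under a name that contains its last-modified time. The function name archive_rotated_log and the file-name pattern it matches are assumptions made for illustration, not part of AMP.

```python
import re
import shutil
from pathlib import Path


def archive_rotated_log(log_file, archive_dir):
    """Move a rotated log to archive_dir, replacing its rolling index
    (e.g. "-1") with the file's last-modified time in epoch seconds."""
    src = Path(log_file)
    timestamp = int(src.stat().st_mtime)
    # brooklyn.debug-1.log.zip -> brooklyn.debug-<timestamp>.log.zip
    new_name = re.sub(r"-\d+\.log\.zip$", f"-{timestamp}.log.zip", src.name)
    dest = Path(archive_dir) / new_name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest))
    return dest
```

It could be called from a scheduled job, for example archive_rotated_log("brooklyn.debug-1.log.zip", "/path/to/archive"). Because the archived name no longer changes on later rotations, incremental backup tools see each file only once.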

Logging aggregators

Integration with systems like Logstash and Splunk is possible using standard logback configuration. Logback can be configured to write to the syslog, which can then feed its logs to Logstash.

AMP usage and audit trails

There are several different areas of requirements:

  • Retrieving usage information programmatically - i.e. the applications, and the resources used by each.
  • Configuring AMP logging (through Logback), and log shipping (e.g. with Logstash)
  • Audit trails, for post-hoc offline analysis
  • Integration with monitoring dashboards (e.g. Nagios)

Usage information

The AMP REST api provides access to usage information. It lists the applications (including those now terminated), showing the start/end time and state transitions for each. It also lists the machines used (including those now terminated), linking each machine back to an application id.

Documentation for the REST api can be found within AMP itself (in the web-console, go to the Script -> REST API tab, and then browse the API). The annotations on the Java interfaces also make it easy to browse the API: UsageApi.java.

An example of retrieving all applications is shown below. For each application, it shows start/end time for each phase (e.g. when it was starting, when it was running, and when it stopped).

curl http://localhost:8081/v1/usage/applications
[
  {
    "statistics": [
      {
        "status": "STARTING",
        "id": "htStRkN7",
        "applicationId": "htStRkN7",
        "start": "2014-10-09T11:00:13+0100",
        "end": "2014-10-09T11:00:15+0100",
        "duration": 2313,
        "metadata": {}
      },
      {
        "status": "RUNNING",
        "id": "htStRkN7",
        "applicationId": "htStRkN7",
        "start": "2014-10-09T11:00:15+0100",
        "end": "2014-10-09T11:00:22+0100",
        "duration": 6495,
        "metadata": {}
      }
    ],
    "links": {}
  },
  {
    "statistics": [
      {
        "status": "STARTING",
        "id": "Z3TTK4sM",
        "applicationId": "Z3TTK4sM",
        "start": "2014-10-09T10:59:55+0100",
        "end": "2014-10-09T10:59:55+0100",
        "duration": 33,
        "metadata": {}
      },
      {
        "status": "UNKNOWN",
        "id": "Z3TTK4sM",
        "applicationId": "Z3TTK4sM",
        "start": "2014-10-09T10:59:55+0100",
        "end": "2014-10-09T11:00:22+0100",
        "duration": 26634,
        "metadata": {}
      }
    ],
    "links": {}
  }
]
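
As an illustration of consuming this response programmatically, the following Python sketch totals the time each application spent in each status. It assumes the JSON shape shown above and that "duration" is in milliseconds (which is what the sample timestamps suggest); the function name summarize_durations is hypothetical.

```python
from collections import defaultdict


def summarize_durations(app_usage):
    """Return {applicationId: {status: total duration}} for a parsed
    /v1/usage/applications response (a list of objects with "statistics")."""
    totals = defaultdict(dict)
    for app in app_usage:
        for stat in app["statistics"]:
            per_app = totals[stat["applicationId"]]
            per_app[stat["status"]] = per_app.get(stat["status"], 0) + stat["duration"]
    return dict(totals)
```

The input is the already-parsed JSON, e.g. the result of json.loads on the curl output, so the function itself makes no network calls.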

An example of retrieving all machines is shown below, with each machine giving the associated application id:

curl http://localhost:8081/v1/usage/machines
[
  {
    "statistics": [
      {
        "status": "ACCEPTED",
        "id": "yqqA9Moy",
        "applicationId": "rhsLVvJs",
        "start": "2014-10-09T11:11:13+0100",
        "end": "2014-10-09T11:13:45+0100",
        "duration": 151750,
        "metadata": {
          "id": "yqqA9Moy",
          "displayName": "159.8.33.134",
          "provider": "softlayer",
          "account": "cloudsoft",
          "serverId": "6488686",
          "imageId": "4343926",
          "instanceTypeName": "br-ked-aled-rhsl-j8mq-fe",
          "instanceTypeId": "6488686",
          "ram": "1024",
          "cpus": "1",
          "osName": "ubuntu",
          "osArch": "x86_64",
          "64bit": "true"
        }
      }
    ],
    "links": {}
  }
]
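
To show how the application id can be used to link machines back to applications, here is a small Python sketch that groups machine usage entries by "applicationId". It assumes the response shape shown above; the function name machines_by_application and the keys of the returned records are illustrative choices, not part of the AMP API.

```python
from collections import defaultdict


def machines_by_application(machine_usage):
    """Group entries of a parsed /v1/usage/machines response by application id."""
    grouped = defaultdict(list)
    for machine in machine_usage:
        for stat in machine["statistics"]:
            meta = stat.get("metadata", {})
            grouped[stat["applicationId"]].append({
                "machineId": stat["id"],
                "provider": meta.get("provider"),  # None if not reported
                "duration": stat["duration"],
            })
    return dict(grouped)
```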

The start/end time to be retrieved can also be constrained. It is also possible to retrieve information about a specific machine or specific location. An example that constrains the time range for a single application is below:

curl 'http://localhost:8081/v1/usage/applications/rhsLVvJs?start=2014-10-09T11:07:18+0100&end=2014-10-09T11:11:14+0100'
{
  "statistics": [
    {
      "status": "STARTING",
      "id": "rhsLVvJs",
      "applicationId": "rhsLVvJs",
      "start": "2014-10-09T11:07:18+0100",
      "end": "2014-10-09T11:11:14+0100",
      "duration": 236354,
      "metadata": {}
    },
    {
      "status": "RUNNING",
      "id": "rhsLVvJs",
      "applicationId": "rhsLVvJs",
      "start": "2014-10-09T11:11:14+0100",
      "end": "2014-10-09T11:16:13+0100",
      "duration": 298443,
      "metadata": {}
    }
  ],
  "links": {}
}
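
When building such a request in code, the timestamps need care: a literal "+" in a query string is conventionally decoded as a space, so it is safer to percent-encode the parameter values. The Python sketch below builds the URL using the "start" and "end" parameters from the example above; the function name usage_url is hypothetical, and whether the server also accepts an unencoded "+" is not covered here.

```python
from urllib.parse import urlencode


def usage_url(base_url, application_id, start=None, end=None):
    """Build a usage URL for one application, optionally constrained in time."""
    params = {k: v for k, v in (("start", start), ("end", end)) if v is not None}
    url = f"{base_url}/v1/usage/applications/{application_id}"
    # urlencode percent-encodes ':' and '+' in the timestamp values
    return f"{url}?{urlencode(params)}" if params else url
```

For instance, usage_url("http://localhost:8081", "rhsLVvJs", start="2014-10-09T11:07:18+0100") yields a URL in which the timezone offset appears as %2B0100.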

Audit trails

Audit trails are essential for determining why an event happened, and who was responsible.

Persisting the AMP logs is one approach to storing the audit trail. Subsequent offline analysis can then be performed. For example, logstash (via syslog) could be used to collect all logs.

TODO: review required of log messages for REST api, to ensure the authenticated user is logged appropriately along with the operation performed

Integration with monitoring dashboards

AMP provides a web-console for monitoring the applications, and drilling into the current state and actions being performed. However, this is more of a debug console. Most enterprises are keen to use their existing single pane of glass for their operations staff. The AMP REST api provides access to all information shown in the web-console.

Integration with the monitoring dashboard could involve the dashboard making REST api calls into AMP to retrieve the required information.
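
A minimal sketch of the pull approach, in Python, is shown below. It fetches the usage endpoints described earlier and reduces each application to its most recent status, which a dashboard check could then display. The function names, the unauthenticated request, and the assumption that "statistics" entries are in chronological order are all illustrative; a real deployment would likely need credentials and error handling.

```python
import json
from urllib.request import urlopen


def fetch_usage(base_url, kind="applications", open_url=urlopen):
    """GET <base_url>/v1/usage/<kind> and return the parsed JSON.
    open_url is injectable so the function can be exercised without a server."""
    with open_url(f"{base_url}/v1/usage/{kind}") as response:
        return json.load(response)


def latest_status(app_usage):
    """Map each application id to the status of its last usage entry
    (assumes entries are listed oldest first, as in the samples above)."""
    latest = {}
    for app in app_usage:
        if app["statistics"]:
            last = app["statistics"][-1]
            latest[last["applicationId"]] = last["status"]
    return latest
```

A dashboard plugin could call latest_status(fetch_usage("http://localhost:8081")) on a schedule and raise an alert for any application whose status is not the expected one.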

Alternatively, AMP could push events to a given endpoint. This requires wiring up a listener for the desired events, and potentially bespoke code for pushing to the given endpoint (depending on the technology used).