Logging

AMP uses the SLF4J logging facade, which allows use of many popular frameworks including logback, java.util.logging and log4j.

The convention for log levels is as follows:

  • ERROR and above: exceptional situations which indicate that something has unexpectedly failed or some other problem has occurred which the user is expected to attend to
  • WARN: exceptional situations which the user may wish to know about but which do not necessarily indicate failure or require a response
  • INFO: a synopsis of activity, but which should not generate large volumes of events nor overwhelm a human observer
  • DEBUG and lower: detail of activity which is not normally of interest, but which might merit closer inspection under certain circumstances.

Loggers follow the package.ClassName naming standard.
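
For example, a class following these conventions might obtain and use its logger as follows. This is a minimal sketch using the SLF4J API directly; the class and log messages are illustrative only:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ExampleEntity {
    // logger named after the fully-qualified class name, per the package.ClassName convention
    private static final Logger LOG = LoggerFactory.getLogger(ExampleEntity.class);

    public void start() {
        LOG.info("Starting {}", this);                           // synopsis of activity
        LOG.debug("Start parameters for {}: {}", this, "...");   // detail, not normally of interest
        try {
            // ... do the actual work ...
        } catch (Exception e) {
            LOG.error("Unexpected failure starting " + this, e); // the user should attend to this
        }
    }
}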

The default logging configuration writes INFO and above messages to amp.info.log, and DEBUG and above to amp.debug.log. Each is a rolling log file, with the 10 most recent files kept. INFO and above messages are also logged to the Karaf console; use the log: commands in the Karaf client, e.g. log:tail, to read these messages.

Configuration

An org.ops4j.pax.logging.cfg file is included in the etc/ directory of the AMP distro; it is read by AMP at launch time. Changes to the logging configuration, such as new appenders or different log levels, can be made directly in this file.
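
For example, the log level for a particular package can be raised by adding a line such as the one below. This is illustrative only and assumes the log4j 1.x property syntax used by pax-logging; check the existing entries in the file for the exact syntax in use in your version:

# log DEBUG and above for this (example) package, without changing the root log level
log4j.logger.org.apache.brooklyn.entity = DEBUG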

Karaf logging is highly configurable. For example, the sift appender can be enabled to write a separate log file for each bundle, as described in the Karaf advanced configuration documentation.

A full explanation of logging in Karaf is available in the Karaf documentation.

Karaf Log commands

Logging commands are available through the Karaf console. These let you interact with the logs and dynamically change the logging configuration in a running application.

Some useful log: commands are:

log:display mylogger -p "%d - %c - %m%n" - Show the log entries for a specific logger with a different pattern.

log:get/set - Show / set the currently configured log levels

log:tail - As log:display, but shows new log entries continuously

log:exception-display - Display the last exception
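
For example, an interactive session might look like the following (the karaf@amp prompt and the logger name are illustrative): first show the configured levels, then raise one logger to DEBUG at runtime, then follow the output continuously.

karaf@amp> log:get
karaf@amp> log:set DEBUG org.apache.brooklyn
karaf@amp> log:tail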

Log File Backup

This sub-section is a work in progress; feedback from the community is extremely welcome.

The default rolling log files can be backed up periodically, e.g. using a cron job.

Note however that the rolling log file naming scheme renames the historic zipped log files, so that brooklyn.debug-1.log.zip is always the most recent zipped log file. When the current brooklyn.debug.log is next zipped, the previous zip file is renamed brooklyn.debug-2.log.zip. This renaming of files can make rsync or other backup tools tricky to use.

An option is to convert/move the file to a name that includes the last-modified timestamp. For example (on macOS):

# the most recent rolled zip file
LOG_FILE=brooklyn.debug-1.log.zip
# last-modified time of the file, as seconds since the epoch (BSD/macOS stat)
TIMESTAMP=`stat -f '%Um' "$LOG_FILE"`
# move it to the archive under a name the rolling scheme will not re-use
mv "$LOG_FILE" /path/to/archive/brooklyn.debug-$TIMESTAMP.log.zip
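
A script along those lines could then be run periodically via cron; for example, with a crontab entry such as the following (the script path is hypothetical):

# archive rolled log files every night at 02:30
30 2 * * * /path/to/archive-amp-logs.sh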

Logging aggregators

Integration with systems like Logstash and Splunk is possible using standard log4j configuration. For example, log4j can be configured to write to syslog using its SyslogAppender, and the syslog stream can then be fed to Logstash.
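
A sketch of such a configuration, added to etc/org.ops4j.pax.logging.cfg, is shown below. It assumes the log4j 1.x property syntax; the host, facility and pattern are illustrative, the "syslog" appender name must also be added to the existing log4j.rootLogger list, and Logstash needs a matching syslog input:

# define a syslog appender for shipping log events to a local syslog daemon
log4j.appender.syslog = org.apache.log4j.net.SyslogAppender
log4j.appender.syslog.SyslogHost = localhost
log4j.appender.syslog.Facility = LOCAL1
log4j.appender.syslog.layout = org.apache.log4j.PatternLayout
log4j.appender.syslog.layout.ConversionPattern = %d{ISO8601} %-5p %c - %m%n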

AMP usage and audit trails

There are several different areas of requirements:

  • Retrieving usage information programmatically - i.e. the applications, and the resources used by each.
  • Configuring AMP logging (through log4j), and log shipping (e.g. with Logstash)
  • Audit trails, for post-hoc offline analysis
  • Integration with monitoring dashboards (e.g. Nagios)

Usage information

The AMP REST API provides access to usage information. It lists the applications (including those now terminated), showing the start/end time and state transitions for each. It also lists the machines used (including those now terminated), linking each machine back to an application id.

Documentation for the REST API can be found within AMP itself (in the web-console, go to the Script -> REST API tab, and then browse the API). The annotations on the Java interfaces also make it easy to browse the API: UsageApi.java.

An example of retrieving all applications is shown below. For each application, it shows start/end time for each phase (e.g. when it was starting, when it was running, and when it stopped).

curl http://localhost:8081/v1/usage/applications
[
  {
    "statistics": [
      {
        "status": "STARTING",
        "id": "htStRkN7",
        "applicationId": "htStRkN7",
        "start": "2014-10-09T11:00:13+0100",
        "end": "2014-10-09T11:00:15+0100",
        "duration": 2313,
        "metadata": {}
      },
      {
        "status": "RUNNING",
        "id": "htStRkN7",
        "applicationId": "htStRkN7",
        "start": "2014-10-09T11:00:15+0100",
        "end": "2014-10-09T11:00:22+0100",
        "duration": 6495,
        "metadata": {}
      }
    ],
    "links": {}
  },
  {
    "statistics": [
      {
        "status": "STARTING",
        "id": "Z3TTK4sM",
        "applicationId": "Z3TTK4sM",
        "start": "2014-10-09T10:59:55+0100",
        "end": "2014-10-09T10:59:55+0100",
        "duration": 33,
        "metadata": {}
      },
      {
        "status": "UNKNOWN",
        "id": "Z3TTK4sM",
        "applicationId": "Z3TTK4sM",
        "start": "2014-10-09T10:59:55+0100",
        "end": "2014-10-09T11:00:22+0100",
        "duration": 26634,
        "metadata": {}
      }
    ],
    "links": {}
  }
]

An example of retrieving all machines is shown below, with each machine giving the associated application id:

curl http://localhost:8081/v1/usage/machines
[
  {
    "statistics": [
      {
        "status": "ACCEPTED",
        "id": "yqqA9Moy",
        "applicationId": "rhsLVvJs",
        "start": "2014-10-09T11:11:13+0100",
        "end": "2014-10-09T11:13:45+0100",
        "duration": 151750,
        "metadata": {
          "id": "yqqA9Moy",
          "displayName": "159.8.33.134",
          "provider": "softlayer",
          "account": "cloudsoft",
          "serverId": "6488686",
          "imageId": "4343926",
          "instanceTypeName": "br-ked-aled-rhsl-j8mq-fe",
          "instanceTypeId": "6488686",
          "ram": "1024",
          "cpus": "1",
          "osName": "ubuntu",
          "osArch": "x86_64",
          "64bit": "true"
        }
      }
    ],
    "links": {}
  }
]

The results can also be constrained by start/end time. It is also possible to retrieve information about a specific machine or a specific location. An example is shown below:

curl "http://localhost:8081/v1/usage/applications/rhsLVvJs?start=2014-10-09T11:07:18+0100&end=2014-10-09T11:11:14+0100"
{
  "statistics": [
    {
      "status": "STARTING",
      "id": "rhsLVvJs",
      "applicationId": "rhsLVvJs",
      "start": "2014-10-09T11:07:18+0100",
      "end": "2014-10-09T11:11:14+0100",
      "duration": 236354,
      "metadata": {},
    {
      "status": "RUNNING",
      "id": "rhsLVvJs",
      "applicationId": "rhsLVvJs",
      "start": "2014-10-09T11:11:14+0100",
      "end": "2014-10-09T11:16:13+0100",
      "duration": 298443,
      "metadata": {}
    }
  ],
  "links": {}
}

Audit trails

Audit trails are essential for determining why an event happened, and who was responsible.

Persisting the AMP logs is one approach to storing the audit trail; subsequent offline analysis can then be performed. For example, Logstash (fed via syslog) could be used to collect all logs.

TODO: a review of the log messages for the REST API is required, to ensure the authenticated user is logged appropriately along with the operation performed.

Integration with monitoring dashboards

AMP provides a web-console for monitoring applications and drilling into the current state and the actions being performed. However, this is more of a debug console; most enterprises prefer to use their existing single pane of glass for their operations staff. The AMP REST API provides access to all information shown in the web-console.

Integration with the monitoring dashboard could involve the dashboard making REST API calls into AMP to retrieve the required information.
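
For example, a dashboard could periodically poll the list of applications (the credentials and host below are illustrative):

curl -u admin:password http://localhost:8081/v1/applications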

Alternatively, AMP could push events to a given endpoint. This requires wiring up a listener for the desired events, and potentially bespoke code for pushing to the given endpoint (depending on the technology used).
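
A rough sketch of such a listener is shown below. It uses the Apache Brooklyn sensor-subscription API on which AMP is built; the class name and the dashboard-push logic are hypothetical, and the exact wiring depends on where the listener is registered (e.g. from a policy or an entity initializer):

import org.apache.brooklyn.api.entity.Entity;
import org.apache.brooklyn.api.sensor.SensorEvent;
import org.apache.brooklyn.api.sensor.SensorEventListener;
import org.apache.brooklyn.core.entity.Attributes;

// hypothetical listener that forwards service-up changes to an external dashboard
public class DashboardNotifier implements SensorEventListener<Boolean> {

    @Override
    public void onEvent(SensorEvent<Boolean> event) {
        // event.getSource() is the entity and event.getValue() the new sensor value;
        // push these to the dashboard here (e.g. an HTTP POST to its API)
    }

    public static void watch(Entity entity) {
        // subscribe to SERVICE_UP changes on the given entity
        entity.subscriptions().subscribe(entity, Attributes.SERVICE_UP, new DashboardNotifier());
    }
}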