Troubleshoot monitoring



Monitoring services

The monitoring system has the following services:

  1. cassandra

  2. kairosdb

  3. abiquo-emmett

  4. abiquo-delorean




Restarting the monitoring system

To restart the monitoring system, reboot the monitoring server or cluster.

After a reboot, Cassandra may take some time to start. If KairosDB starts before Cassandra is running, KairosDB will not start properly.
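
One way to avoid this start-order problem is to wait until Cassandra accepts connections before starting KairosDB. The following is a minimal sketch, not an Abiquo tool: the function name wait_until is hypothetical, and the probe in the commented example assumes that nc is installed and that Cassandra listens on localhost port 9042 (Cassandra's default CQL port). Check which port your KairosDB is configured to use.

```shell
# Retry a probe command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        # The probe's own output is discarded; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example (assumptions: nc is installed, Cassandra listens on localhost:9042):
# wait_until 30 10 nc -z localhost 9042 && echo "Cassandra is accepting connections"
```

When the probe succeeds, start KairosDB manually. If the function returns a non-zero status, Cassandra did not come up in time and its logs are the next place to look.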

  1. Check that all services are up with the following command, replacing service_name with the name of each service:

    ps aux | grep service_name

  2. Manually start any services that are not running, such as KairosDB.
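
The check in step 1 can be run for all four services at once. The sketch below is an illustration under assumptions: the function name missing_services is hypothetical, and the search patterns are short forms of the service names listed above, which may not match the command lines on your server.

```shell
# Patterns to look for in the process list (short forms of the service names;
# an assumption, adjust them to match the command lines on your server).
MONITORING_PATTERNS="cassandra kairosdb emmett delorean"

# Read a process listing on stdin and print each pattern that has no match.
# The comparison is case-insensitive.
missing_services() {
    listing=$(tr '[:upper:]' '[:lower:]')
    for pattern in $MONITORING_PATTERNS; do
        case "$listing" in
            *"$pattern"*) ;;          # found, nothing to report
            *) echo "$pattern" ;;     # no process matched this pattern
        esac
    done
}
```

Run it as `ps aux | missing_services`. An empty result means that every pattern matched at least one process. A match only shows that a process exists, not that the service is healthy, so the log files are still worth checking.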




Troubleshoot the 500 MONITORING-30 error

If Abiquo cannot connect to the monitoring server, it usually raises the following error:

500. MONITORING-30 - Something has gone wrong on the watchtower server

To resolve this error, perform the following steps in order.

  1. Check the monitoring queue in RabbitMQ (on the API server or Datanode server):

    # rabbitmqctl list_queues messages consumers name
    Listing queues
    0 0 abiquo.vmactionplan.execution
    0 1 abiquo.scheduler.slow.requests
    0 1 abiquo.scheduler.fast.requests
    0 1 abiquo.vsm.eventsynk
    0 1 abiquo.actionplan.execution
    0 1 abiquo.nars.requests.mothership7
    0 1 abiquo.nodecollector.notifications
    0 0 abiquo.vappspec.parking-expect-no-consumers
    0 1 watchtower.alarm.notificacion
    0 1 abiquo.bpm.notifications
    0 1 abiquo.tracer.traces.tenantevents.mothership7
    0 1 abiquo.nars.requests.mothership-pcr
    0 1 watchtower.events.event
    0 1 abiquo.virtualfactory.notifications
    0 1 abiquo.datacenter.requests.mothership7.virtualfactory
    0 1 abiquo.api.synchrs.requests
    0 1 abiquo.tracer.traces.userevents.mothership7
    0 0 abiquo.pcrsync.parking-expect-no-consumers
    0 1 abiquo.nars.responses
    0 0 abiquo.vmactionplan.schedule
    0 1 abiquo.scheduler.requests
    0 1 abiquo.am.notifications
    0 0 abiquo.datacenter.requests.mothership7-pcr.virtualfactory
    0 1 abiquo.vappspec.messages
    0 1 abiquo.pcrsync.messages
    0 1 abiquo.tracer.traces.allevents.mothership7
    0 1 abiquo.ha.tasks
    0 1 abiquo.datacenter.requests.mothership7.bpm
    204 1 watchtower.alarm.evaluation
    0 1 abiquo.actionplan.schedule
    0 1 abiquo.datacenter.requests.mothership-pcr.virtualfactory
    0 1 abiquo.virtualmachines.definitionsyncs
    0 1 abiquo.tracer.traces.eventpersister.mothership7

    If messages are accumulating in the watchtower.alarm.evaluation queue, as in this example, check the watchtower host.

  2. Check that the location of the watchtower host is correctly configured in the abiquo.properties file.

  3. Check the available storage space on the watchtower host.

  4. Remove files older than 30 days.

  5. Check that the services are listening.

  6. Reboot the monitoring server, because restarting the services individually does not bring Cassandra up correctly. After the reboot, make sure that all services are up and running.
    If KairosDB starts too soon after Cassandra, it will fail, and Abiquo will then throw exceptions from Emmett about connections refused to localhost. To resolve this issue, start KairosDB manually on the monitoring server.
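
The queue check in step 1 is easier to read when only the queues with a backlog are shown. The following sketch filters the output of the rabbitmqctl command used in that step. The function name backlogged_queues and its threshold argument are hypothetical, and the filter assumes the three-column output (messages, consumers, name) shown above.

```shell
# Print "<messages> <consumers> <name>" for each queue whose message count
# is greater than the threshold (default 0). Reads the output of
#   rabbitmqctl list_queues messages consumers name
# on stdin. Header lines are skipped because their first field is not a number.
backlogged_queues() {
    threshold=${1:-0}
    awk -v max="$threshold" \
        'NF >= 3 && $1 ~ /^[0-9]+$/ && $1 + 0 > max + 0 { print $1, $2, $3 }'
}
```

For example, `rabbitmqctl list_queues messages consumers name | backlogged_queues 100` lists only the queues that hold more than 100 messages. With the sample output in step 1, the only queue reported would be watchtower.alarm.evaluation.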

Monitoring service log locations

When troubleshooting any issues, it is also worth checking the log files:

KairosDB:
/opt/kairosdb/log/kairosdb.log

Cassandra:
/var/log/cassandra/cassandra.log
/var/log/cassandra/debug.log
/var/log/cassandra/system.log

Abiquo-Emmett:
/var/log/emmett.log
/var/log/emmett-metrics.log

Abiquo-Delorean:
/var/log/delorean.log
/var/log/delorean-metrics.log
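
To search several of these logs for the same message, a small helper can save time. This is a sketch under assumptions: the function name recent_matches is hypothetical, and the exact wording of the messages in each log is not documented here, so the search is case-insensitive and the pattern is up to you.

```shell
# Print the last COUNT lines that match PATTERN from each readable log file.
# Files that do not exist or cannot be read are skipped silently.
# Usage: recent_matches PATTERN COUNT FILE...
recent_matches() {
    pattern=$1
    count=$2
    shift 2
    for log in "$@"; do
        [ -r "$log" ] || continue
        echo "== $log"
        grep -i -- "$pattern" "$log" | tail -n "$count"
    done
}
```

For example, `recent_matches 'connection refused' 5 /var/log/emmett.log /opt/kairosdb/log/kairosdb.log` shows the most recent connection errors, if any, from the Emmett and KairosDB logs.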

 

Copyright © 2006-2024, Abiquo Holdings SL. All rights reserved