Monitoring services

The monitoring system has the following services:

  1. cassandra

  2. kairosdb

  3. abiquo-emmett

  4. abiquo-delorean


...

Restarting the monitoring system

To restart the monitoring system, you should reboot the monitoring server or cluster.

...

  1. Check that each service is running with the following command, replacing service_name with the name of the service:

    Code Block
    ps aux | grep service_name

  2. Manually start any services that are not running, such as KairosDB
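
The check in step 1 can be repeated for every monitoring service in one pass. The following is a minimal sketch, not an official Abiquo script. It assumes that the command line of each service's process contains the service name, which is the same assumption the ps and grep check makes. The is_running and check_services function names are hypothetical.

```shell
#!/bin/sh
# Sketch: report which monitoring services have a matching process.
# Assumption: the process command line contains the service name,
# so "pgrep -f" finds it in the same way as "ps aux | grep".

is_running() {
    # Succeeds if at least one process command line matches "$1".
    pgrep -f "$1" >/dev/null 2>&1
}

check_services() {
    # Print one status line for each service name given as an argument.
    for svc in "$@"; do
        if is_running "$svc"; then
            echo "$svc: running"
        else
            echo "$svc: NOT running"
        fi
    done
}

# Service names taken from the list of monitoring services.
check_services cassandra kairosdb abiquo-emmett abiquo-delorean
```

Any service reported as NOT running is a candidate for the manual start described in step 2.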


...

Troubleshoot 500

...

monitoring 30 error

If Abiquo cannot connect to the monitoring server, then it will usually trigger the following error.

...

  1. Check the monitoring queue in RabbitMQ (on the API server or Datanode server)

    Code Block
    # rabbitmqctl list_queues messages consumers name
    
    Listing queues
    0	0	abiquo.vmactionplan.execution
    0	1	abiquo.scheduler.slow.requests
    0	1	abiquo.scheduler.fast.requests
    0	1	abiquo.vsm.eventsynk
    0	1	abiquo.actionplan.execution
    0	1	abiquo.nars.requests.mothership7
    0	1	abiquo.nodecollector.notifications
    0	0	abiquo.vappspec.parking-expect-no-consumers
    0	1	watchtower.alarm.notificacion
    0	1	abiquo.bpm.notifications
    0	1	abiquo.tracer.traces.tenantevents.mothership7
    0	1	abiquo.nars.requests.mothership-pcr
    0	1	watchtower.events.event
    0	1	abiquo.virtualfactory.notifications
    0	1	abiquo.datacenter.requests.mothership7.virtualfactory
    0	1	abiquo.api.synchrs.requests
    0	1	abiquo.tracer.traces.userevents.mothership7
    0	0	abiquo.pcrsync.parking-expect-no-consumers
    0	1	abiquo.nars.responses
    0	0	abiquo.vmactionplan.schedule
    0	1	abiquo.scheduler.requests
    0	1	abiquo.am.notifications
    0	0	abiquo.datacenter.requests.mothership7-pcr.virtualfactory
    0	1	abiquo.vappspec.messages
    0	1	abiquo.pcrsync.messages
    0	1	abiquo.tracer.traces.allevents.mothership7
    0	1	abiquo.ha.tasks
    0	1	abiquo.datacenter.requests.mothership7.bpm
    204	1	watchtower.alarm.evaluation
    0	1	abiquo.actionplan.schedule
    0	1	abiquo.datacenter.requests.mothership-pcr.virtualfactory
    0	1	abiquo.virtualmachines.definitionsyncs
    0	1	abiquo.tracer.traces.eventpersister.mothership7
    

    If there are watchtower.alarm.evaluation events in the queue, check the watchtower host.

  2. Check that the location of the watchtower host is correctly configured in the abiquo.properties file

    Code Block
    # cat /opt/abiquo/config/abiquo.properties | grep watchtower.host
    abiquo.watchtower.host = monitoring.bcn.abiquo.com

  3. Check for free storage space on the watchtower host

  4. Remove files older than 30 days.

    Code Block
    find /var/lib/cassandra/data/kairosdb/data_points -mtime +30 -print -delete;

  5. Check if the services are listening.

    Code Block
    # netstat -tlpn
    Active Internet connections (only servers)
    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
    tcp        0      0 0.0.0.0:9160            0.0.0.0:*               LISTEN      3565/java           
    tcp        0      0 127.0.0.1:3306          0.0.0.0:*               LISTEN      3930/mysqld         
    tcp        0      0 127.0.0.1:43605         0.0.0.0:*               LISTEN      3565/java           
    tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      3559/sshd           
    tcp        0      0 10.60.20.32:7000        0.0.0.0:*               LISTEN      3565/java           
    tcp        0      0 127.0.0.1:7199          0.0.0.0:*               LISTEN      3565/java           
    tcp6       0      0 :::9100                 :::*                    LISTEN      3562/node_exporter  
    tcp6       0      0 0.0.0.0:9042            :::*                    LISTEN      3565/java           
    tcp6       0      0 :::22                   :::*                    LISTEN      3559/sshd           
    tcp6       0      0 :::36638                :::*                    LISTEN      3561/java   
    
    

  6. Reboot the server, because restarting the service alone does not start Cassandra correctly. After the monitoring server reboots, make sure that all services are up and running.
    If KairosDB starts too soon after Cassandra, it will fail, and Abiquo will then throw exceptions from Emmett reporting that the connection to localhost was refused. To resolve this issue, start KairosDB manually on the monitoring server.
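
Steps 1 and 5 both involve scanning long command output by eye. As a rough sketch, the two hypothetical helper functions below filter that output. queues_with_backlog reads the output of the rabbitmqctl command from step 1 and prints only the queues that hold messages. port_listening reads the output of netstat -tln and succeeds if the given port is in the LISTEN state. The function names are illustrative and not part of Abiquo, and the parsing assumes the column layouts shown in the sample output of those steps.

```shell
#!/bin/sh
# Sketch: filters for the RabbitMQ and netstat checks.
# Both functions read command output on standard input.

queues_with_backlog() {
    # Input columns (tab-separated): messages, consumers, name.
    # Non-numeric lines such as "Listing queues" are skipped.
    awk -F '\t' '$1 ~ /^[0-9]+$/ && $1 > 0 {
        printf "%s\t%s messages\t%s consumers\n", $3, $1, $2
    }'
}

port_listening() {
    # $1 is the TCP port to look for. Column 4 is the local address
    # (for example 0.0.0.0:9042 or :::9100); column 6 is the state.
    awk -v port="$1" '$6 == "LISTEN" {
        n = split($4, parts, ":")
        if (parts[n] == port) found = 1
    }
    END { exit !found }'
}

# Usage (run on the relevant server):
#   rabbitmqctl list_queues messages consumers name | queues_with_backlog
#   netstat -tln | port_listening 9042 && echo "9042 is listening"
```

Port 9042 is used in the usage comment because it appears in the sample netstat output and is the default Cassandra CQL port. Adjust it if your installation uses different ports.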

Monitoring services logs location

When troubleshooting any issues, it is also worth checking the log files:

KairosDB:
/opt/kairosdb/log/kairosdb.log

Cassandra:
/var/log/cassandra/cassandra.log
/var/log/cassandra/debug.log
/var/log/cassandra/system.log

Abiquo-Emmett:
/var/log/emmett.log
/var/log/emmett-metrics.log

Abiquo-Delorean:
/var/log/delorean.log
/var/log/delorean-metrics.log
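
To review several of these logs quickly, a small loop can print the most recent error lines from each file. This is only a sketch: the recent_errors function and the LINES_TO_SCAN variable are hypothetical, and matching on the strings ERROR and Exception is an assumption about how these services write their logs.

```shell
#!/bin/sh
# Sketch: show the last few error lines from each monitoring log.
# Assumption: error entries contain "ERROR" or "Exception".

recent_errors() {
    for log in "$@"; do
        if [ ! -r "$log" ]; then
            echo "$log: not readable, skipping" >&2
            continue
        fi
        echo "== $log"
        # Scan only the tail of the file, then keep the last 5 matches.
        tail -n "${LINES_TO_SCAN:-500}" "$log" \
            | grep -E 'ERROR|Exception' \
            | tail -n 5
    done
}

# Log paths taken from the list of log file locations.
recent_errors \
    /opt/kairosdb/log/kairosdb.log \
    /var/log/cassandra/system.log \
    /var/log/emmett.log \
    /var/log/delorean.log
```

Files that do not exist on the current server are reported on standard error and skipped, so the same loop can be run on any host.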