...
Check the monitoring queue in RabbitMQ (on the API server or Datanode server)
Code Block
# rabbitmqctl list_queues messages consumers name
Listing queues
0	0	abiquo.vmactionplan.execution
0	1	abiquo.scheduler.slow.requests
0	1	abiquo.scheduler.fast.requests
0	1	abiquo.vsm.eventsynk
0	1	abiquo.actionplan.execution
0	1	abiquo.nars.requests.mothership7
0	1	abiquo.nodecollector.notifications
0	0	abiquo.vappspec.parking-expect-no-consumers
0	1	watchtower.alarm.notificacion
0	1	abiquo.bpm.notifications
0	1	abiquo.tracer.traces.tenantevents.mothership7
0	1	abiquo.nars.requests.mothership-pcr
0	1	watchtower.events.event
0	1	abiquo.virtualfactory.notifications
0	1	abiquo.datacenter.requests.mothership7.virtualfactory
0	1	abiquo.api.synchrs.requests
0	1	abiquo.tracer.traces.userevents.mothership7
0	0	abiquo.pcrsync.parking-expect-no-consumers
0	1	abiquo.nars.responses
0	0	abiquo.vmactionplan.schedule
0	1	abiquo.scheduler.requests
0	1	abiquo.am.notifications
0	0	abiquo.datacenter.requests.mothership7-pcr.virtualfactory
0	1	abiquo.vappspec.messages
0	1	abiquo.pcrsync.messages
0	1	abiquo.tracer.traces.allevents.mothership7
0	1	abiquo.ha.tasks
0	1	abiquo.datacenter.requests.mothership7.bpm
204	1	watchtower.alarm.evaluation
0	1	abiquo.actionplan.schedule
0	1	abiquo.datacenter.requests.mothership-pcr.virtualfactory
0	1	abiquo.virtualmachines.definitionsyncs
0	1	abiquo.tracer.traces.eventpersister.mothership7
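With this many queues, the one that is backing up is easy to miss. The following is a minimal sketch of a filter that prints only queues holding more messages than a threshold. The function name queues_with_backlog is hypothetical, and the sketch assumes each data row has exactly three fields in the order messages, consumers, name, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: read `rabbitmqctl list_queues messages consumers name`
# output on stdin and print "name messages consumers" for every queue whose
# message count is above the threshold (default 0).
# Assumption: data rows have exactly three whitespace-separated fields.
queues_with_backlog() {
    threshold="${1:-0}"
    awk -v min="$threshold" '
        # Skip banner lines such as "Listing queues": keep numeric rows only.
        NF == 3 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $1 + 0 > min + 0 {
            print $3, $1, $2
        }'
}

# Example (run on the server that hosts RabbitMQ):
#   rabbitmqctl list_queues messages consumers name | queues_with_backlog
```

For the listing above, this would print only the watchtower.alarm.evaluation row, because it is the only queue with pending messages.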
If there are watchtower.alarm.evaluation events in the queue, check the watchtower host.
Check that the location of the watchtower host is correctly configured in the abiquo.properties file
Code Block
# cat /opt/abiquo/config/abiquo.properties | grep watchtower.host
abiquo.watchtower.host = monitoring.bcn.abiquo.com
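A plain grep also matches commented-out lines and any key that merely contains the pattern. As a hedged alternative, the sketch below reads one exact key from a properties file. The function name get_property is hypothetical, and the parsing assumes simple "key = value" lines with no line continuations.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one exact key from a Java-style
# properties file. Comment lines are ignored; if the key appears more than
# once, the last occurrence wins. Returns non-zero if the key is not set.
# Assumption: simple "key = value" lines without backslash continuations.
get_property() {
    key="$1"
    file="$2"
    awk -v k="$key" '
        /^[ \t]*#/ { next }                       # skip comments
        {
            pos = index($0, "=")
            if (pos == 0) next                    # not a key=value line
            name  = substr($0, 1, pos - 1)
            value = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim both sides
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == k) found = value
        }
        END { if (found != "") print found; else exit 1 }
    ' "$file"
}

# Example:
#   get_property abiquo.watchtower.host /opt/abiquo/config/abiquo.properties
```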
Check for storage space on the watchtower.host
Remove files older than 30 days.
Code Block
find /var/lib/cassandra/data/kairosdb/data_points -mtime +30 -print -delete
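Because -delete cannot be undone, it may be worth previewing what would be removed first. The sketch below lists the matching files without deleting anything. The function name list_old_files is hypothetical, and unlike the command above it restricts the match to regular files with -type f.

```shell
#!/bin/sh
# Hypothetical helper: list (without deleting) regular files under a
# directory that were last modified more than N days ago (default 30).
list_old_files() {
    dir="$1"
    days="${2:-30}"
    find "$dir" -type f -mtime +"$days" -print
}

# Example (dry run, then check free space on the same filesystem):
#   list_old_files /var/lib/cassandra/data/kairosdb/data_points 30
#   df -h /var/lib/cassandra
```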
Check if the services are listening.
Code Block
# netstat -tlpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:9160            0.0.0.0:*               LISTEN      3565/java
tcp        0      0 127.0.0.1:3306          0.0.0.0:*               LISTEN      3930/mysqld
tcp        0      0 127.0.0.1:43605         0.0.0.0:*               LISTEN      3565/java
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      3559/sshd
tcp        0      0 10.60.20.32:7000        0.0.0.0:*               LISTEN      3565/java
tcp        0      0 127.0.0.1:7199          0.0.0.0:*               LISTEN      3565/java
tcp6       0      0 :::9100                 :::*                    LISTEN      3562/node_exporter
tcp6       0      0 0.0.0.0:9042            :::*                    LISTEN      3565/java
tcp6       0      0 :::22                   :::*                    LISTEN      3559/sshd
tcp6       0      0 :::36638                :::*                    LISTEN      3561/java
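Scanning this output by eye is error-prone, so a small check can help. The sketch below reads netstat -tlpn output and reports, for each port given as an argument, whether something is listening on it. The function name check_listening is hypothetical, and the ports in the example are an assumption taken from the sample output above; use the ports that apply to your installation.

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -tlpn` output on stdin and, for each
# port passed as an argument, report whether a LISTEN socket exists.
# Returns non-zero if at least one port is not listening.
check_listening() {
    input=$(cat)
    missing=0
    for port in "$@"; do
        if printf '%s\n' "$input" | awk -v p="$port" '
            # Field 4 is the local address; the port follows the last colon.
            $6 == "LISTEN" { n = split($4, a, ":"); if (a[n] == p) ok = 1 }
            END { exit !ok }'
        then
            echo "port $port: listening"
        else
            echo "port $port: NOT listening"
            missing=1
        fi
    done
    return "$missing"
}

# Example (ports are assumptions taken from the sample output):
#   netstat -tlpn | check_listening 9160 9042 7000
```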
If the services are not listening, reboot the monitoring server, because restarting the service alone does not bring Cassandra up correctly. After the reboot, make sure all services are up and running.
If KairosDB starts too soon after Cassandra, it will fail, and Abiquo will then throw exceptions from Emmett about connection refused to localhost. To resolve this issue, start KairosDB manually on the monitoring server.
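One way to avoid starting KairosDB too early is to wait until Cassandra accepts connections. The sketch below is a generic retry loop that runs a probe command until it succeeds. The function name retry_until is hypothetical, and the probe, the port 9042 (taken from the sample netstat output), and the kairosdb service name in the example are all assumptions that may differ on your monitoring server.

```shell
#!/bin/sh
# Hypothetical helper: run a probe command repeatedly until it succeeds or
# the attempts run out.
# Usage: retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0            # probe succeeded
        fi
        i=$((i + 1))
        # Sleep only if another attempt will follow.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1                    # probe never succeeded
}

# Example (assumed port and service name; adjust to your installation):
#   retry_until 30 2 sh -c 'netstat -tln | grep -q ":9042 "' \
#       && systemctl start kairosdb
```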
Monitoring services logs location
When troubleshooting any issues, it is also worth checking the log files:
KairosDB: /opt/kairosdb/log/kairosdb.log
Cassandra: /var/log/cassandra/cassandra.log
/var/log/cassandra/debug.log
/var/log/cassandra/system.log
Abiquo-Emmett: /var/log/emmett.log
/var/log/emmett-metrics.log
Abiquo-Delorean: /var/log/delorean.log
/var/log/delorean-metrics.log
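To get a quick overview of these logs, the most recent error lines can be pulled from each file. The sketch below is one possible approach. The function name recent_errors is hypothetical, and it assumes that problems show up as lines containing ERROR or Exception, which may not hold for every service.

```shell
#!/bin/sh
# Hypothetical helper: print the last N lines (default 20) of a log file
# that mention ERROR or Exception.
# Assumption: the services log failures with one of these two strings.
recent_errors() {
    file="$1"
    count="${2:-20}"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    grep -E 'ERROR|Exception' "$file" | tail -n "$count"
}

# Example: scan some of the log files listed above.
#   for f in /opt/kairosdb/log/kairosdb.log /var/log/cassandra/system.log \
#            /var/log/emmett.log /var/log/delorean.log; do
#       echo "== $f"
#       recent_errors "$f" 5
#   done
```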