In Abiquo 4.6, the monitoring system has been improved to better support large numbers of VMs. The improvements were based on the results of a simulation in which a script deployed VMs and modified metrics to activate and deactivate alarms and alerts. For the simulation, 5000 VMs were deployed on 5 "fake" hypervisors running as Docker containers. During the simulation, 2 metrics of each VM had values that forced the activation of alarms and alerts for a 15-minute period, followed by default values.

The system quickly detected the activated alarms and alerts and sent notifications. However, the following improvements were made:

  • Increase the speed of the push of metric data points after collection:
    • Distribute the push request work amongst the hypervisor actors and remove the single push-actor bottleneck
    • Improve reliability by distributing the push request queue amongst the hypervisor actors, because in a congested environment the queue was always full and the oldest requests could be lost
    • Improve performance by splitting large push requests
  • Optimize the configuration of the KairosDB incoming queue processor:
    • Increase 'batch_size' (requires changing the Cassandra configuration). Size it according to the expected number of VMs, metrics, and data points per minute
    • 'min_batch_size' and 'min_batch_wait' configuration is unchanged: only delay 0.5 s if there are fewer than 100 data points
    • Increase 'memory_queue_size' to avoid disk usage
    • Increase 'thread_count' to allow more requests to Cassandra
  • New KairosDB version that uses CQL instead of Thrift
  • Reduce response time of Emmett request to push metric data points. All the entities handled by Emmett (metric, alarm, and alert) can have tags and can be found using the tags
    • Increase speed of retrieval of metrics, alarms and alerts from the database by decoupling tags from the entities. Only retrieve tags for create and search purposes
    • Increase speed of push process by removing unnecessary database transactions
    • Add a cache to improve the speed of the request to get a metric. This can be the local cache, which is disabled by default, or a distributed cache for load-balanced instances.
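
As an illustration of the "splitting large push requests" item above, the following is a minimal sketch of how one large batch of data points might be cut into smaller requests. It is not Abiquo's implementation: the function name `split_push_request`, the `max_size` parameter, and the shape of the sample data points are all assumptions made for this example.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def split_push_request(data_points: Sequence[T], max_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most max_size data points.

    Hypothetical helper: each chunk would be sent as its own push request.
    """
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    for start in range(0, len(data_points), max_size):
        yield list(data_points[start:start + max_size])


# Hypothetical data points: (metric name, timestamp in ms, value)
points = [("cpu_time", 1_000 * i, float(i)) for i in range(250)]

chunks = list(split_push_request(points, max_size=100))
print([len(chunk) for chunk in chunks])  # prints [100, 100, 50]
```

Smaller requests of a bounded size keep any single push from holding up a queue for long; a suitable value for the chunk size would depend on the environment.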



