Monitor cloud metrics and automate accordingly for high-volume transactions with VMs in public cloud providers, including Amazon, Azure, and vCloud Director

In Abiquo 4.6, the monitoring system has been improved to better support large numbers of VMs. The improvements were based on the results of a simulation in which a script deployed VMs and modified metrics to activate and deactivate alarms and alerts. For the simulation, 5000 VMs were deployed on 5 hypervisor emulators running as Docker containers. During the simulation, 2 metrics of each VM had values that forced the activation of alarms and alerts for a 15-minute period, followed by default values.

The system quickly detected the activated alarms and alerts and sent the corresponding notifications. However, some improvements were made based on the results of the simulation, including:

  • Increased the speed of the push of metric data points after collection:

    • Distributed the push request work amongst the hypervisor actors, removing the single push-actor bottleneck

    • Improved reliability by distributing the push request queue amongst the hypervisor actors, because in a congested environment the queue was always full and the oldest requests could be lost

    • Improved performance by splitting large push requests

  • Optimized configuration of the KairosDB incoming queue processor

    • Increased 'batch_size' (requires changing the Cassandra configuration). This value should be dimensioned for the expected number of VMs, metrics, and data points per minute.

      • The 'min_batch_size' and 'min_batch_wait' configuration is unchanged: only delay 0.5s if there are fewer than 100 data points

    • Increased 'memory_queue_size' to avoid disk usage

    • Increased 'thread_count' to allow more requests to Cassandra

  • New KairosDB version with CQL instead of Thrift

  • Reduced response time of the Emmett request to push metric data points. The Emmett module manages metrics, alarms, and alerts. It retrieves metric data, obtains alarm details, and requests alarm evaluation. All the entities handled by Emmett (metric, alarm, and alert) can have tags and can be found using the tags.

    • Increased speed of retrieval of metrics, alarms, and alerts from the database by decoupling the tags from the entities. Tags are now only retrieved for create and search purposes.

    • Increased speed of the push process by removing unnecessary database transactions

    • Added a default local cache to improve the speed of the get metric request. This cache is disabled by default, and a distributed cache should be added for load-balanced instances.
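
The KairosDB queue processor item above mentions two things that are easy to check with a little arithmetic: dimensioning 'batch_size' for the expected load, and the unchanged rule of delaying 0.5s only when fewer than 100 data points are waiting. The following Python sketch illustrates both. It is not KairosDB code; the function names, the number of data points per metric per minute, and the batch size of 500 are assumptions chosen only for illustration.

```python
import math

# Values taken from the text above: wait 0.5 s when fewer than
# 100 data points are queued. Everything else here is hypothetical.
MIN_BATCH_SIZE = 100
MIN_BATCH_WAIT_SECONDS = 0.5


def expected_points_per_minute(vms, metrics_per_vm, points_per_metric_per_minute):
    """Estimate the data point load that the queue processor must absorb."""
    return vms * metrics_per_vm * points_per_metric_per_minute


def batches_per_minute(points_per_minute, batch_size):
    """Number of batches needed each minute for a given batch size."""
    return math.ceil(points_per_minute / batch_size)


def batch_delay_seconds(queued_points,
                        min_batch_size=MIN_BATCH_SIZE,
                        min_batch_wait=MIN_BATCH_WAIT_SECONDS):
    """Delay before sending the next batch: wait only when the queue is small."""
    if queued_points < min_batch_size:
        return min_batch_wait
    return 0.0


if __name__ == "__main__":
    # Assumed workload: 5000 VMs (as in the simulation), 2 metrics per VM,
    # and 1 data point per metric per minute (an assumption, not a measured value).
    load = expected_points_per_minute(5000, 2, 1)
    print("data points per minute:", load)
    print("batches per minute at batch_size=500:", batches_per_minute(load, 500))
    print("delay with 40 queued points:", batch_delay_seconds(40))
    print("delay with 250 queued points:", batch_delay_seconds(250))
```

Running the estimate with your own expected numbers of VMs, metrics, and data points per minute gives a rough starting point for 'batch_size'. The real value also depends on the Cassandra configuration mentioned above.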

Related links: