...

The system quickly detected activated alarms and alerts and sent the corresponding notifications. Nevertheless, the following improvements were made:

  • Increased the speed of the push of metric data points after collection:
    • Distributed the push request work among the hypervisor actors, removing the single push-actor bottleneck.
    • Improved reliability by distributing the push request queue among the hypervisor actors. In a congested environment the queue was always full, so the oldest requests could be lost.
    • Improved performance by splitting large push requests.
  • Optimized the configuration of the KairosDB incoming queue processor:
    • Increased 'batch_size' (requires changing the Cassandra configuration). This value should be dimensioned according to the expected VMs, metrics, and data points per minute.
      • The 'min_batch_size' and 'min_batch_wait' configuration is unchanged: only delay 0.5s if there are fewer than 100 data points.
    • Increased 'memory_queue_size' to avoid disk usage.
    • Increased 'thread_count' to allow more requests to Cassandra.
  • New KairosDB version that uses CQL instead of Thrift.
  • Reduced the response time of the Emmett request to push metric data points. All the entities handled by Emmett (metrics, alarms, and alerts) can have tags and can be found using those tags.
    • Increased the speed of retrieval of metrics, alarms, and alerts from the database by decoupling the tags from the entities. Tags are now only retrieved for create and search purposes.
    • Increased the speed of the push process by removing unnecessary database transactions.
    • Added a local cache to improve the speed of the get metric request. This cache is disabled by default, and a distributed cache should be added for load-balanced instances.
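The idea of distributing push work per hypervisor and splitting large push requests can be illustrated with a minimal Python sketch. It is not the actual implementation: the data point shape, the function names, and the limit of 500 data points per request are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical data point shape: (hypervisor_id, metric_name, timestamp_ms, value).
DataPoint = Tuple[str, str, int, float]

# Assumed upper bound on data points per push request (illustrative value only).
MAX_POINTS_PER_REQUEST = 500


def split_push_request(
    points: List[DataPoint], max_points: int = MAX_POINTS_PER_REQUEST
) -> Iterator[List[DataPoint]]:
    """Yield consecutive chunks holding at most max_points data points each."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    for start in range(0, len(points), max_points):
        yield points[start:start + max_points]


def group_by_hypervisor(points: List[DataPoint]) -> Dict[str, List[DataPoint]]:
    """Group data points by hypervisor so each one can push its own data."""
    grouped: Dict[str, List[DataPoint]] = defaultdict(list)
    for point in points:
        grouped[point[0]].append(point)
    return dict(grouped)


def build_push_requests(
    points: List[DataPoint], max_points: int = MAX_POINTS_PER_REQUEST
) -> Dict[str, List[List[DataPoint]]]:
    """Return, per hypervisor, a list of size-limited push requests."""
    return {
        hypervisor: list(split_push_request(hv_points, max_points))
        for hypervisor, hv_points in group_by_hypervisor(points).items()
    }
```

With this layout, a slow or congested hypervisor only delays its own requests, and no single request grows without bound.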

...