Abiquo technical overview

Introduction to Abiquo

Abiquo is a cloud computing platform that enables you to manage public clouds and private clouds. You can use your own physical infrastructure to provide infrastructure as a service. Its model is hypervisor agnostic (no vendor lock-in), and it provides a unified interface for managing:

VM templates
Compute
Network
Storage
Events

Abiquo datacenters

In a private cloud, a datacenter consists of:

A set of hypervisors
Storage servers (for optional managed storage)
Network File System (NFS)
DHCP
Abiquo remote services

Abiquo is designed to manage multiple datacenters and public cloud providers.

Hypervisors

Abiquo manages multiple hypervisors and conversions between hypervisors.
Managed by the vendor-provided API:

VMware
Hyper-V

Managed by the Abiquo AIM agent using libvirt:

KVM

Abiquo also works with public cloud providers.

Abiquo remote services

The Abiquo platform uses remote services to manage the hypervisors. The Abiquo remote services found in each datacenter are:

Appliance manager
Business process manager
Discovery manager
Virtualization manager
Virtual system monitor
Remote access manager
Services manager

Appliance manager

Provides the ability to upload and download VM templates into repositories available to the datacenter. Administrators may use the AM API to manage templates.

Business process manager

Provides conversion of VM templates between disk formats supported by hypervisors and export to public cloud providers, thus eliminating vendor lock-in.

Conversion requests and responses are submitted via message queues
Requests are processed one at a time, because conversions work at the NFS level and concurrent requests could cause I/O conflicts
Uses RabbitMQ as a message broker
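
The one-at-a-time processing described above can be pictured with a single worker that drains a request queue. This is a minimal sketch, not the BPM implementation: it uses an in-process Python queue instead of RabbitMQ, and the function names, the request shape, and the disk formats are assumptions made for illustration.

```python
import queue
import threading


def run_conversion_worker(requests, responses, convert):
    """Consume requests one at a time until a None sentinel arrives."""
    while True:
        request = requests.get()
        if request is None:  # sentinel: stop the worker
            break
        # Only one conversion is in flight at any moment, so two
        # conversions never compete for I/O on the shared repository.
        responses.put((request, convert(request)))


def fake_convert(request):
    # Hypothetical stand-in for a real disk format conversion.
    source_format, target_format = request
    return f"{source_format}->{target_format} done"


requests = queue.Queue()
responses = queue.Queue()
worker = threading.Thread(
    target=run_conversion_worker, args=(requests, responses, fake_convert)
)
worker.start()

for request in [("vmdk", "qcow2"), ("vhd", "vmdk")]:
    requests.put(request)
requests.put(None)
worker.join()

# Responses come back in the same order as the requests.
results = [responses.get() for _ in range(2)]
```

A single consumer trades throughput for safety: conversions are serialized, but no locking is needed around the repository.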

Discovery manager

Provides hypervisor discovery of the following:

Hypervisor type
Deployed VMs
Physical machine capabilities and resources

Uses the API provided by the hypervisor vendor.

Also manages external storage.

Virtualization manager (virtual factory remote service)

This is the common layer that unifies and manages the virtualization capabilities of each hypervisor type. It manages the life cycle of VMs and their network and storage configuration. Job requests and responses are submitted via message queues (RabbitMQ), and jobs are handled with an actor-based model using Akka.

Virtual system monitor

Manages a set of monitors. Each monitor examines the state of VMs and reports changes. It uses Redis as a subscription store and the pub/sub mechanism for event notifications from each monitor. State changes are published via message queues (RabbitMQ).

Server

The Abiquo Server manages an arbitrary number of datacenters by accessing their remote services. Communications between modules and datacenters are performed via RabbitMQ. MySQL stores the system configuration and state. The platform also offers enterprise functionality: pricing, scheduling, security. The server exposes its data and operations via a REST API.

Architecture diagram

Technology stack

The Abiquo technology stack includes the following projects.

Apache projects
- Thrift
- Commons
Others
- MySQL (MariaDB)
- Redis
- RabbitMQ
- Akka
- Libvirt
- Jersey

Virtual machine operations

This section outlines the VM operations in the virtual factory. The implementation uses asynchronous communication and independent, concurrent operations. It offers improved scalability, traceability, performance and stability.

This document assumes a good working knowledge of the Abiquo platform.

Advantages

Scalability
- asynchronous VM operations
  - independent
  - concurrent
Traceability
- simple, standardized VM definition for all hypervisors
- error messages and logs
- monitoring console
Performance
- non-blocking operations
Stability
- improved reconfigure
- improved rollback
- standardized and unified hypervisor communication
- automated testing

Scalability and stability

The virtual factory is based on VM abstraction. Thus all operations are performed on a single VM. This provides scalability and stability because each VM is independent.

The internal VM definition is a simple, standard, single-page transport document (instead of, say, an OVF description). It contains the attributes that are common to all hypervisors and providers. As a standard document, it provides stability and facilitates the addition of new hypervisors.

The hypervisor plugin abstraction has one common interface and one plugin for each hypervisor type (including public cloud providers, or in some cases, public cloud regions).

Virtual machine implementation

The virtual factory works with two interfaces: the hypervisor connection and the VM resources. Actions performed on the VM move it between VM states.

Virtual machine and hypervisor actions

Hypervisor Connection
- login
- logout
Virtual Machine
- configure
- reconfigure
- unconfigure
- snapshot
- get state
- power on
- power off
- stop
- resume
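
The actions above can be pictured as one common interface with one plugin per hypervisor type. The following is a minimal sketch, not Abiquo's actual plugin API: the class names, method signatures, and state names are hypothetical, and only a subset of the actions is shown.

```python
from abc import ABC, abstractmethod


class HypervisorPlugin(ABC):
    """Hypothetical common interface; one subclass per hypervisor type."""

    @abstractmethod
    def login(self, address: str, user: str, password: str) -> None: ...

    @abstractmethod
    def logout(self) -> None: ...

    @abstractmethod
    def configure(self, vm_id: str) -> None: ...

    @abstractmethod
    def power_on(self, vm_id: str) -> None: ...

    @abstractmethod
    def get_state(self, vm_id: str) -> str: ...


class InMemoryPlugin(HypervisorPlugin):
    """Toy plugin that tracks VM states in a dict instead of a hypervisor."""

    def __init__(self):
        self.connected = False
        self.states = {}

    def login(self, address, user, password):
        self.connected = True

    def logout(self):
        self.connected = False

    def configure(self, vm_id):
        self.states[vm_id] = "CONFIGURED"  # hypothetical state name

    def power_on(self, vm_id):
        if self.states.get(vm_id) != "CONFIGURED":
            raise RuntimeError(f"{vm_id} must be configured before power on")
        self.states[vm_id] = "ON"  # hypothetical state name

    def get_state(self, vm_id):
        return self.states.get(vm_id, "NOT_CONFIGURED")


plugin = InMemoryPlugin()
plugin.login("10.0.0.1", "admin", "secret")
plugin.configure("vm-1")
plugin.power_on("vm-1")
state = plugin.get_state("vm-1")
plugin.logout()
```

Because callers only depend on the abstract interface, supporting a new hypervisor type means adding one subclass without touching the code that issues the actions.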

Virtual machine states diagram

See VM and VApp states.

Deployment scenario

To deploy a virtual appliance containing N virtual machines, we operate directly on each VM. We perform:

N virtual machine configurations
N power-on requests

We wish to run these N configurations in parallel (as resources permit) and request a power-on of each VM after configuration. These operations are concurrent and independent.

The virtual factory coordinates these tasks.
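
The deployment scenario can be sketched as follows: the VMs are handled concurrently, while the two steps for a single VM run in order. This is an illustration using a Python thread pool, not the virtual factory's own code. The function names and the log structure are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def deploy_vm(vm_name, log):
    # The two jobs of one VM are dependent, so they run in order.
    log.append((vm_name, "configure"))
    log.append((vm_name, "power on"))
    return vm_name


def deploy_virtual_appliance(vm_names, max_workers=4):
    """Deploy every VM of the appliance; VMs are independent of each other."""
    log = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One unit of work per VM; the pool runs them as resources permit.
        deployed = list(pool.map(lambda name: deploy_vm(name, log), vm_names))
    return deployed, log


deployed, log = deploy_virtual_appliance(["vm-1", "vm-2", "vm-3"])
```

The order of entries for different VMs in the log is not fixed, but for each VM the configure entry always precedes the power on entry.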

Performance and scalability

The key to the performance and scalability of the VM operations is asynchronous communication. Communications between the API/Server and the virtual factory are all "send and forget" requests, so the operations on a VM are independent. The platform achieves high performance and scalability because it does not wait for requests on other VMs in a virtual appliance.

Each request to the virtual factory is a complete entity and includes the hypervisor connection to avoid reconnects.

A request contains:

tasks and jobs
virtual machine definitions
hypervisor connection
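
To make the idea of a self-contained request concrete, the sketch below bundles the three parts listed above into one message. The field names and the JSON encoding are assumptions made for illustration. They do not describe the real Abiquo message format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class HypervisorConnectionInfo:
    # Hypothetical fields; credentials are left out of this sketch.
    hypervisor_type: str
    address: str
    user: str


@dataclass
class Job:
    job_id: str
    operation: str  # for example "configure" or "power on"


@dataclass
class TaskRequest:
    task_id: str
    jobs: list
    vm_definition: dict
    connection: HypervisorConnectionInfo


request = TaskRequest(
    task_id="task-1",
    jobs=[Job("job-1", "configure"), Job("job-2", "power on")],
    vm_definition={"name": "vm-1", "cpu": 2, "ram_mb": 2048},
    connection=HypervisorConnectionInfo("KVM", "10.0.0.1", "admin"),
)

# Everything needed to execute the task travels in a single message.
message = json.dumps(asdict(request))
decoded = json.loads(message)
```

Because the connection details travel with the request, the receiver does not need to look anything up before it starts working on the VM.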

The Server/API and virtual factory communicate using RabbitMQ, which implements AMQP.

The Server and API send requests to RabbitMQ. The virtual factory gets tasks from the request queue. Each task is a set of jobs, and hypervisor plugins work at the job level. The virtual factory sends job and task results to the outgoing notification queue.

Each datacenter has its own IN and OUT queues. The queue names are configured with the Abiquo configuration property abiquo.datacenter.id for the Remote Services (V2V/BPM and virtual factory). This property identifies the AMQP datacenter queues: it is used to build routing keys and queue names so that they are unique for each datacenter. Do not change this property! Its value is generated at installation, but you can recover it through the API with Get Datacenter.

RabbitMQ is shared by all datacenters on the Abiquo platform: it is a single message broker for all infrastructure.

Tasks and jobs

Tasks are performed on virtual machines. A task is a collection of jobs.
Tasks are executed independently and concurrently. Within tasks, jobs can be dependent and sequential or concurrent.

For example, when deploying a virtual appliance with two VMs, there are two independent and concurrent tasks: deploy VM 1 and deploy VM 2. Within each task there are two dependent jobs: Configure and Power on.

Inside the virtual factory

The components of the virtual factory and their basic functions are as follows.

Queue manager: handles incoming queue messages
Task orchestrator: decides when to send a job
Virtualization worker: executes the VM operation

Many jobs can run concurrently with multiple instances of the virtualization worker, but the number of concurrent jobs in a specific installation depends on both the available resources and the configuration.

The number of concurrent sessions is limited by resources, because you cannot have more open sessions than the number of cores in your hypervisor host. It can also be controlled at the platform level, for hypervisors/providers, and for VDCs, using the Abiquo Configuration Properties. For example, at the platform level, the abiquo.virtualfactory.openSession property sets the default number of simultaneous operations on a single hypervisor or provider, and its default value is 2. To configure the number of concurrent connections, see Control the number of concurrent operations.
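
The effect of such a limit can be illustrated with a semaphore that caps the number of operations running at the same time on one hypervisor. This is a sketch of the general technique, not the virtual factory's mechanism. The SessionLimiter class is hypothetical, and the constant simply mirrors the documented default of abiquo.virtualfactory.openSession.

```python
import threading
import time

OPEN_SESSIONS = 2  # mirrors the documented default of the property


class SessionLimiter:
    """Hypothetical helper that bounds concurrent operations on one host."""

    def __init__(self, limit):
        self._semaphore = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0  # highest number of operations seen running at once

    def run(self, operation):
        with self._semaphore:  # blocks while `limit` operations are running
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return operation()
            finally:
                with self._lock:
                    self.active -= 1


limiter = SessionLimiter(OPEN_SESSIONS)
threads = [
    threading.Thread(target=limiter.run, args=(lambda: time.sleep(0.05),))
    for _ in range(6)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Six operations are requested, but the recorded peak never exceeds the limit. The remaining operations wait until a session is released.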

Traceability

The system reports the progress of VM tasks, and a new console is under development to let you check the state of jobs that are currently executing. Job and task states are saved in the datacenter Redis while the task is in progress. Job and task state changes are notified to the Abiquo Server/API.

There is an API query to provide the task state:
GET virtualmachine/1/tasks/

The virtual factory has a common step log, which means the same steps are logged by each hypervisor or provider. For example, all hypervisors log: start template copy and finish template copy. This information can be found in the virtual factory log and as INFO level logging in catalina.out. In the future, this information will also be accessible through the API.

Handling failed tasks

What happens when a task can't be executed properly? The virtual factory offers full rollback of jobs from the same task that have already been executed. In addition, it offers full cleanup of failed tasks and jobs, so the hypervisor is returned to its original state. This ensures the stability of the platform.

The virtual factory ensures that the VM state is always consistent. This means that if an operation fails, you can always retry.
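
The rollback behavior can be sketched as a task runner that undoes the jobs it has already completed, in reverse order, as soon as one job fails. This is an illustration of the general pattern under simplifying assumptions. The run_task function and the (name, do, undo) job shape are hypothetical.

```python
def run_task(jobs):
    """Run (name, do, undo) jobs in order; undo completed jobs on failure."""
    completed = []
    for name, do, undo in jobs:
        try:
            do()
        except Exception:
            # Roll back the jobs of this task that already ran, newest first.
            for _, undo_completed in reversed(completed):
                undo_completed()
            return False
        completed.append((name, undo))
    return True


# A set stands in for the VMs that exist on the hypervisor.
hypervisor = set()


def fail_power_on():
    raise RuntimeError("power on failed")


jobs = [
    ("configure", lambda: hypervisor.add("vm-1"), lambda: hypervisor.discard("vm-1")),
    ("power on", fail_power_on, lambda: None),
]
succeeded = run_task(jobs)
```

After the failed power on, the configure job is undone, so the set is empty again and the task can be retried from a clean state.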

Appendix: Inside the virtual factory

The diagram below shows the interaction between the elements of the virtual factory in dealing with requests and responses.

Queue Manager: handles incoming queue messages
Task Orchestrator: decides when to send a job
Virtualization Worker: executes the VM operation

Queue manager

Receives Task message from the API or Server using the RabbitMQ connection.
- checks the message and sends the task to the Task Orchestrator

Receives Task response (from Task Orchestrator) then acknowledges the AMQP message
- the message is deleted from the queue

Task orchestrator

Receives task messages
- saves the current task message
- saves a map of task IDs to job IDs
- if it's a sequential task, sends the first job to a Virtualization Worker (VW)
- if it's a concurrent task, sends all the jobs to many VWs
- if the job being sent is a task, sends it to itself

Receives job results
- notifies end of job to the outgoing AMQP queue
- checks if the task is completed and, if so, checks whether it should notify itself or the Queue Manager
- if the task is not completed and it is sequential, sends the next job

Receives task results
- receives only intermediate task results, because a task is also a job
- performs the same logic as for a job result

Virtualization worker

Receives job messages
- creates a new instance of Hypervisor Connection and performs the VM operation
- sends back a Job Result when it's done
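
The message flow in this appendix can be approximated with a small single-threaded model. This is a sketch under strong simplifications: the worker is a plain function called synchronously rather than an Akka actor, nested tasks and failures are not handled, and all class, method, and message names are hypothetical.

```python
from collections import deque


class TaskOrchestrator:
    def __init__(self, worker):
        self.worker = worker      # callable standing in for a VW: job -> result
        self.pending = {}         # task ID -> jobs not yet sent
        self.remaining = {}       # task ID -> number of jobs without a result
        self.notifications = []   # stands in for the outgoing AMQP queue
        self.mailbox = deque()    # job results waiting to be handled

    def receive_task(self, task_id, jobs, sequential):
        if not jobs:
            raise ValueError("a task needs at least one job")
        self.remaining[task_id] = len(jobs)
        if sequential:
            # Send only the first job; the rest wait for its result.
            self.pending[task_id] = deque(jobs)
            self._send(task_id, self.pending[task_id].popleft())
        else:
            # Concurrent task: send every job straight away.
            self.pending[task_id] = deque()
            for job in jobs:
                self._send(task_id, job)

    def _send(self, task_id, job):
        self.mailbox.append((task_id, job, self.worker(job)))

    def process_results(self):
        while self.mailbox:
            task_id, job, result = self.mailbox.popleft()
            self.notifications.append(("job", job, result))
            self.remaining[task_id] -= 1
            if self.remaining[task_id] == 0:
                self.notifications.append(("task", task_id, "done"))
            elif self.pending[task_id]:
                self._send(task_id, self.pending[task_id].popleft())


orchestrator = TaskOrchestrator(worker=lambda job: f"{job} ok")
orchestrator.receive_task("deploy-vm-1", ["configure", "power on"], sequential=True)
orchestrator.process_results()
```

For the sequential task in this example, the power on job is only sent after the configure result has been handled, and the task notification is emitted last.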