Merge lp:~ewanmellor/nova/xenapi-concurrency-model into lp:~hudson-openstack/nova/trunk

Proposed by Ewan Mellor
Status: Superseded
Proposed branch: lp:~ewanmellor/nova/xenapi-concurrency-model
Merge into: lp:~hudson-openstack/nova/trunk
Diff against target: 315 lines (+144/-31)
2 files modified
nova/utils.py (+8/-0)
nova/virt/xenapi.py (+136/-31)
To merge this branch: bzr merge lp:~ewanmellor/nova/xenapi-concurrency-model
Reviewer Review Type Date Requested Status
OZAWA Tsuyoshi (community) Approve
justinsb (community) Approve
termie (community) Needs Fixing
Jay Pipes (community) Approve
Rick Clark Pending
Review via email: mp+32939@code.launchpad.net

This proposal supersedes a proposal from 2010-08-15.

This proposal has been superseded by a proposal from 2010-08-19.

Description of the change

Rework virt.xenapi's concurrency model. There were many places where we were
inadvertently blocking the reactor thread. The rework puts all XenAPI calls
on background threads, so that they won't block the reactor thread.

Long-lived operations (VM start, reboot, etc) are invoked asynchronously
at the XenAPI level (Async.VM.start, etc). These return a XenAPI task. We
relinquish the background thread at this point, so as not to hold threads in
the pool for too long, and use reactor.callLater to poll the task.

This combination of techniques means that we don't block the reactor thread at
all, and at the same time we don't hold lots of threads waiting for
long-running operations.

There is a FIXME in here: get_info does not conform to these new rules.
Changes are required in compute.service before we can make get_info
non-blocking.
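
In outline, the pattern looks like this. This is a minimal sketch for
illustration only, not code from the branch: session, task, and POLL_INTERVAL
stand in for the real XenAPI session object, task reference, and the
xenapi_task_poll_interval flag.

    from twisted.internet import defer, reactor
    from twisted.internet.threads import deferToThread

    POLL_INTERVAL = 0.5  # stand-in for FLAGS.xenapi_task_poll_interval

    def wait_for_task(session, task):
        """Return a Deferred that fires with the result of the given
        XenAPI task, polling until it leaves the 'pending' state."""
        d = defer.Deferred()

        def poll():
            # task.get_status is a blocking XML-RPC call, so run it on a
            # pool thread rather than on the reactor thread.
            status_d = deferToThread(session.xenapi.task.get_status, task)

            def check(status):
                if status == 'pending':
                    # Relinquish the pool thread and re-poll later, so a
                    # long-running task never pins a thread.
                    reactor.callLater(POLL_INTERVAL, poll)
                elif status == 'success':
                    deferToThread(session.xenapi.task.get_result,
                                  task).chainDeferred(d)
                else:
                    d.errback(Exception('task %s failed: %s' % (task, status)))

            status_d.addCallbacks(check, d.errback)

        reactor.callLater(0, poll)
        return d

The branch packages the deferToThread dispatch as a deferredToThread decorator
(moved into nova/utils.py during review) and implements the polling in
_wait_for_task/_poll_task; see the diff below.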

Revision history for this message
Jay Pipes (jaypipes) wrote : Posted in a previous version of this proposal

Really nice work, Ewan! No criticism at all from me! Feel free to uncomment the logging.debug() output, though. :)

review: Approve
Revision history for this message
Ewan Mellor (ewanmellor) wrote : Posted in a previous version of this proposal

I thought that those two were a bit loud even for debug level -- that's
two messages every .5 seconds when polling a task (in the default
configuration).

Ewan.

On Mon, Aug 16, 2010 at 06:35:56PM +0100, Jay Pipes wrote:

> Review: Approve
> Really nice work, Ewan! No criticism at all from me! Feel free to uncomment the logging.debug() output, though. :)
> --
> https://code.launchpad.net/~ewanmellor/nova/xenapi-concurrency-model/+merge/32722
> You are the owner of lp:~ewanmellor/nova/xenapi-concurrency-model.

Revision history for this message
Jay Pipes (jaypipes) wrote : Posted in a previous version of this proposal

> I thought that those two were a bit loud even for debug level -- that's
> two messages every .5 seconds when polling a task (in the default
> configuration).

Hmm, I suppose that is a bit loud...but then again it's debugging information. Well, I approve regardless. I'd prefer to see the debug statements uncommented, but it's certainly no reason to hold up this excellent patch :)

-jay

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Posted in a previous version of this proposal

Attempt to merge lp:~ewanmellor/nova/xenapi-concurrency-model into lp:nova failed due to merge conflicts:

text conflict in nova/virt/xenapi.py

Revision history for this message
Ewan Mellor (ewanmellor) wrote : Posted in a previous version of this proposal

I've remerged this with trunk. The style cleanups that went in today caused
inevitable conflicts.

Ewan.

On Tue, Aug 17, 2010 at 10:33:45PM +0100, OpenStack Hudson wrote:

> Attempt to merge lp:~ewanmellor/nova/xenapi-concurrency-model into lp:nova failed due to merge conflicts:
>
> text conflict in nova/virt/xenapi.py
> --
> https://code.launchpad.net/~ewanmellor/nova/xenapi-concurrency-model/+merge/32722
> You are the owner of lp:~ewanmellor/nova/xenapi-concurrency-model.

Revision history for this message
Jay Pipes (jaypipes) :
review: Approve
Revision history for this message
termie (termie) wrote :

Code looks good to the degree that I understand what xenapi does.

Style fixes: only one line of whitespace between methods, please read HACKING

It also looks like you may have been a little overzealous in wrapping the description of xenapi_task_poll_interval.

If you think deferredToThread is globally useful you may consider putting it in utils.

Other than those small things code looks awesome, a welcome upgrade.

review: Needs Fixing
Revision history for this message
justinsb (justin-fathomdb) wrote :

Nice.

Hopefully all these deferred twists and turns will soon be a distant and painful memory, in the compute layer at least. If we have to spawn threads (even pooled threads) for each of these method calls, I can't see Twisted having any advantages here.

review: Approve
Revision history for this message
OZAWA Tsuyoshi (ozawa-tsuyoshi) :
review: Approve
231. By Ewan Mellor

Remove whitespace to match style guide.

232. By Ewan Mellor

Move deferredToThread into utils, as suggested by termie.

Revision history for this message
Ewan Mellor (ewanmellor) wrote :

I've removed the additional whitespace between the methods.

I've left the description for xenapi_task_poll_interval alone. Assuming that
I should wrap at spaces (not at equals signs for example) and that I should
keep the description lined up on the LHS with the open parenthesis, in
both lines the next break would be at column 82, and I presume that we're
working in 80 columns. They are big words at the end there, which is why
they look a bit odd.

I've put deferredToThread into utils, as suggested.

Cheers,

Ewan.

On Wed, Aug 18, 2010 at 05:13:46PM +0100, termie wrote:

> Review: Needs Fixing
> Code looks good to the degree that I understand what xenapi does.
>
> Style fixes: only one line of whitespace between methods, please read HACKING
>
> It also looks like you may have been a little overzealous in wrapping the description of xenapi_task_poll_interval.
>
> If you think deferredToThread is globally useful you may consider putting it in utils.
>
> Other than those small things code looks awesome, a welcome upgrade.
> --
> https://code.launchpad.net/~ewanmellor/nova/xenapi-concurrency-model/+merge/32939
> You are the owner of lp:~ewanmellor/nova/xenapi-concurrency-model.

Preview Diff

=== modified file 'nova/utils.py'
--- nova/utils.py	2010-08-16 12:16:21 +0000
+++ nova/utils.py	2010-08-19 14:16:14 +0000
@@ -29,6 +29,8 @@
 import socket
 import sys
 
+from twisted.internet.threads import deferToThread
+
 from nova import exception
 from nova import flags
 
@@ -142,3 +144,9 @@
 
 def parse_isotime(timestr):
     return datetime.datetime.strptime(timestr, TIME_FORMAT)
+
+
+def deferredToThread(f):
+    def g(*args, **kwargs):
+        return deferToThread(f, *args, **kwargs)
+    return g

=== modified file 'nova/virt/xenapi.py'
--- nova/virt/xenapi.py	2010-08-17 11:53:30 +0000
+++ nova/virt/xenapi.py	2010-08-19 14:16:14 +0000
@@ -16,17 +16,35 @@
 
 """
 A connection to XenServer or Xen Cloud Platform.
+
+The concurrency model for this class is as follows:
+
+All XenAPI calls are on a thread (using t.i.t.deferToThread, via the decorator
+deferredToThread). They are remote calls, and so may hang for the usual
+reasons. They should not be allowed to block the reactor thread.
+
+All long-running XenAPI calls (VM.start, VM.reboot, etc) are called async
+(using XenAPI.VM.async_start etc). These return a task, which can then be
+polled for completion. Polling is handled using reactor.callLater.
+
+This combination of techniques means that we don't block the reactor thread at
+all, and at the same time we don't hold lots of threads waiting for
+long-running operations.
+
+FIXME: get_info currently doesn't conform to these rules, and will block the
+reactor thread if the VM.get_by_name_label or VM.get_record calls block.
 """
 
 import logging
 import xmlrpclib
 
 from twisted.internet import defer
+from twisted.internet import reactor
 from twisted.internet import task
 
-from nova import exception
 from nova import flags
 from nova import process
+from nova import utils
 from nova.auth.manager import AuthManager
 from nova.compute import power_state
 from nova.virt import images
@@ -47,6 +65,11 @@
                     None,
                     'Password for connection to XenServer/Xen Cloud Platform.'
                     ' Used only if connection_type=xenapi.')
+flags.DEFINE_float('xenapi_task_poll_interval',
+                   0.5,
+                   'The interval used for polling of remote tasks '
+                   '(Async.VM.start, etc). Used only if '
+                   'connection_type=xenapi.')
 
 
 XENAPI_POWER_STATE = {
@@ -84,9 +107,8 @@
                 for vm in self._conn.xenapi.VM.get_all()]
 
     @defer.inlineCallbacks
-    @exception.wrap_exception
     def spawn(self, instance):
-        vm = yield self.lookup(instance.name)
+        vm = yield self._lookup(instance.name)
         if vm is not None:
             raise Exception('Attempted to create non-unique name %s' %
                             instance.name)
@@ -105,21 +127,27 @@
 
         user = AuthManager().get_user(instance.datamodel['user_id'])
         project = AuthManager().get_project(instance.datamodel['project_id'])
-        vdi_uuid = yield self.fetch_image(
+        vdi_uuid = yield self._fetch_image(
             instance.datamodel['image_id'], user, project, True)
-        kernel = yield self.fetch_image(
+        kernel = yield self._fetch_image(
             instance.datamodel['kernel_id'], user, project, False)
-        ramdisk = yield self.fetch_image(
+        ramdisk = yield self._fetch_image(
             instance.datamodel['ramdisk_id'], user, project, False)
-        vdi_ref = yield self._conn.xenapi.VDI.get_by_uuid(vdi_uuid)
+        vdi_ref = yield self._call_xenapi('VDI.get_by_uuid', vdi_uuid)
 
-        vm_ref = yield self.create_vm(instance, kernel, ramdisk)
-        yield self.create_vbd(vm_ref, vdi_ref, 0, True)
+        vm_ref = yield self._create_vm(instance, kernel, ramdisk)
+        yield self._create_vbd(vm_ref, vdi_ref, 0, True)
         if network_ref:
             yield self._create_vif(vm_ref, network_ref, mac_address)
-        yield self._conn.xenapi.VM.start(vm_ref, False, False)
+        logging.debug('Starting VM %s...', vm_ref)
+        yield self._call_xenapi('VM.start', vm_ref, False, False)
+        logging.info('Spawning VM %s created %s.', instance.name, vm_ref)
 
-    def create_vm(self, instance, kernel, ramdisk):
+    @defer.inlineCallbacks
+    def _create_vm(self, instance, kernel, ramdisk):
+        """Create a VM record. Returns a Deferred that gives the new
+        VM reference."""
+
         mem = str(long(instance.datamodel['memory_kb']) * 1024)
         vcpus = str(instance.datamodel['vcpus'])
         rec = {
@@ -152,11 +180,15 @@
             'other_config': {},
             }
         logging.debug('Created VM %s...', instance.name)
-        vm_ref = self._conn.xenapi.VM.create(rec)
+        vm_ref = yield self._call_xenapi('VM.create', rec)
         logging.debug('Created VM %s as %s.', instance.name, vm_ref)
-        return vm_ref
+        defer.returnValue(vm_ref)
 
-    def create_vbd(self, vm_ref, vdi_ref, userdevice, bootable):
+    @defer.inlineCallbacks
+    def _create_vbd(self, vm_ref, vdi_ref, userdevice, bootable):
+        """Create a VBD record. Returns a Deferred that gives the new
+        VBD reference."""
+
         vbd_rec = {}
         vbd_rec['VM'] = vm_ref
         vbd_rec['VDI'] = vdi_ref
@@ -171,12 +203,16 @@
         vbd_rec['qos_algorithm_params'] = {}
         vbd_rec['qos_supported_algorithms'] = []
         logging.debug('Creating VBD for VM %s, VDI %s ... ', vm_ref, vdi_ref)
-        vbd_ref = self._conn.xenapi.VBD.create(vbd_rec)
+        vbd_ref = yield self._call_xenapi('VBD.create', vbd_rec)
         logging.debug('Created VBD %s for VM %s, VDI %s.', vbd_ref, vm_ref,
                       vdi_ref)
-        return vbd_ref
+        defer.returnValue(vbd_ref)
 
+    @defer.inlineCallbacks
     def _create_vif(self, vm_ref, network_ref, mac_address):
+        """Create a VIF record. Returns a Deferred that gives the new
+        VIF reference."""
+
         vif_rec = {}
         vif_rec['device'] = '0'
         vif_rec['network']= network_ref
@@ -188,25 +224,29 @@
         vif_rec['qos_algorithm_params'] = {}
         logging.debug('Creating VIF for VM %s, network %s ... ', vm_ref,
                       network_ref)
-        vif_ref = self._conn.xenapi.VIF.create(vif_rec)
+        vif_ref = yield self._call_xenapi('VIF.create', vif_rec)
         logging.debug('Created VIF %s for VM %s, network %s.', vif_ref,
                       vm_ref, network_ref)
-        return vif_ref
+        defer.returnValue(vif_ref)
 
+    @defer.inlineCallbacks
     def _find_network_with_bridge(self, bridge):
         expr = 'field "bridge" = "%s"' % bridge
-        networks = self._conn.xenapi.network.get_all_records_where(expr)
+        networks = yield self._call_xenapi('network.get_all_records_where',
+                                           expr)
         if len(networks) == 1:
-            return networks.keys()[0]
+            defer.returnValue(networks.keys()[0])
         elif len(networks) > 1:
             raise Exception('Found non-unique network for bridge %s' % bridge)
         else:
             raise Exception('Found no network for bridge %s' % bridge)
 
-    def fetch_image(self, image, user, project, use_sr):
+    @defer.inlineCallbacks
+    def _fetch_image(self, image, user, project, use_sr):
         """use_sr: True to put the image as a VDI in an SR, False to place
         it on dom0's filesystem. The former is for VM disks, the latter for
-        its kernel and ramdisk (if external kernels are being used)."""
+        its kernel and ramdisk (if external kernels are being used).
+        Returns a Deferred that gives the new VDI UUID."""
 
         url = images.image_url(image)
         access = AuthManager().get_access_key(user, project)
@@ -218,22 +258,28 @@
             args['password'] = user.secret
         if use_sr:
             args['add_partition'] = 'true'
-        return self._call_plugin('objectstore', fn, args)
+        task = yield self._async_call_plugin('objectstore', fn, args)
+        uuid = yield self._wait_for_task(task)
+        defer.returnValue(uuid)
 
+    @defer.inlineCallbacks
     def reboot(self, instance):
-        vm = self.lookup(instance.name)
+        vm = yield self._lookup(instance.name)
         if vm is None:
             raise Exception('instance not present %s' % instance.name)
-        yield self._conn.xenapi.VM.clean_reboot(vm)
+        task = yield self._call_xenapi('Async.VM.clean_reboot', vm)
+        yield self._wait_for_task(task)
 
+    @defer.inlineCallbacks
     def destroy(self, instance):
-        vm = self.lookup(instance.name)
+        vm = yield self._lookup(instance.name)
        if vm is None:
             raise Exception('instance not present %s' % instance.name)
-        yield self._conn.xenapi.VM.destroy(vm)
+        task = yield self._call_xenapi('Async.VM.destroy', vm)
+        yield self._wait_for_task(task)
 
     def get_info(self, instance_id):
-        vm = self.lookup(instance_id)
+        vm = self._lookup_blocking(instance_id)
         if vm is None:
             raise Exception('instance not present %s' % instance_id)
         rec = self._conn.xenapi.VM.get_record(vm)
@@ -243,7 +289,11 @@
                 'num_cpu': rec['VCPUs_max'],
                 'cpu_time': 0}
 
-    def lookup(self, i):
+    @utils.deferredToThread
+    def _lookup(self, i):
+        return self._lookup_blocking(i)
+
+    def _lookup_blocking(self, i):
         vms = self._conn.xenapi.VM.get_by_name_label(i)
         n = len(vms)
         if n == 0:
@@ -253,9 +303,52 @@
         else:
             return vms[0]
 
-    def _call_plugin(self, plugin, fn, args):
+    def _wait_for_task(self, task):
+        """Return a Deferred that will give the result of the given task.
+        The task is polled until it completes."""
+        d = defer.Deferred()
+        reactor.callLater(0, self._poll_task, task, d)
+        return d
+
+    @utils.deferredToThread
+    def _poll_task(self, task, deferred):
+        """Poll the given XenAPI task, and fire the given Deferred if we
+        get a result."""
+        try:
+            #logging.debug('Polling task %s...', task)
+            status = self._conn.xenapi.task.get_status(task)
+            if status == 'pending':
+                reactor.callLater(FLAGS.xenapi_task_poll_interval,
+                                  self._poll_task, task, deferred)
+            elif status == 'success':
+                result = self._conn.xenapi.task.get_result(task)
+                logging.info('Task %s status: success. %s', task, result)
+                deferred.callback(_parse_xmlrpc_value(result))
+            else:
+                error_info = self._conn.xenapi.task.get_error_info(task)
+                logging.warn('Task %s status: %s. %s', task, status,
+                             error_info)
+                deferred.errback(XenAPI.Failure(error_info))
+            #logging.debug('Polling task %s done.', task)
+        except Exception, exn:
+            logging.warn(exn)
+            deferred.errback(exn)
+
+    @utils.deferredToThread
+    def _call_xenapi(self, method, *args):
+        """Call the specified XenAPI method on a background thread. Returns
+        a Deferred for the result."""
+        f = self._conn.xenapi
+        for m in method.split('.'):
+            f = f.__getattr__(m)
+        return f(*args)
+
+    @utils.deferredToThread
+    def _async_call_plugin(self, plugin, fn, args):
+        """Call Async.host.call_plugin on a background thread. Returns a
+        Deferred with the task reference."""
         return _unwrap_plugin_exceptions(
-            self._conn.xenapi.host.call_plugin,
+            self._conn.xenapi.Async.host.call_plugin,
             self._get_xenapi_host(), plugin, fn, args)
 
     def _get_xenapi_host(self):
@@ -281,3 +374,15 @@
     except xmlrpclib.ProtocolError, exn:
         logging.debug("Got exception: %s", exn)
         raise
+
+
+def _parse_xmlrpc_value(val):
+    """Parse the given value as if it were an XML-RPC value. This is
+    sometimes used as the format for the task.result field."""
+    if not val:
+        return val
+    x = xmlrpclib.loads(
+        '<?xml version="1.0"?><methodResponse><params><param>' +
+        val +
+        '</param></params></methodResponse>')
+    return x[0][0]
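
A note on the deferredToThread helper above: decorating a blocking method with
it makes each call run on the reactor's thread pool and return a Deferred
immediately. A minimal usage sketch, assuming the nova tree from this branch is
importable (the Session class and names here are illustrative, not part of the
branch):

    from nova import utils

    class Session(object):
        def __init__(self, conn):
            self._conn = conn

        @utils.deferredToThread
        def lookup(self, label):
            # Blocking XML-RPC call; safe here because the decorator runs
            # this body on a pool thread, not on the reactor thread.
            return self._conn.xenapi.VM.get_by_name_label(label)

    # The caller gets a Deferred at once and attaches callbacks:
    #     d = Session(conn).lookup('instance-00000001')
    #     d.addCallback(handle_vms)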