2. Queuing Models

The operation of many systems can be abstracted as queuing models, where customers arrive (randomly) at a system requesting for services. When a customer arrives and if no server is available, the customer will wait in the queue. When a server is available, it will select a customer and serve the customer for some time, after which the customer will leave the system, and the server will select and serve the next customer.

Queuing models can be applied in many situations: customers in banks, machines at repair shops, telephone calls at call-centers, cars on road, jobs in computer systems, packets in networks, and so on. The word “customers” here can take on different meanings in different contexts. They can be people, machines, telephone calls, cars, computing jobs, and network packets, etc. Similarly, the word “servers” can also mean different things in different contexts: bank tellers, mechanics, receptionists, road lanes, traffic lights, processors, computing devices, network switches, and so on.

2.1. Characterization of Queuing Models

Queuing models can be characterized by five properties: the calling population, the arrival process, the service mechanism, the capacity of the system, and the queuing discipline:

  • Customers arrive at the system from what is called a calling population. The calling population can be finite or infinite. A closed system with a fixed number of customers has a finite calling population. Many queuing models are open systems with infinite calling population.
  • The arrival process describes how the customers arrive at the system: Are there different types of customers? What is the distribution of customers’ inter-arrival time? Are they coming in batches?
  • The service mechanism describes how the customers are being served: How many servers are there in the system? What is the service time distribution (for each server and for each type of customers)? Do the servers each have its own queue or do they all share a common queue?
  • The capacity of the system specifies whether there is a limit to the number of customers in the system (including those waiting in the queue and those currently being served).
  • The queuing discipline describes how a server chooses to serve the customers. Common queuing disciplines include FIFO (first in first out), LIFO (last in first out), SIRO (service in random order), and Priority (based on the importance of the customer, e.g., shortest processing time first). There are many complicated queuing disciplines (for example, weighted fair queuing with multiple queues handled in a round-robin fashion). To put this in a context of a real scenario, the Linux operating system currently implements about a dozen queuing disciplines for scheduling processes.

There is a well-accepted notation for common queuing models, which was first proposed by D. G. Kendall in 1953. In an abbreviated form, a queuing system can be denoted as A/B/c/K/N/D, where A represents the arrival process, B represents the service-time distribution, c is the number of servers, K is the system capacity, N is the size of the calling population, and D is the queuing discipline. Both A and B can use common symbols to describe the inter-arrival and service time distributions, such as M (for Markovian, or exponential), D (for deterministic), and G (for general). Also, when K and N are infinite, they are usually dropped from the notation. D is usually ignored from the context or when FIFO is assumed.

For example, M/M/1 represents a single-server queue with exponentially distributed inter-arrival time and service time. The queue is FIFO and has an infinite capacity and an infinite calling population. M/G/c/K represents a queue with c servers with exponentially distributed inter-arrival time, and general distribution for the service time. The queue is FIFO and has a finite capacity.

2.2. Lindley’s Equation

For a single-server FIFO queue with infinite capacity, one can use the Lindley’s equation to compute the queuing time of the customers without simulation. The queuing time is the time between the time of a customer’s arrival and the time when the customer enters service.

Let \(a_n\) be the inter-arrival time of the n-th customer. Let \(b_n\) is the service time of the n-th customer. The queuing time of first customer (\({wq}_0\)) is zero. The wait time of all subsequent customers can be calculated using the following recursive equation: \({wq}_{n+1} = \max(0, {wq}_n + b_n - a_n)\). This method was first proposed by Dennis Lindley in 1952. The following code shows the use of Lindley’s equation to compute the customer’s queuing time.

import numpy as np

a = np.random.exponential(1.2, 100000)
b = np.random.exponential(0.8, 100000)

z = 0.0
wq = []
for x, y in zip(a, b):
    z = max(0, z+y-x)
wq = np.array(wq)

print('inter-arrival time (mean=%g): %r...' % (a.mean(), a[:3]))
print('service time (mean=%g): %r...' % (b.mean(), b[:3]))
print('queuing time (mean=%g): %r' % (wq.mean(), wq[:3]))
print('wait time (in system): mean=%g' % (b.mean()+wq.mean()))
inter-arrival time (mean=1.20109): array([0.25511839, 1.16777865, 0.69092292])...
service time (mean=0.79616): array([0.02259983, 1.2754942 , 1.44756566])...
queuing time (mean=1.53788): array([0.        , 0.10771554, 0.86435829])
wait time (in system): mean=2.33404

The main advantage of using Lindley’s equation is that one does not need a simulator to obtain the queuing time and wait time of the customers. However, Lindley’s equation applies only when the queue is a single-server queue with the FIFO queuing discipline.

2.3. Queuing Theory

The analytical approach to solving queuing models is collectively called the queuing theory. One can apply queuing theory to mathematically find steady-state solutions for a number of queuing models. For example, one can calculate the average wait time of customers for an M/M/1 queue to be \(1/(\mu - \lambda)\), where \(\lambda\) is the arrival rate and \(\mu\) is service rate. The average number of customers in an M/M/1 queue can also be calculated as \(\lambda/(\mu - \lambda)\). In the above example, \(lambda = 1/1.2\) and \(\mu = 1/0.8\), we can calculate the mean wait time to be 2.4 (without using Lindley’s equation or simulation).

The result assume that the queuing system reaches the “steady-state”. We can see from the equation that both the average wait time and the number customers in system would increase as the difference between the service rate and the arrival rate gets smaller. When the arrival rate is equal to or greater than the service rate, the queue becomes “unstable”, meaning that the system will not have a finite solution as both numbers will grow without bound.

The queuing theory is a powerful mathematical tool for people to understand the queuing behaviors. However, the queuing theory can only be applied to a small set of queuing models. These queuing models tend to be simplistic and restrictive. For example, the arrival process and service time need to have well-formed probability distributions; and the queuing theory mostly deals with stead-state behaviors.

Simulation can handle far more complicated situations. A system oftentimes can be modeled as a system of interconnected queues and studied under representative workloads. For example, a computer can be modeled as a system of queues each representing a processor or an I/O device (and possibly having quite complicated queuing disciplines), where jobs circulate among the queues receiving services (for CPU calculation or I/O operation). A computer network can also be models as having many queues in connected devices, such as routers and switches, where data are stored and forwarded. Similarly, a transportation system can be modeled as a network of road segments and intersections as queues, where cars travel through them and experience various delays.

Simulators, like simulus, are designed to deal with these complicated scenarios, where queuing theory meets with its limitations. The focus of this tutorial is to show how to use simulus to model different kinds of queuing systems.

[ ]: