Reza Aghajani, Xingjie Li, K. Ramanan
Dec 19, 2017
Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems
We introduce a new framework for the analysis of large-scale load balancing networks with general service time distributions, motivated by applications in server farms, distributed memory machines, cloud computing and communication systems. For a parallel server network using the so-called $SQ(d)$ load balancing routing policy, we use a novel representation for the state of the system and identify its fluid limit as the number of servers goes to infinity and the arrival rate per server tends to a constant. The fluid limit is characterized as the unique solution to a countable system of coupled partial differential equations (PDEs), which serves to approximate transient Quality of Service parameters such as the expected virtual waiting time and the queue length distribution. In the special case when the service time distribution is exponential, our method recovers the well-known ordinary differential equation characterization of the fluid limit. Furthermore, we develop a numerical scheme to solve the PDEs, and demonstrate the efficacy of the PDE approximation by comparing it with Monte Carlo simulations. We also illustrate how the PDEs can be used to gain insight into the performance of large networks in practical scenarios by analyzing relaxation times in a backlogged network. In particular, our numerical approximation of the PDEs uncovers two interesting properties of relaxation times under the $SQ(2)$ algorithm. Firstly, when the service time distribution is Pareto with unit mean, the relaxation time decreases as the tail becomes heavier. This is a priori counterintuitive, given that for the Pareto distribution, heavier tails have been shown to lead to worse tail behavior in equilibrium. Secondly, for unit mean light-tailed service distributions such as the Weibull and lognormal, the relaxation time decreases as the variance increases. This is in contrast to the behavior observed under random routing, where the relaxation time increases as the variance increases.
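To make the exponential special case concrete, the following is a minimal sketch of the classical mean-field ODE for $SQ(d)$ with unit-rate exponential service, $\dot{s}_k = \lambda (s_{k-1}^d - s_k^d) - (s_k - s_{k+1})$ with $s_0 = 1$, where $s_k$ approximates the fraction of servers with at least $k$ jobs. It is not the numerical scheme for the PDEs developed in the paper. The function names, the truncation of the system at a finite level, the RK4 integrator and the step size are all assumptions made for illustration.

```python
import numpy as np

def sqd_fluid_rhs(s, lam, d):
    """Right-hand side of the SQ(d) mean-field ODE (unit-rate exponential service).

    s[k-1] approximates the fraction of servers with at least k jobs, k = 1..K.
    The truncation level K = len(s) is an illustrative assumption.
    """
    s_prev = np.concatenate(([1.0], s[:-1]))  # boundary condition s_0 = 1
    s_next = np.concatenate((s[1:], [0.0]))   # truncation: s_{K+1} = 0
    return lam * (s_prev**d - s**d) - (s - s_next)

def integrate_fluid(s0, lam, d, t_end, dt=0.01):
    """Integrate the truncated ODE from s0 up to time t_end with classical RK4."""
    s = np.array(s0, dtype=float)
    for _ in range(int(round(t_end / dt))):
        k1 = sqd_fluid_rhs(s, lam, d)
        k2 = sqd_fluid_rhs(s + 0.5 * dt * k1, lam, d)
        k3 = sqd_fluid_rhs(s + 0.5 * dt * k2, lam, d)
        k4 = sqd_fluid_rhs(s + dt * k3, lam, d)
        s = s + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return s

if __name__ == "__main__":
    # Start from an empty system with arrival rate 0.5 per server and d = 2.
    s = integrate_fluid(np.zeros(20), lam=0.5, d=2, t_end=60.0)
    print(s[:4])
```

For $\lambda < 1$ the stationary point of this ODE satisfies $s_k = \lambda s_{k-1}^d$, that is $s_k = \lambda^{(d^k - 1)/(d - 1)}$, which gives a simple check on the integrator. A relaxation time in this setting could be estimated, for example, as the first time the mean queue length `s.sum()` comes within a chosen tolerance of its stationary value.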