(formerly hosted at www.minha.pt)
The correctness and performance of large-scale distributed systems depend on middleware components that perform various communication and coordination functions. Such components are, however, very difficult to assess experimentally, because interesting behavior often arises only in large-scale settings, which require costly and time-consuming distributed deployments with realistic workloads.
Minha virtualizes multiple JVM instances in each JVM while simulating key environment components, reproducing the concurrency, distribution, and performance characteristics of a much larger distributed system. Running multiple instances in each JVM significantly reduces the resources required compared with typical alternatives. Moreover, by virtualizing time using simulation, it reduces the interference that results from competing for shared resources and provides a time reference and control point.
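To make the idea of virtualized time concrete, the following is a minimal sketch of a discrete-event kernel in which the clock jumps directly from one scheduled event to the next, so all simulated instances share a single time reference. The class and method names (VirtualTimeKernel, schedule, run) are hypothetical and chosen for illustration only; they are not Minha's actual API, and the real kernel is considerably more elaborate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical minimal discrete-event kernel; illustrative only, not Minha's API. */
class VirtualTimeKernel {
    /** A scheduled action tagged with the virtual time at which it fires. */
    private static final class Event implements Comparable<Event> {
        final long time;       // virtual nanoseconds
        final long seq;        // tie-breaker: same-time events run in FIFO order
        final Runnable action;

        Event(long time, long seq, Runnable action) {
            this.time = time;
            this.seq = seq;
            this.action = action;
        }

        @Override
        public int compareTo(Event other) {
            int byTime = Long.compare(time, other.time);
            return byTime != 0 ? byTime : Long.compare(seq, other.seq);
        }
    }

    private final PriorityQueue<Event> queue = new PriorityQueue<>();
    private long now = 0;      // current virtual time, shared by all simulated instances
    private long nextSeq = 0;

    long now() {
        return now;
    }

    /** Schedules an action to run after the given virtual delay. */
    void schedule(long delayNanos, Runnable action) {
        if (delayNanos < 0) {
            throw new IllegalArgumentException("negative delay");
        }
        queue.add(new Event(now + delayNanos, nextSeq++, action));
    }

    /** Runs events in timestamp order until none remain; returns the final virtual time. */
    long run() {
        Event e;
        while ((e = queue.poll()) != null) {
            now = e.time;      // the clock jumps straight to the next event
            e.action.run();
        }
        return now;
    }

    public static void main(String[] args) {
        VirtualTimeKernel kernel = new VirtualTimeKernel();
        List<String> log = new ArrayList<>();
        // Simulated host B receives a message after 2 ms and replies 5 ms later.
        kernel.schedule(2_000_000, () -> {
            log.add("B received at " + kernel.now());
            kernel.schedule(5_000_000, () -> log.add("B replied at " + kernel.now()));
        });
        // Simulated host A fires a local timer after 1 ms.
        kernel.schedule(1_000_000, () -> log.add("A timer at " + kernel.now()));
        kernel.run();
        log.forEach(System.out::println);
    }
}
```

Because events are processed strictly in virtual-time order, the outcome does not depend on how the host machine schedules threads, which is what gives the simulation its single control point.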
The application and middleware classes for each instance are loaded by a custom class loader that replaces native libraries and synchronization bytecode with references to simulation models. Most of the code runs unmodified, and time is accounted using the CPU time-stamp counter to closely reproduce true performance characteristics. Some of these simulation models are developed from scratch, while others are produced by translating the native libraries themselves. The resulting code makes use of the simulation kernel and time virtualization to run. Multiple instances are loaded under the control of a command-line user interface and configuration loader.
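The accounting of real execution time into simulated time can be sketched as follows. This is an illustration under stated assumptions, not Minha's implementation: the class MeasuredTimeAccounting and its methods are hypothetical, and System.nanoTime is used as a portable stand-in for the CPU time-stamp counter. Unmodified code is run and its measured cost is charged to the virtual clock, whereas modeled delays (for example a simulated network hop) advance the virtual clock without consuming real time.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch: charges the measured cost of real code to a virtual clock. */
class MeasuredTimeAccounting {
    private final LongSupplier realClock; // assumption: stand-in for the CPU time-stamp counter
    private long virtualNanos = 0;

    MeasuredTimeAccounting(LongSupplier realClock) {
        this.realClock = realClock;
    }

    long virtualNow() {
        return virtualNanos;
    }

    /** Runs unmodified code and advances virtual time by its measured duration. */
    void runMeasured(Runnable code) {
        long start = realClock.getAsLong();
        try {
            code.run();
        } finally {
            // Guard against a clock source that does not advance monotonically.
            virtualNanos += Math.max(0, realClock.getAsLong() - start);
        }
    }

    /** Advances virtual time by a modeled delay that costs no real time. */
    void advanceModeled(long nanos) {
        if (nanos < 0) {
            throw new IllegalArgumentException("negative delay");
        }
        virtualNanos += nanos;
    }

    public static void main(String[] args) {
        MeasuredTimeAccounting acct = new MeasuredTimeAccounting(System::nanoTime);
        StringBuilder sb = new StringBuilder();
        // Real application work: its actual cost is charged to the virtual clock.
        acct.runMeasured(() -> {
            for (int i = 0; i < 100_000; i++) {
                sb.append(i % 10);
            }
        });
        // Simulated 2 ms network latency: advances virtual time only.
        acct.advanceModeled(2_000_000);
        System.out.println("virtual time (ns): " + acct.virtualNow());
    }
}
```

Injecting the clock as a LongSupplier is a design choice made here so that the accounting logic can be exercised deterministically with a fake clock.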
This project is extending the Minha platform towards supporting a wider range of middleware and environments as follows:
Experimental evaluation of middleware usually requires instances of a stub application deployed on multiple hosts to accurately reproduce the impact of distribution and avoid mutual interference. The hardware resources required to assess systems aimed at large scale are thus significant. Moreover, a distributed system is inherently difficult to observe due to the lack of a time reference and central control point.
As a consequence, many middleware components have performance bottlenecks and even outright flaws that are exposed only in production, in large-scale deployments with real workloads. For instance, in small-scale deployments with several instances on the same or on a few physical hosts, concurrency issues might not be exposed, scalability problems (e.g. unstable performance or oscillation) do not arise, and the run-time complexity of embedded algorithms is left unchecked.
The goal of this project is to provide a platform that has the following desirable properties:
Research leading to Minha is described here:
[MMN+19] N. Machado, F. Maia, F. Neves, F. Coelho, and J. Pereira. Minha: Large-Scale Distributed Systems Testing Made Practical. In International Conference on Principles of Distributed Systems (OPODIS), 2019.
[JMM+15] T. Jorge, F. Maia, M. Matos, J. Pereira, and R. Oliveira. Practical evaluation of large scale applications. In Distributed Applications and Interoperable Systems (DAIS, with DisCoTec), Lecture Notes in Computer Science (LNCS). Springer, 2015.
[CBCP11] N. A. Carvalho, J. Bordalo, F. Campos, and J. Pereira. Experimental evaluation of distributed middleware with a virtualized Java environment. In MW4SOC ’11: Proceedings of the 6th Workshop on Middleware for Service Oriented Computing, 2011.
Some results obtained with Minha (current and past versions):
[PMP20] João Pereira, Nuno Machado, Jorge Sousa Pinto. Testing for Race Conditions in Distributed Systems via SMT Solving. In TAP 2020. Lecture Notes in Computer Science, vol 12165. Springer, 2020.
[MMV+14] Francisco Maia, Miguel Matos, Ricardo Vilaça, José Pereira, Rui Oliveira, and Etienne Rivière. Dataflasks: Epidemic store for massive scale systems. In SRDS ’14. IEEE Computer Society, 2014.
[SPS+05] A. Sousa, J. Pereira, L. Soares, A. Correia Jr., L. Rocha, R. Oliveira, and F. Moura. Testing the dependability and performance of group communication based database replication protocols. In IEEE/IFIP International Conference on Dependable Systems and Networks, pages 792–801. IEEE Computer Society, 2005.
[CSS+05] A. Correia Jr., A. Sousa, L. Soares, J. Pereira, F. Moura, and R. Oliveira. Group-based replication of on-line transaction processing servers. In C. Maziero, J. Silva, A. Santos Andrade, and F. Silva, editors, Proc. of Latin-American Symp. Dependable Computing, volume 3747 of Lecture Notes in Computer Science, pages 245–260. Springer Verlag, 2005.