Abstract: A new generation of pre-trained transformer language models has set new state-of-the-art results on many tasks, even exceeding human performance on standard NLU benchmarks. Despite this rapid progress, benchmark-based evaluation has generally relied on downstream task performance as the primary metric, which limits the scope of model comparison and overlooks the computational resources the models require in practical use. This paper presents MOdel ResOurCe COmparison (MOROCCO), a publicly available framework (https://github.com/RussianNLP/MOROCCO/) for assessing models with respect to their downstream quality combined with two computational efficiency metrics: memory consumption and throughput during the inference stage. The framework supports flexible integration with popular leaderboards compatible with the jiant environment, e.g. SuperGLUE. We demonstrate the applicability of MOROCCO by evaluating ten transformer models on two multi-task GLUE-style benchmarks in English and Russian and provide an analysis of the models.