
RankMe: Assessing the Downstream Performance of Pretrained Self-Supervised Representations by Their Rank

Quentin Garrido, Randall Balestriero, Laurent Najman, Yann LeCun

Abstract

Joint-Embedding Self-Supervised Learning (JE-SSL) has seen rapid development, with the emergence of many method variations but only few principled guidelines that would help practitioners successfully deploy them. The main reason for that pitfall comes from JE-SSL's core principle of not employing any input reconstruction, therefore lacking visual cues of unsuccessful training. Adding non-informative loss values to that, it becomes difficult to deploy SSL on a new dataset for which no labels can help to judge the quality of the learned representation. In this study, we develop a simple unsupervised criterion that is indicative of the quality of the learned JE-SSL representations: their effective rank. Albeit simple and computationally friendly, this method, coined RankMe, allows one to assess the performance of JE-SSL representations, even on different downstream datasets, without requiring any labels. A further benefit of RankMe is that it does not have any training or hyper-parameters to tune. Through thorough empirical experiments involving hundreds of training episodes, we demonstrate how RankMe can be used for hyperparameter selection with nearly no reduction in final performance compared to the current selection method that involves a dataset's labels. We hope that RankMe will facilitate the deployment of JE-SSL towards domains that do not have the opportunity to rely on labels for representations' quality assessment.


1 Meta AI - FAIR 2 Univ Gustave Eiffel, CNRS, LIGM, F-77454 Marne-la-Vallée, France 3 Courant Institute, New York University 4 Center for Data Science, New York University. Correspondence to: Quentin Garrido garridoq@meta.com.

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).

1 Introduction

Self-supervised learning (SSL) has shown great progress in learning informative data representations in recent years (Chen et al., 2020a; He et al., 2020; Chen et al., 2020b; Grill et al., 2020; Lee et al., 2021; Caron et al., 2020; Zbontar et al., 2021; Bardes et al., 2021; Tomasev et al., 2022; Caron et al., 2021; Chen et al., 2021; Li et al., 2022b; Zhou et al., 2022a;b; HaoChen et al., 2021; He et al., 2022), catching up to supervised baselines and even surpassing them in few-shot learning, i.e., when evaluating the SSL model from only a few labeled examples. Although various SSL families of losses have emerged, most are variants of the joint-embedding (JE) framework with a siamese network architecture (Bromley et al., 1994), denoted as JE-SSL for short. The only technicality we ought to introduce to make our study precise is the notation used to denote an input's representation. In short, JE-SSL often composes a backbone or encoder network, e.g., a ResNet-50, and a projector network, e.g., a multilayer perceptron. The projector is only employed during training, and we refer to its outputs as embeddings, while the actual input representations employed for downstream tasks are obtained at the encoder's output.

Although the downstream-task performance of JE-SSL representations might seem impressive, one pondering fact should be noted: all existing methods, hyperparameters, and models, and thus performances, are obtained by manual search involving the labels of the considered datasets. In other words, JE-SSL is tuned by monitoring the supervised performance of the model at hand. Therefore, successfully deploying an SSL model on a new dataset relies on the strong assumption of having labels on that dataset to tune the SSL method, e.g., through a linear classifier trained on the JE-SSL representations (Misra & Maaten, 2020). This quality assessment strategy was also extended to nonlinear classifiers, e.g., a k-NN classifier (Wu et al., 2018; Zhuang et al., 2019). Hence, although labels are not directly employed to compute the weight updates, they are used as a proxy. This limitation prevents the deployment of JE-SSL in challenging domains where the number of available labelled examples is limited. Adding to the challenge, one milestone of JE-SSL is to move away from reconstruction-based learning; hence, without labels and without visual cues, tuning JE-SSL methods on unlabeled datasets remains challenging. This led to feature inversion methods, e.g., Deep Image Prior (Ulyanov et al., 2018) or conditional diffusion models (Bordes et al., 2021), being deployed onto learned JE-SSL representations to try to visualize the learned features. Those alternative visualization solutions however suffer from their own limitations, e.g., the bias of the method used or its computational cost. More importantly, those feature inversion strategies have been designed for natural images, i.e., it is not clear how such methods would perform on different data modalities.

In this study we propose RankMe, a simple method to assess a model's performance without access to any labels and without any training or tuning. RankMe accurately predicts a model's performance both In-Distribution (ID), i.e., on the same data distribution as used during JE-SSL training, and Out-Of-Distribution (OOD), i.e., on a different data distribution onto which the learned model is deployed. We highlight this crucial property at the top of Figure 1. The strength of RankMe lies in the fact that it is solely based on the singular value distribution of the learned embeddings, which is not only simple to obtain but also easy to interpret. In fact, RankMe's motivation hinges on Cover's theorem (Cover, 1965), which states how increasing the rank of a linear classifier's input increases its training performance, and on three simple hypotheses that we thoroughly validate empirically at the end of our study. Since RankMe provides a step towards (unlabeled) JE-SSL by allowing practitioners to cross-validate hyperparameters and select models without resorting to labels or feature inversion methods, we hope that it will allow JE-SSL to move away from using labels as part of their design search strategy. We summarize our contributions below:

  1. We introduce RankMe (Equation (1)) and motivate its construction from first principles (Section 5), e.g., Cover's theorem.
  2. We demonstrate that RankMe's ability to inform about JE-SSL downstream performance is consistent across methods, e.g., VICReg, SimCLR, DINO, and their variants, and across architectures, e.g., using a projector network and/or a nonlinear evaluation method (see Figure 2 and Section 3.3).
  3. We demonstrate that RankMe enables hyperparameter cross-validation for JE-SSL methods; RankMe is able to retrieve, and sometimes surpass, most of the performance previously found by manual, label-guided search while not employing any labels, on both in-domain and out-of-domain datasets (Figure 1 and Tables 1 and 2).

We provide a hyperparameter-free, numerically stable implementation of RankMe in Section 3.1 and pseudo-code for cross-validation in Figure 4. Through extensive experiments involving 11 datasets and 110 models over 5 methods, we demonstrate that in the linear and nonlinear probing regimes, RankMe is able to tell apart successful and sub-optimal JE-SSL training, even on different downstream tasks, without having access to labels or downstream task data samples.

2 Background

Joint embedding self-supervised learning (JE-SSL). In JE-SSL, two main families of methods can be distinguished: contrastive and non-contrastive. Contrastive methods (Chen et al., 2020a; He et al., 2020; Chen et al., 2020b; 2021; Yeh et al., 2021) mostly rely on the InfoNCE criterion (Oord et al., 2018), except for (HaoChen et al., 2021) which uses squared similarities between the embeddings. A clustering variant of contrastive learning has also emerged (Caron et al., 2018; 2020; 2021) and can be thought of as contrastive learning between cluster centroids instead of samples. Non-contrastive methods (Grill et al., 2020; Chen & He, 2020; Caron et al., 2021; Bardes et al., 2021; Zbontar et al., 2021; Ermolov et al., 2021; Li et al., 2022c) aim at bringing together embeddings of positive samples, similar to contrastive learning. However, a key difference with contrastive learning lies in how those methods prevent a representational collapse. In the former, the criterion explicitly pushes negative samples, i.e., all samples that are not positive, away from each other. In the latter, the criterion does not prevent collapse by distinguishing positive and negative samples, but instead considers the embeddings as a whole and encourages information content maximization, e.g., by regularizing the empirical covariance matrix of the embeddings. Such a categorization is not needed for our development, and we thus refer to any of the above methods as JE-SSL.

Known Observations About Representations' Spectrum in JE-SSL. The phenomenon of learning rank-deficient, or dimensionally collapsed, embeddings in JE-SSL has recently been studied from both a theoretical and empirical point of view. The empirical emergence of dimensional collapse was studied in (Hua et al., 2021), where the use of a whitening batch normalization layer was proposed to help alleviate it. In (Jing et al., 2022), a focus on contrastive approaches in a linear setting enabled a better understanding of dimensional collapse and the role of augmentations in its emergence. Performance in a low label regime of a partially collapsed encoder can also be improved by forcing the whitening of its output, as shown in (He & Ozay, 2022). Furthermore, it was shown in (Balestriero & LeCun, 2022) how dimensional collapse is a phenomenon that should not necessarily happen in theory and how its emergence is mostly due to practical concerns. Interestingly, we will see through the lens of RankMe that dimensional collapse is tightly linked with the quality of the representation. In supervised learning, the collapse of the embeddings was also studied and found to be detrimental to performance (Ganea et al., 2019).

Figure 1. Top: Performance of JE-SSL representations (encoder output) on the y-axis against the embeddings' (projector output) RankMe values on the x-axis on ImageNet-1k. Except for some degenerate solutions at full rank, RankMe values correlate well with in-distribution (left column) and out-of-distribution (right columns) classification performance. Bottom: Hyperparameter selection using the common supervised linear probe strategy, α-ReQ, and the proposed unsupervised RankMe strategy. Values in bold represent the best performance between RankMe and α-ReQ. OOD indicates the average performance over all the considered datasets other than ImageNet. Without any labels, optimization, or parameters, RankMe is able to recover most of the performance obtained by using ImageNet's validation set, highlighting its strength as a hyperparameter selection tool. RankMe also outperforms α-ReQ on average and does not suffer from as big performance drops in the worst cases.


As such, existing studies have started to informally prescribe choosing representations that exhibit less collapse; yet no formal study of the ability of this recipe to actually identify successfully trained models, nor of how to quantify the amount of collapse to improve representations, has been proposed; this is the goal of our study.

3 RankMe Consistently Predicts Downstream Performance From Representations

The goal of this section is to introduce and motivate RankMe while providing a numerically stable implementation. We defer a theoretical justification to Section 5. To ease notation, we refer to the (train) dataset used to obtain the JE-SSL model as the source dataset, and to the test set of the same dataset or of a different OOD dataset as the target dataset.

3.1 RankMe: A Simple Method and Its Implementation

The most crucial step of RankMe is the estimation of the embeddings' rank. A trivial solution would be to count the number of nonzero singular values. Denoting by $\sigma_k$ the $k$-th singular value of the $(N \times K)$ embedding matrix $Z$, this would lead to $\operatorname{rank}(Z) = \sum_{k=1}^{\min(N,K)} 1\{\sigma_k > 0\}$. However, such a definition is too rigid for practical scenarios. For example, round-off error alone could have a dramatic impact on the rank estimate. Instead, alternative and robust rank definitions have emerged (Press et al., 2007), such as $\operatorname{rank}(Z) = \sum_{k=1}^{\min(N,K)} 1\{\sigma_k > \max_i \sigma_i \times \max(N,K) \times \epsilon\}$, where $\epsilon$ is a small constant dependent on the data type, typically $10^{-7}$ for float32. An alternative measure of rank comes from a probabilistic viewpoint where the singular values are normalized to sum to 1 and the Shannon entropy (Shannon, 1948) is used, which corresponds to our definition of RankMe in Equation (1). We thus introduce RankMe formally as the following smooth rank measure, originally introduced in (Roy & Vetterli, 2007),

$$
\mathrm{RankMe}(Z) = \exp\left(-\sum_{k=1}^{\min(N,K)} p_k \log p_k\right), \quad \text{with } p_k = \frac{\sigma_k(Z)}{\|\sigma(Z)\|_1} + \epsilon, \qquad (1)
$$

where Z is the source dataset's embedding matrix. As opposed to the classical rank, Equation (1) does not rely on specifying an exact threshold at which a singular value is treated as nonzero. Throughout our study, we employ Equation (1), and provide the matching analysis with the classical rank in the appendix. Another benefit of Equation (1) is that, in addition to the rank, it quantifies the whitening of the embeddings, which is known to simplify optimization of (non)linear probes put on top of them (Santurkar et al., 2018). Lastly, although Equation (1) is defined with the full embedding matrix Z, we observe that not all of the samples need to be used to obtain an accurate estimate of RankMe. In practice, we use 25,600 samples, as the ablation studies provided in Appendix G and Figure S11 indicate that this provides a highly accurate estimate. RankMe should however only be used to compare different runs of a given method, since the embeddings' rank is not the only factor that affects performance.
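For concreteness, here is a minimal PyTorch sketch of both rank estimators discussed above: the classical thresholded rank and the smooth rank of Equation (1). It is an illustration rather than the reference implementation; the float32 tolerance and the 25,600-sample count follow the values mentioned in the text.

```python
import torch

def classical_rank(Z: torch.Tensor, eps: float = 1e-7) -> int:
    """Threshold-based rank of an (N, K) embedding matrix Z."""
    s = torch.linalg.svdvals(Z)
    # Singular values below a tolerance proportional to the largest one
    # are treated as numerical zeros.
    tol = s.max() * max(Z.shape) * eps
    return int((s > tol).sum())

def rankme(Z: torch.Tensor, eps: float = 1e-7) -> float:
    """Smooth rank of Equation (1): exponential of the Shannon entropy of
    the normalized singular value distribution."""
    s = torch.linalg.svdvals(Z)
    p = s / s.sum() + eps
    return torch.exp(-(p * torch.log(p)).sum()).item()

# Example on random embeddings standing in for 25,600 projector outputs.
Z = torch.randn(25600, 2048)
print(classical_rank(Z), rankme(Z))
```

In practice the embeddings would be gathered with a single forward pass over a fixed subset of the source dataset's training images.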

Relation of RankMe To Existing Solutions. Performance evaluation without labels can also be done using a pretext task, such as rotation prediction. This technique helped in selecting data augmentation policies in (Reed et al., 2021). One limitation lies in the need to select and train the classifier of the pretext task, and in the strong assumption that rotations were not part of the transformations one aimed to be invariant to. Since (supervised) linear evaluation is the most widely used evaluation method, we focus on showing how RankMe compares with it. In (Li et al., 2022a), it is shown that the eigenspectrum of representations can be used to assess performance when used in conjunction with the loss value. This requires training an additional classifier to predict the performance and as such is not usable as is in a completely unsupervised fashion. Most related to us is (Ghosh et al., 2022), where representations are evaluated by their eigenspectrum decay, giving a baseline for unsupervised hyperparameter selection. α-ReQ relies on strong assumptions, and when they hold, RankMe and α-ReQ can match, but we show that we outperform it on average. In fact, the assumptions made by α-ReQ are known not to hold in the presence of collapse (He & Ozay, 2022). We investigate α-ReQ's behavior in detail in Appendix E.

3.2 RankMe Predicts Linear Probing Performance Even on Unseen Datasets

In order to empirically validate RankMe, we compare it to linear evaluation, which is the default evaluation method for JE-SSL methods. Finetuning has gained in popularity with Masked Image Modeling methods (He et al., 2021), but it can have a significant impact on the properties of the embeddings and alters what was learned during pretraining. As such, we do not focus on this evaluation.

Experimental Methods and Datasets Considered. In order to provide a meaningful assessment of the impact of the embeddings' rank on performance, we focus on 5 JE-SSL methods. We use SimCLR as a representative contrastive method, VICReg as a representative covariance-based method, and VICReg-exp and VICReg-ctr, which were introduced in (Garrido et al., 2022). We also include DINO (Caron et al., 2021) as a clustering approach. Applying RankMe to DINO is not as straightforward due to the clustering layer in the projector, so embeddings have to be taken right before the last projector layer; see Appendix C for more details. To make our work self-contained, we present the methods in Appendix A. We chose to use VICReg-exp and VICReg-ctr as they provide small modifications to VICReg and SimCLR while producing embeddings with different rank properties. For each method we vary parameters that directly influence the rank of the embeddings, whether it is the temperature used in softmax-based methods, which directly impacts the hardness of the softmax, or the loss weights, which give more or less importance to the regularizing terms of the loss functions. We also vary optimization parameters such as the learning rate and weight decay to provide a more complete analysis. We provide the hyperparameters used for all experiments in Appendix K. All approaches were trained in the same experimental setting with a ResNet-50 (He et al., 2016) backbone and an MLP projector with intermediate layers of size 8192, 8192, 2048, which avoids any architectural rank constraints. The models were trained for 100 epochs on ImageNet with the LARS (You et al., 2017; Goyal et al., 2017) optimizer. DINO was also trained using multi-crop.

In order to evaluate the methods, we use ImageNet (our source dataset), as well as iNaturalist18 (Horn et al., 2018), Places205 (Zhou et al., 2014), EuroSat (Helber et al., 2019), SUN397 (Xiao et al., 2010), and StanfordCars (Krause et al., 2013) to evaluate the trained models on unseen datasets. While we focus on these datasets for our visualizations, we also include CIFAR10, CIFAR100 (Krizhevsky et al., 2009), Food101 (Bossard et al., 2014), VOC07 (Everingham et al.) and CLEVR-count (Johnson et al., 2017) for our hyperparameter selection results, and provide matching visualizations in Appendix D. These commonly used datasets provide a wide range of scenarios that differ from ImageNet and provide meaningful ways to test the robustness of RankMe. For example, iNaturalist18 consists of 8142 classes focused on fauna and flora, which requires more granularity than the related classes in ImageNet; SUN397 focuses on scene understanding, deviating from the single-object, object-centric images of ImageNet; and EuroSat consists of satellite images, which again differ from ImageNet. Datasets such as iNaturalist can also allow theoretical limitations to manifest themselves more clearly, since their number of classes is significantly higher than the rank of the learned representations.

Figure 2. Validation of RankMe when evaluating performance on representations. We see that having a high rank is a necessary condition for good downstream performance.


In order to evaluate on those datasets, we rely on the VISSL library (Goyal et al., 2021). We provide complete details on the pretraining and evaluation setup in Appendix I.


As we can see in Figures 1 and 2, for a given method the performance on the representations improves with a higher embedding rank, whether we look at ImageNet, on which the models were pretrained, or at downstream datasets. This is best seen when looking at DINO, where we notice a clear trend across all datasets. On EuroSat, the relationship is not clear since the performances are so close between all models. When looking at VICReg on StanfordCars, we can clearly see that a high rank is only a necessary condition: here the best performance is not achieved with the highest rank, even if full-rank embeddings still achieve good performance. We discuss the link between rank, number of classes, and performance in Section 5 to give some insights into RankMe's behavior in settings with few classes such as StanfordCars. It is also tempting to draw conclusions when comparing different approaches, especially when looking at the ImageNet performance; however, since dimensional collapse is not the only factor deciding performance, one should refrain from doing so.

Figure 3. Impact of rank on performance on other architectures and evaluation protocols. (Left) Using a 3 layer MLP as classification head does not alter the performance before or after the projector, showing that RankMe can go beyond linear evaluation. (Right) The same conclusion holds for k-NN evaluation on ImageNet, where RankMe remains a good indicator of performance.


3.3 RankMe Also Holds for Non-linear Probing

While we have been focusing on linear evaluation, one can wonder if the behaviors change when using a more complex task-related head. We thus give some evidence that the previously observed behaviors are similar with a non-linear classification head. We use a simple 3-layer MLP with intermediate dimensions 2048, where each layer is followed by a ReLU activation. This choice of dimensions ensures that there are no architectural rank constraints on the embeddings. We focus on SUN397 for its conceptual difference from ImageNet. The low rank of embeddings produced by SimCLR would suggest that a non-linear classifier might help improve performance, since it is not as theoretically limited by the embeddings' rank as in the linear setting. However, we can see in Figure 3 that the behaviors for all methods are the same as in the linear regime. This suggests that RankMe is also a suitable metric to evaluate downstream performance in a non-linear setting. We perform the same analysis using a k-NN classifier, following the protocol of (Zhuang et al., 2019; Caron et al., 2020), where we use 36 combinations of k and temperature and report the best performance. We see in Figure 3 that RankMe remains a good predictor of downstream performance, with curves that are similar to what was observed with a linear classifier. Since a k-NN classifier evaluates the preservation of the Euclidean distance instead of linear separability, the results suggest that RankMe can extend to more evaluation protocols.
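As an illustration of the non-linear head used here, a sketch of the 3-layer MLP probe described above is given below; the hidden dimensions follow the text, while the output size (SUN397's 397 classes) and the exact placement of the activations are assumptions.

```python
import torch.nn as nn

def mlp_probe(in_dim: int = 2048, hidden: int = 2048, num_classes: int = 397) -> nn.Module:
    """3-layer MLP head trained on top of frozen representations."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(inplace=True),
        nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
        nn.Linear(hidden, num_classes),
    )
```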

4 RankMe for Label-Free Cross-Validation

We previously focused on validating RankMe by comparing it to linear evaluation in terms of overall performance. In this section we focus on the evolution of rank and performance when varying one hyperparameter at a time, in order to demonstrate how RankMe can be used for hyperparameter selection. We focus on loss-specific hyperparameters, such as the loss weights or temperature, as well as hyperparameters related to optimization, such as the learning rate and weight decay.

Figure 4. (Left) Algorithm describing how to use RankMe for hyperparameter selection. We select either the highest rank model, or if there are multiple ones, the one with the minimal/maximal value achieving it. (Right) Visual example of the hyperparameter selection applied to SimCLR's temperature and learning rate. The star indicates the value that is selected using RankMe, and the triangle the one with the ImageNet oracle. Notice the high rank of oracle selected models.



As we have shown before, having a higher rank is necessary for better performance, and using RankMe to find the best value of a hyperparameter is as simple as choosing the value that leads to the highest rank, as illustrated in Figure 4. Certain hyperparameters will lead to plateaus of equal rank, and for those the value that first achieves the maximal value of RankMe should be selected. This second part is however only applicable when hyperparameter values can be ordered.

Even in cases where the values cannot be compared, and equal ranks are found in different settings, this still makes it possible to discard some runs and only focus on the ones that achieve the maximal rank. This further highlights how maximal rank is only a necessary condition for good performance. Nonetheless, when the hyperparameters are ordered we can go one step further and use the rank alone to find a good hyperparameter value.
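In code, the selection rule amounts to a few lines. The sketch below follows the prose above (keep the run that first reaches the maximal RankMe value when candidates are ordered by increasing hyperparameter value); it is a simplified rendering of Algorithm 1 in Figure 4, whose tie-breaking clause is stated slightly differently.

```python
def select_by_rankme(ranks: list) -> int:
    """Index of the run to keep, given RankMe values of runs ordered by
    increasing hyperparameter value."""
    best = 0
    for i in range(1, len(ranks)):
        # Strictly higher rank wins; on a plateau of equal rank the earlier
        # run is kept, i.e., the first value reaching the maximal rank.
        if ranks[i] > ranks[best]:
            best = i
    return best

# Example: RankMe values observed when sweeping a temperature from low to high.
print(select_by_rankme([512.3, 1800.7, 1805.2, 1805.2, 900.4]))  # -> 2
```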

Experiments

In order to demonstrate the effectiveness of RankMe for hyperparameter selection, we apply the algorithm presented in Figure 4 to find the best values for a given set of hyperparameters for VICReg, SimCLR and DINO. Our focus is on the covariance and invariance weights in VICReg, the temperature in SimCLR, the learning rate and weight decay for both, and the student and teacher temperatures in DINO. We compare the performance on ImageNet, as well as the average performance on the previously discussed OOD datasets, to models selected by their ImageNet top-1 accuracy on its validation set. For per-dataset performance, see Appendix J.

On the embeddings. As we can see in Table 1, using RankMe we are able to retrieve most of the performance on ImageNet, with gaps lower than half a point on average. It is not possible to beat the selection using ImageNet's validation set, since this is the metric we are evaluating. However, on OOD datasets we are able to improve the performance in certain settings, while having similar performance on average. Thus, when comparing performance after the projector, RankMe is the better approach of the two to select the hyperparameters that will generalize best to unseen datasets. When comparing to α-ReQ, RankMe achieves better in-domain performance; on OOD datasets α-ReQ performs slightly better on average, though with bigger worst-case performance gaps. We provide an in-depth analysis of α-ReQ in Appendix E, where we find that the power-law prior of α-ReQ fails on the embeddings, and as such those results must be interpreted with care. As pointed out in (Girish et al., 2022), using ImageNet performance to select models can lead to suboptimal performance on downstream tasks, which our results further confirm, reinforcing the need for a new way of selecting hyperparameters.

On the representations. When looking at performance before the projector in Figure 1, we can see that RankMe does not beat the models selected with ImageNet's validation set, even on OOD datasets. However, RankMe performs better than α-ReQ in most settings, while not suffering from as severe drops in the worst cases. Nevertheless, the gaps between RankMe and the ImageNet oracle are less than half a point on average, which shows how competitive RankMe can be for hyperparameter selection, despite using no labeled data, having no parameters to tune, and being computable in a couple of minutes.

iNat-18 pretraining. To show how our results extend beyond ImageNet pretraining, we applied the same protocol but pretrained our models on iNat-18.

Table 1. Top-1 accuracies obtained on the embeddings by doing hyperparameter selection using ImageNet validation performance, α -ReQ or RankMe. OOD indicates the average performance over all the considered datasets other than ImageNet.

Table 2. Using RankMe on networks pretrained on iNat-18. We see that RankMe can improve OOD performance for VICReg, but leads to a small drop for SimCLR.

For these experiments, we only compare SimCLR's temperature and VICReg's covariance weight. Due to the high number of classes in iNat-18, we chose a projector with output dimension 8192. Since the rank cannot be higher than 2048, we apply a threshold so as not to choose the highest rank but the highest realistically achievable one; see Appendix B for more details. We also compare RankMe to selection based on ImageNet performance, to imitate a practical setting where we do not have labels for our source dataset but have access to labels for another, related one. As we can see in Table 2, for VICReg's covariance weight, RankMe leads to performance similar to the iNat-18 oracle on iNat-18, but slightly outperforms it on OOD datasets. It also beats the ImageNet oracle and α-ReQ by a significant margin. On SimCLR's temperature, we notice a small drop in performance for RankMe compared to the oracles, but it still outperforms α-ReQ by a significant margin in all settings. These results further reinforce the use of RankMe in general settings, even beyond ImageNet.

Finetuning-based benchmarks. While we have studied how RankMe is able to perform hyperparameter selection when targeting linear evaluation, finetuning-based evaluations are also popular for tasks such as semi-supervised classification or object detection. Even though this setup alters the pretrained weights and thus can change the rank of the representations, our goal is to see whether RankMe can still be used when targeting these evaluations.

Table 3. Using RankMe on finetuning-based benchmarks. The ImageNet oracle is the linear evaluation oracle. In the semi-supervised setting we report the top-1 accuracy, and for object detection we report the AP50. We see that in the semi-supervised setting on ImageNet, RankMe only leads to small drops in performance compared to the task or full-ImageNet oracle. For object detection we even see matching or increased performance over the ImageNet oracle.

We compare against the task oracle, α-ReQ, and the ImageNet linear evaluation oracle, which allows us to see whether linear accuracy on ImageNet is correlated with performance on these benchmarks. We evaluate all methods on ImageNet 1% and 10% for semi-supervised classification, as well as on PascalVOC07+12 for object detection, following the protocol of (Bardes et al., 2021). As we can see in Table 3, RankMe is able to retrieve most of the performance of the task oracle, except in the case of SimCLR's temperature on ImageNet-1%, where all methods lag behind the task oracle. This also shows that linear performance on the full ImageNet dataset is not perfectly correlated with performance in a few-shot setting. There is no clear winner between α-ReQ and RankMe in these finetuning-based evaluations, but we can see smaller drops in performance for RankMe, similarly to previous experiments. Nevertheless, these results suggest that even in a setting where the applicability of RankMe is not guaranteed due to the finetuning, it can still be a good method to select hyperparameters in an unsupervised fashion.

Figure 5. Validation of the hypotheses motivating RankMe. (Left, Middle Left) Embeddings' rank transfers from source to target datasets. The estimates use 25600 images from the respective datasets. (Middle Right) Train and test accuracy are highly correlated across datasets. (Right) An increase in performance on embeddings leads to an increase in performance on representations.



5 RankMe: From Theory to Implementation

Our goal is to build a theoretically grounded intuition for the construction of RankMe. To that end, we first quantify the approximation and classification errors of learned embeddings as a function of their rank, and then motivate how the embeddings' rank can be sufficient to compare the test performance of JE-SSL models' representations.

From Source Embeddings' Rank to Target Representations' Performance. We first build some intuition in the regression setting. In this case, the Eckart-Young-Mirsky theorem (Eckart & Young, 1936) ties the best-case and worst-case approximation error of any target matrix $Y \in \mathbb{R}^{N \times C}$ by a rank-$R$ matrix $P \in \mathbb{R}^{N \times C}$ to the singular values of $Y$ that run from $R+1$ to the rank of $Y$, when ordered in decreasing order. Without loss of generality, we only consider the case $N > C$ in this study, i.e., we have more samples than dimensions. Formally, this provides a lower bound on the approximation error:

$$
\| Y - P \|_F^2 \;\geq\; \sum_{r=R+1}^{C} \sigma_r^2(Y),
$$

which is tight for $P$ of rank $R$, and with $\sigma_k$ the operator returning the $k$-th singular value of its argument, ordered in decreasing order. This result, on which RankMe relies, demonstrates that a necessary (but not sufficient) condition for an approximation $P$ to approximate $Y$ well is to have at least the same rank as $Y$. A similar result can be obtained in classification by considering multiple one-vs-all classifiers. In practice, however, we commonly employ a linear probe network on top of given embeddings $Z$ to best adapt them to the target $Y$, i.e., $P = ZW + \mathbf{1}b^{\top}$. However, a linear transformation is not able to increase the rank of the input matrix (beyond the one extra dimension contributed by the bias term), since

$$
\operatorname{rank}\!\left(ZW + \mathbf{1} b^{\top}\right) \;\leq\; \min\!\left(\operatorname{rank}(Z) + 1,\; C\right).
$$

We directly obtain that $\min_{W, b} \| Y - ZW - \mathbf{1} b^{\top} \|_F^2 \geq \sum_{r=R+1}^{C} \sigma_r^2(Y)$. In short, the approximation lower bound is not improved by allowing a linear transformation of the embeddings. Further supporting the above, we ought to recall Cover's theorem (Cover, 1965), which states that the probability of a randomly labeled set of points being linearly separable only increases if $N$ is reduced or $R$ is increased. We combine these results below.

Proposition 5.1. The maximum training accuracy of given embeddings in linear regression or classification increases with their rank. For classification, it plateaus when the rank surpasses the number of classes.

By noticing that RankMe provides a smooth measure of the embeddings' rank, we can lean on Proposition 5.1 to see that, given two models, the one with the greater RankMe value will have greater training performance. This is only guaranteed for different models of the same method, since the embedding rank is not necessarily the only factor that affects performance.
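The approximation argument above can be checked numerically; the sketch below uses random matrices and NumPy's least-squares solver purely as an illustration, with the rank of the embeddings playing the role of R.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, C = 512, 32, 64                      # samples, embedding dim, targets
Y = rng.standard_normal((N, C))            # target matrix
Z = rng.standard_normal((N, K))            # embeddings, rank K almost surely

# Best linear probe (bias absorbed by appending a constant column).
Za = np.hstack([Z, np.ones((N, 1))])
W, *_ = np.linalg.lstsq(Za, Y, rcond=None)
residual = np.linalg.norm(Y - Za @ W) ** 2

# The fit has rank at most K + 1, so Eckart-Young lower-bounds the residual
# by the energy of Y's singular values beyond the K + 1 leading ones.
s = np.linalg.svd(Y, compute_uv=False)
bound = np.sum(s[K + 1:] ** 2)
print(residual >= bound, residual, bound)  # True: the bound holds
```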

The above result is however not yet practical, since what we are truly interested in is (i) performance on unseen samples, i.e., on the test set and on out-of-distribution tasks, and (ii) performance on the representations and not on the embeddings, since it is common to discard the projector network of JE-SSL models. Below, we validate three key hypotheses which, when verified, imply that we can extend the reach of RankMe such that the (OOD) test performance of JE-SSL representations increases when RankMe's value on their train-set embeddings increases.

Validating RankMe's Hypotheses. The development of RankMe is theoretically grounded when it comes to guaranteeing improved embedding performance on the source dataset. To empirically extend it to representation performance on target datasets, we need to verify three hypotheses: (i) linear probes do not overfit, (ii) embedding and representation performance are monotonically linked, and (iii) source and (OOD) target embedding ranks are monotonically linked. Due to the different nature of the datasets used for downstream tasks, there is no inherent reason for the rank of the embeddings to transfer to them in a monotonic way. However, if the source dataset is diverse enough and the target datasets have some semantic overlap with the source dataset, then we have

$$
\mathrm{RankMe}\!\left(Z_{\text{source}}\right) \leq \mathrm{RankMe}\!\left(Z'_{\text{source}}\right) \;\Longrightarrow\; \mathrm{RankMe}\!\left(Z_{\text{target}}\right) \leq \mathrm{RankMe}\!\left(Z'_{\text{target}}\right),
$$
for any two trained models, where $Z$ and $Z'$ denote their respective embeddings on the corresponding dataset.

We observe in Section 3.2 and Figure 5 that the rank of JE-SSL representations scales linearly between different input distributions, e.g., going from a source task such as ImageNet (Deng et al., 2009) to a target task such as iNaturalist. This is further confirmed by Pearson correlation coefficients greater than 0.99. Interestingly, we observe that the StanfordCars dataset exhibits a less distinctive linear scaling due to its distribution having a small overlap with ImageNet. This indicates that, as long as the source dataset is relatively diverse, using RankMe to select a model with a greater embedding rank will also correspond to selecting a model with a greater embedding rank on the target dataset.
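Checking this hypothesis only requires correlating the two series of rank estimates across checkpoints; the sketch below shows the computation on hypothetical RankMe values (the numbers are made up for illustration).

```python
import numpy as np

# Hypothetical RankMe values for the same set of checkpoints, estimated from
# 25,600 source (e.g., ImageNet) and target (e.g., iNaturalist18) embeddings.
source_ranks = np.array([410.0, 760.5, 1180.2, 1530.9, 1795.4])
target_ranks = np.array([275.1, 508.3, 801.7, 1020.5, 1215.8])

pearson = np.corrcoef(source_ranks, target_ranks)[0, 1]
print(f"Pearson correlation: {pearson:.3f}")
```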

Furthermore, as the train performance increases, so does the test performance. We validate this in the middle right of Figure 5. As a result, using RankMe to select a model with greater train performance is enough to also select a model with greater test performance.

Finally, we report on the right of Figure 5 that the performance on embeddings and representations scales almost monotonically. These results are supported by visualizations of embeddings and representations from feature inversion models (Bordes et al., 2021). Hence, using RankMe to select the model maximizing the performance on the former also selects a model maximizing performance on the latter. With these hypotheses validated empirically, we can confidently say that RankMe computed on the embeddings of the source dataset is a predictor of representations' performance on target datasets, reinforcing our experimental insights.

6 Conclusion

We have shown how the phenomenon of dimensional collapse in self-supervised learning can be used as a powerful metric to evaluate models. By using a theoretically motivated analogue of the rank of the embeddings, we show that the performance on downstream datasets can easily be assessed by only looking at the training dataset, without any labels, training, or parameters. While our work focuses on linear classification, we show promising results in non-linear classification that raise the question of how general this simple metric can be. Furthermore, its competitiveness with traditional oracle-based hyperparameter selection methods makes it a promising tool in settings where labels are scarce, such as in the case of large uncurated datasets. As such, this work makes a step towards completely label-free self-supervised learning, as most existing approaches' hyperparameters are tuned with the help of ImageNet's validation set. Further work will explore the use of RankMe in more varied scenarios, to further legitimize its use in designing better self-supervised approaches.

Acknowledgments

The authors wish to thank Li Jing, Grégoire Mialon, Adrien Bardes, and Yubei Chen in no particular order, for insightful discussions. We also thank Florian Bordes for the efficient implementations that were used for our experiments.

References

Balestriero, R. and LeCun, Y. Contrastive and noncontrastive self-supervised learning recover global and local spectral embedding methods. arXiv preprint arXiv:2205.11508 , 2022.

Bossard, L., Guillaumin, M., and Van Gool, L. Food-101 mining discriminative components with random forests. In European Conference on Computer Vision , 2014.

Bromley, J., Guyon, I., LeCun, Y., Sackinger, E., and Shah, R. Signature verification using a 'siamese' time delay neural network. In NeurIPS , 1994.

Caron, M., Bojanowski, P., Joulin, A., and Douze, M. Deep clustering for unsupervised learning of visual features. In ECCV, 2018.

Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. A simple framework for contrastive learning of visual representations. In ICML , pp. 1597-1607. PMLR, 2020a.

A. Background

In order to make our work as self-contained as possible, we recall the loss functions of the methods we study. For conciseness, we refer to the outputs of the encoder as representations and to the outputs of the projection head as embeddings, which we denote by $z_i \in \mathbb{R}^d$. We first briefly recall that the SimCLR loss is given by

$$
\mathcal{L}_{\mathrm{SimCLR}} = - \sum_{(i,j) \in P} \log \frac{\exp\left(\operatorname{sim}(z_i, z_j)/\tau\right)}{\sum_{k=1,\, k \neq i}^{N} \exp\left(\operatorname{sim}(z_i, z_k)/\tau\right)},
$$

with $P$ the set of all positive pairs in the current mini-batch or dataset comprising $N$ exemplars, $\operatorname{sim}$ the cosine similarity, and $\tau$ the temperature.
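A compact PyTorch sketch of this criterion, with one positive pair per sample, cosine similarities and a temperature τ, is given below; it illustrates the form of the loss rather than the exact implementation used for the experiments.

```python
import torch
import torch.nn.functional as F

def simclr_loss(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.15) -> torch.Tensor:
    """InfoNCE loss where (z1[i], z2[i]) are the positive pairs of a batch."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2B, d) unit-norm embeddings
    sim = z @ z.t() / tau                         # cosine similarities / temperature
    sim.fill_diagonal_(float("-inf"))             # exclude self-similarity
    B = z1.shape[0]
    # The positive of sample i is its other view, located B rows away.
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets)          # -log softmax at the positive index
```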

VICReg's loss is defined with three components. The variance loss $v$ acts as a norm regularizer on the embedding dimensions, and the covariance loss $c$ aims at decorrelating the dimensions of the embeddings. They are respectively defined as

$$
v(Z) = \frac{1}{d} \sum_{j=1}^{d} \max\left(0,\, \gamma - \sqrt{\operatorname{Var}(Z_{\cdot, j}) + \epsilon}\right), \qquad c(Z) = \frac{1}{d} \sum_{i \neq j} \left[\operatorname{Cov}(Z)\right]_{i,j}^{2},
$$
with $\operatorname{Cov}(Z)$ the empirical covariance matrix of the embeddings and $\gamma$ a target standard deviation.

Both of these losses are combined with an invariance loss $s$ that matches positive pairs, giving a final loss of

$$
\mathcal{L}_{\mathrm{VICReg}} = \lambda\, s(Z, Z') + \mu \left[ v(Z) + v(Z') \right] + \nu \left[ c(Z) + c(Z') \right], \quad \text{with } s(Z, Z') = \frac{1}{N} \sum_{i=1}^{N} \left\| z_i - z'_i \right\|_2^2,
$$
where $\lambda$, $\mu$ and $\nu$ weight the invariance, variance and covariance terms.

VICReg-exp is defined similarly, but with the exponential covariance loss defined as

$$

$$

VICReg-ctr is then VICReg-exp but applied to $Z^{\top}$, making it a contrastive approach conceptually similar to SimCLR. These methods give us different scenarios of collapse and allow us to make a more general study of the rank of representations as a powerful metric.
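For completeness, a simplified PyTorch sketch of the VICReg criterion recalled above (invariance weight λ, variance weight μ, covariance weight ν); the default weights shown are assumptions and not necessarily the values used in this paper's runs.

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z1, z2, lam=25.0, mu=25.0, nu=1.0, gamma=1.0, eps=1e-4):
    """Simplified VICReg loss on two batches of embeddings of shape (N, d)."""
    inv = F.mse_loss(z1, z2)                      # invariance term s(Z, Z')

    def variance(z):
        std = torch.sqrt(z.var(dim=0) + eps)
        return torch.relu(gamma - std).mean()     # hinge on per-dimension std

    def covariance(z):
        z = z - z.mean(dim=0)
        cov = (z.t() @ z) / (z.shape[0] - 1)
        off_diag = cov - torch.diag(torch.diag(cov))
        return off_diag.pow(2).sum() / z.shape[1]  # squared off-diagonal entries

    return (lam * inv
            + mu * (variance(z1) + variance(z2))
            + nu * (covariance(z1) + covariance(z2)))
```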

B. Visualizations on iNaturalist-18

Figure S1. RankMe applied to iNaturalist18 pretrainings. The vertical line indicates the rank constraint placed by the representation size, and so any rank above should be counted as 2048.


As we can see in Figure S1, RankMe produces curves with the same trend as on ImageNet, for both SimCLR and VICReg. We can see that VICReg leads to ranks that go beyond 2048, but the dimension of the manifold formed by the embeddings cannot be higher than 2048 due to the dimension of the representations. As such, for any practical purpose we clip the value of RankMe at 2048.

C. Applicability to cluster-based methods

While we have studied the applicability of RankMe on contrastive methods, cluster-based methods such as DINO have become extremely popular, and since the definition of embeddings is not as clear-cut for them, a thorough analysis is required. We proceed in two steps: we first locate where collapse happens in DINO's projection head, and then apply RankMe at that point.

Figure S2. DINO's projection head can be split in two parts, a classical projector and a clustering layer (Left) . Collapse happens before the clustering layer and not on the clustering prototypes (Right) .


As we can see in Figure S2, DINO's projection head can be interpreted as a classical projector followed by a clustering layer, whose weights are clustering prototypes. This interpretation comes from the softmax applied to the output of the projection head, which can be interpreted as an InfoNCE between the embeddings and the clustering prototypes that make up the clustering layer. We see that both the embeddings and the clustering prototypes are collapsed, though to different degrees.

As we can see in Figure S3, the phenomenon of dimensional collapse is highly visible in DINO, which enables the use of RankMe to find optimal hyperparameter values. While in Figure 1 we applied RankMe to the embeddings to be consistent with other methods, we see that it can be applied directly to the prototypes, yielding very similar results and matching the ImageNet oracle here. The main advantage of using the prototypes is that they are already computed during training, and as such the application of RankMe does not require computing any embeddings. This makes RankMe even more appealing for clustering-based methods, where such a technique can be applied.
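In practice this amounts to running the smooth rank computation of Equation (1) on the prototype weight matrix itself; a small sketch follows, where the attribute path to the prototypes is hypothetical and depends on the DINO implementation at hand.

```python
import torch

def rankme_prototypes(prototypes: torch.Tensor, eps: float = 1e-7) -> float:
    """Apply the smooth rank of Equation (1) to the (num_prototypes, d)
    clustering-layer weight matrix instead of a matrix of embeddings."""
    s = torch.linalg.svdvals(prototypes)
    p = s / s.sum() + eps
    return torch.exp(-(p * torch.log(p)).sum()).item()

# Hypothetical access path; no forward passes over the dataset are needed.
# rank = rankme_prototypes(model.head.last_layer.weight.detach())
```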

D. Complete visualizations on all datasets

While we previously focused on certain datasets for their interesting properties, we provide additional visualizations for the remaining datasets, as well as for performance on the embeddings.

As we can see in Figures S4 and S5, we find similar behaviors as before, apart from Food101, where performance is almost identical for all methods. This reinforces the previous validation of RankMe. The relative simplicity of the datasets targeted here makes the theoretical limitations of rank-deficient embeddings harder to see, even though we still see that a high rank helps generalization.

Figure S4. Link between embedding rank and downstream performance on the embeddings.


E. Detailed results for α-ReQ

In order to further study the performance of α-ReQ, we reproduce our plots for RankMe using α-ReQ instead of the rank of the embeddings. We compare both the intended use of α-ReQ in Figure S6, as well as applying it on the embeddings to measure performance on the representations, which we found was necessary for RankMe, in Figure S7. We do not include DINO in those plots for readability, as it would force us to change the x-axis scale, making the results harder to interpret.

As we can see in Figure S6, there is no clear link visible between the value of α-ReQ and downstream performance. In particular, we are unable to see a tendency of performance to increase as α tends to one. Nonetheless, α-ReQ was still able to lead to good performance when used for hyperparameter selection.

When applying α-ReQ as we would RankMe, we can see in Figure S7 that there is again no trend of performance increasing as α tends to one. On the contrary, we even find that performance tends to get better with a lower α, as is most visible on StanfordCars, iNaturalist18, or ImageNet for example. α going towards zero means that the singular values of the embeddings tend to a uniform distribution, in line with the goal of RankMe.

As we can see in Figures S8 and S9, the power-law prior of α-ReQ holds well in the case of non-collapsed embeddings, but when applied to collapsed ones, this assumption fails. It even provides a poor approximation of the main rank "plateau" of the highest singular values, as can be seen on the right of Figure S9. This further confirms the findings of (He & Ozay, 2022), and shows that when applying α-ReQ directly on the embeddings one must be careful, since the core assumption of the method is violated.

Figure S6. Link between α -ReQ measured on the representations and performance on the representations.



Figure S8. Validation of the power-law prior on un-collapsed representations. (Left) Overall visualization. (Right) Zoom on the high singular values.



F. Comparison of the rank estimators

Figure S10. Relationship between the two rank estimators, with a Pearson correlation coefficient of 0.99. Outliers indicate embeddings with singular values close to the threshold, showing how the entropic rank takes this information into account.


Since we do not rely on the classical threshold-based rank estimator, it is important to verify how well our entropy-based one correlates with it. As we can see in Figure S10, both estimates discussed previously correlate extremely well, showing that using one or the other should not lead to significant differences, as validated in Appendix H. Nonetheless, the entropic estimator takes into account the degree of whitening of the embeddings, which links better to theoretical results.

G. Convergence of the rank estimators

Figure S11. Convergence of the rank estimators on ImageNet as a function of the number of samples for 2048 dimensional outputs, as indicated by the vertical line.


As we can see in Figure S11, the rank estimates converge extremely quickly, especially for VICReg. For both VICReg and SimCLR, 10,000 samples are enough to obtain more than 95% of the final rank. It is worth noting that the entropic rank estimator converges more slowly than the classical rank estimator, as it is more sensitive to the distribution of the singular values. The fact that the rank can be approximated with few samples is encouraging for its use during training, and not only as a measure of performance after pretraining.

Figure S12. Reproduction of Figure 5 with the classical rank estimator. Embeddings' rank transfers from source to target datasets. The estimates used 25600 images from the respective datasets.


Figure S13. Reproduction of Figure 2 with the classical rank estimator. (Left) Validation of RankMe on embeddings, a higher ImageNet rank leads to improved performance across methods and datasets. (Right) Validation of RankMe on representations, where the link is even clearer, reinforcing RankMe's practical use.


H. Reproduction of figures with the classical rank estimator

As can be seen in Figures S12 and S13, the results that we obtain using the classical threshold-based rank estimator are extremely similar to the ones obtained with the entropic estimator. The exact values do differ, but the behaviors stay the same. One of the main differences is illustrated in Figure S13, where we can see that the target rank is almost identical to the source one, whereas we previously saw a drop of around 50%. This can be explained by the fact that some features may be less present in the target dataset, reducing the associated singular values, and thus the entropic rank.

All of this shows that using one or the other will lead to similar results in practical scenarios.

I. Detailed training and evaluation procedures

Pretraining

Table S1. Image augmentation parameters, taken from (Grill et al., 2020).

All pretrainings were done with ResNet-50 backbones. The projector used is an MLP with intermediate dimensions 8192, 8192, 2048 (8192, 8192, 2048, 32768 for DINO). VICReg, VICReg-ctr, VICReg-exp and SimCLR were trained with the LARS optimizer using a momentum of 0.9, a weight decay of $10^{-6}$, and varying learning rates depending on the method. VICReg used a 0.3 base learning rate, SimCLR 0.5 or 0.6 depending on the experiment, VICReg-exp 0.6, and VICReg-ctr 0.6. DINO was trained with AdamW (Loshchilov & Hutter, 2017) using a learning rate of 0.00025, with multi-crop using 6 additional crops of size 96×96. The learning rate is then computed as lr = base_lr × batch_size / 256. We do a 10-epoch linear warmup and then use cosine annealing. We use batch sizes of 2048 for SimCLR and 1024 for the other methods. SimCLR and VICReg-ctr also use a default temperature of 0.15, and VICReg-exp of 0.1.

We use the image augmentation strategy from (Grill et al., 2020), illustrated in Table S1. For the pretrainings on iNaturalist-18, we use the same protocol but with a 300-epoch pretraining to account for its smaller size compared to ImageNet.

Evaluation

Table S2. Optimization parameters used to evaluate on downstream datasets

For all datasets except StanfordCars, we use the standard protocol in VISSL. On StanfordCars we mostly tuned the learning rate. The parameters that we use are described in Table S2. For data augmentation, we use random resized crops and random horizontal flips during training, and center crop for evaluation. For VOC07, we follow the common protocol using SVMs, as used in (Bardes et al., 2021). We use the default VISSL settings for this evaluation.

J. Detailed tables for hyperparameter selection

Table S3. Top-1 accuracies computed on the representations when tuning hyperparameters with ImageNet validation performance, RankMe, or α-ReQ.

Complete performance tables

Table S5. Hyperparameters for all runs.

Dataset | Method | Labels | VICReg cov. | VICReg inv. | VICReg LR | VICReg WD | SimCLR temp. | SimCLR LR | SimCLR WD | DINO t-temp. | DINO s-temp.
ImageNet | ImageNet Oracle |  | 68.2 | 68.2 | 68.6 | 68.0 | 68.5 | 68.5 | 68.3 | 72.3 | 72.4
ImageNet | α-ReQ | X | 67.9 | 67.5 | 59.5 | 67.8 | 63.5 | 68.1 | 32.3 | 71.7 | 66.2
ImageNet | RankMe | X | 67.8 | 67.9 | 68.2 | 67.8 | 67.1 | 68.0 | 68.3 | 72.2 | 72.4
OOD | ImageNet Oracle |  | 68.7 | 68.7 | 68.9 | 68.8 | 68.7 | 68.7 | 68.8 | 71.9 | 72.5
OOD | α-ReQ | X | 68.1 | 67.8 | 63.8 | 68.4 | 65.1 | 68.2 | 68.6 | 71.8 | 68.5
OOD | RankMe | X | 67.7 | 68.3 | 68.7 | 68.4 | 67.6 | 68.4 | 68.8 | 71.8 | 72.5
Algorithm 1 Hyperparameter selection with RankMe
Require: Models f_1, ..., f_N to compare, in increasing value of the hyperparameter
Require: Corresponding ranks r_1, ..., r_N
1: f_best ← f_1, r_best ← r_1
2: for i = 2 to N do
3:   if r_i > r_best then
4:     f_best ← f_i, r_best ← r_i
5:   else if r_i = r_best and (r_i > r_{i-1} or r_i > r_{i+1}) then
6:     f_best ← f_i, r_best ← r_i
7: return f_best
Dataset | Method | VICReg cov. | VICReg inv. | VICReg LR | VICReg WD | SimCLR temp. | SimCLR LR | SimCLR WD | DINO t-temp. | DINO s-temp.
 | ImageNet Oracle | 59.7 | 59.7 | 59.7 | 59.7 | 56.9 | 56.9 | 57.1 | 54.6 | 64.8
 | α-ReQ | 59.6 | 59.2 | 36.2 | 59.3 | 51.5 | 56.4 | 49.0 | 53.3 | 53.3
 | RankMe | 59.6 | 59.7 | 59.7 | 59.5 | 56.5 | 56.0 | 57.1 | 53.3 | 64.8
 | ImageNet Oracle | 55.3 | 55.6 | 55.3 | 55.5 | 54.7 | 54.7 | 54.7 | 55.6 | 60.6
 | α-ReQ | 55.5 | 55.7 | 48.0 | 55.1 | 56.9 | 54.6 | 54.8 | 52.6 | 52.6
 | RankMe | 55.5 | 55.6 | 55.3 | 55.0 | 56.4 | 54.4 | 54.7 | 52.6 | 60.6
Dataset | Method | Cov. | temp.
iNat-18 | iNat-18 Oracle | 36.96 | 28.60
iNat-18 | ImageNet Oracle | 35.63 | 28.60
iNat-18 | α-ReQ | 25.43 | 22.94
iNat-18 | RankMe | 36.89 | 27.14
OOD | iNat-18 Oracle | 60.7 | 58.23
OOD | ImageNet Oracle | 60.65 | 58.23
OOD | α-ReQ | 56.51 | 56.30
OOD | RankMe | 60.91 | 57.34
Dataset | Method | Cov. | temp.
ImageNet-1% | Task Oracle | 39.7 | 34.6
ImageNet-1% | ImageNet Oracle | 39.7 | 31.3
ImageNet-1% | α-ReQ | 39.2 | 27.3
ImageNet-1% | RankMe | 38.7 | 30.9
ImageNet-10% | Task Oracle | 62.7 | 62.6
ImageNet-10% | ImageNet Oracle | 62.6 | 62.6
ImageNet-10% | α-ReQ | 62.7 | 59.1
ImageNet-10% | RankMe | 62.7 | 61.8
VOC07+12 (AP50) | Task Oracle | 79.7 | 81.8
VOC07+12 (AP50) | ImageNet Oracle | 78.2 | 81.0
VOC07+12 (AP50) | α-ReQ | 79.0 | 80.3
VOC07+12 (AP50) | RankMe | 79.7 | 81.0
Dataset | Method | DINO t-temp. | DINO s-temp.
ImageNet | ImageNet Oracle | 72.3 | 72.4
ImageNet | α-ReQ | 71.7 | 66.2
ImageNet | RankMe-embs | 72.2 | 72.4
ImageNet | RankMe-prots | 72.3 | 72.4
OOD | ImageNet Oracle | 71.9 | 72.5
OOD | α-ReQ | 71.8 | 68.5
OOD | RankMe-embs | 71.8 | 72.5
OOD | RankMe-prots | 71.9 | 72.5
Parameter | View 1 | View 2
Random crop probability | 1.0 | 1.0
Horizontal flip probability | 0.5 | 0.5
Color jittering probability | 0.8 | 0.8
Brightness adjustment max intensity | 0.4 | 0.4
Contrast adjustment max intensity | 0.4 | 0.4
Saturation adjustment max intensity | 0.2 | 0.2
Hue adjustment max intensity | 0.1 | 0.1
Grayscale probability | 0.2 | 0.2
Gaussian blurring probability | 1.0 | 0.1
Solarization probability | 0.0 | 0.2
Dataset | Optimizer | Weight decay | Momentum | Learning rate | Epochs
ImageNet | SGD (w/ Nesterov) | 0.00004 | 0.9 | 0.3 | 30
iNaturalist18 | SGD (w/ Nesterov) | 0.0005 | 0.9 | 0.01 | 84
Places205 | SGD (w/ Nesterov) | 0.0005 | 0.9 | 0.01 | 14
EuroSat | SGD (w/ Nesterov) | 0.0005 | 0.9 | 0.01 | 28
Sun397 | SGD (w/ Nesterov) | 0.0005 | 0.9 | 0.01 | 28
StanfordCars | SGD (w/ Nesterov) | 0.0005 | 0.9 | 0.1 | 28
CIFAR10 | SGD (w/ Nesterov) | 0.0005 | 0.9 | 0.01 | 28
CIFAR100 | SGD (w/ Nesterov) | 0.0005 | 0.9 | 0.01 | 28
CLEVR-count | SGD (w/ Nesterov) | 0.0005 | 0.9 | 0.01 | 50
Food101 | SGD (w/ Nesterov) | 0.0005 | 0.9 | 0.01 | 28
VOC07 | N/A, see in text | N/A | N/A | N/A | N/A
DatasetMethodVICRegVICRegVICRegVICRegSimCLRSimCLRSimCLRDINODINO
DatasetMethodcov.inv.LRWDtemp.LR.WD.t-temp.s-temp.
ImageNet Oracle68 . 268 . 268 . 668 . 068 . 568 . 568 . 372 . 372 . 4
RankMe67 . 867 . 968 . 267 . 867 . 168 . 068 . 372 . 272 . 4
α -ReQ67 . 967 . 559 . 567 . 863 . 568 . 132 . 371 . 766 . 2
ImageNet Oracle38 . 438 . 438 . 838 . 339 . 239 . 238 . 945 . 846 . 3
RankMe36 . 737 . 238 . 438 . 337 . 838 . 138 . 946 . 046 . 3
α -ReQ37 . 836 . 928 . 938 . 334 . 138 . 438 . 745 . 139 . 2
ImageNet Oracle51 . 251 . 251 . 851 . 352 . 452 . 452 . 654 . 354 . 4
RankMe51 . 251 . 451 . 251 . 652 . 352 . 352 . 654 . 254 . 4
α -ReQ51 . 151 . 447 . 851 . 650 . 752 . 352 . 654 . 452 . 8
ImageNet Oracle96 . 296 . 296 . 396 . 296 . 596 . 596 . 496 . 696 . 6
RankMe96 . 196 . 196 . 296 . 096 . 696 . 496 . 496 . 396 . 6
α -ReQ96 . 196 . 195 . 196 . 096 . 496 . 696 . 296 . 895 . 9
ImageNet Oracle68 . 468 . 468 . 668 . 668 . 968 . 969 . 271 . 771 . 8
RankMe68 . 668 . 368 . 468 . 869 . 168 . 569 . 272 . 171 . 8
α -ReQ68 . 767 . 964 . 168 . 866 . 468 . 468 . 571 . 569 . 8
ImageNet Oracle55 . 755 . 755 . 855 . 654 . 454 . 454 . 965 . 166 . 0
RankMe51 . 154 . 055 . 755 . 451 . 553 . 954 . 965 . 866 . 0
α -ReQ54 . 251 . 743 . 255 . 445 . 254 . 354 . 763 . 554 . 5
ImageNet Oracle75 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 0
RankMe75 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 0
α -ReQ75 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 0
ImageNet Oracle84 . 384 . 384 . 384 . 084 . 584 . 583 . 988 . 488 . 0
RankMe84 . 183 . 884 . 384 . 083 . 883 . 983 . 988 . 388 . 0
ImageNet Oracle55 . 755 . 756 . 056 . 851 . 951 . 953 . 2
53 . 055 . 455 . 753 . 148 . 052 . 353 . 256 . 859 . 3
α -ReQ50 . 644 . 050 . 556 . 859 . 3
RankMe52 . 155 . 190 . 053 . 151 . 359 . 254 . 6
ImageNet Oracle90 . 190 . 189 . 890 . 690 . 690 . 391 . 592 . 2
RankMe89 . 589 . 890 . 189 . 789 . 490 . 690 . 390 . 592 . 2
72 . 372 . 8
72 . 372 . 172 . 273 . 173 . 7
ImageNet Oracle71 . 672 . 372 . 273 . 873 . 873 . 774 . 375 . 6
RankMe72 . 373 . 175 . 6
ImageNet Oracle68 . 768 . 768 . 968 . 768 . 768 . 768 . 872 . 072 . 5
RankMe67 . 768 . 368 . 768 . 367 . 568 . 468 . 871 . 872 . 5
67 . 863 . 468 .64 . 965 . 371 . 8
α -ReQ68 . 1368 . 268 . 3
MethodVICRegVICRegVICRegVICRegSimCLRSimCLRSimCLRDINODINO
Methodcov.inv.LRWDtemp.LR.WD.t-temp.s-temp.
ImageNet Oracle59 . 759 . 759 . 759 . 756 . 956 . 957 . 154 . 664 . 8
RankMe59 . 659 . 759 . 759 . 556 . 556 . 057 . 153 . 364 . 8
α -ReQ59 . 659 . 236 . 259 . 351 . 556 . 449 . 053 . 353 . 3
ImageNet Oracle13 . 514 . 213 . 513 . 610 . 310 . 310 . 15 . 015 . 8
RankMe14 . 214 . 213 . 513 . 416 . 79 . 910 . 13 . 615 . 8
α -ReQ14 . 214 . 82 . 513 . 221 . 510 . 010 . 03 . 63 . 6
ImageNet Oracle42 . 743 . 342 . 743 . 441 . 241 . 241 . 238 . 944 . 9
RankMe43 . 243 . 342 . 742 . 743 . 440 . 841 . 236 . 444 . 9
α -ReQ43 . 243 . 629 . 642 . 942 . 641 . 041 . 536 . 436 . 4
ImageNet Oracle91 . 391 . 791 . 391 . 090 . 490 . 489 . 591 . 393 . 2
RankMe91 . 091 . 791 . 391 . 392 . 389 . 089 . 589 . 393 . 2
α -ReQ91 . 091 . 485 . 190 . 894 . 489 . 689 . 889 . 389 . 3
ImageNet Oracle57 . 357 . 057 . 357 . 356 . 456 . 456 . 254 . 563 . 7
RankMe57 . 457 . 057 . 356 . 759 . 155 . 456 . 250 . 063 . 7
α -ReQ57 . 457 . 442 . 557 . 259 . 956 . 256 . 250 . 050 . 0
ImageNet Oracle12 . 012 . 012 . 011 . 914 . 014 . 013 . 210 . 322 . 0
RankMe11 . 612 . 012 . 011 . 517 . 613 . 413 . 27 . 722 . 0
α -ReQ11 . 612 . 07 . 511 . 321 . 313 . 913 . 57 . 77 . 7
ImageNet Oracle75 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 0
RankMe75 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 0
α -ReQ75 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 0
ImageNet Oracle79 . 579 . 279 . 579 . 779 . 779 . 779 . 785 . 387 . 0
RankMe79 . 279 . 279 . 579 . 378 . 579 . 379 . 784 . 287 . 0
α -ReQ79 . 279 . 273 . 179 . 676 . 879 . 579 . 984 . 284 . 2
ImageNet Oracle43 . 944 . 443 . 946 . 143 . 543 . 546 . 049 . 951 . 2
RankMe43 . 944 . 443 . 943 . 043 . 044 . 846 . 041 . 951 . 2
α -ReQ43 . 943 . 841 . 744 . 937 . 045 . 245 . 941 . 941 . 9
ImageNet Oracle80 . 481 . 280 . 479 . 779 . 379 . 379 . 884 . 387 . 0
RankMe80 . 681 . 280 . 480 . 379 . 579 . 579 . 880 . 787 . 0
ImageNet Oracle52 . 853 . 352 . 852 . 952 . 658 . 5
65 . 4
53 . 853 . 941 . 554 . 0 56 . 552 . 652 . 265 . 4
RankMe α -ReQ53 . 853 . 352 . 852 . 5 52 . 252 . 2 52 . 052 . 2 52 . 353 . 0 53 . 053 . 0
ImageNet Oracle55 . 355 . 555 . 355 . 554 . 554 . 554 . 555 . 260 . 9
RankMe55 . 455 . 555 . 355 . 056 . 054 . 154 . 552 . 360 . 9
α -ReQ55 . 455 . 646 . 155 . 156 . 054 . 353 . 952 . 352 . 3
MethodRunBatch sizeLearning rateWeight decayLoss hyperparameters
010240 . 310 - 6λ : 25 , µ : 25 , ν : 0 . 3
110240 . 310 - 6λ : 25 , µ : 25 , ν : 0 . 4
210240 . 310 - 6λ : 25 , µ : 25 , ν : 0 . 5
310240 . 310 - 6λ : 25 , µ : 25 , ν : 0 . 6
410240 . 310 - 6λ : 25 , µ : 25 , ν : 0 . 7
510240 . 310 - 6λ : 25 , µ : 25 , ν : 0 . 8
610240 . 310 - 6λ : 25 , µ : 25 , ν : 0 . 9
710240 . 310 - 6λ : 25 , µ : 25 , ν : 1
810240 . 310 - 6λ : 25 , µ : 25 , ν : 2
910240 . 310 - 6λ : 25 , µ : 25 , ν : 4
1010240 . 310 - 6λ : 25 , µ : 25 , ν : 8
1110240 . 310 - 6λ : 25 , µ : 25 , ν : 16
1210240 . 310 - 6λ : 5 , µ : 25 , ν : 4
1310240 . 310 - 6λ : 10 , µ : 25 , ν : 4
1410240 . 310 - 6λ : 15 , µ : 25 , ν : 4
1510240 . 310 - 6λ : 20 , µ : 25 , ν : 4
VICReg1610240 . 310 - 6λ : 30 , µ : 25 , ν : 4
1710240 . 310 - 6λ : 35 , µ : 25 , ν : 4
1810240 . 310 - 6λ : 40 , µ : 25 , ν : 4
1910240 . 310 - 6λ : 45 , µ : 25 , ν : 4
2010240 . 310 - 6λ : 50 , µ : 25 , ν : 4
2110240 . 110 - 6λ : 25 , µ : 25 , ν : 4
2210240 . 210 - 6λ : 25 , µ : 25 , ν : 4
2310240 . 310 - 6λ : 25 , µ : 25 , ν : 4
2410240 . 410 - 6λ : 25 , µ : 25 , ν : 4
2510240 . 510 - 6λ : 25 , µ : 25 , ν : 4
2610240 . 310 - 7λ : 25 , µ : 25 , ν : 4
2710240 . 310 - 6λ : 25 , µ : 25 , ν : 4
2810240 . 310 - 5λ : 25 , µ : 25 , ν : 4
2910240 . 310 - 4λ : 25 , µ : 25 , ν : 4
3010240 . 310 - 3λ : 25 , µ : 25 , ν : 4
3110240 . 310 - 2λ : 25 , µ : 25 , ν : 4
010240 . 510 - 6λ : 1 , µ : 1 , ν : 2 , τ : 0 . 05
110240 . 510 - 6λ : 1 , µ : 1 , ν : 2 , τ : 0 . 07
210240 . 510 - 6λ : 1 , µ : 1 , ν : 2 , τ : 0 . 1
310240 . 510 - 6λ : 1 , µ : 1 , ν : 2 , τ : 0 . 2
410240 . 510 - 6λ : 1 , µ : 1 , ν : 2 , τ : 0 . 3
510240 . 510 - 6λ : 1 , µ : 1 , ν : 2 , τ : 0 . 4
VICReg-exp610240 . 510 - 6λ : 1 , µ : 1 , ν : 0 . 1 , τ : 0 . 1
710240 . 510 - 6λ : 1 , µ : 1 , ν : 0 . 5 , τ : 0 . 1
810240 . 510 - 6λ : 1 , µ : 1 , ν : 1 , τ : 0 . 1
910240 . 510 - 6λ : 1 , µ : 1 , ν : 4 , τ : 0 . 1
1010240 . 510 - 6λ : 1 , µ : 1 , ν : 8 , τ : 0 . 1
1110240 . 510 - 6λ : 1 , µ : 1 , ν : 16 , τ : 0 . 1
MethodRunBatch sizeLearning rateWeight decayLoss hyperparameters
010240 . 510 - 6λ : 1 , µ : 1 , ν : 1 , τ : 0 . 05
110240 . 510 - 6λ : 1 , µ : 1 , ν : 1 , τ : 0 . 07
210240 . 510 - 6λ : 1 , µ : 1 , ν : 1 , τ : 0 . 1
310240 . 510 - 6λ : 1 , µ : 1 , ν : 1 , τ : 0 . 2
410240 . 510 - 6λ : 1 , µ : 1 , ν : 1 , τ : 0 . 3
VICReg-ctr510240 . 510 - 6λ : 1 , µ : 1 , ν : 1 , τ : 0 . 4
610240 . 510 - 6λ : 1 , µ : 1 , ν : 0 . 1 , τ : 0 . 1
710240 . 510 - 6λ : 1 , µ : 1 , ν : 0 . 5 , τ : 0 . 1
810240 . 510 - 6λ : 1 , µ : 1 , ν : 2 , τ : 0 . 1
910240 . 510 - 6λ : 1 , µ : 1 , ν : 4 , τ : 0 . 1
1010240 . 510 - 6λ : 1 , µ : 1 , ν : 8 , τ : 0 . 1
020480 . 610 - 6d : 512 , τ : 0 . 05
120480 . 610 - 6d : 512 , τ : 0 . 07
220480 . 610 - 6d : 512 , τ : 0 . 1
320480 . 610 - 6d : 512 , τ : 0 . 2
420480 . 610 - 6d : 512 , τ : 0 . 3
520480 . 610 - 6d : 512 , τ : 0 . 4
620480 . 610 - 6d : 2048 , τ : 0 . 05
720480 . 610 - 6d : 2048 , τ : 0 . 07
820480 . 610 - 6d : 2048 , τ : 0 . 1
920480 . 610 - 6d : 2048 , τ : 0 . 2
1020480 . 610 - 6d : 2048 , τ : 0 . 3
1120480 . 610 - 6d : 2048 , τ : 0 . 4
1220480 . 510 - 6d : 2048 , τ : 0 . 05
1320480 . 510 - 6d : 2048 , τ : 0 . 07
1420480 . 510 - 6d : 2048 , τ : 0 . 1
SimCLR1520480 . 510 - 6d : 2048 , τ : 0 . 15
1620480 . 510 - 6d : 2048 , τ : 0 . 2
1720480 . 510 - 6d : 2048 , τ : 0 . 3
1820480 . 510 - 6d : 2048 , τ : 0 . 4
1920480 . 510 - 7d : 2048 , τ : 0 . 15
2020480 . 510 - 6d : 2048 , τ : 0 . 15
2120480 . 510 - 5d : 2048 , τ : 0 . 15
2220480 . 510 - 4d : 2048 , τ : 0 . 15
2320480 . 510 - 3d : 2048 , τ : 0 . 15
2420480 . 510 - 2d : 2048 , τ : 0 . 15
2520480 . 210 - 6d : 2048 , τ : 0 . 15
2620480 . 310 - 6d : 2048 , τ : 0 . 15
2720480 . 410 - 6d : 2048 , τ : 0 . 15
2820480 . 510 - 6d : 2048 , τ : 0 . 15
2920480 . 610 - 6d : 2048 , τ : 0 . 15
3020480 . 810 - 6d : 2048 , τ : 0 . 15
MethodRunBatch sizeLearning rateWeight decayLoss hyperparameters
DINO01024 2 . 5 × 10 - 410 - 6 10 - 610 - 6 10 - 6 10 - 6 10 - 6 10 - 6 10 - 6τ t : 0 . 01 , τ t : 0 . 02 , τ t : 0 . 04 , τ t : 0 . 06 , τ t : 0 . 07 , τ t : 0 . 04 , τ s : 0 . 07 , τ t : 0 . 04 , τ s : 0 . 2 , τ t : 0 . 04 , τ s : 0 . 3 ,
DINO110242 . 5 × 10 - 4
DINO210242 . 5 × 10 - 4
DINO310242 . 5 × 10 - 4
DINO410242 . 5 × 10 - 4
DINO510242 . 5 × 10 - 4
DINO610242 . 5 × 10 - 4
DINO710242 . 5 × 10 - 4
DINO810242 . 5 × 10 - 410 - 6τ t : 0 . 04 , τ s : 0 . 4 ,
MethodRunImageNetiNat18Places205EuroSatSUN397Cars
063 . 9034 . 1248 . 7795 . 9465 . 9652 . 56
165 . 0835 . 6549 . 6896 . 1066 . 8754 . 47
265 . 6736 . 9749 . 7396 . 0267 . 3355 . 76
366 . 1737 . 2050 . 1396 . 1067 . 5556 . 37
466 . 4037 . 4250 . 1596 . 3467 . 9956 . 86
566 . 8338 . 0550 . 5396 . 0668 . 4057 . 63
667 . 3038 . 1350 . 9696 . 2068 . 0857 . 83
767 . 3438 . 2650 . 9696 . 3668 . 1958 . 89
868 . 0038 . 6851 . 2896 . 3668 . 4656 . 90
968 . 1638 . 3651 . 1796 . 2068 . 4255 . 70
1067 . 9137 . 7551 . 1496 . 1468 . 7554 . 21
1167 . 7736 . 7051 . 2096 . 0668 . 5751 . 05
1264 . 1231 . 3749 . 8395 . 5666 . 1742 . 56
1366 . 6734 . 8150 . 7695 . 6867 . 6147 . 33
1467 . 4936 . 9151 . 4096 . 1067 . 9551 . 72
1567 . 8737 . 1851 . 4096 . 0668 . 2654 . 00
VICReg1667 . 9938 . 7151 . 1196 . 1668 . 6856 . 05
1767 . 7838 . 5250 . 7996 . 3868 . 3957 . 13
1867 . 2538 . 0850 . 8596 . 3468 . 6956 . 29
1966 . 9537 . 9350 . 8896 . 0667 . 9857 . 67
2066 . 5137 . 7950 . 1196 . 1067 . 7457 . 23
2159 . 5428 . 8547 . 8095 . 1064 . 1443 . 15
2266 . 3635 . 4750 . 3296 . 0467 . 4551 . 64
2368 . 1638 . 3651 . 1796 . 2068 . 4255 . 70
2468 . 5638 . 8051 . 7596 . 3068 . 6055 . 75
2562 . 7731 . 8548 . 0295 . 7264 . 8243 . 23
2667 . 7938 . 2551 . 5796 . 0468 . 8455 . 38
2767 . 9738 . 2651 . 2996 . 1668 . 6255 . 57
2867 . 8738 . 4351 . 5196 . 0868 . 5254 . 53
2963 . 3638 . 3151 . 1796 . 0668 . 4355 . 06
3054 . 5237 . 9251 . 3296 . 1067 . 9954 . 82
3140 . 7337 . 0350 . 9796 . 3068 . 4052 . 28
067 . 7437 . 5351 . 4496 . 3668 . 4152 . 12
167 . 6438 . 0051 . 4296 . 4668 . 6054 . 16
267 . 8438 . 2551 . 0796 . 4468 . 1255 . 94
65 . 0936 . 6449 . 6567 . 1256 . 37
3 431 . 2248 . 0496 . 54 95 . 8064 . 2846 . 96
560 . 67 57 . 4626 . 5446 . 2596 . 0262 . 3341 . 90
VICReg-exp655 . 1224 . 7345 . 6895 . 4461 . 8239 . 71
764 . 8736 . 5149 . 6996 . 1666 . 8255 . 30
968 . 0838 . 0351 . 3496 . 4069 . 2853 . 72
1067 . 8037 . 2051 . 5796 . 4668 . 6152 . 15
1166 . 6835 . 0250 . 9496 . 0067 . 8147 . 49
MethodRunImageNetiNat18Places205EuroSatSUN397Cars
065 . 5435 . 0050 . 1595 . 8867 . 6249 . 63
166 . 3235 . 7250 . 6996 . 1068 . 1651 . 66
266 . 0935 . 2650 . 8096 . 4268 . 3250 . 72
364 . 0633 . 1650 . 4895 . 9867 . 4044 . 91
462 . 0630 . 8049 . 5396 . 2266 . 0843 . 24
VICReg-ctr560 . 1728 . 7648 . 7895 . 9064 . 9241 . 13
661 . 6631 . 0549 . 4096 . 1066 . 4744 . 12
765 . 4734 . 6350 . 7196 . 2067 . 5548 . 05
865 . 9934 . 7750 . 6395 . 9867 . 6051 . 14
963 . 8733 . 6349 . 6496 . 2066 . 1750 . 35
1058 . 8129 . 2447 . 6095 . 7863 . 7746 . 23
057 . 6830 . 5048 . 5196 . 3263 . 3642 . 42
162 . 7933 . 5050 . 5696 . 1866 . 3443 . 76
266 . 1335 . 9452 . 1096 . 2268 . 2949 . 17
366 . 3535 . 6051 . 9696 . 6468 . 1749 . 68
465 . 1734 . 3851 . 3296 . 1067 . 7848 . 17
563 . 5433 . 2950 . 7196 . 2267 . 3948 . 31
657 . 8430 . 8248 . 6496 . 3464 . 0741 . 97
762 . 7333 . 3050 . 5796 . 5666 . 0344 . 99
866 . 3036 . 2551 . 7996 . 4067 . 9948 . 95
966 . 7136 . 5651 . 8296 . 5268 . 5250 . 47
1065 . 2934 . 9051 . 3296 . 3067 . 4049 . 16
1163 . 5233 . 3550 . 9296 . 4266 . 8948 . 59
1259 . 4931 . 1348 . 8095 . 9464 . 1142 . 46
1363 . 5134 . 1450 . 7596 . 4266 . 4445 . 18
1467 . 1437 . 8052 . 2996 . 6269 . 0651 . 47
SimCLR1568 . 4839 . 2052 . 3796 . 4668 . 9254 . 43
1668 . 2738 . 4852 . 2996 . 4669 . 1955 . 22
1767 . 4837 . 0751 . 7296 . 5868 . 3051 . 92
1866 . 4435 . 8751 . 5896 . 4468 . 1549 . 76
1968 . 3338 . 9352 . 5696 . 4069 . 2154 . 86
2068 . 1339 . 0952 . 4296 . 4269 . 1554 . 83
2166 . 4738 . 8052 . 8196 . 5869 . 0355 . 19
2259 . 6238 . 8652 . 6996 . 6269 . 0755 . 47
2347 . 5839 . 0352 . 7096 . 1668 . 7754 . 96
2432 . 2738 . 7052 . 6296 . 1868 . 5354 . 67
2566 . 3736 . 0651 . 6296 . 8468 . 2252 . 17
2667 . 9638 . 1252 . 3396 . 4468 . 5453 . 86
2768 . 3238 . 4452 . 4296 . 8069 . 0854 . 63
2868 . 4839 . 2052 . 3796 . 4668 . 9254 . 43
2968 . 4138 . 7752 . 4296 . 2468 . 6555 . 81
3068 . 1238 . 4568 . 4154 . 30
52 . 3396 . 64
Table S10. Top-1 on representations in all settings, continued.
MethodRunImageNetiNat18Places205EuroSatSUN397Cars
070 . 7444 . 3053 . 4596 . 6071 . 2364 . 03
171 . 2945 . 3754 . 1096 . 4671 . 4064 . 73
272 . 1945 . 9654 . 1896 . 3272 . 1265 . 81
372 . 3045 . 8054 . 2596 . 6071 . 6965 . 09
DINO471 . 6845 . 0654 . 4096 . 8471 . 5563 . 46
572 . 4146 . 3254 . 3796 . 5871 . 8465 . 97
669 . 2342 . 1852 . 9396 . 3070 . 3556 . 36
766 . 1839 . 2452 . 7795 . 8869 . 7854 . 48
864 . 1037 . 8351 . 6095 . 8668 . 7151 . 82
MethodRunImageNetCIFAR10CIFAR100FOOD101VOC07CLEVR-count
063 . 9088 . 9469 . 9275 . 0081 . 4948 . 94
165 . 0888 . 7469 . 9675 . 0082 . 1449 . 88
265 . 6788 . 2570 . 2375 . 0082 . 3554 . 96
366 . 1789 . 1771 . 5175 . 0082 . 9752 . 35
466 . 4089 . 4171 . 7075 . 0182 . 8155 . 27
566 . 8389 . 9172 . 1275 . 0083 . 1055 . 95
667 . 3090 . 1171 . 9075 . 0183 . 1554 . 37
767 . 3490 . 3472 . 4275 . 0083 . 2153 . 92
868 . 0089 . 7972 . 7375 . 0083 . 7749 . 75
968 . 1690 . 1472 . 2675 . 0184 . 2755 . 69
1067 . 9189 . 6772 . 3975 . 0083 . 9952 . 10
1167 . 7789 . 4571 . 6375 . 0084 . 1053 . 05
1264 . 1286 . 6867 . 0275 . 0082 . 4451 . 46
1366 . 6788 . 3269 . 8675 . 0083 . 5055 . 48
1467 . 4989 . 2271 . 0175 . 0183 . 8555 . 05
1567 . 8789 . 8272 . 3075 . 0083 . 7655 . 36
VICReg1667 . 9990 . 2972 . 8175 . 0083 . 9055 . 00
1767 . 7890 . 0973 . 1475 . 0083 . 7451 . 97
1867 . 2590 . 4072 . 7575 . 0083 . 3653 . 18
1966 . 9589 . 6272 . 1475 . 0082 . 9950 . 33
2066 . 5189 . 9472 . 4175 . 0082 . 8952 . 83
2159 . 5486 . 8166 . 2375 . 0080 . 4450 . 64
2266 . 3688 . 9271 . 0575 . 0082 . 9456 . 19
2368 . 1690 . 1472 . 2675 . 0184 . 2755 . 69
2468 . 5689 . 9572 . 8075 . 0084 . 2756 . 03
2562 . 7787 . 6367 . 9274 . 9982 . 3852 . 63
2667 . 7989 . 7072 . 1175 . 0083 . 9853 . 08
2767 . 9789 . 8372 . 2275 . 0084 . 0556 . 77
2867 . 8790 . 2372 . 1375 . 0083 . 7256 . 20
2963 . 3689 . 7672 . 3675 . 0084 . 0454 . 71
3054 . 5289 . 5871 . 8975 . 0084 . 1453 . 45
3140 . 7389 . 6571 . 5075 . 0183 . 9756 . 34
067 . 7489 . 6672 . 1775 . 0084 . 6752 . 79
167 . 6490 . 1272 . 3075 . 0084 . 5855 . 29
267 . 8489 . 5572 . 0775 . 0084 . 2053 . 45
365 . 0989 . 1871 . 5575 . 0082 . 1954 . 41
460 . 6788 . 0668 . 9475 . 0080 . 2051 . 35
557 . 4686 . 7065 . 0975 . 0078 . 5449 . 30
VICReg-exp655 . 1287 . 2365 . 5375 . 0077 . 8753 . 14
64 . 8771 . 3875 . 0082 . 3949 . 13
7 866 . 8489 . 28 89 . 6671 . 9275 . 0083 . 9150 . 41
968 . 0889 . 5675 . 0084 . 6456 . 00
1067 . 8089 . 5071 . 9075 . 0084 . 4555 . 80
71 . 61
1166 . 6888 . 9970 . 1375 . 0084 . 2955 . 87
MethodRunImageNetCIFAR10CIFAR100FOOD101VOC07CLEVR-count
065 . 5488 . 8770 . 7775 . 0183 . 2853 . 97
166 . 3289 . 5770 . 9375 . 0084 . 1753 . 19
266 . 0989 . 4971 . 1775 . 0083 . 9053 . 29
364 . 0689 . 6271 . 3975 . 0083 . 1848 . 57
462 . 0688 . 6069 . 4175 . 0082 . 3546 . 48
VICReg-ctr560 . 1788 . 9768 . 6175 . 0081 . 4351 . 27
665 . 4789 . 6571 . 6275 . 0184 . 0951 . 07
765 . 9988 . 9770 . 4075 . 0083 . 6946 . 92
863 . 8788 . 5169 . 0275 . 0082 . 9951 . 36
958 . 8186 . 9666 . 0675 . 0079 . 9555 . 33
057 . 6886 . 3166 . 6975 . 0077 . 5637 . 65
162 . 7987 . 1568 . 7175 . 0080 . 9850 . 26
266 . 1389 . 1971 . 1375 . 0083 . 5747 . 75
366 . 3589 . 9972 . 4475 . 0084 . 2554 . 73
465 . 1789 . 8972 . 1875 . 0183 . 9950 . 78
563 . 5489 . 5071 . 0975 . 0183 . 3752 . 87
657 . 8486 . 4466 . 4275 . 0077 . 0142 . 72
762 . 7387 . 5768 . 3375 . 0081 . 2645 . 19
866 . 3089 . 0771 . 5575 . 0083 . 6152 . 95
966 . 7190 . 1272 . 5275 . 0084 . 1752 . 93
1065 . 2989 . 4471 . 6275 . 0083 . 8154 . 83
1163 . 5289 . 3270 . 8875 . 0083 . 3948 . 44
1259 . 4986 . 4166 . 4575 . 0077 . 9850 . 64
1363 . 5187 . 9869 . 5375 . 0081 . 1944 . 03
1467 . 1489 . 4072 . 2075 . 0183 . 8047 . 97
SimCLR1568 . 4890 . 5773 . 7875 . 0084 . 5451 . 91
1668 . 2790 . 3473 . 6375 . 0184 . 4850 . 11
1767 . 4890 . 0472 . 8175 . 0084 . 3147 . 31
1866 . 4489 . 8072 . 0275 . 0084 . 3549 . 94
1968 . 3390 . 2973 . 6575 . 0083 . 9553 . 17
2068 . 1390 . 6773 . 8575 . 0084 . 6154 . 20
2166 . 4790 . 3373 . 3975 . 0084 . 2255 . 01
2259 . 6290 . 5373 . 6375 . 0084 . 6150 . 53
2347 . 5890 . 2972 . 9975 . 0084 . 4448 . 27
2432 . 2790 . 2973 . 9675 . 0084 . 4251 . 33
2566 . 3789 . 9573 . 1675 . 0083 . 0150 . 75
2667 . 9690 . 6573 . 1375 . 0083 . 9452 . 29
2768 . 3290 . 2173 . 1375 . 0084 . 3153 . 57
2868 . 4890 . 5775 . 0051 . 91
2968 . 4190 . 1773 . 7884 . 5452 . 97
73 . 3575 . 0084 . 27
3068 . 1289 . 6772 . 5375 . 0184 . 2750 . 47
MethodRunImageNetCIFAR10CIFAR100FOOD101VOC07CLEVR-count
DINO070 . 7491 . 8774 . 7175 . 0087 . 5051 . 67
DINO171 . 2991 . 8174 . 3375 . 0087 . 9957 . 19
DINO272 . 1990 . 5173 . 0975 . 0188 . 3556 . 78
DINO372 . 3091 . 4674 . 3475 . 0088 . 3956 . 75
471 . 6890 . 8972 . 8275 . 0088 . 4859 . 19
572 . 4192 . 2475 . 5875 . 0088 . 0459 . 29
669 . 2390 . 3671 . 8675 . 0087 . 6153 . 87
766 . 1887 . 9768 . 1875 . 0086 . 7254 . 56
864 . 1086 . 8066 . 6875 . 0185 . 3855 . 23
MethodRunImageNetiNat18Places205EuroSatSUN397Cars
026 . 350 . 9521 . 4865 . 1031 . 235 . 10
130 . 541 . 3921 . 0763 . 8031 . 605 . 24
236 . 921 . 8524 . 7769 . 9235 . 245 . 12
347 . 604 . 3435 . 4887 . 8447 . 627 . 56
451 . 266 . 1436 . 4388 . 5849 . 298 . 82
554 . 397 . 8738 . 0488 . 4451 . 889 . 76
655 . 668 . 7638 . 9789 . 4253 . 1410 . 31
756 . 339 . 5639 . 6589 . 7653 . 3910 . 53
858 . 6512 . 0841 . 7390 . 8856 . 3511 . 75
959 . 7113 . 4742 . 7291 . 2857 . 2611 . 95
1059 . 5814 . 1843 . 2290 . 9657 . 4311 . 58
1159 . 2214 . 6343 . 4891 . 3457 . 7511 . 99
1253 . 7812 . 8042 . 3592 . 2255 . 4110 . 46
1357 . 9414 . 3643 . 6191 . 6057 . 8911 . 30
1459 . 2014 . 7743 . 6391 . 4057 . 3612 . 00
1559 . 7314 . 2043 . 3191 . 6657 . 0411 . 95
VICReg1659 . 0912 . 3542 . 2190 . 3456 . 2111 . 55
1758 . 2311 . 3441 . 0290 . 1654 . 9711 . 69
1856 . 8210 . 1540 . 1989 . 9454 . 5010 . 76
1955 . 229 . 2639 . 0290 . 0053 . 0710 . 98
2053 . 758 . 2937 . 8789 . 7652 . 1610 . 60
2151 . 5312 . 9740 . 8491 . 6454 . 2613 . 13
2257 . 5713 . 6042 . 4091 . 6456 . 4312 . 47
2359 . 7113 . 4742 . 7291 . 2857 . 2611 . 95
2456 . 229 . 9240 . 1888 . 3653 . 698 . 46
2536 . 222 . 4829 . 5985 . 0642 . 467 . 55
2659 . 3313 . 2242 . 8690 . 7657 . 2111 . 28
2759 . 5113 . 3742 . 6991 . 3456 . 6611 . 53
2859 . 7013 . 6443 . 3790 . 9657 . 3211 . 89
2959 . 0314 . 0043 . 1091 . 5057 . 4412 . 27
3056 . 3714 . 1043 . 2391 . 3657 . 6912 . 52
3149 . 9612 . 3641 . 9391 . 5257 . 1311 . 37
058 . 1912 . 5641 . 9391 . 8257 . 1310 . 19
158 . 5312 . 1042 . 0391 . 6456 . 8510 . 96
257 . 4110 . 7840 . 8990 . 4455 . 2410 . 73
347 . 805 . 0234 . 6788 . 9247 . 148 . 39
425 . 141 . 0222 . 0477 . 2232 . 665 . 17
519 . 240 . 7520 . 0875 . 7030 . 225 . 77
VICReg-exp612 . 030 . 5116 . 9570 . 1826 . 064 . 15
747 . 334 . 7846 . 908 . 87
33 . 8887 . 32
958 . 8712 . 2242 . 2791 . 3456 . 9710 . 93
1058 . 0912 . 7042 . 5656 . 9010 . 42
1157 . 2414 . 0143 . 1891 . 58 92 . 3857 . 3611 . 18
MethodRunImageNetiNat18Places205EuroSatSUN397Cars
050 . 2610 . 8338 . 5489 . 5452 . 7311 . 39
150 . 999 . 8138 . 4390 . 0653 . 9010 . 53
248 . 277 . 6736 . 7888 . 0251 . 4510 . 21
336 . 773 . 2030 . 0184 . 1243 . 856 . 68
425 . 921 . 5123 . 9377 . 5436 . 574 . 46
VICReg-ctr517 . 690 . 7018 . 1969 . 0028 . 303 . 73
626 . 901 . 6524 . 6975 . 4636 . 724 . 86
744 . 315 . 8134 . 8287 . 8449 . 239 . 25
849 . 318 . 7137 . 6489 . 1651 . 9810 . 78
946 . 438 . 3836 . 1189 . 4450 . 2710 . 25
1038 . 336 . 2132 . 1086 . 4444 . 078 . 97
043 . 0520 . 4837 . 8394 . 1255 . 2022 . 57
149 . 6921 . 2841 . 8293 . 4858 . 2321 . 38
254 . 4517 . 4942 . 9291 . 8857 . 8616 . 55
350 . 248 . 3639 . 2288 . 4251 . 9411 . 58
445 . 776 . 3636 . 5587 . 1648 . 5910 . 20
541 . 144 . 8134 . 0484 . 7645 . 369 . 10
643 . 3120 . 5138 . 1694 . 4455 . 4023 . 24
749 . 6021 . 5141 . 9393 . 8859 . 0522 . 43
854 . 4817 . 9243 . 0092 . 8659 . 5318 . 22
950 . 728 . 6539 . 6489 . 4454 . 4712 . 96
1043 . 325 . 8436 . 5187 . 6250 . 8511 . 33
1141 . 155 . 0234 . 2685 . 5248 . 1610 . 87
1244 . 6121 . 3339 . 2094 . 0856 . 1522 . 82
1351 . 5421 . 5042 . 6494 . 4059 . 8721 . 30
1456 . 5116 . 6843 . 3992 . 2659 . 1017 . 56
SimCLR1556 . 8910 . 3541 . 2190 . 3856 . 3714 . 04
1654 . 187 . 1639 . 4287 . 9454 . 2411 . 47
1749 . 194 . 9837 . 1287 . 1450 . 8510 . 33
1844 . 723 . 8935 . 1186 . 0048 . 389 . 49
1957 . 0610 . 1241 . 1889 . 5256 . 2013 . 19
2056 . 7210 . 1741 . 3890 . 0856 . 4013 . 23
2156 . 1410 . 2641 . 4789 . 1256 . 7413 . 18
2248 . 9810 . 0341 . 4889 . 8456 . 1513 . 47
2335 . 9210 . 0341 . 3389 . 7456 . 3813 . 87
2428 . 269 . 6841 . 2286 . 8856 . 0213 . 13
2552 . 659 . 9840 . 4790 . 2055 . 3813 . 72
2655 . 989 . 8840 . 8489 . 0055 . 4313 . 41
2756 . 4310 . 0341 . 0489 . 6056 . 2313 . 89
2856 . 8910 . 3541 . 2190 . 3856 . 3714 . 04
2956 . 6510 . 3041 . 4089 . 2256 . 4213 . 79
10 . 6013 . 90
3056 . 5641 . 6190 . 1456 . 82
MethodRunImageNetiNat18Places205EuroSatSUN397Cars
DINO019 . 722 . 4130 . 1288 . 6044 . 105 . 81
DINO135 . 642 . 8532 . 7288 . 7846 . 076 . 59
DINO253 . 333 . 6436 . 3889 . 3450 . 007 . 67
DINO354 . 634 . 9738 . 8691 . 3254 . 4810 . 31
453 . 016 . 7340 . 2491 . 8456 . 619 . 74
564 . 7915 . 8444 . 9593 . 2463 . 6822 . 04
610 . 020 . 4515 . 8780 . 9023 . 242 . 24
74 . 280 . 1810 . 1972 . 3016 . 561 . 98
82 . 600 . 146 . 8161 . 6212 . 551 . 82
MethodRunImageNetCIFAR10CIFAR100FOOD101VOC07CLEVR-count
026 . 3559 . 2325 . 8475 . 0066 . 1521 . 75
130 . 5460 . 5725 . 6775 . 0167 . 3319 . 64
236 . 9263 . 7729 . 6775 . 0070 . 9423 . 72
347 . 6075 . 5944 . 5875 . 0175 . 9041 . 44
451 . 2676 . 8845 . 2675 . 0176 . 8534 . 20
554 . 3978 . 3449 . 6375 . 0177 . 8540 . 24
655 . 6678 . 7149 . 8975 . 0178 . 1738 . 37
756 . 3378 . 8950 . 5275 . 0178 . 4139 . 88
858 . 6579 . 5750 . 9575 . 0179 . 4643 . 13
959 . 7180 . 4352 . 7575 . 0179 . 5043 . 93
1059 . 5880 . 5953 . 8075 . 0279 . 1943 . 87
1159 . 2280 . 9453 . 6675 . 0378 . 9044 . 90
1253 . 7878 . 9651 . 8375 . 0376 . 1743 . 71
1357 . 9481 . 4353 . 9275 . 0278 . 0445 . 23
1459 . 2081 . 0453 . 8875 . 0379 . 1843 . 75
1559 . 7381 . 1653 . 3575 . 0279 . 1844 . 39
VICReg1659 . 0980 . 4652 . 8275 . 0279 . 2045 . 92
1758 . 2379 . 7651 . 7775 . 0179 . 5836 . 53
1856 . 8279 . 2051 . 1075 . 0178 . 6237 . 81
1955 . 2278 . 8250 . 1975 . 0178 . 3638 . 50
2053 . 7577 . 8749 . 3475 . 0178 . 0738 . 62
2151 . 5378 . 4551 . 4675 . 0175 . 9949 . 13
2257 . 5780 . 6752 . 8775 . 0278 . 5345 . 93
2359 . 7180 . 4352 . 7575 . 0179 . 5043 . 93
2456 . 2275 . 8045 . 7375 . 0278 . 9039 . 71
2536 . 2272 . 5541 . 5075 . 0073 . 1241 . 73
2659 . 3379 . 5852 . 2575 . 0279 . 6144 . 89
2759 . 5180 . 2652 . 5575 . 0179 . 2942 . 99
2859 . 7079 . 7452 . 9275 . 0279 . 6746 . 11
2959 . 0381 . 2554 . 9675 . 0180 . 1043 . 33
3056 . 3780 . 8153 . 5575 . 0180 . 1146 . 97
3149 . 9680 . 8653 . 1075 . 0179 . 6847 . 09
058 . 1980 . 8052 . 3075 . 0179 . 7345 . 89
158 . 5380 . 1553 . 0875 . 0180 . 1843 . 75
257 . 4179 . 2251 . 6975 . 0179 . 3943 . 92
347 . 8076 . 7045 . 9375 . 0076 . 2143 . 52
425 . 1466 . 9133 . 4975 . 0066 . 1237 . 21
VICReg-exp519 . 2465 . 8529 . 8775 . 0062 . 0434 . 52
612 . 0362 . 4826 . 0675 . 0055 . 7134 . 17
747 . 3376 . 9146 . 4775 . 0076 . 4841 . 40
853 . 7277 . 9948 . 6575 . 0078 . 7244 . 80
958 . 8780 . 3653 . 7875 . 0180 . 1543 . 85
1058 . 0980 . 6453 . 4775 . 0179 . 9845 . 68
1157 . 2481 . 1054 . 2875 . 0179 . 5843 . 71
MethodRunImageNetCIFAR10CIFAR100FOOD101VOC07CLEVR-count
050 . 2678 . 7649 . 3475 . 0077 . 8937 . 61
150 . 9978 . 6349 . 8075 . 0078 . 7743 . 76
248 . 2777 . 8648 . 7575 . 0078 . 4840 . 49
336 . 7773 . 8340 . 4475 . 0072 . 1938 . 16
425 . 9266 . 8132 . 8275 . 0065 . 3231 . 63
VICReg-ctr517 . 6963 . 9425 . 5075 . 0058 . 4831 . 40
644 . 3176 . 8246 . 4375 . 0077 . 1642 . 48
749 . 3177 . 7048 . 6675 . 0078 . 5040 . 81
846 . 4375 . 5146 . 5475 . 0076 . 9744 . 25
938 . 3371 . 5140 . 6875 . 0072 . 6741 . 31
043 . 0575 . 8452 . 3975 . 0071 . 4346 . 86
149 . 6978 . 3253 . 2075 . 0076 . 1950 . 11
254 . 4579 . 2451 . 5675 . 0078 . 1049 . 63
350 . 2478 . 3647 . 7075 . 0079 . 0746 . 29
445 . 7775 . 7344 . 3775 . 0077 . 4646 . 63
541 . 1474 . 0441 . 8875 . 0075 . 0345 . 47
643 . 3175 . 6653 . 3175 . 0071 . 3741 . 40
749 . 6078 . 3654 . 7975 . 0076 . 4746 . 46
854 . 4880 . 3955 . 3975 . 0078 . 3849 . 53
950 . 7279 . 3251 . 0175 . 0079 . 2742 . 25
1043 . 3276 . 4348 . 0775 . 0077 . 5944 . 67
1141 . 1574 . 7545 . 0675 . 0074 . 7946 . 56
1244 . 6176 . 9753 . 2975 . 0072 . 7147 . 15
1351 . 5479 . 2056 . 4575 . 0076 . 8037 . 03
1456 . 5179 . 4853 . 9975 . 0078 . 5543 . 05
SimCLR1556 . 8979 . 3552 . 5875 . 0079 . 7343 . 51
1654 . 1878 . 1749 . 6275 . 0079 . 7143 . 32
1749 . 1976 . 5946 . 9875 . 0078 . 6143 . 48
1844 . 7276 . 6845 . 6274 . 9976 . 9542 . 88
1957 . 0679 . 8352 . 1975 . 0079 . 7345 . 97
2056 . 7279 . 4153 . 1275 . 0080 . 0545 . 65
2156 . 1479 . 4652 . 2575 . 0079 . 6447 . 19
2248 . 9879 . 3952 . 2875 . 0079 . 8945 . 92
2335 . 9279 . 5252 . 1575 . 0079 . 7443 . 86
2428 . 2679 . 2851 . 2575 . 0079 . 9045 . 39
2552 . 6579 . 3052 . 8575 . 0078 . 7442 . 71
2655 . 9879 . 5152 . 2175 . 0079 . 3444 . 75
2756 . 4378 . 5252 . 0475 . 0079 . 5545 . 19
2856 . 8979 . 3575 . 0043 . 51
2956 . 6578 . 9852 . 5879 . 7343 . 69
51 . 8075 . 0079 . 82
3056 . 5678 . 3651 . 8174 . 9979 . 7844 . 50
MethodRunImageNetCIFAR10CIFAR100FOOD101VOC07CLEVR-count
019 . 7281 . 2453 . 9775 . 0077 . 2142 . 39
135 . 6481 . 6655 . 3275 . 0082 . 4845 . 23
253 . 3380 . 7053 . 0375 . 0084 . 2241 . 93
354 . 6384 . 2658 . 4975 . 0085 . 2549 . 89
DINO453 . 0183 . 2058 . 1475 . 0085 . 7148 . 72
564 . 7987 . 0265 . 4375 . 0087 . 0351 . 19
610 . 0270 . 9837 . 6675 . 0061 . 2732 . 83
74 . 2859 . 0222 . 7175 . 0050 . 0234 . 77
82 . 6047 . 9716 . 2675 . 0041 . 1627 . 13
MethodRunImageNetiNat18Places205EuroSatSUN397Cars
0102 . 0738 . 1044 . 3914 . 6132 . 407 . 03
1229 . 8192 . 53129 . 4788 . 7898 . 4412 . 58
2374 . 25135 . 79206 . 29120 . 31163 . 3119 . 77
3612 . 12261 . 34336 . 16228 . 60265 . 6438 . 90
4831 . 49382 . 55467 . 68366 . 78374 . 5059 . 15
5952 . 55449 . 44539 . 24428 . 87435 . 9477 . 36
61033 . 93493 . 50587 . 19477 . 69478 . 3488 . 28
71088 . 13531 . 16630 . 80514 . 70517 . 4799 . 97
81442 . 63726 . 28849 . 29693 . 16723 . 53161 . 76
91809 . 06947 . 811110 . 80855 . 76954 . 83210 . 06
101920 . 811054 . 701247 . 93870 . 561075 . 89258 . 33
111938 . 441087 . 451275 . 60924 . 661119 . 33306 . 90
121937 . 781100 . 541337 . 88963 . 141172 . 38382 . 18
131944 . 951095 . 951307 . 62968 . 961155 . 65352 . 50
141940 . 041095 . 911280 . 85910 . 161126 . 89324 . 51
151942 . 121049 . 721240 . 87893 . 251070 . 12269 . 96
VICReg161521 . 07782 . 39919 . 54725 . 49771 . 86169 . 75
171278 . 67637 . 18757 . 19606 . 98627 . 48128 . 96
181079 . 67532 . 00634 . 88527 . 59524 . 80111 . 28
19909 . 71446 . 52525 . 65454 . 22431 . 4488 . 55
20777 . 82376 . 39447 . 53378 . 06360 . 4173 . 57
211409 . 29890 . 97996 . 12814 . 00889 . 66352 . 57
221652 . 41936 . 471070 . 40837 . 76932 . 17275 . 04
231809 . 06947 . 811110 . 80855 . 76954 . 83210 . 06
241422 . 16648 . 60813 . 33532 . 92650 . 3391 . 44
25101 . 2944 . 1246 . 0020 . 7736 . 6010 . 68
261821 . 80959 . 981130 . 27840 . 12962 . 04221 . 58
271814 . 64948 . 471107 . 25856 . 12946 . 73218 . 36
281728 . 89913 . 311065 . 74814 . 04911 . 39216 . 25
291587 . 36859 . 561008 . 93807 . 57864 . 18244 . 56
301384 . 68757 . 81881 . 53716 . 36767 . 14229 . 93
31974 . 91508 . 81613 . 44508 . 01526 . 43143 . 61
01006 . 58530 . 95637 . 48501 . 16551 . 60142 . 88
11002 . 17521 . 34626 . 39515 . 00534 . 56132 . 72
2922 . 59473 . 26564 . 18475 . 88472 . 06119 . 75
3399 . 09192 . 27233 . 31202 . 71189 . 7836 . 95
463 . 8230 . 2536 . 9821 . 3930 . 637 . 90
519 . 4712 . 499 . 576 . 337 . 963 . 58
VICReg-exp69 . 427 . 195 . 413 . 804 . 732 . 55
7375 . 38180 . 63216 . 71191 . 99176 . 9431 . 86
636 . 60314 . 20380 . 13341 . 21312 . 0466 . 31
8139 . 28
9 101002 . 29 1048 . 58528 . 76629 . 07517 . 84536 . 91 581 . 30158 . 24
111326 . 31556 . 24 733 . 86673 . 46 875 . 62547 . 15 707 . 34771 . 39208 . 83
MethodRunImageNetiNat18Places205EuroSatSUN397Cars
0382 . 33224 . 68252 . 33207 . 81220 . 6869 . 48
1278 . 88163 . 91183 . 32154 . 70160 . 7150 . 29
2169 . 33101 . 44114 . 4997 . 8999 . 8434 . 97
348 . 4732 . 3834 . 9332 . 7731 . 7612 . 53
423 . 2216 . 7217 . 9017 . 7016 . 637 . 38
VICReg-ctr512 . 8810 . 0310 . 3110 . 669 . 715 . 01
622 . 9616 . 8717 . 7717 . 3016 . 557 . 61
796 . 3362 . 0868 . 0560 . 6860 . 3922 . 59
8251 . 52146 . 09166 . 32138 . 75143 . 8145 . 73
9309 . 22177 . 32204 . 38170 . 65175 . 8153 . 83
10316 . 89184 . 83213 . 74175 . 10185 . 9159 . 07
0109 . 07105 . 65104 . 6576 . 13105 . 6492 . 59
1164 . 07148 . 71149 . 89100 . 17148 . 00113 . 61
2244 . 34184 . 32203 . 04129 . 53188 . 30105 . 89
3150 . 9094 . 61116 . 9483 . 98102 . 1740 . 64
487 . 6957 . 7867 . 2354 . 6259 . 7925 . 36
563 . 6842 . 2348 . 2240 . 8343 . 2218 . 41
6110 . 59106 . 83105 . 8376 . 97106 . 8293 . 98
7165 . 49149 . 55150 . 27103 . 19148 . 65113 . 60
8246 . 56184 . 69204 . 24128 . 96189 . 86107 . 43
9164 . 66102 . 61128 . 1295 . 47112 . 2043 . 29
109 . 8830 . 272 . 7455 . 4665 . 0825 . 57
1163 . 6142 . 0048 . 4040 . 8643 . 2018 . 62
12122 . 60118 . 57116 . 9385 . 13118 . 16103 . 25
13197 . 36173 . 50176 . 32116 . 61173 . 24128 . 89
14313 . 67220 . 05239 . 53160 . 52222 . 80111 . 73
SimCLR15299 . 47172 . 75209 . 43140 . 66183 . 5161 . 44
16220 . 63122 . 46150 . 73106 . 96130 . 1640 . 02
17128 . 3371 . 7590 . 4065 . 7778 . 6426 . 24
1871 . 7548 . 9564 . 2548 . 6554 . 8418 . 93
19301 . 92173 . 11211 . 03147 . 45185 . 0460 . 83
20299 . 75173 . 05208 . 52141 . 84182 . 2161 . 56
21299 . 96173 . 61209 . 18144 . 25181 . 9961 . 11
22300 . 90173 . 89209 . 47147 . 78184 . 4561 . 40
23300 . 58174 . 18207 . 19142 . 58184 . 2960 . 94
24300 . 83174 . 63207 . 18146 . 11182 . 2760 . 50
2511 . 5615 . 9531 . 99144 . 8713 . 553 . 92
26293 . 13172 . 80211 . 66139 . 57184 . 9465 . 02
27295 . 23173 . 07208 . 46139 . 91181 . 0562 . 32
28299 . 47172 . 75209 . 43140 . 66183 . 5161 . 44
29298 . 69172 . 12206 . 63142 . 92181 . 3960 . 88
30294 . 42170 . 26177 . 3958 . 94
201 . 98141 . 29
MethodRunImageNetiNat18Places205EuroSatSUN397Cars
DINO0150 . 5162 . 45106 . 14100 . 0890 . 4623 . 52
DINO1276 . 35110 . 16179 . 46163 . 25147 . 5036 . 81
DINO2482 . 51190 . 78291 . 91213 . 05236 . 7652 . 59
DINO3409 . 79180 . 78269 . 96196 . 90225 . 4770 . 37
4347 . 69168 . 61235 . 11168 . 90202 . 5155 . 07
5523 . 67330 . 31392 . 16271 . 77356 . 99155 . 13
661 . 5921 . 3239 . 8934 . 8832 . 552 . 52
722 . 438 . 2617 . 919 . 8615 . 621 . 93
811 . 015 . 848 . 278 . 087 . 831 . 39
MethodRunImageNetCIFAR10CIFAR100FOOD101VOC07CLEVR-count
0102 . 0738 . 1044 . 3914 . 6132 . 407 . 03
1229 . 8192 . 53129 . 4788 . 7898 . 4412 . 58
2374 . 25135 . 79206 . 29120 . 31163 . 3119 . 77
3612 . 12261 . 34336 . 16228 . 60265 . 6438 . 90
4831 . 49382 . 55467 . 68366 . 78374 . 5059 . 15
5952 . 55449 . 44539 . 24428 . 87435 . 9477 . 36
61033 . 93493 . 50587 . 19477 . 69478 . 3488 . 28
71088 . 13531 . 16630 . 80514 . 70517 . 4799 . 97
81442 . 63726 . 28849 . 29693 . 16723 . 53161 . 76
91809 . 06947 . 811110 . 80855 . 76954 . 83210 . 06
101920 . 811054 . 701247 . 93870 . 561075 . 89258 . 33
111938 . 441087 . 451275 . 60924 . 661119 . 33306 . 90
121937 . 781100 . 541337 . 88963 . 141172 . 38382 . 18
131944 . 951095 . 951307 . 62968 . 961155 . 65352 . 50
141940 . 041095 . 911280 . 85910 . 161126 . 89324 . 51
151942 . 121049 . 721240 . 87893 . 251070 . 12269 . 96
VICReg161521 . 07782 . 39919 . 54725 . 49771 . 86169 . 75
171278 . 67637 . 18757 . 19606 . 98627 . 48128 . 96
181079 . 67532 . 00634 . 88527 . 59524 . 80111 . 28
19909 . 71446 . 52525 . 65454 . 22431 . 4488 . 55
20777 . 82376 . 39447 . 53378 . 06360 . 4173 . 57
211409 . 29890 . 97996 . 12814 . 00889 . 66352 . 57
221652 . 41936 . 471070 . 40837 . 76932 . 17275 . 04
231809 . 06947 . 811110 . 80855 . 76954 . 83210 . 06
241422 . 16648 . 60813 . 33532 . 92650 . 3391 . 44
25101 . 2944 . 1246 . 0020 . 7736 . 6010 . 68
261821 . 80959 . 981130 . 27840 . 12962 . 04221 . 58
271814 . 64948 . 471107 . 25856 . 12946 . 73218 . 36
281728 . 89913 . 311065 . 74814 . 04911 . 39216 . 25
291587 . 36859 . 561008 . 93807 . 57864 . 18244 . 56
301384 . 68757 . 81881 . 53716 . 36767 . 14229 . 93
31974 . 91508 . 81613 . 44508 . 01526 . 43143 . 61
01006 . 58530 . 95637 . 48501 . 16551 . 60142 . 88
11002 . 17521 . 34626 . 39515 . 00534 . 56132 . 72
2922 . 59473 . 26564 . 18475 . 88472 . 06119 . 75
3399 . 09192 . 27233 . 31202 . 71189 . 7836 . 95
463 . 8230 . 2536 . 9821 . 3930 . 637 . 90
519 . 4712 . 499 . 576 . 337 . 963 . 58
VICReg-exp69 . 427 . 195 . 413 . 804 . 732 . 55
7375 . 38180 . 63216 . 71191 . 99176 . 9431 . 86
636 . 60314 . 20341 . 21
8380 . 13312 . 0466 . 31
91002 . 29 1048 . 58528 . 76 556 . 24629 . 07517 . 84536 . 91 581 . 30139 . 28 158 . 24
10 111326 . 31733 . 86673 . 46 875 . 62547 . 15 707 . 34771 . 39208 . 83
MethodRunImageNetCIFAR10CIFAR100FOOD101VOC07CLEVR-count
0382 . 33224 . 68252 . 33207 . 81220 . 6869 . 48
1278 . 88163 . 91183 . 32154 . 70160 . 7150 . 29
2169 . 33101 . 44114 . 4997 . 8999 . 8434 . 97
348 . 4732 . 3834 . 9332 . 7731 . 7612 . 53
423 . 2216 . 7217 . 9017 . 7016 . 637 . 38
VICReg-ctr512 . 8810 . 0310 . 3110 . 669 . 715 . 01
696 . 3362 . 0868 . 0560 . 6860 . 3922 . 59
7251 . 52146 . 09166 . 32138 . 75143 . 8145 . 73
8309 . 22177 . 32204 . 38170 . 65175 . 8153 . 83
9316 . 89184 . 83213 . 74175 . 10185 . 9159 . 07
0109 . 07105 . 65104 . 6576 . 13105 . 6492 . 59
1164 . 07148 . 71149 . 89100 . 17148 . 00113 . 61
2244 . 34184 . 32203 . 04129 . 53188 . 30105 . 89
3150 . 9094 . 61116 . 9483 . 98102 . 1740 . 64
487 . 6957 . 7867 . 2354 . 6259 . 7925 . 36
563 . 6842 . 2348 . 2240 . 8343 . 2218 . 41
6110 . 59106 . 83105 . 8376 . 97106 . 8293 . 98
7165 . 49149 . 55150 . 27103 . 19148 . 65113 . 60
8246 . 56184 . 69204 . 24128 . 96189 . 86107 . 43
9164 . 66102 . 61128 . 1295 . 47112 . 2043 . 29
109 . 8830 . 272 . 7455 . 4665 . 0825 . 57
1163 . 6142 . 0048 . 4040 . 8643 . 2018 . 62
12122 . 60118 . 57116 . 9385 . 13118 . 16103 . 25
13197 . 36173 . 50176 . 32116 . 61173 . 24128 . 89
14313 . 67220 . 05239 . 53160 . 52222 . 80111 . 73
SimCLR15299 . 47172 . 75209 . 43140 . 66183 . 5161 . 44
16220 . 63122 . 46150 . 73106 . 96130 . 1640 . 02
17128 . 3371 . 7590 . 4065 . 7778 . 6426 . 24
1871 . 7548 . 9564 . 2548 . 6554 . 8418 . 93
19301 . 92173 . 11211 . 03147 . 45185 . 0460 . 83
20299 . 75173 . 05208 . 52141 . 84182 . 2161 . 56
21299 . 96173 . 61209 . 18144 . 25181 . 9961 . 11
22300 . 90173 . 89209 . 47147 . 78184 . 4561 . 40
23300 . 58174 . 18207 . 19142 . 58184 . 2960 . 94
24300 . 83174 . 63207 . 18146 . 11182 . 2760 . 50
2511 . 5615 . 9531 . 99144 . 8713 . 553 . 92
26293 . 13172 . 80211 . 66139 . 57184 . 9465 . 02
27295 . 23173 . 07208 . 46139 . 91181 . 0562 . 32
28299 . 47172 . 75209 . 43140 . 66183 . 5161 . 44
29298 . 69172 . 12181 . 3960 . 88
30206 . 63142 . 9258 . 94
294 . 42170 . 26201 . 98141 . 29177 . 39
MethodRunImageNetCIFAR10CIFAR100FOOD101VOC07CLEVR-count
DINO0150 . 5162 . 45106 . 14100 . 0890 . 4623 . 52
DINO1276 . 35110 . 16179 . 46163 . 25147 . 5036 . 81
DINO2482 . 51190 . 78291 . 91213 . 05236 . 7652 . 59
DINO3409 . 79180 . 78269 . 96196 . 90225 . 4770 . 37
4347 . 69168 . 61235 . 11168 . 90202 . 5155 . 07
5523 . 67330 . 31392 . 16271 . 77356 . 99155 . 13
661 . 5921 . 3239 . 8934 . 8832 . 552 . 52
722 . 438 . 2617 . 919 . 8615 . 621 . 93
811 . 015 . 848 . 278 . 087 . 831 . 39


Self-supervised learning (SSL) has shown great progress in learning informative data representations in recent years (Chen et al., 2020a; He et al., 2020; Chen et al., 2020b; Grill et al., 2020; Lee et al., 2021; Caron et al., 2020; Zbontar et al., 2021; Bardes et al., 2021; Tomasev et al., 2022; Caron et al., 2021; Chen et al., 2021; Li et al., 2022b; Zhou et al., 2022a, b; HaoChen et al., 2021; He et al., 2022), catching up to supervised baselines and even surpassing them in few-shot learning, i.e., when evaluating the SSL model from only a few labeled examples. Although various SSL families of losses have emerged, most are variants of the joint-embedding (JE) framework with a siamese network architecture (Bromley et al., 1994), denoted as JE-SSL for short. The only technicality we ought to introduce to make our study precise is the fact that JE-SSL has introduced some different notations to denote an input's representation. In short, JE-SSL often composes a backbone or encoder network, e.g., a ResNet-50, and a projector network, e.g., a multilayer perceptron. This projector is only employed during training, and we refer to its outputs as embeddings, while the actual inputs' representations employed for downstream tasks are obtained at the encoder's output.

Although downstream tasks performance of JE-SSL representations might seem impressive, one pondering fact should be noted: all existing methods, hyperparameters, models, and thus performances, are obtained by manual search involving the labels of the considered datasets. In words, JE-SSL is tuned by monitoring the supervised performance of the model at hand. Therefore, successfully deploying a SSL model on a new dataset relies on the strong assumption of having labels on that dataset to tune the SSL method, e.g., through a linear classifier feeding on the JE-SSL representations (Misra & Maaten, 2020). This quality assessment strategy was also extended to the use of nonlinear classifiers, e.g., a k-NN classifier (Wu et al., 2018; Zhuang et al., 2019). Hence, although labels are not directly employed to compute the weight updates, they are used as a proxy. This limitation prevents the deployment of JE-SSL in challenging domains where the number of available labeled examples is limited. Adding to the challenge, one milestone of JE-SSL is to move away from reconstruction-based learning; hence without labels and without visual cues, tuning JE-SSL methods on unlabeled datasets remains challenging. This led to the application of feature inversion methods, e.g., Deep Image Prior (Ulyanov et al., 2018) or conditional diffusion models (Bordes et al., 2021), to learned JE-SSL representations to try to visualize the learned features. Those alternative visualization solutions however suffer from their own limitations, e.g., bias of the used method, or computational cost. More importantly, those feature inversion strategies have been designed for natural images, i.e., it is not clear how such methods would perform on different data modalities.

In this study we propose RankMe to assess a model's performance without having access to any labels; a simple method that does not require any training or tuning. RankMe accurately predicts a model's performance both In-Distribution (ID), i.e., on the same data distribution as used during the JE-SSL training, and Out-Of-Distribution (OOD), i.e., on a different data distribution onto which the learned model is deployed. We highlight this crucial property at the top of Figure 1. The strength of RankMe lies in the fact that it is solely based on the singular value distribution of the learned embeddings, which is not only simple to obtain but also easy to interpret. In fact, RankMe's motivation hinges on Cover's theorem (Cover, 1965), which states how increasing the rank of a linear classifier's input increases its training performance, and on three simple hypotheses that we thoroughly validate empirically at the end of our study. Since RankMe provides a step towards (unlabeled) JE-SSL by allowing practitioners to cross-validate hyperparameters and select models without resorting to labels or feature inversion methods, we hope that it will allow JE-SSL to move away from using labels as part of their design search strategy. We summarize our contributions below:

We introduce RankMe (Equation 1) and motivate its construction from first principles (Section 5) e.g. Cover’s theorem

We demonstrate that RankMe’s ability to inform about JE-SSL downstream performances is consistent across methods, e.g. VICReg, SimCLR, DINO, and their variants, and across architectures, e.g. using a projector network and/or a nonlinear evaluation method (see Figures 2 and 3.3)

We demonstrate that RankMe enables hyperparameter cross-validation for JE-SSL methods; RankMe is able to retrieve, and sometimes surpass, most of the performance previously found by manual, label-guided search while not employing any labels, on both in-domain and out-of-domain datasets (Figures 1 and 2)

We provide a hyperparameter-free, numerically stable implementation of RankMe in Section 3.1 and pseudo-code for cross-validation in Figure 4. Through extensive experiments involving 11 datasets and 110 models over 5 methods, we demonstrate that in the linear and nonlinear probing regimes, RankMe is able to tell apart successful and sub-optimal JE-SSL training, even on different downstream tasks, without having access to labels or downstream task data samples.

Joint embedding self-supervised learning (JE-SSL). In JE-SSL, two main families of methods can be distinguished: contrastive and non-contrastive. Contrastive methods (Chen et al., 2020a; He et al., 2020; Chen et al., 2020b, 2021; Yeh et al., 2021) mostly rely on the InfoNCE criterion (Oord et al., 2018), except for (HaoChen et al., 2021) which uses squared similarities between the embeddings. A clustering variant of contrastive learning has also emerged (Caron et al., 2018, 2020, 2021) and can be thought of as contrastive learning between cluster centroids instead of samples. Non-contrastive methods (Grill et al., 2020; Chen & He, 2020; Caron et al., 2021; Bardes et al., 2021; Zbontar et al., 2021; Ermolov et al., 2021; Li et al., 2022c) aim at bringing together embeddings of positive samples, similar to contrastive learning. However, a key difference with contrastive learning lies in how those methods prevent representational collapse. In the former, the criterion explicitly pushes away negative samples, i.e., all samples that are not positive, from each other. In the latter, the criterion does not prevent collapse by distinguishing positive and negative samples, but instead considers the embeddings as a whole and encourages information content maximization, e.g., by regularizing the empirical covariance matrix of the embeddings. Such a categorization is not needed for our development, and we thus refer to any of the above methods as JE-SSL.

Known Observations About Representations' Spectrum in JE-SSL. The phenomenon of learning rank-deficient, or dimensionally collapsed, embeddings in JE-SSL has recently been studied from both a theoretical and an empirical point of view. The empirical emergence of dimensional collapse was studied in (Hua et al., 2021), which proposed the use of a whitening batch normalization layer to help alleviate it. In (Jing et al., 2022), a focus on contrastive approaches in a linear setting enabled a better understanding of dimensional collapse and the role of augmentations in its emergence. Performance in a low-label regime of a partially collapsed encoder can also be improved by forcing the whitening of its output, as shown in (He & Ozay, 2022). Furthermore, it was shown in (Balestriero & LeCun, 2022) how dimensional collapse is a phenomenon that should not necessarily happen in theory and how its emergence is mostly due to practical concerns. Interestingly, we will see through the lens of RankMe that dimensional collapse is tightly linked with the quality of the representation. In supervised learning, the collapse of the embeddings was also studied and found to be detrimental to performance (Ganea et al., 2019).

As such, existing studies have started to informally prescribe choosing representations that exhibit less collapse; yet no formal study of this recipe's ability to actually identify successfully trained models, nor of how to quantify the amount of collapse to improve representations, has been proposed; this is the goal of our study.

The goal of this section is to introduce and motivate RankMe while providing a numerically stable implementation. We defer a theoretical justification to Section 5. To ease notations, we refer to the (train) dataset used to obtain the JE-SSL model as source dataset, and the test set on the same dataset or a different OOD dataset as target dataset.

The most crucial step of RankMe is the estimation of the embeddings' rank. A trivial solution could be to count the number of nonzero singular values. Denoting by $\sigma_k$ the $k^{\rm th}$ singular value of the $(N \times K)$ embedding matrix $\bm{Z}$, this would lead to $\operatorname{rank}(\bm{Z}) = \sum_{k=1}^{\min(N,K)} 1_{\{\sigma_k > 0\}}$. However, such a definition is too rigid for practical scenarios. For example, round-off error alone could have a dramatic impact on the rank estimate. Instead, alternative and robust rank definitions have emerged (Press et al., 2007), such as $\operatorname{rank}(\bm{Z}) = \sum_{k=1}^{\min(N,K)} 1_{\{\sigma_k > \max_i \sigma_i \times \max(M,N) \times \epsilon\}}$, where $\epsilon$ is a small constant dependent on the data type, typically $10^{-7}$ for float32. An alternative measure of rank comes from a probabilistic viewpoint where the singular values are normalized to sum to 1 and the Shannon entropy (Shannon, 1948) is used, which corresponds to our definition of RankMe from Equation 1. We thus introduce RankMe formally as the following smooth rank measure, originally introduced in (Roy & Vetterli, 2007):
$$\mathrm{RankMe}(\bm{Z}) = \exp\Big(-\sum_{k=1}^{\min(N,K)} p_k \log p_k\Big), \qquad p_k = \frac{\sigma_k(\bm{Z})}{\|\sigma(\bm{Z})\|_1} + \epsilon, \tag{1}$$

where $\bm{Z}$ is the source dataset's embeddings. As opposed to the classical rank, Equation 1 does not rely on specifying the exact threshold at which a singular value is treated as nonzero. Throughout our study, we employ Equation 1, and provide the matching analysis with the classical rank in the appendix. Another benefit of RankMe's Equation 1 is that it quantifies the whitening of the embeddings in addition to their rank, which is known to simplify optimization of (non)linear probes put on top of them (Santurkar et al., 2018). Lastly, although Equation 1 is defined with the full embedding matrix $\bm{Z}$, we observe that not all of the samples need to be used to obtain an accurate estimate of RankMe. In practice, we use 25,600 samples, as ablation studies provided in Appendix G and Figure S11 indicate that this provides a highly accurate estimate. RankMe should however only be used to compare different runs of a given method, since the embeddings' rank is not the only factor that affects performance.
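As an illustration of the definitions above, the following NumPy sketch computes both the threshold-based rank and the smooth RankMe measure; it is a minimal re-implementation for clarity rather than the authors' code, with the ε value and sample count simply following the description above.

```python
import numpy as np

def rankme(Z, eps=1e-7):
    """Smooth rank of an (N x K) embedding matrix Z: exponential of the
    Shannon entropy of the normalized singular value distribution."""
    s = np.linalg.svd(Z, compute_uv=False)          # singular values of Z
    p = s / (s.sum() + eps) + eps                   # normalize to sum to 1
    return float(np.exp(-np.sum(p * np.log(p))))    # effective rank = exp(entropy)

def threshold_rank(Z, eps=1e-7):
    """Classical robust rank: singular values above a data-type-dependent cutoff."""
    s = np.linalg.svd(Z, compute_uv=False)
    cutoff = s.max() * max(Z.shape) * eps
    return int((s > cutoff).sum())

# Toy usage; in practice ~25,600 embedding vectors of the projector's output are used.
Z = np.random.randn(4096, 512).astype(np.float32)
print(rankme(Z), threshold_rank(Z))
```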

Relation of RankMe To Existing Solutions. Performance evaluation without labels can also be done using a pretext task, such as rotation prediction. This technique helped in selecting data augmentation policies in (Reed et al., 2021). One limitation lies in the need to select and train the classifier of the pretext task, and in the strong assumption that rotations were not part of the transformations one aims to be invariant to. Since (supervised) linear evaluation is the most widely used evaluation method, we will focus on showing how RankMe compares with it. In (Li et al., 2022a), it is shown that the eigenspectrum of representations can be used to assess performance when used in conjunction with the loss value. This requires training an additional classifier to predict the performance and as such is not usable as is in a completely unsupervised fashion. Most related to us is (Ghosh et al., 2022), where representations are evaluated by their eigenspectrum decay, giving a baseline for unsupervised hyperparameter selection. α-ReQ relies on strong assumptions, and if they hold, then RankMe and α-ReQ can match, but we show that we outperform it on average. In fact, the assumptions made by α-ReQ are known to not hold in the presence of collapse (He & Ozay, 2022). We investigate α-ReQ's behavior in detail in Appendix E.

In order to empirically validate RankMe, we compare it to linear evaluation, which is the default evaluation method of JE-SSL methods. Finetuning has gained in popularity with Masked Image Modeling methods (He et al., 2021), but this can have a significant impact on the properties of the embeddings and alters what was learned during the pretraining. As such, we do not focus on this evaluation.

Experimental Methods and Datasets Considered. In order to provide a meaningful assessment of the embeddings rank's impact on performance, we focus on 5 JE-SSL methods. We use SimCLR as a representative contrastive method, VICReg as a representative covariance-based method, and VICReg-exp and VICReg-ctr, which were introduced in (Garrido et al., 2022). We also include DINO (Caron et al., 2021) as a clustering approach. Applying RankMe to DINO is not as straightforward due to the clustering layer in the projector, so embeddings have to be taken right before the last projector layer. Confer Appendix C for more details. To make our work self-contained, we present the methods in Appendix A. We chose to use VICReg-exp and VICReg-ctr as they provide small modifications to VICReg and SimCLR while producing embeddings with different rank properties. For each method we vary parameters that directly influence the rank of the embeddings, whether it is the temperature used in softmax-based methods, which directly impacts the hardness of the softmax, or the loss weights, which give more or less importance to the regularizing aspect of the loss functions. We also vary optimization parameters such as the learning rate and weight decay to provide a more complete analysis. We provide the hyperparameters used for all experiments in Appendix K. All approaches were trained in the same experimental setting with a ResNet-50 (He et al., 2016) backbone with an MLP projector having intermediate layers of size 8192, 8192, 2048, which avoids any architectural rank constraints. The models were trained for 100 epochs on ImageNet with the LARS (You et al., 2017; Goyal et al., 2017) optimizer. DINO was also trained using multi-crop.

In order to evaluate the methods, we use ImageNet (our source dataset), as well as iNaturalist18 (Horn et al., 2018), Places205 (Zhou et al., 2014), EuroSat (Helber et al., 2019), SUN397 (Xiao et al., 2010), and StanfordCars (Krause et al., 2013) to evaluate the trained models on unseen datasets. While we focus on these datasets for our visualizations, we also include CIFAR10, CIFAR100 (Krizhevsky et al., 2009), Food101 (Bossard et al., 2014), VOC07 (Everingham et al.) and CLEVR-count (Johnson et al., 2017) for our hyperparameter selection results, and provide matching visualizations in Appendix D. These commonly used datasets provide a wide range of scenarios that differ from ImageNet and provide meaningful ways to test the robustness of RankMe. For example, iNaturalist18 consists of 8142 classes focused on fauna and flora, which requires more granularity than similar classes on ImageNet; SUN397 focuses on scene understanding, deviating from the single-object and object-centric images of ImageNet; and EuroSat consists of satellite images which again differ from ImageNet. Datasets such as iNaturalist can also allow theoretical limitations to manifest themselves more clearly, due to the number of classes being significantly higher than the rank of learned representations. In order to evaluate on those datasets, we rely on the VISSL library (Goyal et al., 2021). We provide complete details on the pretraining and evaluation setup in Appendix I.

RankMe as a prediction of linear classification accuracy. As we can see in Figures 2 and 1, for a given method the performance on the representations is improved by a higher embedding rank, whether we look at ImageNet, on which the models were pretrained, or at downstream datasets. This is best seen when looking at DINO, where we notice a clear trend across all datasets. On EuroSat, the relationship is not clear since the performances are so close between all models. When looking at VICReg on StanfordCars, we can clearly see that a high rank is only a necessary condition. Here the best performance is not achieved with the highest rank, even if full-rank embeddings still achieve good performance. We discuss the link between rank, number of classes, and performance in Section 5 to give some insights into RankMe's behavior in settings with few classes such as StanfordCars. It is also very tempting to draw conclusions when comparing different approaches, especially when looking at the ImageNet performance; however, since dimensional collapse is not the only performance-deciding factor, one should refrain from doing so.

While we have been focusing on linear evaluation, one can wonder whether these behaviors change when using a more complex task-related head. We thus give some evidence that the previously observed behaviors persist with a non-linear classification head. We use a simple 3-layer MLP with intermediate dimensions of 2048, where each layer is followed by a ReLU activation. This choice of dimensions ensures that there are no architectural rank constraints on the embeddings. We focus on SUN397 for its conceptual difference from ImageNet. The low rank of the embeddings produced by SimCLR would suggest that a non-linear classifier might help improve performance, since it is not as theoretically limited by the embeddings’ rank as in the linear setting. However, we can see in Figure 3 that the behaviors for all methods are the same as in the linear regime. This suggests that RankMe is also a suitable metric to evaluate downstream performance in a non-linear setting. We perform the same analysis using a k-NN classifier, following the protocol of (Zhuang et al., 2019; Caron et al., 2020), where we use 36 combinations of k and temperature and report the best performance. We see in Figure 3 that RankMe remains a good predictor of downstream performance, with curves that are similar to what was observed with a linear classifier. Since a k-NN classifier evaluates the preservation of the Euclidean distance rather than linear separability, these results suggest that RankMe extends to more evaluation protocols.
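As an illustration of the k-NN protocol, here is a minimal sketch of a temperature-weighted, cosine-similarity k-NN classifier; the exact voting scheme of (Zhuang et al., 2019; Caron et al., 2020) may differ in its details, so treat this as an assumption-laden sketch rather than the evaluated implementation.

```python
import numpy as np

def knn_predict(train_feats, train_labels, test_feats, k=20, temperature=0.07, num_classes=1000):
    """Soft-voting k-NN on L2-normalized features (illustrative sketch)."""
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ train.T                       # cosine similarities
    topk = np.argsort(-sims, axis=1)[:, :k]     # indices of the k nearest neighbors
    preds = []
    for sim_row, idx in zip(sims, topk):
        weights = np.exp(sim_row[idx] / temperature)   # temperature-scaled soft votes
        votes = np.zeros(num_classes)
        np.add.at(votes, train_labels[idx], weights)   # accumulate votes per class
        preds.append(int(votes.argmax()))
    return np.array(preds)
```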

We previously focused on validating RankMe by comparing overall performance under linear evaluation. In this section we focus on the evolution of rank and performance when varying one hyperparameter at a time, in order to demonstrate how RankMe can be used for hyperparameter selection. We focus on loss-specific hyperparameters, such as the loss weights or temperature, as well as hyperparameters related to optimization, such as the learning rate and weight decay.

As we have shown before, a higher rank is necessary for better performance, and using RankMe to find the best value of a hyperparameter is as simple as choosing the value that leads to the highest rank, as illustrated in Figure 4. Certain hyperparameters will lead to plateaus of equal rank, and for those the value that first achieves the maximal value of RankMe should be selected. This second part is only applicable when hyperparameter values can be ordered. Even in cases where the values cannot be compared and equal ranks are found, it remains possible to discard some runs and only focus on the ones that achieve the maximal rank. This further highlights how maximal rank is only a necessary condition for good performance. Nonetheless, when the hyperparameters are ordered we can go one step further and use the rank alone to find a good hyperparameter value.
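A minimal sketch of this selection rule (a simplified version of Algorithm 1; function and variable names are ours):

```python
def select_by_rankme(models, ranks):
    """Return the model with the highest RankMe value.

    `models` and `ranks` are assumed to be ordered by increasing hyperparameter
    value; on a plateau of equal ranks, the first model reaching the maximal
    value is kept, as described above.
    """
    best = 0
    for i in range(1, len(ranks)):
        if ranks[i] > ranks[best]:   # strict '>' keeps the earliest maximal run
            best = i
    return models[best]

# Example: three runs with ranks 1500, 1900, 1900 -> the second run is selected.
print(select_by_rankme(["run_a", "run_b", "run_c"], [1500, 1900, 1900]))
```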

In order to demonstrate the effectiveness of RankMe for hyperparameter selection, we apply the algorithm presented in Figure 4 to find the best values of a given set of hyperparameters for VICReg, SimCLR and DINO. Our focus is on the covariance and invariance weights in VICReg, the temperature in SimCLR, the learning rate and weight decay for both, and the student and teacher temperatures in DINO. We compare the performance on ImageNet, as well as the average performance on the previously discussed OOD datasets, to models selected by their top-1 accuracy on ImageNet’s validation set. For per-dataset performance, see Appendix J.

On the embeddings. As we can see in Table 1, using RankMe we are able to retrieve most of the performance on ImageNet, with gaps lower than half a point on average. It is not possible to beat the selection using ImageNet’s validation set, since this is the metric we are evaluating. However, on OOD datasets we are able to improve the performance in certain settings, while having similar performance on average. Thus, when comparing performance after the projector, RankMe is the better of the two approaches to select the hyperparameters that will generalize best to unseen datasets. When comparing to α-ReQ, RankMe achieves better in-domain performance, but on OOD datasets α-ReQ performs slightly better, though with larger worst-case performance gaps. We provide an in-depth analysis of α-ReQ in Appendix E, where we find that the power-law prior of α-ReQ fails on the embeddings, and as such those results must be interpreted with care. As pointed out in (Girish et al., 2022), using ImageNet performance to select models can lead to suboptimal performance in downstream tasks, which our results further confirm, reinforcing the need for a new way of selecting hyperparameters.

On the representations. When looking at performance before the projector in Figure 1, we can see that RankMe does not beat the models selected with ImageNet’s validation set, even on OOD datasets. However, RankMe performs better than α-ReQ in most settings, while not suffering from as severe drops in the worst cases. Nevertheless, the gaps between RankMe and the ImageNet oracle are on average less than half a point, which shows how competitive RankMe can be for hyperparameter selection, despite using no labeled data, having no parameters to tune, and being computable in a couple of minutes.

iNat-18 pretraining. To show how our results extend beyond ImageNet pretraining, we applied the same protocol but pretrained our models on iNat-18. For these experiments we only compare SimCLR’s temperature and VICReg’s covariance weight. Due to the high number of classes in iNat-18, we chose a projector with output dimension 8192. Since the rank of the representations cannot be higher than 2048, we apply a threshold so as to choose not the highest rank but the highest realistically achievable one; see Appendix B for more details. We also compare RankMe to the performance on ImageNet, to imitate a practical setting where we do not have labels for our source dataset but have access to labels for another, related one. As we can see in Table 2, for VICReg’s covariance weight, RankMe leads to performance similar to the iNat-18 oracle on iNat-18, and slightly outperforms it on OOD datasets. It also beats the ImageNet oracle and α-ReQ by a significant margin. For SimCLR’s temperature, we notice a small drop in performance for RankMe compared to the oracles, but it still outperforms α-ReQ by a significant margin in all settings. These results further reinforce the use of RankMe in general settings, even beyond ImageNet.

Finetuning-based benchmarks. While we have studied how RankMe performs hyperparameter selection when targeting linear evaluation, finetuning-based evaluations are also popular for tasks such as semi-supervised classification or object detection. Even though this setup alters the pretrained weights, and thus can change the rank of the representations, our goal is to see whether RankMe can still be used when targeting these evaluations. We compare against the task oracle, α-ReQ, and the ImageNet linear evaluation oracle, which allows us to see whether linear accuracy on ImageNet is correlated with performance on these benchmarks. We evaluate all methods on ImageNet 1% and 10% for semi-supervised classification, as well as on PascalVOC07+12 for object detection, following the protocol of (Bardes et al., 2021). As we can see in Table 3, RankMe is able to retrieve most of the performance of the task oracle, except in the case of SimCLR’s temperature on ImageNet-1%, where all methods lag behind the task oracle. This also shows that linear performance on the full ImageNet dataset is not perfectly correlated with performance in a few-shot setting. There is no clear winner between α-ReQ and RankMe in these finetuning-based evaluations, but we can see smaller drops in performance for RankMe, as in previous experiments. Nevertheless, these results suggest that even in a setting where the applicability of RankMe is not guaranteed due to the finetuning, it can still be a good method to select hyperparameters in an unsupervised fashion.

Our goal is to build a theoretically grounded intuition behind the construction of RankMe. To that end, we first quantify approximation and classification errors of learned embeddings as a function of their rank, and then motivate how embeddings’ rank can be sufficient to compare the test performance of JE-SSL models’ representations.

From Source Embeddings’ Rank to Target Representations’ Performance. We first build some intuition in the regression setting. In this case, the Eckart-Young-Mirsky theorem (Eckart & Young, 1936) ties the best-case and worst-case approximation error of any target matrix $\bm{Y}\in\mathbb{R}^{N\times C}$ by a rank-$R$ matrix $\bm{P}\in\mathbb{R}^{N\times C}$ to the singular values of $\bm{Y}$ running from $R$ to the rank of $\bm{Y}$, ordered in decreasing order. Without loss of generality, we only consider the case $N > C$ in this study, i.e., we have more samples than dimensions. Formally, this provides the lower bound

$$ \|\bm{Y}-\bm{P}\|_F^2 \geq \sum_{r=R+1}^{C}\sigma^2_{r}(\bm{Y}), $$

which is tight for $\bm{P}$ of rank $R$, and with $\sigma_k$ the operator returning the $k^{\rm th}$ singular value of its argument, ordered in decreasing order. This result, on which RankMe relies, demonstrates that a necessary (but not sufficient) condition for an approximation $\bm{P}$ to approximate $\bm{Y}$ well is to have at least the same rank as $\bm{Y}$. A similar result can be obtained in classification by considering multiple one-vs-all classifiers. In practice, however, we commonly employ a linear probe network on top of given embeddings $\bm{Z}$ to best adapt them to the target $\bm{Y}$, i.e., $\bm{P}=\bm{Z}\bm{W}+\mathbf{1}\bm{b}^{T}$. However, a linear transformation is not able to increase the rank of the input matrix, since

$$ \operatorname{rank}(\bm{P}) \leq \min\big(\operatorname{rank}(\bm{Z}),\operatorname{rank}(\bm{W})\big)+1. $$

We directly obtain that $\min_{\bm{W},\bm{b}}\|\bm{Y}-\bm{Z}\bm{W}-\mathbf{1}\bm{b}^{T}\|_{F}^{2}\geq\sum_{r=R+1}^{C}\sigma^{2}_{r}(\bm{Y})$. In short, the approximation lower bound is not improved by allowing a linear transformation of the embeddings. Further supporting the above, we ought to recall Cover’s theorem (Cover, 1965), stating that the probability of a randomly labeled set of points being linearly separable only increases if $N$ is reduced or $R$ is increased. We combine those results below.

Proposition 5.1. The maximum training accuracy of given embeddings in linear regression or classification increases with their rank. For classification, it plateaus when the rank surpasses the number of classes.
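As a quick numerical illustration of the rank argument behind this proposition, the following sketch checks on random matrices that a linear probe cannot raise the rank of the embeddings (synthetic data, unrelated to the paper’s experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, C, R = 512, 64, 32, 8
Z = rng.normal(size=(N, R)) @ rng.normal(size=(R, D))   # embeddings of rank <= R
W = rng.normal(size=(D, C))
b = rng.normal(size=(1, C))
P = Z @ W + np.ones((N, 1)) @ b                          # linear probe predictions

# rank(P) <= min(rank(Z), rank(W)) + 1: the probe cannot exceed rank R + 1.
print(np.linalg.matrix_rank(Z), np.linalg.matrix_rank(P))
```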

By noticing that RankMe provides a smooth measure of the embeddings’ rank, we can lean on Proposition 5.1 to see that, given two models, the one with the greater RankMe value will have the greater training performance. This is only guaranteed for different models of the same method, since embedding rank is not necessarily the only factor that affects performance. The above result is, however, not yet practical, since what we are truly interested in are (i) performance on unseen samples, i.e., on the test set and out-of-distribution tasks, and (ii) performance on the representations and not the embeddings, since it is common to ablate the projector network of JE-SSL models. Below, we validate three key hypotheses which, when verified, imply that we can extend the impact of RankMe so that the (OOD) test performance of JE-SSL representations increases when RankMe’s value on their training-set embeddings increases.

Validating RankMe’s Hypotheses. The development of RankMe is theoretically grounded when it comes to guaranteeing improved embedding performance on the source dataset. To empirically extend it to representation performance on target datasets, we need to verify three hypotheses: (i) linear probes do not overfit, (ii) embedding and representation performance are monotonically linked, and (iii) source and (OOD) target embedding ranks are monotonically linked. Due to the different nature of the datasets used for downstream tasks, there is no inherent reason for the rank of the embeddings to transfer to them in a monotonic way. However, if the source dataset is diverse enough and the target datasets have some semantic overlap with the source dataset, then we have

$$ \operatorname{rank}(\bm{Z}_{\rm target}) \propto \operatorname{rank}(\bm{Z}_{\rm source}). $$

We observe in Section 3.2 and Figure 5 that the rank of JE-SSL representations scales linearly between different input distributions, e.g., going from a source task such as ImageNet (Deng et al., 2009) to a target task such as iNaturalist. This is further confirmed by Pearson correlation coefficients greater than 0.99. Interestingly, we observe that the StanfordCars dataset exhibits a less distinctive linear scaling, due to its distribution having a small overlap with ImageNet. This indicates that, as long as the source dataset is relatively diverse, using RankMe to select a model with greater embedding rank will also correspond to selecting a model with greater embedding rank on the target dataset. Furthermore, as the train performance increases, so does the test performance; we validate this in the middle right of Figure 5. As a result, using RankMe to select a model with greater train performance is enough to also select a model with greater test performance. Finally, we report on the right of Figure 5 that the performance on embeddings and representations scales almost monotonically. These results are supported by visualizations of embeddings and representations from feature inversion models (Bordes et al., 2021). Hence, using RankMe to select the model maximizing the performance on the former also selects a model maximizing performance on the latter. With these hypotheses validated empirically, we can confidently say that RankMe computed on the embeddings of the source dataset is a predictor of representation performance on target datasets, reinforcing our experimental insights.

We have shown how the phenomenon of dimensional collapse in self-supervised learning can be used as a powerful metric to evaluate models. By using a theoretically motivated analogue of the rank of the embeddings, we show that performance on downstream datasets can easily be assessed by only looking at the training dataset, without any labels, training, or parameters. While our work focuses on linear classification, we show promising results in non-linear classification that raise the question of how general this simple metric can be. Furthermore, its competitiveness with traditional oracle-based hyperparameter selection methods makes it a promising tool in settings where labels are scarce, such as large uncurated datasets. As such, this work makes a step towards completely label-less self-supervised learning, as most existing approaches’ hyperparameters are tuned with the help of ImageNet’s validation set. Further work will explore the use of RankMe in more varied scenarios, to further legitimize its use in designing better self-supervised approaches.

The authors wish to thank Li Jing, Grégoire Mialon, Adrien Bardes, and Yubei Chen in no particular order, for insightful discussions. We also thank Florian Bordes for the efficient implementations that were used for our experiments.

In order to make our work as self-contained as possible, we recall the loss functions of the methods we study. For conciseness, we refer to the outputs of the encoder as representations and to the outputs of the projection head as embeddings, which we denote by $z_i \in \mathbb{R}^d$. We first briefly recall that the SimCLR loss is given by

$$ \mathcal{L} = -\sum_{(i,j)\in \mathbb{P}} \frac{e^{\mathrm{CoSim}(z_i, z_j)}}{\sum_{k=1}^{N} \mathbb{1}_{\{k \neq i\}}\, e^{\mathrm{CoSim}(z_i, z_k)}}, $$

with $\mathbb{P}$ the set of all positive pairs in the current mini-batch or dataset, which comprises $N$ exemplars.

VICReg’s loss is defined with three components. The variance loss $v$ acts as a norm regularizer for the dimensions, and the covariance loss $c$ aims at decorrelating the dimensions of the embeddings. They are respectively defined as

$$ v(\bm{Z}) = \frac{1}{d} \sum_{i=1}^d \max\left(0,\, 1-\sqrt{\mathrm{Var}(Z_{\cdot,i})}\right) \quad\text{and}\quad c(\bm{Z}) = \frac{1}{d} \sum_{i\neq j} \mathrm{Cov}(\bm{Z})_{i,j}^2. $$

Both of these losses are combined with an invariance loss that matches positive pairs, giving the final loss

$$ \mathcal{L} = \lambda \sum_{(i,j)\in \mathbb{P}} \|z_i - z_j\|_2^2 + \mu\, c(\bm{Z}) + \nu\, v(\bm{Z}). $$

VICReg-exp is defined similarly, but with the exponential covariance loss

$$ c_{\exp}(\bm{Z}) = \frac{1}{d}\sum_i \log\left(\sum_{j \neq i} e^{\mathrm{Cov}(\bm{Z})_{i,j}/\tau} \right). $$

VICReg-ctr is then VICReg-exp applied to $\bm{Z}^T$, making it a contrastive approach conceptually similar to SimCLR. These methods give us different scenarios of collapse and allow us to make a more general study of the rank of representations as a powerful metric.
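To make the regularizers above concrete, here is a minimal numpy sketch of the variance and covariance terms as written; it is an illustrative re-implementation (the epsilon inside the square root is an assumption), not the training code used in the paper.

```python
import numpy as np

def variance_loss(Z, eps=1e-4):
    """v(Z) = (1/d) * sum_i max(0, 1 - sqrt(Var(Z_{:,i}))); eps is assumed for stability."""
    std = np.sqrt(Z.var(axis=0) + eps)
    return float(np.mean(np.maximum(0.0, 1.0 - std)))

def covariance_loss(Z):
    """c(Z) = (1/d) * sum_{i != j} Cov(Z)_{i,j}^2."""
    d = Z.shape[1]
    cov = np.cov(Z, rowvar=False)            # d x d covariance of the embeddings
    off_diag = cov - np.diag(np.diag(cov))   # zero out the diagonal entries
    return float((off_diag ** 2).sum() / d)

Z = np.random.randn(256, 128)  # a batch of 256 embeddings of dimension 128
print(variance_loss(Z), covariance_loss(Z))
```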

As we can see in Figure S1, RankMe produces curves with the same trend as on ImageNet, for both SimCLR and VICReg. We can see that VICReg leads to ranks that go beyond 2048, but the dimension of the manifold formed by the embeddings cannot be higher than 2048 due to the dimension of the representations. As such, for any practical purpose we clip the value of RankMe at 2048.

While we have studied the applicability of RankMe on contrastive methods, cluster based methods such as DINO have become extremely popular, and since the definition of embeddings is not as clear cut in them, a thorough analysis is required. We will proceed in two steps:

Show that dimensional collapse happens right before the clustering layer, and on the prototypes

Show that RankMe is a good measure of performance on DINO

As we can see in Figure S2, DINO’s projector can be interpreted as both a classical projector and a clustering layer whose weights are clustering prototypes. This interpretation comes from the softmax applied to the output of the projection head, which can be interpreted as an InfoNCE between the embeddings and the clustering prototypes that make up the clustering layer. We see that both the embeddings and the clustering prototypes are collapsed, though at different levels.

As we can see in Figure S3, the phenomenon of dimensional collapse is highly visible in DINO, which enables the use of RankMe to find optimal hyperparameter values. While in Figure 1 we applied RankMe to the embeddings to be consistent with the other methods, we see that it can be applied directly to the prototypes, yielding very similar results and matching the ImageNet oracle here. The main advantage of using the prototypes is that they are already computed during training, and as such applying RankMe does not require computing any embeddings. This makes RankMe even more appealing for clustering-based methods, where such a technique can be applied.

While we previously focused on certain datasets for their interesting natures, we provide additional visualizations for the remaining datasets, as well as for performance on the embeddings.

As we can see in Figures S5 and S4, we find similar behaviors as before, apart from Food101, where performance is almost identical for all methods. This reinforces the previous validation of RankMe. The relative simplicity of the datasets targeted here makes the theoretical limitations of rank-deficient embeddings harder to see, even though we still see that a high rank helps generalization.

In order to further study the performance of α-ReQ, we reproduce our plots for RankMe using α-ReQ instead of the rank of the embeddings. We consider both the intended use of α-ReQ, in Figure S6, and its application on the embeddings to measure performance on the representations, which we found necessary for RankMe, in Figure S7. We do not include DINO in those plots for readability, as it would force us to change the x-axis scale, making the results harder to interpret.

As we can see in Figure S6, there is no clear link visible between the value of α-ReQ and downstream performance. In particular, we are unable to see a tendency for performance to increase as α tends to one. Nonetheless, α-ReQ was still able to lead to good performance when used for hyperparameter selection.

When applying α-ReQ as we would RankMe, we can see in Figure S7 that there is again no trend of performance increasing as α tends to one. On the contrary, we even find that performance tends to get better with a lower α, as is most visible on StanfordCars, iNaturalist18 or ImageNet for example. α going towards zero means that the singular values of the embeddings tend to a uniform distribution, in line with the goal of RankMe.

As we can see in Figures S8 and S9, the power-law prior of α-ReQ holds well in the case of non-collapsed embeddings, but when applied to collapsed ones, this assumption fails. It even provides a poor approximation of the main rank "plateau" formed by the highest singular values, as can be seen on the right of Figure S9. This further confirms the findings of (He & Ozay, 2022), and shows that when applying α-ReQ directly on the embeddings one must be careful, since the core assumption of the method is violated.

Since we do not rely on the classical threshold-based rank estimator, it is important to verify how well our entropy-based estimator correlates with it. As we can see in Figure S10, the two estimators correlate extremely well, showing that using one or the other should not lead to significant differences, as validated in Appendix H. Nonetheless, the entropic estimator takes into account the degree of whitening of the embeddings, which links better to theoretical results.
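For reference, a minimal sketch of the two estimators being compared, with numpy’s default tolerance standing in for the exact threshold used in the paper and a small epsilon whose value is assumed:

```python
import numpy as np

def rankme(Z, eps=1e-7):
    """Entropy-based effective rank: exp of the entropy of the normalized singular values."""
    s = np.linalg.svd(Z, compute_uv=False)
    p = s / np.abs(s).sum() + eps
    return float(np.exp(-np.sum(p * np.log(p))))

def threshold_rank(Z):
    """Classical estimator: number of singular values above a numerical threshold."""
    return int(np.linalg.matrix_rank(Z))

Z = np.random.randn(1024, 2048)   # e.g. a subset of embeddings
print(rankme(Z), threshold_rank(Z))
```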

As we can see in Figure S11, the rank estimates converge extremely quickly, especially for VICReg. For both VICReg and SimCLR, 10000 samples are enough to obtain more than 95% of the final rank. It is worth noting that the entropic rank estimator converges more slowly than the classical rank estimator, as it is more sensitive to the singular values. The fact that the rank can be approximated with few samples is encouraging for its use during training, and not only as a measure of performance after pretraining.

As can be seen in Figures S13 and S12, the results that we obtain using the classical threshold-based rank estimator are extremely similar to the ones obtained with the entropic estimator. The exact values do differ, but the behaviors stay the same. One of the main differences is illustrated in Figure S13, where we can see that the target rank is almost identical to the source one, whereas we previously saw a drop of around 50%. This can be explained by the fact that some features may be less present in the target dataset, reducing the associated singular values and thus the entropic rank. All of this shows that using one or the other will lead to similar results in practical scenarios.

All pretrainings were done with ResNet-50 backbones. The projector used is an MLP with intermediate dimensions 8192, 8192, 2048 (8192, 8192, 2048, 32768 for DINO). VICReg, VICReg-ctr, VICReg-exp and SimCLR were trained with the LARS optimizer using a momentum of 0.9, weight decay $10^{-6}$ and varying learning rates depending on the method. VICReg used a base learning rate of 0.3, SimCLR 0.5 or 0.6 depending on the experiment, and VICReg-exp and VICReg-ctr 0.6. DINO was trained with AdamW (Loshchilov & Hutter, 2017) using a learning rate of 0.00025 and multi-crop with 6 additional crops of size 96×96. The learning rate is then computed as lr = base_lr × batch_size / 256. We do a 10-epoch linear warmup and then use cosine annealing. We use batch sizes of 2048 for SimCLR and 1024 for the other methods. SimCLR and VICReg-ctr also use a default temperature of 0.15, and VICReg-exp 0.1. We use the image augmentation strategy from (Grill et al., 2020), illustrated in Table S1. For the pretrainings on iNaturalist-18, we use the same protocol but with a 300-epoch pretraining to account for its smaller size compared to ImageNet.
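For illustration, a small sketch of the learning-rate rule described above (linear scaling, 10-epoch linear warmup, then cosine annealing); the per-epoch granularity and the final value of the cosine schedule are assumptions.

```python
import math

def learning_rate(epoch, base_lr=0.3, batch_size=1024, total_epochs=100, warmup_epochs=10):
    lr = base_lr * batch_size / 256                 # linear scaling rule
    if epoch < warmup_epochs:                       # linear warmup
        return lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return lr * 0.5 * (1.0 + math.cos(math.pi * progress))   # cosine annealing

print([round(learning_rate(e), 4) for e in (0, 9, 10, 55, 99)])
```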

For all datasets except StanfordCars, we use the standard protocol in VISSL. On StanfordCars we mostly tuned the learning rate. The parameters that we use are described in Table S2. For data augmentation, we use random resized crops and random horizontal flips during training, and center crop for evaluation. For VOC07, we follow the common protocol using SVMs, as used in (Bardes et al., 2021). We use the default VISSL settings for this evaluation.

Table 1: Top-1 accuracies obtained on the embeddings by doing hyperparameter selection using ImageNet validation performance, α-ReQ or RankMe. OOD indicates the average performance over all the considered datasets other than ImageNet.

| Dataset  | Method          | VICReg cov. | VICReg inv. | VICReg LR | VICReg WD | SimCLR temp. | SimCLR LR | SimCLR WD | DINO t-temp. | DINO s-temp. |
| ImageNet | ImageNet Oracle | 59.7 | 59.7 | 59.7 | 59.7 | 56.9 | 56.9 | 57.1 | 54.6 | 64.8 |
| ImageNet | α-ReQ           | 59.6 | 59.2 | 36.2 | 59.3 | 51.5 | 56.4 | 49.0 | 53.3 | 53.3 |
| ImageNet | RankMe          | 59.6 | 59.7 | 59.7 | 59.5 | 56.5 | 56.0 | 57.1 | 53.3 | 64.8 |
| OOD      | ImageNet Oracle | 55.3 | 55.6 | 55.3 | 55.5 | 54.7 | 54.7 | 54.7 | 55.6 | 60.6 |
| OOD      | α-ReQ           | 55.5 | 55.7 | 48.0 | 55.1 | 56.9 | 54.6 | 54.8 | 52.6 | 52.6 |
| OOD      | RankMe          | 55.5 | 55.6 | 55.3 | 55.0 | 56.4 | 54.4 | 54.7 | 52.6 | 60.6 |

Table 2: Using RankMe on networks pretrained on iNat-18. We see that RankMe can improve OOD performance for VICReg, but leads to a small drop for SimCLR.

| Dataset | Method          | Cov.  | temp. |
| iNat-18 | iNat-18 Oracle  | 36.96 | 28.60 |
| iNat-18 | ImageNet Oracle | 35.63 | 28.60 |
| iNat-18 | α-ReQ           | 25.43 | 22.94 |
| iNat-18 | RankMe          | 36.89 | 27.14 |
| OOD     | iNat-18 Oracle  | 60.70 | 58.23 |
| OOD     | ImageNet Oracle | 60.65 | 58.23 |
| OOD     | α-ReQ           | 56.51 | 56.30 |
| OOD     | RankMe          | 60.91 | 57.34 |

Table 3: Using RankMe on finetuning-based benchmarks. The ImageNet oracle is the linear evaluation oracle. In the semi-supervised setting we report the top-1 accuracy, and we report the AP50 for object detection. We see that in the semi-supervised setting on ImageNet, RankMe only leads to small drops in performance compared to the task or full-ImageNet oracle. For object detection we even see matching or increased performance over the ImageNet oracle.

| Dataset | Method | Cov. | temp. |
| ImageNet-1% | Task Oracle | 39.7 | 34.6 |
| ImageNet-1% | ImageNet Oracle | 39.7 | 31.3 |
| ImageNet-1% | α-ReQ | 39.2 | 27.3 |
| ImageNet-1% | RankMe | 38.7 | 30.9 |
| ImageNet-10% | Task Oracle | 62.7 | 62.6 |
| ImageNet-10% | ImageNet Oracle | 62.6 | 62.6 |
| ImageNet-10% | α-ReQ | 62.7 | 59.1 |
| ImageNet-10% | RankMe | 62.7 | 61.8 |
| VOC07+12 (AP50) | Task Oracle | 79.7 | 81.8 |
| VOC07+12 (AP50) | ImageNet Oracle | 78.2 | 81.0 |
| VOC07+12 (AP50) | α-ReQ | 79.0 | 80.3 |
| VOC07+12 (AP50) | RankMe | 79.7 | 81.0 |

Table S1: Image augmentation parameters, taken from (Grill et al., 2020).

| Parameter | View 1 | View 2 |
| Random crop probability | 1.0 | 1.0 |
| Horizontal flip probability | 0.5 | 0.5 |
| Color jittering probability | 0.8 | 0.8 |
| Brightness adjustment max intensity | 0.4 | 0.4 |
| Contrast adjustment max intensity | 0.4 | 0.4 |
| Saturation adjustment max intensity | 0.2 | 0.2 |
| Hue adjustment max intensity | 0.1 | 0.1 |
| Grayscale probability | 0.2 | 0.2 |
| Gaussian blurring probability | 1.0 | 0.1 |
| Solarization probability | 0.0 | 0.2 |


Figure 1: Top: performance of JE-SSL representations (encoder output), y-axis, against the embeddings’ (projector output) RankMe values, x-axis, on ImageNet-1k. Except for some degenerate solutions at full rank, RankMe values correlate well with in-distribution (left column) and out-of-distribution (right columns) classification performance. Bottom: hyperparameter selection using the common supervised linear-probe strategy, α-ReQ, and the proposed unsupervised RankMe strategy. Values in bold represent the best performance between RankMe and α-ReQ. OOD indicates the average performance over all the considered datasets other than ImageNet. Without any labels, optimization, or parameters, RankMe is able to recover most of the performance obtained by using the ImageNet validation set, highlighting its strength as a hyperparameter selection tool. RankMe also outperforms α-ReQ on average and does not suffer from as large performance drops in the worst cases.

Figure 2: Validation of RankMe when evaluating performance on representations. We see that having a high rank is a necessary condition for good downstream performance.

Figure 3: Impact of rank on performance on other architectures and evaluation protocols. (Left) Using a 3-layer MLP as the classification head does not alter the performance before or after the projector, showing that RankMe can go beyond linear evaluation. (Right) The same conclusion holds for k-NN evaluation on ImageNet, where RankMe remains a good indicator of performance.

Figure 4: Algorithm 1, hyperparameter selection with RankMe (see the listing below).

Figure 5: Validation of the hypotheses motivating RankMe. (Left, Middle Left) Embeddings’ rank transfers from source to target datasets; the estimates use 25600 images from the respective datasets. (Middle Right) Train and test accuracy are highly correlated across datasets. (Right) An increase in performance on embeddings leads to an increase in performance on representations.

Figure S1: RankMe applied to iNaturalist18 pretrainings. The vertical line indicates the rank constraint placed by the representation size, so any rank above it should be counted as 2048.

Figure S2: DINO’s projection head can be split in two parts, a classical projector and a clustering layer (Left). Collapse happens before the clustering layer and not on the clustering prototypes (Right).

Figure S4: Link between embedding rank and downstream performance on the embeddings.

Figure S8: Validation of the power-law prior on un-collapsed representations. (Left) Overall visualization. (Right) Zoom on the high singular values.

Figure S10: Relationship between the two rank estimators, with a Pearson correlation coefficient of 0.99. Outliers indicate embeddings with singular values close to the threshold, showing how the entropic rank takes this information into account.

Figure S11: Convergence of the rank estimators on ImageNet as a function of the number of samples, for 2048-dimensional outputs, as indicated by the vertical line.

Figure S12: Reproduction of Figure 5 with the classical rank estimator. Embeddings’ rank transfers from source to target datasets. The estimates used 25600 images from the respective datasets.

Figure S13: Reproduction of Figure 2 with the classical rank estimator. (Left) Validation of RankMe on embeddings: a higher ImageNet rank leads to improved performance across methods and datasets. (Right) Validation of RankMe on representations, where the link is even clearer, reinforcing RankMe’s practical use.


For reference, RankMe computed on embeddings $\bm{Z}$ is defined as

$$ \mathrm{RankMe}(\bm{Z}) = \exp\left(-\sum_{k=1}^{\min(N,K)} p_k \log p_k\right), \quad\text{with}\quad p_k = \frac{\sigma_k(\bm{Z})}{\|\sigma(\bm{Z})\|_1}+\epsilon. $$


Algorithm 1: Hyperparameter selection with RankMe

Require: models f_1, ..., f_N to compare, ordered by increasing value of the hyperparameter
Require: corresponding ranks r_1, ..., r_N
1: f_best ← f_1, r_best ← r_1
2: for i = 2 to N do
3:     if r_i > r_best then
4:         f_best ← f_i, r_best ← r_i
5:     else if r_i = r_best and (r_i > r_{i-1} or r_i > r_{i+1}) then
6:         f_best ← f_i, r_best ← r_i
7: return f_best
Top-1 accuracies obtained on the representations by doing hyperparameter selection using ImageNet validation performance, α-ReQ or RankMe (cf. Figure 1, bottom). The Labels column indicates whether the selection method requires labels. OOD indicates the average performance over all the considered datasets other than ImageNet.

| Dataset  | Method          | Labels | VICReg cov. | VICReg inv. | VICReg LR | VICReg WD | SimCLR temp. | SimCLR LR | SimCLR WD | DINO t-temp. | DINO s-temp. |
| ImageNet | ImageNet Oracle | ✓ | 68.2 | 68.2 | 68.6 | 68.0 | 68.5 | 68.5 | 68.3 | 72.3 | 72.4 |
| ImageNet | α-ReQ           | ✗ | 67.9 | 67.5 | 59.5 | 67.8 | 63.5 | 68.1 | 32.3 | 71.7 | 66.2 |
| ImageNet | RankMe          | ✗ | 67.8 | 67.9 | 68.2 | 67.8 | 67.1 | 68.0 | 68.3 | 72.2 | 72.4 |
| OOD      | ImageNet Oracle | ✓ | 68.7 | 68.7 | 68.9 | 68.8 | 68.7 | 68.7 | 68.8 | 71.9 | 72.5 |
| OOD      | α-ReQ           | ✗ | 68.1 | 67.8 | 63.8 | 68.4 | 65.1 | 68.2 | 68.6 | 71.8 | 68.5 |
| OOD      | RankMe          | ✗ | 67.7 | 68.3 | 68.7 | 68.4 | 67.6 | 68.4 | 68.8 | 71.8 | 72.5 |
RankMe applied to DINO's embeddings (RankMe-embs) and clustering prototypes (RankMe-prots) for hyperparameter selection.

| Dataset  | Method          | DINO t-temp. | DINO s-temp. |
| ImageNet | ImageNet Oracle | 72.3 | 72.4 |
| ImageNet | α-ReQ           | 71.7 | 66.2 |
| ImageNet | RankMe-embs     | 72.2 | 72.4 |
| ImageNet | RankMe-prots    | 72.3 | 72.4 |
| OOD      | ImageNet Oracle | 71.9 | 72.5 |
| OOD      | α-ReQ           | 71.8 | 68.5 |
| OOD      | RankMe-embs     | 71.8 | 72.5 |
| OOD      | RankMe-prots    | 71.9 | 72.5 |
Table S2: Linear evaluation hyperparameters for each downstream dataset.

| Dataset | Optimizer | Weight decay | Momentum | Learning rate | Epochs |
| ImageNet | SGD (w/ Nesterov) | 0.00004 | 0.9 | 0.3 | 30 |
| iNaturalist18 | SGD (w/ Nesterov) | 0.0005 | 0.9 | 0.01 | 84 |
| Places205 | SGD (w/ Nesterov) | 0.0005 | 0.9 | 0.01 | 14 |
| EuroSat | SGD (w/ Nesterov) | 0.0005 | 0.9 | 0.01 | 28 |
| Sun397 | SGD (w/ Nesterov) | 0.0005 | 0.9 | 0.01 | 28 |
| StanfordCars | SGD (w/ Nesterov) | 0.0005 | 0.9 | 0.1 | 28 |
| CIFAR10 | SGD (w/ Nesterov) | 0.0005 | 0.9 | 0.01 | 28 |
| CIFAR100 | SGD (w/ Nesterov) | 0.0005 | 0.9 | 0.01 | 28 |
| CLEVR-count | SGD (w/ Nesterov) | 0.0005 | 0.9 | 0.01 | 50 |
| Food101 | SGD (w/ Nesterov) | 0.0005 | 0.9 | 0.01 | 28 |
| VOC07 | N/A, see text | N/A | N/A | N/A | N/A |
| Dataset | Method | VICReg cov. | VICReg inv. | VICReg LR | VICReg WD | SimCLR temp. | SimCLR LR | SimCLR WD | DINO t-temp. | DINO s-temp. |
ImageNet Oracle68 . 268 . 268 . 668 . 068 . 568 . 568 . 372 . 372 . 4
RankMe67 . 867 . 968 . 267 . 867 . 168 . 068 . 372 . 272 . 4
α -ReQ67 . 967 . 559 . 567 . 863 . 568 . 132 . 371 . 766 . 2
ImageNet Oracle38 . 438 . 438 . 838 . 339 . 239 . 238 . 945 . 846 . 3
RankMe36 . 737 . 238 . 438 . 337 . 838 . 138 . 946 . 046 . 3
α -ReQ37 . 836 . 928 . 938 . 334 . 138 . 438 . 745 . 139 . 2
ImageNet Oracle51 . 251 . 251 . 851 . 352 . 452 . 452 . 654 . 354 . 4
RankMe51 . 251 . 451 . 251 . 652 . 352 . 352 . 654 . 254 . 4
α -ReQ51 . 151 . 447 . 851 . 650 . 752 . 352 . 654 . 452 . 8
ImageNet Oracle96 . 296 . 296 . 396 . 296 . 596 . 596 . 496 . 696 . 6
RankMe96 . 196 . 196 . 296 . 096 . 696 . 496 . 496 . 396 . 6
α -ReQ96 . 196 . 195 . 196 . 096 . 496 . 696 . 296 . 895 . 9
ImageNet Oracle68 . 468 . 468 . 668 . 668 . 968 . 969 . 271 . 771 . 8
RankMe68 . 668 . 368 . 468 . 869 . 168 . 569 . 272 . 171 . 8
α -ReQ68 . 767 . 964 . 168 . 866 . 468 . 468 . 571 . 569 . 8
ImageNet Oracle55 . 755 . 755 . 855 . 654 . 454 . 454 . 965 . 166 . 0
RankMe51 . 154 . 055 . 755 . 451 . 553 . 954 . 965 . 866 . 0
α -ReQ54 . 251 . 743 . 255 . 445 . 254 . 354 . 763 . 554 . 5
ImageNet Oracle75 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 0
RankMe75 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 0
α -ReQ75 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 0
ImageNet Oracle84 . 384 . 384 . 384 . 084 . 584 . 583 . 988 . 488 . 0
RankMe84 . 183 . 884 . 384 . 083 . 883 . 983 . 988 . 388 . 0
ImageNet Oracle55 . 755 . 756 . 056 . 851 . 951 . 953 . 2
53 . 055 . 455 . 753 . 148 . 052 . 353 . 256 . 859 . 3
α -ReQ50 . 644 . 050 . 556 . 859 . 3
RankMe52 . 155 . 190 . 053 . 151 . 359 . 254 . 6
ImageNet Oracle90 . 190 . 189 . 890 . 690 . 690 . 391 . 592 . 2
RankMe89 . 589 . 890 . 189 . 789 . 490 . 690 . 390 . 592 . 2
72 . 372 . 8
72 . 372 . 172 . 273 . 173 . 7
ImageNet Oracle71 . 672 . 372 . 273 . 873 . 873 . 774 . 375 . 6
RankMe72 . 373 . 175 . 6
ImageNet Oracle68 . 768 . 768 . 968 . 768 . 768 . 768 . 872 . 072 . 5
RankMe67 . 768 . 368 . 768 . 367 . 568 . 468 . 871 . 872 . 5
67 . 863 . 468 .64 . 965 . 371 . 8
α -ReQ68 . 1368 . 268 . 3
| Method | VICReg cov. | VICReg inv. | VICReg LR | VICReg WD | SimCLR temp. | SimCLR LR | SimCLR WD | DINO t-temp. | DINO s-temp. |
ImageNet Oracle59 . 759 . 759 . 759 . 756 . 956 . 957 . 154 . 664 . 8
RankMe59 . 659 . 759 . 759 . 556 . 556 . 057 . 153 . 364 . 8
α -ReQ59 . 659 . 236 . 259 . 351 . 556 . 449 . 053 . 353 . 3
ImageNet Oracle13 . 514 . 213 . 513 . 610 . 310 . 310 . 15 . 015 . 8
RankMe14 . 214 . 213 . 513 . 416 . 79 . 910 . 13 . 615 . 8
α -ReQ14 . 214 . 82 . 513 . 221 . 510 . 010 . 03 . 63 . 6
ImageNet Oracle42 . 743 . 342 . 743 . 441 . 241 . 241 . 238 . 944 . 9
RankMe43 . 243 . 342 . 742 . 743 . 440 . 841 . 236 . 444 . 9
α -ReQ43 . 243 . 629 . 642 . 942 . 641 . 041 . 536 . 436 . 4
ImageNet Oracle91 . 391 . 791 . 391 . 090 . 490 . 489 . 591 . 393 . 2
RankMe91 . 091 . 791 . 391 . 392 . 389 . 089 . 589 . 393 . 2
α -ReQ91 . 091 . 485 . 190 . 894 . 489 . 689 . 889 . 389 . 3
ImageNet Oracle57 . 357 . 057 . 357 . 356 . 456 . 456 . 254 . 563 . 7
RankMe57 . 457 . 057 . 356 . 759 . 155 . 456 . 250 . 063 . 7
α -ReQ57 . 457 . 442 . 557 . 259 . 956 . 256 . 250 . 050 . 0
ImageNet Oracle12 . 012 . 012 . 011 . 914 . 014 . 013 . 210 . 322 . 0
RankMe11 . 612 . 012 . 011 . 517 . 613 . 413 . 27 . 722 . 0
α -ReQ11 . 612 . 07 . 511 . 321 . 313 . 913 . 57 . 77 . 7
ImageNet Oracle75 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 0
RankMe75 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 0
α -ReQ75 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 075 . 0
ImageNet Oracle79 . 579 . 279 . 579 . 779 . 779 . 779 . 785 . 387 . 0
RankMe79 . 279 . 279 . 579 . 378 . 579 . 379 . 784 . 287 . 0
α -ReQ79 . 279 . 273 . 179 . 676 . 879 . 579 . 984 . 284 . 2
ImageNet Oracle43 . 944 . 443 . 946 . 143 . 543 . 546 . 049 . 951 . 2
RankMe43 . 944 . 443 . 943 . 043 . 044 . 846 . 041 . 951 . 2
α -ReQ43 . 943 . 841 . 744 . 937 . 045 . 245 . 941 . 941 . 9
ImageNet Oracle80 . 481 . 280 . 479 . 779 . 379 . 379 . 884 . 387 . 0
RankMe80 . 681 . 280 . 480 . 379 . 579 . 579 . 880 . 787 . 0
ImageNet Oracle52 . 853 . 352 . 852 . 952 . 658 . 5
65 . 4
53 . 853 . 941 . 554 . 0 56 . 552 . 652 . 265 . 4
RankMe α -ReQ53 . 853 . 352 . 852 . 5 52 . 252 . 2 52 . 052 . 2 52 . 353 . 0 53 . 053 . 0
ImageNet Oracle55 . 355 . 555 . 355 . 554 . 554 . 554 . 555 . 260 . 9
RankMe55 . 455 . 555 . 355 . 056 . 054 . 154 . 552 . 360 . 9
α -ReQ55 . 455 . 646 . 155 . 156 . 054 . 353 . 952 . 352 . 3
| Method | Run | Batch size | Learning rate | Weight decay | Loss hyperparameters |
010240 . 310 - 6λ : 25 , µ : 25 , ν : 0 . 3
110240 . 310 - 6λ : 25 , µ : 25 , ν : 0 . 4
210240 . 310 - 6λ : 25 , µ : 25 , ν : 0 . 5
310240 . 310 - 6λ : 25 , µ : 25 , ν : 0 . 6
410240 . 310 - 6λ : 25 , µ : 25 , ν : 0 . 7
510240 . 310 - 6λ : 25 , µ : 25 , ν : 0 . 8
610240 . 310 - 6λ : 25 , µ : 25 , ν : 0 . 9
710240 . 310 - 6λ : 25 , µ : 25 , ν : 1
810240 . 310 - 6λ : 25 , µ : 25 , ν : 2
910240 . 310 - 6λ : 25 , µ : 25 , ν : 4
1010240 . 310 - 6λ : 25 , µ : 25 , ν : 8
1110240 . 310 - 6λ : 25 , µ : 25 , ν : 16
1210240 . 310 - 6λ : 5 , µ : 25 , ν : 4
1310240 . 310 - 6λ : 10 , µ : 25 , ν : 4
1410240 . 310 - 6λ : 15 , µ : 25 , ν : 4
1510240 . 310 - 6λ : 20 , µ : 25 , ν : 4
VICReg1610240 . 310 - 6λ : 30 , µ : 25 , ν : 4
1710240 . 310 - 6λ : 35 , µ : 25 , ν : 4
1810240 . 310 - 6λ : 40 , µ : 25 , ν : 4
1910240 . 310 - 6λ : 45 , µ : 25 , ν : 4
2010240 . 310 - 6λ : 50 , µ : 25 , ν : 4
2110240 . 110 - 6λ : 25 , µ : 25 , ν : 4
2210240 . 210 - 6λ : 25 , µ : 25 , ν : 4
2310240 . 310 - 6λ : 25 , µ : 25 , ν : 4
2410240 . 410 - 6λ : 25 , µ : 25 , ν : 4
2510240 . 510 - 6λ : 25 , µ : 25 , ν : 4
2610240 . 310 - 7λ : 25 , µ : 25 , ν : 4
2710240 . 310 - 6λ : 25 , µ : 25 , ν : 4
2810240 . 310 - 5λ : 25 , µ : 25 , ν : 4
2910240 . 310 - 4λ : 25 , µ : 25 , ν : 4
3010240 . 310 - 3λ : 25 , µ : 25 , ν : 4
3110240 . 310 - 2λ : 25 , µ : 25 , ν : 4
010240 . 510 - 6λ : 1 , µ : 1 , ν : 2 , τ : 0 . 05
110240 . 510 - 6λ : 1 , µ : 1 , ν : 2 , τ : 0 . 07
210240 . 510 - 6λ : 1 , µ : 1 , ν : 2 , τ : 0 . 1
310240 . 510 - 6λ : 1 , µ : 1 , ν : 2 , τ : 0 . 2
410240 . 510 - 6λ : 1 , µ : 1 , ν : 2 , τ : 0 . 3
510240 . 510 - 6λ : 1 , µ : 1 , ν : 2 , τ : 0 . 4
VICReg-exp610240 . 510 - 6λ : 1 , µ : 1 , ν : 0 . 1 , τ : 0 . 1
710240 . 510 - 6λ : 1 , µ : 1 , ν : 0 . 5 , τ : 0 . 1
810240 . 510 - 6λ : 1 , µ : 1 , ν : 1 , τ : 0 . 1
910240 . 510 - 6λ : 1 , µ : 1 , ν : 4 , τ : 0 . 1
1010240 . 510 - 6λ : 1 , µ : 1 , ν : 8 , τ : 0 . 1
1110240 . 510 - 6λ : 1 , µ : 1 , ν : 16 , τ : 0 . 1
| Method | Run | Batch size | Learning rate | Weight decay | Loss hyperparameters |
010240 . 510 - 6λ : 1 , µ : 1 , ν : 1 , τ : 0 . 05
110240 . 510 - 6λ : 1 , µ : 1 , ν : 1 , τ : 0 . 07
210240 . 510 - 6λ : 1 , µ : 1 , ν : 1 , τ : 0 . 1
310240 . 510 - 6λ : 1 , µ : 1 , ν : 1 , τ : 0 . 2
410240 . 510 - 6λ : 1 , µ : 1 , ν : 1 , τ : 0 . 3
VICReg-ctr510240 . 510 - 6λ : 1 , µ : 1 , ν : 1 , τ : 0 . 4
610240 . 510 - 6λ : 1 , µ : 1 , ν : 0 . 1 , τ : 0 . 1
710240 . 510 - 6λ : 1 , µ : 1 , ν : 0 . 5 , τ : 0 . 1
810240 . 510 - 6λ : 1 , µ : 1 , ν : 2 , τ : 0 . 1
910240 . 510 - 6λ : 1 , µ : 1 , ν : 4 , τ : 0 . 1
1010240 . 510 - 6λ : 1 , µ : 1 , ν : 8 , τ : 0 . 1
020480 . 610 - 6d : 512 , τ : 0 . 05
120480 . 610 - 6d : 512 , τ : 0 . 07
220480 . 610 - 6d : 512 , τ : 0 . 1
320480 . 610 - 6d : 512 , τ : 0 . 2
420480 . 610 - 6d : 512 , τ : 0 . 3
520480 . 610 - 6d : 512 , τ : 0 . 4
620480 . 610 - 6d : 2048 , τ : 0 . 05
720480 . 610 - 6d : 2048 , τ : 0 . 07
820480 . 610 - 6d : 2048 , τ : 0 . 1
920480 . 610 - 6d : 2048 , τ : 0 . 2
1020480 . 610 - 6d : 2048 , τ : 0 . 3
1120480 . 610 - 6d : 2048 , τ : 0 . 4
1220480 . 510 - 6d : 2048 , τ : 0 . 05
1320480 . 510 - 6d : 2048 , τ : 0 . 07
1420480 . 510 - 6d : 2048 , τ : 0 . 1
SimCLR1520480 . 510 - 6d : 2048 , τ : 0 . 15
1620480 . 510 - 6d : 2048 , τ : 0 . 2
1720480 . 510 - 6d : 2048 , τ : 0 . 3
1820480 . 510 - 6d : 2048 , τ : 0 . 4
1920480 . 510 - 7d : 2048 , τ : 0 . 15
2020480 . 510 - 6d : 2048 , τ : 0 . 15
2120480 . 510 - 5d : 2048 , τ : 0 . 15
2220480 . 510 - 4d : 2048 , τ : 0 . 15
2320480 . 510 - 3d : 2048 , τ : 0 . 15
2420480 . 510 - 2d : 2048 , τ : 0 . 15
2520480 . 210 - 6d : 2048 , τ : 0 . 15
2620480 . 310 - 6d : 2048 , τ : 0 . 15
2720480 . 410 - 6d : 2048 , τ : 0 . 15
2820480 . 510 - 6d : 2048 , τ : 0 . 15
2920480 . 610 - 6d : 2048 , τ : 0 . 15
3020480 . 810 - 6d : 2048 , τ : 0 . 15
| Method | Run | Batch size | Learning rate | Weight decay | Loss hyperparameters |
DINO01024 2 . 5 × 10 - 410 - 6 10 - 610 - 6 10 - 6 10 - 6 10 - 6 10 - 6 10 - 6τ t : 0 . 01 , τ t : 0 . 02 , τ t : 0 . 04 , τ t : 0 . 06 , τ t : 0 . 07 , τ t : 0 . 04 , τ s : 0 . 07 , τ t : 0 . 04 , τ s : 0 . 2 , τ t : 0 . 04 , τ s : 0 . 3 ,
DINO110242 . 5 × 10 - 4
DINO210242 . 5 × 10 - 4
DINO310242 . 5 × 10 - 4
DINO410242 . 5 × 10 - 4
DINO510242 . 5 × 10 - 4
DINO610242 . 5 × 10 - 4
DINO710242 . 5 × 10 - 4
DINO810242 . 5 × 10 - 410 - 6τ t : 0 . 04 , τ s : 0 . 4 ,
Table S8: Top-1 on representations in all settings.

| Method | Run | ImageNet | iNat18 | Places205 | EuroSat | SUN397 | Cars |
063 . 9034 . 1248 . 7795 . 9465 . 9652 . 56
165 . 0835 . 6549 . 6896 . 1066 . 8754 . 47
265 . 6736 . 9749 . 7396 . 0267 . 3355 . 76
366 . 1737 . 2050 . 1396 . 1067 . 5556 . 37
466 . 4037 . 4250 . 1596 . 3467 . 9956 . 86
566 . 8338 . 0550 . 5396 . 0668 . 4057 . 63
667 . 3038 . 1350 . 9696 . 2068 . 0857 . 83
767 . 3438 . 2650 . 9696 . 3668 . 1958 . 89
868 . 0038 . 6851 . 2896 . 3668 . 4656 . 90
968 . 1638 . 3651 . 1796 . 2068 . 4255 . 70
1067 . 9137 . 7551 . 1496 . 1468 . 7554 . 21
1167 . 7736 . 7051 . 2096 . 0668 . 5751 . 05
1264 . 1231 . 3749 . 8395 . 5666 . 1742 . 56
1366 . 6734 . 8150 . 7695 . 6867 . 6147 . 33
1467 . 4936 . 9151 . 4096 . 1067 . 9551 . 72
1567 . 8737 . 1851 . 4096 . 0668 . 2654 . 00
VICReg1667 . 9938 . 7151 . 1196 . 1668 . 6856 . 05
1767 . 7838 . 5250 . 7996 . 3868 . 3957 . 13
1867 . 2538 . 0850 . 8596 . 3468 . 6956 . 29
1966 . 9537 . 9350 . 8896 . 0667 . 9857 . 67
2066 . 5137 . 7950 . 1196 . 1067 . 7457 . 23
2159 . 5428 . 8547 . 8095 . 1064 . 1443 . 15
2266 . 3635 . 4750 . 3296 . 0467 . 4551 . 64
2368 . 1638 . 3651 . 1796 . 2068 . 4255 . 70
2468 . 5638 . 8051 . 7596 . 3068 . 6055 . 75
2562 . 7731 . 8548 . 0295 . 7264 . 8243 . 23
2667 . 7938 . 2551 . 5796 . 0468 . 8455 . 38
2767 . 9738 . 2651 . 2996 . 1668 . 6255 . 57
2867 . 8738 . 4351 . 5196 . 0868 . 5254 . 53
2963 . 3638 . 3151 . 1796 . 0668 . 4355 . 06
3054 . 5237 . 9251 . 3296 . 1067 . 9954 . 82
3140 . 7337 . 0350 . 9796 . 3068 . 4052 . 28
067 . 7437 . 5351 . 4496 . 3668 . 4152 . 12
167 . 6438 . 0051 . 4296 . 4668 . 6054 . 16
267 . 8438 . 2551 . 0796 . 4468 . 1255 . 94
3 | 65.09 | 36.64 | 49.65 | 96.54 | 67.12 | 56.37
4 | 60.67 | 31.22 | 48.04 | 95.80 | 64.28 | 46.96
5 | 57.46 | 26.54 | 46.25 | 96.02 | 62.33 | 41.90
VICReg-exp | 6 | 55.12 | 24.73 | 45.68 | 95.44 | 61.82 | 39.71
7 | 64.87 | 36.51 | 49.69 | 96.16 | 66.82 | 55.30
8 | 66.84 | 38.25 | 50.85 | 96.24 | 68.34 | 57.12
9 | 68.08 | 38.03 | 51.34 | 96.40 | 69.28 | 53.72
1067 . 8037 . 2051 . 5796 . 4668 . 6152 . 15
1166 . 6835 . 0250 . 9496 . 0067 . 8147 . 49
| Method | Run | ImageNet | iNat18 | Places205 | EuroSat | SUN397 | Cars |
065 . 5435 . 0050 . 1595 . 8867 . 6249 . 63
166 . 3235 . 7250 . 6996 . 1068 . 1651 . 66
266 . 0935 . 2650 . 8096 . 4268 . 3250 . 72
364 . 0633 . 1650 . 4895 . 9867 . 4044 . 91
462 . 0630 . 8049 . 5396 . 2266 . 0843 . 24
VICReg-ctr560 . 1728 . 7648 . 7895 . 9064 . 9241 . 13
661 . 6631 . 0549 . 4096 . 1066 . 4744 . 12
765 . 4734 . 6350 . 7196 . 2067 . 5548 . 05
865 . 9934 . 7750 . 6395 . 9867 . 6051 . 14
963 . 8733 . 6349 . 6496 . 2066 . 1750 . 35
1058 . 8129 . 2447 . 6095 . 7863 . 7746 . 23
057 . 6830 . 5048 . 5196 . 3263 . 3642 . 42
162 . 7933 . 5050 . 5696 . 1866 . 3443 . 76
266 . 1335 . 9452 . 1096 . 2268 . 2949 . 17
366 . 3535 . 6051 . 9696 . 6468 . 1749 . 68
465 . 1734 . 3851 . 3296 . 1067 . 7848 . 17
563 . 5433 . 2950 . 7196 . 2267 . 3948 . 31
657 . 8430 . 8248 . 6496 . 3464 . 0741 . 97
762 . 7333 . 3050 . 5796 . 5666 . 0344 . 99
866 . 3036 . 2551 . 7996 . 4067 . 9948 . 95
966 . 7136 . 5651 . 8296 . 5268 . 5250 . 47
1065 . 2934 . 9051 . 3296 . 3067 . 4049 . 16
1163 . 5233 . 3550 . 9296 . 4266 . 8948 . 59
1259 . 4931 . 1348 . 8095 . 9464 . 1142 . 46
1363 . 5134 . 1450 . 7596 . 4266 . 4445 . 18
1467 . 1437 . 8052 . 2996 . 6269 . 0651 . 47
SimCLR1568 . 4839 . 2052 . 3796 . 4668 . 9254 . 43
1668 . 2738 . 4852 . 2996 . 4669 . 1955 . 22
1767 . 4837 . 0751 . 7296 . 5868 . 3051 . 92
1866 . 4435 . 8751 . 5896 . 4468 . 1549 . 76
1968 . 3338 . 9352 . 5696 . 4069 . 2154 . 86
2068 . 1339 . 0952 . 4296 . 4269 . 1554 . 83
2166 . 4738 . 8052 . 8196 . 5869 . 0355 . 19
2259 . 6238 . 8652 . 6996 . 6269 . 0755 . 47
2347 . 5839 . 0352 . 7096 . 1668 . 7754 . 96
2432 . 2738 . 7052 . 6296 . 1868 . 5354 . 67
2566 . 3736 . 0651 . 6296 . 8468 . 2252 . 17
2667 . 9638 . 1252 . 3396 . 4468 . 5453 . 86
2768 . 3238 . 4452 . 4296 . 8069 . 0854 . 63
2868 . 4839 . 2052 . 3796 . 4668 . 9254 . 43
2968 . 4138 . 7752 . 4296 . 2468 . 6555 . 81
30 | 68.12 | 38.45 | 52.33 | 96.64 | 68.41 | 54.30
Table S10: Top-1 on representations in all settings, continued.

| Method | Run | ImageNet | iNat18 | Places205 | EuroSat | SUN397 | Cars |
070 . 7444 . 3053 . 4596 . 6071 . 2364 . 03
171 . 2945 . 3754 . 1096 . 4671 . 4064 . 73
272 . 1945 . 9654 . 1896 . 3272 . 1265 . 81
372 . 3045 . 8054 . 2596 . 6071 . 6965 . 09
DINO471 . 6845 . 0654 . 4096 . 8471 . 5563 . 46
572 . 4146 . 3254 . 3796 . 5871 . 8465 . 97
669 . 2342 . 1852 . 9396 . 3070 . 3556 . 36
766 . 1839 . 2452 . 7795 . 8869 . 7854 . 48
864 . 1037 . 8351 . 6095 . 8668 . 7151 . 82
| Method | Run | ImageNet | CIFAR10 | CIFAR100 | FOOD101 | VOC07 | CLEVR-count |
063 . 9088 . 9469 . 9275 . 0081 . 4948 . 94
165 . 0888 . 7469 . 9675 . 0082 . 1449 . 88
265 . 6788 . 2570 . 2375 . 0082 . 3554 . 96
366 . 1789 . 1771 . 5175 . 0082 . 9752 . 35
466 . 4089 . 4171 . 7075 . 0182 . 8155 . 27
566 . 8389 . 9172 . 1275 . 0083 . 1055 . 95
667 . 3090 . 1171 . 9075 . 0183 . 1554 . 37
767 . 3490 . 3472 . 4275 . 0083 . 2153 . 92
868 . 0089 . 7972 . 7375 . 0083 . 7749 . 75
968 . 1690 . 1472 . 2675 . 0184 . 2755 . 69
1067 . 9189 . 6772 . 3975 . 0083 . 9952 . 10
1167 . 7789 . 4571 . 6375 . 0084 . 1053 . 05
1264 . 1286 . 6867 . 0275 . 0082 . 4451 . 46
1366 . 6788 . 3269 . 8675 . 0083 . 5055 . 48
1467 . 4989 . 2271 . 0175 . 0183 . 8555 . 05
1567 . 8789 . 8272 . 3075 . 0083 . 7655 . 36
VICReg1667 . 9990 . 2972 . 8175 . 0083 . 9055 . 00
1767 . 7890 . 0973 . 1475 . 0083 . 7451 . 97
1867 . 2590 . 4072 . 7575 . 0083 . 3653 . 18
1966 . 9589 . 6272 . 1475 . 0082 . 9950 . 33
2066 . 5189 . 9472 . 4175 . 0082 . 8952 . 83
2159 . 5486 . 8166 . 2375 . 0080 . 4450 . 64
2266 . 3688 . 9271 . 0575 . 0082 . 9456 . 19
2368 . 1690 . 1472 . 2675 . 0184 . 2755 . 69
2468 . 5689 . 9572 . 8075 . 0084 . 2756 . 03
2562 . 7787 . 6367 . 9274 . 9982 . 3852 . 63
2667 . 7989 . 7072 . 1175 . 0083 . 9853 . 08
2767 . 9789 . 8372 . 2275 . 0084 . 0556 . 77
2867 . 8790 . 2372 . 1375 . 0083 . 7256 . 20
2963 . 3689 . 7672 . 3675 . 0084 . 0454 . 71
3054 . 5289 . 5871 . 8975 . 0084 . 1453 . 45
3140 . 7389 . 6571 . 5075 . 0183 . 9756 . 34
067 . 7489 . 6672 . 1775 . 0084 . 6752 . 79
167 . 6490 . 1272 . 3075 . 0084 . 5855 . 29
267 . 8489 . 5572 . 0775 . 0084 . 2053 . 45
365 . 0989 . 1871 . 5575 . 0082 . 1954 . 41
460 . 6788 . 0668 . 9475 . 0080 . 2051 . 35
557 . 4686 . 7065 . 0975 . 0078 . 5449 . 30
VICReg-exp655 . 1287 . 2365 . 5375 . 0077 . 8753 . 14
64 . 8771 . 3875 . 0082 . 3949 . 13
7 866 . 8489 . 28 89 . 6671 . 9275 . 0083 . 9150 . 41
968 . 0889 . 5675 . 0084 . 6456 . 00
1067 . 8089 . 5071 . 9075 . 0084 . 4555 . 80
71 . 61
1166 . 6888 . 9970 . 1375 . 0084 . 2955 . 87
| Method | Run | ImageNet | CIFAR10 | CIFAR100 | FOOD101 | VOC07 | CLEVR-count |
065 . 5488 . 8770 . 7775 . 0183 . 2853 . 97
166 . 3289 . 5770 . 9375 . 0084 . 1753 . 19
266 . 0989 . 4971 . 1775 . 0083 . 9053 . 29
364 . 0689 . 6271 . 3975 . 0083 . 1848 . 57
462 . 0688 . 6069 . 4175 . 0082 . 3546 . 48
VICReg-ctr560 . 1788 . 9768 . 6175 . 0081 . 4351 . 27
665 . 4789 . 6571 . 6275 . 0184 . 0951 . 07
765 . 9988 . 9770 . 4075 . 0083 . 6946 . 92
863 . 8788 . 5169 . 0275 . 0082 . 9951 . 36
958 . 8186 . 9666 . 0675 . 0079 . 9555 . 33
057 . 6886 . 3166 . 6975 . 0077 . 5637 . 65
162 . 7987 . 1568 . 7175 . 0080 . 9850 . 26
266 . 1389 . 1971 . 1375 . 0083 . 5747 . 75
366 . 3589 . 9972 . 4475 . 0084 . 2554 . 73
465 . 1789 . 8972 . 1875 . 0183 . 9950 . 78
563 . 5489 . 5071 . 0975 . 0183 . 3752 . 87
657 . 8486 . 4466 . 4275 . 0077 . 0142 . 72
762 . 7387 . 5768 . 3375 . 0081 . 2645 . 19
866 . 3089 . 0771 . 5575 . 0083 . 6152 . 95
966 . 7190 . 1272 . 5275 . 0084 . 1752 . 93
1065 . 2989 . 4471 . 6275 . 0083 . 8154 . 83
1163 . 5289 . 3270 . 8875 . 0083 . 3948 . 44
1259 . 4986 . 4166 . 4575 . 0077 . 9850 . 64
1363 . 5187 . 9869 . 5375 . 0081 . 1944 . 03
1467 . 1489 . 4072 . 2075 . 0183 . 8047 . 97
SimCLR1568 . 4890 . 5773 . 7875 . 0084 . 5451 . 91
1668 . 2790 . 3473 . 6375 . 0184 . 4850 . 11
1767 . 4890 . 0472 . 8175 . 0084 . 3147 . 31
1866 . 4489 . 8072 . 0275 . 0084 . 3549 . 94
1968 . 3390 . 2973 . 6575 . 0083 . 9553 . 17
2068 . 1390 . 6773 . 8575 . 0084 . 6154 . 20
2166 . 4790 . 3373 . 3975 . 0084 . 2255 . 01
2259 . 6290 . 5373 . 6375 . 0084 . 6150 . 53
2347 . 5890 . 2972 . 9975 . 0084 . 4448 . 27
2432 . 2790 . 2973 . 9675 . 0084 . 4251 . 33
2566 . 3789 . 9573 . 1675 . 0083 . 0150 . 75
2667 . 9690 . 6573 . 1375 . 0083 . 9452 . 29
2768 . 3290 . 2173 . 1375 . 0084 . 3153 . 57
28 | 68.48 | 90.57 | 73.78 | 75.00 | 84.54 | 51.91
29 | 68.41 | 90.17 | 73.35 | 75.00 | 84.27 | 52.97
3068 . 1289 . 6772 . 5375 . 0184 . 2750 . 47
Method | Run | ImageNet | CIFAR10 | CIFAR100 | FOOD101 | VOC07 | CLEVR-count
DINO070 . 7491 . 8774 . 7175 . 0087 . 5051 . 67
DINO171 . 2991 . 8174 . 3375 . 0087 . 9957 . 19
DINO272 . 1990 . 5173 . 0975 . 0188 . 3556 . 78
DINO372 . 3091 . 4674 . 3475 . 0088 . 3956 . 75
471 . 6890 . 8972 . 8275 . 0088 . 4859 . 19
572 . 4192 . 2475 . 5875 . 0088 . 0459 . 29
669 . 2390 . 3671 . 8675 . 0087 . 6153 . 87
766 . 1887 . 9768 . 1875 . 0086 . 7254 . 56
864 . 1086 . 8066 . 6875 . 0185 . 3855 . 23
Method | Run | ImageNet | iNat18 | Places205 | EuroSat | SUN397 | Cars
026 . 350 . 9521 . 4865 . 1031 . 235 . 10
130 . 541 . 3921 . 0763 . 8031 . 605 . 24
236 . 921 . 8524 . 7769 . 9235 . 245 . 12
347 . 604 . 3435 . 4887 . 8447 . 627 . 56
451 . 266 . 1436 . 4388 . 5849 . 298 . 82
554 . 397 . 8738 . 0488 . 4451 . 889 . 76
655 . 668 . 7638 . 9789 . 4253 . 1410 . 31
756 . 339 . 5639 . 6589 . 7653 . 3910 . 53
858 . 6512 . 0841 . 7390 . 8856 . 3511 . 75
959 . 7113 . 4742 . 7291 . 2857 . 2611 . 95
1059 . 5814 . 1843 . 2290 . 9657 . 4311 . 58
1159 . 2214 . 6343 . 4891 . 3457 . 7511 . 99
1253 . 7812 . 8042 . 3592 . 2255 . 4110 . 46
1357 . 9414 . 3643 . 6191 . 6057 . 8911 . 30
1459 . 2014 . 7743 . 6391 . 4057 . 3612 . 00
1559 . 7314 . 2043 . 3191 . 6657 . 0411 . 95
VICReg1659 . 0912 . 3542 . 2190 . 3456 . 2111 . 55
1758 . 2311 . 3441 . 0290 . 1654 . 9711 . 69
1856 . 8210 . 1540 . 1989 . 9454 . 5010 . 76
1955 . 229 . 2639 . 0290 . 0053 . 0710 . 98
2053 . 758 . 2937 . 8789 . 7652 . 1610 . 60
2151 . 5312 . 9740 . 8491 . 6454 . 2613 . 13
2257 . 5713 . 6042 . 4091 . 6456 . 4312 . 47
2359 . 7113 . 4742 . 7291 . 2857 . 2611 . 95
2456 . 229 . 9240 . 1888 . 3653 . 698 . 46
2536 . 222 . 4829 . 5985 . 0642 . 467 . 55
2659 . 3313 . 2242 . 8690 . 7657 . 2111 . 28
2759 . 5113 . 3742 . 6991 . 3456 . 6611 . 53
2859 . 7013 . 6443 . 3790 . 9657 . 3211 . 89
2959 . 0314 . 0043 . 1091 . 5057 . 4412 . 27
3056 . 3714 . 1043 . 2391 . 3657 . 6912 . 52
3149 . 9612 . 3641 . 9391 . 5257 . 1311 . 37
058 . 1912 . 5641 . 9391 . 8257 . 1310 . 19
158 . 5312 . 1042 . 0391 . 6456 . 8510 . 96
257 . 4110 . 7840 . 8990 . 4455 . 2410 . 73
347 . 805 . 0234 . 6788 . 9247 . 148 . 39
425 . 141 . 0222 . 0477 . 2232 . 665 . 17
519 . 240 . 7520 . 0875 . 7030 . 225 . 77
VICReg-exp612 . 030 . 5116 . 9570 . 1826 . 064 . 15
747 . 334 . 7846 . 908 . 87
33 . 8887 . 32
958 . 8712 . 2242 . 2791 . 3456 . 9710 . 93
1058 . 0912 . 7042 . 5656 . 9010 . 42
1157 . 2414 . 0143 . 1891 . 58 92 . 3857 . 3611 . 18
Method | Run | ImageNet | iNat18 | Places205 | EuroSat | SUN397 | Cars
050 . 2610 . 8338 . 5489 . 5452 . 7311 . 39
150 . 999 . 8138 . 4390 . 0653 . 9010 . 53
248 . 277 . 6736 . 7888 . 0251 . 4510 . 21
336 . 773 . 2030 . 0184 . 1243 . 856 . 68
425 . 921 . 5123 . 9377 . 5436 . 574 . 46
VICReg-ctr517 . 690 . 7018 . 1969 . 0028 . 303 . 73
626 . 901 . 6524 . 6975 . 4636 . 724 . 86
744 . 315 . 8134 . 8287 . 8449 . 239 . 25
849 . 318 . 7137 . 6489 . 1651 . 9810 . 78
946 . 438 . 3836 . 1189 . 4450 . 2710 . 25
1038 . 336 . 2132 . 1086 . 4444 . 078 . 97
043 . 0520 . 4837 . 8394 . 1255 . 2022 . 57
149 . 6921 . 2841 . 8293 . 4858 . 2321 . 38
254 . 4517 . 4942 . 9291 . 8857 . 8616 . 55
350 . 248 . 3639 . 2288 . 4251 . 9411 . 58
445 . 776 . 3636 . 5587 . 1648 . 5910 . 20
541 . 144 . 8134 . 0484 . 7645 . 369 . 10
643 . 3120 . 5138 . 1694 . 4455 . 4023 . 24
749 . 6021 . 5141 . 9393 . 8859 . 0522 . 43
854 . 4817 . 9243 . 0092 . 8659 . 5318 . 22
950 . 728 . 6539 . 6489 . 4454 . 4712 . 96
1043 . 325 . 8436 . 5187 . 6250 . 8511 . 33
1141 . 155 . 0234 . 2685 . 5248 . 1610 . 87
1244 . 6121 . 3339 . 2094 . 0856 . 1522 . 82
1351 . 5421 . 5042 . 6494 . 4059 . 8721 . 30
1456 . 5116 . 6843 . 3992 . 2659 . 1017 . 56
SimCLR1556 . 8910 . 3541 . 2190 . 3856 . 3714 . 04
1654 . 187 . 1639 . 4287 . 9454 . 2411 . 47
1749 . 194 . 9837 . 1287 . 1450 . 8510 . 33
1844 . 723 . 8935 . 1186 . 0048 . 389 . 49
1957 . 0610 . 1241 . 1889 . 5256 . 2013 . 19
2056 . 7210 . 1741 . 3890 . 0856 . 4013 . 23
2156 . 1410 . 2641 . 4789 . 1256 . 7413 . 18
2248 . 9810 . 0341 . 4889 . 8456 . 1513 . 47
2335 . 9210 . 0341 . 3389 . 7456 . 3813 . 87
2428 . 269 . 6841 . 2286 . 8856 . 0213 . 13
2552 . 659 . 9840 . 4790 . 2055 . 3813 . 72
2655 . 989 . 8840 . 8489 . 0055 . 4313 . 41
2756 . 4310 . 0341 . 0489 . 6056 . 2313 . 89
2856 . 8910 . 3541 . 2190 . 3856 . 3714 . 04
2956 . 6510 . 3041 . 4089 . 2256 . 4213 . 79
30 | 56.56 | 10.60 | 41.61 | 90.14 | 56.82 | 13.90
Method | Run | ImageNet | iNat18 | Places205 | EuroSat | SUN397 | Cars
DINO019 . 722 . 4130 . 1288 . 6044 . 105 . 81
DINO135 . 642 . 8532 . 7288 . 7846 . 076 . 59
DINO253 . 333 . 6436 . 3889 . 3450 . 007 . 67
DINO354 . 634 . 9738 . 8691 . 3254 . 4810 . 31
453 . 016 . 7340 . 2491 . 8456 . 619 . 74
564 . 7915 . 8444 . 9593 . 2463 . 6822 . 04
610 . 020 . 4515 . 8780 . 9023 . 242 . 24
74 . 280 . 1810 . 1972 . 3016 . 561 . 98
82 . 600 . 146 . 8161 . 6212 . 551 . 82
Method | Run | ImageNet | CIFAR10 | CIFAR100 | FOOD101 | VOC07 | CLEVR-count
026 . 3559 . 2325 . 8475 . 0066 . 1521 . 75
130 . 5460 . 5725 . 6775 . 0167 . 3319 . 64
236 . 9263 . 7729 . 6775 . 0070 . 9423 . 72
347 . 6075 . 5944 . 5875 . 0175 . 9041 . 44
451 . 2676 . 8845 . 2675 . 0176 . 8534 . 20
554 . 3978 . 3449 . 6375 . 0177 . 8540 . 24
655 . 6678 . 7149 . 8975 . 0178 . 1738 . 37
756 . 3378 . 8950 . 5275 . 0178 . 4139 . 88
858 . 6579 . 5750 . 9575 . 0179 . 4643 . 13
959 . 7180 . 4352 . 7575 . 0179 . 5043 . 93
1059 . 5880 . 5953 . 8075 . 0279 . 1943 . 87
1159 . 2280 . 9453 . 6675 . 0378 . 9044 . 90
1253 . 7878 . 9651 . 8375 . 0376 . 1743 . 71
1357 . 9481 . 4353 . 9275 . 0278 . 0445 . 23
1459 . 2081 . 0453 . 8875 . 0379 . 1843 . 75
1559 . 7381 . 1653 . 3575 . 0279 . 1844 . 39
VICReg1659 . 0980 . 4652 . 8275 . 0279 . 2045 . 92
1758 . 2379 . 7651 . 7775 . 0179 . 5836 . 53
1856 . 8279 . 2051 . 1075 . 0178 . 6237 . 81
1955 . 2278 . 8250 . 1975 . 0178 . 3638 . 50
2053 . 7577 . 8749 . 3475 . 0178 . 0738 . 62
2151 . 5378 . 4551 . 4675 . 0175 . 9949 . 13
2257 . 5780 . 6752 . 8775 . 0278 . 5345 . 93
2359 . 7180 . 4352 . 7575 . 0179 . 5043 . 93
2456 . 2275 . 8045 . 7375 . 0278 . 9039 . 71
2536 . 2272 . 5541 . 5075 . 0073 . 1241 . 73
2659 . 3379 . 5852 . 2575 . 0279 . 6144 . 89
2759 . 5180 . 2652 . 5575 . 0179 . 2942 . 99
2859 . 7079 . 7452 . 9275 . 0279 . 6746 . 11
2959 . 0381 . 2554 . 9675 . 0180 . 1043 . 33
3056 . 3780 . 8153 . 5575 . 0180 . 1146 . 97
3149 . 9680 . 8653 . 1075 . 0179 . 6847 . 09
058 . 1980 . 8052 . 3075 . 0179 . 7345 . 89
158 . 5380 . 1553 . 0875 . 0180 . 1843 . 75
257 . 4179 . 2251 . 6975 . 0179 . 3943 . 92
347 . 8076 . 7045 . 9375 . 0076 . 2143 . 52
425 . 1466 . 9133 . 4975 . 0066 . 1237 . 21
VICReg-exp519 . 2465 . 8529 . 8775 . 0062 . 0434 . 52
612 . 0362 . 4826 . 0675 . 0055 . 7134 . 17
747 . 3376 . 9146 . 4775 . 0076 . 4841 . 40
853 . 7277 . 9948 . 6575 . 0078 . 7244 . 80
958 . 8780 . 3653 . 7875 . 0180 . 1543 . 85
1058 . 0980 . 6453 . 4775 . 0179 . 9845 . 68
1157 . 2481 . 1054 . 2875 . 0179 . 5843 . 71
Method | Run | ImageNet | CIFAR10 | CIFAR100 | FOOD101 | VOC07 | CLEVR-count
050 . 2678 . 7649 . 3475 . 0077 . 8937 . 61
150 . 9978 . 6349 . 8075 . 0078 . 7743 . 76
248 . 2777 . 8648 . 7575 . 0078 . 4840 . 49
336 . 7773 . 8340 . 4475 . 0072 . 1938 . 16
425 . 9266 . 8132 . 8275 . 0065 . 3231 . 63
VICReg-ctr517 . 6963 . 9425 . 5075 . 0058 . 4831 . 40
644 . 3176 . 8246 . 4375 . 0077 . 1642 . 48
749 . 3177 . 7048 . 6675 . 0078 . 5040 . 81
846 . 4375 . 5146 . 5475 . 0076 . 9744 . 25
938 . 3371 . 5140 . 6875 . 0072 . 6741 . 31
043 . 0575 . 8452 . 3975 . 0071 . 4346 . 86
149 . 6978 . 3253 . 2075 . 0076 . 1950 . 11
254 . 4579 . 2451 . 5675 . 0078 . 1049 . 63
350 . 2478 . 3647 . 7075 . 0079 . 0746 . 29
445 . 7775 . 7344 . 3775 . 0077 . 4646 . 63
541 . 1474 . 0441 . 8875 . 0075 . 0345 . 47
643 . 3175 . 6653 . 3175 . 0071 . 3741 . 40
749 . 6078 . 3654 . 7975 . 0076 . 4746 . 46
854 . 4880 . 3955 . 3975 . 0078 . 3849 . 53
950 . 7279 . 3251 . 0175 . 0079 . 2742 . 25
1043 . 3276 . 4348 . 0775 . 0077 . 5944 . 67
1141 . 1574 . 7545 . 0675 . 0074 . 7946 . 56
1244 . 6176 . 9753 . 2975 . 0072 . 7147 . 15
1351 . 5479 . 2056 . 4575 . 0076 . 8037 . 03
1456 . 5179 . 4853 . 9975 . 0078 . 5543 . 05
SimCLR1556 . 8979 . 3552 . 5875 . 0079 . 7343 . 51
1654 . 1878 . 1749 . 6275 . 0079 . 7143 . 32
1749 . 1976 . 5946 . 9875 . 0078 . 6143 . 48
1844 . 7276 . 6845 . 6274 . 9976 . 9542 . 88
1957 . 0679 . 8352 . 1975 . 0079 . 7345 . 97
2056 . 7279 . 4153 . 1275 . 0080 . 0545 . 65
2156 . 1479 . 4652 . 2575 . 0079 . 6447 . 19
2248 . 9879 . 3952 . 2875 . 0079 . 8945 . 92
2335 . 9279 . 5252 . 1575 . 0079 . 7443 . 86
2428 . 2679 . 2851 . 2575 . 0079 . 9045 . 39
2552 . 6579 . 3052 . 8575 . 0078 . 7442 . 71
2655 . 9879 . 5152 . 2175 . 0079 . 3444 . 75
2756 . 4378 . 5252 . 0475 . 0079 . 5545 . 19
28 | 56.89 | 79.35 | 52.58 | 75.00 | 79.73 | 43.51
29 | 56.65 | 78.98 | 51.80 | 75.00 | 79.82 | 43.69
3056 . 5678 . 3651 . 8174 . 9979 . 7844 . 50
Method | Run | ImageNet | CIFAR10 | CIFAR100 | FOOD101 | VOC07 | CLEVR-count
019 . 7281 . 2453 . 9775 . 0077 . 2142 . 39
135 . 6481 . 6655 . 3275 . 0082 . 4845 . 23
253 . 3380 . 7053 . 0375 . 0084 . 2241 . 93
354 . 6384 . 2658 . 4975 . 0085 . 2549 . 89
DINO453 . 0183 . 2058 . 1475 . 0085 . 7148 . 72
564 . 7987 . 0265 . 4375 . 0087 . 0351 . 19
610 . 0270 . 9837 . 6675 . 0061 . 2732 . 83
74 . 2859 . 0222 . 7175 . 0050 . 0234 . 77
82 . 6047 . 9716 . 2675 . 0041 . 1627 . 13
Method | Run | ImageNet | iNat18 | Places205 | EuroSat | SUN397 | Cars
0102 . 0738 . 1044 . 3914 . 6132 . 407 . 03
1229 . 8192 . 53129 . 4788 . 7898 . 4412 . 58
2374 . 25135 . 79206 . 29120 . 31163 . 3119 . 77
3612 . 12261 . 34336 . 16228 . 60265 . 6438 . 90
4831 . 49382 . 55467 . 68366 . 78374 . 5059 . 15
5952 . 55449 . 44539 . 24428 . 87435 . 9477 . 36
61033 . 93493 . 50587 . 19477 . 69478 . 3488 . 28
71088 . 13531 . 16630 . 80514 . 70517 . 4799 . 97
81442 . 63726 . 28849 . 29693 . 16723 . 53161 . 76
91809 . 06947 . 811110 . 80855 . 76954 . 83210 . 06
101920 . 811054 . 701247 . 93870 . 561075 . 89258 . 33
111938 . 441087 . 451275 . 60924 . 661119 . 33306 . 90
121937 . 781100 . 541337 . 88963 . 141172 . 38382 . 18
131944 . 951095 . 951307 . 62968 . 961155 . 65352 . 50
141940 . 041095 . 911280 . 85910 . 161126 . 89324 . 51
151942 . 121049 . 721240 . 87893 . 251070 . 12269 . 96
VICReg161521 . 07782 . 39919 . 54725 . 49771 . 86169 . 75
171278 . 67637 . 18757 . 19606 . 98627 . 48128 . 96
181079 . 67532 . 00634 . 88527 . 59524 . 80111 . 28
19909 . 71446 . 52525 . 65454 . 22431 . 4488 . 55
20777 . 82376 . 39447 . 53378 . 06360 . 4173 . 57
211409 . 29890 . 97996 . 12814 . 00889 . 66352 . 57
221652 . 41936 . 471070 . 40837 . 76932 . 17275 . 04
231809 . 06947 . 811110 . 80855 . 76954 . 83210 . 06
241422 . 16648 . 60813 . 33532 . 92650 . 3391 . 44
25101 . 2944 . 1246 . 0020 . 7736 . 6010 . 68
261821 . 80959 . 981130 . 27840 . 12962 . 04221 . 58
271814 . 64948 . 471107 . 25856 . 12946 . 73218 . 36
281728 . 89913 . 311065 . 74814 . 04911 . 39216 . 25
291587 . 36859 . 561008 . 93807 . 57864 . 18244 . 56
301384 . 68757 . 81881 . 53716 . 36767 . 14229 . 93
31974 . 91508 . 81613 . 44508 . 01526 . 43143 . 61
01006 . 58530 . 95637 . 48501 . 16551 . 60142 . 88
11002 . 17521 . 34626 . 39515 . 00534 . 56132 . 72
2922 . 59473 . 26564 . 18475 . 88472 . 06119 . 75
3399 . 09192 . 27233 . 31202 . 71189 . 7836 . 95
463 . 8230 . 2536 . 9821 . 3930 . 637 . 90
519 . 4712 . 499 . 576 . 337 . 963 . 58
VICReg-exp69 . 427 . 195 . 413 . 804 . 732 . 55
7375 . 38180 . 63216 . 71191 . 99176 . 9431 . 86
636 . 60314 . 20380 . 13341 . 21312 . 0466 . 31
8139 . 28
9 101002 . 29 1048 . 58528 . 76629 . 07517 . 84536 . 91 581 . 30158 . 24
111326 . 31556 . 24 733 . 86673 . 46 875 . 62547 . 15 707 . 34771 . 39208 . 83
Method | Run | ImageNet | iNat18 | Places205 | EuroSat | SUN397 | Cars
0382 . 33224 . 68252 . 33207 . 81220 . 6869 . 48
1278 . 88163 . 91183 . 32154 . 70160 . 7150 . 29
2169 . 33101 . 44114 . 4997 . 8999 . 8434 . 97
348 . 4732 . 3834 . 9332 . 7731 . 7612 . 53
423 . 2216 . 7217 . 9017 . 7016 . 637 . 38
VICReg-ctr512 . 8810 . 0310 . 3110 . 669 . 715 . 01
622 . 9616 . 8717 . 7717 . 3016 . 557 . 61
796 . 3362 . 0868 . 0560 . 6860 . 3922 . 59
8251 . 52146 . 09166 . 32138 . 75143 . 8145 . 73
9309 . 22177 . 32204 . 38170 . 65175 . 8153 . 83
10316 . 89184 . 83213 . 74175 . 10185 . 9159 . 07
0109 . 07105 . 65104 . 6576 . 13105 . 6492 . 59
1164 . 07148 . 71149 . 89100 . 17148 . 00113 . 61
2244 . 34184 . 32203 . 04129 . 53188 . 30105 . 89
3150 . 9094 . 61116 . 9483 . 98102 . 1740 . 64
487 . 6957 . 7867 . 2354 . 6259 . 7925 . 36
563 . 6842 . 2348 . 2240 . 8343 . 2218 . 41
6110 . 59106 . 83105 . 8376 . 97106 . 8293 . 98
7165 . 49149 . 55150 . 27103 . 19148 . 65113 . 60
8246 . 56184 . 69204 . 24128 . 96189 . 86107 . 43
9164 . 66102 . 61128 . 1295 . 47112 . 2043 . 29
109 . 8830 . 272 . 7455 . 4665 . 0825 . 57
1163 . 6142 . 0048 . 4040 . 8643 . 2018 . 62
12122 . 60118 . 57116 . 9385 . 13118 . 16103 . 25
13197 . 36173 . 50176 . 32116 . 61173 . 24128 . 89
14313 . 67220 . 05239 . 53160 . 52222 . 80111 . 73
SimCLR15299 . 47172 . 75209 . 43140 . 66183 . 5161 . 44
16220 . 63122 . 46150 . 73106 . 96130 . 1640 . 02
17128 . 3371 . 7590 . 4065 . 7778 . 6426 . 24
1871 . 7548 . 9564 . 2548 . 6554 . 8418 . 93
19301 . 92173 . 11211 . 03147 . 45185 . 0460 . 83
20299 . 75173 . 05208 . 52141 . 84182 . 2161 . 56
21299 . 96173 . 61209 . 18144 . 25181 . 9961 . 11
22300 . 90173 . 89209 . 47147 . 78184 . 4561 . 40
23300 . 58174 . 18207 . 19142 . 58184 . 2960 . 94
24300 . 83174 . 63207 . 18146 . 11182 . 2760 . 50
2511 . 5615 . 9531 . 99144 . 8713 . 553 . 92
26293 . 13172 . 80211 . 66139 . 57184 . 9465 . 02
27295 . 23173 . 07208 . 46139 . 91181 . 0562 . 32
28299 . 47172 . 75209 . 43140 . 66183 . 5161 . 44
29298 . 69172 . 12206 . 63142 . 92181 . 3960 . 88
30 | 294.42 | 170.26 | 201.98 | 141.29 | 177.39 | 58.94
Method | Run | ImageNet | iNat18 | Places205 | EuroSat | SUN397 | Cars
DINO0150 . 5162 . 45106 . 14100 . 0890 . 4623 . 52
DINO1276 . 35110 . 16179 . 46163 . 25147 . 5036 . 81
DINO2482 . 51190 . 78291 . 91213 . 05236 . 7652 . 59
DINO3409 . 79180 . 78269 . 96196 . 90225 . 4770 . 37
4347 . 69168 . 61235 . 11168 . 90202 . 5155 . 07
5523 . 67330 . 31392 . 16271 . 77356 . 99155 . 13
661 . 5921 . 3239 . 8934 . 8832 . 552 . 52
722 . 438 . 2617 . 919 . 8615 . 621 . 93
811 . 015 . 848 . 278 . 087 . 831 . 39
Method | Run | ImageNet | CIFAR10 | CIFAR100 | FOOD101 | VOC07 | CLEVR-count
0102 . 0738 . 1044 . 3914 . 6132 . 407 . 03
1229 . 8192 . 53129 . 4788 . 7898 . 4412 . 58
2374 . 25135 . 79206 . 29120 . 31163 . 3119 . 77
3612 . 12261 . 34336 . 16228 . 60265 . 6438 . 90
4831 . 49382 . 55467 . 68366 . 78374 . 5059 . 15
5952 . 55449 . 44539 . 24428 . 87435 . 9477 . 36
61033 . 93493 . 50587 . 19477 . 69478 . 3488 . 28
71088 . 13531 . 16630 . 80514 . 70517 . 4799 . 97
81442 . 63726 . 28849 . 29693 . 16723 . 53161 . 76
91809 . 06947 . 811110 . 80855 . 76954 . 83210 . 06
101920 . 811054 . 701247 . 93870 . 561075 . 89258 . 33
111938 . 441087 . 451275 . 60924 . 661119 . 33306 . 90
121937 . 781100 . 541337 . 88963 . 141172 . 38382 . 18
131944 . 951095 . 951307 . 62968 . 961155 . 65352 . 50
141940 . 041095 . 911280 . 85910 . 161126 . 89324 . 51
151942 . 121049 . 721240 . 87893 . 251070 . 12269 . 96
VICReg161521 . 07782 . 39919 . 54725 . 49771 . 86169 . 75
171278 . 67637 . 18757 . 19606 . 98627 . 48128 . 96
181079 . 67532 . 00634 . 88527 . 59524 . 80111 . 28
19909 . 71446 . 52525 . 65454 . 22431 . 4488 . 55
20777 . 82376 . 39447 . 53378 . 06360 . 4173 . 57
211409 . 29890 . 97996 . 12814 . 00889 . 66352 . 57
221652 . 41936 . 471070 . 40837 . 76932 . 17275 . 04
231809 . 06947 . 811110 . 80855 . 76954 . 83210 . 06
241422 . 16648 . 60813 . 33532 . 92650 . 3391 . 44
25101 . 2944 . 1246 . 0020 . 7736 . 6010 . 68
261821 . 80959 . 981130 . 27840 . 12962 . 04221 . 58
271814 . 64948 . 471107 . 25856 . 12946 . 73218 . 36
281728 . 89913 . 311065 . 74814 . 04911 . 39216 . 25
291587 . 36859 . 561008 . 93807 . 57864 . 18244 . 56
301384 . 68757 . 81881 . 53716 . 36767 . 14229 . 93
31974 . 91508 . 81613 . 44508 . 01526 . 43143 . 61
01006 . 58530 . 95637 . 48501 . 16551 . 60142 . 88
11002 . 17521 . 34626 . 39515 . 00534 . 56132 . 72
2922 . 59473 . 26564 . 18475 . 88472 . 06119 . 75
3399 . 09192 . 27233 . 31202 . 71189 . 7836 . 95
463 . 8230 . 2536 . 9821 . 3930 . 637 . 90
519 . 4712 . 499 . 576 . 337 . 963 . 58
VICReg-exp69 . 427 . 195 . 413 . 804 . 732 . 55
7375 . 38180 . 63216 . 71191 . 99176 . 9431 . 86
636 . 60314 . 20341 . 21
8380 . 13312 . 0466 . 31
91002 . 29 1048 . 58528 . 76 556 . 24629 . 07517 . 84536 . 91 581 . 30139 . 28 158 . 24
10 111326 . 31733 . 86673 . 46 875 . 62547 . 15 707 . 34771 . 39208 . 83
Method | Run | ImageNet | CIFAR10 | CIFAR100 | FOOD101 | VOC07 | CLEVR-count
0382 . 33224 . 68252 . 33207 . 81220 . 6869 . 48
1278 . 88163 . 91183 . 32154 . 70160 . 7150 . 29
2169 . 33101 . 44114 . 4997 . 8999 . 8434 . 97
348 . 4732 . 3834 . 9332 . 7731 . 7612 . 53
423 . 2216 . 7217 . 9017 . 7016 . 637 . 38
VICReg-ctr512 . 8810 . 0310 . 3110 . 669 . 715 . 01
696 . 3362 . 0868 . 0560 . 6860 . 3922 . 59
7251 . 52146 . 09166 . 32138 . 75143 . 8145 . 73
8309 . 22177 . 32204 . 38170 . 65175 . 8153 . 83
9316 . 89184 . 83213 . 74175 . 10185 . 9159 . 07
0109 . 07105 . 65104 . 6576 . 13105 . 6492 . 59
1164 . 07148 . 71149 . 89100 . 17148 . 00113 . 61
2244 . 34184 . 32203 . 04129 . 53188 . 30105 . 89
3150 . 9094 . 61116 . 9483 . 98102 . 1740 . 64
487 . 6957 . 7867 . 2354 . 6259 . 7925 . 36
563 . 6842 . 2348 . 2240 . 8343 . 2218 . 41
6110 . 59106 . 83105 . 8376 . 97106 . 8293 . 98
7165 . 49149 . 55150 . 27103 . 19148 . 65113 . 60
8246 . 56184 . 69204 . 24128 . 96189 . 86107 . 43
9164 . 66102 . 61128 . 1295 . 47112 . 2043 . 29
109 . 8830 . 272 . 7455 . 4665 . 0825 . 57
1163 . 6142 . 0048 . 4040 . 8643 . 2018 . 62
12122 . 60118 . 57116 . 9385 . 13118 . 16103 . 25
13197 . 36173 . 50176 . 32116 . 61173 . 24128 . 89
14313 . 67220 . 05239 . 53160 . 52222 . 80111 . 73
SimCLR15299 . 47172 . 75209 . 43140 . 66183 . 5161 . 44
16220 . 63122 . 46150 . 73106 . 96130 . 1640 . 02
17128 . 3371 . 7590 . 4065 . 7778 . 6426 . 24
1871 . 7548 . 9564 . 2548 . 6554 . 8418 . 93
19301 . 92173 . 11211 . 03147 . 45185 . 0460 . 83
20299 . 75173 . 05208 . 52141 . 84182 . 2161 . 56
21299 . 96173 . 61209 . 18144 . 25181 . 9961 . 11
22300 . 90173 . 89209 . 47147 . 78184 . 4561 . 40
23300 . 58174 . 18207 . 19142 . 58184 . 2960 . 94
24300 . 83174 . 63207 . 18146 . 11182 . 2760 . 50
2511 . 5615 . 9531 . 99144 . 8713 . 553 . 92
26293 . 13172 . 80211 . 66139 . 57184 . 9465 . 02
27295 . 23173 . 07208 . 46139 . 91181 . 0562 . 32
28299 . 47172 . 75209 . 43140 . 66183 . 5161 . 44
29 | 298.69 | 172.12 | 206.63 | 142.92 | 181.39 | 60.88
30 | 294.42 | 170.26 | 201.98 | 141.29 | 177.39 | 58.94
Method | Run | ImageNet | CIFAR10 | CIFAR100 | FOOD101 | VOC07 | CLEVR-count
DINO0150 . 5162 . 45106 . 14100 . 0890 . 4623 . 52
DINO1276 . 35110 . 16179 . 46163 . 25147 . 5036 . 81
DINO2482 . 51190 . 78291 . 91213 . 05236 . 7652 . 59
DINO3409 . 79180 . 78269 . 96196 . 90225 . 4770 . 37
4347 . 69168 . 61235 . 11168 . 90202 . 5155 . 07
5523 . 67330 . 31392 . 16271 . 77356 . 99155 . 13
661 . 5921 . 3239 . 8934 . 8832 . 552 . 52
722 . 438 . 2617 . 919 . 8615 . 621 . 93
811 . 015 . 848 . 278 . 087 . 831 . 39
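
The run-by-run results above are the raw material for the label-free model selection studied in the main text, where runs are compared through the effective rank of their embeddings. As a convenience for readers reproducing these evaluations, the snippet below is a minimal sketch of an effective-rank computation in the sense of Roy & Vetterli (2007) [bib47]; the function name effective_rank, the NumPy-based implementation, the epsilon used for numerical stability, and the toy usage example are illustrative assumptions, not the authors' released code.

```python
import numpy as np


def effective_rank(embeddings: np.ndarray, eps: float = 1e-7) -> float:
    """Effective rank (Roy & Vetterli, 2007) of an (n_samples, n_dims) embedding matrix.

    Illustrative sketch: the singular values are normalized into a distribution p,
    and the effective rank is exp(entropy(p)).
    """
    # Singular values of the embedding matrix.
    s = np.linalg.svd(embeddings, compute_uv=False)
    # Normalize the spectrum into a probability distribution.
    p = s / (np.sum(s) + eps)
    # Shannon entropy of the normalized spectrum; eps avoids log(0).
    entropy = -np.sum(p * np.log(p + eps))
    return float(np.exp(entropy))


if __name__ == "__main__":
    # Hypothetical usage: effective rank of random 512-d embeddings.
    rng = np.random.default_rng(0)
    Z = rng.standard_normal((4096, 512))
    print(f"effective rank ≈ {effective_rank(Z):.2f}")
```

Under this definition, a perfectly isotropic d-dimensional embedding attains an effective rank close to d, while dimensional collapse drives the value toward 1.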


References

[bib1] Balestriero & LeCun (2022) Balestriero, R. and LeCun, Y. Contrastive and non-contrastive self-supervised learning recover global and local spectral embedding methods. arXiv preprint arXiv:2205.11508, 2022.

[bib2] Bardes et al. (2021) Bardes, A., Ponce, J., and LeCun, Y. Vicreg: Variance-invariance-covariance regularization for self-supervised learning. arXiv preprint arXiv:2105.04906, 2021.

[bib3] Bordes et al. (2021) Bordes, F., Balestriero, R., and Vincent, P. High fidelity visualization of what your self-supervised representation knows about. arXiv preprint arXiv:2112.09164, 2021.

[bib4] Bossard et al. (2014) Bossard, L., Guillaumin, M., and Van Gool, L. Food-101 – mining discriminative components with random forests. In European Conference on Computer Vision, 2014.

[bib5] Bromley et al. (1994) Bromley, J., Guyon, I., LeCun, Y., Sackinger, E., and Shah, R. Signature verification using a “siamese” time delay neural network. In NeurIPS, 1994.

[bib6] Caron et al. (2018) Caron, M., Bojanowski, P., Joulin, A., and Douze, M. Deep clustering for unsupervised learning. In ECCV, 2018.

[bib7] Caron et al. (2020) Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., and Joulin, A. Unsupervised learning of visual features by contrasting cluster assignments. In NeurIPS, 2020.

[bib8] Caron et al. (2021) Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., and Joulin, A. Emerging properties in self-supervised vision transformers. In ICCV, 2021.

[bib9] Chen et al. (2020a) Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. A simple framework for contrastive learning of visual representations. In ICML, pp. 1597–1607. PMLR, 2020a.

[bib10] Chen & He (2020) Chen, X. and He, K. Exploring simple siamese representation learning. In CVPR, 2020.

[bib11] Chen et al. (2020b) Chen, X., Fan, H., Girshick, R., and He, K. Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297, 2020b.

[bib12] Chen et al. (2021) Chen, X., Xie, S., and He, K. An empirical study of training self-supervised vision transformers. In ICCV, 2021.

[bib13] Cover (1965) Cover, T. M. Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE transactions on electronic computers, (3):326–334, 1965.

[bib14] Deng et al. (2009) Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In CVPR, 2009.

[bib15] Eckart & Young (1936) Eckart, C. and Young, G. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, 1936.

[bib16] Ermolov et al. (2021) Ermolov, A., Siarohin, A., Sangineto, E., and Sebe, N. Whitening for self-supervised representation learning, 2021.

[bib17] Everingham et al. (2007) Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., and Zisserman, A. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html.

[bib18] Ganea et al. (2019) Ganea, O., Gelly, S., Bécigneul, G., and Severyn, A. Breaking the softmax bottleneck via learnable monotonic pointwise non-linearities. In International Conference on Machine Learning, pp. 2073–2082. PMLR, 2019.

[bib19] Garrido et al. (2022) Garrido, Q., Chen, Y., Bardes, A., Najman, L., and Lecun, Y. On the duality between contrastive and non-contrastive self-supervised learning. arXiv preprint arXiv:2206.02574, 2022.

[bib20] Ghosh et al. (2022) Ghosh, A., Mondal, A. K., Agrawal, K. K., and Richards, B. Investigating power laws in deep representation learning. arXiv preprint arXiv:2202.05808, 2022.

[bib21] Girish et al. (2022) Girish, S., Dey, D., Joshi, N., Vineet, V., Shah, S., Mendes, C. C. T., Shrivastava, A., and Song, Y. One network doesn’t rule them all: Moving beyond handcrafted architectures in self-supervised learning. arXiv preprint arXiv:2203.08130, 2022.

[bib22] Goyal et al. (2017) Goyal, P., Dollár, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., and He, K. Accurate, large minibatch sgd: Training imagenet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.

[bib23] Goyal et al. (2021) Goyal, P., Duval, Q., Reizenstein, J., Leavitt, M., Xu, M., Lefaudeux, B., Singh, M., Reis, V., Caron, M., Bojanowski, P., Joulin, A., and Misra, I. Vissl. https://github.com/facebookresearch/vissl, 2021.

[bib24] Grill et al. (2020) Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P. H., Buchatskaya, E., Doersch, C., Pires, B. A., Guo, Z. D., Azar, M. G., Piot, B., Kavukcuoglu, K., Munos, R., and Valko, M. Bootstrap your own latent: A new approach to self-supervised learning. In NeurIPS, 2020.

[bib25] HaoChen et al. (2021) HaoChen, J. Z., Wei, C., Gaidon, A., and Ma, T. Provable guarantees for self-supervised deep learning with spectral contrastive loss. NeurIPS, 34, 2021.

[bib26] He & Ozay (2022) He, B. and Ozay, M. Exploring the gap between collapsed & whitened features in self-supervised learning. In International Conference on Machine Learning, pp. 8613–8634. PMLR, 2022.

[bib27] He et al. (2016) He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In CVPR, 2016.

[bib28] He et al. (2020) He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R. Momentum contrast for unsupervised visual representation learning. In CVPR, 2020.

[bib29] He et al. (2021) He, K., Chen, X., Xie, S., Li, Y., Dollár, P., and Girshick, R. Masked autoencoders are scalable vision learners. arXiv preprint arXiv:2111.06377, 2021.

[bib30] He et al. (2022) He, K., Chen, X., Xie, S., Li, Y., Dollár, P., and Girshick, R. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16000–16009, 2022.

[bib31] Helber et al. (2019) Helber, P., Bischke, B., Dengel, A., and Borth, D. Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 12(7):2217–2226, 2019.

[bib32] Horn et al. (2018) Horn, G. V., Aodha, O. M., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P., and Belongie, S. The inaturalist species classification and detection dataset. In CVPR, 2018.

[bib33] Hua et al. (2021) Hua, T., Wang, W., Xue, Z., Ren, S., Wang, Y., and Zhao, H. On feature decorrelation in self-supervised learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9598–9608, 2021.

[bib34] Jing et al. (2022) Jing, L., Vincent, P., LeCun, Y., and Tian, Y. Understanding dimensional collapse in contrastive self-supervised learning. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=YevsQ05DEN7.

[bib35] Johnson et al. (2017) Johnson, J., Hariharan, B., Van Der Maaten, L., Fei-Fei, L., Lawrence Zitnick, C., and Girshick, R. Clevr: A diagnostic dataset for compositional language and elementary visual reasoning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2901–2910, 2017.

[bib36] Krause et al. (2013) Krause, J., Stark, M., Deng, J., and Fei-Fei, L. 3d object representations for fine-grained categorization. In Proceedings of the IEEE international conference on computer vision workshops, pp. 554–561, 2013.

[bib37] Krizhevsky et al. (2009) Krizhevsky, A., Hinton, G., et al. Learning multiple layers of features from tiny images. 2009.

[bib38] Lee et al. (2021) Lee, K.-H., Arnab, A., Guadarrama, S., Canny, J., and Fischer, I. Compressive visual representations. In NeurIPS, 2021.

[bib39] Li et al. (2022a) Li, A. C., Efros, A. A., and Pathak, D. Understanding collapse in non-contrastive siamese representation learning. In European Conference on Computer Vision, pp. 490–505. Springer, 2022a.

[bib40] Li et al. (2022b) Li, C., Yang, J., Zhang, P., Gao, M., Xiao, B., Dai, X., Yuan, L., and Gao, J. Efficient self-supervised vision transformers for representation learning. In ICLR, 2022b.

[bib41] Li et al. (2022c) Li, Z., Chen, Y., LeCun, Y., and Sommer, F. T. Neural manifold clustering and embedding. arXiv preprint arXiv:2201.10000, 2022c.

[bib42] Loshchilov & Hutter (2017) Loshchilov, I. and Hutter, F. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.

[bib43] Misra & van der Maaten (2020) Misra, I. and Maaten, L. v. d. Self-supervised learning of pretext-invariant representations. In CVPR, 2020.

[bib44] Oord et al. (2018) Oord, A. v. d., Li, Y., and Vinyals, O. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.

[bib45] Press et al. (2007) Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. Numerical recipes 3rd edition: The art of scientific computing. Cambridge university press, 2007.

[bib46] Reed et al. (2021) Reed, C. J., Metzger, S., Srinivas, A., Darrell, T., and Keutzer, K. Selfaugment: Automatic augmentation policies for self-supervised learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2674–2683, 2021.

[bib47] Roy & Vetterli (2007) Roy, O. and Vetterli, M. The effective rank: A measure of effective dimensionality. In 2007 15th European signal processing conference, pp. 606–610. IEEE, 2007.

[bib48] Santurkar et al. (2018) Santurkar, S., Tsipras, D., Ilyas, A., and Madry, A. How does batch normalization help optimization? Advances in neural information processing systems, 31, 2018.

[bib49] Shannon (1948) Shannon, C. E. A mathematical theory of communication. The Bell system technical journal, 27(3):379–423, 1948.

[bib50] Tomasev et al. (2022) Tomasev, N., Bica, I., McWilliams, B., Buesing, L., Pascanu, R., Blundell, C., and Mitrovic, J. Pushing the limits of self-supervised resnets: Can we outperform supervised learning without labels on imagenet? arXiv preprint arXiv:2201.05119, 2022.

[bib51] Ulyanov et al. (2018) Ulyanov, D., Vedaldi, A., and Lempitsky, V. Deep image prior. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 9446–9454, 2018.

[bib52] Wu et al. (2018) Wu, Z., Xiong, Y., Yu, S., and Lin, D. Unsupervised feature learning via non-parametric instance discrimination. In CVPR, 2018.

[bib53] Xiao et al. (2010) Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., and Torralba, A. Sun database: Large-scale scene recognition from abbey to zoo. In 2010 IEEE computer society conference on computer vision and pattern recognition, pp. 3485–3492. IEEE, 2010.

[bib54] Yeh et al. (2021) Yeh, C.-H., Hong, C.-Y., Hsu, Y.-C., Liu, T.-L., Chen, Y., and LeCun, Y. Decoupled contrastive learning. arXiv preprint arXiv:2110.06848, 2021.

[bib55] You et al. (2017) You, Y., Gitman, I., and Ginsburg, B. Large batch training of convolutional networks. arXiv preprint arXiv:1708.03888, 2017.

[bib56] Zbontar et al. (2021) Zbontar, J., Jing, L., Misra, I., LeCun, Y., and Deny, S. Barlow twins: Self-supervised learning via redundancy reduction. In ICML, pp. 12310–12320. PMLR, 2021.

[bib57] Zhou et al. (2014) Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., and Oliva, A. Learning deep features for scene recognition using places database. In NeurIPS, 2014.

[bib58] Zhou et al. (2022a) Zhou, J., Wei, C., Wang, H., Shen, W., Xie, C., Yuille, A., and Kong, T. iBOT: Image BERT pre-training with online tokenizer. In ICLR, 2022a.

[bib59] Zhou et al. (2022b) Zhou, P., Zhou, Y., Si, C., Yu, W., Ng, T. K., and Yan, S. Mugs: A multi-granular self-supervised learning framework. 2022b.

[bib60] Zhuang et al. (2019) Zhuang, C., Zhai, A. L., and Yamins, D. Local aggregation for unsupervised learning of visual embeddings. In ICCV, 2019.