SSA-BNS

Abstract

Cross-Domain Few-Shot Learning (CD-FSL) aims to recognize new classes from unseen domains, given limited training samples. The majority of state-of-the-art approaches for this task introduce additional task-specific parameters to adapt to the novel task, which changes the trained model architecture and increases the number of model parameters. The first contribution of this work is to revisit existing approaches, namely modifying the Batch Normalization affine parameters and the scale hyperparameter in the cosine-similarity-based softmax loss, for adapting the trained model to new tasks without changing the model architecture. Secondly, to aid model learning with few examples per class, we propose to augment the data of each class with the styles of semantically similar classes. Extensive evaluation on the challenging Meta-Dataset shows that this simple framework is very effective for the CD-FSL task. We also show that the Similar-class Style Augmentation module can be seamlessly integrated with existing approaches to further improve their performance, thus establishing state-of-the-art in this challenging area.

Cross-Domain Few-Shot Learning

In CD-FSL, a universal feature extractor \(F\) is trained on labeled data from multiple source domains \(D_\text{train}\). At test time, it must adapt to N-way K-shot tasks sampled from unseen classes in unseen domains \(D_\text{test}\). Each task \(\mathcal{T} = (\mathcal{S}, \mathcal{Q})\) consists of a labeled support set \(\mathcal{S}\) and an unlabeled query set \(\mathcal{Q}\).

Most state-of-the-art methods (FLUTE, URL, TSA) handle this by introducing task-specific learnable modules, which either increase model parameters or change the trained architecture. This is undesirable in many practical settings. SSA-BNS addresses CD-FSL without any architectural changes or extra parameters, by revisiting two under-explored components: BatchNorm adaptation and the cosine similarity scale factor.

Figure 1. CD-FSL task. Training (left): labeled multi-domain data is used to learn a universal feature extractor. Testing (right): N-way K-shot tasks with unseen classes from unseen domains. The support set adapts the model; the query set evaluates it.

SSA-BNS Framework

1. BatchNorm Adaptation

Given the universal feature extractor, we adapt only the BatchNorm affine parameters \(\{\gamma, \beta\}\) at each layer \(l\), without changing any other part of the model. Batch-normalized activations are:

\begin{align} f^l_\text{BN} = \gamma^l \hat{f}^l + \beta^l; \quad \hat{f}^l = \frac{f^l - \mu^l}{\sqrt{(\sigma^l)^2 + \epsilon}} \end{align}
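As a minimal sketch of the transform above, the following plain-Python snippet normalizes the activations of a single channel and then applies the affine parameters. The function name `batch_norm_affine` and the sample values are illustrative assumptions, not taken from the SSA-BNS code. During adaptation only `gamma` and `beta` would be updated; how the statistics `mu` and `var` are obtained is left to the caller here.

```python
import math

def batch_norm_affine(f, mu, var, gamma, beta, eps=1e-5):
    """Apply f_BN = gamma * (f - mu) / sqrt(var + eps) + beta to one channel."""
    scale = 1.0 / math.sqrt(var + eps)
    return [gamma * (x - mu) * scale + beta for x in f]

# Hypothetical activations of a single channel.
f = [1.0, 2.0, 3.0]
mu = sum(f) / len(f)
var = sum((x - mu) ** 2 for x in f) / len(f)

# gamma = 1, beta = 0 leaves the normalized activations unchanged.
identity = batch_norm_affine(f, mu, var, gamma=1.0, beta=0.0)
# Changing only gamma and beta rescales and shifts the same activations.
adapted = batch_norm_affine(f, mu, var, gamma=2.0, beta=0.5)
```

The second call shows why adapting only the affine parameters is cheap: the normalized activations are reused, and each channel contributes just two trainable scalars.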

The BN parameters \(\{\gamma, \beta\}\) are optimized to minimize the Nearest Centroid Classifier (NCC) loss over the support set:

\begin{align} \min_{\gamma, \beta} \; \frac{1}{n_\mathcal{S}} \sum_{(x_i^s, y_i^s) \in \mathcal{S}} \mathcal{L}_\text{NCC}(z_i^s, y_i^s; \eta) \quad \text{where} \quad z_i^s = F(x_i^s) \end{align}

Class centroids are the mean of support features per class:

\begin{align} \mathbf{c}_k = \frac{1}{|\mathcal{S}_k|} \sum_{x_i^s \in \mathcal{S}_k} z_i^s \end{align}
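The centroid computation can be sketched as follows, assuming support features are given as plain lists of floats with one integer label each. The helper name `class_centroids` and the toy data are hypothetical.

```python
from collections import defaultdict

def class_centroids(features, labels):
    """Return {class label: mean feature vector} over the support set."""
    groups = defaultdict(list)
    for z, y in zip(features, labels):
        groups[y].append(z)
    # Average each feature dimension over the samples of a class.
    return {
        y: [sum(dim) / len(zs) for dim in zip(*zs)]
        for y, zs in groups.items()
    }

# Toy support set: two samples of class 0, one sample of class 1.
support_features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
support_labels = [0, 0, 1]
centroids = class_centroids(support_features, support_labels)
print(centroids)  # {0: [2.0, 0.0], 1: [0.0, 2.0]}
```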

2. Cosine Similarity Scale Factor

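The NCC loss uses cosine similarity with a scale hyperparameter \(\eta\). With \(\theta_{i,k}\) denoting the angle between the support feature \(z_i^s\) and the centroid \(\mathbf{c}_k\), and \(C\) the number of classes in the task, the class probability is: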

\begin{align} p(y=k \mid z_i^s; \eta) = \frac{e^{\eta \cos\theta_{i,k}}}{\sum_{j=1}^{C} e^{\eta \cos\theta_{i,j}}} \end{align}

The prior works URL and TSA fix \(\eta = 10\). In CD-FSL, the test domain can differ greatly from the training domains, so cosine similarities tend to be low. We find that \(\eta = 25\) works better (Table 2): it widens the range of output probabilities so that correctly classified samples receive high confidence, while avoiding the rapid overfitting to the support set (within ~10 iterations) observed at \(\eta = 50\).
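To illustrate how the scale factor shapes the output probabilities, the sketch below evaluates the cosine softmax for one hypothetical feature and two hypothetical centroids at \(\eta \in \{10, 25, 50\}\). The function names and toy vectors are assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ncc_probs(z, centroids, eta):
    """Softmax over eta-scaled cosine similarities to each centroid."""
    logits = [eta * cosine(z, c) for c in centroids]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# A feature that is only moderately closer to the first centroid.
centroids = [[1.0, 0.0], [0.0, 1.0]]
z = [0.6, 0.4]
for eta in (10, 25, 50):
    print(eta, [round(p, 4) for p in ncc_probs(z, centroids, eta)])
```

The same gap in cosine similarity maps to a more confident prediction as \(\eta\) grows, which is the effect the paragraph above describes. Very large values push the probabilities toward 0 or 1 almost immediately.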

3. Similar Class Style Augmentation (SSA)

To overcome limited support data, we augment each sample with the style (channel-wise feature statistics) of a semantically similar class sample. Class similarity is measured via cosine similarity of centroids:

\begin{align} \text{sim}(\mathbf{c}_i, \mathbf{c}_j) = \frac{\mathbf{c}_i^T \mathbf{c}_j}{\|\mathbf{c}_i\| \|\mathbf{c}_j\|} \end{align}

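The similar class set for class \(k\) collects the indices of all classes whose centroids are close to \(\mathbf{c}_k\). Note that \(\mathcal{S}_k\) here denotes this index set, not the per-class support subset used in the centroid definition: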

\begin{align} \mathcal{S}_k = \{t \mid \text{sim}(\mathbf{c}_t, \mathbf{c}_k) > \tau;\; t = 1,\ldots,C\} \end{align}
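A minimal sketch of this thresholding step, assuming centroids are stored in a dictionary keyed by class index. The names `similar_classes` and `sim_sets` are hypothetical. As the definition is written, each class belongs to its own similar class set because \(\text{sim}(\mathbf{c}_k, \mathbf{c}_k) = 1 > \tau\), and the sketch keeps that behavior.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similar_classes(centroids, tau=0.7):
    """Return {k: [t such that sim(c_t, c_k) > tau]} for all classes k."""
    return {
        k: [t for t, c_t in centroids.items() if cosine(c_t, c_k) > tau]
        for k, c_k in centroids.items()
    }

# Toy centroids: classes 0 and 1 point in similar directions, class 2 does not.
centroids = {0: [1.0, 0.0], 1: [0.9, 0.3], 2: [0.0, 1.0]}
sim_sets = similar_classes(centroids)
print(sim_sets)  # {0: [0, 1], 1: [0, 1], 2: [2]}
```

A class with no close neighbors, such as class 2 here, can only be paired with samples of its own class.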

For sample \(x_i\) of class \(y_i\), we randomly pick \(x_j\) from a similar class \(y_j \in \mathcal{S}_{y_i}\) and mix their intermediate feature statistics at layer \(l\):

\begin{align} \mu_\text{ssa}(f_i; f_j) &= \lambda\,\mu(f_i) + (1-\lambda)\,\mu(f_j) \\ \sigma_\text{ssa}(f_i; f_j) &= \lambda\,\sigma(f_i) + (1-\lambda)\,\sigma(f_j) \\ f_i^\text{ssa} &= \sigma_\text{ssa} \odot \frac{f_i - \mu(f_i)}{\sigma(f_i)} + \mu_\text{ssa} \end{align}

The content of \(x_i\) is preserved in \(f_i^\text{ssa}\), so the augmented sample retains its class label \(y_i\). The final SSA-BNS objective jointly minimizes NCC loss on real and augmented features:
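The statistic mixing can be sketched in plain Python, assuming a feature map is represented as a list of channels, each holding a flat list of spatial activations. The helper names and the small `eps` added inside the standard deviation (to avoid division by zero) are assumptions; the equations above do not specify them.

```python
import math

def channel_stats(f, eps=1e-6):
    """Per-channel mean and standard deviation over spatial positions."""
    mus, sigmas = [], []
    for channel in f:
        mu = sum(channel) / len(channel)
        var = sum((x - mu) ** 2 for x in channel) / len(channel)
        mus.append(mu)
        sigmas.append(math.sqrt(var + eps))  # eps is an assumed stabilizer
    return mus, sigmas

def ssa_mix(f_i, f_j, lam=0.5):
    """Re-style f_i with statistics interpolated between f_i and f_j."""
    mu_i, sig_i = channel_stats(f_i)
    mu_j, sig_j = channel_stats(f_j)
    mixed = []
    for c, channel in enumerate(f_i):
        mu_mix = lam * mu_i[c] + (1 - lam) * mu_j[c]
        sig_mix = lam * sig_i[c] + (1 - lam) * sig_j[c]
        # Normalize with f_i's own statistics, then apply the mixed ones.
        mixed.append(
            [sig_mix * (x - mu_i[c]) / sig_i[c] + mu_mix for x in channel]
        )
    return mixed

f_i = [[0.0, 2.0, 4.0, 6.0]]      # one channel, four spatial positions
f_j = [[10.0, 10.0, 14.0, 14.0]]  # feature of a sample from a similar class
f_ssa = ssa_mix(f_i, f_j)
```

In this toy case the mixed channel mean lies halfway between the two source means, while the relative ordering of activations in `f_i` is unchanged. That is the sense in which the content, and therefore the label, is preserved.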

\begin{align} \min_{\gamma, \beta} \; \frac{1}{2n_\mathcal{S}} \sum_{(x_i^s, y_i^s)} \left[ \mathcal{L}_\text{NCC}(z_i^s, y_i^s; \eta) + \mathcal{L}_\text{NCC}(z_i^\text{ssa}, y_i^s; \eta) \right] \end{align}
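The joint objective can be sketched as below under two stated assumptions: \(\mathcal{L}_\text{NCC}\) is taken to be the negative log-likelihood of the cosine softmax, and the same centroids are used for real and augmented features. Neither detail is spelled out above, and the function names and toy data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ncc_loss(z, y, centroids, eta=25.0):
    """Negative log-probability of class index y under the cosine softmax."""
    logits = [eta * cosine(z, c) for c in centroids]
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[y]

def ssa_bns_objective(feats, feats_ssa, labels, centroids, eta=25.0):
    """Average NCC loss over real and augmented features (factor 1 / 2n)."""
    total = 0.0
    for z, z_aug, y in zip(feats, feats_ssa, labels):
        total += ncc_loss(z, y, centroids, eta)
        total += ncc_loss(z_aug, y, centroids, eta)  # augmented keeps label y
    return total / (2 * len(labels))

centroids = [[1.0, 0.0], [0.0, 1.0]]
feats = [[1.0, 0.1], [0.2, 1.0]]        # real support features
feats_ssa = [[0.9, 0.2], [0.3, 0.9]]    # their style-augmented versions
labels = [0, 1]
loss = ssa_bns_objective(feats, feats_ssa, labels, centroids)
```

In an actual adaptation loop this scalar would be differentiated with respect to the BN affine parameters only; the sketch stops at evaluating the loss.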

SSA is inserted after the first two ResNet blocks with \(\lambda = 0.5\) and similarity threshold \(\tau = 0.7\). No additional parameters are introduced.

Experimental Results

We evaluate on the Meta-Dataset benchmark (8 seen + 5 unseen domains) using a ResNet-18 universal feature extractor. Average accuracy and 95% confidence interval are reported over 600 tasks.

Dataset	SUR	URT	FLUTE	tri-M	URL*	TSA*	SSA-BNS	TSA*+SSA
ImageNet	56.2±1.0	56.8±1.1	58.6±1.0	51.8±1.1	58.8±1.1	59.5±1.0	56.6±1.0	58.9±1.1
Omniglot	94.1±0.4	94.2±0.4	92.0±0.6	93.2±0.5	94.5±0.4	94.9±0.4	95.2±0.5	95.6±0.4
Aircraft	85.5±0.5	85.8±0.5	82.8±0.7	87.2±0.5	89.4±0.4	89.9±0.4	89.6±0.4	90.0±0.5
Birds	71.0±1.0	76.2±0.8	75.3±0.8	79.2±0.8	80.7±0.8	81.1±0.8	81.8±0.8	82.2±0.7
Textures	71.0±0.8	71.6±0.7	71.2±0.8	68.8±0.8	77.2±0.7	77.5±0.7	76.4±0.7	77.6±0.7
Quick Draw	81.8±0.6	82.4±0.6	77.3±0.7	79.5±0.7	82.5±0.6	81.7±0.6	82.8±0.6	82.7±0.7
Fungi	64.3±0.9	64.0±1.0	48.5±1.0	58.1±1.1	68.1±0.9	66.3±0.8	66.7±0.8	66.6±0.8
VGG Flower	82.9±0.8	87.9±0.6	90.5±0.5	91.6±0.6	92.0±0.5	92.2±0.5	92.8±0.6	93.0±0.5
Traffic Sign	51.0±1.1	48.2±1.1	63.0±1.0	58.4±1.1	63.3±1.1	82.8±1.0	77.9±1.1	84.9±1.1
MSCOCO	52.0±1.1	51.5±1.1	52.8±1.1	50.0±1.0	57.3±1.0	57.6±1.0	56.1±0.9	58.1±1.0
MNIST	94.3±0.4	90.6±0.5	96.2±0.3	95.6±0.5	94.7±0.4	96.7±0.4	98.3±0.5	98.5±0.4
CIFAR-10	66.5±0.9	67.0±0.8	75.4±0.8	78.6±0.7	74.2±0.8	82.9±0.7	79.4±0.7	82.9±0.7
CIFAR-100	56.9±1.1	57.3±1.0	62.0±1.0	67.1±1.0	63.5±1.0	70.4±0.9	69.0±0.9	70.8±0.9
Avg seen	75.9	77.4	74.5	76.2	80.4	80.4	80.2	80.8
Avg unseen	64.1	62.9	69.9	69.9	70.6	78.1	76.1	79.0
Avg all	71.4	71.8	72.7	73.8	76.6	79.5	78.7	80.1

Table 1. Average accuracy (%) with 95% confidence intervals over 600 tasks on Meta-Dataset. * indicates methods that use additional parameters beyond the feature extractor. SSA-BNS uses no additional parameters and outperforms URL, which adds 262K parameters, on the average over all domains (78.7 vs. 76.6). TSA*+SSA achieves the best overall average (80.1).

Ablation: Effect of BN Adaptation and SSA

SSA	BNS (η)	Aircraft	Fungi	CIFAR-100	MSCOCO
✗	✗	87.0	65.6	59.9	53.1
✗	η=10	89.1	66.0	66.9	54.5
✗	η=25	89.5	66.4	68.4	55.7
✗	η=50	89.5	66.2	67.7	55.4
✓	η=25	89.6	66.7	69.0	56.1

Table 2. BN adaptation with η=25 consistently outperforms η=10 (default in URL/TSA). Adding SSA further improves performance across both seen and unseen domains.

Comparison with Other Augmentation Strategies

Augmentation	Aircraft	Fungi	CIFAR-100	MSCOCO
RandAugment	88.8	65.2	66.9	55.2
MixUp	88.4	66.3	67.9	54.6
Feature MixUp	88.9	66.3	68.3	55.3
Random MixStyle	89.6	66.2	68.2	55.3
SSA (Proposed)	89.6	66.7	69.0	56.1

Table 3. SSA matches or outperforms all compared augmentation techniques on the four datasets shown. Restricting style mixing to semantically similar classes (SSA) ties with class-agnostic Random MixStyle on Aircraft (89.6) and beats it on Fungi, CIFAR-100, and MSCOCO.

Parameter Efficiency

Method	Additional parameters	Trainable parameters
FLUTE	32K	32K
URL	262K	262K
TSA	1482K	1482K
TSA+SSA	1482K	1482K
SSA-BNS	None	9.6K

Table 4. SSA-BNS introduces no additional parameters, training only the existing BN affine parameters (9.6K in ResNet-18). It outperforms URL, which adds 262K parameters, and uses roughly 154x fewer trainable parameters than TSA (9.6K vs. 1482K).
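The 9.6K figure and the 154x ratio can be checked with a short calculation. The sketch assumes the standard ResNet-18 layout (a stem BN layer, four stages of two basic blocks with two BN layers each, and one extra BN layer in the downsample shortcut of stages 2 to 4); the backbone used by the authors may differ in detail.

```python
# Channel counts of every BatchNorm layer in a standard ResNet-18 (assumed).
stem = [64]
stage1 = [64] * 4     # 2 blocks x 2 BN layers
stage2 = [128] * 5    # 2 blocks x 2 BN layers + 1 downsample BN
stage3 = [256] * 5
stage4 = [512] * 5
bn_channels = stem + stage1 + stage2 + stage3 + stage4

# One gamma and one beta per channel.
bn_affine_params = 2 * sum(bn_channels)

tsa_trainable = 1_482_000  # TSA trainable parameters, from Table 4
ratio = tsa_trainable / bn_affine_params

print(bn_affine_params)  # 9600
print(round(ratio))      # 154
```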

BibTeX

@InProceedings{sreenivas2023ssabns,
  author    = {Sreenivas, Manogna and Biswas, Soma},
  title     = {Similar Class Style Augmentation for Efficient Cross-Domain Few-Shot Learning},
  booktitle = {CVPR Workshops},
  year      = {2023}
}