Anushka Sivakumar
Hope to put out more amazing research! Open to collaborative projects!

EMNLP 2025

SteerVLM: Robust Model Control Through Lightweight Activation Steering for Vision Language Models 🔗
Anushka Sivakumar, Andrew Zhang, Zaber Hakim, Chris Thomas

This work introduces SteerVLM, a lightweight steering module that guides Vision-Language Models (VLMs) towards outputs that better adhere to desired instructions. Our approach learns from the latent embeddings of paired prompts encoding target and converse behaviors to dynamically adjust the activations connecting the language modality with image context. This enables fine-grained, inference-time control over complex output semantics without modifying model weights, while preserving performance on off-target tasks. The steering module adds learnable parameters equal to only 0.14% of the original VLM's size, and it achieves model control through dimension-wise activation modulation and adaptive steering across layers, without requiring pre-extracted static vectors or manual tuning of intervention points. Furthermore, we introduce VNIA (Visual Narrative Intent Alignment), a multimodal dataset created specifically to facilitate the development and evaluation of VLM steering techniques. Our method outperforms existing intervention techniques on steering and hallucination-mitigation benchmarks for VLMs, offering a robust solution for multimodal model control through activation engineering.
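
To make the idea of inference-time, dimension-wise activation steering concrete, here is a minimal sketch in PyTorch: a small module predicts a per-dimension scale and offset for a layer's hidden states, attached via a forward hook. The module, hook placement, and gating scheme here are illustrative assumptions only, not the actual SteerVLM architecture.

```python
# Minimal sketch of dimension-wise activation steering via a forward hook.
# Illustrative only; the real SteerVLM module, layer selection, and training
# objective differ from this toy example.
import torch
import torch.nn as nn

class ToySteeringModule(nn.Module):
    """Predicts a per-dimension modulation of a layer's hidden states."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, hidden_dim)   # dimension-wise scale
        self.shift = nn.Linear(hidden_dim, hidden_dim)  # dimension-wise offset

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden_dim) activations from the hooked layer
        return torch.sigmoid(self.gate(h)) * h + self.shift(h)

def attach_steering(layer: nn.Module, steer: ToySteeringModule):
    """Register a hook that rewrites the layer's output at inference time."""
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = steer(hidden)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered
    return layer.register_forward_hook(hook)
```

Because the intervention lives in a hook, the base VLM's weights stay frozen and the steering can be switched off by simply removing the hook.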

Flexible-length Text Infilling for Discrete Diffusion Models 🔗
Andrew Zhang, Anushka Sivakumar, Chia-wei Tang, Chris Thomas

Discrete diffusion models are a new class of text generators that offer advantages over autoregressive models, such as bidirectional context use, parallelizable generation, and flexible prompting. However, a critical limitation of discrete diffusion models is their inability to perform flexible-length or flexible-position text infilling without access to ground-truth positional data. We introduce DDOT (Discrete Diffusion with Optimal Transport Position Coupling), the first discrete diffusion model to overcome this challenge. DDOT jointly denoises token values and token positions, employing a novel sample-level Optimal Transport (OT) coupling. This coupling preserves relative token ordering while dynamically adjusting the positions and lengths of infilled segments, a capability previously missing in text diffusion. Our method is orthogonal to existing discrete text diffusion methods and is compatible with various pretrained text denoisers. Extensive experiments on text infilling benchmarks such as One-Billion-Word and Yelp demonstrate that DDOT outperforms naive diffusion baselines. Furthermore, DDOT achieves performance on par with state-of-the-art non-autoregressive models and enables significant improvements in training efficiency and flexibility.
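
The following toy sketch illustrates the position-coupling idea: couple noisy and target token positions through a minimum-cost one-to-one matching, which in one dimension with a convex cost is monotone and therefore preserves relative token order. The cost choice and the Hungarian solver are assumptions for illustration; DDOT's actual training objective and position parameterization are more involved.

```python
# Minimal sketch of a sample-level OT-style coupling between noisy and
# target token positions (illustrative only, not the DDOT objective).
import numpy as np
from scipy.optimize import linear_sum_assignment

def couple_positions(noisy_pos: np.ndarray, target_pos: np.ndarray) -> np.ndarray:
    """For each noisy position, return the index of its coupled target position."""
    cost = (noisy_pos[:, None] - target_pos[None, :]) ** 2    # pairwise squared cost
    row_idx, col_idx = linear_sum_assignment(cost)            # min-cost 1-to-1 matching
    order = np.empty_like(row_idx)
    order[row_idx] = col_idx
    return order

# In 1-D with a convex cost the optimal coupling is monotone, so tokens keep
# their relative order while absolute positions (and segment lengths) shift.
noisy = np.array([0.10, 0.35, 0.80])
target = np.array([0.05, 0.50, 0.95])
print(couple_positions(noisy, target))   # -> [0 1 2]
```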


ACL 2025

Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval 🔗
Hani Alomari, Anushka Sivakumar, Andrew Zhang, Chris Thomas

Cross-modal image-text retrieval is challenging because of the diverse possible associations between content from different modalities. Traditional methods learn a single-vector embedding to represent the semantics of each sample, but struggle to capture the nuanced and diverse relationships that can exist across modalities. Set-based approaches, which represent each sample with multiple embeddings, offer a promising alternative, as they can capture richer and more diverse relationships. In this paper, we show that, despite their promise, these set-based representations continue to face issues including sparse supervision and set collapse, which limit their effectiveness. To address these challenges, we propose Maximal Pair Assignment Similarity, which optimizes a one-to-one matching between embedding sets while preserving semantic diversity within each set. We also introduce two loss functions to further enhance the representations: a Global Discriminative Loss to enhance distinction among embeddings, and an Intra-Set Divergence Loss to prevent collapse within each set. Our method achieves state-of-the-art performance on MS-COCO and Flickr30k without relying on external data.
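
As a rough illustration of a one-to-one set matching similarity, the sketch below scores two embedding sets with a Hungarian assignment on cosine similarity, so every embedding in a set must participate in the match. Function names are hypothetical, and the paper's full objective (including the global discriminative and intra-set divergence losses) is not reproduced here.

```python
# Minimal sketch of a one-to-one maximal-matching similarity between two
# embedding sets (e.g., image side vs. text side). Illustrative only.
import torch
from scipy.optimize import linear_sum_assignment

def maximal_pair_similarity(set_a: torch.Tensor, set_b: torch.Tensor) -> torch.Tensor:
    """set_a: (k, d), set_b: (k, d) -- k embeddings per sample, d dims each."""
    a = torch.nn.functional.normalize(set_a, dim=-1)
    b = torch.nn.functional.normalize(set_b, dim=-1)
    sim = a @ b.T                                    # (k, k) cosine similarities
    # Hungarian matching maximizes total similarity under a 1-to-1 constraint,
    # so no single embedding can dominate, which discourages set collapse.
    rows, cols = linear_sum_assignment(sim.detach().cpu().numpy(), maximize=True)
    return sim[torch.as_tensor(rows), torch.as_tensor(cols)].mean()
```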


NATURE Journal (Evolving Systems) 2025

Microbe-drug association prediction using metapath aggregated node embeddings and self-supervised learning 🔗
K Syama, Anushka Sivakumar, Angel Arul Jothi

Human microorganisms are closely linked to human health: they regulate the immune system, produce hormones, and contribute to metabolism. Discovering potential associations between microbes and drugs aids drug research and development. This work proposes a novel computational framework, the self-supervised metapath aggregated microbe-drug association prediction network (SMMDA-Net), to predict potential microbe-drug associations. Various biological datasets are leveraged to construct heterogeneous networks from microbe-drug associations together with transitive microbe-disease and disease-drug associations. Feature matrices containing microbe functional similarity, microbe genome similarity, drug structure similarity, and drug side-effect-based similarity are extracted. A novel model, the self-supervised metapath-based node embedding generator (SMNEG), then generates low-dimensional, meaningful embeddings for nodes in the heterogeneous network. A CNN classifier trained on these low-dimensional embeddings predicts potential microbe-drug associations. Extensive experiments on three datasets show that the proposed model outperforms six state-of-the-art methods, achieving an AUC of 98.23% and an AUPR of 95.14% on the MDAD dataset, an AUC of 99.22% and an AUPR of 94.86% on the aBiofilm dataset, and an AUPR of 87.98% on the DrugVirus dataset. Furthermore, ablation and case studies were performed to evaluate the effectiveness of SMMDA-Net in predicting potential microbe-drug associations.
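
To give a sense of what metapath-based aggregation on a heterogeneous network means, the sketch below composes adjacency matrices along a microbe-disease-drug metapath so each microbe receives a weighted view of drugs reachable through shared diseases. This is a simplified assumption for illustration; SMNEG's self-supervised embedding generator is considerably richer.

```python
# Minimal sketch of metapath aggregation over a heterogeneous
# microbe-disease-drug network (illustrative only, not SMNEG).
import numpy as np

def metapath_features(microbe_disease: np.ndarray,
                      disease_drug: np.ndarray,
                      drug_features: np.ndarray) -> np.ndarray:
    """Aggregate drug features onto microbes along the microbe->disease->drug metapath."""
    reach = microbe_disease @ disease_drug                     # (n_microbes, n_drugs) path counts
    weights = reach / np.clip(reach.sum(axis=1, keepdims=True), 1e-9, None)
    return weights @ drug_features                             # (n_microbes, d) aggregated features
```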

NEURIPS 2024

JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images 🔗
Zhecan Wang, J Liu, Chia-wei Tang, Hani Alomari, Anushka Sivakumar, Chris Thomas, et al.

Existing vision-language understanding benchmarks largely consist of images of objects in their usual contexts. As a consequence, recent multimodal large language models can perform well with only a shallow visual understanding by relying on background language biases. Thus, strong performance on these benchmarks does not necessarily correlate with strong visual understanding. In this paper, we release JourneyBench, a comprehensive human-annotated benchmark of generated images designed to assess models' fine-grained multimodal reasoning abilities across five tasks: complementary multimodal chain of thought, multi-image VQA, imaginary image captioning, VQA with hallucination triggers, and fine-grained retrieval with sample-specific distractors. Unlike existing benchmarks, JourneyBench explicitly requires fine-grained multimodal reasoning in unusual imaginary scenarios where language bias and holistic image gist are insufficient. We benchmark state-of-the-art models on JourneyBench and analyze performance along a number of fine-grained dimensions. Results across all five tasks show that JourneyBench is exceptionally challenging even for the best models, indicating that models' visual reasoning abilities are not as strong as they first appear. We discuss the implications of our findings and propose avenues for further research.
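
As a rough sketch of what "retrieval with sample-specific distractors" looks like in practice, the snippet below scores each query only against its ground-truth item plus a distractor pool curated for that query. The embeddings and function names are stand-ins, not JourneyBench's actual evaluation code.

```python
# Minimal sketch of recall@1 with sample-specific distractors
# (illustrative only; any image-text similarity model could produce the embeddings).
import numpy as np

def recall_at_1(query_emb: np.ndarray, gt_emb: np.ndarray, distractor_embs: np.ndarray) -> float:
    """query_emb: (d,), gt_emb: (d,), distractor_embs: (m, d); all L2-normalized."""
    candidates = np.vstack([gt_emb[None, :], distractor_embs])  # ground truth first
    scores = candidates @ query_emb          # cosine similarity given normalized inputs
    return float(np.argmax(scores) == 0)     # 1.0 if the ground truth ranks first
```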


NATURE Journal (IJIT) 2023

Shakey: an improved cipher for protection of IoT devices 🔗
Anushka Sivakumar, Asmi Sriwastawa, Raja Muthulagu

This article primarily focuses on improving an existing symmetric-key lightweight cipher known as Shadow. The growing importance of Internet of Things (IoT) networks and the consequent need for lightweight cryptography are highlighted. Various ciphers from the current literature are reviewed, and the strengths and weaknesses of the most efficient ciphers, as well as those of Shadow, are analysed. Our findings are used to modify the subkey generation algorithm of Shadow and thus create Shakey. Experiments are conducted to determine the avalanche effect and throughput of Shakey, and cryptanalysis is performed to gauge its security. Finally, the structural advantages of Shakey and its performance improvement over Shadow are discussed. The article concludes with an outline of the future scope of this work and our closing remarks.
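
For readers unfamiliar with the avalanche-effect metric mentioned above, a minimal measurement harness is sketched below: flip one plaintext bit at a time and count how many ciphertext bits change (an ideal cipher flips about half). The `encrypt` callable and the toy lambda are placeholders, not the Shadow or Shakey implementation from the paper.

```python
# Minimal sketch of measuring a block cipher's avalanche effect.
import random

def avalanche_effect(encrypt, plaintext: int, key: int, block_bits: int) -> float:
    base = encrypt(plaintext, key)
    changed = []
    for i in range(block_bits):
        altered = encrypt(plaintext ^ (1 << i), key)       # flip the i-th input bit
        changed.append(bin(base ^ altered).count("1"))     # output bits that differ
    return sum(changed) / (block_bits * block_bits)        # avg fraction of bits flipped

# Exercise the harness with a toy (insecure) "cipher" as a stand-in:
toy_encrypt = lambda pt, k: (pt * 0x9E3779B1 ^ k) & 0xFFFFFFFF
print(avalanche_effect(toy_encrypt, random.getrandbits(32), 0xDEADBEEF, 32))
```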

NATURE Journal (Evolving Systems) 2023

Microbial Biomarkers Identification for Human Gut Disease Prediction using Microbial Interaction Network Embedded Deep Learning 🔗
Anushka Sivakumar, K Syama, Angel Arul Jothi

Human gut microorganisms are crucial in regulating the immune system, and disruption of the healthy relationship between the gut microbiota and gut epithelial cells leads to the development of diseases. Inflammatory Bowel Disease (IBD) and Colorectal Cancer (CRC) are gut-related disorders with complex pathophysiological mechanisms. With the massive availability of microbiome data, computer-aided microbial biomarker discovery for IBD and CRC is becoming common. However, many existing biomarker identification methods do not consider microbial interactions. Hence, in this study, we construct a microbial interaction network (MIN) that captures the associations and interactions among microbes and their hosts. This work explores graph-embedding-based feature selection by constructing a sparse MIN with MAGMA and embedding it into a deep feedforward neural network (DFNN), with the aim of reducing dimensionality and selecting the prominent features that form the disease biomarkers. The selected features are then passed through a deep forest classifier for disease prediction. The proposed methodology is evaluated with 5-fold cross-validation against different classifiers, existing works, and different variants of the MIN embedded in the DFNN on the IBD and CRC datasets. The selected biomarkers are also verified against biological studies for both datasets. The highest achieved AUC, accuracy, and F1-score are 0.863, 0.839, and 0.897, respectively, for the IBD dataset and 0.837, 0.768, and 0.757, respectively, for the CRC dataset. As observed, the proposed method successfully selects a subset of informative and prominent biomarkers for IBD and CRC.
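
One simple way to picture "embedding a sparse interaction network into a feedforward network" is to mask the first layer's weights with the MIN adjacency, so each hidden unit only sees one microbe and its known interaction partners. The sketch below makes that assumption explicit; it is not the paper's MAGMA-based construction or the downstream deep forest classifier.

```python
# Minimal sketch of a MIN-masked first layer in a feedforward network
# (illustrative only; the actual MIN is built with MAGMA).
import torch
import torch.nn as nn

class MINMaskedLayer(nn.Module):
    def __init__(self, adjacency: torch.Tensor):
        super().__init__()
        n = adjacency.shape[0]
        # Keep self-connections so each unit also sees its own microbe's abundance.
        self.register_buffer("mask", torch.clamp(adjacency + torch.eye(n), max=1.0))
        self.weight = nn.Parameter(torch.randn(n, n) * 0.01)
        self.bias = nn.Parameter(torch.zeros(n))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Zero out weights between microbes with no known interaction.
        return x @ (self.weight * self.mask).T + self.bias
```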