Projects, Mahsa Khoshnoodi

Published Research

FaceBench diagnostic examples for identity judgment and attribute agreement

CVPR-VisCon 2026

Do VLMs Reason About Faces? Probing the Perception-Reasoning Gap in Identity Judgment

Mahsa Khoshnoodi, Sarah Adel Bargal

CVPR 2026, Third Workshop on Visual Concepts (VisCon)

We probe whether vision-language models can reason about facial identity or merely pattern-match on low-level features. Using FaceBench, we build a diagnostic framework that localizes the perception-to-reasoning gap, distinguishing failures of visual perception from failures of reasoning over perceived attributes.
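To picture the localization step, record two things per face pair: whether the model perceived the probed attributes correctly, and whether its identity judgment was correct. Identity errors can then be split into perception failures and reasoning failures. The sketch below assumes exactly that two-flag record. `PairResult` and `localize_errors` are hypothetical names, and the actual FaceBench protocol may use finer-grained attribute scores.

```python
from dataclasses import dataclass


@dataclass
class PairResult:
    # Hypothetical per-pair record; field names are illustrative only.
    attrs_correct: bool     # model perceived the probed attributes correctly
    identity_correct: bool  # model's same/different identity judgment was correct


def localize_errors(results):
    """Split identity errors by whether attribute perception succeeded."""
    counts = {"correct": 0, "perception_failure": 0, "reasoning_failure": 0}
    for r in results:
        if r.identity_correct:
            counts["correct"] += 1
        elif not r.attrs_correct:
            # The model misread the faces, so the error is attributed to perception.
            counts["perception_failure"] += 1
        else:
            # Attributes were read correctly, yet the identity judgment was wrong.
            counts["reasoning_failure"] += 1
    return counts


print(localize_errors([PairResult(True, True), PairResult(False, False), PairResult(True, False)]))
# {'correct': 1, 'perception_failure': 1, 'reasoning_failure': 1}
```

A large `reasoning_failure` count relative to `perception_failure` would point to a gap in reasoning over attributes the model already sees.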

Emotion-aware multi-task model diagram for SemEval conspiracy detection

SemEval 2026

GUNLP at SemEval-2026 Task 10: Emotion-Aware Multi-Task Learning for Conspiracy Detection

Mahsa Khoshnoodi, Rojin Ziaei, Nazli Goharian


We present a multi-task learning system that jointly models emotion signals and conspiracy detection, motivated by the strong correlation between affective language and conspiratorial framing. The system achieves competitive performance on Task 10 of SemEval-2026.
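In a typical multi-task setup, a shared encoder feeds two heads, and training minimizes a weighted sum of the per-task losses. The sketch below shows only that loss combination. It assumes a binary conspiracy label, a single-label emotion classifier, and a hypothetical `emotion_weight` of 0.5. None of these details are taken from the system description.

```python
import math


def binary_cross_entropy(p, y, eps=1e-12):
    """Loss for the main task: probability p of 'conspiracy' vs. label y in {0, 1}."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def cross_entropy(probs, label, eps=1e-12):
    """Loss for the auxiliary emotion task: probs is a distribution over emotions."""
    return -math.log(max(probs[label], eps))


def multitask_loss(consp_prob, consp_label, emotion_probs, emotion_label, emotion_weight=0.5):
    """Weighted sum of the conspiracy loss and the auxiliary emotion loss."""
    return (binary_cross_entropy(consp_prob, consp_label)
            + emotion_weight * cross_entropy(emotion_probs, emotion_label))
```

Because both losses backpropagate into the shared encoder, the emotion signal can shape the representation that the conspiracy head relies on.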

Survey figure showing attention patterns over entities, tokens, and knowledge graph nodes

Under Review 2026

Reimagining Neurosymbolic AI through the Lens of Cognitive Science: A Survey

Devichand Budagam, Mahsa Khoshnoodi, Jibesh Patra, Ravid Shwartz-Ziv, Amit Sheth, Vinija Jain, Aman Chadha

Under review, ACM Computing Surveys

We survey neurosymbolic AI through the lens of cognitive science, examining how symbolic reasoning and neural learning can be combined to better mirror human cognition. The survey maps existing approaches to cognitive principles and identifies open problems in building interpretable, reasoning-capable AI systems.

Hierarchical prompting taxonomy diagram showing prompt complexity levels

KDD 2025

Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models

Devichand Budagam, Sankalp KJ, Mahsa Khoshnoodi, Ashutosh Kumar, Vinija Jain, Aman Chadha


We introduce the Hierarchical Prompting Taxonomy (HPT), a cognitively motivated framework that categorizes prompting strategies by their cognitive demands, and define the Hierarchical Prompting Index (HPI) to quantify task complexity. The framework enables systematic analysis of reasoning capabilities across LLMs and yields interpretable insights into their problem-solving behavior.
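One way to turn a hierarchy of prompting strategies into a per-instance index is to score each instance by the least demanding level that solves it, then average over a dataset. The sketch below makes several assumptions: levels are numbered from 1 (least demanding) upward, there are five levels by default, and an unsolved instance receives a penalty of one level beyond the hardest. Treat it as an illustration, not the paper's exact HPI definition.

```python
def instance_hpi(solved_at_level, num_levels=5, penalty=None):
    """Lowest level (1..num_levels) whose strategy solved the instance, else a penalty.

    solved_at_level: dict mapping level -> bool (hypothetical input format).
    """
    for level in range(1, num_levels + 1):
        if solved_at_level.get(level, False):
            return level
    # Assumed penalty for instances no strategy solves.
    return penalty if penalty is not None else num_levels + 1


def dataset_hpi(instances, num_levels=5):
    """Mean instance-level index; lower means simpler prompting sufficed."""
    scores = [instance_hpi(inst, num_levels) for inst in instances]
    return sum(scores) / len(scores)
```

For example, three instances solved at level 1, at level 3, and never would score 1, 3, and 6, giving a dataset index of 10/3 under these assumptions.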


Example synthetic image used to illustrate text-to-image coherence evaluation

NeurIPS Spotlight 2024

Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)

Mahsa Khoshnoodi, Fatima Jahara, Michael Saxon, Yujie Lu, Aditya Sharma, William Yang Wang

NeurIPS 2024, Spotlight (top 5% of submissions)

Automated metrics for text-to-image (T2I) models suffer from systematic blind spots: they score semantically equivalent images inconsistently and often fail to detect object hallucination. We introduce T2IScoreScore (TS2), a curated meta-evaluation dataset of image sets that progress from high to low faithfulness to their prompts, enabling rigorous benchmarking of the T2I evaluation metrics themselves.
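Suppose each image in a set is annotated with how far it departs from the prompt (here, a hypothetical count of semantic errors). A metric can then be checked on whether it ranks images in that order. The sketch below computes a plain Spearman rank correlation between metric scores and negated error counts. This is a generic ordering check of the kind such a meta-evaluation can use, not the full TS2 scoring procedure.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5  # undefined if either input is constant


def ordering_score(metric_scores, error_counts):
    """Spearman rho; +1 means the metric scores images with fewer errors strictly higher."""
    return pearson(average_ranks(metric_scores), average_ranks([-e for e in error_counts]))
```

A metric with blind spots, for example one that ignores a hallucinated object, will tie or invert images that differ by that error and pull the correlation below 1.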


Diagram illustrating predict-verify style accelerated language generation

arXiv 2024

A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models

Mahsa Khoshnoodi, Vinija Jain, Mingye Gao, Malavika Srikanth, Aman Chadha

Preprint, 2024

A systematic survey of techniques for reducing inference latency in LLMs, covering speculative decoding, early exit mechanisms, and non-autoregressive generation. We analyze each method's underlying principles, trade-offs, and failure modes, with the goal of understanding how recent work addresses the fundamental bottleneck of autoregressive decoding.
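Speculative decoding is one of the surveyed techniques. Its core loop has a cheap draft model propose several tokens, and the expensive target model verifies them. The sketch below shows the greedy variant with toy next-token functions standing in for models. Under greedy decoding, its output is identical to decoding with the target alone. The sampling variant instead uses a rejection-sampling acceptance rule, which is not shown here. Function names and the toy models are hypothetical.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # maps a token sequence to its greedy next token


def speculative_greedy(prefix: List[int], draft: NextToken, target: NextToken,
                       k: int, max_new: int) -> List[int]:
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        # 1. The draft model proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target verifies each position; a real system scores all k in one forward pass.
        ctx = list(out)
        for tok in proposal:
            expected = target(ctx)
            if expected != tok:
                ctx.append(expected)  # first mismatch: keep the target's token, drop the rest
                break
            ctx.append(tok)
        else:
            ctx.append(target(ctx))  # all k accepted: the target adds one bonus token
        out = ctx
    return out[: len(prefix) + max_new]
```

Each iteration commits at least one target-approved token, so the speedup depends on how often the draft agrees with the target.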


Master's Thesis

Basal ganglia pathway diagram showing cortex, striatum, thalamus, and pallidum

M.Sc. Thesis

Computational Models of the Basal Ganglia: Circuits, Dopamine, and Action Selection

Developed a novel computational model of the basal ganglia for action selection and motor sequence learning, grounded in physiological and anatomical findings. The direct cortico-basal ganglia pathway was modeled as a neuro-fuzzy network trained by reinforcement learning on a reward prediction error signal, mirroring dopaminergic modulation. For the indirect pathway (the loop between the subthalamic nucleus and the external globus pallidus), a recurrent neural network was proposed with a novel neuron-level learning rule derived from gradient descent, allowing the network to act as a local working memory for automatic sequence reproduction.
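The reward prediction error that stands in for the dopamine signal is commonly formalized as the temporal-difference error delta = r + gamma * V(s') - V(s). The sketch below shows a tabular TD(0) update driven by that error. It is a generic illustration, not the thesis's neuro-fuzzy network, and the `alpha` and `gamma` values are arbitrary.

```python
def td_update(values, state, reward, next_state, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one TD(0) update in place and return the reward prediction error."""
    next_value = 0.0 if terminal else values.get(next_state, 0.0)
    delta = reward + gamma * next_value - values.get(state, 0.0)  # prediction error
    values[state] = values.get(state, 0.0) + alpha * delta         # move estimate toward target
    return delta


v = {}
for trial in range(3):
    # A cue is repeatedly followed by reward; the error shrinks as the reward becomes predicted.
    print(round(td_update(v, "cue", 1.0, None, terminal=True), 3))
# 1.0, 0.9, 0.81 (one per line)
```

The shrinking error mirrors the classic observation that dopamine responses to a fully predicted reward decline with learning.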

Industry and Applied Work

LLM-Powered Financial Forecasting

Industry

Designed and deployed an end-to-end forecasting pipeline that combined a fine-tuned domain-specific LLM with real-time sentiment signals from financial news and social media. Transformer and LSTM architectures were trained jointly on equity time-series data, with a distributed Spark pipeline handling feature extraction at scale. The system consistently outperformed rule-based baselines in volatile market conditions, demonstrating that language model representations carry genuine predictive signal beyond price history alone.

Real-Time Fraud Detection at Production Scale

Industry

Built a real-time anomaly detection system for a large-scale banking ecosystem, processing millions of transactions daily. The system used instance-based learning with peer-group risk scoring and behavioral feature engineering to flag unusual activity without relying on labeled fraud data. Deployed on Apache Spark Streaming and Kafka, the system reduced false positive rates significantly while maintaining sub-second latency across distributed infrastructure.
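Instance-based scoring against a peer group can work without fraud labels. A transaction's score is its mean distance to the k most similar past transactions from the same peer group, so behavior unlike any peer scores high. The sketch below is a generic k-nearest-neighbor version. The feature tuples, `k`, and `threshold` are hypothetical, and the features are assumed to be already scaled.

```python
import math


def knn_score(x, peer_history, k=3):
    """Mean Euclidean distance from x to its k nearest peer-group instances.

    peer_history must be non-empty; features are assumed to be pre-scaled.
    """
    dists = sorted(math.dist(x, p) for p in peer_history)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def is_flagged(x, peer_history, threshold, k=3):
    """Flag a transaction whose peer-relative distance exceeds a (hypothetical) threshold."""
    return knn_score(x, peer_history, k) > threshold
```

In a streaming deployment, the peer-group history would be maintained incrementally, and the threshold tuned to trade recall against false positives.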

Graph-Based Threat Detection with Complex Event Processing

Industry

Architected a graph-based Complex Event Processing engine on Apache Flink that aggregated more than 40 heterogeneous network log sources in near real time. Bayesian networks and Hidden Markov Models formed the core inference layer, enabling detection of multi-stage attack patterns that single-event rules would miss. Deployed across banking and insurance clients, the engine provided a unified threat-visibility layer, replacing correlation work that security analysts had previously done by hand.
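An HMM detects multi-stage attacks by treating attack stages as hidden states and log events as observations. The forward algorithm then tracks the probability of each stage as events arrive. The sketch below shows normalized forward filtering with toy parameters. The states, event names, and probabilities are all hypothetical, and the sketch covers only the HMM half of the inference layer.

```python
STATES = ["benign", "recon", "exploit"]
START = {"benign": 0.9, "recon": 0.09, "exploit": 0.01}
TRANS = {  # hypothetical stage-transition probabilities (rows sum to 1)
    "benign": {"benign": 0.95, "recon": 0.05, "exploit": 0.0},
    "recon": {"benign": 0.1, "recon": 0.7, "exploit": 0.2},
    "exploit": {"benign": 0.05, "recon": 0.05, "exploit": 0.9},
}
EMIT = {  # hypothetical probability of each log event type given the stage
    "benign": {"login": 0.9, "port_scan": 0.08, "priv_esc": 0.02},
    "recon": {"login": 0.2, "port_scan": 0.7, "priv_esc": 0.1},
    "exploit": {"login": 0.2, "port_scan": 0.1, "priv_esc": 0.7},
}


def forward_filter(observations, states, start, trans, emit):
    """Return P(stage_t | events_1..t) for each t via the normalized forward algorithm."""
    beliefs, prev = [], None
    for obs in observations:
        if prev is None:
            alpha = {s: start[s] * emit[s][obs] for s in states}
        else:
            alpha = {s: sum(prev[r] * trans[r][s] for r in states) * emit[s][obs]
                     for s in states}
        total = sum(alpha.values())
        prev = {s: a / total for s, a in alpha.items()}  # normalize to avoid underflow
        beliefs.append(prev)
    return beliefs


beliefs = forward_filter(["login", "port_scan", "port_scan", "priv_esc"], STATES, START, TRANS, EMIT)
```

No single event here is conclusive. The exploit probability rises only because reconnaissance-like events precede the privilege escalation, which is the kind of pattern that single-event rules miss.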

Customer Segmentation with Hierarchical Gaussian Mixture Models

Industry

Developed a hierarchical clustering framework based on Gaussian Mixture Models, implemented in Scala on Apache Spark, to segment commercial banking customers at scale. By modeling the full distribution of behavioral patterns rather than assigning hard cluster boundaries, the system uncovered fine-grained customer segments that flat clustering methods collapsed into noise. The resulting segmentation directly informed personalized product strategies, increasing targeting precision for the bank's retail offerings.
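The difference between modeling the full distribution and drawing hard boundaries shows up in a Gaussian mixture's soft assignments. Each customer gets a posterior probability for every component rather than a single label. The sketch below computes these responsibilities (the E-step) for a one-dimensional mixture with hypothetical parameters. It is plain Python, not the Scala/Spark implementation, and it omits the hierarchical step of re-fitting a mixture within each top-level segment.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def responsibilities(x, weights, means, variances):
    """Posterior probability that x belongs to each mixture component (GMM E-step)."""
    joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]
```

A point midway between two equal components receives 0.5 for each, whereas hard clustering would force an arbitrary assignment. In Spark, MLlib's `GaussianMixture` estimator outputs a comparable per-component probability column.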