Kausik Lakkaraju
"Through my dissertation, I introduce a causally grounded, extensible approach for rating AI models for robustness by detecting their sensitivity to input perturbations and protected attributes, quantifying this behavior, and translating it into user-understandable ordinal ratings (trust certificates)."
This dissertation examines how to assess and rate instability and bias in black-box AI models, with particular attention to large language models (LLMs) and composite AI models used in finance, healthcare, and other decision-sensitive contexts. Prior studies show that small changes in input or protected attributes (sensitive user information) can cause large shifts in model outputs, an issue that becomes more pronounced when multiple models are chained together to form a composite AI model.
The work introduces a causality-based rating method that tests black-box models to quantify sensitivity, statistical bias, and confounding effects under controlled input variations. Beyond measurement, the method converts raw metric scores into comparable ratings that serve three purposes: they aid users in model selection; they provide holistic explanations for multiple stakeholders when used alongside traditional explanation methods; and, when integrated with probabilistic planning methods, they support the assessment and construction of robust and efficient composite AI models. The ratings help users weigh trade-offs among fairness, utility, and computational cost when choosing a model for a task based on the data at hand.
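As a rough illustration of the idea of turning raw scores into ordinal ratings, the sketch below measures how much a model's output shifts when only a protected attribute in the input is changed, and then bins those raw scores into ratings. It is a minimal sketch under stated assumptions, not the dissertation's actual method: the functions `sensitivity` and `ordinal_ratings`, the toy models, the mean-absolute-difference score, and the equal-width binning are all hypothetical choices made for illustration, and no causal adjustment for confounders is attempted here.

```python
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple

Model = Callable[[str], float]  # hypothetical black-box interface: text in, score out


def sensitivity(model: Model, pairs: Sequence[Tuple[str, str]]) -> float:
    """Mean absolute output change across input pairs that differ
    only in a protected attribute (an assumed, simplified raw score)."""
    return mean(abs(model(a) - model(b)) for a, b in pairs)


def ordinal_ratings(raw_scores: Dict[str, float], levels: int = 3) -> Dict[str, int]:
    """Bin raw scores into ratings 1..levels using equal-width bins.
    Rating 1 is the least sensitive model in the compared set."""
    lo, hi = min(raw_scores.values()), max(raw_scores.values())
    if hi == lo:
        # All models behave identically on this score; give them the same rating.
        return {name: 1 for name in raw_scores}
    width = (hi - lo) / levels
    return {
        name: min(levels, int((score - lo) / width) + 1)
        for name, score in raw_scores.items()
    }


# Toy stand-ins for black-box models (purely illustrative).
def stable(text: str) -> float:
    return 0.5


def mild(text: str) -> float:
    return 0.5 if "she" in text.split() else 0.0


def biased(text: str) -> float:
    return 1.0 if "she" in text.split() else 0.0


# Each pair differs only in the gendered pronoun.
pairs = [
    ("he is a doctor", "she is a doctor"),
    ("he repaid the loan", "she repaid the loan"),
]

raw = {name: sensitivity(fn, pairs) for name, fn in
       [("stable", stable), ("mild", mild), ("biased", biased)]}
ratings = ordinal_ratings(raw, levels=3)
```

Because the ratings are relative to the set of models being compared, adding or removing a baseline can change a model's rating even though its raw score stays the same.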
To support practical adoption, the dissertation presents ARC (AI Rating through Causality), a tool that applies the method across multiple tasks, supports Pareto analysis, and allows users to evaluate their own models within a fixed causal setup. User studies show that ratings reduce the effort required to understand model behavior and help users build efficient composite chatbots. This work also underpins a forthcoming Springer Nature book, Assessing, Explaining, and Rating AI Systems for Trust, With Applications in Finance.
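The kind of Pareto analysis mentioned above can be sketched in a few lines. The example below is an assumption-laden illustration and does not reflect ARC's actual interface: the function `pareto_front`, the model names, and the three numeric objectives (a bias score, an error rate standing in for inverse utility, and a compute cost, all treated as lower-is-better) are hypothetical.

```python
from typing import Dict, List, Tuple

Objectives = Tuple[float, float, float]  # (bias, error, cost); lower is better for each


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b if it is no worse on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(models: Dict[str, Objectives]) -> List[str]:
    """Return the names of models that no other model dominates."""
    return sorted(
        name
        for name, scores in models.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in models.items()
            if other != name
        )
    )


# Hypothetical candidates with made-up numbers, for illustration only.
candidates = {
    "model_a": (0.10, 0.20, 5.0),
    "model_b": (0.30, 0.10, 2.0),
    "model_c": (0.35, 0.25, 6.0),
}

front = pareto_front(candidates)
```

Here `model_c` is worse than `model_a` on all three objectives, so it drops out, while `model_a` and `model_b` remain because each is better than the other on at least one objective. Choosing between them is the trade-off left to the user.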
From Predictions to Ratings
How can one detect instability (lack of robustness) in AI models in a general manner?
Can we have a principled, extensible method to measure the robustness of AI models?
How can extensible rating methods be created?
[Rating Method] Can we build a method to issue relative ratings to a model with respect to baselines, in a general manner?
[Method Evaluation / Usability] Is the method effective in helping users understand model behavior for selecting a model?
[General Tool for Rating] Can a general tool be built to rate and compare AI models across different tasks and domains?
What is the need for AI ratings if explanations for the AI model already exist? Conversely, what is the need for explanations if there are ratings?
How can one calculate the rating of a composite AI model based on the ratings of its individual constituent models?
Presentation Deck
Selected outputs related to this dissertation
Kausik Lakkaraju & Biplav Srivastava — Springer Nature
A forthcoming book that discusses assessment, explanation, and rating of black-box AI models for trust.
View my full list of publications on my dissertation topic.
Major Professor
Department of Computer Science
Committee Chair
Department of Computer Science
Committee Member
Department of Integrated Information Technology
Committee Member
Department of Computer Science
Committee Member
AI Research Lead, J.P. Morgan AI Research
Photos from Dissertation Defense (February 2026)
© 2026 Kausik Lakkaraju. All rights reserved.