Publications

2025

A Dataset and Visualization of Generalizable Election-Related Questions Compiled from Leading Global Democracies for Building AI-Enabled Tools
Authors:Kausik Lakkaraju, Bharath Muppasani, Sara Elizabeth Jones, Biplav Srivastava
Summary:
We collected 227 election-related questions from online sources across eight democratic countries, refining them into 85 parameterizable queries. Our interactive UI organizes these by country and lets users submit new questions. The analysis shows clear gaps between public concerns and official FAQs, providing a foundation for AI tools that bridge information gaps in election systems.
Publication Type: Book Chapter
Venue: PROMISE – PROMoting AI’s Safe usage for Elections
Paper |Bibtex

Towards Better Elections: A Discussion About the United Kingdom and Africa
Authors:Aurelia Ayisi, P Deepak, Marquita Smith, Biplav Srivastava, Anita Nikolich, Andrea Hickerson, Tarmo Koppel, Kausik Lakkaraju
Summary:
This chapter presents observations from Dr. Deepak P. (AI ethics and NLP, School of EEECS), Prof. Aurelia Ayisi (digital communication and media literacy, University of Ghana), and Prof. Marquita Smith (communication and leadership, University of Mississippi). The interview was conducted in August 2024 by the PROMISE book editors Biplav Srivastava, Anita Nikolich, Andrea Hickerson, and Tarmo Koppel, with assistance from Kausik Lakkaraju. It focused on election challenges in Africa, particularly Ghana, and in the UK.
Publication Type: Book Chapter
Venue: PROMISE – PROMoting AI’s Safe usage for Elections
Paper |Bibtex

GAICo: Demonstrating a Unified Framework for Multi-Modal GenAI Evaluation
Authors:Pallav Koppisetti, Nitin Gupta, Kausik Lakkaraju, Biplav Srivastava
Summary:
GAICo (Generative AI Comparator) is an open-source Python library for reproducible, multi-modal GenAI evaluation (text, images, audio, structured data). It makes debugging composite pipelines simple. For example, it helps you tell whether a failure comes from an LLM planner or an image generator, by comparing outputs to task-specific references. 14,000+ downloads. Here's a demo video.
Publication Type: Demonstration
Venue: The 40th Annual AAAI Conference on Artificial Intelligence (AAAI-26)
Paper

GAICo: A Deployed and Extensible Framework for Evaluating Diverse and Multimodal Generative AI Outputs
Authors:Nitin Gupta, Pallav Koppisetti, Kausik Lakkaraju, Biplav Srivastava
Summary:
GenAI evaluation is often ad-hoc and inconsistent, especially for structured or multi-modal outputs. GAICo is an open-source Python library that unifies evaluation across text, structured data, images, and audio, enabling reproducible comparisons, debugging, and faster development of reliable AI systems.
Publication Type: Conference
Venue: Thirty-Eighth Annual Conference on Innovative Applications of Artificial Intelligence (IAAI-26) at AAAI 2026
Paper |Bibtex

ARC: A Tool to Rate AI Models for Robustness Through a Causal Lens
Authors: Kausik Lakkaraju, Siva Likitha Valluru, Biplav Srivastava, Marco Valtorta
Summary:
ARC is a tool for rating AI models for robustness using a causal reasoning. It supports four tasks: binary classification, sentiment analysis, group recommendation, and time-series forecasting. Users can test model stability under perturbations and detect biases across protected attributes like gender, race, and age. The ratings are model-independent and causally interpretable. Watch the demo here.
Publication Type: Workshop
Venue: IJCAI 2025 Workshop on User-Aligned Assessment of Adaptive AI Systems (AIA 2025)
Paper |Bibtex

Rating AI Models for Robustness Through a Causal Lens
Authors: Kausik Lakkaraju
Summary:
ARC (AI Rating through Causality) is a causally grounded framework for rating AI models for robustness. ARC identifies statistical and confounding biases and measures how input changes affect model performance. It quantifies robustness and provides ratings to help users understand and compare different AI models. Future work includes extending ARC to composite models and integrating it with explainable AI methods to provide a holistic view of the model.
Publication Type: Doctoral Consortium
Venue: Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI)
Paper |Bibtex

On Identifying Why and When Foundation Models Perform Well on Time-Series Forecasting Using Automated Explanations and Rating
Authors: Michael Widener, Kausik Lakkaraju, John Aydin, Biplav Srivastava
Summary:
Time-series forecasting models are widely used in critical domains, but their performance and failures remain hard to interpret. We combine traditional XAI with Rating Driven Explanations (RDE) to evaluate model accuracy and interpretability across finance, energy, transport, and automotive datasets. Our results show feature-based models like Gradient Boosting often outperform foundation models such as Chronos in volatile domains, while foundation models excel mainly in stable, trend-driven contexts.
Publication Type: Symposium
Venue: AAAI 2025 Fall Symposium on AI Trustworthiness and Risk Assessment for Challenged Contexts (ATRACC)
Paper |Bibtex

Holistic Explainable AI (H-XAI): Extending Transparency Beyond Developers in AI-Driven Decision Making
Authors: Kausik Lakkaraju, Siva Likitha Valluru, Biplav Srivastava
Summary:
Current XAI tools mainly explain model outputs for developers, without addressing the different needs of other stakeholders. Our Holistic-XAI (H-XAI) framework combines causality-based rating approach with traditional explanation methods, letting users test hypotheses, ask questions, and compare model behavior against random or biased baselines. It works at both the individual and global level, helping stakeholders understand decisions, detect bias, and evaluate robustness.
Publication Type: [Under Review]
Paper |Bibtex

SafeChat: A Framework for Building Trustworthy Collaborative Assistants and a Case Study of its Usefulness
Authors: Biplav Srivastava, Kausik Lakkaraju, Nitin Gupta, Vansh Nagpal, Bharath C Muppasani, Sara E Jones
Summary:
Modern chatbots powered by large language models (LLMs) are widely accessible but face issues like lack of transparency, safety concerns, and complex development. These limitations make them unsuitable for sensitive areas like elections or healthcare. To address this, we introduce SafeChat, a flexible and trustworthy chatbot framework built on Rasa. It supports source-traceable answers, deflects unsafe queries, summarizes responses, and enables rapid development through a CSV-driven workflow. We used it to build ElectionBot-SC and other safe assistants. Project link: https://github.com/ai4society/trustworthy-chatbot
Publication Type: [Under Review]
Paper |Bibtex

On Creating a Causally Grounded Usable Rating Method for Assessing the Robustness of Foundation Models Supporting Time Series
Authors: Kausik Lakkaraju, Rachneet Kaur, Parisa Zehtabi, Sunandita Patra, Siva Likitha Valluru, Zhen Zeng, Biplav Srivastava, Marco Valtorta
Summary:
Foundation Models have improved time-series forecasting but remain sensitive to input noise. We propose a causal rating framework to evaluate their robustness using stock prediction as a case study. Our findings show that multi-modal and task-specific foundation models are both more accurate and more robust. Our user study confirmed that our ratings help users better compare model reliability.
Publication Type: [Under Review]
Paper |Bibtex

2024

Promoting Nutrition Adherence with Convenience Using Group Recommendations and Multimodal Food Reasoning - Initial Results
Authors: Nitin Gupta, Biplav Srivastava, Vansh Nagpal, Likitha S. Valluru Kausik Lakkaraju, Zach Abdulrahman, Andrew Davison
Summary:
Choosing what to eat often means balancing health and convenience. In this work, we tackle the meal recommendation problem by designing a system that considers both nutrition and practicality, while also understanding ingredients and cooking steps. We introduce a new way to rate meals, convert recipes into a rich format, and use learning methods that adapt to user context, all showing early promise.
Publication Type: Workshop
Venue: IEEE ICDM International Workshop on AI for Nudging and Personalization (WAIN-2025)
Paper |Bibtex

Rating Multi-Modal Time-Series Forecasting Models (MM-TSFM) for Robustness Through a Causal Lens
Authors: Kausik Lakkaraju, Rachneet Kaur, Zhen Zeng, Parisa Zehtabi, Sunandita Patra, Biplav Srivastava, Marco Valtorta
Summary:
AI forecasting models can behave unpredictably when inputs change slightly, which is risky in finance. We study models that use both numbers and images (multi-modal) and propose a causal rating method to test how robust they are. Across a large experiment, we find that multi-modal models (ViT-num-spec models) are not just more accurate, but also more reliable, making them a better fit for decision-making under uncertainty.
Publication Type: [Under Review]
Paper |Bibtex

Rating Sentiment Analysis Systems for Bias Through a Causal Lens
Authors: Kausik Lakkaraju, Biplav Srivastava, Marco Valtorta
Summary:
Sentiment Analysis Systems (SASs) analyze text emotions but can inaccurately change ratings over minor input variations, showing potential bias towards attributes like gender or race. We propose a method to evaluate and rate SASs on their sensitivity to these attributes, aiming to help choose more fair and reliable systems and reduce bias-induced hate speech online.
Publication Type: Journal
Venue: IEEE Transactions on Technology and Society
Paper |Bibtex

Trust and ethical considerations in a multi-modal, explainable AI-driven chatbot tutoring system: The case of collaboratively solving Rubik's Cube
Authors: Kausik Lakkaraju, Vedant Khandelwal, Biplav Srivastava, Forest Agostinelli, Hengtao Tang, Prathamjeet Singh, Dezhi Wu, Matt Irvin, Ashish Kundu
Summary:
AI can revolutionize education by analyzing vast data on student learning but faces unresolved ethical concerns, such as data privacy and fairness, especially in high school settings. This paper introduces the ALLURE chatbot, a platform designed to address these ethical issues, allowing students to collaboratively solve the Rubik's cube with AI. Key features include prioritizing informed consent for data use and ensuring safe interaction and language use to protect students. It also focuses on preventing information leakage between user groups as the system learns and improves.
Publication Type: Workshop
Venue: ICML Workshop on What’s left to TEACH (Trustworthy, Enhanced, Adaptable, Capable and Human-centric) chatbots?
Paper |Bibtex

Advances in Automatically Rating the Trustworthiness of Text Processing Services
Authors: Biplav Srivastava, Kausik Lakkaraju, Mariana Bernagozzi, Marco Valtorta
Summary:
In this symposium paper, we talked about the previous approaches that were used to rate the trustworthiness of AI systems and we also outlined the challenges and vision for a principled, causality-based, and multi-modal rating methodologies.
Publication Type: Journal, Symposium
Venue: AI and Ethics Journal; AAAI Spring Symposium
Paper |Bibtex

2023

LLMs for Financial Advisement: A Fairness and Efficacy Study in Personal Decision Making
Authors: Kausik Lakkaraju, Sara E Jones, Sai Krishna Revanth Vuruma, Vishal Pallagani, Bharath C Muppasani, Biplav Srivastava
Summary:
We compared ChatGPT and Bard, LLM-based chatbots, with SafeFinance, a rule-based chatbot, in the personal finance domain. Our findings reveal that ChatGPT and Bard often provide inconsistent and unreliable financial advice, while SafeFinance, though simpler, offers dependable and accurate information. This study highlights the current limitations of LLM-based chatbots in handling financial advisement tasks effectively.
Publication Type: Conference
Venue: Proceedings of the Fourth ACM International Conference on AI in Finance (ICAIF)
Paper |Bibtex

The Effect of Human v/s Synthetic Test Data and Round-tripping on Assessment of Sentiment Analysis Systems for Bias
Authors: Kausik Lakkaraju, Aniket Gupta, Biplav Srivastava, Marco Valtorta, Dezhi Wu
Summary:
Sentiment Analysis Systems (SASs), AI tools that analyze text sentiment, can show unstable and biased behavior, raising trust issues. A new method rates these systems for bias using synthetic data. We enhanced this by using real chatbot conversations and a technique that translates data through another language and back. This revealed more bias in real compared to synthetic data, but translating through Spanish or Danish reduced bias significantly in real data.
Publication Type: Conference
Venue: The Fifth IEEE International Conference on Trust, Privacy and Security in Intelligent Systems, and Applications
Paper |Bibtex

Can LLMs be Good Financial Advisors?: An Initial Study in Personal Decision Making for Optimized Outcomes
Authors: Kausik Lakkaraju, Sai Krishna Revanth Vuruma, Vishal Pallagani, Bharath Muppasani, Biplav Srivastava
Summary:
We tested advanced chatbots like ChatGPT and Bard on personal finance advice, using 13 questions in different languages and dialects. Although the chatbots' answers sounded good, we found they often lacked accuracy and reliability in providing financial information.
Publication Type: Workshop
Venue: ICAPS Workshop on Planning for Financial Services (FinPlan)
Paper |Bibtex

On Safe and Usable Chatbots for Promoting Voter Participation.
Authors: Bharath Muppasani, Vishal Pallagani, Kausik Lakkaraju, Shuge Lei, Biplav Srivastava, Brett Robertson, Andrea Hickerson, Vignesh Narayanan
Summary:
We created chatbots to help increase voting among seniors and first-time voters by giving them easy access to trusted election information tailored to their needs. Our system, built on the Rasa platform, ensures the information is reliable and allows for quick chatbot setup for any region. We've tested these chatbots in two US states where voting has been difficult, focusing on groups of senior citizens. This project aims to support voters and democracy by making accurate election information more accessible.
Publication Type: Published Article, Workshop
Venue: AI Magazine; AAAI Workshop on AI for Credible Elections (AI4CE)
Paper |Bibtex

2022

Why is my System Biased?: Rating of AI Systems through a Causal Lens
Authors: Kausik Lakkaraju
Summary:
This is a student paper which formulates my PhD dissertation problem and gives an overview of the solution. Idea is to evaluate / rate AI systems for bias using causal analysis.
Publication Type: Doctoral Consortium
Venue: Fifth AAAI/ACM Conference on AI, Ethics, and Society (AIES 2022)
Paper |Bibtex

ALLURE: A Multi-Modal Guided Environment for Helping Children Learn to Solve a Rubik’s Cube with Automatic Solving and Interactive Explanations
Authors: Kausik Lakkaraju, Thahimum Hassan, Vedant Khandelwal, Prathamjeet Singh, Cassidy Bradley, Ronak Shah, Forest Agostinelli, Biplav Srivastava, Dezhi Wu
Summary:
ALLURE is a Deep Reinforcement Learning based, multi-modal, explainable chatbot which teaches children how to solve a Rubik’s Cube and allows the children to interact with the multi-modal chatbot while trying to solve the Cube.
Publication Type: Demonstration
Venue: 36th AAAI Conference on Artificial Intelligence (AAAI 2022)
Paper |Bibtex |Video

Data-Based Insights for the Masses: Scaling Natural Language Querying to Middleware Data
Authors: Lakkaraju Kausik, Palaiya Vinamra, Paladi Sai Teja, Appajigowda Chinmayi, Srivastava Biplav, Johri Lokesh
Summary:
This is a demonstration paper which talks about a RASA-based chatbot that allows users to control their network usage and bandwith using smart routers in a household or office setting. We also demonstrated another chatbot in the same paper which helps users in monitoring the power usage in a house, office or university setting using smart sensors. These were deployed on Alexa device and Web for demonstration.
Publication Type: Demonstration
Venue: 27th International Conference on Database Systems for Advanced Applications (DASFAA 2022)
Paper |Bibtex |Video

A Rich Recipe Representation as Plan to Support Expressive Multi-Modal Queries on Recipe Content and Preparation Process.
Authors: Vishal Pallagani, Priyadharsini Ramamurthy, Vedant Khandelwal, Revathy Venkataramanan, Kausik Lakkaraju, Sathyanarayanan N Aakur, Biplav Srivastava
Summary:
In this paper, we discussed the construction of machine-understandable rich recipe representation (R3), in the form of plans, from the recipes available in natural language. R3 is infused with additional knowledge like allergens and possible failures at each cooking step.
Publication Type: Workshop
Venue: ICAPS Workshop on Knowledge Engineering for Planning and Scheduling (KEPS 2022)
Paper |Bibtex |Video

Explainable Pathfinding for Inscrutable Planners with Inductive Logic Programming
Authors: Rojina Panta, Forest Agostinelli, Vedant Khandelwal, Biplav Srivastava, Bharath Chandra Muppasani, Kausik Lakkaraju, Dezhi Wu
Summary:
By combining inductive logic programming (ILP) with a given inscrutable planner, we constructed an explainable graph representing solutions to all states in the state space. This graph can then be summarized using a variety of methods such as hierarchical representations or simple if/else rules. We tested our approach on Towers of Hanoi.
Publication Type: Workshop
Venue: ICAPS Workshop on Explainable AI Planning (XAIP 2022)
Paper |Bibtex |Video

ROSE: Tool and Data ResOurces to Explore the Instability of SEntiment Analysis Systems
Authors: Gaurav Mundada, Kausik Lakkaraju, Biplav Srivastava
Summary:
ROSE is a tool that helps examine gender bias in Sentiment Analysis Systems (SASs), which score text for sentiment and emotion. It offers a dataset of text inputs with their sentiment scores and a visualization tool for analyzing SAS behavior towards gender. Developed with d3.js, ROSE is freely accessible for public use.
Publication Type: Companion Resource for 'Trust and ethical considerations in a multi-modal, explainable AI-driven chatbot tutoring system: The case of collaboratively solving Rubik's Cube' (accepted at ICML TEACH 2023 Workshop)
Paper |BibTex |Tool