Reza Khan Mohammadi

Computer Science Ph.D. Candidate | AI Researcher

I work on making AI systems honest about their own certainty, so when they're confident, you can be too.

My research develops confidence calibration methods for language models (LLMs, LRMs, and VLMs), with a focus on high-stakes domains such as oncology and finance, where a model's overconfidence can cause serious harm.

Contact

Google Scholar

Email

khanreza@msu.edu

GitHub

LinkedIn

Download Resume

Education

Aug 2022 - Present

Michigan State University

Ph.D.

Computer Science

Aug 2022 - Apr 2024

Michigan State University

M.Sc.

Computer Science

Computer Engineering

Sep 2017 - Aug 2021

University of Guilan

B.Sc.

Computer Engineering

Leadership

Rasht School of AI Leader

University of Guilan

Dec 2020 - Aug 2022

Brain and Cognition Association AI Head

University of Guilan

Oct 2020 - Oct 2021

CE Scientific Association Head of Research Affairs

University of Guilan

Oct 2020 - Oct 2021

Awards

NSF-EMBS-Google Young Professional NextGen Scholar

IEEE BHI 2025, Atlanta, USA

Recognized for outstanding accomplishments in biomedical AI and health informatics.

Sigma Xi Full Member

Sigma Xi Scientific Research Honor Society

Full membership by invitation only, recognizing excellence in scientific research

View Certificate

Selected Speaker

3rd Henry Ford + MSU Cancer Research Symposium (2023)

Poster Award Winner at "Cancer Control & Prevention"

View Announcement

Selected Tutorial

4th ACM International Conference on AI in Finance (2023)

Large Language Models for NLP in Finance

View Slides

FISU Ambassador

International University Sports Federation (2019)

View Certificate

Marathoner

Long Beach Marathon (2024)

View Results

Teaching Assistantship

Deep Learning
University of Guilan
Head TA - Spring 2022
Instructor: Dr. S. A. Mirroshandel
Natural Language Processing
University of Guilan
Head TA - Spring 2021
Instructor: Dr. Y. Boreshban
Computational Intelligence
University of Guilan
Head TA - Spring 2020 / Fall 2020 / Spring 2021
Instructor: A. Turani
Language Theories and Automata
University of Guilan
TA - Spring 2020
Instructor: Dr. S. M. Shekarian
Data Structures and Algorithm Design
University of Guilan
Head TA - Fall 2019
Instructor: Dr. S. A. Mirroshandel
Advanced Programming
University of Guilan
Head TA - Spring 2019 / Fall 2019
Instructor: Dr. S. A. Mirroshandel
Basics of Computer and Programming
University of Guilan
TA - Fall 2018
Instructor: Dr. S. A. Mirroshandel
Specialized English for Engineering Students
University of Guilan
TA - Spring 2018
Instructor: Dr. M. Shakeri

Certifications

Reinforcement Learning Specialization
Coursera - 2019
Instructors: Martha White and Adam White
View Certificate
Oral Presentation
International Conference on Web Research (ICWR) - 2021
View Certificate
Artificial Intelligence School
Institute for Research in Fundamental Sciences - 2020
View Certificate
Neuroscience and Cognitive Sciences Summer School
Sharif University of Technology - 2019
View Certificate
Machine Learning Course
Coursera - 2019
Instructor: Andrew Ng
View Certificate
Deep Learning Specialization
Coursera - 2019
Instructor: Andrew Ng
View Certificate
Introduction to Data Science in Python Course
Coursera - 2019
Instructor: Christopher Brooks
View Certificate
Basketball Coaching License
Iran Basketball Federation - 2019
View Certificate
Python Advanced Course
University of Tehran - 2018
View Certificate
Web Development Specialization
Tehran Institute of Technology - 2016
View Certificate

Language

English

Proficient

German

Intermediate

Persian

Native

Experience

Aug 2022 - Present

East Lansing, USA

AI Research Assistant

Human Augmentation and Artificial Intelligence Laboratory

Introduced CCPS, a state-of-the-art LLM confidence estimation method published at EMNLP 2025, and led the first systematic investigation of confidence estimation in large reasoning models, published at EACL 2026.
Applied LLMs to cancer care in collaboration with Henry Ford Health and Cedars-Sinai, with work on oncology QA, toxicity extraction, and toxicity grading published in IJROBP and presented at AAPM, ASTRO, and IEEE BHI.
Partnered with JPMorgan AI Research on financial AI, building a 70M+ node temporal graph for investment prediction and mapping science-to-industry funding pipelines, published in IEEE TCSS.

Sep 2018 - Jul 2022

Rasht, Iran

NLP Research Assistant

Guilan NLP Group

Developed models and datasets for Persian NLP from the ground up, addressing the challenges of a critically low-resource language setting.
Key contributions include Prose2Poem, a prose-to-poetry translation model; COPER, a semantic search engine with the accompanying PerSICK similarity dataset; and PGST, a Persian text style transfer method.

Techstack

AWS
PyTorch
Hugging Face
PyG
NumPy
Pandas
Neo4j

Presentations

Publications

How Reliable are Confidence Estimators for Large Reasoning Models? A Systematic Benchmark on High-Stakes Domains

Khanmohammadi, R., Miahi, E., Kaur, S., Brugere, I., Smiley, C. H., Thind, K., & Ghassemi, M. M.

Published at the 2026 European Chapter of the Association for Computational Linguistics (EACL'26) - doi:10.18653/v1/2026.eacl-long.78

Introduced RMCB, a large-scale benchmark of 347K+ reasoning traces across six large reasoning models and five high-stakes domains.

Evaluated 10+ architectures and uncovered a fundamental calibration-discrimination trade-off with no existing method dominating both.

Showed that structural awareness of the reasoning trace yields a 7.5% relative improvement in calibration without affecting discrimination.

In collaboration with:

JPMorgan Chase

Henry Ford Health

Calibrating LLM Confidence by Probing Perturbed Representation Stability

Khanmohammadi, R., Miahi, E., Mardikoraem, M., Kaur, S., Brugere, I., Smiley, C. H., Thind, K., & Ghassemi, M. M.

Published at the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP'25) - doi:10.18653/v1/2025.emnlp-main.530

Introduced CCPS, a confidence calibration method for LLMs using perturbation-based representation stability features.

Reduces ECE by 55% and improves Brier score by 21% without modifying model weights.

Outperforms all prior approaches on MMLU and MMLU-Pro benchmarks.

Nominated for the Outstanding Paper Award, placing in the top 0.4% of 8,174 submissions.

In collaboration with:

JPMorgan Chase

Henry Ford Health

Efficient CTCAE Grading for Post-Radiotherapy Toxicities Using Large Language Models: A Privacy-Preserving Approach Using Instruction Fine-Tuning

Khanmohammadi, R., Ghanem, A. I., Bhatnagar, A., Turfa, J., Siddiqui, S., Elshaikh, M., Bagher-Ebadian, H., Movsas, B., Chetty, I. J., Ghassemi, M. M., & Thind, K.

Published in International Journal of Radiation Oncology, Biology, Physics (2025) - doi:10.1016/j.ijrobp.2025.06.3177

In collaboration with:

Henry Ford Health

Cedars-Sinai

Hybrid student-teacher large language model refinement for cancer toxicity symptom extraction

Khanmohammadi, R., Ghanem, A. I., Verdecchia, K., Hall, R., Elshaikh, M., Movsas, B., Bagher-Ebadian, H., Luo, B., Chetty, I. J., Alhanai, T., Thind, K., & Ghassemi, M. M.

Published at the 2025 IEEE International Conference on Biomedical and Health Informatics (BHI'25) - doi:10.1109/BHI67747.2025.11269490

Applied a student-teacher framework to improve compact LLMs for symptom extraction.

Used GPT-4o as the teacher to guide compact LLMs through prompt refinement, RAG, and fine-tuning.

Achieved F1 improvements of 26% for Phi3 and 13% for Zephyr.

Reduced costs: Phi3 was 48x and Zephyr 30x cheaper than GPT-4o.

Demonstrated an efficient, cost-effective approach for using LLMs in clinical settings.

In collaboration with:

Henry Ford Health

Cedars-Sinai

NYU-AD

Bridging Scientific Research, Innovation, and Finance: A Temporal Heterogeneous Graph Dataset for Financial Investment Prediction

Khanmohammadi, R., Singh, K., Maheshwari, P., Panda, V., Kaur, S., Brugere, I., Smiley, C. H., Nourbakhsh, A., Alhanai, T., & Ghassemi, M. M. - Under review

Built a 70M+ node graph dataset linking papers, patents, and financial data (2001–2022).

Developed ML models and an advanced TGNN model, Durendal++, for investment predictions.

Durendal++ achieved top performance, with micro-F1 scores of up to 89% in 2022.

Showcased the benefits of diverse data integration in financial predictions.

In collaboration with:

JPMorgan Chase

NYU-AD

Investigating the Temporal Association of Biomedical Research on Small Business Funding: A Bibliometric and Data Analytic Approach

Khanmohammadi, R., Kaur, S., Smiley, C. H., Alhanai, T., Brugere, I., Nourbakhsh, A., & Ghassemi, M. M.

Published in IEEE Transactions on Computational Social Systems (2024) - doi:10.1109/TCSS.2024.3466010

Analyzed 10,873 biomedical topics to link scientific innovation with small business funding.

Combined bibliometric analysis with SBIR data to assess science’s industrial impact.

Measured time-lagged effects of scientific advances on industry funding (2010-2021).

Found that impactful scientific topics predict future funding (p < 0.05).

Revealed strong contextual overlap between scientific papers and industry projects.

In collaboration with:

JPMorgan Chase

NYU-AD

Iterative Prompt Refinement for Radiation Oncology Symptom Extraction Using Teacher-Student Large Language Models

Khanmohammadi, R., Ghanem, A. I., Verdecchia, K., Hall, R., Elshaikh, M., Movsas, B., Bagher-Ebadian, H., Chetty, I., Ghassemi, M. M., & Thind, K.

Published at the 2024 International Conference on the use of Computers in Radiation therapy (ICCR'24) - HAL ID: hal-04720234

Automated prompt optimization through a teacher-student model setup.

Improved model performance using zero-shot learning, avoiding additional training.

Ensured local data processing to protect sensitive clinical information.

Improved domain-specific concept extraction accuracy through iterative refinement.

In collaboration with:

Henry Ford Health

Cedars-Sinai

A Novel Localized Student-Teacher LLM for Enhanced Toxicity Extraction in Radiation Oncology

Khanmohammadi, R., Ghanem, A. I., Verdecchia, K., Hall, R., Elshaikh, M. A., Movsas, B., Bagher-Ebadian, H., Chetty, I. J., Ghassemi, M. M., & Thind, K.

Published in International Journal of Radiation Oncology, Biology, Physics (2024) - doi:10.1016/j.ijrobp.2024.07.1392

Developed a student-teacher LLM system to improve toxicity extraction in radiation oncology.

Tested on prostate cancer notes, focusing on key symptoms and treatments from 177 patients.

Achieved significant accuracy, precision, recall, and F1 score improvements in single and multi-symptom as well as single and multi-treatment notes (p < 0.05).

Demonstrated potential for local, privacy-preserving NLP in clinical environments.

In collaboration with:

Henry Ford Health

Cedars-Sinai

Integrating Natural Language Processing into Radiation Oncology: A Practical Guide to Transformer Architecture and Large Language Models

Khanmohammadi, R., Ghassemi, M. M., Verdecchia, K., Ghanem, A. I., Bing, L., Chetty, I. J., Bagher-Ebadian, H., Siddiqui, F., Elshaikh, M., Movsas, B., & Thind, K. (2023).

Published in BJR|Artificial Intelligence (2025) - doi:10.1093/bjrai/ubaf010

Introduced NLP's role in converting clinical text to structured data for radiation oncology.

Reviewed major advancements in NLP, focusing on applications in radiation oncology.

Proposed a comprehensive evaluation framework for assessing NLP models' readiness for clinical use, focusing on purpose, technical performance, bias, ethics, and quality assurance.

Identified current challenges with LLMs, including hallucinations, bias, and issues in clinical deployment.

Outlined a checklist for clinical implementation, providing practical guidance for researchers and clinicians to evaluate NLP models for safe and effective use.

In collaboration with:

Henry Ford Health

MambaNet: A Hybrid Neural Network for Predicting the NBA Playoffs

Khanmohammadi, R., Saba-Sadiya, S., Esfandiarpour, S., Alhanai, T., & Ghassemi, M. M.

Published in SN Computer Science (2024) - doi:10.1007/s42979-024-02977-0

Introduced MambaNet for NBA playoff prediction with advanced neural layers.

Leveraged Feature Imitating Networks (FINs) for improved statistical feature representation.

Outperformed baseline models, achieving AUC up to 0.82.

Demonstrated model generalizability with NBA and Iranian Super League data.

In collaboration with:

Hudl Instat

NYU-AD

The Broad Impact of Feature Imitation: Neural Enhancements Across Financial, Speech, and Physiological Domains

Khanmohammadi, R., Alhanai, T., & Ghassemi, M. M.

Under review - https://arxiv.org/abs/2309.12279

FINs with Tsallis entropy boosted performance in finance, speech, and physiology tasks.

FIN-ENN improved Bitcoin prediction accuracy by reducing RMSE and MAPE.

Enhanced speech emotion recognition by 2.65% with FIN.

Improved Chronic Neck Pain detection accuracy to 62.5%, outperforming traditional models.

In collaboration with:

NYU-AD

Fetal Biological Sex Identification using Machine and Deep Learning Algorithms on Phonocardiogram Signals

Khanmohammadi, R., Mirshafiee, M. S., Alhanai, T., & Ghassemi, M. M.

Under review - https://arxiv.org/abs/2110.06131

Developed a method to identify fetal biological sex from fetal phonocardiogram (FPCG) signals.

Achieved 91% accuracy, surpassing previous baselines by 10%.

Analyzed a dataset of 1000 FPCG samples, balanced across male and female fetuses.

Combined statistical and sound features to improve classification over individual models.

In collaboration with:

NYU-AD

COPER: a Query-Adaptable Semantics-based Search Engine for Persian COVID-19 Articles

Khanmohammadi, R., Mirshafiee, M. S., Allahyari, M. (2021)

Published at the 2021 International Conference on Web Research (ICWR'21) - doi:10.1109/ICWR51868.2021.9443151

Built COPER, a search engine with 3,500 Persian COVID-19 articles.

Used BM25, TF-IDF, and BERT/SBERT for query-adaptive re-ranking.

Developed PerSICK, the first Persian semantic textual similarity dataset with 3,000 pairs.

Fine-tuned SBERT, achieving 97% STS accuracy.

Prose2Poem: The Blessing of Transformers in Translating Prose to Persian Poetry

Khanmohammadi, R., Mirshafiee, M. S., Rezaee, Y., Mirroshandel, S. A.

Published in ACM Transactions on Asian and Low-Resource Language Information Processing (2023) - doi:10.1145/359279

Created the first Persian prose-to-poem translation model using a new low-resource NMT method.

Released a unique prose-poem and synonym-antonym dataset in Persian.

PGST: A Persian gender style transfer method

Khanmohammadi, R., Mirroshandel, S. A.

Published in Natural Language Engineering (2023) - doi:10.1017/S1351324923000426

PGST is the first Persian text style transfer method for gender-based language differences.

A benchmark compares PGST with models using word and character embeddings.

PGST is extended to English and evaluated against top models with various metrics.
