Saikat Chakraborty

I work on training and evaluating code LLMs using structured correctness signals from execution, program analysis, and verification.

Focus

Reliable AI for Code: post-training, evaluation, and symbolic feedback for trustworthy LLMs.

I'm a Senior Researcher in the RiSE group at Microsoft Research, where I train and evaluate code LLMs using execution, program analysis, and formal verification as structured feedback signals. My work connects model generation to real-world execution, turning tests, analysis results, and verification outcomes into supervision for fine-tuning and reinforcement learning. I bring 10 years of experience training and evaluating code models, with a focus on the correctness and fidelity of generated programs.

Ph.D. in Computer Science from Columbia University (2022), advised by Baishakhi Ray. Thesis: Learning to Edit Code.

Key Projects

DeepTest

Agent-driven testing via symbolic analysis + LLMs. Deployed on production code across Windows & Azure; integrated into developer workflows as a Copilot plugin. On 60 real-world vulnerabilities (OpenSSL, FFmpeg, SQLite): up to 60% higher precision than baseline systems.

DeepProof + FStarDataset

Post-training for theorem proving and program verification. State-of-the-art results on FStarDataset, CoqStoq (Rocq), and Verina (Lean). Published over a million lines of verified programs and proofs as open-source training data.

Preference Optimization for Code

Localized preference optimization with distilled reward signals for secure code generation. ACL'25. Explores fine-grained, composable preference signals aligned with specific correctness and security properties.

Selected Publications

ACL'25

Teaching an Old LLM Secure Coding via Localized Preference Optimization

ICSE'25

Neural Synthesis for Proof-Oriented Programming

🏆 Distinguished Paper Award

ISSTA'23

Contrastive Learning for Code Understanding

🏆 Distinguished Paper Award

SCAM'24

Reinforcement Learning of Code Similarity from Execution

🏆 Best Paper Award

Selected work appears at ACL/EMNLP/NAACL and ICSE/FSE/ISSTA, and has received multiple distinguished and best paper awards.


Skills

Post-training

SFT, RLHF/GRPO, reward modeling, reranking, retrieval-augmented fine-tuning

Supervision

Reward design via test generation, execution feedback, mutation testing, specification inference, program analysis

Systems

PyTorch, Megatron-LM, Ray, distributed GPU clusters, Kubernetes

Languages

Python, C++, Java; F* (formal verification)