
Saikat Chakraborty
Senior Researcher
Microsoft Research
I work on training and evaluating code LLMs using structured correctness signals from execution, program analysis, and verification.
Focus
Reliable AI for Code: post-training, evaluation, and symbolic feedback for trustworthy LLMs.
I'm a Senior Researcher in the RiSE group at Microsoft Research. I train and evaluate code LLMs using execution, program analysis, and formal verification as structured feedback signals. My work connects model generation to real-world execution by turning test results, analysis findings, and verification outcomes into supervision for fine-tuning and reinforcement learning. I have 10 years of experience training and evaluating code models, with a focus on the correctness and fidelity of generated programs.
Ph.D. in Computer Science from Columbia University (2022), advised by Baishakhi Ray. Thesis: Learning to Edit Code.
Key Projects
DeepTest
Agent-driven testing that combines symbolic analysis with LLMs. Deployed on production code across Windows and Azure and integrated into developer workflows as a Copilot plugin. On 60 real-world vulnerabilities (OpenSSL, FFmpeg, SQLite), it achieves up to 60% higher precision than baseline systems.
DeepProof + FStarDataset
Post-training for theorem proving and program verification. State-of-the-art results on FStarDataset, CoqStoq (Rocq), and Verina (Lean). Released more than a million lines of verified programs and proofs as open-source training data.
Preference Optimization for Code
Localized preference optimization with distilled reward signals for secure code generation (ACL'25). Explores fine-grained, composable preference signals aligned with specific correctness and security properties.
Selected Publications
Teaching an Old LLM Secure Coding via Localized Preference Optimization
Neural Synthesis for Proof-Oriented Programming 🏆 Distinguished Paper Award
Contrastive Learning for Code Understanding 🏆 Distinguished Paper Award
Reinforcement Learning of Code Similarity from Execution 🏆 Best Paper Award
Selected work spans ACL/EMNLP/NAACL and ICSE/FSE/ISSTA and includes multiple distinguished and best paper awards.
Skills
Post-training
SFT, RLHF/GRPO, reward modeling, reranking, retrieval-augmented fine-tuning
Supervision
Reward design via test generation, execution feedback, mutation testing, specification inference, program analysis
Systems
PyTorch, Megatron-LM, Ray, distributed GPU clusters, Kubernetes
Languages
Python, C++, Java; F* (formal verification)