Hey, I'm Ritesh, currently working as a Research Scientist at d_model. I am broadly interested in the mechanistic interpretability of large language models. While I am pretty flexible, my current focus lies in foundational interpretability research, reasoning models, and "model psychology": understanding and controlling model behavior. More generally, I aim to develop principled methods that make advanced AI systems more interpretable, reliable, and aligned with human intent.
Apart from this, I enjoy picking up math and physics concepts that catch my eye. I am always looking for opportunities to collaborate with fellow innovators :)
CGPA: 8.47
Percentage: 94.67%
Percentage: 93.00%
Bronze Medal (Solo Participant)
Achieved a Bronze Medal in the Kaggle Featured Competition on detecting personally identifiable information (PII) in student essays.
certificate link
12th Global Rank.
Our solution for biomass supply chain optimization for the state of Gujarat, using ML and MILP, was ranked 12th on the global leaderboard.
certificate link
Regional Finalists.
EYIC is a national-level competition. My team built an app called "Enabled", a community platform for persons with disabilities. Pitch link
certificate link
Second place.
My team's solution, "Rhythm", a web- and app-based integrated platform for preventive cardiovascular self-care, won 2nd prize at this hackathon.
Certificate link
Solution link
Developed an automated pipeline to generate environments eliciting evaluation awareness in LLMs. Benchmarked multiple black-box and white-box suppression techniques.
Project link
Slides
Investigated whether LLMs form internal "trustworthiness" attributes of users and whether these can bypass safety guardrails. Trained linear probes on Llama models to extract trust vectors from synthetic multi-turn conversations. Demonstrated that trust vectors are mechanistically distinct from compliance/refusal directions and successfully induce jailbreaking through a novel mechanism: making the model perceive users as trusted individuals rather than directly suppressing refusal.
Project link
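As a rough illustration of the probing setup described above, here is a minimal sketch (with synthetic stand-in activations, not real Llama hidden states; `d_model`, the shift magnitude, and the data generator are all placeholders I made up) of training a linear probe and reading off a "trust vector" from its weights:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model = 64  # hidden size (placeholder; real Llama layers are larger)

# Pretend the model encodes user trust along one fixed direction.
trust_direction = rng.normal(size=d_model)
trust_direction /= np.linalg.norm(trust_direction)

def make_activations(n, trusted):
    """Synthetic hidden states: noise plus a shift along the trust direction."""
    base = rng.normal(size=(n, d_model))
    shift = 2.0 if trusted else -2.0
    return base + shift * trust_direction

X = np.vstack([make_activations(200, True), make_activations(200, False)])
y = np.array([1] * 200 + [0] * 200)  # 1 = "trusted" user, 0 = "untrusted"

# Linear probe: logistic regression on the activations.
probe = LogisticRegression().fit(X, y)

# The probe's (normalized) weight vector approximates the trust vector.
trust_vector = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
alignment = abs(trust_vector @ trust_direction)
print(f"probe accuracy: {probe.score(X, y):.2f}, alignment: {alignment:.2f}")
```

In the real project, `X` would come from residual-stream activations at a chosen layer, and the recovered vector could then be added to activations at inference time to steer the model's perception of the user.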
Developed and applied S-KANformer (Transformers infused with Kolmogorov-Arnold Networks Using Sinusoidal Activation Functions) for high-energy physics symbolic calculations, achieving state-of-the-art performance.
Project link
Investigated belief fragility in a hybrid reasoning model by fine-tuning Qwen3-1.7B on counterfactual data and evaluating behavioral robustness. Applied mechanistic interpretability using a BatchTopKCrossCoder, identifying fine-tune-specific latents linked to the incepted counterfactuals.
Project link
This project develops an open-source chatbot by fine-tuning LLaMA 2 with RAFT and RAG, alongside a retriever fine-tuned on LLM-generated QnA data.
Project link
Detecting personally identifiable information (PII) in student writing using Longformer and DeBERTa. Bronze medal in this featured Kaggle competition.
Project link
Fine-tuned Small Language Models (SLMs) to test their efficacy against LLMs on domain-specific tasks.
Project link
Transformer models for symbolic regression, trained on a subset of the Feynman Dataset. You can find it here.
Project link
Biomass yield forecasting using AutoML and large-scale optimization using Mixed Integer Linear Programming combined with density-based clustering. Solution for the Shell.ai Hackathon 2023.
Project link
Identifying which essay was written by a large language model.
Project link
Feature-packed and fully accessible application for PwDs.
Project link
Certificate link
Certificate link
Certificate link
Certificate link
Certificate link
Certificate link
Certificate link
Certificate link