Sambhav Shrestha

// who am i

About

I'm a Machine Learning Researcher and Software Engineer pursuing an M.S. in Computer Science at Stony Brook University. My work lives at the intersection of AI infrastructure, distributed systems, and applied ML research.

Most recently at Meta, I built Sherlock, a production Agentic AI system that uses LLaMA fine-tuned on Meta's internal Workplace forum data to autonomously resolve ML infrastructure queries from engineers and research scientists. Before that, I shipped microservices at Amazon powering Just Walk Out retail.

Outside the terminal: hiking, snowboarding, and researching things.

ML / AI

PyTorch · Agentic AI · LLMs · VLMs · CUDA · RAG · Fine-tuning

Languages

Python · Java · TypeScript · C/C++ · R · Rust · Verilog

Infra

AWS · GCP · Docker · Kubernetes · PostgreSQL · CI/CD


// where i've worked

Experience

2025

HCL Technologies · embedded at Meta

ML Infrastructure Engineer

New York, NY · Mar 2024 – Jul 2025 · 16 months

Architected and deployed Sherlock, a production Agentic AI system that pairs LLaMA fine-tuned on Meta's internal Workplace forum data with RAG to autonomously resolve ML infrastructure queries, reducing escalation volume and accelerating onboarding
Engineered monitoring systems and a diagnostic "doctor" tool to track the health of production LLMs, with real-time alerting on entropy explosions, weight deviations, and throughput degradation
Partnered with Research Scientists and ML Engineers to translate model requirements into robust infrastructure across the full experimentation and deployment lifecycle
Built automated tooling to detect and remediate errors across model packing, splitting, lowering, and transformation workflows
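The entropy-explosion alerting mentioned above can be illustrated with a minimal sketch. It assumes the monitor sees raw next-token logits and flags a batch when mean per-token entropy exceeds a multiple of a healthy baseline; `token_entropy`, `entropy_alert`, and the 2x threshold rule are hypothetical, not the actual tooling.

```python
import math
from typing import Sequence


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (nats) of softmax(logits), computed stably."""
    m = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_alert(batch_logits: Sequence[Sequence[float]],
                  baseline: float, factor: float = 2.0) -> bool:
    """Flag an 'entropy explosion' when the batch's mean per-token entropy
    exceeds `factor` times a healthy baseline (threshold rule is assumed)."""
    mean_h = sum(token_entropy(l) for l in batch_logits) / len(batch_logits)
    return mean_h > factor * baseline
```

A model that suddenly spreads probability mass evenly across its vocabulary shows up as entropy near log(vocab size), which is what a rule like this catches.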

Agentic AI · LLaMA · RAG · PyTorch · Python · Monitoring

2024

Tarifica

Software / Data Engineer

New York, NY · Jun 2023 – Feb 2024 · 9 months

Architected and maintained 300+ web scrapers using Python and BeautifulSoup, transforming raw data from heterogeneous sources into structured PostgreSQL records
Designed scalable ETL pipelines with Flask to validate, process, and route high-fidelity data to downstream analytics and visualization systems
Drove a comprehensive refactor of legacy codebases, introducing structured logging, observability, and an automated test suite for long-term production reliability
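The validate-and-route stage of the ETL pipelines above might look like this minimal sketch. The `Record` schema, its fields, and the reject-on-failure policy are hypothetical; scraped values arrive as strings, so the sketch coerces them into typed fields before anything would reach PostgreSQL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Hypothetical schema for one scraped row
    source: str
    name: str
    price: float


def validate(raw: dict) -> Optional[Record]:
    """Coerce a scraped dict into a typed Record, or return None if invalid."""
    try:
        price = float(str(raw["price"]).replace("$", "").replace(",", "").strip())
    except (KeyError, ValueError):
        return None
    source = str(raw.get("source", "")).strip()
    name = str(raw.get("name", "")).strip()
    if not source or not name or price < 0:
        return None
    return Record(source, name, price)


def route(rows: list[dict]) -> tuple[list[Record], list[dict]]:
    """Split rows into clean records (for loading) and rejects (for review)."""
    clean, rejects = [], []
    for raw in rows:
        rec = validate(raw)
        if rec is None:
            rejects.append(raw)
        else:
            clean.append(rec)
    return clean, rejects
```

Keeping rejects instead of silently dropping them makes scraper breakage visible when a source changes its markup.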

Python · Flask · PostgreSQL · ETL · BeautifulSoup

2023

Amazon

Software Development Engineer

Seattle, WA · Jul 2022 – Mar 2023 · 9 months

Designed and operated mission-critical microservices on AWS (EC2, Lambda, CloudWatch) powering Amazon Go and Just Walk Out retail stores
Led the end-to-end migration of a high-traffic service to AWS with automated CI/CD, reducing cost, latency, and support tickets
Shipped production features across Java, Kotlin, Python, TypeScript, and Ruby in a Linux environment, consistently meeting sprint commitments

AWS · Java · Kotlin · Python · CI/CD

2021

Microsoft Research · DS3

Data Science Research Fellow

New York, NY · Jun – Jul 2021 · 2 months

Extended the Financial Times police complaints study, conducting rigorous statistical analysis of officer-victim race and gender dynamics across NYC, Chicago, and Philadelphia datasets
Developed regression and ML models in R to surface novel patterns, delivering findings through publication-quality visualizations to researchers at MSR

R · ggplot2 · tidyverse · Statistics

// what i've built

Projects

sherlock.py

internal · meta

Sherlock — Agentic AI @ Meta


"""
Production Agentic AI built at Meta. Fine-tuned LLaMA on internal Workplace forum data to create a domain-expert agent that autonomously resolves ML infra queries, combining RAG with agentic reasoning across model packing & lowering.
"""

from llama import LLaMA  # Meta internal


class SherlockAgent:
    def __init__(self, llm: LLaMA, rag: "RAGPipeline"):
        self.rag = rag  # RAG retriever (project component, defined elsewhere)
        self.llm = llm  # fine-tuned LLaMA

    def resolve(self, query: str) -> str:
        # Retrieve supporting context, then answer the query grounded in it
        ctx = self.rag.retrieve(query)
        prompt = f"Context:\n{ctx}\n\nQuestion: {query}\nAnswer:"
        return self.llm.generate(prompt)

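Sherlock delegates retrieval to `RAGPipeline`, which is not shown. As a hedged illustration of the retrieve-then-generate step, here is a dependency-free toy retriever that ranks documents by bag-of-words cosine similarity and builds a grounded prompt. `ToyRetriever` and `build_prompt` are hypothetical names; a production RAG system would typically use dense embeddings and a vector index instead.

```python
import math
import re
from collections import Counter


def _vec(text: str) -> Counter:
    # Bag-of-words term counts; a stand-in for a learned embedding
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class ToyRetriever:
    """Hypothetical retriever: returns the k documents most similar to a query."""

    def __init__(self, docs: list[str]):
        self.docs = docs
        self.vecs = [_vec(d) for d in docs]

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        q = _vec(query)
        ranked = sorted(range(len(self.docs)),
                        key=lambda i: _cosine(q, self.vecs[i]), reverse=True)
        return [self.docs[i] for i in ranked[:k]]


def build_prompt(query: str, ctx: list[str]) -> str:
    # Ground the model's answer in the retrieved passages
    context = "\n".join(f"- {c}" for c in ctx)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

For example, `ToyRetriever(docs).retrieve("why does lowering fail for my op", k=1)` favors a thread that shares the terms "lowering" and "op".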

argument_quality.py

Argument Quality Ranking


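"""
Fine-tuned RoBERTa v3 for pairwise argument quality ranking. Achieves 0.657 Spearman ρ, nearly matching GPT-5.5 (0.665) at a fraction of the cost. A margin ranking loss plus a test-time order flip removes positional bias.
"""

from torch import nn
from transformers import AutoModel, AutoTokenizer
from scipy.stats import spearmanr  # evaluation metric (lives in scipy, not sklearn)


class ArgumentRanker(nn.Module):
    spearman_rho = 0.657  # reported result

    def __init__(self, checkpoint: str, margin: float = 1.0):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(checkpoint)
        self.roberta = AutoModel.from_pretrained(checkpoint)
        self.head = nn.Linear(self.roberta.config.hidden_size, 1)  # scalar quality score
        self.margin_loss = nn.MarginRankingLoss(margin=margin)

    def forward(self, a, b, y):
        # y = 1 where argument a should outrank b, -1 otherwise
        sa, sb = self.encode(a), self.encode(b)
        return self.margin_loss(sa, sb, y)

    def encode(self, texts):
        batch = self.tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        return self.head(self.roberta(**batch).pooler_output).squeeze(-1)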

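The test-time flip in the docstring can be sketched independently of the model. This assumes a pairwise scorer that returns P(first argument is better); scoring both orders and averaging cancels any constant preference for whichever argument comes first. `debiased_prob` and `rank` are hypothetical helpers, not code from the project.

```python
from typing import Callable

PairScorer = Callable[[str, str], float]  # P(first argument is better)


def debiased_prob(score: PairScorer, a: str, b: str) -> float:
    """Test-time flip: average P(a > b) with 1 - P(b > a)."""
    return 0.5 * (score(a, b) + (1.0 - score(b, a)))


def rank(score: PairScorer, items: list[str]) -> list[str]:
    """Order items by their total debiased win probability against all others."""
    totals = {x: sum(debiased_prob(score, x, y) for y in items if y != x)
              for x in items}
    return sorted(items, key=totals.__getitem__, reverse=True)
```

After flipping, `debiased_prob(s, a, b) + debiased_prob(s, b, a)` is exactly 1, so the pairwise judgments are order-consistent before they are aggregated into a ranking.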

freq_prior.py

Frequency Prior for Image Generation


"""
Explored replacing self-attention with Fourier frequency priors in autoregressive image generation. Representing tokens in frequency space captures global structure more efficiently, improving training stability, convergence speed, and output image quality.
"""

import wandb
from pytorch_lightning import Trainer


def train_with_fourier(model_cfg: dict, trainer_cfg: dict):
    wandb.init(config=model_cfg)  # a run must exist before wandb.log
    # FourierPriorNet (project model): Fourier frequency prior in place of attention
    model = FourierPriorNet(model_cfg)
    trainer = Trainer(**trainer_cfg)  # Trainer options only, e.g. max_epochs
    trainer.fit(model)
    wandb.log({"fid": model.fid})

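`FourierPriorNet` itself is not shown. As a point of reference only, the sketch below implements FNet-style token mixing, a well-known way to replace self-attention with a parameter-free Fourier transform: take the real part of a 2D DFT over the hidden and sequence dimensions. The project's frequency prior may differ; `dft` and `fourier_mix` are hypothetical names, and a real model would use an FFT on tensors rather than this naive O(n²) loop.

```python
import cmath
from typing import List

Matrix = List[List[float]]


def dft(v: List[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform."""
    n = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fourier_mix(x: Matrix) -> Matrix:
    """FNet-style mixing: Re(DFT over the sequence of DFT over hidden).
    x has shape [seq_len][hidden]; the output has the same shape."""
    rows = [dft([complex(v) for v in row]) for row in x]  # along hidden dim
    seq_len, hidden = len(rows), len(rows[0])
    cols = [dft([rows[s][h] for s in range(seq_len)])      # along sequence
            for h in range(hidden)]
    return [[cols[h][s].real for h in range(hidden)] for s in range(seq_len)]
```

Every output position depends on every input token, which is the global-structure property the docstring attributes to working in frequency space.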

stock_lstm.py

Stock Price Prediction — LSTM


"""
LSTM network for multi-step (multi-horizon) stock price forecasting, using sliding-window preprocessing and dropout regularization.
"""

import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dropout, Dense

window, n_features = 60, 1  # sliding-window length and input features (assumed)
forecast_horizon = 5        # number of future steps predicted (assumed)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(window, n_features)),
    LSTM(128, dropout=0.2, return_sequences=True),
    LSTM(64, dropout=0.2),
    Dropout(0.3),
    Dense(64, activation="relu"),
    Dense(forecast_horizon),  # one output per future step
])

model.compile(optimizer="adam", loss="mse")

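The sliding-window preprocessing the docstring mentions can be sketched as follows: each training input holds `window` past values and each target the next `horizon` values. `make_windows` is a hypothetical helper written in plain Python for clarity.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], window: int, horizon: int
                 ) -> Tuple[List[List[float]], List[List[float]]]:
    """Slide a fixed-length window over the series, pairing each input
    window with the `horizon` values that immediately follow it."""
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(list(series[start:start + window]))
        y.append(list(series[start + window:start + window + horizon]))
    return X, y
```

To feed the Keras model, `X` would be converted to an array of shape `(samples, window, 1)`. Splitting train and test chronologically before windowing avoids leaking future prices into training.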

consensus_sim.rs

Raft / Paxos Consensus Simulator


/*
 * Simulates distributed consensus with configurable node counts, network
 * partitions, and leader election. Visualizes log replication and fault
 * tolerance under failure scenarios.
 */

use raft::{Cluster, Entry};
use paxos::PhaseOne;

// `Entry` stands in for the simulator's log-entry type
pub async fn simulate(nodes: usize, entries: Vec<Entry>) {
    let mut cluster = Cluster::new(nodes);
    cluster.inject_partition(0.3_f64);
    let leader = cluster.elect().await;
    leader.replicate_log(&entries).await;
    PhaseOne::run(&cluster).await;
}

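Leader election under a partition, which the simulator exercises, comes down to Raft's majority rule. The sketch below is in Python rather than the project's Rust and models one RequestVote round; it deliberately omits the log up-to-date check and randomized election timeouts, and `Node` and `request_votes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Node:
    term: int = 0
    voted_for: Optional[int] = None


def request_votes(nodes: Dict[int, Node], candidate: int,
                  reachable: Set[int]) -> bool:
    """One Raft-style election round (log up-to-date check omitted).
    The candidate bumps its term, votes for itself, and asks every reachable
    peer; it wins only with votes from a strict majority of ALL nodes."""
    cand = nodes[candidate]
    cand.term += 1
    cand.voted_for = candidate
    votes = 1
    for peer_id in reachable - {candidate}:
        peer = nodes[peer_id]
        if cand.term > peer.term:  # newer term: peer adopts it, vote resets
            peer.term, peer.voted_for = cand.term, None
        if cand.term == peer.term and peer.voted_for is None:
            peer.voted_for = candidate
            votes += 1
    return votes > len(nodes) // 2
```

Because any two strict majorities of the same cluster share at least one node, the minority side of a partition can never elect a competing leader in the same term.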

mips_proc.v

Pipelined MIPS Processor


/*
 * 32-bit five-stage pipeline in Verilog with full hazard detection, data
 * forwarding, stalling logic, and integrated instruction and data cache support.
 */

module MIPSProcessor #(
    parameter STAGES = 5,
    parameter WIDTH  = 32
) (
    input              clk, rst,
    output [WIDTH-1:0] result
);
    // Inter-stage pipeline registers (types declared elsewhere)
    IF_ID_reg  if_id;
    ID_EX_reg  id_ex;
    EX_MEM_reg ex_mem;

    reg [WIDTH-1:0] pc;  // program counter

    // IF stage: latch the fetched instruction into IF/ID each cycle
    always @(posedge clk)
        if_id <= fetch(pc);
endmodule

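The hazard detection and forwarding described above follow a standard pattern for five-stage MIPS pipelines. Here is a behavioral model in Python (not Verilog) of that textbook logic; the project's actual signal names and priorities may differ, and `WriteBack`, `forward_select`, and `load_use_stall` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WriteBack:
    reg_write: bool  # instruction in this pipeline register writes a register
    rd: int          # its destination register number


def forward_select(src: int, ex_mem: WriteBack, mem_wb: WriteBack) -> str:
    """Choose where the EX stage reads register `src` from. The newer result
    (EX/MEM) takes priority over the older one (MEM/WB); $zero is never forwarded."""
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return "EX/MEM"
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return "MEM/WB"
    return "REGFILE"


def load_use_stall(id_ex_mem_read: bool, id_ex_rt: int,
                   if_id_rs: int, if_id_rt: int) -> bool:
    """Stall one cycle when the instruction in ID needs a value that a load
    in EX has not yet read from memory (forwarding cannot cover this case)."""
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)
```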

police_analysis.R

Police Complaints Analysis


# Extended the Financial Times police complaints study at Microsoft Research DS3. Uncovered race & gender patterns in officer complaint datasets across NYC, Chicago, and Philadelphia.

library(tidyverse)  # attaches dplyr and ggplot2

analyze <- function(city) {
  df <- load_complaints(city)  # project data loader
  df |>
    group_by(race, gender) |>
    summarise(rate = mean(sustained), .groups = "drop") |>  # sustained rate per group
    ggplot(aes(race, rate, fill = gender)) +
    geom_col(position = "dodge") +  # side-by-side bars for each gender
    theme_minimal()
}


fraud_detect.R

Credit Card Fraud Detection


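# XGBoost, Random Forest, and Logistic Regression on an imbalanced fraud dataset. Uses SMOTE oversampling plus threshold tuning to maximize recall on the minority fraud class.

library(xgboost)
library(randomForest)
library(smotefamily)  # assumed source of SMOTE(X, target)

train_model <- function(train, test, threshold = 0.35) {
  feats <- setdiff(names(train), "label")
  # Oversample the fraud class; smotefamily returns features plus a "class" column
  bal <- SMOTE(train[feats], train$label)$data
  y <- as.integer(as.character(bal$class))

  dtrain <- xgb.DMatrix(as.matrix(bal[feats]), label = y)
  xgb.fit <- xgb.train(params = list(objective = "binary:logistic"),
                       data = dtrain, nrounds = 100)  # nrounds illustrative
  rf.fit <- randomForest(x = bal[feats], y = factor(y))

  # A threshold below 0.5 trades precision for recall on fraud
  probs <- predict(xgb.fit, xgb.DMatrix(as.matrix(test[feats])))
  preds <- as.integer(probs > threshold)
  list(xgb = xgb.fit, rf = rf.fit, preds = preds)
}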

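SMOTE and threshold tuning are easy to treat as black boxes. Below is a dependency-free sketch of both in Python (not the project's R): `smote` implements only the core SMOTE step, interpolating between a minority sample and one of its k nearest minority neighbours, and `threshold_for_recall` picks the highest cut-off that still reaches a target recall. Both names are hypothetical, and the threshold should be tuned on validation scores rather than the test set.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def smote(minority: Sequence[Point], n_new: int, k: int = 5,
          seed: int = 0) -> List[Point]:
    """Core SMOTE: synthesize x + u * (neighbour - x), with u ~ U(0, 1) and
    the neighbour drawn from x's k nearest minority samples (Euclidean)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(p, x))[:k]
        nb = rng.choice(neighbours)
        u = rng.random()
        out.append([xi + u * (ni - xi) for xi, ni in zip(x, nb)])
    return out


def threshold_for_recall(scores: Sequence[float], labels: Sequence[int],
                         target: float) -> float:
    """Highest threshold whose recall on the positive (fraud) class meets
    `target` under the rule: predict fraud when score >= threshold."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1),
                       reverse=True)
    needed = max(1, math.ceil(target * len(positives)))
    return positives[needed - 1]
```

Choosing the highest qualifying threshold gives up only as much precision as the recall target requires.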
