Sitemap

How Devices Find Each Other Without IPs: The mDNS + AWDL Story

15 minute read

June 19, 2026

A practical walkthrough of mDNS, Zeroconf, and Apple’s AWDL — how AirDrop finds your phone, how smoltorrent discovers Raspberry Pis with no hardcoded IPs, and how a Swift sidecar bridges Python to Apple’s peer-to-peer WiFi stack. Read more

Bonsai LLM Benchmark: Jetson Orin Nano Super 8GB

35 minute read

June 08, 2026

5 Bonsai-family 1–1.58bit LLMs benchmarked across 4 power modes on Jetson Orin Nano Super 8GB. 25W sweet spot: 47–48% more tok/s than 15W, best output tok/J for all sub-4B models. Read more

Clustering 3 Jetson Orin Nano Super

14 minute read

June 05, 2026

Build a 3-node Jetson Orin Nano Super 8GB cluster with active cooling. Real numbers: ~759 Mbps per link (gigabit), peak 58.3°C across all 3 nodes under full 18-core sustained load, zero throttling at 1728 MHz throughout. Read more

Tiny LLM Benchmark: Jetson Orin Nano Super 8GB

28 minute read

May 29, 2026

8 tiny LLMs benchmarked across 4 power modes on Jetson Orin Nano Super 8GB with llama.cpp. 25W sweet spot: 43% more tok/s than 15W, better tok/J than MAXN. Read more

Length-Constrained Summarization with GRPO: Reward Signal Ablations on Reddit TL;DR

30 minute read

May 26, 2026

An ablation study of GRPO reward signals for 64-token Reddit TL;DR summarization across Qwen2.5-0.5B and LFM-2.5-350M on Apple Silicon. Read more

smoltorrent: Distributing ML Checkpoints Across a Pi Cluster

24 minute read

May 19, 2026

A 942 MB checkpoint. Four Raspberry Pis. ~1.5 min gather. No single point of failure. A deep dive into smoltorrent - a distributed checkpoint sharding system built over raw TCP with replication, SHA-256 integrity verification, mDNS discovery, and Prometheus monitoring. Read more

Clustering 4 Raspberry Pis 4B

14 minute read

May 10, 2026

Build a 4-node Raspberry Pi 4B cluster with UCTRONICS enclosure, PoE+ hats, and TP-Link LS110P PoE switch. Real numbers: 94.4 Mbps per link (100 Mbps switch ceiling), 62.3°C under full 16-core load, zero throttling at 1800 MHz throughout. Read more

Mac Minis Thunderbolt Cluster Setup Guide

9 minute read

May 09, 2026

Wire Mac minis into a high-bandwidth local Thunderbolt cluster for distributed training and inference with zero cloud egress cost, low latency, and direct control over cluster networking. Read more

QnA Irrigation Diseases Dataset

This dataset contains a comprehensive Question & Answer collection focused on water management technologies, irrigation systems, and related agricultural practices for sustainable farming. The dataset is derived from technical documentation and research publications related to water management in ag Read more

QnA Plant Diseases Dataset

This dataset contains a comprehensive Question & Answer collection focused on plant diseases, their management, treatment protocols, and diagnostic techniques. The dataset provides detailed information about various plant pathologies, fungicide applications, and disease identification methods for ag Read more

QnA Soil Diseases Dataset

This dataset contains a comprehensive Question & Answer collection focused on soil management, soil health, organic farming practices, and soil-related agricultural techniques. The dataset is derived from technical guides and documentation related to sustainable soil management practices, with empha Read more

ViT

June 20, 2024

ViT-B/16 from scratch on a 3-class Food-101 subset. Train loss 1.20 / test loss 1.52. Read more

GPT

February 08, 2025

Decoder-only transformer trained on TinyShakespeare, replicating the original OpenAI GPT architecture from scratch. Read more

BERT

February 09, 2025

Bidirectional encoder pre-trained with masked language modelling on the Cornell Movie Dialogs corpus. Read more

CycleGANs

February 09, 2025

Cycle-consistent unpaired image translation on Cityscapes — two generators, two discriminators, cycle + identity losses. Read more

Differential Transformer

February 09, 2025

Differential attention replicated from scratch — two attention maps subtracted to cancel noise. Trained on TinyShakespeare on A100. Read more

Encoder-Decoder

March 01, 2025

LSTM-based Seq2Seq encoder-decoder for German→English translation. Train/val loss ~1.38 in 10 epochs. Read more

Fine Tuning using PEFT

March 01, 2025

QLoRA fine-tuning scripts using PEFT + BitsAndBytes for both decoder and encoder-type models. Read more

GRU

March 05, 2025

GRU from scratch. 16 hidden units, 50 epochs. Train loss 0.51 / val loss 0.48. Read more

Attention Mechanisms

March 07, 2025

From-scratch implementations of Bahdanau and Luong attention in PyTorch. Read more

RNNs

March 07, 2025

Vanilla RNN from scratch. 16 neurons, 50 epochs. Train loss 0.51 / val loss 0.50. Read more

Transformer

March 10, 2025

Encoder-decoder transformer for English→Hindi translation on Samanantar (~25M params). Published on HuggingFace. Read more

Mixtral

March 20, 2025

Sparse MoE transformer replicated from scratch on TinyShakespeare. Train loss 2.04 / val loss 2.09 in 1,000 steps on T4. Read more

DPO

April 04, 2025

Direct Preference Optimization applied to Qwen0.5B-Instruct on UltraFeedback. Train loss 0.67 in 3,000 iterations. Read more

SimplePO

April 04, 2025

Reference-free preference optimization (SimplePO) on OPT-330M. Batch size 128, lr=2e-5, beta=2 on UltraFeedback. Read more

LoRA

April 05, 2025

Low-rank adaptation implemented from scratch in PyTorch. Train/val loss ~3.5 in 1,000 steps on A100. Read more

ORPO

April 10, 2025

Odds Ratio Preference Optimization on OPT-330M. Reference-free alignment reaching train loss 1.70 in 3,000 iterations. Read more

Gemma

April 20, 2025

Google’s Gemma architecture replicated from scratch — multi-query attention and GeGLU activations on TinyShakespeare. Read more

Llama

April 20, 2025

Decoder-only Llama replicated from scratch with RoPE, SwiGLU, RMSNorm and GQA. Read more

CLiP

April 25, 2025

Contrastive vision-language model trained on Flickr8K. Train loss 1.3 / val loss 2.2 in 30 epochs on T4. Read more

DDP

April 25, 2025

Llama trained with PyTorch DistributedDataParallel (torchrun). Val loss 1.1 in 8,000 iterations on TinyShakespeare. Read more

Llava

April 25, 2025

Visual instruction tuning replicated from scratch on Flickr8K. Train loss 0.23 / val loss 0.22 in 5 epochs on T4. Read more

Seq2Seq

April 25, 2025

GRU-based Seq2Seq with both Bahdanau and Luong attention from scratch. 128 hidden units, 50 epochs. Read more

Whisper

April 25, 2025

Whisper ASR from scratch — CNN on 80-channel mel spectrograms + 6-layer transformer decoder. Trained on GigaSpeech. Read more

LSTM

April 25, 2025

LSTM from scratch (~128K params). 128 hidden units, 50 epochs. Train loss 0.49 / val loss 0.48. Read more

Gemma3

May 01, 2025

90M-parameter Gemma 3 with local sliding-window attention (128-token blocks). Val loss 1.77 in 25k steps on TinyStories. Read more

Llama4

May 01, 2025

1.2B-parameter MoE (32×12M experts, top-1 routing) trained on TinyStories. Val loss 1.70 in 20k steps on Kaggle P100. Read more

Moonshine

May 01, 2025

Compact transformer ASR (288-dim, 6 heads) trained on GigaSpeech for 1,500 steps. Notes on overfitting at ~25 hours. Read more

PaliGemma

May 01, 2025

Google’s PaliGemma VLM (SigLIP + Gemma) replicated from scratch on Flickr8K. Read more

Pix2Pix

May 01, 2025

Conditional GAN for paired image-to-image translation (aerial→map) replicated from scratch. PatchGAN discriminator. Read more

SigLip

May 01, 2025

Sigmoid-loss vision-language pretraining replicated from scratch on Flickr8K — avoids global softmax normalisation. Read more

TTS

May 01, 2025

Tacotron-style transformer TTS from scratch — 512-dim phoneme encoder, mel spectrogram decoder, 16kHz on GigaSpeech. Read more

VAE

May 01, 2025

VAE on CelebA (128×128). 4-layer conv encoder, 32D latent, ConvTranspose decoder. Reconstruction + KL loss over 200 epochs. Read more

WGANs

May 01, 2025

Wasserstein GAN and WGAN-GP implemented from scratch on MNIST — gradient penalty for stable training. Read more

Kimi-K2

August 01, 2025

DeepSeekV3-inspired MoE with latent attention trained with Muon optimizer. Pre-trained weights on HuggingFace. Read more

CGANs

August 06, 2025

Conditional GAN on MNIST — class-conditioned 64×64 digit generation. 30 epochs, BCE loss, TensorBoard logging. Read more

CLAP

August 06, 2025

Contrastive Language-Audio Pretraining from scratch on GigaSpeech. 768D text / 2048D audio → 1024D shared space. Read more

DCGANs

August 06, 2025

Deep Convolutional GAN trained on CelebA and CIFAR-10. ~7,800 steps (CelebA) and ~11,700 steps (CIFAR-10). Read more

DeepSeekV3

August 06, 2025

16×4 MoE with Multi-head Latent Attention and auxiliary-free load balancing, trained on TinyStories on Kaggle P100. Read more

Movies Review System Spoiler-Free Sentiment-Analysis based Movies Review System

September 01, 2023

Introducing the Movie Review System, where AI meets movie magic to revolutionize how viewers experience films. This project's goal is to provide an interface for spoiler-free reviews and sentiment analysis, enhancing the viewing journey. With advanced models like Voting Classifier and Bi-LSTMs powered by Keras and TensorFlow, we achieve impressive metrics: 91% accuracy, 91% precision,... Read more

MoviesMania (Geek-o-thon) A Reverse Search based Movies Recommendation System

October 03, 2023

Step into the future of entertainment discovery with MoviesMania. The product aims to simplify your search for the perfect movie or web series. Using various AI/ML techniques and elements, we analyze uploaded video clips to predict movie titles and recommend similar content with impressive accuracy. Experience flavoured recommendations tailored to your tastes, powered by... Read more

PlogPayouts AI-driven Plogging System

December 01, 2023

Transform your daily jog into a mission for a cleaner world with PlogPayouts. Our innovative website + app rewards you for collecting litter, promoting fitness and environmental cleanliness. Utilizing AI for trash categorization and optimized routes, and fostering community through shared stories, PlogPayouts turns every step into a step towards a greener, more inclusive society.... Read more

Insight-Ed (HackNITR 5.0) EdTech Platform for Student and Teacher

March 03, 2024

Imagine an online classroom where teachers instantly know when and why students lose focus. Our AI-powered solution bridges the knowledge gap by detecting student emotions and attentiveness, highlighting problem areas, and making the teacher aware of each student’s progress. With features like reverse video search, dynamic questionnaires, and advanced Q&A bots, we transform the learning experience,... Read more

FarmGenie (GeoHack 2024) Empowering farmers with real-time insights and expert guidance via AI-driven space

July 19, 2024

Our platform utilizes LLMs and a Mixture of Experts (MoE) approach to provide precise guidance on soil management, plant disease identification, and irrigation techniques. Built as a scalable web application with a Next.js frontend and backend, and supported by a Redis queue and multiple worker nodes, FarmGenie ensures robust performance. The system’s multilingual support, interactive... Read more

NeatRL Playground AI Games Showcase powered by Reinforcement Learning

January 01, 2025

Beautiful, interactive website showcasing AI-powered games with reinforcement learning agents. Features Pong AI with Deep Q-Learning, real-time WebSocket communication, and smooth animations. Deployed on Vercel (frontend) and Render (game server) with production-ready health checks and headless mode. Read more

Paper Replications ML/DL Research Paper Implementations

January 01, 2025

A comprehensive collection of code implementations replicating results from influential machine learning and deep learning research papers. Features 30+ models including Transformers, GANs, Vision models, and RLHF techniques. Read more

NeatRL Deep Reinforcement Learning Algorithms Library

February 01, 2025

Comprehensive implementations of deep RL algorithms including DQN, A2C, PPO, DDPG, TD3, and SAC. Features one-file implementations, experiment tracking with W&B, automatic video recording, and support for Gymnasium environments. Main NeatRL library provides high-quality training utilities with focus on simplicity and performance. Read more

SmolCluster Distributed Deep Learning Library for Heterogeneous Hardware

January 01, 2026

Educational library for training and inference of neural networks across heterogeneous compute such as Mac minis, Raspberry Pis, and GPUs, written using only Python's socket library. Supports FSDP, Classic Data Parallelism, Elastic DP, Synchronous Parameter Server, and Model Parallelism. Read more

Yuvraj Singh
