Aman Hanspal

Projects

Machine Learning

Applied ML pipelines — voice synthesis, face processing, and model deployment at scale.

Singing Voice Synthesis

DiffSinger, OpenUtau, Python, AWS

Built a DiffSinger-based singing voice synthesis pipeline for an Ogilvy × Cadbury marketing campaign that reached 14,000+ users during Christmas 2024. Handled phoneme alignment, variance modeling, and acoustic synthesis on a 3-hour custom music dataset. Deployed real-time inference workflows using AWS Lambda.

Sample outputs ↗

FaceFusion Replicate Deployment

Modal, Replicate, ffmpeg, Python

Deployed FaceFusion on Replicate for internal use cases and marketing campaigns. The pipeline accepts a JSON config with faceswap parameters per video segment, applies them frame-accurately, and stitches the final output using ffmpeg.

View on GitHub ↗
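
A rough sketch of how a per-segment config like the one described above could be parsed and turned into an ffmpeg stitch step. The schema (`segments`, `start`, `end`, `source_face`) and the helper names are hypothetical and chosen only for illustration; the project's actual config format is not shown here. The sketch assumes each segment has already been processed into its own video file, and it uses ffmpeg's concat demuxer to join them.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float       # segment start, in seconds
    end: float         # segment end, in seconds
    source_face: str   # hypothetical field: image of the face to swap in


def load_segments(config_text: str) -> list[Segment]:
    """Parse the JSON config; return segments sorted by start, rejecting overlaps."""
    raw = json.loads(config_text)["segments"]
    segments = sorted((Segment(**item) for item in raw), key=lambda s: s.start)
    for seg in segments:
        if seg.end <= seg.start:
            raise ValueError(f"empty or inverted segment: {seg}")
    for prev, cur in zip(segments, segments[1:]):
        if cur.start < prev.end:
            raise ValueError(f"overlapping segments: {prev} and {cur}")
    return segments


def concat_list(paths: list[str]) -> str:
    """Build the text of the list file read by ffmpeg's concat demuxer."""
    return "".join(f"file '{p}'\n" for p in paths)


def concat_command(list_path: str, output_path: str) -> list[str]:
    """ffmpeg arguments that join the listed files without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", output_path]
```

The returned argument list could be handed to `subprocess.run`. Stream copy (`-c copy`) only works when all segments share the same codec settings, so a real pipeline may need to re-encode instead.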

Computer Vision

VisionQuery (Open-Vocabulary Video Search)

Python, FastAPI, YOLO-World

Enabled natural language querying over videos using open-vocabulary object detection. Built frame sampling and a YOLO-World inference pipeline for dynamic class detection without retraining. Designed APIs for upload, querying, and timestamp-level detection results.

View on GitHub ↗

VisionQuery — open-vocabulary video search demo
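
To illustrate the frame sampling and timestamp-level results mentioned above, here is a minimal sketch under stated assumptions. The function names, the fixed sampling interval, and the gap-based merging rule are all hypothetical; the detector itself is left out, and its output is assumed to be a list of `(timestamp, label)` hits.

```python
from collections import defaultdict


def sample_timestamps(duration_s: float, every_s: float) -> list[float]:
    """Timestamps (seconds) at which to grab a frame, one every `every_s` seconds."""
    if every_s <= 0:
        raise ValueError("every_s must be positive")
    count = int(duration_s // every_s) + 1
    return [round(i * every_s, 3) for i in range(count)]


def merge_hits(
    hits: list[tuple[float, str]], max_gap_s: float
) -> dict[str, list[tuple[float, float]]]:
    """Collapse per-frame (timestamp, label) hits into (start, end) spans per label."""
    by_label: dict[str, list[float]] = defaultdict(list)
    for ts, label in hits:
        by_label[label].append(ts)

    intervals: dict[str, list[tuple[float, float]]] = {}
    for label, stamps in by_label.items():
        stamps.sort()
        spans = [(stamps[0], stamps[0])]
        for ts in stamps[1:]:
            start, end = spans[-1]
            if ts - end <= max_gap_s:
                spans[-1] = (start, ts)   # close enough: extend the current span
            else:
                spans.append((ts, ts))    # too far apart: start a new span
        intervals[label] = spans
    return intervals
```

Setting `max_gap_s` to the sampling interval merges detections from consecutive sampled frames into one span, which is one simple way to answer "when does this object appear?" with time ranges rather than isolated frames.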

Engineering

Retrieval pipelines, document processing, data tooling, and open-source packages.

Speech-to-Speech (JioStar)

OpenAI Realtime API, Python

Led research and development for a speech-to-speech POC for JioStar that converts English audio to Indian regional languages. Evaluated multiple pipeline approaches and wrote a technical report benchmarking latency, speed, and output accuracy. Built the final POC on OpenAI's Realtime API.

View on GitHub ↗

SIGGRAPH 2025 RAG Search

Next.js, FastAPI, Qdrant, OpenRouter, Cohere

Full-stack RAG application for searching 11,000+ SIGGRAPH 2025 paper chunks. Hybrid retrieval (semantic + BM25), Cohere re-ranking, LLM-generated answers with inline citations, and real-time streaming via SSE. Deployed frontend on Vercel, backend on Render.

View on GitHub ↗
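
As an illustration of the SSE streaming mentioned above, the sketch below frames answer tokens and a final citation list as Server-Sent Events messages. The event names (`token`, `done`), the payload keys, and the function names are assumptions made for this example, not the application's actual wire format.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Frame one JSON payload as a Server-Sent Events message."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one data: field is enough.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates each SSE message.
    return "\n".join(lines) + "\n\n"


def stream_answer(tokens: Iterable[str], citations: list[str]) -> Iterator[str]:
    """Yield one event per token, then a final event carrying the citations."""
    for token in tokens:
        yield sse_event({"text": token}, event="token")
    yield sse_event({"citations": citations}, event="done")
```

In FastAPI, a generator like `stream_answer(...)` can be returned through `StreamingResponse` with `media_type="text/event-stream"`, and the browser can consume it with `EventSource`.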

RAG ChatBot — Tata Motors

LangChain, LlamaIndex

Enterprise RAG system for Tata Motors that lets users query internal knowledge bases in natural language. Uses hybrid retrieval (embeddings + BM25), re-ranking, and RAGAS-based quality evaluation.

View on GitHub ↗
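
One common way to combine an embedding-based ranking with a BM25 ranking is reciprocal rank fusion (RRF). The sketch below is an assumption for illustration only: the fusion method this project actually uses is not stated, and the function name and the constant `k = 60` (a conventional default) are not taken from it.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Higher fused scores come first; ties are
    broken by document ID so the output is deterministic.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For example, `reciprocal_rank_fusion([dense_ids, bm25_ids])` favors documents that rank well in both lists, and the fused top results could then be passed to a re-ranker.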

Document Analysis

Python, OCR, Celery, Redis

Scalable pipeline for parsing scanned and structured PDFs. Combines multimodal OCR with layout understanding and asynchronous distributed processing via Celery and Redis.

Pinterest Scraper

Python, Playwright, PyPI

Open-source Python package for scraping Pinterest images, with search, download, HTML gallery generation, rate-limit handling, and CLI support. Published on PyPI.

View on GitHub ↗ Download Stats ↗

YouTube Analytics Tool

Python, LLMs, API

Analyzed the correlation between video performance and semantic and emotional content features. Used LLMs for metadata extraction, structuring, and feature engineering.

Diffusion Models

Fine-tuning LoRAs, building ComfyUI inference workflows, generating images across styles, and producing full AI-generated films and video content.

AI Films & Video Generation

FLUX, Midjourney, Runway, Kling AI, Hunyuan Video, ComfyUI

Brave New Art — JioHotstar (Dec 2024)
Collaborated with Lenovo to create India's first fully AI-generated short film. End-to-end production using FLUX, Midjourney, Runway, and Kling AI.

SMFG India Credit — Mother's Day AI Film (May 2025)
Produced an AI-generated short film for SMFG India Credit's Mother's Day campaign.

Image Generation

Stable Diffusion, FLUX, LoRA, Dreambooth, ComfyUI, A1111

Generated images in varied artistic styles. Fine-tuned FLUX, SDXL, and SD 3.5 LoRAs with Kohya_SS and published the models to CivitAI.

Published LoRA Models

SDXL, FLUX, Kohya_SS, ComfyUI

Fine-tuned and published LoRA models on CivitAI:

Aman Hanspal AI / ML Engineer
