Aman Hanspal
AI / ML Engineer

I build end-to-end AI systems across diffusion models, speech, video, retrieval-augmented generation, and computer vision — from research and fine-tuning to production deployment.

Experience

Aeos Labs
Bengaluru · May 2024 – Jan 2026
Generative AI Researcher
  • Built generative AI pipelines across multimodal domains — image, video, speech, and automation systems.
  • Developed PoCs across RAG, speech synthesis, document intelligence, and enterprise AI workflows.
  • Translated business requirements into deployable, production-oriented AI systems.
Python · FastAPI · LangChain · n8n · AWS · Replicate

Projects

Machine Learning

Applied ML pipelines — voice synthesis, face processing, and model deployment at scale.

Singing Voice Synthesis
DiffSinger, OpenUtau, Python, AWS

Built a DiffSinger-based singing voice synthesis pipeline for an Ogilvy × Cadbury marketing campaign that reached 14,000+ users during Christmas 2024. Handled phoneme alignment, variance modeling, and acoustic synthesis on a 3-hour custom music dataset. Deployed real-time inference workflows using AWS Lambda.

Sample outputs ↗

FaceFusion Replicate Deployment
Modal, Replicate, ffmpeg, Python

Deployed FaceFusion on Replicate for internal use cases and marketing campaigns. The pipeline accepts a JSON config with faceswap parameters per video segment, applies them frame-accurately, and stitches the final output using ffmpeg.
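
As a rough illustration of the per-segment config, the sketch below maps each segment's time span to a frame range at a given frame rate. The schema (`segments`, `start`, `end`, `params`) and the function name are hypothetical and are not taken from the actual deployment.

```python
import json


def segments_to_frame_ranges(config_json: str, fps: float) -> list[dict]:
    """Map each configured segment to a half-open frame range [first_frame, end_frame)."""
    config = json.loads(config_json)
    ranges = []
    previous_end = 0
    # Process segments in time order so overlaps can be detected in one pass.
    for seg in sorted(config["segments"], key=lambda s: s["start"]):
        first = round(seg["start"] * fps)
        end = round(seg["end"] * fps)
        if first < previous_end or end <= first:
            raise ValueError(f"overlapping or empty segment: {seg}")
        ranges.append(
            {"first_frame": first, "end_frame": end, "params": seg.get("params", {})}
        )
        previous_end = end
    return ranges


# Hypothetical config: two back-to-back segments with different settings.
example_config = json.dumps(
    {
        "segments": [
            {"start": 0, "end": 2, "params": {"source": "face_a.png"}},
            {"start": 2, "end": 3.5, "params": {"source": "face_b.png"}},
        ]
    }
)
frame_ranges = segments_to_frame_ranges(example_config, fps=30.0)
```

Working in frame indices rather than seconds is one way to keep segment boundaries frame-accurate before the processed segments are concatenated.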

View on GitHub ↗

Computer Vision

VisionQuery (Open-Vocabulary Video Search)
Python, FastAPI, YOLO-World

Enabled natural language querying over videos using open-vocabulary object detection. Built frame sampling and a YOLO-World inference pipeline for dynamic class detection without retraining. Designed APIs for upload, querying, and timestamp-level detection results.
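
A minimal sketch of fixed-interval frame sampling with timestamps, assuming the video's frame count and frame rate are already known. The function name and the one-sample-per-interval policy are assumptions for illustration; the project's actual sampling strategy is not described here.

```python
def sample_frame_indices(
    total_frames: int, fps: float, every_seconds: float = 1.0
) -> list[tuple[int, float]]:
    """Return (frame_index, timestamp_seconds) pairs, one per sampling interval."""
    if total_frames < 0 or fps <= 0 or every_seconds <= 0:
        raise ValueError("total_frames must be >= 0; fps and every_seconds must be > 0")
    # Never step by less than one frame, even for very short intervals.
    step = max(1, round(fps * every_seconds))
    return [(index, index / fps) for index in range(0, total_frames, step)]


# A 4-second clip at 25 fps, sampled once per second.
samples = sample_frame_indices(total_frames=100, fps=25.0, every_seconds=1.0)
```

Keeping the timestamp next to each sampled index makes it straightforward to report timestamp-level results once a detector has run on the sampled frames.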

View on GitHub ↗

Engineering

Retrieval pipelines, document processing, data tooling, and open-source packages.

Speech-to-Speech (JioStar)
OpenAI Realtime API, Python

Led research and development of a speech-to-speech PoC for JioStar that converts English audio to Indian regional languages. Evaluated multiple pipeline approaches and wrote a technical report benchmarking their latency, speed, and output accuracy. The final PoC was built on OpenAI's Realtime API.

View on GitHub ↗

SIGGRAPH 2025 RAG Search
Next.js, FastAPI, Qdrant, OpenRouter, Cohere

Full-stack RAG application for searching 11,000+ SIGGRAPH 2025 paper chunks. Hybrid retrieval (semantic + BM25), Cohere re-ranking, LLM-generated answers with inline citations, and real-time streaming via SSE. Deployed frontend on Vercel, backend on Render.
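
One common way to merge a semantic ranking with a BM25 ranking is reciprocal rank fusion (RRF). The sketch below is illustrative only: the project description does not say which fusion method was used, and the function name and document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); k dampens the top ranks.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic_hits = ["a", "b", "c"]  # hypothetical vector-search ranking
bm25_hits = ["b", "d", "a"]      # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([semantic_hits, bm25_hits])
# fused == ["b", "a", "d", "c"]
```

RRF uses only rank positions, so it avoids having to normalize embedding similarities and BM25 scores onto a shared scale. The fused list can then be passed to a re-ranker.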

View on GitHub ↗

RAG ChatBot — Tata Motors
LangChain, LlamaIndex

Enterprise RAG system for Tata Motors that lets users query internal knowledge bases in natural language. Hybrid retrieval (embeddings + BM25), re-ranking, and RAGAS-based quality evaluation.

View on GitHub ↗

Document Analysis
Python, OCR, Celery, Redis

Scalable pipeline for parsing scanned and structured PDFs. Multimodal OCR with layout understanding, and asynchronous distributed processing via Celery and Redis.

Pinterest Scraper
Python, Playwright, PyPI

Open-source Python package for scraping Pinterest images — search, download, HTML gallery generation, rate-limit handling, and CLI support. Published on PyPI.

View on GitHub ↗ Download Stats ↗

YouTube Analytics Tool
Python, LLMs, API

Analyzed the correlation between video performance and semantic and emotional content features. Used LLMs for metadata extraction, structuring, and feature engineering.

Diffusion Models

Fine-tuning LoRAs, building ComfyUI inference workflows, generating images across styles, and producing full AI-generated films and video content.

AI Films & Video Generation
FLUX, Midjourney, Runway, Kling AI, Hunyuan Video, ComfyUI

Brave New Art — JioHotstar (Dec 2024)
Collaborated with Lenovo to create India's first fully AI-generated short film. End-to-end production using FLUX, Midjourney, Runway, and Kling AI.

SMFG India Credit — Mother's Day AI Film (May 2025)
Produced an AI-generated short film for SMFG India Credit's Mother's Day campaign.

Image Generation
Stable Diffusion, FLUX, LoRA, DreamBooth, ComfyUI, A1111

Generated images in varied artistic styles. Fine-tuned FLUX, SDXL, and SD 3.5 LoRAs with Kohya_SS and published the models to CivitAI.

Published LoRA Models
SDXL, FLUX, Kohya_SS, ComfyUI

Fine-tuned LoRA models and published them on CivitAI.

Hackathons

Note It — VideoDB Online Hackathon
Electron, React, FastAPI, VideoDB, Notion

Built note·it, a desktop learning agent for educational videos at the VideoDB Global Online Hackathon (May 2026). The app runs as a floating Electron overlay, captures screen and audio through VideoDB, indexes live streams with multimodal models, and writes structured study notes with screenshots into Notion.

Watch demo ↗ View on GitHub ↗

Skills

Domains
Generative AI, Applied ML, Machine Learning
Programming
Python, FastAPI
Frameworks
LangChain, LangGraph, LlamaIndex
Tools
Docker, Git, n8n, ComfyUI
Cloud
AWS, Replicate

Reading List

Research papers I'm reading, breaking down, and implementing.

View on GitHub ↗

Education

DY Patil University, RAIT
Mumbai · Class of 2024
B.Tech in Computer Engineering · CGPA 9.22

Extras

Contact