Mishig Davaadorj

dmishig@gmail.com

Software & ML engineer specializing in large-scale ML infrastructure and developer tools. I hold a bachelor’s degree in computer science from Colorado College (2019). Since 2021, I’ve been building the Hugging Face Hub and contributing to Hugging Face’s open-source machine learning libraries.

Key contributions:

  • Architected and shipped core features for the Hugging Face Hub, helping establish it as the leading platform for sharing ML models and datasets. Built with TypeScript, Svelte, Tailwind CSS, MongoDB, and Express.

  • diffusers#559: JAX/Flax implementation of Stable Diffusion 1.1, enabling efficient inference on TPUs.

  • tokenizers#890: optimized serialization/deserialization performance for tokenizers using Rust macros and serde trait implementations.

  • huggingface/chat-ui: conversational interface for interacting with LLMs, built with Svelte, Tailwind CSS, MongoDB, and TypeScript. Features include tool calling (image generation) and RAG capabilities (web search, document extraction).

  • transformers#13828: implemented the image segmentation pipeline for facebook/detr-resnet-50 and other segmentation models, simplifying inference workflows.

  • tokenizers#976: parallelized the Unigram tokenization trainer with Rayon, significantly improving training performance.

  • lerobot#277: robotics dataset visualizer for testing and debugging real-world robotics systems, handling video and sensor signal data.

  • huggingface/doc-builder: documentation framework that parses Markdown, Jupyter notebooks, and Python docstrings (via the inspect module) to generate documentation websites with Svelte, powering hf.co/docs.

  • huggingface/gguf.js: JavaScript GGUF parser with remote file support, enabling efficient parsing of model weight files without full downloads. GGUF is the model weights format created by Georgi Gerganov (creator of llama.cpp).

  • diffuse-the-rest: viral web app that transformed sketches into high-quality images using Stable Diffusion (diffusers#559 backend). One of the earliest Stable Diffusion applications to gain widespread adoption (Aug 2022).