Mishig Davaadorj

dmishig@gmail.com https://mishig25.github.io/

I’m a software & ML engineer. I received my bachelor’s degree in computer science from Colorado College in 2019. Since the summer of 2021, I’ve been part of the team building the Hugging Face Hub and contributing to Hugging Face’s open-source machine learning libraries.

Here are some of the highlights:

  • Built various aspects of the Hugging Face Hub, helping make it the de facto platform for sharing models and datasets. Used TypeScript, Svelte, Tailwind CSS, MongoDB, and Express. The features I’ve worked on are:
  • diffusers#559: JAX/Flax implementation of Stable Diffusion 1.1.

  • tokenizers#890: improved serialization/deserialization of tokenizers through Rust macros that implement the necessary serde traits.

  • huggingface/chat-ui: UI for chatting with LLMs (used Svelte, Tailwind CSS, MongoDB, TypeScript). Supports tool calling (image generation) & RAG (web search, document extraction).

  • transformers#13828: image segmentation pipeline implementation for facebook/detr-resnet-50 & other models that can do image segmentation.

  • tokenizers#976: parallelized the unigram tokenization trainer using rayon, a popular Rust parallelization crate.

  • lerobot#277: robotics dataset (videos & sensor signals) visualizer for easily testing & debugging real-life robotics.

  • huggingface/doc-builder: Python package that parses Markdown files, Jupyter notebooks, and Python docstrings (through inspect) and builds docs websites (using Svelte), powering hf.co/docs.

  • huggingface/gguf.js: JavaScript GGUF parser that works on remotely hosted files. GGUF is a weights file format created by Georgi Gerganov (the creator of llama.cpp).

  • diffuse-the-rest: a web app that uses Stable Diffusion (diffusers#559 backend) to turn sketches into higher-quality images. One of the first Stable Diffusion apps to go viral (Aug 2022).