Librarian Bots

Hugging Face Librarian Bots

Curating the Hugging Face Hub one PR at a time.

[Image: a Stable Diffusion-generated picture of a bookshelf]

The Hugging Face Hub is the primary place for sharing machine learning models, datasets, and demos. It currently holds over 200,000 models, 40,000 datasets, and 100,000 machine learning demos.

The Librarian Bots organization is an effort by Hugging Face's Machine Learning Librarian to use machine learning to enhance metadata and documentation for material shared on the Hub. The ultimate goal is to make it easier for people (and bots!) to find what they are looking for on the Hub. This organization shares datasets, models, and Spaces that help achieve this goal.

👾 Spaces

📚 Spaces Related to Hugging Face Papers
Spaces related to metadata
Spaces for exploring and keeping track of repositories on the Hub
  • Dataset-to-Model Monitor: track datasets hosted on the Hugging Face Hub and get a notification when new models are trained on a dataset you are tracking.
  • Base Model Explorer: a Space that allows you to find the child models of a given base model and see how popular different base models are for fine-tuning.
  • Hugging Face Datasets Semantic Search: a Space that allows you to use semantic search to find relevant datasets on the Hugging Face Hub.
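
The idea behind the Base Model Explorer can be illustrated with a minimal sketch. It assumes each model declares its parent in a `base_model` metadata field; the records, model IDs, and function names below are hypothetical and are not taken from the Space's actual implementation.

```python
from collections import Counter

# Hypothetical records; in practice these would come from model card metadata.
models = [
    {"id": "user-a/bert-finetuned-ner", "base_model": "bert-base-uncased"},
    {"id": "user-b/bert-sentiment", "base_model": "bert-base-uncased"},
    {"id": "user-c/distilbert-qa", "base_model": "distilbert-base-uncased"},
    {"id": "user-d/custom-model", "base_model": None},  # no base model declared
]


def children_by_base_model(records):
    """Map each base model to the models that declare it as their base."""
    children = {}
    for record in records:
        base = record.get("base_model")
        if base:  # skip models with no declared base model
            children.setdefault(base, []).append(record["id"])
    return children


def base_model_popularity(records):
    """Count how many models were fine-tuned from each base model."""
    return Counter(r["base_model"] for r in records if r.get("base_model"))


children = children_by_base_model(models)
popularity = base_model_popularity(models)
print(children["bert-base-uncased"])
print(popularity.most_common(1))
```

Models without a declared base model are simply skipped here, which is one reason better metadata makes this kind of exploration more complete.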

💽 Datasets

Datasets for model and dataset cards
  • Model Cards with metadata: a dataset containing model cards for models hosted on the Hugging Face Hub, with first commit information for each model. Model cards are intended to help communicate the strengths and weaknesses of machine learning models. Whilst these model cards are primarily intended to be read by a human, they are also an interesting corpus in their own right that can be used to explore models hosted on the Hub in various ways.

  • Dataset Cards With Metadata: a dataset containing dataset cards for datasets hosted on the Hugging Face Hub, with first commit information for each dataset. Dataset cards are intended to help communicate the strengths and weaknesses of machine learning datasets. Whilst these dataset cards are primarily intended to be read by a human, they are also an interesting corpus in their own right that can be used to explore datasets hosted on the Hub in various ways.
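
As a rough illustration of treating cards as a corpus, the sketch below separates a card's metadata block from its prose. It assumes the card is a Markdown file whose metadata, when present, sits at the very top between two lines containing only `---`. The `split_card` helper and the example card are hypothetical, and a real pipeline would likely parse the metadata with a YAML library rather than keep it as raw text.

```python
def split_card(text):
    """Split a card into (metadata_block, body).

    Assumes the metadata, when present, is a block at the very top of the
    file delimited by lines containing only '---'.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text  # no metadata block at the top
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            metadata = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return metadata, body
    return "", text  # opening delimiter was never closed


# Hypothetical card text used only for illustration.
example_card = """---
license: mit
base_model: bert-base-uncased
---

# My model

Fine-tuned for sentiment analysis.
"""

metadata, body = split_card(example_card)
print(metadata)
print(body)
```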


🤖 Models

  • BERTopic model card bias topic model: a BERTopic model trained on the bias sections of model cards hosted on the Hub. The goal of this model is to explore which topics are discussed in those sections. In the future, models like this could also be used to detect 'drift' in the kinds of bias discussed in model cards on the Hub.
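
Training on the bias section first requires pulling that section out of each card. The sketch below shows one way this might be done, assuming cards use Markdown `#` headings and that the relevant heading contains the word "bias". The `extract_section` helper and the sample card are hypothetical and do not describe how the model's training data was actually prepared.

```python
import re

HEADING = re.compile(r"^(#+)\s+(.*)$")


def extract_section(card_body, keyword="bias"):
    """Return the text under the first heading whose title contains `keyword`.

    Collection stops at the next heading of the same or a higher level.
    Returns None when no matching heading is found.
    """
    collected = None
    level = 0
    for line in card_body.splitlines():
        match = HEADING.match(line)
        if match:
            depth = len(match.group(1))
            if collected is None:
                if keyword in match.group(2).lower():
                    collected, level = [], depth
                continue  # the heading line itself is not collected
            if depth <= level:
                break  # the matched section has ended
        if collected is not None:
            collected.append(line)
    return None if collected is None else "\n".join(collected).strip()


# Hypothetical card body used only for illustration.
sample = """# Model card

## Intended use
Classification of reviews.

## Bias, Risks, and Limitations
The training data over-represents English text.

### Known gaps
Low-resource languages are not covered.

## Training
Details here.
"""

section = extract_section(sample)
print(section)
```

One limitation of this simple approach is that lines starting with `#` inside fenced code blocks would be mistaken for headings.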

Getting in touch

If you want to collaborate on improving metadata on the Hugging Face Hub or have ideas for other related projects, reach out to Daniel on Twitter (@vanstriendaniel) or via email (Daniel (at) our website).