Laion AI Image API

 

A simple API to get images from the Laion 5B dataset without having to download it.

For the unaware, Laion 5B is a dataset of over 5 billion image-text pairs. Its maintainers have even exposed a frontend to retrieve images from the dataset: https://knn5.laion.ai.

But sometimes you just want to get the images into your project without downloading the entire dataset. Of course, the quality of these images won't be as high as what a paid API offers, but it's a great way to get started or to develop a proof of concept.

API

Unfortunately, I couldn't find any documentation for the API, but the provided frontend, together with the existing Python Clip Client, is more than enough to reverse engineer it.

The endpoint is https://knn5.laion.ai/knn-service, and all it requires is a POST request with a JSON body containing a few attributes, most of which are customizable by the frontend itself:

{
  "text": "cat",
  "image": null,
  "image_url": null,
  "embedding_input": null,
  "modality": "image",
  "num_images": 40,
  "indice_name": "laion5B",
  "num_result_ids": 3000,
  "use_mclip": false,
  "deduplicate": true,
  "use_safety_model": true,
  "use_violence_detector": true,
  "aesthetic_score": "9",
  "aesthetic_weight": "0.5"
}
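As a rough illustration, the request above could be sent from Python using only the standard library. This is a minimal sketch: the helper names `build_query` and `search` are made up for this example, the field values simply mirror the JSON body shown above, and the response is assumed to decode as a JSON list.

```python
import json
import urllib.request

# Endpoint taken from the post; everything else here is illustrative.
KNN_ENDPOINT = "https://knn5.laion.ai/knn-service"


def build_query(text, num_images=40, num_result_ids=3000):
    """Return the JSON body for a text query, mirroring the fields shown above."""
    return {
        "text": text,
        "image": None,
        "image_url": None,
        "embedding_input": None,
        "modality": "image",
        "num_images": num_images,
        "indice_name": "laion5B",
        "num_result_ids": num_result_ids,
        "use_mclip": False,
        "deduplicate": True,
        "use_safety_model": True,
        "use_violence_detector": True,
        "aesthetic_score": "9",
        "aesthetic_weight": "0.5",
    }


def search(text, timeout=30):
    """POST the query and return the decoded JSON response (assumed to be a list)."""
    body = json.dumps(build_query(text)).encode("utf-8")
    request = urllib.request.Request(
        KNN_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `search("cat")` performs the network request, so wrap it in your own error handling, since the service may be slow or unavailable.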

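Most of this is straightforward or explained by the linked UI. The two attributes that aren't are num_images and num_result_ids: the response contains up to num_result_ids objects, but only the first num_images of them are full objects.

An object is at minimum: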

{
  "id": 1,
  "similarity": 0.5
}

while a full object also includes the image URL and caption:

{
  "id": 1,
  "similarity": 0.5,
  "url": "https://example.com/image.png",
  "caption": "Example Image"
}
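In client code it can help to separate the two shapes early. The following is a minimal sketch, assuming the response is a JSON array that may contain both minimal and full objects; `split_results` is a hypothetical helper name, not part of any LAION tooling.

```python
def split_results(results):
    """Separate full objects (those with a URL) from the IDs of minimal ones."""
    full = [item for item in results if "url" in item]
    pending_ids = [item["id"] for item in results if "url" not in item]
    return full, pending_ids


# Example input built from the two shapes shown above.
sample = [
    {"id": 1, "similarity": 0.5, "url": "https://example.com/image.png", "caption": "Example Image"},
    {"id": 2, "similarity": 0.4},
]
full, pending_ids = split_results(sample)
```

Here `full` holds the objects that can be displayed right away, and `pending_ids` holds the IDs that still need their URL and caption looked up.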

Pagination

This approach lets you fetch all the result IDs you need up front while loading full objects only for the current page. For any other page, you can make a POST request to https://knn5.laion.ai/metadata with an array of the IDs whose full objects you want:

{
  "ids": [1, 2],
  "indice_name": "laion5B"
}

As expected, this returns an array of the full objects, with the additional data under a metadata property:

[
  {
    "id": 1,
    "metadata": {
      "caption": "Example Image",
      "url": "https://example.com/image.png"
    }
  },
  {
    "id": 2,
    "metadata": {
      "caption": "Example Image",
      "url": "https://example.com/image.png"
    }
  }
]
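The paging logic can be sketched as two small pure functions: one that builds the metadata request body for a given page of IDs, and one that flattens the response into the same shape as a full search result. Both function names and the zero-based `page` parameter are assumptions made for this example.

```python
def metadata_request(ids, page, page_size, indice_name="laion5B"):
    """Build the /metadata request body for one zero-based page of IDs."""
    start = page * page_size
    return {"ids": ids[start:start + page_size], "indice_name": indice_name}


def flatten_metadata(items):
    """Lift each object's metadata fields to the top level, next to its id."""
    return [{"id": item["id"], **item.get("metadata", {})} for item in items]
```

For example, with `ids = [10, 11, 12, 13, 14]` and a page size of 2, `metadata_request(ids, 1, 2)` selects IDs 12 and 13. Note that the flattened objects carry no similarity value, because the metadata response shown above doesn't include one.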

Conclusion

Of course, the scale of this dataset means you don't always get what you want, so crafting your prompt carefully can improve the results. Additionally, the URLs don't always return a valid image, so your application will need to handle these dead URLs.
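One possible way to handle dead URLs is to probe each one before using it. The sketch below is only one approach under stated assumptions: it sends a HEAD request and treats a URL as alive when the server answers successfully with an image content type. The names `is_image_url_alive` and `keep_live` are invented for this example, and some hosts reject HEAD requests, so a real application might fall back to GET.

```python
import http.client
import urllib.request


def is_image_url_alive(url, timeout=10):
    """Return True if a HEAD request succeeds and reports an image content type."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            return content_type.startswith("image/")
    except (OSError, ValueError, http.client.HTTPException):
        # Covers HTTP errors, connection failures, timeouts and malformed URLs.
        return False


def keep_live(results, check=is_image_url_alive):
    """Keep only the result objects whose URL passes the given check."""
    return [item for item in results if "url" in item and check(item["url"])]
```

The `check` parameter is injectable so the filtering can be tested, or swapped for a cached or concurrent checker, without touching the network.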