Laion AI Image API
A simple API to get images from the Laion 5B Dataset without having to download it.
For the unaware, Laion 5B is a dataset of over 5 billion images. There is even a public frontend for retrieving images from the dataset: https://knn5.laion.ai.
But sometimes you just want to get the images into your project without downloading the entire dataset. Of course, the quality of these images won't be as high as those from a paid API, but it's a great way to get started or to develop a proof of concept.
API
Unfortunately, the API documentation is lacking to the degree that I couldn't actually find any, but thankfully the provided frontend, together with the existing Python Clip Client, is more than enough to reverse engineer it.
The endpoint is https://knn5.laion.ai/knn-service, and all it requires is a POST request with a JSON body containing a few attributes, most of which can be set from the frontend itself:
{
"text": "cat",
"image": null,
"image_url": null,
"embedding_input": null,
"modality": "image",
"num_images": 40,
"indice_name": "laion5B",
"num_result_ids": 3000,
"use_mclip": false,
"deduplicate": true,
"use_safety_model": true,
"use_violence_detector": true,
"aesthetic_score": "9",
"aesthetic_weight": "0.5"
}
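As a rough sketch of how the request above could be sent from Python using only the standard library: the helper names (build_query, post_json, search) are my own, and the Content-Type header is an assumption based on the body being JSON rather than anything documented.

```python
import json
import urllib.request

KNN_URL = "https://knn5.laion.ai/knn-service"


def build_query(text, num_images=40, num_result_ids=3000):
    """Return the JSON body shown above with the search text and sizes filled in."""
    return {
        "text": text,
        "image": None,
        "image_url": None,
        "embedding_input": None,
        "modality": "image",
        "num_images": num_images,
        "indice_name": "laion5B",
        "num_result_ids": num_result_ids,
        "use_mclip": False,
        "deduplicate": True,
        "use_safety_model": True,
        "use_violence_detector": True,
        "aesthetic_score": "9",
        "aesthetic_weight": "0.5",
    }


def post_json(url, payload, timeout=30):
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the service expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def search(text, **kwargs):
    """Query the knn-service endpoint (performs a network request)."""
    return post_json(KNN_URL, build_query(text, **kwargs))
```

Calling search("cat", num_images=10) would then perform the actual request. Since the endpoint is undocumented, the shape of what comes back should be checked rather than assumed.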
Most of this is straightforward or explained by the linked UI. The attributes that aren't:
num_images - Desired number of full objects returned
num_result_ids - Desired number of objects returned
Now an object is at minimum:
{
"id": 1,
"similarity": 0.5
}
while a full object actually has the image URL and caption:
{
"id": 1,
"similarity": 0.5,
"url": "https://example.com/image.png",
"caption": "Example Image"
}
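To make the distinction concrete, here is a small sketch that separates the two kinds of object. It assumes the response is a list of objects shaped like the examples above, and that a full object can be recognised by the presence of a url key. The function name split_results is hypothetical.

```python
def split_results(results):
    """Separate full objects (with a URL) from minimal ones (id and similarity only)."""
    full, minimal = [], []
    for item in results:
        # Assumption: only full objects carry a "url" key.
        (full if "url" in item else minimal).append(item)
    return full, minimal


sample = [
    {"id": 1, "similarity": 0.5, "url": "https://example.com/image.png", "caption": "Example Image"},
    {"id": 2, "similarity": 0.5},
]
full, minimal = split_results(sample)
```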
Pagination
This split is what makes pagination possible: you can fetch all the result IDs you need up front, but full objects only for the images on the current page. For any later page, make a POST request to https://knn5.laion.ai/metadata with an array of the IDs you want the full objects of:
{
"ids": [1, 2],
"indice_name": "laion5B"
}
which, as expected, returns an array of the full objects, with the additional data under a metadata property:
[
{
"id": 1,
"metadata": {
"caption": "Example Image",
"url": "https://example.com/image.png"
}
},
{
"id": 2,
"metadata": {
"caption": "Example Image",
"url": "https://example.com/image.png"
}
}
]
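A minimal sketch of the pagination flow, assuming the search results are held in a list in the order they were returned. The helpers page_ids, metadata_request and flatten_metadata are hypothetical names; they only build the request body and reshape the response, so the POST itself is left to whatever HTTP client you use.

```python
def page_ids(results, page, page_size):
    """Return the IDs belonging to a 0-indexed page of the result list."""
    start = page * page_size
    return [item["id"] for item in results[start:start + page_size]]


def metadata_request(ids, indice_name="laion5B"):
    """Build the JSON body for the metadata endpoint."""
    return {"ids": ids, "indice_name": indice_name}


def flatten_metadata(entries):
    """Lift url and caption out of the nested metadata property."""
    return [{"id": entry["id"], **entry["metadata"]} for entry in entries]
```

For example, metadata_request(page_ids(results, 1, 20)) gives the body for the second page of 20 results, and flatten_metadata turns the response into objects shaped like the full objects from the search endpoint, minus the similarity.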
Conclusion
Of course, the scale of this dataset means you don't always get what you want, so crafting your prompt carefully can improve the results. Additionally, the URLs don't always return a valid image, so your application will need to handle these dead URLs.
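One possible way to deal with dead URLs is sketched below, using only the standard library. It assumes a HEAD request is enough to tell whether a URL still serves an image. Some hosts reject HEAD, so this may drop images that would load fine. The names is_live_image and drop_dead_urls are hypothetical.

```python
import urllib.request


def is_live_image(url, timeout=10):
    """Return True if the URL answers with a 2xx status and an image content type."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            return 200 <= response.status < 300 and content_type.startswith("image/")
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError and timeouts; ValueError covers malformed URLs.
        return False


def drop_dead_urls(results, check=is_live_image):
    """Keep only the results whose URL passes the check."""
    return [item for item in results if check(item["url"])]
```

The check parameter is there so the network call can be swapped out, for instance with a cached lookup or a stub in tests.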