An AI-powered tool that scrapes images from a webpage and generates captions for each image using the BLIP model. This project uses Gradio to create an interactive web interface.
The Image Caption Generator is designed to help users automatically generate descriptive captions for images found on a webpage. This tool can be useful in the following contexts:
- Content Creation: Content creators, bloggers, or social media managers can use this tool to quickly generate captions for images on their websites or blogs. It saves time and adds valuable metadata to images.
- Accessibility: The captions generated can be used to provide alt text for images, helping visually impaired users understand the content on a webpage.
- Image Dataset Creation: Researchers and data scientists can leverage this tool to build datasets with image captions, which can be useful for training machine learning models in computer vision tasks.
- Web Scraping & Automation: This tool automates the process of scraping images and generating captions, which can be useful for businesses or organizations that need to collect large amounts of image data from various websites.
- Input: Paste the URL of any webpage containing images.
- Processing: The app scrapes all images from the page, processes them using the BLIP (Bootstrapping Language-Image Pre-training) model, and generates captions.
- Output: You can download the generated captions as a .csv file containing the image URLs and their respective captions.
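The scraping step described above can be sketched roughly as follows. The project itself uses BeautifulSoup; this sketch swaps in the standard library's `HTMLParser` so that it runs without extra packages. The names `ImageSrcParser` and `extract_image_urls` are hypothetical and are not taken from `app.py`, and skipping `data:` URIs is an assumption about sensible behaviour rather than something the project documents.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_image_urls(html, page_url):
    """Return absolute, de-duplicated image URLs in document order."""
    parser = ImageSrcParser()
    parser.feed(html)
    seen = set()
    urls = []
    for src in parser.sources:
        # Resolve relative and protocol-relative paths against the page URL.
        absolute = urljoin(page_url, src)
        # Keep only http(s) URLs; inline data: URIs are skipped (assumption).
        if absolute.startswith(("http://", "https://")) and absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls
```

Calling `extract_image_urls(page_html, "https://example.com/post/")` on downloaded HTML would give the list of image URLs to pass on to the captioning step.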
- Scrapes all images from the given URL.
- Generates captions for each image using an AI model (BLIP).
- Download captions in .csv format for further use.
- Interactive web interface powered by Gradio.
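For readers curious how BLIP captioning typically looks in code, here is a minimal sketch using the Hugging Face `transformers` library. It is not necessarily how `app.py` does it: the checkpoint name `Salesforce/blip-image-captioning-base`, the function names `load_blip` and `caption_image`, and the `max_new_tokens` default are all assumptions made for illustration.

```python
def load_blip(checkpoint="Salesforce/blip-image-captioning-base"):
    """Load a BLIP processor and model (checkpoint name is an assumption)."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(checkpoint)
    model = BlipForConditionalGeneration.from_pretrained(checkpoint)
    return processor, model


def caption_image(image, processor, model, max_new_tokens=50):
    """Generate one caption for a PIL image (expected in RGB mode)."""
    # The processor turns the image into model-ready tensors.
    inputs = processor(images=image, return_tensors="pt")
    # generate() returns token IDs; decode() turns them back into text.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()


# Example usage (downloads the model weights on first run):
#   from PIL import Image
#   processor, model = load_blip()
#   image = Image.open("photo.jpg").convert("RGB")
#   print(caption_image(image, processor, model))
```

Passing the processor and model in as arguments means they are loaded once and reused for every image on the page.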
- Install dependencies using
pip install -r requirements.txt
- Clone the repository:
  git clone https://github.com/bushraqurban/image-caption-generator.git
  cd image-caption-generator
- Create a virtual environment and activate it (optional but recommended):
  python3 -m venv venv
  # On macOS or Linux
  source venv/bin/activate
  # On Windows
  venv\Scripts\activate
- Install the required dependencies:
  pip install -r requirements.txt
- Run the application:
  python app.py
This will launch the app in your browser.
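As a rough idea of what a Gradio app of this shape can look like, the sketch below wires a URL text box to a downloadable file output. It is an illustration under stated assumptions, not the contents of `app.py`: `make_handler`, `build_demo`, and the `url_to_csv` callable (a function that takes a page URL and returns the path of a generated .csv file) are hypothetical names, and the URL check is an added convenience.

```python
from urllib.parse import urlparse


def make_handler(url_to_csv):
    """Wrap a pipeline function so obviously invalid input is rejected early."""

    def handler(url):
        url = url.strip()
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Expected an http(s) URL, got: {url!r}")
        # url_to_csv is assumed to return the path of the generated CSV file.
        return url_to_csv(url)

    return handler


def build_demo(url_to_csv):
    """Build a Gradio interface: URL text box in, downloadable file out."""
    import gradio as gr  # third-party; imported lazily

    return gr.Interface(
        fn=make_handler(url_to_csv),
        inputs=gr.Textbox(label="Webpage URL"),
        outputs=gr.File(label="Captions (.csv)"),
        title="Image Caption Generator",
    )


# Example usage, given some pipeline function of your own:
#   build_demo(my_url_to_csv).launch()
```

`launch()` starts a local web server and prints the address to open in a browser.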
- Paste any webpage URL that contains images (e.g., Wikipedia or blogs).
- Click the Generate Captions button to generate captions.
- After the captions are generated, download the captions file in .csv format.
The resulting .csv file has one row per image, listing the image URL and its generated caption.
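To give a feel for that layout, the sketch below writes and reads back a two-column CSV with Python's built-in `csv` module. The column names `image_url` and `caption` are assumptions (the actual header row produced by the app may differ), and the rows shown are placeholders rather than real output.

```python
import csv
import io

FIELDNAMES = ["image_url", "caption"]  # assumed column names


def write_captions(rows, stream):
    """Write (image_url, caption) pairs as CSV with a header row."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    for image_url, caption in rows:
        writer.writerow({"image_url": image_url, "caption": caption})


def read_captions(stream):
    """Read the CSV back into a list of dicts keyed by column name."""
    return list(csv.DictReader(stream))


if __name__ == "__main__":
    # Placeholder rows for illustration only; not real model output.
    placeholder_rows = [
        ("https://example.com/images/one.jpg", "<caption for image one>"),
        ("https://example.com/images/two.png", "<caption, with a comma>"),
    ]
    buffer = io.StringIO()
    write_captions(placeholder_rows, buffer)
    print(buffer.getvalue())
```

Captions that contain commas are quoted automatically by the `csv` module, so the file stays readable in spreadsheet tools.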
- Error Handling: Improve error handling for more robust scraping (e.g., handle broken links or missing images).
- User Interface: Enhance the user interface for better interaction (e.g., adding image previews alongside captions).
- Expand Functionality: Add support for other content formats (e.g., videos) and multiple languages.
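One possible direction for the error-handling item above is to caption images one at a time and record failures instead of stopping at the first broken link. The sketch below is only a suggestion: `caption_all` and the `caption_one` callable (a function that takes an image URL and returns a caption, raising on failure) are hypothetical and not part of the current code.

```python
def caption_all(image_urls, caption_one):
    """Caption each URL, collecting failures instead of aborting the run."""
    results = []
    failures = []
    for url in image_urls:
        try:
            results.append((url, caption_one(url)))
        except Exception as exc:  # e.g. broken link or non-image response
            failures.append((url, f"{type(exc).__name__}: {exc}"))
    return results, failures
```

The `failures` list could then be shown to the user or logged, while `results` still goes into the .csv file.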
This project is licensed under the MIT License - see the LICENSE file for details.
This project was inspired by the IBM AI Developer Professional Certificate course guided project. I have further enhanced it by adding several custom features, including:
- A user-friendly interface that allows users to interact with the app directly without needing to run Python scripts.
- An improved output format that generates captions in a CSV file with a table structure, making it more organized and user-friendly.
- BLIP Model for image captioning.
- Gradio for the easy-to-use interface framework.
- BeautifulSoup for web scraping.