Dataset__captiontest1 - OpenDiffusionAI/wiki GitHub Wiki

Image 1

Model Prompt output
GPT4o (see gtp40 section of page) (see gtp40 section)
InternVL-Chat-V1.5
GLM-4v-9B you are a profesionnal captionner, what do you see ? a painting of ...
InternLM-XComposer2-VL Please objectively describe the subjects in detail, including any blurring. The image portrays a serene scene of two individuals fishing at the edge of a lake during sunset. The older man, positioned on the left side of the image, is holding a fishing rod and appears to be teaching or guiding the younger individual, who is located on the right side of the image. Both figures are dressed in casual attire suitable for an outdoor activity like fishing.
InternLM-XComposer2.5 caption this image in detail The image depicts a serene scene of an elderly man and a young boy fishing at sunset. The man is wearing a blue shirt and a cap, holding a fishing rod, while the boy, also holding a fishing rod. They are standing on the shore of a lake, surrounded by tall grasses and trees. The sky is a beautiful sunset, with the sun shining on the water, creating a reflection on the lake. The colors of the painting are warm, with yellow, orange, blue, and green. The art style is impressionistic, with brush strokes and soft colors, creating a peaceful atmosphere.
InternLM-XComposer2-VL Describe the subject's age, gender, hair type, direction they are facing, style of clothes, color of clothes The image portrays an older man and a young boy. The older man is wearing a white cap, blue shirt, and brown pants. He has a long beard and is holding a fishing rod. The young boy is wearing a green jacket and a baseball cap. They are both facing the water, with the sun setting behind them.
InternVL-Chat-V1.5
PaliGemma Fine-tuned for Long Captioning for Stable Diffusion 3 no input prompt in a detailed close-up shot at eye-level, a grandfather and his grandson are engrossed in fishing together on a serene lake. the grandfather, clad in a blue shirt and brown vest, is positioned on the left side of the frame, while the grandson, dressed in a light blue hoodie and gray pants, is on the right. both are equipped with fishing poles in their hands, with the grandfather's pole positioned on the left and the grandson's on the right. the grandfather is carrying a black backpack on his back, while the grandson is dressed in a light blue hoodie and gray pants. the lake, reflecting the setting sun, is filled with reeds and other aquatic plants, adding depth to the scene. the sky, painted in a warm palette of yellows, oranges, and reds, is reflected in the water, creating a vibrant contrast to the scene.
LLava 1.5 7b describe the image The image features a man and a young boy standing next to each other on the bank of a river. The man is holding a fishing rod, and the boy is watching him as he prepares to fish. They are both wearing backpacks, which are placed on the ground near them. The scene is set against a beautiful backdrop of a sunset, adding a serene atmosphere to the moment. The man and the boy are enjoying their time together, sharing the experience of fishing and the beauty of nature.
LLava 1.6 34b describe the image The image depicts a serene scene of two individuals fishing by a body of water. On the left, there is an older man with a white beard and a baseball cap, wearing a blue shirt and a vest. He is standing with a fishing rod in his hand, and there is a backpack on his back. On the right, there is a younger boy, also holding a fishing rod, wearing a baseball cap and a blue jacket. They are both standing in the water, surrounded by tall grasses and reeds, which suggest they are in a marshy or riverbank area. The background features a beautiful sunset or sunrise, with warm colors of yellow, orange, and red blending into the sky. The reflection of the light on the water's surface adds to the tranquil atmosphere. There are trees in the distance, and the overall mood of the painting is peaceful and idyllic, capturing a moment of shared leisure and connection with nature. The style of the painting is realistic with a focus on capturing the light and atmosphere, which gives it a somewhat impressionistic quality.
LLava 1.6 34b describe the image in detail (See standalone page section below)
LLava 1.6 34b describe the image concisely and objectively The image depicts an impressionistic painting of a tranquil scene by a body of water. An older man and a younger boy are standing on the bank, both engaged in fishing. The man is on the left, wearing a cap and a vest, with a fishing rod in his hand. The boy is on the right, also holding a fishing rod. They are surrounded by lush greenery, and the water reflects the warm hues of the sky, suggesting either sunrise or sunset. The painting style is characterized by loose brushstrokes and vibrant colors, emphasizing the mood of the scene over precise detail
Florence 2 large <MORE_DETAILED_CAPTION> The image is a painting of an elderly man and a young boy fishing on a lake. The man is standing on the bank of the lake, holding a fishing rod and casting his line into the water. He is wearing a blue jacket, a cap, and a backpack. The boy is standing next to him, wearing a green jacket and a cap. The lake is surrounded by tall grass and reeds, and there are trees in the background. The sky is painted in a warm orange and yellow hue, with the sun setting in the distance. The overall mood of the image is peaceful and serene.
Florence 2 large finetune <MORE_DETAILED_CAPTION> A painting of a man and a boy fishing. The man is wearing a blue shirt and a backpack. He is holding a fishing rod in his hand. The boy is standing next to the man. There is water in front of them. There are trees behind them. The sky above them is orange and yellow.
Florence 2 large finetune Describe this image in detail In this image we can see a painting of a man and a boy fishing in the water. In the background there are trees and sky.
cogVLM2 describe the style and content of this picture In a tranquil lake setting, an elderly man with a white beard and a young boy share a moment of fishing under the golden glow of a setting sun. The man, clad in a blue vest and cap, stands with a fishing pole, while the boy, in a green jacket, stands beside him, holding a fishing rod. The scene is a harmonious blend of warm colors, with the sun's reflection on the water creating a serene atmosphere. The man's gentle smile and the boy's attentive gaze convey a sense of joy and companionship, encapsulating the essence of a peaceful fishing experience.

LLava 1.6 32b (4bit)

"describe the image in detail"

The image depicts a serene scene of two individuals fishing by a body of water. The central figures are an older man and a younger boy, both engaged in the act of fishing. The man is positioned to the left of the boy, and both are dressed in casual attire suitable for outdoor activities. The man is wearing a cap and a vest, while the boy is wearing a baseball cap and a jacket.

The setting appears to be a tranquil lakeside or riverside, with the water reflecting the warm hues of the sky. The sky is painted with a gradient of colors, suggesting either sunrise or sunset, with the sun visible near the horizon. The surrounding environment is lush with greenery, including tall grasses and trees that frame the scene.

The painting is executed in an impressionistic style, characterized by loose brushstrokes and a focus on the interplay of light and color. The artist has used a warm palette, with the colors transitioning from warm oranges and yellows near the sky to cooler blues and greens in the water and vegetation.

The overall mood of the painting is peaceful and nostalgic, evoking a sense of connection to nature and the simple joys of outdoor activities. The artist has skillfully captured the interplay of light and shadow, with the sun's rays creating a radiant effect on the water's surface. The reflections and the way the light is diffused through the scene suggest a calm and quiet atmosphere.

There are no visible texts or inscriptions within the image. The painting is signed in the lower right corner, but the signature is not legible in this description. The image is a beautiful representation of a moment of shared leisure and the beauty of the natural world.

GPT-4o section

GTP40 prompt

LOOONG-ol GPT-4o prompt: You are an expert captionner, a GPT AI tasked with classifying images. The purpose is to assist in training Stable Diffusion, which works much like DALL·E. This is a mono-purpose task, with no need to add any formalities or extra text beyond the image classification. Only respond with a non-description response if the user asks you a question or for clarification. Goal: To produce highly detailed and precise image captions, emphasizing accuracy and consistency in English outputs. Global settings : Format: phase in natural language by adding tags (separated by commas) at times, in one long paragraph. Exhaustive: Include every visible element in the image: scene settings, character actions, object details, and color usage. Feature: Focus on physical and visible characteristics. Examples: "red", "mountain", "dog". Descriptive: Cover emotions, themes, or abstract concepts. Examples: "loneliness", "noisy", "futuristic". Compound: Describe relationships or actions. Examples: "person hugging dog", "sunset moment". Hierarchical System: Range from broad to specific. Examples: "transportation", "car", "red sports car". Data Quality: Data cleansing and preprocessing to remove mislabels, typos, and inconsistencies. Ensure diversity and representativeness across themes, styles, backgrounds, and subjects. Output Process: Confirm the Medium: Prefix the output with the medium of the uploaded image. Examples: "a photograph of", "an illustration of", "3D artwork of", "a painting of...". Refine the Category: Based on the medium, refine the category. Example: "a woodcut illustration of...". Identify the Main Subject: Provide a detailed description of the main subject. Example: "a goldendoodle with a golden hue". Add Specific Details: Highlight particular elements like attire or pose. Example: "adorned in a suit". Subject location: locates subjects within the image and between them. Example: "on the top right", " far behind", "vertically", "next to a car in front of a tree". Lighting: "soft", "ambient", "ring light", or "neon". Example: "bathed in natural light". Environmental Background: Specify the background as indoor, outdoor, underwater, or in space. Example: "set against the sky". Color Scheme: Determine the overall tone of the work. Examples: "vibrant", "muted", "pastel". Example: "splashed with vibrant colors". Artistic Texture and Technique: Specify the artistic creation technique, texture and patern. Examples: "print texture", "bold outlines". Select a Specific Art Style: Identify the art style, such as "3D rendering", a particular studio's style, or "movie poster". Example: "in the whimsical style of Hayao Miyazaki". Text: Use OCR to encloses text in quotation marks, describe location, font and style of text. Technical aspects: focal length and field of view. Examples: "Canon EOS R5 35mm", "diagonal view". Perspective: visualize the shape, vomule and spatial relationships. Examples: "trompe-l'œil", "multi-point perspective","overlap perspective". Captioning Structure: Type/Perspective: Broad context descriptions, such as the type of image and the subject's perspective. Action Words: Detailed verbs describing the subject's actions or state. Subject Descriptions: Detailed descriptions of the subject, excluding the main concept being taught. Notable Details: Unique or emphasized elements not classified as background. Background/Location: Detailed background context in layered specificity. Loose Associations: Any additional, relevant associations or feelings evoked by the image. Quality of image: visual characteristic of image like: "hight resolution", "extremely detailed", "blurry", "low quality".

GPT 4o output

A painting of an elderly man and a young boy fishing at sunset, the man is standing to the left with a fishing rod, wearing a hat, a vest, and a backpack, the boy is to the right holding his own fishing rod, wearing a cap, a hoodie, and jeans, both are standing on a grassy bank by a calm lake, the water reflecting the golden hues of the sunset, tall grass and plants around them, a line of trees in the background, birds flying in the sky, creating a serene and nostalgic atmosphere, extremely detailed, vibrant colors, warm lighting.