
Performance metrics of existing models

Model	Image resolution	Processing time	VRAM	Mode	Notes	Adopted
DALL-E3	1024x1024	unknown	unknown	txt→img	Easy to use through the API; cost is a concern	❌
DeepSeek Janus Pro 7b	768x768	4.2s	approx. 40GB	txt→img	Underwhelming quality for the VRAM used	❌
Stable Diffusion v1.5	1024x1024	3s	approx. 11GB	img→img	Lightweight, but underwhelming quality	❌
Stable Diffusion v2.1	1024x1024	3s	approx. 11GB	img→img	Lightweight, but underwhelming quality	❌
Stable Diffusion XL	1024x1024	8s		img→img	Expected to perform well once fine-tuned with LoRA	✅ after v1
Stable Diffusion XL + ControlNet	1024x1024	8s		txt→img	Fine-tuning example	❌
HiDream I1 Full	1024x1024	unknown	approx. 30GB	txt→img	Fine-tuning example with Llama 3.1 as the base model	❌

Detailed performance analysis findings

Initial Image

1. DALL-E3

prompt

Given this desk photo, generate a realistic image by compositing a desk setup that aesthetically matches and complements the existing environment

negative prompt

blurry, low resolution, distorted, deformed, people, human, cartoon, anime, painting, clutter, messy background, bad lighting, duplicate, twin, cloned

with ChatGPT-4o & Initial Image

result

ChatGPT Image 2025년 4월 15일 오후 05_23_12

prompt

"A clean, modern desk setup in a bright open office, with a large LG monitor displaying a forest wallpaper and a MacBook beneath it. Add a minimalist LED desk lamp on the left, a sleek wireless charger pad beside the laptop, and all cables neatly organized and hidden. The scene should feel calm, tidy, and professional, with natural lighting and no clutter."

negative prompt

blurry, low resolution, distorted, deformed, people, human, cartoon, anime, painting, clutter, messy background, bad lighting, duplicate, twin, cloned

result

ChatGPT Image 2025년 4월 15일 오후 09_18_28

2. DeepSeek Janus Pro 7b

prompt

explain details

result

This image depicts a workspace setup with a focus on visual content and technology. Here are the details:

Primary Display:
- There is a large monitor mounted on a stand, displaying a forest scene with tall redwood trees. The time on the screen is 2:55.
Secondary Display:
- A laptop is placed on the desk, and its screen is displaying the same forest scene as the larger monitor.
Cables and Accessories:
- Several cables are connected to the laptop, possibly for power, data transfer, or peripherals.
- There is a small device with a blue button on the desk, which might be a USB hub or a small external hard drive.
Work Environment:
- The workspace appears to be in an office or a lab setting, with other desks and chairs visible in the background.
- There are people working in the background, indicating a collaborative or professional environment.
Signage:
- A sign on the desk reads "Please do not touch the screens," suggesting that the screens are sensitive or important.

The setup seems to be designed for monitoring or analyzing visual content, possibly for research, development, or quality assurance purposes.

image generation prompt

result

1	2	3	4	5

ChatGPT Image 2025년 4월 10일 오전 09_06_37

prompt

explain details

result

The image depicts a modern workspace setup with a focus on productivity and comfort. Here are the details:

Monitor and Laptop:
- There is a large monitor mounted on a stand above a laptop. Both display the same image of a serene forest with tall redwood trees.
- The laptop is placed on a black mouse pad, and it is connected to the monitor via a cable.
Lighting:
- A desk lamp with an adjustable arm is positioned to the left of the monitor. The lamp provides focused lighting, which is beneficial for tasks requiring attention to detail.
Charging and Power:
- A power adapter is plugged into the laptop, indicating it is being charged.
- There is a power strip on the desk, which likely accommodates multiple devices.
Plant:
- A small potted plant is placed on the right side of the desk, adding a touch of greenery to the workspace.
Clock:
- The time displayed on the laptop is 8:56, which could be the current time or a reminder.
Workspace Setup:
- The desk is organized, with essential items like the laptop, monitor, lamp, and plant within easy reach.
- The setup suggests a focus on both functionality and aesthetics, creating a pleasant and efficient working environment.

This setup is ideal for someone who needs to work for extended periods, requiring good lighting, a comfortable keyboard, and a touch of nature to stay motivated.

result

1	2	3	4	5

image generation prompt

1	2	3	4	5

3. Stable Diffusion v1.5

prompt

Given this desk photo, generate a realistic image by compositing a desk setup that aesthetically matches and complements the existing environment

negative prompt

blurry, low resolution, distorted, deformed, people, human, cartoon, anime, painting, clutter, messy background, bad lighting, duplicate, twin, cloned

result

1	2

Hyperparameters like strength, guidance_scale, num_inference_steps were varied across images to observe their influence on the results.
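
A sweep like the one described above can be sketched as a small grid over the three parameters. This is a minimal sketch, not the code used for the images here: the value ranges are placeholders, and `build_sweep` and `effective_steps` are hypothetical helper names. Each resulting dict is meant to be passed as keyword arguments to an img2img pipeline such as diffusers' `StableDiffusionImg2ImgPipeline`, which accepts `strength`, `guidance_scale`, and `num_inference_steps`. In that pipeline only about `num_inference_steps * strength` denoising steps actually run, which the second helper makes explicit.

```python
from itertools import product


def build_sweep(strengths, guidance_scales, step_counts):
    """Return one kwargs dict per combination of the three hyperparameters."""
    return [
        {"strength": s, "guidance_scale": g, "num_inference_steps": n}
        for s, g, n in product(strengths, guidance_scales, step_counts)
    ]


def effective_steps(num_inference_steps, strength):
    """Approximate number of denoising steps an img2img run actually performs."""
    return min(int(num_inference_steps * strength), num_inference_steps)


# Placeholder ranges (assumed, not the values used for the images above).
sweep = build_sweep(
    strengths=[0.3, 0.5, 0.7],
    guidance_scales=[5.0, 7.5],
    step_counts=[30, 50],
)

for params in sweep:
    steps = effective_steps(params["num_inference_steps"], params["strength"])
    print(params, "-> effective steps:", steps)
    # With diffusers installed and a pipeline `pipe` already loaded, one run
    # would look like this (prompt, negative_prompt, init_image defined elsewhere):
    # image = pipe(prompt=prompt, negative_prompt=negative_prompt,
    #              image=init_image, **params).images[0]
```

Keeping the grid as plain dicts makes it easy to save each parameter set next to its output image, so results can be compared afterwards.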

4. Stable Diffusion v2.1

prompt

Given this desk photo, generate a realistic image by compositing a desk setup that aesthetically matches and complements the existing environment

negative prompt

blurry, low resolution, distorted, deformed, people, human, cartoon, anime, painting, clutter, messy background, bad lighting, duplicate, twin, cloned

1	2	3	4	5

Hyperparameters like strength, guidance_scale, num_inference_steps were varied across images to observe their influence on the results.

5. Stable Diffusion XL

prompt

negative prompt

"blurry, low resolution, distorted, clutter, people, human, cartoon, overexposed, deformed, text"

result

sdxl_desk_result

6. Stable Diffusion XL + ControlNet

prompt

Given this desk photo, generate a realistic image by compositing a desk setup that aesthetically matches and complements the existing environment

negative prompt

blurry, low resolution, distorted, deformed, people, human, cartoon, anime, painting, clutter, messy background, bad lighting, duplicate, twin, cloned
low quality, bad quality, sketches

result

1	2

7. HiDream-I1-Full

hugging face link

Architecture: appears to be an image generation model based on Llama-3.1-8B-Instruct

prompt

negative_prompt

"blurry, low resolution, distorted, clutter, people, human, cartoon, overexposed, deformed, text"

output-2

prompt

"A modern office workspace featuring a large monitor on a stand displaying a forest scene with tall redwood trees, showing the time as 2:55. A laptop is placed on the desk, mirroring the same forest image. Multiple cables are connected to the laptop, possibly for data and power. A small device with a blue button sits on the desk, likely a USB hub or external drive. The background shows other desks and people working, indicating a collaborative lab or office environment. A visible sign on the desk reads 'Please do not touch the screens,' suggesting the setup is for research, monitoring, or visual content analysis."

negative_prompt

"blurry, low resolution, distorted, clutter, people, human, cartoon, overexposed, deformed, text"

output-3

Token indices sequence length is longer than the specified maximum sequence length for this model (128 > 77). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (128 > 77). Running this sequence through the model will result in indexing errors
The following part of your input was truncated because max_sequence_length is set to 128 tokens: ['for research, monitoring, or visual content analysis.']
The image may not have been generated properly because the prompt exceeded the token limit.
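
One way to avoid silent truncation like the warnings above is to check the prompt against the encoder's token limit before generation and drop trailing sentences until it fits. The sketch below is illustrative only: `fit_prompt` is a hypothetical helper, and `count_whitespace_tokens` is a crude stand-in that counts whitespace-separated words. A real check would count tokens with the model's own tokenizer, which usually yields more tokens than words. The example prompt is shortened from the one above, and the budget of 25 is chosen only to make the demonstration visible; the limits reported in the log are 77 and 128.

```python
import re


def count_whitespace_tokens(text):
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_prompt(prompt, max_tokens, count_tokens=count_whitespace_tokens):
    """Drop trailing sentences until the prompt fits within max_tokens.

    Returns (kept_prompt, dropped_sentences). A single over-long sentence is
    returned unchanged, since there is nothing smaller to drop.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", prompt.strip()) if s]
    dropped = []
    while len(sentences) > 1 and count_tokens(" ".join(sentences)) > max_tokens:
        dropped.insert(0, sentences.pop())
    return " ".join(sentences), dropped


prompt = (
    "A modern office workspace featuring a large monitor on a stand. "
    "A laptop is placed on the desk, mirroring the same forest image. "
    "Multiple cables are connected to the laptop."
)
kept, dropped = fit_prompt(prompt, max_tokens=25)
print("kept:", kept)
print("dropped:", dropped)
```

Dropping whole sentences keeps the remaining prompt grammatical, and the returned `dropped` list shows exactly what the model will not see.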

Concrete plan for the optimization techniques to apply

Fine-tuning with LoRA

As a first step, run performance tests with LoRAs that other developers have already fine-tuned.

1. Desk topview

Base Model: Stable Diffusion XL 1.0
File Size: 217.88 MB
Example Image

2. 3D Office LoRA

Base Model: Stable Diffusion XL 1.0
File Size: 217.87 MB
Note: The 3D feature is not needed, so remove 3D 0ff1c3 from the trigger words and drop any other unnecessary parts from the prompt before use.
Example Image
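
The prompt cleanup mentioned in the note can be sketched as a small string helper. This is an illustration under assumptions: `strip_phrases` is a hypothetical helper and the example prompt is made up for the demonstration. Only the trigger phrase `3D 0ff1c3` comes from the note above.

```python
def strip_phrases(prompt, unwanted):
    """Remove unwanted comma-separated phrases from a prompt (case-insensitive)."""
    blocked = {p.strip().lower() for p in unwanted}
    parts = [p.strip() for p in prompt.split(",")]
    kept = [p for p in parts if p and p.lower() not in blocked]
    return ", ".join(kept)


# Hypothetical prompt; only "3D 0ff1c3" is taken from the LoRA's trigger words.
raw_prompt = "3D 0ff1c3, modern desk setup, monitor, laptop, natural lighting"
clean_prompt = strip_phrases(raw_prompt, ["3D 0ff1c3"])
print(clean_prompt)

# With diffusers, a LoRA file is typically attached to a loaded pipeline via
# pipe.load_lora_weights(...) before calling the pipeline with clean_prompt.
```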

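Generate a large set of service-appropriate images, distill them, and fine-tune a LoRA suited to the service

Collect images of a variety of desks using ChatGPT-4o
Collect a variety of service-appropriate images using the LoRAs introduced above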

Model quantization

Reduce the model's computational cost through quantization.
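
To make the idea concrete, the following is a minimal sketch of symmetric per-tensor int8 quantization on a toy weight list. It is not the quantization scheme chosen for this project (none is specified here), the weight values are invented, and the function names are hypothetical. It shows the core trade-off: each weight is stored in 1 byte instead of 4 (for float32), at the price of a rounding error of at most about half the scale.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]


# Toy weights (invented for illustration).
weights = [0.82, -1.27, 0.05, 0.0, 0.64, -0.33]
quantized, scale = quantize_int8(weights)
restored = dequantize_int8(quantized, scale)

max_error = max(abs(w - r) for w, r in zip(weights, restored))
print("int8 values:", quantized)
print("scale:", scale)
print("max round-trip error:", max_error)
print("bytes as float32:", 4 * len(weights), "| bytes as int8:", len(weights))
```

In practice a library-provided quantization path would be used rather than hand-written code, and image quality should be re-checked after quantization, since the rounding error accumulates across layers.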

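Expected performance after optimization

User satisfaction is the top priority for our service, so optimization focuses first on securing that satisfaction.
The first goal is to use LoRA so that the model can generate images optimized for the service.
Once service-optimized images are being generated, the next goal is to reduce computation through model quantization and similar techniques, shortening the time users have to wait and lowering server load.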

AI_Model Performance Metrics - 100-hours-a-week/16-Hot6-wiki GitHub Wiki
