AI Model Performance Metrics - 100-hours-a-week/16-Hot6-wiki GitHub Wiki

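Table of Contents

1. Baseline model performance metrics
2. Detailed performance analysis findings
3. Concrete plan for the optimization techniques to apply
4. Expected performance after optimization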

Baseline model performance metrics

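| Model | Image resolution | Processing speed | VRAM | Mode | Notes | In use |
|---|---|---|---|---|---|---|
| DALL-E3 | 1024x1024 | unknown | unknown | txt→img | Easy to use through the API; cost is a concern | ✅ planned for v1 |
| DeepSeek Janus Pro 7b | 768x768 | 4.2s | approx. 40GB | txt→img | Disappointing performance relative to VRAM usage | |
| Stable Diffusion v1.5 | 1024x1024 | 3s | approx. 11GB | img→img | Lightweight, but disappointing performance | |
| Stable Diffusion v2.1 | 1024x1024 | 3s | approx. 11GB | img→img | Lightweight, but disappointing performance | |
| Stable Diffusion XL | 1024x1024 | 8s | | img→img | Expected to perform well once fine-tuned with LoRA | ✅ after v1 |
| Stable Diffusion XL + ControlNet | 1024x1024 | 8s | | txt→img | Fine-tuning example | |
| HiDream I1 Full | 1024x1024 | unknown | approx. 30GB | txt→img | Fine-tuning example whose base model is Llama 3.1 | |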

Detailed performance analysis findings

Initial Image IMG_1205

1. DALL-E3

prompt

Given this desk photo, generate a realistic image by compositing a desk setup that aesthetically matches and complements the existing environment

negative prompt

blurry, low resolution, distorted, deformed, people, human, cartoon, anime, painting, clutter, messy background, bad lighting, duplicate, twin, cloned

with ChatGPT-4o & Initial Image

result

ChatGPT Image April 15, 2025, 05_23_12 PM

prompt

"A clean, modern desk setup in a bright open office, with a large LG monitor displaying a forest wallpaper and a MacBook beneath it. Add a minimalist LED desk lamp on the left, a sleek wireless charger pad beside the laptop, and all cables neatly organized and hidden. The scene should feel calm, tidy, and professional, with natural lighting and no clutter."

negative prompt

blurry, low resolution, distorted, deformed, people, human, cartoon, anime, painting, clutter, messy background, bad lighting, duplicate, twin, cloned

result

ChatGPT Image April 15, 2025, 09_18_28 PM

2. DeepSeek Janus Pro 7b

prompt

explain details

result

This image depicts a workspace setup with a focus on visual content and technology. Here are the details:

  1. Primary Display:

    • There is a large monitor mounted on a stand, displaying a forest scene with tall redwood trees. The time on the screen is 2:55.
  2. Secondary Display:

    • A laptop is placed on the desk, and its screen is displaying the same forest scene as the larger monitor.
  3. Cables and Accessories:

    • Several cables are connected to the laptop, possibly for power, data transfer, or peripherals.
    • There is a small device with a blue button on the desk, which might be a USB hub or a small external hard drive.
  4. Work Environment:

    • The workspace appears to be in an office or a lab setting, with other desks and chairs visible in the background.
    • There are people working in the background, indicating a collaborative or professional environment.
  5. Signage:

    • A sign on the desk reads "Please do not touch the screens," suggesting that the screens are sensitive or important.

The setup seems to be designed for monitoring or analyzing visual content, possibly for research, development, or quality assurance purposes.

image generate prompt

result

1 2 3 4 5
image-3 image-4 image-5 image-6 image-7

ChatGPT Image April 10, 2025, 09_06_37 AM

prompt

explain details

result

The image depicts a modern workspace setup with a focus on productivity and comfort. Here are the details:

  1. Monitor and Laptop:

    • There is a large monitor mounted on a stand above a laptop. Both display the same image of a serene forest with tall redwood trees.
    • The laptop is placed on a black mouse pad, and it is connected to the monitor via a cable.
  2. Lighting:

    • A desk lamp with an adjustable arm is positioned to the left of the monitor. The lamp provides focused lighting, which is beneficial for tasks requiring attention to detail.
  3. Charging and Power:

    • A power adapter is plugged into the laptop, indicating it is being charged.
    • There is a power strip on the desk, which likely accommodates multiple devices.
  4. Plant:

    • A small potted plant is placed on the right side of the desk, adding a touch of greenery to the workspace.
  5. Clock:

    • The time displayed on the laptop is 8:56, which could be the current time or a reminder.
  6. Workspace Setup:

    • The desk is organized, with essential items like the laptop, monitor, lamp, and plant within easy reach.
    • The setup suggests a focus on both functionality and aesthetics, creating a pleasant and efficient working environment.

This setup is ideal for someone who needs to work for extended periods, requiring good lighting, a comfortable keyboard, and a touch of nature to stay motivated.

result

1 2 3 4 5
image-8 image-9 image-10 image-11 image-12

image generate prompt

"A clean, modern desk setup in a bright open office, with a large LG monitor displaying a forest wallpaper and a MacBook beneath it. Add a minimalist LED desk lamp on the left, a sleek wireless charger pad beside the laptop, and all cables neatly organized and hidden. The scene should feel calm, tidy, and professional, with natural lighting and no clutter."

1 2 3 4 5
image-13 image-14 image-15 image-16 image-17

3. Stable Diffusion v1.5

prompt

Given this desk photo, generate a realistic image by compositing a desk setup that aesthetically matches and complements the existing environment

negative prompt

blurry, low resolution, distorted, deformed, people, human, cartoon, anime, painting, clutter, messy background, bad lighting, duplicate, twin, cloned

result

1 2
output_img2img output_img2img-2
  • Hyperparameters like strength, guidance_scale, num_inference_steps were varied across images to observe their influence on the results.
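
The kind of sweep described above could be organized roughly as follows. This is a minimal sketch, not the script that produced these images: the grid values are hypothetical (the values actually used are not recorded here), and `build_sweep` and `run_sweep` are illustrative helper names. `pipe` is assumed to be any img2img pipeline callable that accepts `prompt`, `negative_prompt`, `image`, `strength`, `guidance_scale`, and `num_inference_steps`, as the Hugging Face diffusers `StableDiffusionImg2ImgPipeline` does.

```python
from itertools import product


def build_sweep(strengths, guidance_scales, step_counts):
    """Return one settings dict per combination of the three hyperparameters."""
    return [
        {"strength": s, "guidance_scale": g, "num_inference_steps": n}
        for s, g, n in product(strengths, guidance_scales, step_counts)
    ]


def run_sweep(pipe, init_image, prompt, negative_prompt, sweep):
    """Call an img2img pipeline once per settings dict and collect the outputs."""
    results = []
    for settings in sweep:
        output = pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            image=init_image,
            **settings,
        )
        # Keep the settings next to the image so results can be compared later.
        results.append((settings, output.images[0]))
    return results


# Hypothetical grid for illustration only.
sweep = build_sweep(
    strengths=[0.4, 0.6, 0.8],
    guidance_scales=[5.0, 7.5],
    step_counts=[30, 50],
)
print(len(sweep))  # 3 * 2 * 2 = 12 combinations
```

Storing each settings dict alongside its image makes it easier to trace which combination produced which result when comparing outputs side by side.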

4. Stable Diffusion v2.1

prompt

Given this desk photo, generate a realistic image by compositing a desk setup that aesthetically matches and complements the existing environment

negative prompt

blurry, low resolution, distorted, deformed, people, human, cartoon, anime, painting, clutter, messy background, bad lighting, duplicate, twin, cloned

1 2 3 4 5
output_stable_diffusion_v2 1_1 output_stable_diffusion_v2 1_1-2 output_stable_diffusion_v2 1_1-3 output_stable_diffusion_v2 1_1-4 output_stable_diffusion_v2 1_1-5
  • Hyperparameters like strength, guidance_scale, num_inference_steps were varied across images to observe their influence on the results.

5. Stable Diffusion XL

prompt

"A clean, modern desk setup in a bright open office, with a large LG monitor displaying a forest wallpaper and a MacBook beneath it. Add a minimalist LED desk lamp on the left, a sleek wireless charger pad beside the laptop, and all cables neatly organized and hidden. The scene should feel calm, tidy, and professional, with natural lighting and no clutter."

negative prompt

"blurry, low resolution, distorted, clutter, people, human, cartoon, overexposed, deformed, text"

result

sdxl_desk_result

6. Stable Diffusion XL + ControlNet

prompt

Given this desk photo, generate a realistic image by compositing a desk setup that aesthetically matches and complements the existing environment

negative prompt

  1. blurry, low resolution, distorted, deformed, people, human, cartoon, anime, painting, clutter, messy background, bad lighting, duplicate, twin, cloned
  2. low quality, bad quality, sketches

result

1 2
hug_lab hug_lab-2

7. HiDream-I1-Full

Hugging Face link

Architecture: appears to be an image generation model built on Llama-3.1-8B-Instruct

prompt

"A clean, modern desk setup in a bright open office, with a large LG monitor displaying a forest wallpaper and a MacBook beneath it. Add a minimalist LED desk lamp on the left, a sleek wireless charger pad beside the laptop, and all cables neatly organized and hidden. The scene should feel calm, tidy, and professional, with natural lighting and no clutter."

negative_prompt

"blurry, low resolution, distorted, clutter, people, human, cartoon, overexposed, deformed, text"

output-2

prompt

"A modern office workspace featuring a large monitor on a stand displaying a forest scene with tall redwood trees, showing the time as 2:55. A laptop is placed on the desk, mirroring the same forest image. Multiple cables are connected to the laptop, possibly for data and power. A small device with a blue button sits on the desk, likely a USB hub or external drive. The background shows other desks and people working, indicating a collaborative lab or office environment. A visible sign on the desk reads 'Please do not touch the screens,' suggesting the setup is for research, monitoring, or visual content analysis."

negative_prompt

"blurry, low resolution, distorted, clutter, people, human, cartoon, overexposed, deformed, text"

output-3

  • Token indices sequence length is longer than the specified maximum sequence length for this model (128 > 77). Running this sequence through the model will result in indexing errors
    Token indices sequence length is longer than the specified maximum sequence length for this model (128 > 77). Running this sequence through the model will result in indexing errors
    The following part of your input was truncated because max_sequence_length is set to 128 tokens: ['for research, monitoring, or visual content analysis.']

  • Because the prompt exceeded the token limit, the image may not have been generated properly
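
One way to catch this before generation is to count tokens up front and see what would be cut off. The sketch below is illustrative only: `check_prompt_length` is a hypothetical helper, and the whitespace tokenizer is a stand-in so the example runs without downloading a model. Real text encoders use subword tokenizers and add special tokens, so actual counts will differ.

```python
def check_prompt_length(prompt, tokenize, max_tokens=77):
    """Report whether a prompt fits the encoder limit and what would be cut off.

    `tokenize` is any callable that maps a string to a list of tokens.
    """
    tokens = tokenize(prompt)
    return {
        "token_count": len(tokens),
        "fits": len(tokens) <= max_tokens,
        "overflow": tokens[max_tokens:],
    }


def whitespace_tokenize(text):
    """Stand-in tokenizer for illustration; not what a real encoder uses."""
    return text.split()


report = check_prompt_length(
    "a tidy desk with a monitor and a lamp",
    whitespace_tokenize,
    max_tokens=5,
)
print(report["fits"], report["overflow"])
```

With the Hugging Face transformers library, `tokenize` could instead wrap the pipeline's own tokenizer, for example `lambda s: tokenizer(s).input_ids`, with `max_tokens` set to the limit reported in the warning (77 or 128 in the log above).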

Concrete plan for the optimization techniques to apply

Fine-tuning with LoRA

As a first step, run performance tests using LoRAs that other developers have already fine-tuned.

1. Desk topview

  • Base Model: Stable Diffusion XL 1.0
  • File Size: 217.88 MB
  • Example Image
    image

2. 3D Office LoRA

  • Base Model: Stable Diffusion XL 1.0
  • File Size: 217.87 MB
  • Note: The 3D effect is not needed, so 3D 0ff1c3 is removed from the trigger words, and any other unneeded parts are removed from the prompt before use
  • Example Image
    image
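
Testing one of these LoRAs on top of SDXL could look roughly like the sketch below. It assumes the Hugging Face diffusers library and a locally downloaded LoRA file. `strip_terms` and `load_sdxl_with_lora` are hypothetical helper names, and the example prompt is made up apart from the `3D 0ff1c3` trigger words mentioned in the note above.

```python
def strip_terms(prompt, unwanted_terms):
    """Remove unwanted comma-separated terms (e.g. unneeded trigger words)."""
    unwanted = {term.strip().lower() for term in unwanted_terms}
    kept = [
        part.strip()
        for part in prompt.split(",")
        if part.strip().lower() not in unwanted
    ]
    return ", ".join(part for part in kept if part)


def load_sdxl_with_lora(lora_path, device="cuda"):
    """Load SDXL 1.0 and attach LoRA weights (requires torch and diffusers)."""
    # Imported lazily so the prompt helper above works without a GPU stack.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    )
    pipe.load_lora_weights(lora_path)
    return pipe.to(device)


# Hypothetical prompt; only "3D 0ff1c3" comes from the note above.
prompt = strip_terms(
    "3D 0ff1c3, modern office desk, natural lighting",
    ["3D 0ff1c3"],
)
print(prompt)  # modern office desk, natural lighting
```

Keeping the prompt cleanup separate from model loading means the same filtered prompt can be reused across both LoRAs when comparing their outputs.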

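Generate a large set of service-appropriate images, distill them, and fine-tune a LoRA suited to the service

  1. Use ChatGPT-4o to collect images of a wide variety of desks
  2. Use the LoRAs introduced above to collect a variety of service-appropriate images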

Model quantization

  1. Reduce the model's computational cost through quantization.
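
To make the idea concrete, the sketch below shows symmetric per-tensor int8 quantization on a toy list of weights. It is a pure-Python illustration of the general technique, not the quantization scheme planned for this project (which is not specified here), and the weight values are made up.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 1.27]  # made-up example values
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_error <= scale / 2 + 1e-12)
```

Each weight is stored as one byte instead of two (fp16) or four (fp32), at the cost of a rounding error of at most half a quantization step per weight.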

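Expected performance after optimization

  1. User satisfaction comes first for our service, so optimization first focuses on securing that satisfaction.
    The primary goal is to use LoRA so that the model generates images tailored to the service.
  2. Once service-appropriate images are being generated, the next goal is to reduce compute through model quantization and similar techniques, shortening the time users have to wait and lowering server load.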