[Tech Notes] 🤖 Gemini API Integration and Prompt Engineering - DDAL-KKAK-DOT/DDALKKAK GitHub Wiki

🤖 Gemini API Integration and Prompt Engineering

DDALKKAK integrates the Google Gemini API to automatically generate detailed resumes from a user's profile information. This page explains how we use the Gemini API and describes our prompt engineering strategy.


1️⃣ Gemini API Integration Overview

In the DDALKKAK project, the Gemini API combines the user's input with additional information extracted from web pages to generate a detailed resume in JSON format.

⚙️ API Integration Flow

flowchart LR
InputProfile(InputProfile) --> Prompt[Prompt construction]
URL["External URL link"] --> Fetch["Body text extraction (fetch_page_text)"] --> Prompt
Prompt --> GeminiAPI[Gemini API call]
GeminiAPI --> JSON[Resume JSON]
JSON --> OutputProfile(OutputProfile)

📌 Gemini Integration Code (generate_profile_from_input)

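import json
import os

from google import genai
from google.genai import types

# Keep the API key out of the source code: read it from the GEMINI_API_KEY environment variable
genai_client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])


def generate_profile_from_input(prompt: str) -> "OutputProfile":
    # Simplified signature: assumes the prompt has already been built (e.g. by build_resume_prompt).
    # OutputProfile is the project's response model and is not shown here.
    cfg = types.GenerateContentConfig(response_mime_type="application/json")  # ask for JSON output
    resp = genai_client.models.generate_content(
        model="models/gemini-2.5-flash-preview-04-17",
        contents=prompt,
        config=cfg,
    )
    raw = json.loads(resp.text)

    # Fill in defaults for list fields the model may have omitted
    for key in ("skills", "projects", "careers", "educations", "clubs"):
        raw.setdefault(key, [])

    return OutputProfile(**raw)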
  • The API key is kept out of the source code by reading it from the GEMINI_API_KEY environment variable
  • Gemini's response is parsed directly as JSON, then validated and used through OutputProfile
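A minimal, self-contained sketch of the parse, default-fill, and validate step follows. OutputProfile's real definition is not shown on this page, so the dataclass ResumeSketch, its name field, and the helper parse_resume are hypothetical stand-ins. One case it handles beyond setdefault: if the model returns a list field explicitly as null, setdefault leaves it as None, so the sketch normalizes that too.

```python
import json
from dataclasses import dataclass, field


# Hypothetical stand-in for OutputProfile: only a name plus the five list fields are assumed.
@dataclass
class ResumeSketch:
    name: str
    skills: list = field(default_factory=list)
    projects: list = field(default_factory=list)
    careers: list = field(default_factory=list)
    educations: list = field(default_factory=list)
    clubs: list = field(default_factory=list)


LIST_FIELDS = ("skills", "projects", "careers", "educations", "clubs")


def parse_resume(text: str) -> ResumeSketch:
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("expected a JSON object at the top level")
    for key in LIST_FIELDS:
        # Covers both a missing key and an explicit null
        if raw.get(key) is None:
            raw[key] = []
    # A dataclass constructor rejects unexpected keyword arguments, so drop unknown keys
    known = {"name", *LIST_FIELDS}
    return ResumeSketch(**{k: v for k, v in raw.items() if k in known})
```

Filtering unknown keys is a design choice for this sketch; a stricter version could reject them instead to surface prompt drift early.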

2️⃣ Prompt Engineering Strategy

We designed the prompt systematically so that the Gemini API can generate accurate, detailed resumes.

📝 Main Components of the Prompt

The prompt consists of three main parts:

  • Basic user information: name, email, phone number, education, etc.
  • Tech stack: the skills the user has
  • Activity links and web content: content extracted from external links, included to provide additional context

Example code (build_resume_prompt)

def build_resume_prompt(profile: dict, urls: list[str]) -> str:
    sections = []
    for idx, url in enumerate(urls, start=1):
        content = fetch_page_text(url)  # extract the web page's body text
        sections.append(f"[{idx}] URL: {url}\nCONTENT:\n{content}\n")
    links_block = "\n".join(sections)

    return f"""
You are an interviewer looking to hire an experienced developer.
Based on the profile, project, career, education, and club information below,
create a **very detailed** resume in JSON format.

Profile:
- Name: {profile['name']}
- Email: {profile['email']}
- Phone: {profile['phone']}
- Education: {profile['educations']}
- Tech stack: {', '.join(profile['skills'])}

Link excerpts:
{links_block}

### Writing rules
- Include a detailed description in every field
- Write in Korean and do not change the JSON keys or structure
- Output pure JSON only, without Markdown
"""
  • Adding web page content gives Gemini rich context to draw on when writing the resume
  • The format and rules Gemini must follow are stated explicitly
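The prompt's rule to output pure JSON without Markdown, together with response_mime_type="application/json", should keep the response parseable, but a defensive parser costs little. This is a sketch assuming the only likely deviation is a response wrapped in a Markdown code fence; loads_model_json is a hypothetical helper, not part of the project.

```python
import json
import re

# Matches a whole response wrapped in a Markdown code fence
# (three backticks, an optional "json" tag, then the payload, then three backticks)
_FENCE = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL | re.IGNORECASE)


def loads_model_json(text: str):
    """Parse model output as JSON, tolerating an accidental Markdown fence."""
    cleaned = text.strip()
    match = _FENCE.match(cleaned)
    if match:
        cleaned = match.group(1)
    return json.loads(cleaned)
```

It could replace json.loads(resp.text) in the integration code; anything that is still not valid JSON raises json.JSONDecodeError as before.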

3️⃣ External Content Collection (Crawling Strategy)

To make the Gemini prompt more effective, we implemented crawling logic that automatically extracts web page content.

Crawling approach

  • Static crawling (Requests + readability + BeautifulSoup): extracts the body text quickly, but treats the result as insufficient if it is shorter than 200 characters

  • Dynamic crawling (Selenium): if the static result is insufficient, fully renders the page in a browser, including JavaScript-driven content, to obtain the text

🔍 Key Code (fetch_page_text)

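import functools

STATIC_THRESHOLD = 200  # static text shorter than this many characters counts as insufficient
# MAX_CHARS: project-specific cap on text length (value not shown in this excerpt)

@functools.lru_cache(maxsize=256)  # cache results per URL so repeated links are not re-crawled
def fetch_page_text(url: str) -> str:
    txt = _static_fetch(url)
    if len(txt) >= STATIC_THRESHOLD:
        return txt[:MAX_CHARS]
    return _dynamic_fetch(url)[:MAX_CHARS]  # fall back to a full browser render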
  • STATIC_THRESHOLD: if the static result is shorter than this length (200 characters), crawling switches to the dynamic method
  • MAX_CHARS: caps the body text length so the prompt stays within the Gemini API's token limit
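The fallback logic can be exercised without a network or browser by passing the two fetchers in as functions. This is a sketch: fetch_with_fallback is a hypothetical name, MAX_CHARS = 8000 is an assumed value (the page does not state the real limit), and unlike the excerpt above it also treats an exception from the static fetch as "insufficient", so the dynamic fallback still gets a chance.

```python
from typing import Callable

STATIC_THRESHOLD = 200  # from the text: fewer than 200 characters is insufficient
MAX_CHARS = 8000        # hypothetical value; the project's actual limit is not shown


def fetch_with_fallback(
    url: str,
    static_fetch: Callable[[str], str],
    dynamic_fetch: Callable[[str], str],
) -> str:
    """Try the cheap static fetch first; fall back to a rendered fetch if needed."""
    try:
        txt = static_fetch(url)
    except Exception:
        # Network or parse errors (e.g. requests.RequestException) also trigger the fallback
        txt = ""
    if len(txt) >= STATIC_THRESHOLD:
        return txt[:MAX_CHARS]
    return dynamic_fetch(url)[:MAX_CHARS]
```

In production, a one-argument wrapper that calls fetch_with_fallback(url, _static_fetch, _dynamic_fetch) can carry the functools.lru_cache decorator, keeping the caching behavior of fetch_page_text.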

🕸️ Static Crawling (_static_fetch)

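import requests
from bs4 import BeautifulSoup
from readability import Document

UA = {"User-Agent": "Mozilla/5.0"}  # assumed value; the real header dict is defined elsewhere

def _static_fetch(url: str) -> str:
    res = requests.get(url, timeout=8, headers=UA)
    doc = Document(res.text)  # readability: keep only the main content block
    cleaned_html = doc.summary()
    # separator="\n" keeps adjacent text nodes from being glued together
    return BeautifulSoup(cleaned_html, "html.parser").get_text(separator="\n", strip=True)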
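readability's main-content detection has no standard-library equivalent, but the final step, turning HTML into plain text, can be illustrated with Python's built-in html.parser. The hypothetical html_to_text helper below only strips tags and skips script/style contents; it does not pick out the main article the way readability does. It also shows why a separator matters: joining text nodes with an empty string glues adjacent paragraphs together, which is what BeautifulSoup's get_text(strip=True) does when no separator is given.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text nodes, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def html_to_text(html: str, separator: str = "\n") -> str:
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return separator.join(parser.parts)
```

With separator="", two paragraphs "Hello" and "World" come out as "HelloWorld", which both breaks words and slightly distorts the 200-character threshold check.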

🕷️ Dynamic Crawling (_dynamic_fetch)

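import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

MAX_SCROLLS = 20  # assumed cap so infinite-scroll pages cannot loop forever


def _dynamic_fetch(url: str) -> str:
    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")  # avoid small /dev/shm limits in containers

    # _select_driver is a project helper (not shown) that returns a ChromeDriver Service
    driver = webdriver.Chrome(service=_select_driver(options), options=options)

    try:
        driver.get(url)
        time.sleep(2)  # give JavaScript time to render the initial content
        last_height = driver.execute_script("return document.body.scrollHeight")
        # Scroll to the bottom until the page height stops growing (lazy-loaded content)
        for _ in range(MAX_SCROLLS):
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(1)
            new_height = driver.execute_script("return document.body.scrollHeight")
            if new_height == last_height:
                break
            last_height = new_height
        text = driver.find_element(By.TAG_NAME, "body").text
    finally:
        driver.quit()  # always release the browser process

    return text.strip()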
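The scroll loop is the part most worth testing in isolation, since it decides when lazy-loaded content is considered complete. Here is a sketch that takes the height lookup and the scroll action as callables, so it can run against fakes in tests and against Selenium in production; scroll_until_stable and its max_scrolls cap are hypothetical, not part of the project.

```python
import time
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_to_bottom: Callable[[], None],
    pause: float = 1.0,
    max_scrolls: int = 20,
) -> int:
    """Scroll until the page height stops changing; return how many scrolls were made."""
    last_height = get_height()
    for count in range(1, max_scrolls + 1):
        scroll_to_bottom()
        time.sleep(pause)  # let lazy-loaded content arrive
        new_height = get_height()
        if new_height == last_height:
            return count  # height is stable: assume all content has loaded
        last_height = new_height
    return max_scrolls  # gave up: the page keeps growing (e.g. an infinite feed)
```

Inside _dynamic_fetch, get_height would wrap driver.execute_script("return document.body.scrollHeight") and scroll_to_bottom would wrap the window.scrollTo call.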

✅ Conclusion and Results

  • Automatically generating detailed resumes with the Gemini API greatly improved the user experience
  • Combining systematic prompt engineering with a web crawling strategy improved the accuracy and quality of the generated resumes
  • Building on the Gemini API and external web content made the system easier to extend and maintain

🚀 Future Directions

  • Optimize API calls by introducing a more sophisticated caching strategy
  • Improve crawling to handle a wider variety of web content effectively
  • Further tune the prompt and incorporate user feedback to raise output quality