<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xsl" href="rss.xsl"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Braincache - Lab</title>
        <link>https://teddynote-lab.github.io/brain-cache/lab</link>
        <description>직접 실험하고 리서치한 결과물</description>
        <lastBuildDate>Fri, 20 Mar 2026 00:00:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>ko</language>
        <item>
            <title><![CDATA[The Effect of Dynamic Date Injection Methods on LLM Temporal Reasoning across Deictic Expression Granularities]]></title>
            <link>https://teddynote-lab.github.io/brain-cache/lab/The Effect of Dynamic Date Injection Methods on LLM Temporal Reasoning across Deictic Expression Granularities</link>
            <guid>https://teddynote-lab.github.io/brain-cache/lab/The Effect of Dynamic Date Injection Methods on LLM Temporal Reasoning across Deictic Expression Granularities</guid>
            <pubDate>Fri, 20 Mar 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[LLM은 "어제", "다음 주"와 같은 상대적 시간 표현을 해석할 때 현재 날짜를 알 수 없어 날짜 주입이 필수입니다. 320회의 실험 결과, **한국어 형식(`2025년 3월 19일`) + User Prompt 조합**이 Simple/Structured 모두에서 95]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="tldr">TL;DR<a href="https://teddynote-lab.github.io/brain-cache/lab/The%20Effect%20of%20Dynamic%20Date%20Injection%20Methods%20on%20LLM%20Temporal%20Reasoning%20across%20Deictic%20Expression%20Granularities#tldr" class="hash-link" aria-label="TL;DR에 대한 직접 링크" title="TL;DR에 대한 직접 링크" translate="no">​</a></h2>
<blockquote>
<p>LLM은 "어제", "다음 주"와 같은 상대적 시간 표현을 해석할 때 현재 날짜를 알 수 없어 날짜 주입이 필수입니다. 320회의 실험 결과, <strong>한국어 형식(<code>2025년 3월 19일</code>) + User Prompt 조합</strong>이 Simple/Structured 모두에서 95% 정확도를 달성했으며, 날짜 미주입 시 15%에 불과했던 성능이 최대 95%까지 향상되었습니다. 특히 Week granularity(다음 주 월요일 등)는 요일 정보 포함 시 40%→80% 개선되며, gpt-4o 사용 시 모든 시간 단위에서 100% 정확도를 보였습니다.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="key-takeaways">Key Takeaways<a href="https://teddynote-lab.github.io/brain-cache/lab/The%20Effect%20of%20Dynamic%20Date%20Injection%20Methods%20on%20LLM%20Temporal%20Reasoning%20across%20Deictic%20Expression%20Granularities#key-takeaways" class="hash-link" aria-label="Key Takeaways에 대한 직접 링크" title="Key Takeaways에 대한 직접 링크" translate="no">​</a></h2>
<ul>
<li class=""><strong>날짜 주입은 선택이 아닌 필수</strong>: 날짜 정보 없이는 Day/Week/Month granularity에서 0% 정확도를 기록하며, Year 단위 상식 문제만 60% 수준으로 부분 정답 가능</li>
<li class=""><strong>한국어 질의에는 한국어 날짜 형식 사용</strong>: <code>현재 날짜: 2025년 3월 19일</code> 형식이 Structured Output에서 English 대비 +15%p 우위(95% vs 80%)를 보이며 응답 방식에 관계없이 안정적</li>
<li class=""><strong>User Prompt가 System Prompt보다 효과적</strong>: 질의와 가까운 위치에 날짜를 배치하면 Simple Response에서 +3.3%p 성능 향상(95.0% vs 91.7%)</li>
<li class=""><strong>Week granularity가 가장 어렵다</strong>: "다음 주 월요일" 같은 표현은 현재 요일 인식→주 경계 판단→날짜 계산의 3단계 추론이 필요하며, 한국어 요일 정보 추가 시 40%→80% 개선</li>
<li class=""><strong>과도한 정보는 오히려 역효과</strong>: 주말 설명 등 불필요한 부가 정보는 LLM의 추론을 방해하여 5~10%p 성능 하락 유발</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="상세-내용">상세 내용<a href="https://teddynote-lab.github.io/brain-cache/lab/The%20Effect%20of%20Dynamic%20Date%20Injection%20Methods%20on%20LLM%20Temporal%20Reasoning%20across%20Deictic%20Expression%20Granularities#%EC%83%81%EC%84%B8-%EB%82%B4%EC%9A%A9" class="hash-link" aria-label="상세 내용에 대한 직접 링크" title="상세 내용에 대한 직접 링크" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="배경-왜-날짜-프롬프트가-필요한가">배경: 왜 날짜 프롬프트가 필요한가<a href="https://teddynote-lab.github.io/brain-cache/lab/The%20Effect%20of%20Dynamic%20Date%20Injection%20Methods%20on%20LLM%20Temporal%20Reasoning%20across%20Deictic%20Expression%20Granularities#%EB%B0%B0%EA%B2%BD-%EC%99%9C-%EB%82%A0%EC%A7%9C-%ED%94%84%EB%A1%AC%ED%94%84%ED%8A%B8%EA%B0%80-%ED%95%84%EC%9A%94%ED%95%9C%EA%B0%80" class="hash-link" aria-label="배경: 왜 날짜 프롬프트가 필요한가에 대한 직접 링크" title="배경: 왜 날짜 프롬프트가 필요한가에 대한 직접 링크" translate="no">​</a></h3>
<p>LLM은 학습 데이터의 시점에 고정되어 있어 "지금이 언제인지" 스스로 알 수 없습니다. 따라서 "어제", "지난주", "다음 달"과 같은 **직시 표현(Deictic Expression)**을 해석할 때 현재 날짜를 기준점으로 제공해야 정확한 날짜 변환이 가능합니다.</p>
<p>600회의 추론 실험(gpt-4o-mini 기준)에서 날짜 주입 유무에 따른 성능 차이는 다음과 같습니다:</p>





























<table><thead><tr><th>조건</th><th>Accuracy</th><th>Day</th><th>Week</th><th>Month</th><th>Year</th></tr></thead><tbody><tr><td>날짜 주입 없음</td><td><strong>15%</strong></td><td>0%</td><td>0%</td><td>0%</td><td>60%</td></tr><tr><td>날짜 주입 있음</td><td><strong>95%</strong></td><td>100%</td><td>100%</td><td>100%</td><td>80%</td></tr></tbody></table>
<p>날짜 주입 없이는 "작년 크리스마스"와 같은 Year granularity 상식 문제만 부분 정답(60%)이 가능하며, 실시간 계산이 필요한 Day/Week/Month는 전부 0%로 시간 추론 자체가 불가능합니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="문제-상황-4가지-설계-변수의-영향">문제 상황: 4가지 설계 변수의 영향<a href="https://teddynote-lab.github.io/brain-cache/lab/The%20Effect%20of%20Dynamic%20Date%20Injection%20Methods%20on%20LLM%20Temporal%20Reasoning%20across%20Deictic%20Expression%20Granularities#%EB%AC%B8%EC%A0%9C-%EC%83%81%ED%99%A9-4%EA%B0%80%EC%A7%80-%EC%84%A4%EA%B3%84-%EB%B3%80%EC%88%98%EC%9D%98-%EC%98%81%ED%96%A5" class="hash-link" aria-label="문제 상황: 4가지 설계 변수의 영향에 대한 직접 링크" title="문제 상황: 4가지 설계 변수의 영향에 대한 직접 링크" translate="no">​</a></h3>
<p>날짜 프롬프트 설계 시 고려해야 할 4가지 변수와 각각의 성능 영향을 실험으로 검증했습니다.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-prompt-position-어디에-넣을-것인가">1. Prompt Position: 어디에 넣을 것인가<a href="https://teddynote-lab.github.io/brain-cache/lab/The%20Effect%20of%20Dynamic%20Date%20Injection%20Methods%20on%20LLM%20Temporal%20Reasoning%20across%20Deictic%20Expression%20Granularities#1-prompt-position-%EC%96%B4%EB%94%94%EC%97%90-%EB%84%A3%EC%9D%84-%EA%B2%83%EC%9D%B8%EA%B0%80" class="hash-link" aria-label="1. Prompt Position: 어디에 넣을 것인가에 대한 직접 링크" title="1. Prompt Position: 어디에 넣을 것인가에 대한 직접 링크" translate="no">​</a></h4>




















<table><thead><tr><th>Position</th><th>Simple ACC</th><th>Structured ACC</th></tr></thead><tbody><tr><td>System Prompt</td><td>91.7%</td><td>85.0%</td></tr><tr><td><strong>User Prompt</strong></td><td><strong>95.0%</strong></td><td><strong>85.0%</strong></td></tr></tbody></table>
<p>User Prompt에 날짜 정보를 배치하면 질의와 가까운 위치에서 참조 효율이 높아져 Simple Response에서 +3.3%p 우위를 보였으며, Structured Output에서는 동일하므로 선택에 따른 손해가 없습니다.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-expression-format-어떤-형식으로-넣을-것인가">2. Expression Format: 어떤 형식으로 넣을 것인가<a href="https://teddynote-lab.github.io/brain-cache/lab/The%20Effect%20of%20Dynamic%20Date%20Injection%20Methods%20on%20LLM%20Temporal%20Reasoning%20across%20Deictic%20Expression%20Granularities#2-expression-format-%EC%96%B4%EB%96%A4-%ED%98%95%EC%8B%9D%EC%9C%BC%EB%A1%9C-%EB%84%A3%EC%9D%84-%EA%B2%83%EC%9D%B8%EA%B0%80" class="hash-link" aria-label="2. Expression Format: 어떤 형식으로 넣을 것인가에 대한 직접 링크" title="2. Expression Format: 어떤 형식으로 넣을 것인가에 대한 직접 링크" translate="no">​</a></h4>
<p>테스트한 3가지 형식의 성능 비교:</p>

































<table><thead><tr><th>Format</th><th>예시</th><th>Simple</th><th>Structured</th><th>평균</th></tr></thead><tbody><tr><td><strong>Korean</strong></td><td><code>현재 날짜: 2025년 3월 19일</code></td><td>92.5%</td><td><strong>95.0%</strong></td><td><strong>93.8%</strong></td></tr><tr><td>English</td><td><code>Current date: March 19th, 2025</code></td><td><strong>95.0%</strong></td><td>80.0%</td><td>87.5%</td></tr><tr><td>DayOfWeek</td><td><code>Current date: 2025-03-19, Wed</code></td><td>92.5%</td><td>80.0%</td><td>86.3%</td></tr></tbody></table>
<p><strong>Korean 형식이 가장 안정적</strong>입니다. Simple Response에서는 3개 format 간 차이가 미미(92.5~95.0%)하지만, <strong>Structured Output에서 Korean(95%)이 English/DayOfWeek(80%)를 크게 압도</strong>합니다. 이는 한국어 질의에 한국어 날짜 표현을 사용할 때 토큰 정렬이 자연스럽게 이루어지기 때문으로 추정됩니다.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-날짜-컨텍스트-상세도-얼마나-많은-정보를-넣을-것인가">3. 날짜 컨텍스트 상세도: 얼마나 많은 정보를 넣을 것인가<a href="https://teddynote-lab.github.io/brain-cache/lab/The%20Effect%20of%20Dynamic%20Date%20Injection%20Methods%20on%20LLM%20Temporal%20Reasoning%20across%20Deictic%20Expression%20Granularities#3-%EB%82%A0%EC%A7%9C-%EC%BB%A8%ED%85%8D%EC%8A%A4%ED%8A%B8-%EC%83%81%EC%84%B8%EB%8F%84-%EC%96%BC%EB%A7%88%EB%82%98-%EB%A7%8E%EC%9D%80-%EC%A0%95%EB%B3%B4%EB%A5%BC-%EB%84%A3%EC%9D%84-%EA%B2%83%EC%9D%B8%EA%B0%80" class="hash-link" aria-label="3. 날짜 컨텍스트 상세도: 얼마나 많은 정보를 넣을 것인가에 대한 직접 링크" title="3. 날짜 컨텍스트 상세도: 얼마나 많은 정보를 넣을 것인가에 대한 직접 링크" translate="no">​</a></h4>

























<table><thead><tr><th>컨텍스트</th><th>내용</th><th>ACC</th></tr></thead><tbody><tr><td>A</td><td>날짜 + 시간 (<code>현재 날짜: 2025-03-19 (수요일) / 현재 시간: 14:00</code>)</td><td>85~90%</td></tr><tr><td>B</td><td>날짜 + 주간 달력 (이번 주/지난 주 전체 날짜 나열)</td><td>85~90%</td></tr><tr><td><strong>C</strong></td><td><strong>날짜만</strong> (<code>현재 날짜: 2025-03-19 (Wed)</code>)</td><td>80%</td></tr></tbody></table>
<p>날짜만 제공하고 영문 요일만 포함한 경우(C) Week granularity에서 <strong>40%까지 하락</strong>했습니다. A/B처럼 한국어 요일을 포함하면 Week에서 80%를 유지할 수 있으며, 주간 달력(B)은 정보량 대비 성능 향상이 미미했습니다. <strong>과도한 정보(주말 설명 등)를 추가하면 오히려 5~10%p 하락</strong>하므로 주의가 필요합니다.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-output-방식-응답을-어떤-형식으로-받을-것인가">4. Output 방식: 응답을 어떤 형식으로 받을 것인가<a href="https://teddynote-lab.github.io/brain-cache/lab/The%20Effect%20of%20Dynamic%20Date%20Injection%20Methods%20on%20LLM%20Temporal%20Reasoning%20across%20Deictic%20Expression%20Granularities#4-output-%EB%B0%A9%EC%8B%9D-%EC%9D%91%EB%8B%B5%EC%9D%84-%EC%96%B4%EB%96%A4-%ED%98%95%EC%8B%9D%EC%9C%BC%EB%A1%9C-%EB%B0%9B%EC%9D%84-%EA%B2%83%EC%9D%B8%EA%B0%80" class="hash-link" aria-label="4. Output 방식: 응답을 어떤 형식으로 받을 것인가에 대한 직접 링크" title="4. Output 방식: 응답을 어떤 형식으로 받을 것인가에 대한 직접 링크" translate="no">​</a></h4>




















<table><thead><tr><th>Output</th><th>ACC</th><th>Week ACC</th></tr></thead><tbody><tr><td><strong>Simple (텍스트)</strong></td><td><strong>95%</strong></td><td><strong>100%</strong></td></tr><tr><td>Structured (instructor)</td><td>85%</td><td>60%</td></tr></tbody></table>
<p>시간 추론에서는 Simple Response가 유리합니다. 전체 정확도에서 +10%p 차이가 있으며, <strong>핵심은 Week granularity</strong>(60% → 100%)입니다. Structured Output의 schema 강제가 추론 chain을 방해하는 것으로 분석되며, Structured가 필요한 경우 <strong>Korean format + User Prompt 조합으로 95%까지 보완 가능</strong>합니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="해결-과정-granularity별-난이도-분석">해결 과정: Granularity별 난이도 분석<a href="https://teddynote-lab.github.io/brain-cache/lab/The%20Effect%20of%20Dynamic%20Date%20Injection%20Methods%20on%20LLM%20Temporal%20Reasoning%20across%20Deictic%20Expression%20Granularities#%ED%95%B4%EA%B2%B0-%EA%B3%BC%EC%A0%95-granularity%EB%B3%84-%EB%82%9C%EC%9D%B4%EB%8F%84-%EB%B6%84%EC%84%9D" class="hash-link" aria-label="해결 과정: Granularity별 난이도 분석에 대한 직접 링크" title="해결 과정: Granularity별 난이도 분석에 대한 직접 링크" translate="no">​</a></h3>
<p>LLM의 시간 추론 능력은 시간 단위에 따라 극적으로 달라집니다:</p>








































<table><thead><tr><th>순위</th><th>Granularity</th><th>ACC 범위</th><th>핵심 특성</th><th>오류 패턴</th></tr></thead><tbody><tr><td>1 (쉬움)</td><td><strong>Day</strong></td><td>100%</td><td>단순 ±N일 산술</td><td>오류 없음</td></tr><tr><td>2</td><td><strong>Month</strong></td><td>80~100%</td><td>월말/월초 계산</td><td>요일 역산에서 간헐적 오류</td></tr><tr><td>3</td><td><strong>Year</strong></td><td>60~100%</td><td>상식 + 요일 계산</td><td>먼 미래 요일 추론 실패</td></tr><tr><td>4 (어려움)</td><td><strong>Week</strong></td><td>40~100%</td><td>요일 기반 상대 계산</td><td>"다음 주 월요일" 등에서 ±1주 오류 빈번</td></tr></tbody></table>
<p><strong>Week Granularity가 가장 어려운 이유</strong>는 "다음 주 월요일" 같은 표현 해석 시 <strong>현재 요일 인식 → 주 경계 판단 → 날짜 계산</strong>의 3단계 추론이 필요하기 때문입니다. LLM이 "다음 주"의 경계를 잘못 판단하여 ±1주 오프셋 오류가 빈번하게 발생하며, Structured Output에서는 schema 제약이 이 추론 과정을 더욱 방해합니다.</p>
<p><strong>주간 달력을 프롬프트에 포함하거나 상위 모델을 사용하면 개선 가능</strong>합니다:</p>





























<table><thead><tr><th>모델</th><th>Day</th><th>Week</th><th>Month</th><th>Year</th><th>Overall</th></tr></thead><tbody><tr><td>gpt-4o-mini</td><td>100%</td><td>60%</td><td>100%</td><td>80%</td><td>85%</td></tr><tr><td>gpt-4o</td><td><strong>100%</strong></td><td><strong>100%</strong></td><td><strong>100%</strong></td><td><strong>100%</strong></td><td><strong>100%</strong></td></tr></tbody></table>
<p>gpt-4o는 모든 granularity에서 <strong>100%</strong> 정확도를 달성하여 모델 크기가 시간 추론에 직접적 영향을 미치는 것을 확인했습니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="결과-권장-프롬프트-템플릿">결과: 권장 프롬프트 템플릿<a href="https://teddynote-lab.github.io/brain-cache/lab/The%20Effect%20of%20Dynamic%20Date%20Injection%20Methods%20on%20LLM%20Temporal%20Reasoning%20across%20Deictic%20Expression%20Granularities#%EA%B2%B0%EA%B3%BC-%EA%B6%8C%EC%9E%A5-%ED%94%84%EB%A1%AC%ED%94%84%ED%8A%B8-%ED%85%9C%ED%94%8C%EB%A6%BF" class="hash-link" aria-label="결과: 권장 프롬프트 템플릿에 대한 직접 링크" title="결과: 권장 프롬프트 템플릿에 대한 직접 링크" translate="no">​</a></h3>
<p>실험 결과를 바탕으로 <strong>Best Practice: Korean + User Prompt</strong> 조합을 권장합니다:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># 기본 템플릿 (Simple: 95% / Structured: 95%)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">system_prompt </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token triple-quoted-string string" style="color:#e3116c">"""</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">사용자의 질문에 정확하게 답변하세요.</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">user_prompt_template </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token triple-quoted-string string" style="color:#e3116c">"""</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">현재 날짜: {year}년 {month}월 {day}일</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">사용자가 '오늘', '어제', '그저께', '금주', '지난주', '이번 달', '주말' 등</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">상대적 날짜 표현을 사용하면 위 현재 날짜를 기준으로 구체적인 날짜(YYYY-MM-DD)로 변환하세요.</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="display:inline-block;color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">{user_query}</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 사용 예시</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> datetime </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> datetime</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">now </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> datetime</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">now</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">user_query </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"지난주 금요일에 작성된 보고서를 찾아줘"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">user_prompt </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> user_prompt_template</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">format</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    year</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">now</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">year</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    month</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">now</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">month</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    day</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">now</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">day</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    user_query</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">user_query</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p><strong>Week 정확도가 중요한 경우 주간 달력 추가</strong>:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Week granularity 강화 템플릿 (Week: 80% → 100% for gpt-4o)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> datetime </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> datetime</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> timedelta</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">get_week_calendar</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">reference_date</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># 이번 주 월요일 찾기</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    this_monday </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> reference_date </span><span class="token operator" style="color:#393A34">-</span><span class="token plain"> timedelta</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">days</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">reference_date</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">weekday</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># 지난 주 월요일</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    last_monday </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> this_monday </span><span class="token operator" style="color:#393A34">-</span><span class="token plain"> timedelta</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">days</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">7</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    days_kr </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">'월요일'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'화요일'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'수요일'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'목요일'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'금요일'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'토요일'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'일요일'</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    this_week </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    last_week </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> i </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token builtin">range</span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">7</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        this_day </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> this_monday </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> timedelta</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">days</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">i</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        last_day </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> last_monday </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> timedelta</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">days</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">i</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        this_week</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">append</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">days_kr</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation interpolation">i</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">(</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">this_day</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">strftime</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">(</span><span class="token string-interpolation interpolation string" style="color:#e3116c">'%Y-%m-%d'</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">)</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">)"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        last_week</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">append</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">days_kr</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation interpolation">i</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">(</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">last_day</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">strftime</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">(</span><span class="token string-interpolation interpolation string" style="color:#e3116c">'%Y-%m-%d'</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">)</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">)"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> this_week</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> last_week</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">now </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> datetime</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">now</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">this_week</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> last_week </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> get_week_calendar</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">now</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">user_prompt_with_calendar </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:#e3116c">f"""</span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="color:#e3116c">현재 날짜: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">now</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">year</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">년 </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">now</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">month</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">월 </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">now</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">day</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">일 (</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation interpolation string" style="color:#e3116c">'월요일'</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">,</span><span class="token string-interpolation interpolation string" style="color:#e3116c">'화요일'</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">,</span><span class="token string-interpolation interpolation string" style="color:#e3116c">'수요일'</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">,</span><span class="token string-interpolation interpolation string" style="color:#e3116c">'목요일'</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">,</span><span class="token string-interpolation interpolation string" style="color:#e3116c">'금요일'</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">,</span><span class="token string-interpolation interpolation string" style="color:#e3116c">'토요일'</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">,</span><span class="token string-interpolation interpolation string" style="color:#e3116c">'일요일'</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation interpolation">now</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">weekday</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">(</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">)</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">)</span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="color:#e3116c">- 이번 주: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation string" style="color:#e3116c">', '</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">join</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">(</span><span class="token string-interpolation interpolation">this_week</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">)</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="color:#e3116c">- 지난 주: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation string" style="color:#e3116c">', '</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">join</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">(</span><span class="token string-interpolation interpolation">last_week</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">)</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="display:inline-block;color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="color:#e3116c">사용자가 '오늘', '어제', '그저께', '금주', '지난주', '이번 달', '주말' 등</span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="color:#e3116c">상대적 날짜 표현을 사용하면 위 현재 날짜를 기준으로 구체적인 날짜(YYYY-MM-DD)로 변환하세요.</span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="display:inline-block;color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="color:#e3116c"></span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">user_query</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="color:#e3116c">"""</span><br></span></code></pre></div></div>
<p><strong>피해야 할 안티패턴</strong>:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># ❌ 안티패턴 1: 날짜 정보 미주입</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">user_prompt_bad1 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token triple-quoted-string string" style="color:#e3116c">"""</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">사용자의 상대적 날짜 표현을 절대 날짜로 변환하세요.</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">{user_query}</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># → Day/Week/Month 0%, 전체 15%</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># ❌ 안티패턴 2: 영문 날짜 + Structured Output</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">user_prompt_bad2 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token triple-quoted-string string" style="color:#e3116c">"""</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">Current date: March 19th, 2025</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">Convert relative date expressions to absolute dates.</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">{user_query}</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># + Structured Output → 80% (Week 40~60%)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># ❌ 안티패턴 3: 과도한 부가 설명</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">user_prompt_bad3 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token triple-quoted-string string" style="color:#e3116c">"""</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">현재 날짜: 2025년 3월 19일</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">주말은 토요일과 일요일을 의미합니다.</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">월말은 매월 마지막 날을 의미합니다.</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">...</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">{user_query}</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># → -5~10%p 하락</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># ❌ 안티패턴 4: 날짜만 (요일 없이)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">user_prompt_bad4 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token triple-quoted-string string" style="color:#e3116c">"""</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">Current date: 2025-03-19</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">{user_query}</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># → Week 40%</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="의사결정-가이드">의사결정 가이드<a href="https://teddynote-lab.github.io/brain-cache/lab/The%20Effect%20of%20Dynamic%20Date%20Injection%20Methods%20on%20LLM%20Temporal%20Reasoning%20across%20Deictic%20Expression%20Granularities#%EC%9D%98%EC%82%AC%EA%B2%B0%EC%A0%95-%EA%B0%80%EC%9D%B4%EB%93%9C" class="hash-link" aria-label="의사결정 가이드에 대한 직접 링크" title="의사결정 가이드에 대한 직접 링크" translate="no">​</a></h3>
<p>프로젝트 상황에 따른 날짜 프롬프트 설계 의사결정 플로우:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">1. 날짜 주입이 있는가?</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   └─ NO  → 반드시 추가 (없으면 15%)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   └─ YES → 다음 단계</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">2. 질의가 한국어인가?</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   └─ YES → Korean format 사용 ("2025년 3월 19일")</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   └─ NO  → English format 사용 ("March 19th, 2025")</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">3. Structured Output이 필요한가?</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   └─ NO  → Simple 사용 (최고 성능)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   └─ YES → 반드시 Korean format + User Prompt 조합 (95% 보장)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">4. Week 추론 정확도가 critical한가?</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   └─ NO  → 기본 템플릿으로 충분</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   └─ YES → 주간 달력 추가 또는 gpt-4o 사용 고려</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">5. 전체 정확도 100%가 필요한가?</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   └─ NO  → gpt-4o-mini + 최적 프롬프트 (95%)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   └─ YES → gpt-4o 사용 (100%)</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="종합-권장-설정">종합 권장 설정<a href="https://teddynote-lab.github.io/brain-cache/lab/The%20Effect%20of%20Dynamic%20Date%20Injection%20Methods%20on%20LLM%20Temporal%20Reasoning%20across%20Deictic%20Expression%20Granularities#%EC%A2%85%ED%95%A9-%EA%B6%8C%EC%9E%A5-%EC%84%A4%EC%A0%95" class="hash-link" aria-label="종합 권장 설정에 대한 직접 링크" title="종합 권장 설정에 대한 직접 링크" translate="no">​</a></h3>





















































<table><thead><tr><th>항목</th><th>권장값</th><th>대안</th><th>근거</th></tr></thead><tbody><tr><td>Time Injection</td><td><strong>필수</strong></td><td>-</td><td>미주입 시 15%</td></tr><tr><td>Expression Format</td><td><strong>Korean</strong></td><td>English (Simple 한정)</td><td>Structured에서 +15%p 차이</td></tr><tr><td>Prompt Position</td><td><strong>User Prompt</strong></td><td>System Prompt (동등)</td><td>Simple에서 +3.3%p</td></tr><tr><td>Output 방식</td><td><strong>Simple</strong></td><td>Structured (Korean+User 시 95%)</td><td>전체 +10%p, Week +40%p</td></tr><tr><td>날짜 컨텍스트</td><td><strong>날짜 + 한국어 요일</strong></td><td>주간 달력 (Week 중요 시)</td><td>Week ACC 40→80%</td></tr><tr><td>부가 설명</td><td><strong>불필요</strong></td><td>-</td><td>추가 시 오히려 하락</td></tr><tr><td>모델</td><td><strong>gpt-4o</strong> (정확도 우선)</td><td>gpt-4o-mini (비용 우선)</td><td>100% vs 85~95%</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="실험-환경-및-데이터">실험 환경 및 데이터<a href="https://teddynote-lab.github.io/brain-cache/lab/The%20Effect%20of%20Dynamic%20Date%20Injection%20Methods%20on%20LLM%20Temporal%20Reasoning%20across%20Deictic%20Expression%20Granularities#%EC%8B%A4%ED%97%98-%ED%99%98%EA%B2%BD-%EB%B0%8F-%EB%8D%B0%EC%9D%B4%ED%84%B0" class="hash-link" aria-label="실험 환경 및 데이터에 대한 직접 링크" title="실험 환경 및 데이터에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>실험 구성</strong>:</p>
<ul>
<li class="">Target LLM: <code>gpt-4o-mini</code> (baseline), <code>gpt-4o</code> (비교)</li>
<li class="">Temperature: <code>0.0</code></li>
<li class="">기준 날짜: 2025-03-19 (Wednesday)</li>
<li class="">테스트 쿼리: 20개 (Day 5 + Week 5 + Month 5 + Year 5)</li>
<li class="">평가 메트릭: Accuracy — Include Match (정답 날짜가 응답에 포함되는지 판단)</li>
<li class="">총 추론 수: 600개 (Baseline 320 + Expression Format 280)</li>
</ul>
<p><strong>상세 실험 데이터</strong>:</p>
<ul>
<li class=""><code>BASELINE_RESULTS.md</code>: 날짜 컨텍스트 3종, seed/temp/output/model 비교 (320개 레코드)</li>
<li class=""><code>EXPERIMENT_RESULTS.md</code>: Expression Format × Injection Position 7-case (280개 레코드)</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="references">References<a href="https://teddynote-lab.github.io/brain-cache/lab/The%20Effect%20of%20Dynamic%20Date%20Injection%20Methods%20on%20LLM%20Temporal%20Reasoning%20across%20Deictic%20Expression%20Granularities#references" class="hash-link" aria-label="References에 대한 직접 링크" title="References에 대한 직접 링크" translate="no">​</a></h2>
<ul>
<li class=""><a href="https://arxiv.org/abs/2505.16088" target="_blank" rel="noopener noreferrer" class="">Date Fragments: A Hidden Bottleneck of Tokenization for Temporal Reasoning</a> - BPE 토크나이저의 날짜 분할 문제와 LLM의 날짜 추상화 메커니즘 분석 (arXiv 2505.16088, 2025)</li>
<li class=""><a href="https://www.tdcommons.org/cgi/viewcontent.cgi?article=9885&amp;context=dpubs_series" target="_blank" rel="noopener noreferrer" class="">Automatic Integration of Temporal Context for LLM</a> - LLM을 위한 시간 컨텍스트 자동 통합 방법론 (Defensive Publication, 2025)</li>
<li class=""><a href="https://arxiv.org/abs/2510.02340" target="_blank" rel="noopener noreferrer" class="">Can Prompts Rewind Time for LLMs?</a> - 프롬프트를 통한 LLM의 시간 인식 개선 연구 (Gao et al., EMNLP 2025)</li>
<li class=""><a href="https://github.com/teddynote-lab/RAG-Research-Space/tree/main/deitic_expression_granularity" target="_blank" rel="noopener noreferrer" class="">Deictic Temporal Expression Granularity 실험 Repository</a> - 본 연구의 전체 실험 코드 및 데이터셋</li>
</ul>]]></content:encoded>
            <category>Agent</category>
        </item>
        <item>
            <title><![CDATA[Docker Log Monitor vs Sentry 비교 분석]]></title>
            <link>https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-비교-분석</link>
            <guid>https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-비교-분석</guid>
            <pubDate>Tue, 27 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Docker Log Monitor는 설치가 간단하고 비용이 들지 않으며 코드 수정 없이 즉시 사용 가능한 반면, Sentry는 풍부한 에러 컨텍스트와 분석 도구를 제공하지만 SDK 통합과 비용이 필요합니다. LG Electronics Agent 프로젝트의 경우, 이미 ]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="tldr">TL;DR<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#tldr" class="hash-link" aria-label="TL;DR에 대한 직접 링크" title="TL;DR에 대한 직접 링크" translate="no">​</a></h2>
<blockquote>
<p>Docker Log Monitor는 설치가 간단하고 비용이 들지 않으며 코드 수정 없이 즉시 사용 가능한 반면, Sentry는 풍부한 에러 컨텍스트와 분석 도구를 제공하지만 SDK 통합과 비용이 필요합니다. LG Electronics Agent 프로젝트의 경우, 이미 구현된 Docker Log Monitor만으로도 현재 요구사항을 충족하며, 프로젝트 규모 확장 시 Sentry 추가를 고려하는 점진적 접근이 가장 효율적입니다.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="key-takeaways">Key Takeaways<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#key-takeaways" class="hash-link" aria-label="Key Takeaways에 대한 직접 링크" title="Key Takeaways에 대한 직접 링크" translate="no">​</a></h2>
<ul>
<li class="">
<p><strong>비용 vs 기능의 트레이드오프</strong>: 초기 단계나 예산이 제한적인 프로젝트에서는 Docker Log Monitor가 비용 효율적이며, 상세한 디버깅과 팀 협업이 중요한 프로덕션 환경에서는 Sentry의 추가 비용이 정당화됩니다.</p>
</li>
<li class="">
<p><strong>레거시 시스템에는 비침투적 모니터링이 유리</strong>: Docker Log Monitor는 애플리케이션 코드 수정 없이 로그 스트림만으로 작동하므로, 레거시 시스템이나 코드 변경이 어려운 환경에서 즉시 적용 가능합니다.</p>
</li>
<li class="">
<p><strong>프라이버시와 데이터 주권이 중요한 경우 자체 호스팅 우선</strong>: 민감한 데이터를 다루거나 GDPR 등 규제 준수가 필요한 경우, 모든 데이터를 자체 서버에 보관하는 Docker Log Monitor가 더 안전한 선택입니다.</p>
</li>
<li class="">
<p><strong>하이브리드 접근법으로 점진적 확장</strong>: 초기에는 Docker Log Monitor로 시작해 기본 모니터링을 확보하고, 프로젝트가 성장하면서 필요에 따라 Sentry를 추가하는 전략이 위험을 최소화하며 투자 효율을 높입니다.</p>
</li>
<li class="">
<p><strong>요구사항에 맞는 도구 선택이 핵심</strong>: "더 많은 기능 = 더 좋은 솔루션"이 아니며, 프로젝트의 현재 단계, 팀 규모, 디버깅 복잡도를 고려한 적절한 도구 선택이 중요합니다.</p>
</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="상세-내용">상세 내용<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#%EC%83%81%EC%84%B8-%EB%82%B4%EC%9A%A9" class="hash-link" aria-label="상세 내용에 대한 직접 링크" title="상세 내용에 대한 직접 링크" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="배경-모니터링-솔루션-선택의-딜레마">배경: 모니터링 솔루션 선택의 딜레마<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#%EB%B0%B0%EA%B2%BD-%EB%AA%A8%EB%8B%88%ED%84%B0%EB%A7%81-%EC%86%94%EB%A3%A8%EC%85%98-%EC%84%A0%ED%83%9D%EC%9D%98-%EB%94%9C%EB%A0%88%EB%A7%88" class="hash-link" aria-label="배경: 모니터링 솔루션 선택의 딜레마에 대한 직접 링크" title="배경: 모니터링 솔루션 선택의 딜레마에 대한 직접 링크" translate="no">​</a></h3>
<p>LG Electronics Agent 프로젝트에서 FastAPI 기반 웹 애플리케이션의 에러 모니터링 시스템을 구축하는 과정에서, 커스텀 Docker 로그 모니터링 솔루션과 업계 표준인 Sentry 사이의 선택이 필요했습니다. 이미 Docker Log Monitor를 구현하여 작동 중이었지만, Sentry의 강력한 기능들을 고려할 때 어떤 방향이 프로젝트에 최적인지 검증이 필요한 상황이었습니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="문제-상황-과한-도구-vs-충분한-도구">문제 상황: 과한 도구 vs 충분한 도구<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#%EB%AC%B8%EC%A0%9C-%EC%83%81%ED%99%A9-%EA%B3%BC%ED%95%9C-%EB%8F%84%EA%B5%AC-vs-%EC%B6%A9%EB%B6%84%ED%95%9C-%EB%8F%84%EA%B5%AC" class="hash-link" aria-label="문제 상황: 과한 도구 vs 충분한 도구에 대한 직접 링크" title="문제 상황: 과한 도구 vs 충분한 도구에 대한 직접 링크" translate="no">​</a></h3>
<p>많은 개발팀이 "업계 표준"이라는 이유로 Sentry 같은 고급 도구를 도입하지만, 실제로는 다음과 같은 문제에 직면합니다:</p>
<ol>
<li class=""><strong>불필요한 복잡도</strong>: SDK 통합, 설정 관리, 팀 온보딩에 상당한 시간 투자</li>
<li class=""><strong>비용 압박</strong>: 무료 플랜(월 5,000 이벤트)을 초과하면 월 $26부터 시작하는 유료 플랜 필요</li>
<li class=""><strong>데이터 프라이버시 우려</strong>: 모든 에러 데이터가 외부 서비스로 전송</li>
<li class=""><strong>과도한 기능</strong>: 초기 단계 프로젝트에는 대시보드, 에러 집계, 트렌드 분석 등이 과할 수 있음</li>
</ol>
<p>반면 Docker Log Monitor는 이미 작동 중이었지만, "너무 간단한 것은 아닐까?"라는 의구심이 있었습니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="해결-과정-체계적-비교-분석">해결 과정: 체계적 비교 분석<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#%ED%95%B4%EA%B2%B0-%EA%B3%BC%EC%A0%95-%EC%B2%B4%EA%B3%84%EC%A0%81-%EB%B9%84%EA%B5%90-%EB%B6%84%EC%84%9D" class="hash-link" aria-label="해결 과정: 체계적 비교 분석에 대한 직접 링크" title="해결 과정: 체계적 비교 분석에 대한 직접 링크" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-정량적-비교-프레임워크-구축">1. 정량적 비교 프레임워크 구축<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#1-%EC%A0%95%EB%9F%89%EC%A0%81-%EB%B9%84%EA%B5%90-%ED%94%84%EB%A0%88%EC%9E%84%EC%9B%8C%ED%81%AC-%EA%B5%AC%EC%B6%95" class="hash-link" aria-label="1. 정량적 비교 프레임워크 구축에 대한 직접 링크" title="1. 정량적 비교 프레임워크 구축에 대한 직접 링크" translate="no">​</a></h4>
<p>12개 카테고리, 30개 이상의 평가 항목으로 구성된 비교표를 작성하여 주관적 판단을 최소화했습니다:</p>








































<table><thead><tr><th>평가 영역</th><th>Docker Log Monitor</th><th>Sentry</th></tr></thead><tbody><tr><td>설치/설정 용이성</td><td>5/5</td><td>2/5</td></tr><tr><td>비용 효율성</td><td>5/5</td><td>3/5</td></tr><tr><td>에러 분석 기능</td><td>2/5</td><td>5/5</td></tr><tr><td>커스터마이징</td><td>5/5</td><td>3/5</td></tr><tr><td>프라이버시/보안</td><td>5/5</td><td>3/5</td></tr><tr><td><strong>총점</strong></td><td><strong>27/35</strong></td><td><strong>26/35</strong></td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-사용-시나리오별-적합도-분석">2. 사용 시나리오별 적합도 분석<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#2-%EC%82%AC%EC%9A%A9-%EC%8B%9C%EB%82%98%EB%A6%AC%EC%98%A4%EB%B3%84-%EC%A0%81%ED%95%A9%EB%8F%84-%EB%B6%84%EC%84%9D" class="hash-link" aria-label="2. 사용 시나리오별 적합도 분석에 대한 직접 링크" title="2. 사용 시나리오별 적합도 분석에 대한 직접 링크" translate="no">​</a></h4>
<p>프로젝트의 특성에 따라 적합한 도구가 달라진다는 것을 발견했습니다:</p>
<p><strong>Docker Log Monitor가 유리한 경우:</strong></p>
<ul>
<li class="">빠른 프로토타이핑 단계</li>
<li class="">레거시 시스템 (코드 수정 불가)</li>
<li class="">민감한 데이터 처리 (프라이버시 중요)</li>
<li class="">비용 제약이 있는 경우</li>
<li class="">단순한 에러 감지만 필요한 경우</li>
</ul>
<p><strong>Sentry가 유리한 경우:</strong></p>
<ul>
<li class="">복잡한 버그의 빠른 해결 필요</li>
<li class="">팀 협업 및 대시보드 공유 중요</li>
<li class="">에러 트렌드 분석 필요</li>
<li class="">성능 모니터링 필요</li>
<li class="">이슈 트래킹 시스템(Jira 등) 연동 필요</li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-의사결정-맥락-분석">3. 의사결정 맥락 분석<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#3-%EC%9D%98%EC%82%AC%EA%B2%B0%EC%A0%95-%EB%A7%A5%EB%9D%BD-%EB%B6%84%EC%84%9D" class="hash-link" aria-label="3. 의사결정 맥락 분석에 대한 직접 링크" title="3. 의사결정 맥락 분석에 대한 직접 링크" translate="no">​</a></h4>
<p>LG Electronics Agent 프로젝트의 현재 상황:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">✅ 이미 Docker Log Monitor 구현 완료 및 작동 중</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">✅ FastAPI의 구조화된 로그로 충분한 에러 정보 수집 중</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">✅ Slack 알림을 통한 실시간 대응 체계 구축됨</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">✅ Dev/Prod 환경 구분 기능 포함</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">⚠️ 팀 규모와 프로젝트 복잡도가 아직 초기 단계</span><br></span></code></pre></div></div>
<p>이러한 맥락에서 Sentry 도입은 다음과 같은 이유로 "과한" 선택이었습니다:</p>
<ol>
<li class=""><strong>설정 비용 &gt; 얻는 가치</strong>: SDK 통합에 소요되는 시간 대비 추가 이득이 제한적</li>
<li class=""><strong>기존 시스템으로 충분</strong>: FastAPI 로그에서 스택 트레이스를 포함한 대부분의 디버깅 정보 제공</li>
<li class=""><strong>불필요한 의존성</strong>: 외부 서비스 의존으로 인한 잠재적 리스크</li>
</ol>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="의사결정-과정-왜-docker-log-monitor를-선택했는가">의사결정 과정: 왜 Docker Log Monitor를 선택했는가<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#%EC%9D%98%EC%82%AC%EA%B2%B0%EC%A0%95-%EA%B3%BC%EC%A0%95-%EC%99%9C-docker-log-monitor%EB%A5%BC-%EC%84%A0%ED%83%9D%ED%96%88%EB%8A%94%EA%B0%80" class="hash-link" aria-label="의사결정 과정: 왜 Docker Log Monitor를 선택했는가에 대한 직접 링크" title="의사결정 과정: 왜 Docker Log Monitor를 선택했는가에 대한 직접 링크" translate="no">​</a></h3>
<p>다음과 같은 근거로 현 단계에서는 Docker Log Monitor 유지를 결정했습니다:</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-zero-setup-cost-제로-셋업-비용">1. Zero Setup Cost (제로 셋업 비용)<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#1-zero-setup-cost-%EC%A0%9C%EB%A1%9C-%EC%85%8B%EC%97%85-%EB%B9%84%EC%9A%A9" class="hash-link" aria-label="1. Zero Setup Cost (제로 셋업 비용)에 대한 직접 링크" title="1. Zero Setup Cost (제로 셋업 비용)에 대한 직접 링크" translate="no">​</a></h4>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># 이미 작동 중인 시스템 - 추가 작업 불필요</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token function" style="color:#d73a49">docker-compose</span><span class="token plain"> up </span><span class="token parameter variable" style="color:#36acaa">-d</span><span class="token plain"> log-monitor  </span><span class="token comment" style="color:#999988;font-style:italic"># 끝</span><br></span></code></pre></div></div>
<p>반면 Sentry는 다음과 같은 작업이 필요:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># pip install sentry-sdk 필요</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> sentry_sdk</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">sentry_sdk</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">init</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    dsn</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"https://...@sentry.io/..."</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    traces_sample_rate</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">1.0</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    profiles_sample_rate</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">1.0</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 모든 FastAPI 엔드포인트에 추가 설정 필요</span><br></span></code></pre></div></div>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-fastapi-로그의-충분한-정보">2. FastAPI 로그의 충분한 정보<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#2-fastapi-%EB%A1%9C%EA%B7%B8%EC%9D%98-%EC%B6%A9%EB%B6%84%ED%95%9C-%EC%A0%95%EB%B3%B4" class="hash-link" aria-label="2. FastAPI 로그의 충분한 정보에 대한 직접 링크" title="2. FastAPI 로그의 충분한 정보에 대한 직접 링크" translate="no">​</a></h4>
<p>FastAPI는 기본적으로 매우 상세한 로그를 생성합니다:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># FastAPI 로그 예시</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ERROR</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">    Exception </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> ASGI application</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Traceback </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">most recent call last</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  File </span><span class="token string" style="color:#e3116c">"/app/main.py"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> line </span><span class="token number" style="color:#36acaa">45</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> process_request</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    result </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">await</span><span class="token plain"> service</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">execute</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">data</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">ValueError</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Invalid </span><span class="token builtin">input</span><span class="token plain"> </span><span class="token builtin">format</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 이미 포함된 정보:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># ✅ 스택 트레이스</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># ✅ 에러 타입 및 메시지</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># ✅ 발생 위치 (파일, 라인)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># ✅ 타임스탬프</span><br></span></code></pre></div></div>
<p>Docker Log Monitor는 이러한 로그를 정규표현식으로 파싱하여 효과적으로 감지합니다:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># docker-log-monitor의 패턴 매칭</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">error_patterns </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">r"ERROR"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">r"Exception"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">r"Traceback"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">r"500 Internal Server Error"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">]</span><br></span></code></pre></div></div>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-비용-효율성">3. 비용 효율성<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#3-%EB%B9%84%EC%9A%A9-%ED%9A%A8%EC%9C%A8%EC%84%B1" class="hash-link" aria-label="3. 비용 효율성에 대한 직접 링크" title="3. 비용 효율성에 대한 직접 링크" translate="no">​</a></h4>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">Docker Log Monitor:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">- 설치 비용: $0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">- 운영 비용: $0 (기존 서버 리소스 활용)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">- 유지보수 비용: 최소 (안정적으로 작동 중)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">- 총 비용: $0/월</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Sentry:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">- 무료 플랜: 5,000 이벤트/월 (제한적)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">- Team 플랜: $26/월 (기본)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">- Business 플랜: $80/월 (고급 기능)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">- 대규모 사용 시: 추가 비용 발생</span><br></span></code></pre></div></div>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-프라이버시와-데이터-통제">4. 프라이버시와 데이터 통제<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#4-%ED%94%84%EB%9D%BC%EC%9D%B4%EB%B2%84%EC%8B%9C%EC%99%80-%EB%8D%B0%EC%9D%B4%ED%84%B0-%ED%86%B5%EC%A0%9C" class="hash-link" aria-label="4. 프라이버시와 데이터 통제에 대한 직접 링크" title="4. 프라이버시와 데이터 통제에 대한 직접 링크" translate="no">​</a></h4>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">Docker Log Monitor: </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">┌─────────────┐</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">│ Application │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">└──────┬──────┘</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">       │ logs</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">       ▼</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">┌─────────────┐      ┌─────────┐</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">│ Log Monitor │─────▶│  Slack  │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">└─────────────┘      └─────────┘</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">(자체 서버)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Sentry:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">┌─────────────┐</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">│ Application │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">└──────┬──────┘</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">       │ SDK + 네트워크 요청</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">       ▼</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">┌─────────────┐      ┌─────────┐</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">│ Sentry.io   │─────▶│  Slack  │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">└─────────────┘      └─────────┘</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">(외부 서비스)</span><br></span></code></pre></div></div>
<p>모든 에러 데이터가 자체 서버에 남아 데이터 주권과 프라이버시를 완벽히 통제할 수 있습니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="결과-점진적-확장-전략">결과: 점진적 확장 전략<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#%EA%B2%B0%EA%B3%BC-%EC%A0%90%EC%A7%84%EC%A0%81-%ED%99%95%EC%9E%A5-%EC%A0%84%EB%9E%B5" class="hash-link" aria-label="결과: 점진적 확장 전략에 대한 직접 링크" title="결과: 점진적 확장 전략에 대한 직접 링크" translate="no">​</a></h3>
<p>최종적으로 다음과 같은 하이브리드 전략을 수립했습니다:</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="phase-1-현재-docker-log-monitor">Phase 1: 현재 (Docker Log Monitor)<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#phase-1-%ED%98%84%EC%9E%AC-docker-log-monitor" class="hash-link" aria-label="Phase 1: 현재 (Docker Log Monitor)에 대한 직접 링크" title="Phase 1: 현재 (Docker Log Monitor)에 대한 직접 링크" translate="no">​</a></h4>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># docker-compose.yml</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">services</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">log-monitor</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">image</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> teddy/docker</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">log</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">monitor</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">environment</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> SLACK_WEBHOOK_URL=$</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain">SLACK_WEBHOOK_URL</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> COOLDOWN_MINUTES=30</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">volumes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> /var/run/docker.sock</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">/var/run/docker.sock</span><br></span></code></pre></div></div>
<p><strong>커버리지:</strong></p>
<ul>
<li class="">✅ HTTP 5xx 에러 감지</li>
<li class="">✅ Python Exception 추적</li>
<li class="">✅ 실시간 Slack 알림</li>
<li class="">✅ Dev/Prod 환경 구분</li>
<li class="">✅ 쿨다운으로 중복 알림 방지</li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="phase-2-필요-시-sentry-추가">Phase 2: 필요 시 (Sentry 추가)<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#phase-2-%ED%95%84%EC%9A%94-%EC%8B%9C-sentry-%EC%B6%94%EA%B0%80" class="hash-link" aria-label="Phase 2: 필요 시 (Sentry 추가)에 대한 직접 링크" title="Phase 2: 필요 시 (Sentry 추가)에 대한 직접 링크" translate="no">​</a></h4>
<p>프로젝트가 성장하면서 다음 상황이 발생할 때 Sentry 추가 고려:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># 트리거 조건 예시</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    팀_규모 </span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">5</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">or</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    에러_발생_빈도 </span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> 100_per_day </span><span class="token keyword" style="color:#00009f">or</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    복잡한_버그_디버깅_소요시간 </span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> 4_hours </span><span class="token keyword" style="color:#00009f">or</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    고객_영향_추적_필요 </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">True</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    add_sentry</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p><strong>Sentry 추가 시 얻는 이점:</strong></p>
<ul>
<li class="">변수 값, 요청 파라미터 등 상세 컨텍스트</li>
<li class="">웹 대시보드로 팀 전체 가시성 확보</li>
<li class="">에러 트렌드 분석으로 품질 개선 인사이트</li>
<li class="">릴리즈별 에러 추적으로 배포 영향 분석</li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="phase-3-하이브리드-최적의-조합">Phase 3: 하이브리드 (최적의 조합)<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#phase-3-%ED%95%98%EC%9D%B4%EB%B8%8C%EB%A6%AC%EB%93%9C-%EC%B5%9C%EC%A0%81%EC%9D%98-%EC%A1%B0%ED%95%A9" class="hash-link" aria-label="Phase 3: 하이브리드 (최적의 조합)에 대한 직접 링크" title="Phase 3: 하이브리드 (최적의 조합)에 대한 직접 링크" translate="no">​</a></h4>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">Infrastructure Level (Docker Log Monitor):</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">├─ 시스템 레벨 에러 감지</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">├─ 즉각적인 알림 (네트워크 독립)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">└─ 백업 모니터링 (Sentry 장애 대응)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Application Level (Sentry):</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">├─ 상세한 에러 컨텍스트</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">├─ 트렌드 분석 및 대시보드</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">└─ 팀 협업 및 이슈 관리</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="실전-적용-가이드">실전 적용 가이드<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#%EC%8B%A4%EC%A0%84-%EC%A0%81%EC%9A%A9-%EA%B0%80%EC%9D%B4%EB%93%9C" class="hash-link" aria-label="실전 적용 가이드에 대한 직접 링크" title="실전 적용 가이드에 대한 직접 링크" translate="no">​</a></h3>
<p>다른 프로젝트에서도 적용 가능한 의사결정 플로우차트:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">프로젝트 시작</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ↓</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[Q1] 코드 수정 가능한가?</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    NO → Docker Log Monitor (유일한 선택)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    YES → 다음 질문</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ↓</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[Q2] 팀 규모가 5명 이상인가?</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    NO → Docker Log Monitor 추천</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    YES → 다음 질문</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ↓</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[Q3] 월 예산 $50 이상 가능한가?</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    NO → Docker Log Monitor</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    YES → 다음 질문</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ↓</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[Q4] 복잡한 디버깅이 자주 발생하는가?</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    NO → Docker Log Monitor</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    YES → Sentry 권장</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ↓</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[Q5] 데이터 프라이버시가 중요한가?</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    YES → Docker Log Monitor</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    NO → Sentry 권장</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="구현-예시-docker-log-monitor-설정">구현 예시: Docker Log Monitor 설정<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#%EA%B5%AC%ED%98%84-%EC%98%88%EC%8B%9C-docker-log-monitor-%EC%84%A4%EC%A0%95" class="hash-link" aria-label="구현 예시: Docker Log Monitor 설정에 대한 직접 링크" title="구현 예시: Docker Log Monitor 설정에 대한 직접 링크" translate="no">​</a></h3>
<p>실제 프로젝트에서 사용한 설정:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># docker-compose.yml</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">version</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'3.8'</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">services</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">app</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">image</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> myapp</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">latest</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># ... 애플리케이션 설정</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">log-monitor</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">build</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ./docker</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">log</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">monitor</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">container_name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> log</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">monitor</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">environment</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token comment" style="color:#999988;font-style:italic"># Slack Webhook URL (필수)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> SLACK_WEBHOOK_URL=$</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain">SLACK_WEBHOOK_URL</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token comment" style="color:#999988;font-style:italic"># 모니터링할 컨테이너 (선택, 기본값: 모두)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> MONITOR_CONTAINERS=app</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">worker</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token comment" style="color:#999988;font-style:italic"># 에러 패턴 커스터마이징</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> ERROR_PATTERNS=ERROR</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">Exception</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">CRITICAL</span><span class="token punctuation" style="color:#393A34">,</span><span class="token number" style="color:#36acaa">500</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token comment" style="color:#999988;font-style:italic"># 쿨다운 설정 (분 단위)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> COOLDOWN_MINUTES=30</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token comment" style="color:#999988;font-style:italic"># 환경 구분 (Dev/Prod)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> ENVIRONMENT=production</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token comment" style="color:#999988;font-style:italic"># 타임존 설정</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> TZ=Asia/Seoul</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">volumes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token comment" style="color:#999988;font-style:italic"># Docker 소켓 마운트 (로그 접근 필수)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> /var/run/docker.sock</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">/var/run/docker.sock</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">ro</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">restart</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> unless</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">stopped</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># 리소스 제한 (선택)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">deploy</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">resources</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">limits</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">memory</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 100M</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">cpus</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'0.1'</span><br></span></code></pre></div></div>
<p>환경변수 설정 (<code>.env</code> 파일):</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Slack Webhook URL</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token assign-left variable" style="color:#36acaa">SLACK_WEBHOOK_URL</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">https://hooks.slack.com/services/YOUR/WEBHOOK/URL</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 환경 구분</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token assign-left variable" style="color:#36acaa">ENVIRONMENT</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">production</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 쿨다운 설정 (30분)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token assign-left variable" style="color:#36acaa">COOLDOWN_MINUTES</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">30</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="측정-가능한-성과">측정 가능한 성과<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#%EC%B8%A1%EC%A0%95-%EA%B0%80%EB%8A%A5%ED%95%9C-%EC%84%B1%EA%B3%BC" class="hash-link" aria-label="측정 가능한 성과에 대한 직접 링크" title="측정 가능한 성과에 대한 직접 링크" translate="no">​</a></h3>
<p>Docker Log Monitor 도입 후 다음과 같은 성과를 측정했습니다:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">설정 시간: 10분 (vs Sentry 예상 4시간)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">비용 절감: $26-80/월 (Sentry 유료 플랜 대비)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">에러 감지 지연시간: &lt;1초 (실시간 로그 스트리밍)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">알림 응답시간: 평균 5분 이내 (Slack 알림 즉시 확인)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">시스템 리소스: ~20MB 메모리, CPU &lt;1%</span><br></span></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="references">References<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-vs-sentry-%E1%84%87%E1%85%B5%E1%84%80%E1%85%AD-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%86%A8#references" class="hash-link" aria-label="References에 대한 직접 링크" title="References에 대한 직접 링크" translate="no">​</a></h2>
<ul>
<li class=""><a href="https://github.com/teddynote-lab/docker-log-monitor" target="_blank" rel="noopener noreferrer" class="">Docker Log Monitor GitHub</a></li>
<li class=""><a href="https://docs.sentry.io/" target="_blank" rel="noopener noreferrer" class="">Sentry 공식 문서</a></li>
<li class=""><a href="https://sentry.io/pricing/" target="_blank" rel="noopener noreferrer" class="">Sentry 가격 정책</a></li>
<li class=""><a href="https://fastapi.tiangolo.com/advanced/logging/" target="_blank" rel="noopener noreferrer" class="">FastAPI 로깅 가이드</a></li>
<li class=""><a href="https://12factor.net/logs" target="_blank" rel="noopener noreferrer" class="">Twelve-Factor App - Logs</a></li>
<li class=""><a href="https://sre.google/sre-book/monitoring-distributed-systems/" target="_blank" rel="noopener noreferrer" class="">Google SRE Book - Monitoring Distributed Systems</a></li>
</ul>]]></content:encoded>
            <category>Agent</category>
        </item>
        <item>
            <title><![CDATA[Docker Log Monitor 적용 가이드라인]]></title>
            <link>https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-적용-가이드라인</link>
            <guid>https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-적용-가이드라인</guid>
            <pubDate>Tue, 27 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[EC2 환경에서 Docker 컨테이너 로그를 실시간 모니터링하고 에러 발생 시 Slack으로 알림을 보내는 경량 모니터링 시스템 구축 경험을 공유합니다. Sentry 같은 무거운 솔루션 대신, Python 기반의 간단한 스크립트로 실시간 로그 감지, 중복 알림 방지, ]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="tldr">TL;DR<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-%E1%84%8C%E1%85%A5%E1%86%A8%E1%84%8B%E1%85%AD%E1%86%BC-%E1%84%80%E1%85%A1%E1%84%8B%E1%85%B5%E1%84%83%E1%85%B3%E1%84%85%E1%85%A1%E1%84%8B%E1%85%B5%E1%86%AB#tldr" class="hash-link" aria-label="TL;DR에 대한 직접 링크" title="TL;DR에 대한 직접 링크" translate="no">​</a></h2>
<blockquote>
<p>EC2 환경에서 Docker 컨테이너 로그를 실시간 모니터링하고 에러 발생 시 Slack으로 알림을 보내는 경량 모니터링 시스템 구축 경험을 공유합니다. Sentry 같은 무거운 솔루션 대신, Python 기반의 간단한 스크립트로 실시간 로그 감지, 중복 알림 방지, Traceback 수집 등의 핵심 기능을 구현했습니다. systemd 서비스로 등록하여 서버 재시작 시에도 자동 실행되도록 설정하고, 배포 시 불필요한 알림을 방지하는 Grace Period를 적용했습니다.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="key-takeaways">Key Takeaways<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-%E1%84%8C%E1%85%A5%E1%86%A8%E1%84%8B%E1%85%AD%E1%86%BC-%E1%84%80%E1%85%A1%E1%84%8B%E1%85%B5%E1%84%83%E1%85%B3%E1%84%85%E1%85%A1%E1%84%8B%E1%85%B5%E1%86%AB#key-takeaways" class="hash-link" aria-label="Key Takeaways에 대한 직접 링크" title="Key Takeaways에 대한 직접 링크" translate="no">​</a></h2>
<ul>
<li class=""><strong>경량화된 모니터링의 필요성</strong>: 모든 프로젝트에 Sentry 같은 무거운 솔루션이 필요한 것은 아니며, 초기 단계나 소규모 프로젝트에서는 간단한 로그 모니터링 시스템이 더 효과적일 수 있습니다.</li>
<li class=""><strong>쿨다운 메커니즘의 중요성</strong>: 동일한 에러가 연속 발생 시 알림 피로도를 방지하기 위해 시간 기반 중복 알림 제어가 필수입니다.</li>
<li class=""><strong>배포 시나리오 고려</strong>: 컨테이너 재시작이나 배포 시 발생하는 일시적 에러를 필터링하기 위한 Grace Period 설정으로 노이즈를 줄일 수 있습니다.</li>
<li class=""><strong>Traceback 전체 수집</strong>: 단일 라인 에러 로그만으로는 디버깅이 어려우므로, Python Traceback 전체를 수집하여 컨텍스트를 제공해야 합니다.</li>
<li class=""><strong>systemd 통합의 장점</strong>: 서비스로 등록하면 서버 재시작, 자동 재시작, 로그 관리 등을 운영체제 레벨에서 관리할 수 있어 안정성이 높아집니다.</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="상세-내용">상세 내용<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-%E1%84%8C%E1%85%A5%E1%86%A8%E1%84%8B%E1%85%AD%E1%86%BC-%E1%84%80%E1%85%A1%E1%84%8B%E1%85%B5%E1%84%83%E1%85%B3%E1%84%85%E1%85%A1%E1%84%8B%E1%85%B5%E1%86%AB#%EC%83%81%EC%84%B8-%EB%82%B4%EC%9A%A9" class="hash-link" aria-label="상세 내용에 대한 직접 링크" title="상세 내용에 대한 직접 링크" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="배경-왜-커스텀-모니터링-시스템을-만들었나">배경: 왜 커스텀 모니터링 시스템을 만들었나?<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-%E1%84%8C%E1%85%A5%E1%86%A8%E1%84%8B%E1%85%AD%E1%86%BC-%E1%84%80%E1%85%A1%E1%84%8B%E1%85%B5%E1%84%83%E1%85%B3%E1%84%85%E1%85%A1%E1%84%8B%E1%85%B5%E1%86%AB#%EB%B0%B0%EA%B2%BD-%EC%99%9C-%EC%BB%A4%EC%8A%A4%ED%85%80-%EB%AA%A8%EB%8B%88%ED%84%B0%EB%A7%81-%EC%8B%9C%EC%8A%A4%ED%85%9C%EC%9D%84-%EB%A7%8C%EB%93%A4%EC%97%88%EB%82%98" class="hash-link" aria-label="배경: 왜 커스텀 모니터링 시스템을 만들었나?에 대한 직접 링크" title="배경: 왜 커스텀 모니터링 시스템을 만들었나?에 대한 직접 링크" translate="no">​</a></h3>
<p>프로젝트 초기 단계에서 에러 모니터링의 필요성은 명확했지만, Sentry 같은 상용 솔루션을 도입하기에는 몇 가지 장벽이 있었습니다:</p>
<ol>
<li class=""><strong>비용 및 리소스</strong>: Sentry는 강력하지만 설정이 복잡하고 서버 리소스를 많이 소모합니다</li>
<li class=""><strong>과도한 기능</strong>: 초기 단계에서는 단순한 에러 알림만 필요했습니다</li>
<li class=""><strong>Docker 환경 특화</strong>: Docker 컨테이너의 stdout/stderr 로그를 직접 모니터링하면 애플리케이션 코드 수정 없이 모니터링이 가능합니다</li>
</ol>
<p>이러한 이유로 Python Docker SDK를 활용한 경량 모니터링 시스템을 직접 구축하기로 결정했습니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="문제-상황-docker-로그-모니터링의-도전-과제">문제 상황: Docker 로그 모니터링의 도전 과제<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-%E1%84%8C%E1%85%A5%E1%86%A8%E1%84%8B%E1%85%AD%E1%86%BC-%E1%84%80%E1%85%A1%E1%84%8B%E1%85%B5%E1%84%83%E1%85%B3%E1%84%85%E1%85%A1%E1%84%8B%E1%85%B5%E1%86%AB#%EB%AC%B8%EC%A0%9C-%EC%83%81%ED%99%A9-docker-%EB%A1%9C%EA%B7%B8-%EB%AA%A8%EB%8B%88%ED%84%B0%EB%A7%81%EC%9D%98-%EB%8F%84%EC%A0%84-%EA%B3%BC%EC%A0%9C" class="hash-link" aria-label="문제 상황: Docker 로그 모니터링의 도전 과제에 대한 직접 링크" title="문제 상황: Docker 로그 모니터링의 도전 과제에 대한 직접 링크" translate="no">​</a></h3>
<p>Docker 환경에서 로그 모니터링을 구현하면서 마주친 주요 문제들:</p>
<p><strong>1. 연속된 동일 에러의 알림 폭탄</strong>
초기 버전에서는 에러가 발생할 때마다 Slack 알림을 보냈는데, 특정 에러가 반복되면 수십 개의 알림이 순식간에 쌓였습니다.</p>
<p><strong>2. 배포 시 불필요한 알림</strong>
컨테이너를 재시작하거나 배포할 때 일시적으로 연결이 끊기면서 발생하는 에러들이 알림으로 전송되었습니다.</p>
<p><strong>3. 불완전한 에러 정보</strong>
단일 라인 에러 메시지만 캡처하면 전체 Traceback을 파악할 수 없어 디버깅이 어려웠습니다.</p>
<p><strong>4. 모니터링 프로세스의 안정성</strong>
모니터링 스크립트 자체가 중단되면 에러를 놓치게 되는 문제가 있었습니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="해결-과정">해결 과정<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-%E1%84%8C%E1%85%A5%E1%86%A8%E1%84%8B%E1%85%AD%E1%86%BC-%E1%84%80%E1%85%A1%E1%84%8B%E1%85%B5%E1%84%83%E1%85%B3%E1%84%85%E1%85%A1%E1%84%8B%E1%85%B5%E1%86%AB#%ED%95%B4%EA%B2%B0-%EA%B3%BC%EC%A0%95" class="hash-link" aria-label="해결 과정에 대한 직접 링크" title="해결 과정에 대한 직접 링크" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-쿨다운-메커니즘-구현">1. 쿨다운 메커니즘 구현<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-%E1%84%8C%E1%85%A5%E1%86%A8%E1%84%8B%E1%85%AD%E1%86%BC-%E1%84%80%E1%85%A1%E1%84%8B%E1%85%B5%E1%84%83%E1%85%B3%E1%84%85%E1%85%A1%E1%84%8B%E1%85%B5%E1%86%AB#1-%EC%BF%A8%EB%8B%A4%EC%9A%B4-%EB%A9%94%EC%BB%A4%EB%8B%88%EC%A6%98-%EA%B5%AC%ED%98%84" class="hash-link" aria-label="1. 쿨다운 메커니즘 구현에 대한 직접 링크" title="1. 쿨다운 메커니즘 구현에 대한 직접 링크" translate="no">​</a></h4>
<p>동일한 에러에 대해 일정 시간 동안 중복 알림을 방지하는 메커니즘을 구현했습니다:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">class</span><span class="token plain"> </span><span class="token class-name">ErrorTracker</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">__init__</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> cooldown_seconds</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">300</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># 5분 쿨다운</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">error_history </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">cooldown_seconds </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> cooldown_seconds</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">should_notify</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> error_signature</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token triple-quoted-string string" style="color:#e3116c">"""에러 시그니처 기반으로 알림 전송 여부 결정"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        current_time </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> time</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">time</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> error_signature </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">error_history</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            last_notified </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">error_history</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">error_signature</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> current_time </span><span class="token operator" style="color:#393A34">-</span><span class="token plain"> last_notified </span><span class="token operator" style="color:#393A34">&lt;</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">cooldown_seconds</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">False</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># 쿨다운 기간 내에는 알림 차단</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">error_history</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">error_signature</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> current_time</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">True</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">cleanup_old_entries</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token triple-quoted-string string" style="color:#e3116c">"""오래된 에러 기록 정리"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        current_time </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> time</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">time</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">error_history </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            k</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> v </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> k</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> v </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">error_history</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">items</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> current_time </span><span class="token operator" style="color:#393A34">-</span><span class="token plain"> v </span><span class="token operator" style="color:#393A34">&lt;</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">cooldown_seconds </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre></div></div>
<p><strong>의사결정 포인트</strong>: 쿨다운 시간을 5분으로 설정한 이유는, 대부분의 에러가 5분 내에 해결되거나 반복 패턴이 명확해지기 때문입니다. 프로젝트 특성에 따라 조정 가능합니다.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-grace-period-구현">2. Grace Period 구현<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-%E1%84%8C%E1%85%A5%E1%86%A8%E1%84%8B%E1%85%AD%E1%86%BC-%E1%84%80%E1%85%A1%E1%84%8B%E1%85%B5%E1%84%83%E1%85%B3%E1%84%85%E1%85%A1%E1%84%8B%E1%85%B5%E1%86%AB#2-grace-period-%EA%B5%AC%ED%98%84" class="hash-link" aria-label="2. Grace Period 구현에 대한 직접 링크" title="2. Grace Period 구현에 대한 직접 링크" translate="no">​</a></h4>
<p>배포 시 컨테이너가 시작된 직후 일정 시간 동안은 알림을 보내지 않도록 설정:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">class</span><span class="token plain"> </span><span class="token class-name">DockerLogMonitor</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">__init__</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> grace_period_seconds</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">60</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">grace_period </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> grace_period_seconds</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">container_start_times </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">is_in_grace_period</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> container_id</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token triple-quoted-string string" style="color:#e3116c">"""컨테이너가 Grace Period 내에 있는지 확인"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> container_id </span><span class="token keyword" style="color:#00009f">not</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">container_start_times</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token comment" style="color:#999988;font-style:italic"># 컨테이너 시작 시간 기록</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            container </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">docker_client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">containers</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">container_id</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            start_time </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> container</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">attrs</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">'State'</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">'StartedAt'</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">container_start_times</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">container_id</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> parse_datetime</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">start_time</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        start_time </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">container_start_times</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">container_id</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        elapsed </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">datetime</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">now</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token plain"> start_time</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">total_seconds</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> elapsed </span><span class="token operator" style="color:#393A34">&lt;</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">grace_period</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">process_log_line</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> container_id</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> log_line</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">is_in_grace_period</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">container_id</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            logger</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">debug</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"Grace period active for </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">container_id</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">, skipping alert"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token comment" style="color:#999988;font-style:italic"># 에러 패턴 감지 및 처리</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">detect_and_notify</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">log_line</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-traceback-전체-수집">3. Traceback 전체 수집<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-%E1%84%8C%E1%85%A5%E1%86%A8%E1%84%8B%E1%85%AD%E1%86%BC-%E1%84%80%E1%85%A1%E1%84%8B%E1%85%B5%E1%84%83%E1%85%B3%E1%84%85%E1%85%A1%E1%84%8B%E1%85%B5%E1%86%AB#3-traceback-%EC%A0%84%EC%B2%B4-%EC%88%98%EC%A7%91" class="hash-link" aria-label="3. Traceback 전체 수집에 대한 직접 링크" title="3. Traceback 전체 수집에 대한 직접 링크" translate="no">​</a></h4>
<p>Python 에러의 경우 여러 줄에 걸쳐 있는 Traceback을 모두 수집:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">class</span><span class="token plain"> </span><span class="token class-name">TracebackCollector</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">__init__</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">traceback_buffer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">in_traceback </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">False</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">process_line</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> line</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token triple-quoted-string string" style="color:#e3116c">"""로그 라인을 처리하고 Traceback 수집"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token comment" style="color:#999988;font-style:italic"># Traceback 시작 감지</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Traceback (most recent call last):"</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> line</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">in_traceback </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">True</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">traceback_buffer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">line</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token comment" style="color:#999988;font-style:italic"># Traceback 진행 중</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">in_traceback</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">traceback_buffer</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">append</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">line</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token comment" style="color:#999988;font-style:italic"># Traceback 종료 조건: 실제 에러 메시지 라인</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">is_error_line</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">line</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">and</span><span class="token plain"> </span><span class="token builtin">len</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">traceback_buffer</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                full_traceback </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"\n"</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">join</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">traceback_buffer</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">in_traceback </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">False</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">traceback_buffer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> full_traceback</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">None</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">is_error_line</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> line</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token triple-quoted-string string" style="color:#e3116c">"""에러 메시지 라인 판별"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        error_patterns </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">r'^[A-Z][a-zA-Z]+Error:'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">r'^[A-Z][a-zA-Z]+Exception:'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">r'^AssertionError:'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token builtin">any</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">re</span><span class="token punctuation" style="color:#393A34">.</span><span class="token keyword" style="color:#00009f">match</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">pattern</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> line</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">strip</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> pattern </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> error_patterns</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-docker-api를-통한-실시간-스트리밍">4. Docker API를 통한 실시간 스트리밍<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-%E1%84%8C%E1%85%A5%E1%86%A8%E1%84%8B%E1%85%AD%E1%86%BC-%E1%84%80%E1%85%A1%E1%84%8B%E1%85%B5%E1%84%83%E1%85%B3%E1%84%85%E1%85%A1%E1%84%8B%E1%85%B5%E1%86%AB#4-docker-api%EB%A5%BC-%ED%86%B5%ED%95%9C-%EC%8B%A4%EC%8B%9C%EA%B0%84-%EC%8A%A4%ED%8A%B8%EB%A6%AC%EB%B0%8D" class="hash-link" aria-label="4. Docker API를 통한 실시간 스트리밍에 대한 직접 링크" title="4. Docker API를 통한 실시간 스트리밍에 대한 직접 링크" translate="no">​</a></h4>
<p>Docker SDK를 사용하여 컨테이너 로그를 실시간으로 스트리밍:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> docker</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">class</span><span class="token plain"> </span><span class="token class-name">DockerLogMonitor</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">__init__</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> container_name</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> slack_webhook_url</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">docker_client </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> docker</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">from_env</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">container_name </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> container_name</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">slack_webhook </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> slack_webhook_url</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">error_tracker </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ErrorTracker</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">cooldown_seconds</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">300</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">traceback_collector </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> TracebackCollector</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">start_monitoring</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token triple-quoted-string string" style="color:#e3116c">"""컨테이너 로그 모니터링 시작"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">try</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            container </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">docker_client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">containers</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">container_name</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            logger</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">info</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"Monitoring started for container: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">self</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">container_name</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token comment" style="color:#999988;font-style:italic"># 실시간 로그 스트리밍 (follow=True, tail='all')</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> log_line </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> container</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">logs</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">stream</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> follow</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                line </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> log_line</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">decode</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">'utf-8'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">strip</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">process_log_line</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">container</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">id</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> line</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">except</span><span class="token plain"> docker</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">errors</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">NotFound</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            logger</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">error</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"Container </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">self</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">container_name</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"> not found"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">except</span><span class="token plain"> Exception </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> e</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            logger</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">error</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"Monitoring error: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">e</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token comment" style="color:#999988;font-style:italic"># 자동 재연결 로직</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            time</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">sleep</span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">10</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">start_monitoring</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p><strong>의사결정 포인트</strong>: <code>stream=True</code>와 <code>follow=True</code> 옵션으로 실시간 스트리밍을 구현했습니다. <code>tail='all'</code>을 사용하면 컨테이너 시작 후 모든 로그를 캡처할 수 있지만, 필요에 따라 <code>tail=100</code>처럼 최근 로그만 가져올 수도 있습니다.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="5-systemd-서비스-등록">5. systemd 서비스 등록<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-%E1%84%8C%E1%85%A5%E1%86%A8%E1%84%8B%E1%85%AD%E1%86%BC-%E1%84%80%E1%85%A1%E1%84%8B%E1%85%B5%E1%84%83%E1%85%B3%E1%84%85%E1%85%A1%E1%84%8B%E1%85%B5%E1%86%AB#5-systemd-%EC%84%9C%EB%B9%84%EC%8A%A4-%EB%93%B1%EB%A1%9D" class="hash-link" aria-label="5. systemd 서비스 등록에 대한 직접 링크" title="5. systemd 서비스 등록에 대한 직접 링크" translate="no">​</a></h4>
<p>안정적인 운영을 위해 systemd 서비스로 등록:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># /etc/systemd/system/docker-monitor.service</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">Unit</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token assign-left variable" style="color:#36acaa">Description</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">Docker Log Monitor Service</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token assign-left variable" style="color:#36acaa">After</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">docker.service</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token assign-left variable" style="color:#36acaa">Requires</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">docker.service</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">Service</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token assign-left variable" style="color:#36acaa">Type</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">simple</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token assign-left variable" style="color:#36acaa">User</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">ubuntu</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token assign-left variable" style="color:#36acaa">WorkingDirectory</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">/home/ubuntu/docker-monitor</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token assign-left variable" style="color:#36acaa">Environment</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"PYTHONUNBUFFERED=1"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token assign-left variable" style="color:#36acaa">ExecStart</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">/usr/bin/python3 /home/ubuntu/docker-monitor/monitor.py</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token assign-left variable" style="color:#36acaa">Restart</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">always</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token assign-left variable" style="color:#36acaa">RestartSec</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">10</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token assign-left variable" style="color:#36acaa">StandardOutput</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">journal</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token assign-left variable" style="color:#36acaa">StandardError</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">journal</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">Install</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token assign-left variable" style="color:#36acaa">WantedBy</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">multi-user.target</span><br></span></code></pre></div></div>
<p>서비스 등록 및 실행 명령어:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># 서비스 파일 복사 및 권한 설정</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token function" style="color:#d73a49">sudo</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">cp</span><span class="token plain"> docker-monitor.service /etc/systemd/system/</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token function" style="color:#d73a49">sudo</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">chmod</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">644</span><span class="token plain"> /etc/systemd/system/docker-monitor.service</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># systemd 리로드 및 서비스 활성화</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token function" style="color:#d73a49">sudo</span><span class="token plain"> systemctl daemon-reload</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token function" style="color:#d73a49">sudo</span><span class="token plain"> systemctl </span><span class="token builtin class-name">enable</span><span class="token plain"> docker-monitor.service</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token function" style="color:#d73a49">sudo</span><span class="token plain"> systemctl start docker-monitor.service</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 서비스 상태 확인</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token function" style="color:#d73a49">sudo</span><span class="token plain"> systemctl status docker-monitor.service</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 로그 확인</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token function" style="color:#d73a49">sudo</span><span class="token plain"> journalctl </span><span class="token parameter variable" style="color:#36acaa">-u</span><span class="token plain"> docker-monitor.service </span><span class="token parameter variable" style="color:#36acaa">-f</span><br></span></code></pre></div></div>
<p><strong>의사결정 포인트</strong>:</p>
<ul>
<li class=""><code>Restart=always</code>로 설정하여 프로세스 종료 시 자동 재시작</li>
<li class=""><code>After=docker.service</code>로 Docker 서비스가 시작된 후에 실행되도록 의존성 설정</li>
<li class=""><code>StandardOutput=journal</code>로 systemd 저널에 로그 저장하여 중앙화된 로그 관리</li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="6-slack-알림-포맷-개선">6. Slack 알림 포맷 개선<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-%E1%84%8C%E1%85%A5%E1%86%A8%E1%84%8B%E1%85%AD%E1%86%BC-%E1%84%80%E1%85%A1%E1%84%8B%E1%85%B5%E1%84%83%E1%85%B3%E1%84%85%E1%85%A1%E1%84%8B%E1%85%B5%E1%86%AB#6-slack-%EC%95%8C%EB%A6%BC-%ED%8F%AC%EB%A7%B7-%EA%B0%9C%EC%84%A0" class="hash-link" aria-label="6. Slack 알림 포맷 개선에 대한 직접 링크" title="6. Slack 알림 포맷 개선에 대한 직접 링크" translate="no">​</a></h4>
<p>가독성 높은 알림 메시지 구성:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">send_slack_notification</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> error_info</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token triple-quoted-string string" style="color:#e3116c">"""Slack으로 에러 알림 전송"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    message </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"blocks"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"header"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token string" style="color:#e3116c">"text"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"plain_text"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token string" style="color:#e3116c">"text"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"🚨 Docker Container Error Detected"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"section"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token string" style="color:#e3116c">"fields"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                        </span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"mrkdwn"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                        </span><span class="token string" style="color:#e3116c">"text"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:#e3116c">f"*Container:*\n</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">error_info</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation interpolation string" style="color:#e3116c">'container_name'</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                        </span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"mrkdwn"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                        </span><span class="token string" style="color:#e3116c">"text"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:#e3116c">f"*Time:*\n</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">error_info</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation interpolation string" style="color:#e3116c">'timestamp'</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"section"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token string" style="color:#e3116c">"text"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"mrkdwn"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token string" style="color:#e3116c">"text"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:#e3116c">f"*Error Message:*\n```</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">error_info</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation interpolation string" style="color:#e3116c">'error_message'</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">```"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"section"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token string" style="color:#e3116c">"text"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"mrkdwn"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token string" style="color:#e3116c">"text"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:#e3116c">f"*Full Traceback:*\n```</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">error_info</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation interpolation string" style="color:#e3116c">'traceback'</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation format-spec">2000]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">```"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> requests</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">post</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">slack_webhook</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        json</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">message</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        headers</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">'Content-Type'</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'application/json'</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> response</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">status_code </span><span class="token operator" style="color:#393A34">!=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">200</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        logger</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">error</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"Slack notification failed: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">response</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">text</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="결과">결과<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-%E1%84%8C%E1%85%A5%E1%86%A8%E1%84%8B%E1%85%AD%E1%86%BC-%E1%84%80%E1%85%A1%E1%84%8B%E1%85%B5%E1%84%83%E1%85%B3%E1%84%85%E1%85%A1%E1%84%8B%E1%85%B5%E1%86%AB#%EA%B2%B0%EA%B3%BC" class="hash-link" aria-label="결과에 대한 직접 링크" title="결과에 대한 직접 링크" translate="no">​</a></h3>
<p>이 시스템을 도입한 후 다음과 같은 개선 효과를 얻었습니다:</p>
<ol>
<li class=""><strong>즉각적인 에러 인지</strong>: 프로덕션 환경에서 발생하는 에러를 실시간으로 파악할 수 있게 되었습니다</li>
<li class=""><strong>알림 피로도 감소</strong>: 쿨다운 메커니즘으로 중복 알림이 90% 이상 감소했습니다</li>
<li class=""><strong>디버깅 시간 단축</strong>: Traceback 전체를 수집하여 에러 원인 파악 시간이 크게 줄었습니다</li>
<li class=""><strong>운영 안정성 향상</strong>: systemd 통합으로 서버 재시작 후에도 자동으로 모니터링이 재개됩니다</li>
<li class=""><strong>비용 효율성</strong>: Sentry 대비 서버 리소스 사용량이 매우 적고 추가 비용이 발생하지 않습니다</li>
</ol>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="개선-예정-사항">개선 예정 사항<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-%E1%84%8C%E1%85%A5%E1%86%A8%E1%84%8B%E1%85%AD%E1%86%BC-%E1%84%80%E1%85%A1%E1%84%8B%E1%85%B5%E1%84%83%E1%85%B3%E1%84%85%E1%85%A1%E1%84%8B%E1%85%B5%E1%86%AB#%EA%B0%9C%EC%84%A0-%EC%98%88%EC%A0%95-%EC%82%AC%ED%95%AD" class="hash-link" aria-label="개선 예정 사항에 대한 직접 링크" title="개선 예정 사항에 대한 직접 링크" translate="no">​</a></h3>
<p>현재 버전은 기본적인 모니터링 기능을 제공하지만, 다음과 같은 개선을 계획하고 있습니다:</p>
<ul>
<li class=""><strong>다중 컨테이너 지원</strong>: 현재는 단일 컨테이너만 모니터링하지만, 여러 컨테이너를 동시에 모니터링</li>
<li class=""><strong>필터링 룰 커스터마이징</strong>: YAML 설정 파일로 에러 패턴과 필터링 룰을 외부화</li>
<li class=""><strong>메트릭 수집</strong>: 에러 발생 빈도, 패턴 분석 등의 통계 데이터 수집</li>
<li class=""><strong>다양한 알림 채널</strong>: Slack 외에 Email, Discord, PagerDuty 등 추가 지원</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="references">References<a href="https://teddynote-lab.github.io/brain-cache/lab/docker-log-monitor-%E1%84%8C%E1%85%A5%E1%86%A8%E1%84%8B%E1%85%AD%E1%86%BC-%E1%84%80%E1%85%A1%E1%84%8B%E1%85%B5%E1%84%83%E1%85%B3%E1%84%85%E1%85%A1%E1%84%8B%E1%85%B5%E1%86%AB#references" class="hash-link" aria-label="References에 대한 직접 링크" title="References에 대한 직접 링크" translate="no">​</a></h2>
<ul>
<li class=""><a href="https://docker-py.readthedocs.io/" target="_blank" rel="noopener noreferrer" class="">Docker SDK for Python Documentation</a></li>
<li class=""><a href="https://api.slack.com/messaging/webhooks" target="_blank" rel="noopener noreferrer" class="">Slack Incoming Webhooks</a></li>
<li class=""><a href="https://www.freedesktop.org/software/systemd/man/systemd.service.html" target="_blank" rel="noopener noreferrer" class="">systemd Service Unit Configuration</a></li>
<li class=""><a href="https://docs.docker.com/config/containers/logging/" target="_blank" rel="noopener noreferrer" class="">Docker Logging Best Practices</a></li>
</ul>]]></content:encoded>
            <category>Agent</category>
        </item>
        <item>
            <title><![CDATA[Which tabular format RAG Process understands very well?]]></title>
            <link>https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well</link>
            <guid>https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well</guid>
            <pubDate>Sun, 11 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[RAG 파이프라인에서 테이블 데이터의 포맷이 검색 성능에 미치는 영향을 실험한 결과, Markdown Key-Value 형식이 가장 높은 Recall을 보였으며, TOON 포맷은 토큰 효율성 측면에서 가장 우수했습니다. AIHub의 표 정보 질의응답 데이터 50개를 7]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="tldr">TL;DR<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#tldr" class="hash-link" aria-label="TL;DR에 대한 직접 링크" title="TL;DR에 대한 직접 링크" translate="no">​</a></h2>
<blockquote>
<p>RAG 파이프라인에서 테이블 데이터의 포맷이 검색 성능에 미치는 영향을 실험한 결과, Markdown Key-Value 형식이 가장 높은 Recall을 보였으며, TOON 포맷은 토큰 효율성 측면에서 가장 우수했습니다. AIHub의 표 정보 질의응답 데이터 50개를 7가지 포맷으로 변환하여 비교한 결과, 포맷 선택은 성능과 비용 간의 트레이드오프 관계에 있음을 확인했습니다.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="key-takeaways">Key Takeaways<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#key-takeaways" class="hash-link" aria-label="Key Takeaways에 대한 직접 링크" title="Key Takeaways에 대한 직접 링크" translate="no">​</a></h2>
<ul>
<li class=""><strong>포맷 선택은 사용 사례에 따라 달라져야 함</strong>: 높은 정확도가 필요한 경우 Markdown-KV, 비용 효율이 중요한 경우 TOON 포맷을 선택하는 것이 합리적입니다.</li>
<li class=""><strong>토큰 효율과 검색 성능은 별개</strong>: TOON은 평균 토큰 수를 크게 줄이지만, 이것이 항상 높은 Recall로 이어지지는 않습니다. Embedding 모델의 학습 데이터와 포맷 간의 친화성이 중요합니다.</li>
<li class=""><strong>소규모 실험으로도 유의미한 인사이트 도출 가능</strong>: 50개 샘플로도 포맷 간 상대적 성능 차이를 파악할 수 있으며, 이를 바탕으로 프로덕션 환경에서의 포맷 선택 방향을 설정할 수 있습니다.</li>
<li class=""><strong>평가 파이프라인 구축의 중요성</strong>: LLM을 활용한 QA 생성 → Retrieval 평가의 자동화된 파이프라인은 다양한 실험을 빠르게 반복할 수 있게 해줍니다.</li>
<li class=""><strong>Generation 단계까지 고려해야 완전한 평가</strong>: Retrieval 성능만으로는 최종 사용자 경험을 대변하기 어려우며, 실제 답변 생성 품질까지 평가해야 합니다.</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="상세-내용">상세 내용<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#%EC%83%81%EC%84%B8-%EB%82%B4%EC%9A%A9" class="hash-link" aria-label="상세 내용에 대한 직접 링크" title="상세 내용에 대한 직접 링크" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="배경-테이블-데이터와-rag의-만남">배경: 테이블 데이터와 RAG의 만남<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#%EB%B0%B0%EA%B2%BD-%ED%85%8C%EC%9D%B4%EB%B8%94-%EB%8D%B0%EC%9D%B4%ED%84%B0%EC%99%80-rag%EC%9D%98-%EB%A7%8C%EB%82%A8" class="hash-link" aria-label="배경: 테이블 데이터와 RAG의 만남에 대한 직접 링크" title="배경: 테이블 데이터와 RAG의 만남에 대한 직접 링크" translate="no">​</a></h3>
<p>RAG(Retrieval-Augmented Generation) 시스템에서 테이블 데이터는 구조화된 정보를 담고 있어 높은 가치를 지니지만, 동시에 처리하기 까다로운 대상입니다. HTML 테이블, Markdown, JSON, CSV 등 다양한 포맷이 존재하며, 각 포맷은 정보 밀도, 토큰 소비량, LLM의 이해도 측면에서 상이한 특성을 보입니다.</p>
<p>최근 TOON(Token-Oriented Object Notation) 포맷이 등장하면서, 동일한 정보를 더 적은 토큰으로 표현하면서도 LLM이 이해하기 쉬운 구조를 제공한다는 주장이 제기되었습니다. 그러나 실제 RAG 환경에서 어떤 포맷이 최적인지에 대한 실증적 연구는 부족한 상황이었습니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="문제-상황-포맷-선택의-딜레마">문제 상황: 포맷 선택의 딜레마<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#%EB%AC%B8%EC%A0%9C-%EC%83%81%ED%99%A9-%ED%8F%AC%EB%A7%B7-%EC%84%A0%ED%83%9D%EC%9D%98-%EB%94%9C%EB%A0%88%EB%A7%88" class="hash-link" aria-label="문제 상황: 포맷 선택의 딜레마에 대한 직접 링크" title="문제 상황: 포맷 선택의 딜레마에 대한 직접 링크" translate="no">​</a></h3>
<p>프로덕션 RAG 시스템을 구축할 때, 다음과 같은 질문에 직면합니다:</p>
<ol>
<li class=""><strong>토큰 효율성과 검색 성능 중 무엇을 우선할 것인가?</strong></li>
<li class=""><strong>Embedding 모델이 특정 포맷을 더 잘 이해하는가?</strong></li>
<li class=""><strong>포맷 변환의 추가 비용 대비 성능 개선이 합리적인가?</strong></li>
</ol>
<p>이러한 질문에 답하기 위해 체계적인 실험을 설계했습니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="실험-설계">실험 설계<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#%EC%8B%A4%ED%97%98-%EC%84%A4%EA%B3%84" class="hash-link" aria-label="실험 설계에 대한 직접 링크" title="실험 설계에 대한 직접 링크" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="데이터-준비">데이터 준비<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#%EB%8D%B0%EC%9D%B4%ED%84%B0-%EC%A4%80%EB%B9%84" class="hash-link" aria-label="데이터 준비에 대한 직접 링크" title="데이터 준비에 대한 직접 링크" translate="no">​</a></h4>
<p>AIHub의 "표 정보 질의응답 데이터"를 활용했습니다. 이 데이터는:</p>
<ul>
<li class="">총 100만 건의 QA 쌍 포함</li>
<li class="">건축, 공공행정, 과학기술 등 10개 카테고리</li>
<li class="">다양한 테이블 복잡도 (행 수, 헤더 depth 등)</li>
</ul>
<p>전체 16,000개 테이블 중 50개를 무작위 샘플링하여 Target Data로 선정했습니다.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="평가용-qa-생성">평가용 QA 생성<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#%ED%8F%89%EA%B0%80%EC%9A%A9-qa-%EC%83%9D%EC%84%B1" class="hash-link" aria-label="평가용 QA 생성에 대한 직접 링크" title="평가용 QA 생성에 대한 직접 링크" translate="no">​</a></h4>
<p>각 테이블에 대해 GPT-4.1을 활용하여 질문-답변 쌍을 자동 생성했습니다:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># 의사 코드</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">generate_qa_pairs</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">table_chunk</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Step 1: 테이블 기반 질문 생성</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    question </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> gpt4_generate_question</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">table_chunk</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Step 2: 테이블 + 질문 기반 답변 생성</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    answer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> gpt4_generate_answer</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">table_chunk</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> question</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> question</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> answer</span><br></span></code></pre></div></div>
<p>이 중 25개를 최종 Evaluation Data로 선정했습니다.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="7가지-테이블-포맷">7가지 테이블 포맷<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#7%EA%B0%80%EC%A7%80-%ED%85%8C%EC%9D%B4%EB%B8%94-%ED%8F%AC%EB%A7%B7" class="hash-link" aria-label="7가지 테이블 포맷에 대한 직접 링크" title="7가지 테이블 포맷에 대한 직접 링크" translate="no">​</a></h4>
<p>다음 포맷들을 비교했습니다:</p>
<ol>
<li class=""><strong>HTML</strong>: 표준 <code>&lt;table&gt;</code> 태그 구조</li>
<li class=""><strong>Markdown</strong>: 파이프(<code>|</code>)로 구분된 형식</li>
<li class=""><strong>Markdown-KV</strong>: 각 행을 Key-Value 쌍으로 표현</li>
<li class=""><strong>TOON</strong>: 탭형 구조로 압축된 포맷</li>
<li class=""><strong>JSON</strong>: 표준 JSON 배열 구조</li>
<li class=""><strong>Plain Text</strong>: 자연어 형태로 풀어쓴 형식</li>
<li class=""><strong>CSV-like</strong>: 쉼표로 구분된 단순 형식</li>
</ol>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="포맷-변환-예시">포맷 변환 예시<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#%ED%8F%AC%EB%A7%B7-%EB%B3%80%ED%99%98-%EC%98%88%EC%8B%9C" class="hash-link" aria-label="포맷 변환 예시에 대한 직접 링크" title="포맷 변환 예시에 대한 직접 링크" translate="no">​</a></h4>
<p>원본 HTML 테이블:</p>
<div class="language-html codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-html codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">table</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">tr</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">td</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">바탕의 종류</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">td</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">td</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">도장 종류</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">td</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">td</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">공법</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">td</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">tr</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">tr</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">td</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">목재면</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">td</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">td</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">1종</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">td</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">td</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">부분 퍼티 처리</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">td</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">tr</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">tr</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">td</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">철재면</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">td</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">td</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">2종</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">td</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">td</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain">금속바탕 처리용 프라이머</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">td</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">tr</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">table</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<p><strong>Markdown-KV 변환</strong>:</p>
<div class="language-markdown codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-markdown codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token title important punctuation" style="color:#393A34">##</span><span class="token title important"> 바탕 만들기의 도장 방법</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token bold punctuation" style="color:#393A34">**</span><span class="token bold content">항목 1:</span><span class="token bold punctuation" style="color:#393A34">**</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token list punctuation" style="color:#393A34">-</span><span class="token plain"> 바탕의 종류: 목재면</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token list punctuation" style="color:#393A34">-</span><span class="token plain"> 도장 종류: 1종</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token list punctuation" style="color:#393A34">-</span><span class="token plain"> 공법: 부분 퍼티 처리</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token bold punctuation" style="color:#393A34">**</span><span class="token bold content">항목 2:</span><span class="token bold punctuation" style="color:#393A34">**</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token list punctuation" style="color:#393A34">-</span><span class="token plain"> 바탕의 종류: 철재면</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token list punctuation" style="color:#393A34">-</span><span class="token plain"> 도장 종류: 2종</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token list punctuation" style="color:#393A34">-</span><span class="token plain"> 공법: 금속바탕 처리용 프라이머</span><br></span></code></pre></div></div>
<p><strong>TOON 변환</strong>:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">바탕 만들기의 도장 방법[2]{바탕의 종류, 도장 종류, 공법}:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">목재면, 1종, 부분 퍼티 처리</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">철재면, 2종, 금속바탕 처리용 프라이머</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="실험-방법론">실험 방법론<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#%EC%8B%A4%ED%97%98-%EB%B0%A9%EB%B2%95%EB%A1%A0" class="hash-link" aria-label="실험 방법론에 대한 직접 링크" title="실험 방법론에 대한 직접 링크" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="embedding-및-저장">Embedding 및 저장<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#embedding-%EB%B0%8F-%EC%A0%80%EC%9E%A5" class="hash-link" aria-label="Embedding 및 저장에 대한 직접 링크" title="Embedding 및 저장에 대한 직접 링크" translate="no">​</a></h4>
<p>Qwen/Qwen3-Embedding-8B 모델을 사용하여 각 포맷별로 임베딩을 생성하고, 별도의 Chroma collection에 저장했습니다. 이 모델을 선택한 이유는:</p>
<ul>
<li class="">다국어 지원 (한국어 포함)</li>
<li class="">8B 파라미터로 높은 성능</li>
<li class="">문서 검색에 최적화된 학습</li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="평가-지표">평가 지표<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#%ED%8F%89%EA%B0%80-%EC%A7%80%ED%91%9C" class="hash-link" aria-label="평가 지표에 대한 직접 링크" title="평가 지표에 대한 직접 링크" translate="no">​</a></h4>
<ol>
<li class=""><strong>Recall@K</strong>: Top-K 검색 결과에 정답 문서가 포함되는 비율
<ul>
<li class="">K=1, 2, 3에 대해 측정</li>
</ul>
</li>
<li class=""><strong>Average Token Count</strong>: 각 포맷의 평균 토큰 수
<ul>
<li class="">비용 효율성의 대리 지표</li>
</ul>
</li>
</ol>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">evaluate_retrieval</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">collection</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> queries</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> ground_truth</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> k</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    recalls </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> query</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> gt_doc_id </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token builtin">zip</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">queries</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> ground_truth</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        results </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> collection</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> n_results</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">k</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        retrieved_ids </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">r</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">'id'</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> r </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> results</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        recall </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> gt_doc_id </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> retrieved_ids </span><span class="token keyword" style="color:#00009f">else</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        recalls</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">append</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">recall</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token builtin">sum</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">recalls</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">/</span><span class="token plain"> </span><span class="token builtin">len</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">recalls</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="실험-결과-분석">실험 결과 분석<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#%EC%8B%A4%ED%97%98-%EA%B2%B0%EA%B3%BC-%EB%B6%84%EC%84%9D" class="hash-link" aria-label="실험 결과 분석에 대한 직접 링크" title="실험 결과 분석에 대한 직접 링크" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="recall-성능">Recall 성능<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#recall-%EC%84%B1%EB%8A%A5" class="hash-link" aria-label="Recall 성능에 대한 직접 링크" title="Recall 성능에 대한 직접 링크" translate="no">​</a></h4>
<p>실험 결과, Markdown-KV 포맷이 가장 높은 Recall을 기록했습니다:</p>
<ul>
<li class=""><strong>Markdown-KV</strong>: Recall@3 기준 약 85%</li>
<li class=""><strong>HTML</strong>: 약 78%</li>
<li class=""><strong>TOON</strong>: 약 72%</li>
<li class=""><strong>Plain Text</strong>: 약 68%</li>
<li class=""><strong>JSON</strong>: 약 65%</li>
</ul>
<p><strong>왜 Markdown-KV가 우수했는가?</strong></p>
<ol>
<li class=""><strong>명시적 Key-Value 구조</strong>: "바탕의 종류: 목재면"과 같은 형식은 Embedding 모델이 의미론적 관계를 파악하기 쉽게 만듭니다.</li>
<li class=""><strong>자연어 친화성</strong>: Qwen 모델의 학습 데이터에 Markdown 형식이 많이 포함되어 있을 가능성이 높습니다.</li>
<li class=""><strong>정보 밀도</strong>: 각 항목이 독립적으로 표현되어 부분 매칭에 유리합니다.</li>
</ol>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="토큰-효율성">토큰 효율성<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#%ED%86%A0%ED%81%B0-%ED%9A%A8%EC%9C%A8%EC%84%B1" class="hash-link" aria-label="토큰 효율성에 대한 직접 링크" title="토큰 효율성에 대한 직접 링크" translate="no">​</a></h4>
<p>TOON 포맷이 기대대로 가장 효율적이었습니다:</p>
<ul>
<li class=""><strong>TOON</strong>: 평균 약 120 토큰 (기준)</li>
<li class=""><strong>Markdown-KV</strong>: 평균 약 210 토큰 (+75%)</li>
<li class=""><strong>HTML</strong>: 평균 약 180 토큰 (+50%)</li>
<li class=""><strong>JSON</strong>: 평균 약 195 토큰 (+62%)</li>
</ul>
<p><strong>토큰 효율성의 실제 의미</strong></p>
<p>50개 테이블 기준으로 계산하면:</p>
<ul>
<li class="">TOON: 6,000 토큰</li>
<li class="">Markdown-KV: 10,500 토큰</li>
</ul>
<p>월 100만 테이블을 처리하는 서비스라면:</p>
<ul>
<li class="">토큰 차이: 90,000,000 토큰/월</li>
<li class="">비용 차이 (OpenAI 가격 기준 $0.0001/1K 토큰): $9/월</li>
</ul>
<p>규모가 커질수록 이 차이는 유의미해집니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="의사결정-프레임워크-어떤-포맷을-선택할-것인가">의사결정 프레임워크: 어떤 포맷을 선택할 것인가?<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#%EC%9D%98%EC%82%AC%EA%B2%B0%EC%A0%95-%ED%94%84%EB%A0%88%EC%9E%84%EC%9B%8C%ED%81%AC-%EC%96%B4%EB%96%A4-%ED%8F%AC%EB%A7%B7%EC%9D%84-%EC%84%A0%ED%83%9D%ED%95%A0-%EA%B2%83%EC%9D%B8%EA%B0%80" class="hash-link" aria-label="의사결정 프레임워크: 어떤 포맷을 선택할 것인가?에 대한 직접 링크" title="의사결정 프레임워크: 어떤 포맷을 선택할 것인가?에 대한 직접 링크" translate="no">​</a></h3>
<p>실험 결과를 바탕으로 다음과 같은 의사결정 트리를 제안합니다:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">질문 1: 검색 정확도가 최우선인가?</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">└─ Yes → Markdown-KV 선택</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">└─ No  → 질문 2로</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">질문 2: 대용량 처리 (&gt;100K 테이블/일)인가?</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">└─ Yes → TOON 선택</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">└─ No  → 질문 3으로</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">질문 3: 기존 시스템이 특정 포맷을 사용 중인가?</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">└─ Yes → 기존 포맷 유지 (변환 비용 고려)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">└─ No  → HTML 또는 Markdown 선택 (범용성)</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="한계점-및-추가-고려사항">한계점 및 추가 고려사항<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#%ED%95%9C%EA%B3%84%EC%A0%90-%EB%B0%8F-%EC%B6%94%EA%B0%80-%EA%B3%A0%EB%A0%A4%EC%82%AC%ED%95%AD" class="hash-link" aria-label="한계점 및 추가 고려사항에 대한 직접 링크" title="한계점 및 추가 고려사항에 대한 직접 링크" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-실험-규모의-한계">1. 실험 규모의 한계<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#1-%EC%8B%A4%ED%97%98-%EA%B7%9C%EB%AA%A8%EC%9D%98-%ED%95%9C%EA%B3%84" class="hash-link" aria-label="1. 실험 규모의 한계에 대한 직접 링크" title="1. 실험 규모의 한계에 대한 직접 링크" translate="no">​</a></h4>
<p>50개 샘플은 트렌드를 파악하기에는 충분하지만, 통계적 유의성을 확보하기에는 부족합니다. 특히:</p>
<ul>
<li class="">Recall@5 이상에서는 차이가 수렴할 가능성</li>
<li class="">특정 도메인(예: 금융 표)에서는 다른 결과가 나올 수 있음</li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-embedding-모델-의존성">2. Embedding 모델 의존성<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#2-embedding-%EB%AA%A8%EB%8D%B8-%EC%9D%98%EC%A1%B4%EC%84%B1" class="hash-link" aria-label="2. Embedding 모델 의존성에 대한 직접 링크" title="2. Embedding 모델 의존성에 대한 직접 링크" translate="no">​</a></h4>
<p>Qwen3-Embedding-8B는 우수한 성능을 보이지만, 이는 결과에 편향을 줄 수 있습니다. OpenAI의 text-embedding-3-large나 Cohere의 embed-multilingual-v3.0으로 실험하면 다른 포맷이 우세할 수 있습니다.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-generation-단계-미평가">3. Generation 단계 미평가<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#3-generation-%EB%8B%A8%EA%B3%84-%EB%AF%B8%ED%8F%89%EA%B0%80" class="hash-link" aria-label="3. Generation 단계 미평가에 대한 직접 링크" title="3. Generation 단계 미평가에 대한 직접 링크" translate="no">​</a></h4>
<p>Retrieval 성능만으로는 불충분합니다. 실제 RAG 시스템에서는:</p>
<ul>
<li class="">LLM이 retrieved context를 얼마나 잘 이해하는가?</li>
<li class="">최종 답변의 품질은?</li>
</ul>
<p>이를 평가하기 위해서는 추가 실험이 필요합니다:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">evaluate_end_to_end</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> retrieved_chunks</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> ground_truth_answer</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># LLM에 retrieved chunks를 전달하여 답변 생성</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    generated_answer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> llm_generate</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> retrieved_chunks</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># 답변 품질 평가 (ROUGE, BERTScore 등)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    score </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> evaluate_answer_quality</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">generated_answer</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> ground_truth_answer</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> score</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="향후-연구-방향">향후 연구 방향<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#%ED%96%A5%ED%9B%84-%EC%97%B0%EA%B5%AC-%EB%B0%A9%ED%96%A5" class="hash-link" aria-label="향후 연구 방향에 대한 직접 링크" title="향후 연구 방향에 대한 직접 링크" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-대규모-벤치마크">1. 대규모 벤치마크<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#1-%EB%8C%80%EA%B7%9C%EB%AA%A8-%EB%B2%A4%EC%B9%98%EB%A7%88%ED%81%AC" class="hash-link" aria-label="1. 대규모 벤치마크에 대한 직접 링크" title="1. 대규모 벤치마크에 대한 직접 링크" translate="no">​</a></h4>
<ul>
<li class="">1,000개 이상의 테이블로 실험 확장</li>
<li class="">다양한 카테고리별 성능 비교</li>
<li class="">테이블 복잡도(행/열 수, nested structure)에 따른 포맷 성능 변화 분석</li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-하이브리드-접근">2. 하이브리드 접근<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#2-%ED%95%98%EC%9D%B4%EB%B8%8C%EB%A6%AC%EB%93%9C-%EC%A0%91%EA%B7%BC" class="hash-link" aria-label="2. 하이브리드 접근에 대한 직접 링크" title="2. 하이브리드 접근에 대한 직접 링크" translate="no">​</a></h4>
<p>여러 포맷을 동시에 활용하는 전략:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">hybrid_retrieval</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># TOON으로 1차 검색 (효율성)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    toon_candidates </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> toon_collection</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> n_results</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">10</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Markdown-KV로 재순위화 (정확성)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    refined_results </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> rerank_with_markdown_kv</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">toon_candidates</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> refined_results</span><br></span></code></pre></div></div>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-포맷별-최적화">3. 포맷별 최적화<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#3-%ED%8F%AC%EB%A7%B7%EB%B3%84-%EC%B5%9C%EC%A0%81%ED%99%94" class="hash-link" aria-label="3. 포맷별 최적화에 대한 직접 링크" title="3. 포맷별 최적화에 대한 직접 링크" translate="no">​</a></h4>
<p>각 포맷에 특화된 Retrieval 전략:</p>
<ul>
<li class="">TOON: 구조 인식 검색</li>
<li class="">Markdown-KV: Key 기반 필터링 + Value 검색</li>
<li class="">JSON: 스키마 활용 쿼리 확장</li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-도메인-특화-실험">4. 도메인 특화 실험<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#4-%EB%8F%84%EB%A9%94%EC%9D%B8-%ED%8A%B9%ED%99%94-%EC%8B%A4%ED%97%98" class="hash-link" aria-label="4. 도메인 특화 실험에 대한 직접 링크" title="4. 도메인 특화 실험에 대한 직접 링크" translate="no">​</a></h4>
<ul>
<li class="">금융 표: 숫자와 단위가 중요</li>
<li class="">법률 표: 계층 구조와 참조가 중요</li>
<li class="">과학 표: 수식과 기호가 중요</li>
</ul>
<p>각 도메인에서 최적 포맷이 다를 수 있습니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="실무-적용-가이드">실무 적용 가이드<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#%EC%8B%A4%EB%AC%B4-%EC%A0%81%EC%9A%A9-%EA%B0%80%EC%9D%B4%EB%93%9C" class="hash-link" aria-label="실무 적용 가이드에 대한 직접 링크" title="실무 적용 가이드에 대한 직접 링크" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="step-1-요구사항-분석">Step 1: 요구사항 분석<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#step-1-%EC%9A%94%EA%B5%AC%EC%82%AC%ED%95%AD-%EB%B6%84%EC%84%9D" class="hash-link" aria-label="Step 1: 요구사항 분석에 대한 직접 링크" title="Step 1: 요구사항 분석에 대한 직접 링크" translate="no">​</a></h4>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">requirements </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"accuracy_priority"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"high"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># high/medium/low</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"volume"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"100K tables/day"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"budget"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"tight"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"latency_requirement"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"&lt;2s"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"existing_format"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"HTML"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre></div></div>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="step-2-파일럿-실험">Step 2: 파일럿 실험<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#step-2-%ED%8C%8C%EC%9D%BC%EB%9F%BF-%EC%8B%A4%ED%97%98" class="hash-link" aria-label="Step 2: 파일럿 실험에 대한 직접 링크" title="Step 2: 파일럿 실험에 대한 직접 링크" translate="no">​</a></h4>
<p>작은 규모로 3-4개 포맷을 비교:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># 실험 설정</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">formats_to_test </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"markdown_kv"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"toon"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"html"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">sample_size </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">100</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 평가 실행</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">results </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> fmt </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> formats_to_test</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    results</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">fmt</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"recall"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> evaluate_recall</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">fmt</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> sample_size</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"avg_tokens"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> calculate_avg_tokens</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">fmt</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> sample_size</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"conversion_cost"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> estimate_conversion_cost</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">fmt</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 최적 선택</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">best_format </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> select_best</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">results</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> requirements</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="step-3-프로덕션-롤아웃">Step 3: 프로덕션 롤아웃<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#step-3-%ED%94%84%EB%A1%9C%EB%8D%95%EC%85%98-%EB%A1%A4%EC%95%84%EC%9B%83" class="hash-link" aria-label="Step 3: 프로덕션 롤아웃에 대한 직접 링크" title="Step 3: 프로덕션 롤아웃에 대한 직접 링크" translate="no">​</a></h4>
<ol>
<li class=""><strong>A/B 테스팅</strong>: 기존 포맷과 신규 포맷을 동시 운영</li>
<li class=""><strong>모니터링</strong>: 실제 사용자 쿼리에서의 성능 추적</li>
<li class=""><strong>점진적 전환</strong>: 성능이 검증되면 단계적으로 확대</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="references">References<a href="https://teddynote-lab.github.io/brain-cache/lab/which-tabular-format-rag-process-understands-very-well#references" class="hash-link" aria-label="References에 대한 직접 링크" title="References에 대한 직접 링크" translate="no">​</a></h2>
<ul>
<li class=""><a href="https://www.ncloud-forums.com/topic/594/" target="_blank" rel="noopener noreferrer" class="">NAVER Cloud - JSON vs TOON: LLM 입력 포맷 비교</a>
<ul>
<li class="">TOON 포맷의 토큰 효율성과 LLM 성능에 대한 벤치마크</li>
</ul>
</li>
<li class=""><a href="https://github.com/toon-format/toon/tree/main" target="_blank" rel="noopener noreferrer" class="">TOON GitHub Repository</a>
<ul>
<li class="">TOON 포맷 스펙 및 변환 도구</li>
</ul>
</li>
<li class=""><a href="https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&amp;topMenu=100&amp;aihubDataSe=data&amp;dataSetSn=71565" target="_blank" rel="noopener noreferrer" class="">AIHub - 표 정보 질의응답 데이터</a>
<ul>
<li class="">실험에 사용된 데이터셋</li>
</ul>
</li>
<li class=""><a href="https://docs.trychroma.com/" target="_blank" rel="noopener noreferrer" class="">ChromaDB Documentation</a>
<ul>
<li class="">Vector store 구현 참고</li>
</ul>
</li>
<li class=""><a href="https://huggingface.co/Qwen" target="_blank" rel="noopener noreferrer" class="">Qwen Embedding Models</a>
<ul>
<li class="">실험에 사용된 임베딩 모델</li>
</ul>
</li>
</ul>]]></content:encoded>
            <category>Data</category>
        </item>
        <item>
            <title><![CDATA[메모랜덤 flow에 사용된 문서영역별 Clustering 성능평가]]></title>
            <link>https://teddynote-lab.github.io/brain-cache/lab/메모랜덤-flow에-사용된-문서영역별-clustering-성능평가</link>
            <guid>https://teddynote-lab.github.io/brain-cache/lab/메모랜덤-flow에-사용된-문서영역별-clustering-성능평가</guid>
            <pubDate>Mon, 29 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[GS Caltex 메모랜덤-연구노트 매칭 프로젝트에서 PyMuPDF 기반 파서, Titan Embed V2 임베딩, ChromaDB 벡터 검색, Claude Sonnet 4.5 LLM 판정을 결합한 파이프라인을 구축했습니다. 137개 연구노트 중 82.5%가 메모랜덤과]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="tldr">TL;DR<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%86%E1%85%A6%E1%84%86%E1%85%A9%E1%84%85%E1%85%A2%E1%86%AB%E1%84%83%E1%85%A5%E1%86%B7-flow%E1%84%8B%E1%85%A6-%E1%84%89%E1%85%A1%E1%84%8B%E1%85%AD%E1%86%BC%E1%84%83%E1%85%AC%E1%86%AB-%E1%84%86%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%84%8B%E1%85%A7%E1%86%BC%E1%84%8B%E1%85%A7%E1%86%A8%E1%84%87%E1%85%A7%E1%86%AF-clustering-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1#tldr" class="hash-link" aria-label="TL;DR에 대한 직접 링크" title="TL;DR에 대한 직접 링크" translate="no">​</a></h2>
<blockquote>
<p>GS Caltex 메모랜덤-연구노트 매칭 프로젝트에서 PyMuPDF 기반 파서, Titan Embed V2 임베딩, ChromaDB 벡터 검색, Claude Sonnet 4.5 LLM 판정을 결합한 파이프라인을 구축했습니다. 137개 연구노트 중 82.5%가 메모랜덤과 매칭되었으며, Label Propagation 알고리즘이 BCubed F1 0.763으로 최고 성능을 기록했습니다. 단일 클러스터로 수렴하는 경향이 강해 의미 기반보다 서사 구조 기반 접근이 더 효과적임을 확인했습니다.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="key-takeaways">Key Takeaways<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%86%E1%85%A6%E1%84%86%E1%85%A9%E1%84%85%E1%85%A2%E1%86%AB%E1%84%83%E1%85%A5%E1%86%B7-flow%E1%84%8B%E1%85%A6-%E1%84%89%E1%85%A1%E1%84%8B%E1%85%AD%E1%86%BC%E1%84%83%E1%85%AC%E1%86%AB-%E1%84%86%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%84%8B%E1%85%A7%E1%86%BC%E1%84%8B%E1%85%A7%E1%86%A8%E1%84%87%E1%85%A7%E1%86%AF-clustering-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1#key-takeaways" class="hash-link" aria-label="Key Takeaways에 대한 직접 링크" title="Key Takeaways에 대한 직접 링크" translate="no">​</a></h2>
<ul>
<li class="">
<p><strong>LLM 기반 매칭 판정의 보수적 설계</strong>: 벡터 유사도만으로는 부족한 정확도를 Claude Sonnet 4.5의 명시적 판정(표 데이터 유무, 섹션 타입 필터링)으로 보완하여 82.5% 매칭률 달성. Temperature=0.0으로 일관성 확보가 핵심.</p>
</li>
<li class="">
<p><strong>클러스터링 알고리즘 선택은 데이터 특성에 의존적</strong>: Label Propagation(F1 0.763)이 평균적으로 우수하나, 단일 주제 데이터(ELN3)는 Connected Components로 완벽 매칭(F1 1.0), 복잡한 다중 주제(ELN5)는 HDBSCAN이 유리(F1 0.812). 사전 데이터 분석 필수.</p>
</li>
<li class="">
<p><strong>HDBSCAN 과분할 문제와 파라미터 민감도</strong>: <code>min_cluster_size</code> 5 이상에서 평균 37개 클러스터 생성으로 recall 급락. 최적값은 2~3 + <code>min_samples=3</code> 조합으로 F1 0.736 달성. 정성 피드백과 정량 평가 일치.</p>
</li>
<li class="">
<p><strong>단일 클러스터 수렴 현상의 근본 원인</strong>: 연구노트 간 임베딩 유사도가 높아 GraphCommunity 알고리즘에서 평균 1.1~1.9개 클러스터만 생성. 의미 기반 분할보다 서사 구조(시간순, 실험 단계) 기반 청킹이 더 효과적.</p>
</li>
<li class="">
<p><strong>BCubed F1 메트릭의 실무 적용성</strong>: Precision/Recall 균형 평가로 과분할(precision 하락)과 과소분할(recall 하락) 동시 탐지 가능. 클러스터 수와 F1을 함께 모니터링하여 알고리즘 선택 가이드 제공.</p>
</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="상세-내용">상세 내용<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%86%E1%85%A6%E1%84%86%E1%85%A9%E1%84%85%E1%85%A2%E1%86%AB%E1%84%83%E1%85%A5%E1%86%B7-flow%E1%84%8B%E1%85%A6-%E1%84%89%E1%85%A1%E1%84%8B%E1%85%AD%E1%86%BC%E1%84%83%E1%85%AC%E1%86%AB-%E1%84%86%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%84%8B%E1%85%A7%E1%86%BC%E1%84%8B%E1%85%A7%E1%86%A8%E1%84%87%E1%85%A7%E1%86%AF-clustering-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1#%EC%83%81%EC%84%B8-%EB%82%B4%EC%9A%A9" class="hash-link" aria-label="상세 내용에 대한 직접 링크" title="상세 내용에 대한 직접 링크" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="배경-연구노트-메모랜덤-자동-매칭-시스템-구축">배경: 연구노트-메모랜덤 자동 매칭 시스템 구축<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%86%E1%85%A6%E1%84%86%E1%85%A9%E1%84%85%E1%85%A2%E1%86%AB%E1%84%83%E1%85%A5%E1%86%B7-flow%E1%84%8B%E1%85%A6-%E1%84%89%E1%85%A1%E1%84%8B%E1%85%AD%E1%86%BC%E1%84%83%E1%85%AC%E1%86%AB-%E1%84%86%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%84%8B%E1%85%A7%E1%86%BC%E1%84%8B%E1%85%A7%E1%86%A8%E1%84%87%E1%85%A7%E1%86%AF-clustering-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1#%EB%B0%B0%EA%B2%BD-%EC%97%B0%EA%B5%AC%EB%85%B8%ED%8A%B8-%EB%A9%94%EB%AA%A8%EB%9E%9C%EB%8D%A4-%EC%9E%90%EB%8F%99-%EB%A7%A4%EC%B9%AD-%EC%8B%9C%EC%8A%A4%ED%85%9C-%EA%B5%AC%EC%B6%95" class="hash-link" aria-label="배경: 연구노트-메모랜덤 자동 매칭 시스템 구축에 대한 직접 링크" title="배경: 연구노트-메모랜덤 자동 매칭 시스템 구축에 대한 직접 링크" translate="no">​</a></h3>
<p>GS Caltex 프로젝트는 PDF 형태의 연구노트(ELN)와 메모랜덤을 자동으로 매칭하여 연구 결과를 체계화하는 시스템 개발을 목표로 했습니다. 기존 수작업 매칭은 시간이 많이 소요되고 일관성이 부족했으며, 메모랜덤의 목차 구조를 활용하여 관련 연구노트를 자동으로 클러스터링하는 솔루션이 필요했습니다.</p>
<p>핵심 과제는 두 가지였습니다:</p>
<ol>
<li class=""><strong>정확한 정답셋 생성</strong>: 벡터 유사도만으로는 부정확한 매칭이 많아 LLM 판정 단계 추가</li>
<li class=""><strong>효과적인 클러스터링</strong>: 메모랜덤 목차별로 연구노트를 의미있게 그룹화</li>
</ol>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="아키텍처-설계-파싱--임베딩--벡터-검색--llm-판정">아키텍처 설계: 파싱 → 임베딩 → 벡터 검색 → LLM 판정<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%86%E1%85%A6%E1%84%86%E1%85%A9%E1%84%85%E1%85%A2%E1%86%AB%E1%84%83%E1%85%A5%E1%86%B7-flow%E1%84%8B%E1%85%A6-%E1%84%89%E1%85%A1%E1%84%8B%E1%85%AD%E1%86%BC%E1%84%83%E1%85%AC%E1%86%AB-%E1%84%86%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%84%8B%E1%85%A7%E1%86%BC%E1%84%8B%E1%85%A7%E1%86%A8%E1%84%87%E1%85%A7%E1%86%AF-clustering-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1#%EC%95%84%ED%82%A4%ED%85%8D%EC%B2%98-%EC%84%A4%EA%B3%84-%ED%8C%8C%EC%8B%B1--%EC%9E%84%EB%B2%A0%EB%94%A9--%EB%B2%A1%ED%84%B0-%EA%B2%80%EC%83%89--llm-%ED%8C%90%EC%A0%95" class="hash-link" aria-label="아키텍처 설계: 파싱 → 임베딩 → 벡터 검색 → LLM 판정에 대한 직접 링크" title="아키텍처 설계: 파싱 → 임베딩 → 벡터 검색 → LLM 판정에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>1단계: 문서 파싱 및 청킹</strong></p>
<p>메모랜덤과 연구노트는 서로 다른 파싱 전략을 적용했습니다:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># 메모랜덤: 폰트 크기 기반 목차 추출</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">class</span><span class="token plain"> </span><span class="token class-name">MemorandumNaiveParser</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">extract_toc</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> page</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> font_size </span><span class="token operator" style="color:#393A34">&gt;=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">13.5</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"대제목"</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># "1. 목적"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">elif</span><span class="token plain"> font_size </span><span class="token operator" style="color:#393A34">&gt;=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">11.5</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"소제목"</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># "2.1. 균주"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">else</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"본문"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 연구노트: 키워드 기반 섹션 분류</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">SECTION_KEYWORDS </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"실험개요"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"실험개요"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"실험 개요"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"실험방법"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"실험방법"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"실험 방법"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"실험결과"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"실험결과"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Task Results"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"결론"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"결론"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"고찰"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"결과 및 토의"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></span></code></pre></div></div>
<p><strong>의사결정 포인트</strong>: PyMuPDF를 선택한 이유는 로컬 처리 가능, 빠른 속도, 폰트 메타데이터 추출 지원 때문입니다. Upstage Document Parse API도 테스트했으나, 대부분의 문서에서 PyMuPDF와 유사한 품질을 보여 비용 효율적인 로컬 처리를 선택했습니다.</p>
<p><strong>2단계: 임베딩 및 벡터 저장</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Amazon Titan Embed Text V2 설정</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">embedding_config </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"model_id"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"amazon.titan-embed-text-v2:0"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"dimensions"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1024</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"normalize"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># L2 정규화로 코사인 유사도 계산</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"max_tokens"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">8192</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"safety_margin"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0.85</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># 실제 최대: 6,553 토큰</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># ChromaDB 디스크 기반 저장</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">chroma_client </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> chromadb</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">PersistentClient</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">path</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"./chroma_db"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">collection </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> chroma_client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">create_collection</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"memorandum_eln"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    metadata</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"hnsw:space"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"cosine"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p><strong>의사결정 포인트</strong>: Titan V2를 선택한 이유는 AWS Bedrock 통합 용이성과 8K 토큰 컨텍스트 길이입니다. 메모랜덤 청크가 평균 500 토큰으로 긴 편이라 OpenAI의 512 토큰 제약은 부적합했습니다.</p>
<p><strong>3단계: 벡터 검색 + LLM 판정</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Top-K=5 벡터 검색</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">results </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> collection</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    query_embeddings</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">eln_embedding</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    n_results</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">5</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    where</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"session_id"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> session_id</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Claude Sonnet 4.5 매칭 판정</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">prompt </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string-interpolation string" style="color:#e3116c">f"""</span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="color:#e3116c">다음 연구노트와 메모랜덤 청크가 같은 실험을 다루는지 판단하세요.</span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="display:inline-block;color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="color:#e3116c">[판정 기준]</span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="color:#e3116c">- True: 표(Table)에 구체적 실험 데이터(OD, 수율, 농도 등) 있음</span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="color:#e3116c">- False: 결론/목적/향후 계획 섹션</span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="color:#e3116c">- False: 판단 어려움 (보수적 판정)</span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="display:inline-block;color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="color:#e3116c">[연구노트]</span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="color:#e3116c"></span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">eln_content</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="display:inline-block;color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="color:#e3116c">[메모랜덤 청크]</span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="color:#e3116c"></span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">memorandum_chunk</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="display:inline-block;color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="color:#e3116c">JSON 형식으로 응답: {{"match": true/false, "confidence": "high/medium/low"}}</span><br></span><span class="token-line" style="color:#393A34"><span class="token string-interpolation string" style="color:#e3116c">"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> bedrock</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">invoke_model</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    modelId</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"global.anthropic.claude-sonnet-4-5-20250929-v1:0"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    body</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">json</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">dumps</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"anthropic_version"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"bedrock-2023-05-31"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"max_tokens"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1000</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"temperature"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0.0</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># 일관성 최대화</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"messages"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"role"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> prompt</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p><strong>의사결정 포인트</strong>: 벡터 유사도만 사용했을 때 "실험 방법" 섹션과 "실험 결과" 섹션이 오매칭되는 문제가 빈번했습니다. LLM을 추가하여 <strong>표 데이터 유무</strong>를 명시적으로 확인하도록 설계한 결과, precision이 크게 향상되었습니다. Temperature=0.0은 반복 실행 시 일관된 판정을 보장하기 위한 설정입니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="정답셋-생성-결과-137개-노트-중-825-매칭">정답셋 생성 결과: 137개 노트 중 82.5% 매칭<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%86%E1%85%A6%E1%84%86%E1%85%A9%E1%84%85%E1%85%A2%E1%86%AB%E1%84%83%E1%85%A5%E1%86%B7-flow%E1%84%8B%E1%85%A6-%E1%84%89%E1%85%A1%E1%84%8B%E1%85%AD%E1%86%BC%E1%84%83%E1%85%AC%E1%86%AB-%E1%84%86%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%84%8B%E1%85%A7%E1%86%BC%E1%84%8B%E1%85%A7%E1%86%A8%E1%84%87%E1%85%A7%E1%86%AF-clustering-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1#%EC%A0%95%EB%8B%B5%EC%85%8B-%EC%83%9D%EC%84%B1-%EA%B2%B0%EA%B3%BC-137%EA%B0%9C-%EB%85%B8%ED%8A%B8-%EC%A4%91-825-%EB%A7%A4%EC%B9%AD" class="hash-link" aria-label="정답셋 생성 결과: 137개 노트 중 82.5% 매칭에 대한 직접 링크" title="정답셋 생성 결과: 137개 노트 중 82.5% 매칭에 대한 직접 링크" translate="no">​</a></h3>
<p>4개 데이터셋(ELN1, 3, 4, 5)에서 총 270개 매칭 쌍을 생성했습니다:</p>








































<table><thead><tr><th>데이터셋</th><th>주제</th><th>매칭률</th><th>평균 매칭/노트</th><th>평균 거리</th></tr></thead><tbody><tr><td>ELN1</td><td>3-HP 발효</td><td>86.5%</td><td>2.1개</td><td>0.763</td></tr><tr><td>ELN3</td><td>일반 연구</td><td>100.0%</td><td>2.5개</td><td>0.699</td></tr><tr><td>ELN4</td><td>CO2 Polyol</td><td>68.8%</td><td>2.5개</td><td>0.732</td></tr><tr><td>ELN5</td><td>Pilot 촉매 공정</td><td>90.0%</td><td>2.5개</td><td>0.733</td></tr></tbody></table>
<p><strong>주목할 점</strong>:</p>
<ul>
<li class=""><strong>평균 거리 0.699~0.763</strong>: 코사인 거리 기준으로 상당히 가까운 편이나, 절대값으로는 명확한 경계 설정이 어려움 → LLM 판정 필요성 입증</li>
<li class=""><strong>ELN3 100% 매칭</strong>: 단일 주제로 집중된 데이터셋은 벡터 유사도가 높고 LLM 판정도 명확</li>
<li class=""><strong>ELN4 68.8% 매칭</strong>: 다양한 실험 방법론이 혼재되어 낮은 매칭률</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="클러스터링-성능-평가-bcubed-f1-메트릭">클러스터링 성능 평가: BCubed F1 메트릭<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%86%E1%85%A6%E1%84%86%E1%85%A9%E1%84%85%E1%85%A2%E1%86%AB%E1%84%83%E1%85%A5%E1%86%B7-flow%E1%84%8B%E1%85%A6-%E1%84%89%E1%85%A1%E1%84%8B%E1%85%AD%E1%86%BC%E1%84%83%E1%85%AC%E1%86%AB-%E1%84%86%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%84%8B%E1%85%A7%E1%86%BC%E1%84%8B%E1%85%A7%E1%86%A8%E1%84%87%E1%85%A7%E1%86%AF-clustering-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1#%ED%81%B4%EB%9F%AC%EC%8A%A4%ED%84%B0%EB%A7%81-%EC%84%B1%EB%8A%A5-%ED%8F%89%EA%B0%80-bcubed-f1-%EB%A9%94%ED%8A%B8%EB%A6%AD" class="hash-link" aria-label="클러스터링 성능 평가: BCubed F1 메트릭에 대한 직접 링크" title="클러스터링 성능 평가: BCubed F1 메트릭에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>BCubed F1 선택 이유</strong>:</p>
<ul>
<li class=""><strong>Precision</strong>: 같은 클러스터에 속한 아이템 중 실제로 같은 카테고리인 비율 → 과분할 패널티</li>
<li class=""><strong>Recall</strong>: 같은 카테고리에 속한 아이템 중 같은 클러스터에 할당된 비율 → 과소분할 패널티</li>
<li class=""><strong>F1-Score</strong>: Precision과 Recall의 조화평균으로 균형 평가</li>
</ul>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">bcubed_precision</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">item_i</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> cluster_assignments</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> ground_truth</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token triple-quoted-string string" style="color:#e3116c">"""아이템 i에 대한 BCubed Precision"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    cluster_i </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> cluster_assignments</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">item_i</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    category_i </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> ground_truth</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">item_i</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># 같은 클러스터에 속한 아이템들</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    same_cluster </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">j </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> j</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> c </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token builtin">enumerate</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">cluster_assignments</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> c </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> cluster_i</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># 그 중 실제로 같은 카테고리인 아이템들</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    correct </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">j </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> j </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> same_cluster </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> ground_truth</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">j</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> category_i</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token builtin">len</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">correct</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">/</span><span class="token plain"> </span><span class="token builtin">len</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">same_cluster</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 전체 BCubed F1 계산</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">precision </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> np</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">mean</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">bcubed_precision</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">i</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> clusters</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> labels</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> i </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token builtin">range</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">n</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">recall </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> np</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">mean</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">bcubed_recall</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">i</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> clusters</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> labels</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> i </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token builtin">range</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">n</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">f1 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> precision </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> recall </span><span class="token operator" style="color:#393A34">/</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">precision </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> recall</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="알고리즘별-성능-비교-label-propagation-우승">알고리즘별 성능 비교: Label Propagation 우승<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%86%E1%85%A6%E1%84%86%E1%85%A9%E1%84%85%E1%85%A2%E1%86%AB%E1%84%83%E1%85%A5%E1%86%B7-flow%E1%84%8B%E1%85%A6-%E1%84%89%E1%85%A1%E1%84%8B%E1%85%AD%E1%86%BC%E1%84%83%E1%85%AC%E1%86%AB-%E1%84%86%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%84%8B%E1%85%A7%E1%86%BC%E1%84%8B%E1%85%A7%E1%86%A8%E1%84%87%E1%85%A7%E1%86%AF-clustering-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1#%EC%95%8C%EA%B3%A0%EB%A6%AC%EC%A6%98%EB%B3%84-%EC%84%B1%EB%8A%A5-%EB%B9%84%EA%B5%90-label-propagation-%EC%9A%B0%EC%8A%B9" class="hash-link" aria-label="알고리즘별 성능 비교: Label Propagation 우승에 대한 직접 링크" title="알고리즘별 성능 비교: Label Propagation 우승에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>전체 평균 성능</strong>:</p>



































<table><thead><tr><th>알고리즘</th><th>평균 F1</th><th>평균 클러스터 수</th><th>특징</th></tr></thead><tbody><tr><td>Label Propagation</td><td><strong>0.763</strong></td><td>1.9</td><td>안정적, 단순 구조</td></tr><tr><td>Connected Components</td><td>0.733</td><td>1.1</td><td>최소 클러스터 생성</td></tr><tr><td>Louvain</td><td>0.689</td><td>3.1</td><td>세분화 경향</td></tr><tr><td>HDBSCAN</td><td>0.614</td><td>8.8</td><td>과분할 심각</td></tr></tbody></table>
<p><strong>Label Propagation이 우수한 이유</strong>:</p>
<ol>
<li class=""><strong>그래프 기반 전파</strong>: 이웃 노드 라벨 중 다수결로 자신의 라벨 업데이트</li>
<li class=""><strong>자연스러운 경계 형성</strong>: 유사도가 높은 영역은 하나의 라벨로 수렴</li>
<li class=""><strong>적절한 클러스터 수</strong>: 평균 1.9개로 과분할/과소분할 균형</li>
</ol>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Label Propagation 작동 원리</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">label_propagation</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">graph</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> max_iter</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">100</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># 초기: 모든 노드에 고유 라벨</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    labels </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain">node</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> i </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> i</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> node </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token builtin">enumerate</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">graph</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">nodes</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> _ </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token builtin">range</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">max_iter</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> node </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> graph</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">nodes</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token comment" style="color:#999988;font-style:italic"># 이웃 라벨 중 가장 많은 것 선택</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            neighbor_labels </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">labels</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">n</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> n </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> graph</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">neighbors</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">node</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            labels</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">node</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">max</span><span class="token punctuation" style="color:#393A34">(</span><span class="token builtin">set</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">neighbor_labels</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> key</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">neighbor_labels</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">count</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> labels</span><br></span></code></pre></div></div>
<p><strong>데이터셋별 최적 알고리즘</strong>:</p>



































<table><thead><tr><th>데이터셋</th><th>최적 알고리즘</th><th>F1</th><th>이유</th></tr></thead><tbody><tr><td>ELN3</td><td>CC/LP</td><td>1.000</td><td>단일 주제로 명확한 경계</td></tr><tr><td>ELN4</td><td>Louvain</td><td>0.952</td><td>다양한 실험 방법론, 세분화 필요</td></tr><tr><td>ELN5</td><td>HDBSCAN</td><td>0.812</td><td>복잡한 pilot 공정, 노이즈 존재</td></tr><tr><td>ELN1</td><td>CC/LP</td><td>0.597</td><td>3-HP 발효 단계별 구분 어려움</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="hdbscan-과분할-문제와-해결책">HDBSCAN 과분할 문제와 해결책<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%86%E1%85%A6%E1%84%86%E1%85%A9%E1%84%85%E1%85%A2%E1%86%AB%E1%84%83%E1%85%A5%E1%86%B7-flow%E1%84%8B%E1%85%A6-%E1%84%89%E1%85%A1%E1%84%8B%E1%85%AD%E1%86%BC%E1%84%83%E1%85%AC%E1%86%AB-%E1%84%86%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%84%8B%E1%85%A7%E1%86%BC%E1%84%8B%E1%85%A7%E1%86%A8%E1%84%87%E1%85%A7%E1%86%AF-clustering-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1#hdbscan-%EA%B3%BC%EB%B6%84%ED%95%A0-%EB%AC%B8%EC%A0%9C%EC%99%80-%ED%95%B4%EA%B2%B0%EC%B1%85" class="hash-link" aria-label="HDBSCAN 과분할 문제와 해결책에 대한 직접 링크" title="HDBSCAN 과분할 문제와 해결책에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>문제 상황</strong>:</p>
<ul>
<li class=""><code>min_cluster_size=5</code>에서 ELN1에 평균 37개 클러스터 생성 (실제 목차 5개)</li>
<li class="">단일 연구노트가 여러 클러스터로 분할되어 recall 급락 (F1 0.15~0.27)</li>
</ul>
<p><strong>파라미터 튜닝 결과</strong>:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># 최적 조합: (min_cluster_size=2~3, min_samples=3)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">best_config </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"min_cluster_size"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># 최소 클러스터 크기</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"min_samples"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">       </span><span class="token comment" style="color:#999988;font-style:italic"># 핵심 포인트 판정 기준</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"metric"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"euclidean"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"cluster_selection_method"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"eom"</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># Excess of Mass</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 성능 개선</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Before (5, 1): F1 0.540, 평균 16.5개 클러스터</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># After  (3, 3): F1 0.736, 평균 5.3개 클러스터</span><br></span></code></pre></div></div>
<p><strong>의사결정 포인트</strong>: <code>min_cluster_size</code>를 낮추면 과분할 완화되나, 너무 낮으면 노이즈를 클러스터로 인식. <code>min_samples=3</code>으로 핵심 포인트 판정을 엄격히 하여 균형 확보.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="graphcommunity-파라미터-영향-분석">GraphCommunity 파라미터 영향 분석<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%86%E1%85%A6%E1%84%86%E1%85%A9%E1%84%85%E1%85%A2%E1%86%AB%E1%84%83%E1%85%A5%E1%86%B7-flow%E1%84%8B%E1%85%A6-%E1%84%89%E1%85%A1%E1%84%8B%E1%85%AD%E1%86%BC%E1%84%83%E1%85%AC%E1%86%AB-%E1%84%86%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%84%8B%E1%85%A7%E1%86%BC%E1%84%8B%E1%85%A7%E1%86%A8%E1%84%87%E1%85%A7%E1%86%AF-clustering-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1#graphcommunity-%ED%8C%8C%EB%9D%BC%EB%AF%B8%ED%84%B0-%EC%98%81%ED%96%A5-%EB%B6%84%EC%84%9D" class="hash-link" aria-label="GraphCommunity 파라미터 영향 분석에 대한 직접 링크" title="GraphCommunity 파라미터 영향 분석에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>k_neighbors 영향</strong>:</p>








































<table><thead><tr><th>k</th><th>LP F1</th><th>Louvain F1</th><th>CC F1</th><th>경향</th></tr></thead><tbody><tr><td>5</td><td>0.765</td><td>0.753</td><td>0.812</td><td>k 증가 시 성능 저하</td></tr><tr><td>10</td><td>0.733</td><td>0.754</td><td>0.707</td><td>-</td></tr><tr><td>15</td><td>0.779</td><td>0.759</td><td>0.687</td><td>-</td></tr><tr><td>20</td><td>0.734</td><td>0.718</td><td>0.687</td><td>-</td></tr></tbody></table>
<p><strong>의사결정</strong>: <code>k=5</code>를 기본값으로 선택. k가 클수록 약한 연결까지 포함하여 단일 클러스터로 수렴하는 경향 강화.</p>
<p><strong>유사도 메트릭 비교</strong>:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Euclidean vs Cosine</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Euclidean: 평균 F1 0.731</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Cosine:    평균 F1 0.718</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 의사결정: Euclidean 선택</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 이유: Titan V2는 이미 L2 정규화 적용하여 방향성보다 거리가 유의미</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="핵심-인사이트-단일-클러스터-수렴-현상">핵심 인사이트: 단일 클러스터 수렴 현상<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%86%E1%85%A6%E1%84%86%E1%85%A9%E1%84%85%E1%85%A2%E1%86%AB%E1%84%83%E1%85%A5%E1%86%B7-flow%E1%84%8B%E1%85%A6-%E1%84%89%E1%85%A1%E1%84%8B%E1%85%AD%E1%86%BC%E1%84%83%E1%85%AC%E1%86%AB-%E1%84%86%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%84%8B%E1%85%A7%E1%86%BC%E1%84%8B%E1%85%A7%E1%86%A8%E1%84%87%E1%85%A7%E1%86%AF-clustering-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1#%ED%95%B5%EC%8B%AC-%EC%9D%B8%EC%82%AC%EC%9D%B4%ED%8A%B8-%EB%8B%A8%EC%9D%BC-%ED%81%B4%EB%9F%AC%EC%8A%A4%ED%84%B0-%EC%88%98%EB%A0%B4-%ED%98%84%EC%83%81" class="hash-link" aria-label="핵심 인사이트: 단일 클러스터 수렴 현상에 대한 직접 링크" title="핵심 인사이트: 단일 클러스터 수렴 현상에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>발견 사항</strong>:</p>
<ul>
<li class="">CC/LP 알고리즘에서 평균 클러스터 수 1.1~1.9개</li>
<li class="">ELN3 전체가 하나의 클러스터로 수렴했으나 F1=1.0 (정답도 단일 클러스터)</li>
</ul>
<p><strong>원인 분석</strong>:</p>
<ol>
<li class=""><strong>높은 임베딩 유사도</strong>: 연구노트 간 평균 코사인 거리 0.7대로 근접</li>
<li class=""><strong>공통 도메인 용어</strong>: "발효", "OD", "수율" 등 반복 출현</li>
<li class=""><strong>서사 구조의 부재</strong>: 시간순/실험 단계 정보가 임베딩에 미반영</li>
</ol>
<p><strong>실무 적용 제안</strong>:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># 개선 방향 1: 메타데이터 통합</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">chunk_metadata </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"content"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> text</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"date"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> extract_date</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">text</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"experiment_phase"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> classify_phase</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">text</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># "준비", "진행", "분석"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"section_type"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> section_type</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 개선 방향 2: 하이브리드 임베딩</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">hybrid_embedding </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> concatenate</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    semantic_embedding</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># Titan V2</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    temporal_embedding</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># 날짜 정보</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    structural_embedding  </span><span class="token comment" style="color:#999988;font-style:italic"># 섹션 타입</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="실무-적용-가이드">실무 적용 가이드<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%86%E1%85%A6%E1%84%86%E1%85%A9%E1%84%85%E1%85%A2%E1%86%AB%E1%84%83%E1%85%A5%E1%86%B7-flow%E1%84%8B%E1%85%A6-%E1%84%89%E1%85%A1%E1%84%8B%E1%85%AD%E1%86%BC%E1%84%83%E1%85%AC%E1%86%AB-%E1%84%86%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%84%8B%E1%85%A7%E1%86%BC%E1%84%8B%E1%85%A7%E1%86%A8%E1%84%87%E1%85%A7%E1%86%AF-clustering-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1#%EC%8B%A4%EB%AC%B4-%EC%A0%81%EC%9A%A9-%EA%B0%80%EC%9D%B4%EB%93%9C" class="hash-link" aria-label="실무 적용 가이드에 대한 직접 링크" title="실무 적용 가이드에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>1. 데이터 특성 파악 후 알고리즘 선택</strong>:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">select_algorithm</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">data_characteristics</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> data_characteristics</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"topic_diversity"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"single"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"connected_components"</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># F1 1.0 기대</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">elif</span><span class="token plain"> data_characteristics</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"cluster_boundaries"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"clear"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"louvain"</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># F1 0.95+ 기대</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">elif</span><span class="token plain"> data_characteristics</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"noise_level"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">==</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"high"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"algorithm"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"hdbscan"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"params"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"min_cluster_size"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"min_samples"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">3</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">else</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"label_propagation"</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># 범용적으로 안정적</span><br></span></code></pre></div></div>
<p><strong>2. BCubed F1과 클러스터 수 동시 모니터링</strong>:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># 과분할 탐지</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> avg_clusters </span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> expected_clusters </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">and</span><span class="token plain"> f1 </span><span class="token operator" style="color:#393A34">&lt;</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0.7</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"과분할 의심: min_cluster_size 증가 필요"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 과소분할 탐지</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> avg_clusters </span><span class="token operator" style="color:#393A34">&lt;</span><span class="token plain"> expected_clusters </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0.5</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">and</span><span class="token plain"> f1 </span><span class="token operator" style="color:#393A34">&lt;</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0.7</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"과소분할 의심: k_neighbors 감소 또는 algorithm 변경"</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p><strong>3. LLM 판정 프롬프트 엔지니어링</strong>:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># 핵심: 명시적 기준 + 보수적 판정</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">prompt_template </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token triple-quoted-string string" style="color:#e3116c">"""</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">[판정 기준] (우선순위 순)</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">1. 표(Table) 데이터 존재 여부</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">2. 정량적 수치 (농도, 수율, 온도 등) 존재 여부</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">3. 섹션 타입 (결론/계획은 제외)</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="display:inline-block;color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">[보수적 판정 원칙]</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">- 애매한 경우 False 반환</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">- 추론 과정을 reasoning 필드에 기록</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">"""</span><br></span></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="references">References<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%86%E1%85%A6%E1%84%86%E1%85%A9%E1%84%85%E1%85%A2%E1%86%AB%E1%84%83%E1%85%A5%E1%86%B7-flow%E1%84%8B%E1%85%A6-%E1%84%89%E1%85%A1%E1%84%8B%E1%85%AD%E1%86%BC%E1%84%83%E1%85%AC%E1%86%AB-%E1%84%86%E1%85%AE%E1%86%AB%E1%84%89%E1%85%A5%E1%84%8B%E1%85%A7%E1%86%BC%E1%84%8B%E1%85%A7%E1%86%A8%E1%84%87%E1%85%A7%E1%86%AF-clustering-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1#references" class="hash-link" aria-label="References에 대한 직접 링크" title="References에 대한 직접 링크" translate="no">​</a></h2>
<ul>
<li class=""><a href="https://en.wikipedia.org/wiki/Cluster_analysis#External_evaluation" target="_blank" rel="noopener noreferrer" class="">BCubed Precision and Recall</a> - 클러스터링 외부 평가 메트릭</li>
<li class=""><a href="https://www.ibm.com/kr-ko/think/topics/knn" target="_blank" rel="noopener noreferrer" class="">K-Nearest Neighbors Algorithm | IBM</a> - KNN 기반 그래프 구성 원리</li>
<li class=""><a href="https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html" target="_blank" rel="noopener noreferrer" class="">Amazon Titan Embeddings Documentation</a> - Titan Embed Text V2 사양</li>
<li class=""><a href="https://docs.trychroma.com/" target="_blank" rel="noopener noreferrer" class="">ChromaDB Documentation</a> - 벡터 데이터베이스 구현</li>
<li class=""><a href="https://scikit-learn.org/stable/modules/generated/sklearn.semi_supervised.LabelPropagation.html" target="_blank" rel="noopener noreferrer" class="">Label Propagation Algorithm</a> - scikit-learn 구현 예시</li>
<li class=""><a href="https://hdbscan.readthedocs.io/en/latest/parameter_selection.html" target="_blank" rel="noopener noreferrer" class="">HDBSCAN Documentation</a> - 파라미터 선택 가이드</li>
<li class=""><a href="https://python-louvain.readthedocs.io/" target="_blank" rel="noopener noreferrer" class="">Louvain Community Detection</a> - Modularity 기반 클러스터링</li>
<li class=""><a href="https://aws.amazon.com/bedrock/claude/" target="_blank" rel="noopener noreferrer" class="">Claude Sonnet 4.5 on AWS Bedrock</a> - LLM 추론 API</li>
</ul>]]></content:encoded>
            <author>mason@brain-crew.com (강민석)</author>
            <category>RAG</category>
        </item>
        <item>
            <title><![CDATA[근사 최근접 탐색(ANN) 오차와 데이터 분포 밀도의 관계 고찰]]></title>
            <link>https://teddynote-lab.github.io/brain-cache/lab/근사-최근접-탐색ann-오차와-데이터-분포-밀도의-관계-고찰</link>
            <guid>https://teddynote-lab.github.io/brain-cache/lab/근사-최근접-탐색ann-오차와-데이터-분포-밀도의-관계-고찰</guid>
            <pubDate>Tue, 23 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[RAG 시스템에서 n_results는 단순히 '반환할 결과 개수'가 아닌 '검색 반경(search radius)'을 의미합니다. ANN 알고리즘의 근사 특성으로 인해 작은 n_results 값은 진짜 근접 벡터를 놓칠 수 있으며, 특히 고밀도 데이터 분포와 추상적 쿼리]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="tldr">TL;DR<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#tldr" class="hash-link" aria-label="TL;DR에 대한 직접 링크" title="TL;DR에 대한 직접 링크" translate="no">​</a></h2>
<blockquote>
<p>RAG 시스템에서 n_results는 단순히 '반환할 결과 개수'가 아닌 '검색 반경(search radius)'을 의미합니다. ANN 알고리즘의 근사 특성으로 인해 작은 n_results 값은 진짜 근접 벡터를 놓칠 수 있으며, 특히 고밀도 데이터 분포와 추상적 쿼리에서 이 문제가 심화됩니다. LGE RAG 프로젝트에서 n_results=100일 때 찾지 못했던 정답 문서가 n_results=500에서는 25번째로 검색되는 현상을 통해, 데이터 밀도와 쿼리 특성에 따라 충분히 큰 n_results 설정이 필수적임을 확인했습니다.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="key-takeaways">Key Takeaways<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#key-takeaways" class="hash-link" aria-label="Key Takeaways에 대한 직접 링크" title="Key Takeaways에 대한 직접 링크" translate="no">​</a></h2>
<ul>
<li class=""><strong>n_results는 검색 반경을 결정하는 파라미터</strong>: 단순 출력 개수가 아니라 ANN 알고리즘이 탐색할 벡터 공간의 범위를 의미하며, 작은 값은 근사 오차에 더 취약함</li>
<li class=""><strong>데이터 밀도가 높을수록 더 큰 n_results 필요</strong>: 유사한 문서가 밀집된 환경에서는 작은 검색 반경으로 진짜 근접 벡터를 놓칠 확률이 급증함</li>
<li class=""><strong>추상적 쿼리는 밀도 문제를 악화</strong>: "7키로 러닝", "별자리 보기" 같은 일반적 표현은 임베딩 공간에서 넓은 영역에 분산되어 충분한 탐색 범위가 더욱 중요함</li>
<li class=""><strong>프로덕션 환경에서는 넉넉한 n_results 설정 후 reranking 전략 권장</strong>: 초기 검색에서 후보를 충분히 확보한 뒤, 상위 k개를 재정렬하여 정확도와 성능의 균형을 맞춤</li>
<li class=""><strong>벡터 검색은 정렬된 전수 탐색이 아님</strong>: HNSW 등 ANN 알고리즘은 근사 방식이므로 n_results 변화에 따라 결과 순서와 내용이 모두 달라질 수 있음</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="상세-내용">상세 내용<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#%EC%83%81%EC%84%B8-%EB%82%B4%EC%9A%A9" class="hash-link" aria-label="상세 내용에 대한 직접 링크" title="상세 내용에 대한 직접 링크" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="배경-n_results-파라미터에-대한-오해">배경: n_results 파라미터에 대한 오해<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#%EB%B0%B0%EA%B2%BD-n_results-%ED%8C%8C%EB%9D%BC%EB%AF%B8%ED%84%B0%EC%97%90-%EB%8C%80%ED%95%9C-%EC%98%A4%ED%95%B4" class="hash-link" aria-label="배경: n_results 파라미터에 대한 오해에 대한 직접 링크" title="배경: n_results 파라미터에 대한 오해에 대한 직접 링크" translate="no">​</a></h3>
<p>벡터 검색 시스템을 처음 접하는 엔지니어들은 n_results를 '최종 출력 개수'로만 이해하는 경향이 있습니다. 예를 들어 "상위 10개만 필요하니까 n_results=10"처럼 설정하는 것이죠.</p>
<p>LGE RAG 프로젝트에서 저 역시 동일한 접근을 했습니다. 그러나 <strong>동일한 쿼리에 대해 n_results 값만 변경했을 때 상위 결과의 순서와 내용이 완전히 달라지는 현상</strong>을 발견했습니다. 더 놀라운 점은 n_results를 늘렸을 때 더 관련성 높은 문서가 상위에 등장했다는 것입니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="문제-상황-동일-쿼리-다른-결과">문제 상황: 동일 쿼리, 다른 결과<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#%EB%AC%B8%EC%A0%9C-%EC%83%81%ED%99%A9-%EB%8F%99%EC%9D%BC-%EC%BF%BC%EB%A6%AC-%EB%8B%A4%EB%A5%B8-%EA%B2%B0%EA%B3%BC" class="hash-link" aria-label="문제 상황: 동일 쿼리, 다른 결과에 대한 직접 링크" title="문제 상황: 동일 쿼리, 다른 결과에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>쿼리</strong>: "스마트폰 앱으로 별 보기를 한다"</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="n_results100-결과">n_results=100 결과<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#n_results100-%EA%B2%B0%EA%B3%BC" class="hash-link" aria-label="n_results=100 결과에 대한 직접 링크" title="n_results=100 결과에 대한 직접 링크" translate="no">​</a></h4>
<div class="language-plaintext codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-plaintext codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">[1] 스마트폰 화면을 켠다 (무관한 문서)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[2] 스마트폰을 집어 든다 (무관한 문서)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[3] 휴대폰 밝기를 조절한다 (무관한 문서)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">→ 정답 문서("스카이뷰 앱") 포함 안 됨</span><br></span></code></pre></div></div>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="n_results500-결과">n_results=500 결과<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#n_results500-%EA%B2%B0%EA%B3%BC" class="hash-link" aria-label="n_results=500 결과에 대한 직접 링크" title="n_results=500 결과에 대한 직접 링크" translate="no">​</a></h4>
<div class="language-plaintext codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-plaintext codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">[1] 스마트폰을 하늘로 향해 초기 별자리 지도를 띄운다 ✓</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[2] 가이드에 따라 스마트폰을 원을 그리듯 움직여 센서를 보정한다 ✓</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">...</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[25] 스마트폰에서 스카이뷰 앱을 실행한다 ✓</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">→ 정답 문서들이 상위권 및 25번째 등장</span><br></span></code></pre></div></div>
<p><strong>동일한 쿼리, 동일한 임베딩 모델, 동일한 벡터 DB</strong>에서 오직 n_results만 달랐는데 왜 이런 결과가 나올까요?</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="원인-분석-1-ann-알고리즘의-근사-특성">원인 분석 1: ANN 알고리즘의 근사 특성<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#%EC%9B%90%EC%9D%B8-%EB%B6%84%EC%84%9D-1-ann-%EC%95%8C%EA%B3%A0%EB%A6%AC%EC%A6%98%EC%9D%98-%EA%B7%BC%EC%82%AC-%ED%8A%B9%EC%84%B1" class="hash-link" aria-label="원인 분석 1: ANN 알고리즘의 근사 특성에 대한 직접 링크" title="원인 분석 1: ANN 알고리즘의 근사 특성에 대한 직접 링크" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="전수-탐색과-ann의-차이">전수 탐색과 ANN의 차이<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#%EC%A0%84%EC%88%98-%ED%83%90%EC%83%89%EA%B3%BC-ann%EC%9D%98-%EC%B0%A8%EC%9D%B4" class="hash-link" aria-label="전수 탐색과 ANN의 차이에 대한 직접 링크" title="전수 탐색과 ANN의 차이에 대한 직접 링크" translate="no">​</a></h4>
<p>많은 엔지니어들이 벡터 검색을 다음과 같이 상상합니다:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># 머릿속 기대: 전수 탐색 후 정렬</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">ideal_search</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">query_vector</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> all_vectors</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    distances </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">cosine_distance</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">query_vector</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> v</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> v </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> all_vectors</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    sorted_results </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">sorted</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">distances</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> sorted_results</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">n_results</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># 상위 n개 자르기</span><br></span></code></pre></div></div>
<p>하지만 실제 Chroma, FAISS 등이 사용하는 **ANN(Approximate Nearest Neighbor)**은 다릅니다:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># 실제 동작: 그래프 기반 근사 탐색 (HNSW 예시)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">ann_search</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">query_vector</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> hnsw_graph</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> n_results</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    entry_point </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> hnsw_graph</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">top_layer_node</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    visited </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">set</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    candidates </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># 상위 레이어부터 탐색</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> layer </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token builtin">range</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">top_layer</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">1</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        entry_point </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> search_layer</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">query_vector</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> entry_point</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> layer</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> ef</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">1</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># 최하위 레이어에서 ef 크기만큼 탐색</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    candidates </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> search_layer</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">query_vector</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> entry_point</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> layer</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> ef</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">n_results</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># 탐색한 candidates 중 상위 n_results 반환</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> heapq</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">nsmallest</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">n_results</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> candidates</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> key</span><span class="token operator" style="color:#393A34">=</span><span class="token keyword" style="color:#00009f">lambda</span><span class="token plain"> x</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> x</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">distance</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p>핵심 차이점:</p>
<ul>
<li class=""><strong>전수 탐색이 아님</strong>: 모든 벡터를 확인하지 않고 그래프 구조를 따라 '탐험'</li>
<li class=""><strong>n_results가 탐색 범위(ef) 결정</strong>: 작은 n_results = 좁은 탐험 영역</li>
<li class=""><strong>그래프 경로 의존성</strong>: 초기 진입점과 이웃 노드 구조에 따라 특정 영역을 아예 방문하지 않을 수 있음</li>
</ul>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="주요-ann-알고리즘-비교">주요 ANN 알고리즘 비교<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#%EC%A3%BC%EC%9A%94-ann-%EC%95%8C%EA%B3%A0%EB%A6%AC%EC%A6%98-%EB%B9%84%EA%B5%90" class="hash-link" aria-label="주요 ANN 알고리즘 비교에 대한 직접 링크" title="주요 ANN 알고리즘 비교에 대한 직접 링크" translate="no">​</a></h4>



































<table><thead><tr><th>알고리즘</th><th>핵심 원리</th><th>n_results 영향</th><th>적합한 시나리오</th></tr></thead><tbody><tr><td><strong>HNSW</strong></td><td>계층적 그래프, 각 층에서 greedy 탐색</td><td><code>ef_search</code> 파라미터와 연동, 작으면 탐색 중단 빠름</td><td>높은 정확도 필요, 메모리 여유 있음</td></tr><tr><td><strong>IVF</strong></td><td>k-means 클러스터링 후 클러스터 내 탐색</td><td><code>nprobe</code> 값으로 탐색 클러스터 수 결정</td><td>대규모 데이터, 속도 우선</td></tr><tr><td><strong>PQ</strong></td><td>벡터 양자화로 압축</td><td>압축으로 인한 정보 손실 존재</td><td>메모리 제약, 약간의 정확도 손실 허용</td></tr><tr><td><strong>LSH</strong></td><td>해시 함수로 유사 벡터 버킷 분류</td><td>해시 충돌 가능, 작은 k에서 누락 위험</td><td>빠른 프로토타이핑, 저차원</td></tr></tbody></table>
<p><strong>결론</strong>: <strong>n_results가 작을수록 ANN 알고리즘은 더 좁은 범위만 탐색하고 조기 종료합니다.</strong> 진짜 근접 벡터가 탐색 경로에서 벗어나 있으면 영영 발견하지 못합니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="원인-분석-2-데이터-분포-밀도의-영향">원인 분석 2: 데이터 분포 밀도의 영향<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#%EC%9B%90%EC%9D%B8-%EB%B6%84%EC%84%9D-2-%EB%8D%B0%EC%9D%B4%ED%84%B0-%EB%B6%84%ED%8F%AC-%EB%B0%80%EB%8F%84%EC%9D%98-%EC%98%81%ED%96%A5" class="hash-link" aria-label="원인 분석 2: 데이터 분포 밀도의 영향에 대한 직접 링크" title="원인 분석 2: 데이터 분포 밀도의 영향에 대한 직접 링크" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="고밀도-데이터의-특성">고밀도 데이터의 특성<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#%EA%B3%A0%EB%B0%80%EB%8F%84-%EB%8D%B0%EC%9D%B4%ED%84%B0%EC%9D%98-%ED%8A%B9%EC%84%B1" class="hash-link" aria-label="고밀도 데이터의 특성에 대한 직접 링크" title="고밀도 데이터의 특성에 대한 직접 링크" translate="no">​</a></h4>
<p>LGE 프로젝트의 행동 로그 데이터는 다음과 같은 특성을 가졌습니다:</p>
<div class="language-xml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-xml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">&lt;!-- 문서들이 구조적으로 거의 동일 --&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;</span><span class="token tag" style="color:#00009f">activity_info</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">{날짜} {시간대} {장소} 일정이다. </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">{시작시간}부터 {종료시간}까지 {상세장소}에서 {행동}한다.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">이 활동은 {동반자}와 함께 했다. {도구}를 사용했다.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">전에는 {이전행동}한다. 그 후에는 {다음행동}한다.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token tag punctuation" style="color:#393A34">&lt;/</span><span class="token tag" style="color:#00009f">activity_info</span><span class="token tag punctuation" style="color:#393A34">&gt;</span><br></span></code></pre></div></div>
<p>이런 데이터는 임베딩 공간에서 어떻게 배치될까요?</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># 추상화된 시각화</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">벡터 공간 상의 분포</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token string" style="color:#e3116c">"러닝화 벗기"</span><span class="token plain"> 클러스터 </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">초고밀도</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  ┌─────────────────┐</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  │ ●●●●●●●●●●●●●● │  ← 날짜</span><span class="token operator" style="color:#393A34">/</span><span class="token plain">시간만 다른 수백 개 문서</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  │ ●●●●●●●●●●●●●● │     </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">의미적으로 거의 동일</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  │ ●●●●●●●●●●●●●● │</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  └─────────────────┘</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              vs</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token string" style="color:#e3116c">"별자리 관측"</span><span class="token plain"> 영역 </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">저밀도</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">         ●                  ← </span><span class="token string" style="color:#e3116c">"스카이뷰 앱"</span><span class="token plain"> 문서</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                            </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">적은 빈도</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> 멀리 떨어짐</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="문제가-되는-시나리오">문제가 되는 시나리오<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#%EB%AC%B8%EC%A0%9C%EA%B0%80-%EB%90%98%EB%8A%94-%EC%8B%9C%EB%82%98%EB%A6%AC%EC%98%A4" class="hash-link" aria-label="문제가 되는 시나리오에 대한 직접 링크" title="문제가 되는 시나리오에 대한 직접 링크" translate="no">​</a></h4>
<p><strong>추상적 쿼리 + 고밀도 클러스터 = 재앙</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># 쿼리: "7키로 러닝을 하다"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">query_embedding </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> model</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">encode</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"7키로 러닝을 하다"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 문제 1: "러닝" 키워드가 포함된 고밀도 클러스터가 탐색 시작점</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 문제 2: "7키로"라는 구체적 정보는 임베딩에서 약하게 표현됨</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 문제 3: n_results=100이면 고밀도 클러스터 내부만 탐색하고 종료</span><br></span></code></pre></div></div>
<p><strong>실제 데이터 예시</strong>:</p>
<div class="language-plaintext codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-plaintext codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">[유사도 0.89] 러닝화를 벗는다 (9:25~9:33, 집 현관)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[유사도 0.88] 러닝화를 벗는다 (9:06~9:25, 집 현관)  </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[유사도 0.87] 러닝화 끈을 조인다 (8:30~9:00, 아파트 출입구)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[유사도 0.86] 러닝화를 벗는다 (19:56~20:00, 대전 현관)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">...</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[유사도 0.65] ← n_results=100 여기서 중단</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">---</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">[유사도 0.64] 스포츠워치 러닝 모드 7km 기록 저장 ← 정답!</span><br></span></code></pre></div></div>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="왜-n_results500에서는-성공했나">왜 n_results=500에서는 성공했나?<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#%EC%99%9C-n_results500%EC%97%90%EC%84%9C%EB%8A%94-%EC%84%B1%EA%B3%B5%ED%96%88%EB%82%98" class="hash-link" aria-label="왜 n_results=500에서는 성공했나?에 대한 직접 링크" title="왜 n_results=500에서는 성공했나?에 대한 직접 링크" translate="no">​</a></h4>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># n_results=500 탐색 과정</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token number" style="color:#36acaa">1.</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"러닝"</span><span class="token plain"> 고밀도 클러스터 탐색 </span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">0</span><span class="token operator" style="color:#393A34">~</span><span class="token number" style="color:#36acaa">200</span><span class="token plain">번째</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token number" style="color:#36acaa">2.</span><span class="token plain"> 유사도가 낮아지면서 인접 클러스터로 확장</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token number" style="color:#36acaa">3.</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"운동 기록"</span><span class="token plain"> 관련 클러스터 진입 </span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">300</span><span class="token operator" style="color:#393A34">~</span><span class="token number" style="color:#36acaa">400</span><span class="token plain">번째</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token number" style="color:#36acaa">4.</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"7km 기록"</span><span class="token plain"> 문서 발견! </span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">425</span><span class="token plain">번째</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token number" style="color:#36acaa">5.</span><span class="token plain"> 재정렬 후 상위권으로 부상</span><br></span></code></pre></div></div>
<p><strong>핵심</strong>: 큰 n_results는 <strong>여러 클러스터를 가로지르는 탐색</strong>을 가능하게 합니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="해결-과정-적절한-n_results-설정-전략">해결 과정: 적절한 n_results 설정 전략<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#%ED%95%B4%EA%B2%B0-%EA%B3%BC%EC%A0%95-%EC%A0%81%EC%A0%88%ED%95%9C-n_results-%EC%84%A4%EC%A0%95-%EC%A0%84%EB%9E%B5" class="hash-link" aria-label="해결 과정: 적절한 n_results 설정 전략에 대한 직접 링크" title="해결 과정: 적절한 n_results 설정 전략에 대한 직접 링크" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-데이터-밀도-분석">1. 데이터 밀도 분석<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#1-%EB%8D%B0%EC%9D%B4%ED%84%B0-%EB%B0%80%EB%8F%84-%EB%B6%84%EC%84%9D" class="hash-link" aria-label="1. 데이터 밀도 분석에 대한 직접 링크" title="1. 데이터 밀도 분석에 대한 직접 링크" translate="no">​</a></h4>
<p>먼저 자신의 데이터 특성을 파악해야 합니다:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> sklearn</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">neighbors </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> NearestNeighbors</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> numpy </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> np</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">analyze_density</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">embeddings</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> sample_size</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">1000</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token triple-quoted-string string" style="color:#e3116c">"""벡터 공간의 밀도 분석"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    sample_indices </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> np</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">random</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">choice</span><span class="token punctuation" style="color:#393A34">(</span><span class="token builtin">len</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">embeddings</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> sample_size</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    sample_vectors </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> embeddings</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">sample_indices</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    nbrs </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> NearestNeighbors</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">n_neighbors</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">100</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> metric</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">'cosine'</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">fit</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">embeddings</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    distances</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> indices </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> nbrs</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">kneighbors</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">sample_vectors</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># 10번째, 50번째, 100번째 이웃까지의 평균 거리</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"10th neighbor avg distance: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">distances</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">,</span><span class="token string-interpolation interpolation"> </span><span class="token string-interpolation interpolation number" style="color:#36acaa">9</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">mean</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">(</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">)</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation format-spec">.4f</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"50th neighbor avg distance: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">distances</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">,</span><span class="token string-interpolation interpolation"> </span><span class="token string-interpolation interpolation number" style="color:#36acaa">49</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">mean</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">(</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">)</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation format-spec">.4f</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"100th neighbor avg distance: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">distances</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">[</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">,</span><span class="token string-interpolation interpolation"> </span><span class="token string-interpolation interpolation number" style="color:#36acaa">99</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">]</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">.</span><span class="token string-interpolation interpolation">mean</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">(</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">)</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation format-spec">.4f</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># 거리 증가율</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    growth_rate </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> distances</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">:</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">99</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">mean</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">/</span><span class="token plain"> distances</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">:</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">9</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">mean</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"Distance growth rate (10th-&gt;100th): </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">growth_rate</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation format-spec">.2f</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">x"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> growth_rate </span><span class="token operator" style="color:#393A34">&lt;</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1.5</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"⚠️ HIGH DENSITY - 큰 n_results 필요"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">elif</span><span class="token plain"> growth_rate </span><span class="token operator" style="color:#393A34">&lt;</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2.5</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"⚡ MEDIUM DENSITY - 적절한 n_results 필요"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">else</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"✅ LOW DENSITY - 작은 n_results도 안전"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 실행 결과 예시 (LGE 프로젝트)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 10th neighbor: 0.1234</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 50th neighbor: 0.1456  </span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 100th neighbor: 0.1589</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Growth rate: 1.29x ← 매우 고밀도!</span><br></span></code></pre></div></div>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-쿼리-추상화-수준-평가">2. 쿼리 추상화 수준 평가<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#2-%EC%BF%BC%EB%A6%AC-%EC%B6%94%EC%83%81%ED%99%94-%EC%88%98%EC%A4%80-%ED%8F%89%EA%B0%80" class="hash-link" aria-label="2. 쿼리 추상화 수준 평가에 대한 직접 링크" title="2. 쿼리 추상화 수준 평가에 대한 직접 링크" translate="no">​</a></h4>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">estimate_query_abstraction</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token triple-quoted-string string" style="color:#e3116c">"""쿼리의 추상화 정도 추정"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    concrete_markers </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"스카이뷰"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"7km"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"09:25"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"평창 공원"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    abstract_markers </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"러닝"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"별 보기"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"운동"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"휴식"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    concrete_score </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">sum</span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">1</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> m </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> concrete_markers </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> m </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> query</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    abstract_score </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">sum</span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">1</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> m </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> abstract_markers </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> m </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> query</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> abstract_score </span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> concrete_score</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"ABSTRACT"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">500</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># 추상적이면 큰 n_results</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">else</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"CONCRETE"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">100</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># 구체적이면 작은 n_results</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 예시</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">estimate_query_abstraction</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"스마트폰 앱으로 별 보기"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">  </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># → ("ABSTRACT", 500)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">estimate_query_abstraction</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"평창 공원에서 스카이뷰 앱 실행"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># → ("CONCRETE", 100)</span><br></span></code></pre></div></div>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-동적-n_results--reranking-전략">3. 동적 n_results + Reranking 전략<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#3-%EB%8F%99%EC%A0%81-n_results--reranking-%EC%A0%84%EB%9E%B5" class="hash-link" aria-label="3. 동적 n_results + Reranking 전략에 대한 직접 링크" title="3. 동적 n_results + Reranking 전략에 대한 직접 링크" translate="no">​</a></h4>
<p>프로덕션 환경에서는 다음과 같은 파이프라인을 권장합니다:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> sentence_transformers </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> CrossEncoder</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">class</span><span class="token plain"> </span><span class="token class-name">AdaptiveRetriever</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">__init__</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> vectordb</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> reranker_model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"cross-encoder/ms-marco-MiniLM-L-6-v2"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">vectordb </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> vectordb</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">reranker </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> CrossEncoder</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">reranker_model</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">search</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> query</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> final_k</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">10</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> safety_factor</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">10</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token triple-quoted-string string" style="color:#e3116c">"""</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        안전한 검색 전략:</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        1. 넉넉한 후보 추출 (final_k * safety_factor)</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        2. Reranking으로 정확도 보정</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        3. 최종 k개 반환</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">        """</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token comment" style="color:#999988;font-style:italic"># Step 1: 넉넉한 초기 검색</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        initial_n </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> final_k </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> safety_factor  </span><span class="token comment" style="color:#999988;font-style:italic"># 예: 10 * 10 = 100</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        candidates </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">vectordb</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            query_texts</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            n_results</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">initial_n</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token comment" style="color:#999988;font-style:italic"># Step 2: Cross-encoder로 재정렬</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        pairs </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> doc</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> doc </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> candidates</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">'documents'</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        rerank_scores </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">reranker</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">predict</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">pairs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token comment" style="color:#999988;font-style:italic"># Step 3: 상위 k개 추출</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        sorted_indices </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> np</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">argsort</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">rerank_scores</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">:</span><span class="token punctuation" style="color:#393A34">:</span><span class="token operator" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">1</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">final_k</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        final_results </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">candidates</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">'documents'</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">i</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> i </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> sorted_indices</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> final_results</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 사용 예시</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">retriever </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> AdaptiveRetriever</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">chroma_collection</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 안전한 검색 (내부적으로 100개 검색 후 10개로 rerank)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">results </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> retriever</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">search</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"스마트폰 앱으로 별 보기"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> final_k</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">10</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-ab-테스트-기반-최적화">4. A/B 테스트 기반 최적화<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#4-ab-%ED%85%8C%EC%8A%A4%ED%8A%B8-%EA%B8%B0%EB%B0%98-%EC%B5%9C%EC%A0%81%ED%99%94" class="hash-link" aria-label="4. A/B 테스트 기반 최적화에 대한 직접 링크" title="4. A/B 테스트 기반 최적화에 대한 직접 링크" translate="no">​</a></h4>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">find_optimal_n_results</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">queries</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> ground_truth</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> vectordb</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token triple-quoted-string string" style="color:#e3116c">"""다양한 n_results 값 비교"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    test_n_values </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">50</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">100</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">200</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">500</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1000</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    results </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> n </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> test_n_values</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        recall_scores </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> query</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> true_docs </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token builtin">zip</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">queries</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> ground_truth</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            retrieved </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> vectordb</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">query_texts</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> n_results</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">n</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            retrieved_ids </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">set</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">retrieved</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">'ids'</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">:</span><span class="token number" style="color:#36acaa">10</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># 상위 10개만 평가</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            true_ids </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">set</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">true_docs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            recall </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token builtin">len</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">retrieved_ids </span><span class="token operator" style="color:#393A34">&amp;</span><span class="token plain"> true_ids</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">/</span><span class="token plain"> </span><span class="token builtin">len</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">true_ids</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            recall_scores</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">append</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">recall</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        results</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">n</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">'recall@10'</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> np</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">mean</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">recall_scores</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">'std'</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> np</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">std</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">recall_scores</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> results</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 실행 결과 예시</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># n=50:  recall@10=0.62</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># n=100: recall@10=0.71  </span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># n=200: recall@10=0.85</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># n=500: recall@10=0.94 ← 최적점</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># n=1000: recall@10=0.95 (미미한 개선, 비용 증가)</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="결과-및-인사이트">결과 및 인사이트<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#%EA%B2%B0%EA%B3%BC-%EB%B0%8F-%EC%9D%B8%EC%82%AC%EC%9D%B4%ED%8A%B8" class="hash-link" aria-label="결과 및 인사이트에 대한 직접 링크" title="결과 및 인사이트에 대한 직접 링크" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="정량적-개선">정량적 개선<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#%EC%A0%95%EB%9F%89%EC%A0%81-%EA%B0%9C%EC%84%A0" class="hash-link" aria-label="정량적 개선에 대한 직접 링크" title="정량적 개선에 대한 직접 링크" translate="no">​</a></h4>
<p>LGE 프로젝트에서 n_results 조정 후 측정한 결과:</p>





























<table><thead><tr><th>Metric</th><th>n=100</th><th>n=500</th><th>개선율</th></tr></thead><tbody><tr><td>Recall@10</td><td>0.68</td><td>0.91</td><td>+33.8%</td></tr><tr><td>MRR (Mean Reciprocal Rank)</td><td>0.42</td><td>0.73</td><td>+73.8%</td></tr><tr><td>정답 문서 발견율</td><td>71%</td><td>96%</td><td>+25%p</td></tr></tbody></table>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="의사결정-프레임워크">의사결정 프레임워크<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#%EC%9D%98%EC%82%AC%EA%B2%B0%EC%A0%95-%ED%94%84%EB%A0%88%EC%9E%84%EC%9B%8C%ED%81%AC" class="hash-link" aria-label="의사결정 프레임워크에 대한 직접 링크" title="의사결정 프레임워크에 대한 직접 링크" translate="no">​</a></h4>
<p>다음 flowchart를 통해 n_results를 결정하세요:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">데이터 밀도 분석</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ├─ HIGH DENSITY → base_n = 500</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ├─ MEDIUM DENSITY → base_n = 200  </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    └─ LOW DENSITY → base_n = 100</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">쿼리 특성 평가</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ├─ 추상적 쿼리 → base_n * 1.5</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ├─ 일반 쿼리 → base_n * 1.0</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    └─ 구체적 쿼리 → base_n * 0.5</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">성능 제약 확인</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    ├─ 지연시간 중요 → Reranking 전략</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    └─ 정확도 우선 → 큰 n_results 유지</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">최종 n_results = min(계산값, 데이터 총량)</span><br></span></code></pre></div></div>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="추가-최적화-팁">추가 최적화 팁<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#%EC%B6%94%EA%B0%80-%EC%B5%9C%EC%A0%81%ED%99%94-%ED%8C%81" class="hash-link" aria-label="추가 최적화 팁에 대한 직접 링크" title="추가 최적화 팁에 대한 직접 링크" translate="no">​</a></h4>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># 1. HNSW ef_search 파라미터 조정 (Chroma 예시)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">collection </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">create_collection</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"optimized_collection"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    metadata</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"hnsw:space"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"cosine"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"hnsw:search_ef"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">500</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># n_results와 연동</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"hnsw:M"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">32</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># 그래프 연결성 증가</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 2. Hybrid Search로 보완</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> rank_bm25 </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> BM25Okapi</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">class</span><span class="token plain"> </span><span class="token class-name">HybridRetriever</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">__init__</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> vectordb</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> documents</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">vectordb </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> vectordb</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        tokenized_docs </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">doc</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">split</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> doc </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> documents</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">bm25 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> BM25Okapi</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">tokenized_docs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">search</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> query</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> n_results</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">100</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> alpha</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">0.7</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token comment" style="color:#999988;font-style:italic"># Vector search</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        vec_results </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">vectordb</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> n_results</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">n_results</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        vec_scores </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token builtin">id</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> score </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> </span><span class="token builtin">id</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> score </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                     </span><span class="token builtin">zip</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">vec_results</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">'ids'</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> vec_results</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">'distances'</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token comment" style="color:#999988;font-style:italic"># BM25 search  </span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        bm25_scores </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">bm25</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get_scores</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">split</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token comment" style="color:#999988;font-style:italic"># Hybrid ranking</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        final_scores </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> doc_id </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token builtin">set</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">vec_scores</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">keys</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            vec_score </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> vec_scores</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">doc_id</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            bm25_score </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> bm25_scores</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">doc_id</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> doc_id </span><span class="token operator" style="color:#393A34">&lt;</span><span class="token plain"> </span><span class="token builtin">len</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">bm25_scores</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">else</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            final_scores</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">doc_id</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> alpha </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> vec_score </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">1</span><span class="token operator" style="color:#393A34">-</span><span class="token plain">alpha</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> bm25_score</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token builtin">sorted</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">final_scores</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">items</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> key</span><span class="token operator" style="color:#393A34">=</span><span class="token keyword" style="color:#00009f">lambda</span><span class="token plain"> x</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> x</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">1</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> reverse</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="교훈-및-베스트-프랙티스">교훈 및 베스트 프랙티스<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#%EA%B5%90%ED%9B%88-%EB%B0%8F-%EB%B2%A0%EC%8A%A4%ED%8A%B8-%ED%94%84%EB%9E%99%ED%8B%B0%EC%8A%A4" class="hash-link" aria-label="교훈 및 베스트 프랙티스에 대한 직접 링크" title="교훈 및 베스트 프랙티스에 대한 직접 링크" translate="no">​</a></h3>
<ol>
<li class=""><strong>n_results는 성능 튜닝의 핵심 레버</strong>: 단순한 출력 파라미터가 아니라 검색 품질을 결정하는 하이퍼파라미터</li>
<li class=""><strong>"충분히 크게, 그 다음 줄이기"</strong>: 초기 개발 시 넉넉한 n_results로 시작해 reranking으로 정제하는 전략이 안전</li>
<li class=""><strong>데이터 프로파일링 필수</strong>: 밀도 분석 없이 임의로 n_results를 정하면 재앙</li>
<li class=""><strong>쿼리 타입별 분기 처리</strong>: 추상적/구체적 쿼리에 따라 동적으로 n_results 조정</li>
<li class=""><strong>모니터링 지표 설정</strong>: Recall@k, MRR 등을 지속적으로 추적해 회귀 방지</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="references">References<a href="https://teddynote-lab.github.io/brain-cache/lab/%E1%84%80%E1%85%B3%E1%86%AB%E1%84%89%E1%85%A1-%E1%84%8E%E1%85%AC%E1%84%80%E1%85%B3%E1%86%AB%E1%84%8C%E1%85%A5%E1%86%B8-%E1%84%90%E1%85%A1%E1%86%B7%E1%84%89%E1%85%A2%E1%86%A8ann-%E1%84%8B%E1%85%A9%E1%84%8E%E1%85%A1%E1%84%8B%E1%85%AA-%E1%84%83%E1%85%A6%E1%84%8B%E1%85%B5%E1%84%90%E1%85%A5-%E1%84%87%E1%85%AE%E1%86%AB%E1%84%91%E1%85%A9-%E1%84%86%E1%85%B5%E1%86%AF%E1%84%83%E1%85%A9%E1%84%8B%E1%85%B4-%E1%84%80%E1%85%AA%E1%86%AB%E1%84%80%E1%85%A8-%E1%84%80%E1%85%A9%E1%84%8E%E1%85%A1%E1%86%AF#references" class="hash-link" aria-label="References에 대한 직접 링크" title="References에 대한 직접 링크" translate="no">​</a></h2>
<ul>
<li class=""><a href="https://arxiv.org/abs/1603.09320" target="_blank" rel="noopener noreferrer" class="">HNSW 알고리즘 논문</a> - Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs</li>
<li class=""><a href="https://docs.trychroma.com/usage-guide#querying-a-collection" target="_blank" rel="noopener noreferrer" class="">Chroma 공식 문서 - Query Configuration</a></li>
<li class=""><a href="https://github.com/facebookresearch/faiss/wiki" target="_blank" rel="noopener noreferrer" class="">FAISS: A library for efficient similarity search</a></li>
<li class=""><a href="https://www.pinecone.io/learn/series/faiss/hnsw/" target="_blank" rel="noopener noreferrer" class="">Understanding Approximate Nearest Neighbor Search</a></li>
<li class=""><a href="https://www.sbert.net/examples/applications/cross-encoder/README.html" target="_blank" rel="noopener noreferrer" class="">Cross-Encoders for Reranking</a></li>
<li class=""><a href="https://weaviate.io/blog/vector-search-performance" target="_blank" rel="noopener noreferrer" class="">Vector Search Performance Best Practices</a></li>
</ul>]]></content:encoded>
            <author>hank@brain-crew.com (김태한)</author>
            <category>RAG</category>
        </item>
        <item>
            <title><![CDATA[MLflow 기술 검토(RAG 성능 실험 테스트베드 기능)]]></title>
            <link>https://teddynote-lab.github.io/brain-cache/lab/mlflow-기술-검토rag-성능-실험-테스트베드-기능</link>
            <guid>https://teddynote-lab.github.io/brain-cache/lab/mlflow-기술-검토rag-성능-실험-테스트베드-기능</guid>
            <pubDate>Thu, 18 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[MLflow는 전통적인 ML과 LLM 애플리케이션의 전체 생명주기를 관리할 수 있는 오픈소스 통합 플랫폼입니다. LangGraph 기반 RAG Agent 개발 시 Tracking으로 하이퍼파라미터와 메트릭을 관리하고, Tracing으로 복잡한 LLM 호출 흐름을 추적하]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="tldr">TL;DR<a href="https://teddynote-lab.github.io/brain-cache/lab/mlflow-%E1%84%80%E1%85%B5%E1%84%89%E1%85%AE%E1%86%AF-%E1%84%80%E1%85%A5%E1%86%B7%E1%84%90%E1%85%A9rag-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC-%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7-%E1%84%90%E1%85%A6%E1%84%89%E1%85%B3%E1%84%90%E1%85%B3%E1%84%87%E1%85%A6%E1%84%83%E1%85%B3-%E1%84%80%E1%85%B5%E1%84%82%E1%85%B3%E1%86%BC#tldr" class="hash-link" aria-label="TL;DR에 대한 직접 링크" title="TL;DR에 대한 직접 링크" translate="no">​</a></h2>
<blockquote>
<p>MLflow는 전통적인 ML과 LLM 애플리케이션의 전체 생명주기를 관리할 수 있는 오픈소스 통합 플랫폼입니다. LangGraph 기반 RAG Agent 개발 시 Tracking으로 하이퍼파라미터와 메트릭을 관리하고, Tracing으로 복잡한 LLM 호출 흐름을 추적하며, Evaluation으로 LLM-as-a-Judge를 포함한 다양한 평가 지표를 자동화할 수 있습니다. OpenTelemetry 표준 준수로 기존 옵저버빌리티 도구와 쉽게 연동 가능하며, 완전 오픈소스로 온프레미스/클라우드 자체 호스팅이 가능합니다.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="key-takeaways">Key Takeaways<a href="https://teddynote-lab.github.io/brain-cache/lab/mlflow-%E1%84%80%E1%85%B5%E1%84%89%E1%85%AE%E1%86%AF-%E1%84%80%E1%85%A5%E1%86%B7%E1%84%90%E1%85%A9rag-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC-%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7-%E1%84%90%E1%85%A6%E1%84%89%E1%85%B3%E1%84%90%E1%85%B3%E1%84%87%E1%85%A6%E1%84%83%E1%85%B3-%E1%84%80%E1%85%B5%E1%84%82%E1%85%B3%E1%86%BC#key-takeaways" class="hash-link" aria-label="Key Takeaways에 대한 직접 링크" title="Key Takeaways에 대한 직접 링크" translate="no">​</a></h2>
<ul>
<li class="">
<p><strong>계층적 실험 관리</strong>: Experiment &gt; Run 구조로 RAG 시스템의 chunk_size, top_k, temperature 등 다양한 설정 조합을 체계적으로 추적하고 MLflow UI에서 시각적으로 비교할 수 있어, 최적 구성 탐색이 효율적입니다.</p>
</li>
<li class="">
<p><strong>OpenTelemetry 표준 기반 Tracing</strong>: MLflow Tracing은 OpenTelemetry 호환으로 Grafana, Datadog, New Relic 등 기존 옵저버빌리티 도구와 즉시 연동 가능하며, Span 계층 구조로 복잡한 LLM 체인의 입출력과 토큰 사용량을 단계별로 추적할 수 있습니다.</p>
</li>
<li class="">
<p><strong>LLM-as-a-Judge 평가 자동화</strong>: faithfulness, answer_relevance 등 주관적 품질 지표를 LLM으로 자동 평가하고, <code>make_genai_metric()</code>으로 커스텀 평가 기준을 프롬프트로 정의하여 평가 프로세스를 표준화할 수 있습니다.</p>
</li>
<li class="">
<p><strong>프롬프트 버전 관리</strong>: 프롬프트를 실험 변수 및 아티팩트로 저장하여 변경 사항을 추적하고, 여러 프롬프트 변형의 성능을 정량적으로 비교해 A/B 테스트를 수행할 수 있습니다.</p>
</li>
<li class="">
<p><strong>오픈소스 vs 상용 플랫폼 선택</strong>: MLflow는 Apache 2.0 오픈소스로 인프라 비용만 발생하고 전통적 ML과 LLM을 통합 관리 가능한 반면, LangSmith는 LangChain 전용 SaaS로 즉시 사용 가능하지만 벤더 종속성이 있습니다. 프로젝트 요구사항(자체 호스팅 필요성, 기존 도구 통합, 비용 구조)에 따라 선택해야 합니다.</p>
</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="상세-내용">상세 내용<a href="https://teddynote-lab.github.io/brain-cache/lab/mlflow-%E1%84%80%E1%85%B5%E1%84%89%E1%85%AE%E1%86%AF-%E1%84%80%E1%85%A5%E1%86%B7%E1%84%90%E1%85%A9rag-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC-%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7-%E1%84%90%E1%85%A6%E1%84%89%E1%85%B3%E1%84%90%E1%85%B3%E1%84%87%E1%85%A6%E1%84%83%E1%85%B3-%E1%84%80%E1%85%B5%E1%84%82%E1%85%B3%E1%86%BC#%EC%83%81%EC%84%B8-%EB%82%B4%EC%9A%A9" class="hash-link" aria-label="상세 내용에 대한 직접 링크" title="상세 내용에 대한 직접 링크" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="배경-llm-애플리케이션의-실험-관리-문제">배경: LLM 애플리케이션의 실험 관리 문제<a href="https://teddynote-lab.github.io/brain-cache/lab/mlflow-%E1%84%80%E1%85%B5%E1%84%89%E1%85%AE%E1%86%AF-%E1%84%80%E1%85%A5%E1%86%B7%E1%84%90%E1%85%A9rag-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC-%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7-%E1%84%90%E1%85%A6%E1%84%89%E1%85%B3%E1%84%90%E1%85%B3%E1%84%87%E1%85%A6%E1%84%83%E1%85%B3-%E1%84%80%E1%85%B5%E1%84%82%E1%85%B3%E1%86%BC#%EB%B0%B0%EA%B2%BD-llm-%EC%95%A0%ED%94%8C%EB%A6%AC%EC%BC%80%EC%9D%B4%EC%85%98%EC%9D%98-%EC%8B%A4%ED%97%98-%EA%B4%80%EB%A6%AC-%EB%AC%B8%EC%A0%9C" class="hash-link" aria-label="배경: LLM 애플리케이션의 실험 관리 문제에 대한 직접 링크" title="배경: LLM 애플리케이션의 실험 관리 문제에 대한 직접 링크" translate="no">​</a></h3>
<p>LLM 기반 RAG Agent 개발은 전통적인 ML 모델과 다른 어려움이 있습니다. 문서 청크 크기, 임베딩 모델, 검색 개수(top_k), LLM temperature, 프롬프트 템플릿 등 수많은 하이퍼파라미터 조합을 실험해야 하며, LLM 출력의 비결정성과 주관성으로 인해 품질 평가가 어렵습니다. 또한 여러 컴포넌트(Retriever → LLM → Reranker)가 연결된 복잡한 파이프라인에서 병목 구간을 찾기 위해 각 단계의 실행 흐름을 추적해야 합니다.</p>
<p>MLflow는 이러한 문제를 해결하기 위해 전통적인 ML과 LLM 애플리케이션을 모두 지원하는 통합 플랫폼으로, Tracking, Tracing, Evaluation 기능을 제공합니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-mlflow-tracking-실험-파라미터와-메트릭-관리">1. MLflow Tracking: 실험 파라미터와 메트릭 관리<a href="https://teddynote-lab.github.io/brain-cache/lab/mlflow-%E1%84%80%E1%85%B5%E1%84%89%E1%85%AE%E1%86%AF-%E1%84%80%E1%85%A5%E1%86%B7%E1%84%90%E1%85%A9rag-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC-%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7-%E1%84%90%E1%85%A6%E1%84%89%E1%85%B3%E1%84%90%E1%85%B3%E1%84%87%E1%85%A6%E1%84%83%E1%85%B3-%E1%84%80%E1%85%B5%E1%84%82%E1%85%B3%E1%86%BC#1-mlflow-tracking-%EC%8B%A4%ED%97%98-%ED%8C%8C%EB%9D%BC%EB%AF%B8%ED%84%B0%EC%99%80-%EB%A9%94%ED%8A%B8%EB%A6%AD-%EA%B4%80%EB%A6%AC" class="hash-link" aria-label="1. MLflow Tracking: 실험 파라미터와 메트릭 관리에 대한 직접 링크" title="1. MLflow Tracking: 실험 파라미터와 메트릭 관리에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>핵심 구조</strong></p>
<p>MLflow는 Experiment(실험 그룹) &gt; Run(단일 실행) 계층 구조로 실험을 조직화합니다. 각 Run은 고유 ID로 식별되며 다음을 기록합니다:</p>
<ul>
<li class=""><strong>Parameters</strong>: 모델 설정값 (불변)</li>
<li class=""><strong>Metrics</strong>: 정량적 성능 지표 (시간에 따라 변화 가능)</li>
<li class=""><strong>Artifacts</strong>: 파일 형태의 결과물 (모델, 설정 파일, 생성된 답변 등)</li>
<li class=""><strong>Tags</strong>: 메타데이터 (환경, 버전 등)</li>
</ul>
<p><strong>RAG 시스템 적용 사례</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> mlflow</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">set_experiment</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"rag-optimization"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">with</span><span class="token plain"> mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">start_run</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">run_name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"chunk-512-topk-5"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># 파라미터 기록</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">log_param</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"chunk_size"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">512</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">log_param</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"chunk_overlap"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">50</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">log_param</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"top_k"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">5</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">log_param</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"embedding_model"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"text-embedding-3-small"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">log_param</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"llm_model"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"gpt-4"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">log_param</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"temperature"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0.7</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># RAG 파이프라인 실행</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    result </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> rag_pipeline</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">run</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># 메트릭 기록</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">log_metric</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"response_time"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> result</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">latency</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">log_metric</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"num_retrieved_docs"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token builtin">len</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">result</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">docs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">log_metric</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"total_tokens"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> result</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">token_count</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># 아티팩트 저장</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">log_text</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">result</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">answer</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"generated_answer.txt"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">log_dict</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">result</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">config</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"config.json"</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p><strong>의사결정 포인트</strong>: Parameters는 실험 시작 시 결정되는 불변 값으로, Metrics는 실행 중/후에 측정되는 성능 지표로 구분해야 합니다. 프롬프트 템플릿은 아티팩트로 저장하면 버전별 변경 사항을 명확히 추적할 수 있습니다.</p>
<p><strong>UI 활용</strong></p>
<p>MLflow UI에서 여러 Run의 메트릭을 테이블 또는 차트로 비교할 수 있습니다. 예를 들어 chunk_size와 response_time의 상관관계를 scatter plot으로 시각화하거나, top_k 값에 따른 답변 품질 변화를 확인할 수 있습니다. Backend 서버로 배포하면 REST API로 프로그래밍 방식 접근도 가능합니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-mlflow-tracing-llm-호출-흐름-추적">2. MLflow Tracing: LLM 호출 흐름 추적<a href="https://teddynote-lab.github.io/brain-cache/lab/mlflow-%E1%84%80%E1%85%B5%E1%84%89%E1%85%AE%E1%86%AF-%E1%84%80%E1%85%A5%E1%86%B7%E1%84%90%E1%85%A9rag-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC-%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7-%E1%84%90%E1%85%A6%E1%84%89%E1%85%B3%E1%84%90%E1%85%B3%E1%84%87%E1%85%A6%E1%84%83%E1%85%B3-%E1%84%80%E1%85%B5%E1%84%82%E1%85%B3%E1%86%BC#2-mlflow-tracing-llm-%ED%98%B8%EC%B6%9C-%ED%9D%90%EB%A6%84-%EC%B6%94%EC%A0%81" class="hash-link" aria-label="2. MLflow Tracing: LLM 호출 흐름 추적에 대한 직접 링크" title="2. MLflow Tracing: LLM 호출 흐름 추적에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>핵심 개념: Trace와 Span</strong></p>
<p>MLflow Tracing은 OpenTelemetry 표준을 따르며 다음 구조를 사용합니다:</p>
<ul>
<li class=""><strong>Trace</strong>: 전체 요청/세션의 실행 흐름</li>
<li class=""><strong>Span</strong>: Trace를 구성하는 개별 작업 단위 (시작/종료 시간, 입출력, 메타데이터 포함)</li>
<li class="">Span은 Parent-Child 계층 구조로 실행 순서를 표현</li>
</ul>
<p><strong>자동 Tracing vs 수동 Tracing</strong></p>
<p><strong>자동 Tracing (Autolog)</strong>:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> mlflow</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># LangChain 자동 추적 활성화</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">langchain</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">autolog</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 이후 LangChain 호출은 자동으로 Trace 생성</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">chain </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> create_retrieval_chain</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">retriever</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> llm</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">result </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> chain</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">invoke</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"input"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"What is RAG?"</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p>자동 추적은 다음을 자동 수집합니다:</p>
<ul>
<li class="">각 컴포넌트(Retriever, LLM, Chain)별 Span</li>
<li class="">입출력 데이터</li>
<li class="">토큰 사용량 (LLM 호출 시)</li>
<li class="">실행 시간</li>
</ul>
<p><strong>수동 Tracing (Decorator)</strong>:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> mlflow</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token decorator annotation punctuation" style="color:#393A34">@mlflow</span><span class="token decorator annotation punctuation" style="color:#393A34">.</span><span class="token decorator annotation punctuation" style="color:#393A34">trace</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"document_preprocessing"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> span_type</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"PROCESSING"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">preprocess_documents</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">docs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token triple-quoted-string string" style="color:#e3116c">"""문서 전처리 로직"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">set_span_attribute</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"num_docs"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token builtin">len</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">docs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    processed </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">clean_text</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">doc</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> doc </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> docs</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> processed</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token decorator annotation punctuation" style="color:#393A34">@mlflow</span><span class="token decorator annotation punctuation" style="color:#393A34">.</span><span class="token decorator annotation punctuation" style="color:#393A34">trace</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"rerank_documents"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> span_type</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"RETRIEVAL"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">rerank</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> docs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token triple-quoted-string string" style="color:#e3116c">"""검색 결과 재정렬"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    scores </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> compute_relevance_scores</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">query</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> docs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">set_span_attribute</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"rerank_model"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"cross-encoder-v2"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token builtin">sorted</span><span class="token punctuation" style="color:#393A34">(</span><span class="token builtin">zip</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">docs</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> scores</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> key</span><span class="token operator" style="color:#393A34">=</span><span class="token keyword" style="color:#00009f">lambda</span><span class="token plain"> x</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> x</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">1</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> reverse</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p><strong>의사결정 근거</strong>: 자동 추적은 LangChain/LangGraph 기본 동작을 추적할 때 편리하지만, 비즈니스 로직이나 커스텀 컴포넌트는 수동 데코레이터로 명시적으로 Span을 추가해야 합니다. 특히 문서 전처리, 필터링, 재정렬 등 RAG 파이프라인의 중간 단계를 추적하려면 수동 Tracing이 필수입니다.</p>
<p><strong>Trace 시각화</strong></p>
<p>MLflow UI의 Traces 탭에서 다음을 확인할 수 있습니다:</p>
<ul>
<li class=""><strong>Tree View</strong>: Span 간 계층 구조와 실행 순서</li>
<li class=""><strong>Timeline View</strong>: 각 Span의 시작/종료 시간과 병렬 실행 여부</li>
<li class=""><strong>Details Panel</strong>: Span별 입출력 데이터, 토큰 사용량, 커스텀 속성</li>
<li class=""><strong>Error Tracking</strong>: 예외 발생 Span 및 스택 트레이스</li>
</ul>
<p><strong>OpenTelemetry 통합</strong></p>
<p>MLflow Tracing은 OpenTelemetry 표준을 준수하므로 기존 관찰성 도구와 쉽게 연동됩니다:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> opentelemetry </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> trace</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> opentelemetry</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">exporter</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">otlp</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">proto</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">grpc</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">trace_exporter </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> OTLPSpanExporter</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> opentelemetry</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">sdk</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">trace </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> TracerProvider</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> opentelemetry</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">sdk</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">trace</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">export </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> BatchSpanProcessor</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># OpenTelemetry 설정</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">trace</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">set_tracer_provider</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">TracerProvider</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">otlp_exporter </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> OTLPSpanExporter</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">endpoint</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"http://grafana-tempo:4317"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">trace</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get_tracer_provider</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">add_span_processor</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">BatchSpanProcessor</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">otlp_exporter</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># MLflow Tracing은 자동으로 OpenTelemetry와 통합됨</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">langchain</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">autolog</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p>이를 통해 Grafana, Datadog, New Relic 등에서 MLflow Trace를 시각화하고, 마이크로서비스 환경에서 분산 추적이 가능합니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-mlflow-evaluation-llm-출력-품질-평가">3. MLflow Evaluation: LLM 출력 품질 평가<a href="https://teddynote-lab.github.io/brain-cache/lab/mlflow-%E1%84%80%E1%85%B5%E1%84%89%E1%85%AE%E1%86%AF-%E1%84%80%E1%85%A5%E1%86%B7%E1%84%90%E1%85%A9rag-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC-%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7-%E1%84%90%E1%85%A6%E1%84%89%E1%85%B3%E1%84%90%E1%85%B3%E1%84%87%E1%85%A6%E1%84%83%E1%85%B3-%E1%84%80%E1%85%B5%E1%84%82%E1%85%B3%E1%86%BC#3-mlflow-evaluation-llm-%EC%B6%9C%EB%A0%A5-%ED%92%88%EC%A7%88-%ED%8F%89%EA%B0%80" class="hash-link" aria-label="3. MLflow Evaluation: LLM 출력 품질 평가에 대한 직접 링크" title="3. MLflow Evaluation: LLM 출력 품질 평가에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>평가 워크플로우</strong></p>
<p>MLflow Evaluation은 다음 단계로 구성됩니다:</p>
<ol>
<li class=""><strong>평가 데이터셋 준비</strong> (Pandas DataFrame)</li>
<li class=""><strong>모델 또는 예측 함수 정의</strong></li>
<li class=""><strong>평가 지표(Scorer) 선택</strong></li>
<li class=""><strong><code>mlflow.evaluate()</code> 실행</strong></li>
<li class=""><strong>결과 분석</strong> (MLflow UI 또는 API)</li>
</ol>
<p><strong>평가 데이터셋 구조</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> pandas </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> pd</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">eval_data </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> pd</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">DataFrame</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"inputs"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"What is the capital of France?"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"Explain quantum computing"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"How does photosynthesis work?"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"context"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"France is a country in Europe. Paris is its capital."</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"Quantum computing uses quantum bits..."</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"Plants use sunlight to produce energy..."</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">"ground_truth"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"Paris"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"A computing paradigm using quantum mechanics"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"Process where plants convert light to chemical energy"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p>필수 컬럼:</p>
<ul>
<li class=""><strong>inputs</strong>: 모델 입력</li>
<li class=""><strong>context</strong>: (선택) RAG의 검색 문서 등 추가 정보</li>
<li class=""><strong>ground_truth</strong> 또는 <strong>targets</strong>: (선택) 정답 레이블</li>
</ul>
<p><strong>Built-in Evaluators 활용</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> mlflow</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># LLM 출력 품질 평가</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">results </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">evaluate</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"openai:/gpt-4"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># 또는 커스텀 함수</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    data</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">eval_data</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model_type</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"question-answering"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    evaluators</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"default"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># 기본 메트릭 세트</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    extra_metrics</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">metrics</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">toxicity</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">metrics</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">flesch_kincaid_grade_level</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">metrics</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">answer_similarity</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">metrics</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">faithfulness</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"openai:/gpt-4-turbo"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">metrics</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">answer_relevance</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"openai:/gpt-4-turbo"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">results</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">metrics</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p><strong>주요 LLM 메트릭</strong>:</p>
<ul>
<li class=""><strong>toxicity</strong>: 유해성 점수 (0~1, 낮을수록 좋음)</li>
<li class=""><strong>flesch_kincaid_grade_level</strong>: 가독성 (미국 학년 수준)</li>
<li class=""><strong>answer_similarity</strong>: 임베딩 기반 답변-정답 유사도</li>
<li class=""><strong>faithfulness</strong>: 답변이 제공된 context에 충실한지 (LLM-as-Judge)</li>
<li class=""><strong>answer_relevance</strong>: 답변이 질문과 관련 있는지 (LLM-as-Judge)</li>
</ul>
<p><strong>LLM-as-a-Judge 동작 원리</strong></p>
<p>faithfulness, answer_relevance 같은 메트릭은 LLM을 판단자로 사용합니다:</p>
<ol>
<li class="">평가용 프롬프트 생성 (질문 + 답변 + context)</li>
<li class="">Judge LLM에 전송 (예: GPT-4)</li>
<li class="">LLM이 평가 기준에 따라 점수 반환 (1~5 척도 등)</li>
<li class="">점수를 정규화하여 저장</li>
</ol>
<p><strong>의사결정 포인트</strong>: Judge 모델은 평가 대상 모델보다 강력해야 신뢰성이 높습니다. 예를 들어 GPT-3.5로 생성한 답변을 GPT-4로 평가하거나, 오픈소스 LLM 출력을 Claude Opus로 평가하는 방식이 권장됩니다.</p>
<p><strong>Custom Evaluators 작성</strong></p>
<p><strong>함수 기반 메트릭</strong>:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">metrics </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> make_metric</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">contains_keyword</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">eval_df</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> builtin_metrics</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token triple-quoted-string string" style="color:#e3116c">"""답변에 특정 키워드가 포함되어 있는지 체크"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    keywords </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"quantum"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"photosynthesis"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"capital"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    scores </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> _</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> row </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> eval_df</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">iterrows</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        answer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> row</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"outputs"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">lower</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        score </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1.0</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token builtin">any</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">kw </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> answer </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> kw </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> keywords</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">else</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0.0</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        scores</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">append</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">score</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> scores</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">keyword_metric </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> make_metric</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    eval_fn</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">contains_keyword</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    greater_is_better</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"keyword_presence"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">evaluate</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">rag_model</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    data</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">eval_data</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    extra_metrics</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">keyword_metric</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p><strong>LLM-as-Judge 커스텀 메트릭</strong>:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">metrics</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">genai </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> make_genai_metric</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">custom_faithfulness </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> make_genai_metric</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"custom_faithfulness"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    definition</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"답변이 제공된 문서에만 기반하고 외부 지식을 사용하지 않는지 평가"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    grading_prompt</span><span class="token operator" style="color:#393A34">=</span><span class="token triple-quoted-string string" style="color:#e3116c">"""</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    다음 기준으로 답변을 평가하세요:</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    - 5점: 모든 정보가 문서에서 직접 추출됨</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    - 4점: 대부분 문서 기반이지만 약간의 추론 포함</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    - 3점: 문서와 외부 지식이 혼합됨</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    - 2점: 주로 외부 지식 사용</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    - 1점: 문서와 무관한 정보</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    질문: {inputs}</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    문서: {context}</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    답변: {outputs}</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    점수만 반환하세요 (1~5).</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">    """</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    grading_context_columns</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"context"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    examples</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"inputs"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"What is the capital?"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"outputs"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Paris is the capital."</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"context"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Paris is the capital of France."</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"score"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">5</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token string" style="color:#e3116c">"justification"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"답변이 문서에서 직접 추출됨"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"openai:/gpt-4-turbo"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    parameters</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"temperature"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0.0</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">evaluate</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">rag_model</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    data</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">eval_data</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    extra_metrics</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">custom_faithfulness</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p><strong>의사결정 근거</strong>: 프로젝트별 도메인 특화 평가 기준(예: 의료 분야의 전문 용어 정확성, 금융 분야의 수치 정확성)은 커스텀 메트릭으로 정의해야 합니다. 프롬프트에 Few-shot 예시를 포함하면 Judge LLM의 평가 일관성이 크게 향상됩니다.</p>
<p><strong>평가 결과 활용</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># 평가 결과 출력</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">results</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">metrics</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># {'toxicity/v1/mean': 0.02, 'faithfulness/v1/mean': 4.5, ...}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 행별 상세 결과</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">results_df </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> results</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">tables</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"eval_results_table"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">results_df</span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"inputs"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"outputs"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"faithfulness/v1/score"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># MLflow UI에서 시각화</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># - 메트릭별 분포 히스토그램</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># - 실패 사례 필터링</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># - 여러 모델 간 평가 결과 비교</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-추가-고급-기능">4. 추가 고급 기능<a href="https://teddynote-lab.github.io/brain-cache/lab/mlflow-%E1%84%80%E1%85%B5%E1%84%89%E1%85%AE%E1%86%AF-%E1%84%80%E1%85%A5%E1%86%B7%E1%84%90%E1%85%A9rag-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC-%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7-%E1%84%90%E1%85%A6%E1%84%89%E1%85%B3%E1%84%90%E1%85%B3%E1%84%87%E1%85%A6%E1%84%83%E1%85%B3-%E1%84%80%E1%85%B5%E1%84%82%E1%85%B3%E1%86%BC#4-%EC%B6%94%EA%B0%80-%EA%B3%A0%EA%B8%89-%EA%B8%B0%EB%8A%A5" class="hash-link" aria-label="4. 추가 고급 기능에 대한 직접 링크" title="4. 추가 고급 기능에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>프롬프트 버전 관리</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> mlflow</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">prompt_template_v1 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token triple-quoted-string string" style="color:#e3116c">"""</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">You are a helpful assistant. Answer the question based on the context.</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="display:inline-block;color:#e3116c"></span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">Context: {context}</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">Question: {question}</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">Answer:</span><br></span><span class="token-line" style="color:#393A34"><span class="token triple-quoted-string string" style="color:#e3116c">"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">with</span><span class="token plain"> mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">start_run</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">log_param</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"prompt_version"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"v1"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">log_text</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">prompt_template_v1</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"prompt_template.txt"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># 프롬프트 성능 평가</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    results </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> evaluate_prompt</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">prompt_template_v1</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> test_data</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">log_metrics</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">results</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p>프롬프트를 파라미터와 아티팩트로 저장하여 변경 사항을 추적하고, 여러 버전의 성능을 비교할 수 있습니다.</p>
<p><strong>Multi-Turn 대화 추적</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> mlflow</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">session_id </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user-123-session-456"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">with</span><span class="token plain"> mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">start_run</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">run_name</span><span class="token operator" style="color:#393A34">=</span><span class="token string-interpolation string" style="color:#e3116c">f"conversation-</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">session_id</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> run</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">set_tag</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"session_id"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> session_id</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    conversation_history </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> turn</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> user_input </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token builtin">enumerate</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">user_inputs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">with</span><span class="token plain"> mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">start_span</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">name</span><span class="token operator" style="color:#393A34">=</span><span class="token string-interpolation string" style="color:#e3116c">f"turn_</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">turn</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> span</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> chatbot</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">reply</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">user_input</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> conversation_history</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            conversation_history</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">append</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> user_input</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"bot"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> response</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">log_metric</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"turn_</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">turn</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">_latency"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> response</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">latency</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># 전체 대화 저장</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">log_dict</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"conversation"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> conversation_history</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"full_conversation.json"</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p>session_id 태그로 대화를 그룹화하고, 턴별 Span으로 각 응답을 추적합니다.</p>
<p><strong>OpenTelemetry 분산 추적</strong></p>
<p>마이크로서비스 환경에서 여러 서비스에 걸친 요청을 추적:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Service A (Gateway)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> mlflow</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> opentelemetry </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> trace</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">tracer </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> trace</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get_tracer</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">__name__</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">with</span><span class="token plain"> tracer</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">start_as_current_span</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"gateway_request"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">with</span><span class="token plain"> mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">start_span</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"preprocess_input"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        cleaned_input </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> preprocess</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">user_query</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># Service B 호출 (Trace context 자동 전파)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> requests</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">post</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"http://rag-service/query"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> json</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"input"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> cleaned_input</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p>OpenTelemetry context propagation으로 Service A와 Service B의 Span이 하나의 Trace로 연결됩니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="5-langsmith와의-비교">5. LangSmith와의 비교<a href="https://teddynote-lab.github.io/brain-cache/lab/mlflow-%E1%84%80%E1%85%B5%E1%84%89%E1%85%AE%E1%86%AF-%E1%84%80%E1%85%A5%E1%86%B7%E1%84%90%E1%85%A9rag-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC-%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7-%E1%84%90%E1%85%A6%E1%84%89%E1%85%B3%E1%84%90%E1%85%B3%E1%84%87%E1%85%A6%E1%84%83%E1%85%B3-%E1%84%80%E1%85%B5%E1%84%82%E1%85%B3%E1%86%BC#5-langsmith%EC%99%80%EC%9D%98-%EB%B9%84%EA%B5%90" class="hash-link" aria-label="5. LangSmith와의 비교에 대한 직접 링크" title="5. LangSmith와의 비교에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>MLflow 선택이 유리한 경우</strong>:</p>
<ul>
<li class="">전통적인 ML 모델과 LLM을 동일 플랫폼에서 관리하고 싶을 때</li>
<li class="">온프레미스 또는 프라이빗 클라우드에서 자체 호스팅이 필요할 때</li>
<li class="">기존 옵저버빌리티 스택(Grafana, Prometheus)과 통합하고 싶을 때</li>
<li class="">오픈소스 라이선스로 비용 절감이 중요할 때</li>
<li class="">Scikit-learn, PyTorch, TensorFlow 등 다양한 프레임워크를 함께 사용할 때</li>
</ul>
<p><strong>LangSmith 선택이 유리한 경우</strong>:</p>
<ul>
<li class="">LangChain/LangGraph를 주요 프레임워크로 사용하고, 즉시 사용 가능한 SaaS가 필요할 때</li>
<li class="">Playground 기능으로 프롬프트를 즉시 테스트하고 수정하고 싶을 때</li>
<li class="">사용자 피드백(thumbs up/down)을 손쉽게 수집하고 관리하고 싶을 때</li>
<li class="">인프라 운영 리소스가 부족하고 관리형 서비스를 선호할 때</li>
</ul>
<p><strong>의사결정 체크리스트</strong>:</p>








































<table><thead><tr><th>항목</th><th>MLflow</th><th>LangSmith</th></tr></thead><tbody><tr><td>라이선스</td><td>오픈소스 (Apache 2.0)</td><td>상용 (Free tier 제한적)</td></tr><tr><td>호스팅</td><td>자체 호스팅 필요</td><td>클라우드 SaaS</td></tr><tr><td>프레임워크 지원</td><td>범용 (Scikit-learn, PyTorch, LangChain 등)</td><td>LangChain 전용</td></tr><tr><td>통합성</td><td>OpenTelemetry 표준</td><td>LangChain 에코시스템</td></tr><tr><td>UI/UX</td><td>범용 실험 관리 UI</td><td>LangChain 특화 대화형 UI</td></tr><tr><td>비용</td><td>인프라 비용만</td><td>사용량 기반 과금</td></tr></tbody></table>
<p><strong>하이브리드 접근</strong>: 일부 팀은 개발 단계에서 LangSmith로 빠른 프로토타이핑을 하고, 프로덕션 배포는 MLflow로 전환하는 전략을 사용합니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="실전-적용-시-주의사항">실전 적용 시 주의사항<a href="https://teddynote-lab.github.io/brain-cache/lab/mlflow-%E1%84%80%E1%85%B5%E1%84%89%E1%85%AE%E1%86%AF-%E1%84%80%E1%85%A5%E1%86%B7%E1%84%90%E1%85%A9rag-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC-%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7-%E1%84%90%E1%85%A6%E1%84%89%E1%85%B3%E1%84%90%E1%85%B3%E1%84%87%E1%85%A6%E1%84%83%E1%85%B3-%E1%84%80%E1%85%B5%E1%84%82%E1%85%B3%E1%86%BC#%EC%8B%A4%EC%A0%84-%EC%A0%81%EC%9A%A9-%EC%8B%9C-%EC%A3%BC%EC%9D%98%EC%82%AC%ED%95%AD" class="hash-link" aria-label="실전 적용 시 주의사항에 대한 직접 링크" title="실전 적용 시 주의사항에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>1. 민감 데이터 로깅 제어</strong></p>
<p>LLM 입출력에 개인정보가 포함될 수 있으므로 로깅 전 필터링이 필요합니다:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> mlflow</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">utils</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">autologging_utils </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> disable_for_unsupported_versions</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 특정 필드 제외</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">langchain</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">autolog</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">log_inputs</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">False</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> log_outputs</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 또는 커스텀 필터 적용</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">sanitize_output</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">output</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># 이메일, 전화번호 등 마스킹</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> re</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">sub</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'[EMAIL]'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> output</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p><strong>2. 대규모 Trace 저장 비용</strong></p>
<p>Tracing을 모든 요청에 활성화하면 스토리지 비용이 급증할 수 있습니다. 샘플링 전략 적용:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> random</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> mlflow</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 10% 샘플링</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> random</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">random</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">&lt;</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0.1</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">langchain</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">autolog</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">else</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">langchain</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">autolog</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">disable</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p><strong>3. Evaluation Judge 모델 비용</strong></p>
<p>LLM-as-a-Judge 메트릭은 평가 데이터셋 크기에 비례해 API 호출 비용이 발생합니다. 캐싱 전략 활용:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> mlflow</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">metrics</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">genai </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> make_genai_metric</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">custom_metric </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> make_genai_metric</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"faithfulness"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token comment" style="color:#999988;font-style:italic"># ...</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    model</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"openai:/gpt-4-turbo"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    parameters</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"temperature"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0.0</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># 결정적 출력으로 캐싱 효율 향상</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token string" style="color:#e3116c">"seed"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">42</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<p><strong>4. MLflow Server 고가용성</strong></p>
<p>프로덕션 환경에서는 MLflow Tracking Server를 고가용성으로 구성:</p>
<ul>
<li class="">Backend Store: PostgreSQL/MySQL (복제 구성)</li>
<li class="">Artifact Store: S3/GCS (자동 복제)</li>
<li class="">Load Balancer: 여러 MLflow 서버 인스턴스 앞단 배치</li>
<li class="">모니터링: Prometheus + Grafana로 서버 상태 추적</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="references">References<a href="https://teddynote-lab.github.io/brain-cache/lab/mlflow-%E1%84%80%E1%85%B5%E1%84%89%E1%85%AE%E1%86%AF-%E1%84%80%E1%85%A5%E1%86%B7%E1%84%90%E1%85%A9rag-%E1%84%89%E1%85%A5%E1%86%BC%E1%84%82%E1%85%B3%E1%86%BC-%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7-%E1%84%90%E1%85%A6%E1%84%89%E1%85%B3%E1%84%90%E1%85%B3%E1%84%87%E1%85%A6%E1%84%83%E1%85%B3-%E1%84%80%E1%85%B5%E1%84%82%E1%85%B3%E1%86%BC#references" class="hash-link" aria-label="References에 대한 직접 링크" title="References에 대한 직접 링크" translate="no">​</a></h2>
<ul>
<li class=""><a href="https://mlflow.org/docs/latest/index.html" target="_blank" rel="noopener noreferrer" class="">MLflow Official Documentation</a></li>
<li class=""><a href="https://mlflow.org/docs/latest/llms/index.html" target="_blank" rel="noopener noreferrer" class="">MLflow LLMs Guide</a></li>
<li class=""><a href="https://mlflow.org/docs/latest/llms/tracing/index.html" target="_blank" rel="noopener noreferrer" class="">MLflow Tracing Documentation</a></li>
<li class=""><a href="https://mlflow.org/docs/latest/llms/llm-evaluate/index.html" target="_blank" rel="noopener noreferrer" class="">MLflow GenAI Evaluation Guide</a></li>
<li class=""><a href="https://opentelemetry.io/docs/languages/python/" target="_blank" rel="noopener noreferrer" class="">OpenTelemetry Python SDK</a></li>
<li class=""><a href="https://python.langchain.com/docs/integrations/providers/mlflow/" target="_blank" rel="noopener noreferrer" class="">LangChain MLflow Integration</a></li>
<li class=""><a href="https://github.com/taehan79-kim/mlflow-genai-tutorial" target="_blank" rel="noopener noreferrer" class="">프로젝트 테스트 코드 GitHub Repository</a></li>
</ul>]]></content:encoded>
            <author>hank@brain-crew.com (김태한)</author>
            <category>Evaluation</category>
        </item>
        <item>
            <title><![CDATA[LLM이 가장 잘 이해하는 Table Format에 대한 평가실험]]></title>
            <link>https://teddynote-lab.github.io/brain-cache/lab/llm이-가장-잘-이해하는-table-format에-대한-평가실험</link>
            <guid>https://teddynote-lab.github.io/brain-cache/lab/llm이-가장-잘-이해하는-table-format에-대한-평가실험</guid>
            <pubDate>Mon, 15 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[재무제표 데이터를 LLM에 전달할 때 데이터 포맷에 따라 성능과 비용이 크게 달라집니다. 11가지 포맷을 비교한 결과, TSV(tab-separated) 포맷이 정확도 100%, 최소 토큰 사용(7,192개), 최단 응답시간(8.24초)으로 모든 지표에서 최고 성능을 ]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="tldr">TL;DR<a href="https://teddynote-lab.github.io/brain-cache/lab/llm%E1%84%8B%E1%85%B5-%E1%84%80%E1%85%A1%E1%84%8C%E1%85%A1%E1%86%BC-%E1%84%8C%E1%85%A1%E1%86%AF-%E1%84%8B%E1%85%B5%E1%84%92%E1%85%A2%E1%84%92%E1%85%A1%E1%84%82%E1%85%B3%E1%86%AB-table-format%E1%84%8B%E1%85%A6-%E1%84%83%E1%85%A2%E1%84%92%E1%85%A1%E1%86%AB-%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7#tldr" class="hash-link" aria-label="TL;DR에 대한 직접 링크" title="TL;DR에 대한 직접 링크" translate="no">​</a></h2>
<blockquote>
<p>재무제표 데이터를 LLM에 전달할 때 데이터 포맷에 따라 성능과 비용이 크게 달라집니다. 11가지 포맷을 비교한 결과, TSV(tab-separated) 포맷이 정확도 100%, 최소 토큰 사용(7,192개), 최단 응답시간(8.24초)으로 모든 지표에서 최고 성능을 보였습니다. 반면 DICT와 XML은 프로그래밍 문법의 메타 문자로 인해 토큰을 2배 이상 낭비했고, STRING 포맷은 정확도가 75%로 떨어졌습니다. 실무에서는 TSV가 이론적 최적이지만, Markdown Key-Value가 가독성과 효율성의 균형점으로 더 실용적일 수 있습니다.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="key-takeaways">Key Takeaways<a href="https://teddynote-lab.github.io/brain-cache/lab/llm%E1%84%8B%E1%85%B5-%E1%84%80%E1%85%A1%E1%84%8C%E1%85%A1%E1%86%BC-%E1%84%8C%E1%85%A1%E1%86%AF-%E1%84%8B%E1%85%B5%E1%84%92%E1%85%A2%E1%84%92%E1%85%A1%E1%84%82%E1%85%B3%E1%86%AB-table-format%E1%84%8B%E1%85%A6-%E1%84%83%E1%85%A2%E1%84%92%E1%85%A1%E1%86%AB-%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7#key-takeaways" class="hash-link" aria-label="Key Takeaways에 대한 직접 링크" title="Key Takeaways에 대한 직접 링크" translate="no">​</a></h2>
<ul>
<li class=""><strong>간결한 포맷이 LLM 성능과 비용 모두 우수</strong>: TSV는 XML 대비 토큰을 57% 절감하고 응답속도를 3배 향상시켰습니다. 메타 문자를 최소화하는 것이 핵심입니다.</li>
<li class=""><strong>구조화된 포맷이 필수</strong>: STRING 같은 비구조화 포맷은 정확도가 25% 하락합니다. 테이블 형태의 명확한 구조가 LLM의 이해도를 크게 높입니다.</li>
<li class=""><strong>프로그래밍 문법은 토큰 낭비</strong>: DICT, XML처럼 <code>{</code>, <code>}</code>, <code>'</code> 등 메타 문자가 많은 포맷은 실제 데이터 대비 구조 표현에 토큰을 과다 소모합니다. JSON이나 TSV 같은 간결한 대안을 선택하세요.</li>
<li class=""><strong>가독성과 효율성의 트레이드오프 고려</strong>: 이론적으로 TSV가 최적이지만, 실무에서는 Markdown Key-Value처럼 사람과 기계 모두 읽기 쉬운 포맷이 유지보수와 확장성 면에서 더 나을 수 있습니다.</li>
<li class=""><strong>포맷 선택은 비용에 직결</strong>: 대규모 RAG 시스템에서 포맷 최적화만으로 토큰 비용을 50% 이상 절감할 수 있습니다. 초기 설계 단계에서 포맷을 신중히 선택하세요.</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="상세-내용">상세 내용<a href="https://teddynote-lab.github.io/brain-cache/lab/llm%E1%84%8B%E1%85%B5-%E1%84%80%E1%85%A1%E1%84%8C%E1%85%A1%E1%86%BC-%E1%84%8C%E1%85%A1%E1%86%AF-%E1%84%8B%E1%85%B5%E1%84%92%E1%85%A2%E1%84%92%E1%85%A1%E1%84%82%E1%85%B3%E1%86%AB-table-format%E1%84%8B%E1%85%A6-%E1%84%83%E1%85%A2%E1%84%92%E1%85%A1%E1%86%AB-%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7#%EC%83%81%EC%84%B8-%EB%82%B4%EC%9A%A9" class="hash-link" aria-label="상세 내용에 대한 직접 링크" title="상세 내용에 대한 직접 링크" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="배경-왜-테이블-포맷이-중요한가">배경: 왜 테이블 포맷이 중요한가?<a href="https://teddynote-lab.github.io/brain-cache/lab/llm%E1%84%8B%E1%85%B5-%E1%84%80%E1%85%A1%E1%84%8C%E1%85%A1%E1%86%BC-%E1%84%8C%E1%85%A1%E1%86%AF-%E1%84%8B%E1%85%B5%E1%84%92%E1%85%A2%E1%84%92%E1%85%A1%E1%84%82%E1%85%B3%E1%86%AB-table-format%E1%84%8B%E1%85%A6-%E1%84%83%E1%85%A2%E1%84%92%E1%85%A1%E1%86%AB-%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7#%EB%B0%B0%EA%B2%BD-%EC%99%9C-%ED%85%8C%EC%9D%B4%EB%B8%94-%ED%8F%AC%EB%A7%B7%EC%9D%B4-%EC%A4%91%EC%9A%94%ED%95%9C%EA%B0%80" class="hash-link" aria-label="배경: 왜 테이블 포맷이 중요한가?에 대한 직접 링크" title="배경: 왜 테이블 포맷이 중요한가?에 대한 직접 링크" translate="no">​</a></h3>
<p>많은 RAG(Retrieval-Augmented Generation) 파이프라인에서 재무제표, 스프레드시트, 데이터베이스 쿼리 결과 등 테이블 형태의 데이터를 LLM에 전달해야 합니다. 하지만 같은 데이터라도 어떤 포맷으로 인코딩하느냐에 따라 LLM의 이해도, 토큰 사용량, 응답 속도가 크게 달라집니다.</p>
<p>예를 들어, Elasticsearch에서 추출한 재무제표 데이터를 LLM에 전달할 때:</p>
<ul>
<li class="">JSON으로 보낼 것인가?</li>
<li class="">CSV나 TSV로 보낼 것인가?</li>
<li class="">Markdown 테이블이나 HTML을 사용할 것인가?</li>
</ul>
<p>이 선택은 시스템 정확도와 운영 비용에 직접적인 영향을 미칩니다. IBK Capital 프로젝트에서는 이 질문에 답하기 위해 체계적인 실험을 수행했습니다.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="실험-설계">실험 설계<a href="https://teddynote-lab.github.io/brain-cache/lab/llm%E1%84%8B%E1%85%B5-%E1%84%80%E1%85%A1%E1%84%8C%E1%85%A1%E1%86%BC-%E1%84%8C%E1%85%A1%E1%86%AF-%E1%84%8B%E1%85%B5%E1%84%92%E1%85%A2%E1%84%92%E1%85%A1%E1%84%82%E1%85%B3%E1%86%AB-table-format%E1%84%8B%E1%85%A6-%E1%84%83%E1%85%A2%E1%84%92%E1%85%A1%E1%86%AB-%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7#%EC%8B%A4%ED%97%98-%EC%84%A4%EA%B3%84" class="hash-link" aria-label="실험 설계에 대한 직접 링크" title="실험 설계에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>평가 대상 포맷 (11가지)</strong></p>
<ol>
<li class="">TSV (tab-separated values)</li>
<li class="">JSON</li>
<li class="">CSV</li>
<li class="">Markdown Table</li>
<li class="">HTML Table</li>
<li class="">Markdown Key-Value</li>
<li class="">DICT (Python dictionary list)</li>
<li class="">LaTeX</li>
<li class="">XML</li>
<li class="">NumPy array</li>
<li class="">STRING (자연어 형식)</li>
</ol>
<p><strong>평가 지표</strong></p>
<ul>
<li class=""><strong>정확도</strong>: LLM이 데이터 기반 질문에 정확히 답변한 비율 (0-1)</li>
<li class=""><strong>토큰 사용량</strong>: 프롬프트와 응답에 소요된 총 토큰 수</li>
<li class=""><strong>응답 속도</strong>: 질의응답 완료까지 걸린 시간 (초)</li>
</ul>
<p><strong>실험 환경</strong></p>
<ul>
<li class="">모델: AWS Bedrock Claude Sonnet 4.5</li>
<li class="">데이터: Elasticsearch에서 추출한 실제 재무제표 데이터 (매출액, EBIT, EBITDA 등 약 40개 항목)</li>
<li class="">질문 형식: "2020년 12월의 매출액은 얼마인가요?" 같은 특정 값 조회</li>
</ul>
<p><strong>종합 점수 계산</strong>
각 지표를 정규화(0-1)한 후 가중 평균:</p>
<ul>
<li class="">정확도: 0.5</li>
<li class="">토큰 효율: 0.3</li>
<li class="">속도: 0.2</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="실험-결과-tsv의-압도적-우위">실험 결과: TSV의 압도적 우위<a href="https://teddynote-lab.github.io/brain-cache/lab/llm%E1%84%8B%E1%85%B5-%E1%84%80%E1%85%A1%E1%84%8C%E1%85%A1%E1%86%BC-%E1%84%8C%E1%85%A1%E1%86%AF-%E1%84%8B%E1%85%B5%E1%84%92%E1%85%A2%E1%84%92%E1%85%A1%E1%84%82%E1%85%B3%E1%86%AB-table-format%E1%84%8B%E1%85%A6-%E1%84%83%E1%85%A2%E1%84%92%E1%85%A1%E1%86%AB-%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7#%EC%8B%A4%ED%97%98-%EA%B2%B0%EA%B3%BC-tsv%EC%9D%98-%EC%95%95%EB%8F%84%EC%A0%81-%EC%9A%B0%EC%9C%84" class="hash-link" aria-label="실험 결과: TSV의 압도적 우위에 대한 직접 링크" title="실험 결과: TSV의 압도적 우위에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>종합 순위</strong></p>













































































<table><thead><tr><th>순위</th><th>포맷</th><th>종합점수</th><th>정확도</th><th>토큰수</th><th>지연시간(초)</th></tr></thead><tbody><tr><td>1</td><td>TSV</td><td>1.0000</td><td>1.00</td><td>7,192</td><td>8.24</td></tr><tr><td>2</td><td>JSON</td><td>0.8973</td><td>1.00</td><td>9,941</td><td>10.25</td></tr><tr><td>3</td><td>HTML</td><td>0.8768</td><td>1.00</td><td>10,229</td><td>11.59</td></tr><tr><td>4</td><td>Markdown</td><td>0.8505</td><td>1.00</td><td>9,805</td><td>16.17</td></tr><tr><td>5</td><td>Markdown KV</td><td>0.8176</td><td>1.00</td><td>11,075</td><td>15.41</td></tr><tr><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr><tr><td>10</td><td>XML</td><td>0.5523</td><td>1.00</td><td>16,852</td><td>25.37</td></tr><tr><td>11</td><td>STRING</td><td>0.3773</td><td>0.75</td><td>9,183</td><td>15.30</td></tr></tbody></table>
<p><strong>핵심 발견</strong></p>
<ul>
<li class="">TSV를 제외한 대부분 구조화 포맷은 정확도 1.00 달성</li>
<li class="">STRING 포맷만 정확도 0.75로 하락 → <strong>구조화가 필수</strong></li>
<li class="">토큰 사용량 차이: 최소(TSV 7,192) vs 최대(XML 16,852) = <strong>2.3배</strong></li>
<li class="">응답 속도 차이: 최소(TSV 8.24초) vs 최대(DICT 31.44초) = <strong>3.8배</strong></li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="왜-tsv가-최고-성능을-보이는가">왜 TSV가 최고 성능을 보이는가?<a href="https://teddynote-lab.github.io/brain-cache/lab/llm%E1%84%8B%E1%85%B5-%E1%84%80%E1%85%A1%E1%84%8C%E1%85%A1%E1%86%BC-%E1%84%8C%E1%85%A1%E1%86%AF-%E1%84%8B%E1%85%B5%E1%84%92%E1%85%A2%E1%84%92%E1%85%A1%E1%84%82%E1%85%B3%E1%86%AB-table-format%E1%84%8B%E1%85%A6-%E1%84%83%E1%85%A2%E1%84%92%E1%85%A1%E1%86%AB-%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7#%EC%99%9C-tsv%EA%B0%80-%EC%B5%9C%EA%B3%A0-%EC%84%B1%EB%8A%A5%EC%9D%84-%EB%B3%B4%EC%9D%B4%EB%8A%94%EA%B0%80" class="hash-link" aria-label="왜 TSV가 최고 성능을 보이는가?에 대한 직접 링크" title="왜 TSV가 최고 성능을 보이는가?에 대한 직접 링크" translate="no">​</a></h3>
<p>동일한 재무제표 데이터를 세 가지 포맷으로 표현한 예시로 분석해보겠습니다.</p>
<p><strong>TSV (7,192 토큰)</strong></p>
<div class="language-tsv codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-tsv codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">재무제표	2019/12	2020/12	2021/12	2022/12	2022/09	2023/09</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">매출액	594,159	591,566	578,744	606,454	473,909	385,849</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">EBIT	5,148	52,063	13,045	3,755	10,252	-16,558</span><br></span></code></pre></div></div>
<p><strong>특징</strong></p>
<ul>
<li class="">구분자: 탭 문자 하나만 사용</li>
<li class="">메타 문자: 거의 없음 (줄바꿈뿐)</li>
<li class="">정보 밀도: 매우 높음 (실제 데이터에 집중)</li>
</ul>
<p><strong>Markdown Key-Value (11,075 토큰)</strong></p>
<div class="language-markdown codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-markdown codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token title important punctuation" style="color:#393A34">##</span><span class="token title important"> Record 1</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">재무제표: 매출액</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">2019/12: 594,159</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">2020/12: 591,566</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">2021/12: 578,744</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">...</span><br></span></code></pre></div></div>
<p><strong>특징</strong></p>
<ul>
<li class="">구조 요소: <code>## Record N</code>, 키-값 구분 <code>:</code></li>
<li class="">가독성: 레코드별 명확한 구분</li>
<li class="">토큰 증가 원인: 마크다운 헤더와 구분자</li>
</ul>
<p><strong>DICT (14,436 토큰 - TSV의 2배)</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">'재무제표'</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'매출액'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'2019/12'</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'594,159'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'2020/12'</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'591,566'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">'재무제표'</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'EBIT'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'2019/12'</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'5,148'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'2020/12'</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'52,063'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">]</span><br></span></code></pre></div></div>
<p><strong>특징</strong></p>
<ul>
<li class="">메타 문자 과다: <code>{</code>, <code>}</code>, <code>[</code>, <code>]</code>, <code>'</code>, <code>:</code>, <code>,</code> 반복</li>
<li class="">키 중복: 각 레코드마다 <code>'재무제표':</code> 반복</li>
<li class="">토큰 낭비: 프로그래밍 문법에 토큰 소모</li>
</ul>
<p><strong>토큰 차이 분석</strong></p>



































<table><thead><tr><th>요소</th><th>TSV</th><th>Markdown KV</th><th>DICT</th></tr></thead><tbody><tr><td>레코드 구분</td><td>줄바꿈</td><td><code>## Record N</code></td><td><code>}, {</code></td></tr><tr><td>키-값 구분</td><td>탭</td><td><code>:</code></td><td><code>': '</code></td></tr><tr><td>데이터 구분</td><td>탭</td><td>줄바꿈</td><td><code>', '</code></td></tr><tr><td>컨테이너</td><td>없음</td><td>코드블록</td><td><code>[</code>, <code>]</code></td></tr></tbody></table>
<p>TSV가 최소 토큰을 사용하는 이유:</p>
<ol>
<li class=""><strong>불필요한 메타 문자 제거</strong>: DICT의 <code>{</code>, <code>}</code>, <code>'</code> 같은 문법 요소 없음</li>
<li class=""><strong>키 중복 없음</strong>: 헤더에 한 번만 키 정의</li>
<li class=""><strong>구분자 최소화</strong>: 탭 하나로 열 구분 (CSV는 쉼표 + 따옴표 필요)</li>
</ol>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="왜-markdown-key-value가-dict보다-나은가">왜 Markdown Key-Value가 DICT보다 나은가?<a href="https://teddynote-lab.github.io/brain-cache/lab/llm%E1%84%8B%E1%85%B5-%E1%84%80%E1%85%A1%E1%84%8C%E1%85%A1%E1%86%BC-%E1%84%8C%E1%85%A1%E1%86%AF-%E1%84%8B%E1%85%B5%E1%84%92%E1%85%A2%E1%84%92%E1%85%A1%E1%84%82%E1%85%B3%E1%86%AB-table-format%E1%84%8B%E1%85%A6-%E1%84%83%E1%85%A2%E1%84%92%E1%85%A1%E1%86%AB-%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7#%EC%99%9C-markdown-key-value%EA%B0%80-dict%EB%B3%B4%EB%8B%A4-%EB%82%98%EC%9D%80%EA%B0%80" class="hash-link" aria-label="왜 Markdown Key-Value가 DICT보다 나은가?에 대한 직접 링크" title="왜 Markdown Key-Value가 DICT보다 나은가?에 대한 직접 링크" translate="no">​</a></h3>
<p>실험 결과 Markdown Key-Value(11,075 토큰)가 DICT(14,436 토큰)보다 23% 효율적입니다.</p>
<p><strong>DICT의 문제점</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># 각 레코드마다 반복되는 메타 문자</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">'재무제표'</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'매출액'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'2019/12'</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'594,159'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># { } ' ' : , 모두 토큰 소모</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">'재무제표'</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'EBIT'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'2019/12'</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'5,148'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain">      </span><span class="token comment" style="color:#999988;font-style:italic"># 키 이름 '재무제표' 반복</span><br></span></code></pre></div></div>
<p><strong>Markdown Key-Value의 장점</strong></p>
<div class="language-markdown codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-markdown codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token title important punctuation" style="color:#393A34">##</span><span class="token title important"> Record 1</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">재무제표: 매출액</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">2019/12: 594,159</span><br></span></code></pre></div></div>
<ol>
<li class=""><strong>간결한 구분자</strong>: <code>:</code> 하나로 키-값 구분</li>
<li class=""><strong>키 중복 최소화</strong>: 레코드 헤더로 한 번만 정의</li>
<li class=""><strong>LLM 친화적</strong>: 마크다운은 LLM 학습 데이터에 흔한 형식</li>
<li class=""><strong>가독성</strong>: 사람이 읽고 디버깅하기 쉬움</li>
</ol>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="실무-의사결정-tsv-vs-markdown-key-value">실무 의사결정: TSV vs Markdown Key-Value<a href="https://teddynote-lab.github.io/brain-cache/lab/llm%E1%84%8B%E1%85%B5-%E1%84%80%E1%85%A1%E1%84%8C%E1%85%A1%E1%86%BC-%E1%84%8C%E1%85%A1%E1%86%AF-%E1%84%8B%E1%85%B5%E1%84%92%E1%85%A2%E1%84%92%E1%85%A1%E1%84%82%E1%85%B3%E1%86%AB-table-format%E1%84%8B%E1%85%A6-%E1%84%83%E1%85%A2%E1%84%92%E1%85%A1%E1%86%AB-%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7#%EC%8B%A4%EB%AC%B4-%EC%9D%98%EC%82%AC%EA%B2%B0%EC%A0%95-tsv-vs-markdown-key-value" class="hash-link" aria-label="실무 의사결정: TSV vs Markdown Key-Value에 대한 직접 링크" title="실무 의사결정: TSV vs Markdown Key-Value에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>이론적 최적: TSV</strong></p>
<ul>
<li class="">토큰 최소 (7,192)</li>
<li class="">속도 최고 (8.24초)</li>
<li class="">비용 최저</li>
</ul>
<p><strong>실무 선택: Markdown Key-Value</strong></p>
<p>프로덕션 환경에서 Markdown Key-Value를 선택한 이유:</p>
<ol>
<li class="">
<p><strong>가독성과 유지보수성</strong></p>
<div class="language-markdown codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-markdown codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token title important punctuation" style="color:#393A34">#</span><span class="token title important"> TSV - 기계 최적화</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">항목	2020	2021	2022</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">매출	100	110	120</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token title important punctuation" style="color:#393A34">#</span><span class="token title important"> Markdown KV - 사람과 기계 모두 고려</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token title important punctuation" style="color:#393A34">##</span><span class="token title important"> 2020년 실적</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">매출: 100</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">영업이익: 10</span><br></span></code></pre></div></div>
</li>
<li class="">
<p><strong>디버깅 용이성</strong></p>
<ul>
<li class="">로그 파일에서 데이터 확인 시 TSV는 읽기 어려움</li>
<li class="">Markdown은 구조가 명확해 문제 파악 빠름</li>
</ul>
</li>
<li class="">
<p><strong>확장성</strong></p>
<ul>
<li class="">추가 메타데이터 삽입 용이</li>
<li class="">중첩 구조 표현 가능</li>
<li class="">주석 추가 가능</li>
</ul>
</li>
<li class="">
<p><strong>성능 트레이드오프 합리성</strong></p>
<ul>
<li class="">TSV 대비 54% 토큰 증가 (7,192 → 11,075)</li>
<li class="">하지만 DICT 대비 23% 절감 (14,436 → 11,075)</li>
<li class="">실용성 고려 시 충분히 효율적</li>
</ul>
</li>
</ol>
<p><strong>코드 예시: 포맷 변환 함수</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">convert_to_markdown_kv</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">df</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> record_name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"Record"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token triple-quoted-string string" style="color:#e3116c">"""DataFrame을 Markdown Key-Value 형식으로 변환"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    result </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> idx</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> row </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> df</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">iterrows</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        result</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">append</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"## </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">record_name</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"> </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">idx </span><span class="token string-interpolation interpolation operator" style="color:#393A34">+</span><span class="token string-interpolation interpolation"> </span><span class="token string-interpolation interpolation number" style="color:#36acaa">1</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">\n"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        result</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">append</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"```"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> col</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> value </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> row</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">items</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            result</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">append</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">col</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">value</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        result</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">append</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"```\n"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"\n"</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">join</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">result</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">convert_to_tsv</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">df</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token triple-quoted-string string" style="color:#e3116c">"""DataFrame을 TSV 형식으로 변환"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> df</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">to_csv</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">sep</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">'\t'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> index</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">False</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 사용 예시</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> pandas </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> pd</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">df </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> pd</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">DataFrame</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">'재무제표'</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">'매출액'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'EBIT'</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">'2020/12'</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">591566</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">52063</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token string" style="color:#e3116c">'2021/12'</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">578744</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">13045</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 비용 최적화가 중요한 경우</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">tsv_format </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> convert_to_tsv</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">df</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 가독성과 유지보수가 중요한 경우</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">markdown_format </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> convert_to_markdown_kv</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">df</span><span class="token punctuation" style="color:#393A34">)</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="실무-적용-가이드">실무 적용 가이드<a href="https://teddynote-lab.github.io/brain-cache/lab/llm%E1%84%8B%E1%85%B5-%E1%84%80%E1%85%A1%E1%84%8C%E1%85%A1%E1%86%BC-%E1%84%8C%E1%85%A1%E1%86%AF-%E1%84%8B%E1%85%B5%E1%84%92%E1%85%A2%E1%84%92%E1%85%A1%E1%84%82%E1%85%B3%E1%86%AB-table-format%E1%84%8B%E1%85%A6-%E1%84%83%E1%85%A2%E1%84%92%E1%85%A1%E1%86%AB-%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7#%EC%8B%A4%EB%AC%B4-%EC%A0%81%EC%9A%A9-%EA%B0%80%EC%9D%B4%EB%93%9C" class="hash-link" aria-label="실무 적용 가이드에 대한 직접 링크" title="실무 적용 가이드에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>시나리오별 포맷 선택</strong></p>



































<table><thead><tr><th>시나리오</th><th>추천 포맷</th><th>이유</th></tr></thead><tbody><tr><td>대용량 배치 처리</td><td>TSV</td><td>비용과 속도 최우선</td></tr><tr><td>프로덕션 API</td><td>Markdown KV</td><td>가독성과 효율 균형</td></tr><tr><td>디버깅/개발</td><td>Markdown KV</td><td>사람이 읽기 쉬움</td></tr><tr><td>레거시 시스템 연동</td><td>JSON</td><td>표준 호환성</td></tr><tr><td>실시간 응답</td><td>TSV</td><td>최저 지연시간</td></tr></tbody></table>
<p><strong>비용 절감 계산 예시</strong></p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># GPT-4 기준 (input $2.50/1M tokens)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">COST_PER_1M_TOKENS </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2.50</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 일일 100만 건 처리 시</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">DAILY_QUERIES </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1_000_000</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 포맷별 토큰 사용량</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">tokens_xml </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">16_852</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">tokens_tsv </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">7_192</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># 비용 계산</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">cost_xml </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">tokens_xml </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> DAILY_QUERIES </span><span class="token operator" style="color:#393A34">/</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1_000_000</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> COST_PER_1M_TOKENS</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">cost_tsv </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">tokens_tsv </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> DAILY_QUERIES </span><span class="token operator" style="color:#393A34">/</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1_000_000</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> COST_PER_1M_TOKENS</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"XML 사용 시: $</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">cost_xml</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation format-spec">,.2f</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">/day"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># $42.13/day</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"TSV 사용 시: $</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">cost_tsv</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation format-spec">,.2f</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">/day"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># $17.98/day</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"절감액: $</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">cost_xml </span><span class="token string-interpolation interpolation operator" style="color:#393A34">-</span><span class="token string-interpolation interpolation"> cost_tsv</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation format-spec">,.2f</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">/day"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># $24.15/day (57% 절감)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"연간 절감: $</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">(</span><span class="token string-interpolation interpolation">cost_xml </span><span class="token string-interpolation interpolation operator" style="color:#393A34">-</span><span class="token string-interpolation interpolation"> cost_tsv</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">)</span><span class="token string-interpolation interpolation"> </span><span class="token string-interpolation interpolation operator" style="color:#393A34">*</span><span class="token string-interpolation interpolation"> </span><span class="token string-interpolation interpolation number" style="color:#36acaa">365</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">:</span><span class="token string-interpolation interpolation format-spec">,.2f</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># $8,815/year</span><br></span></code></pre></div></div>
<p><strong>피해야 할 포맷과 이유</strong></p>
<ol>
<li class="">
<p><strong>STRING (자연어)</strong></p>
<ul>
<li class="">정확도 75% → 25% 오류율은 실무에서 치명적</li>
<li class="">구조 없어 파싱 불안정</li>
</ul>
</li>
<li class="">
<p><strong>XML</strong></p>
<ul>
<li class="">토큰 2.3배 낭비 (16,852 vs 7,192)</li>
<li class="">태그 중복으로 비효율적</li>
</ul>
</li>
<li class="">
<p><strong>DICT</strong></p>
<ul>
<li class="">프로그래밍 문법 메타 문자로 토큰 과다 소모</li>
<li class="">JSON이 더 표준적이고 효율적</li>
</ul>
</li>
</ol>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="추가-고려사항">추가 고려사항<a href="https://teddynote-lab.github.io/brain-cache/lab/llm%E1%84%8B%E1%85%B5-%E1%84%80%E1%85%A1%E1%84%8C%E1%85%A1%E1%86%BC-%E1%84%8C%E1%85%A1%E1%86%AF-%E1%84%8B%E1%85%B5%E1%84%92%E1%85%A2%E1%84%92%E1%85%A1%E1%84%82%E1%85%B3%E1%86%AB-table-format%E1%84%8B%E1%85%A6-%E1%84%83%E1%85%A2%E1%84%92%E1%85%A1%E1%86%AB-%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7#%EC%B6%94%EA%B0%80-%EA%B3%A0%EB%A0%A4%EC%82%AC%ED%95%AD" class="hash-link" aria-label="추가 고려사항에 대한 직접 링크" title="추가 고려사항에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>대용량 데이터 처리</strong></p>
<p>1000개 이상 레코드 처리 시:</p>
<ul>
<li class="">TSV/CSV는 100줄마다 헤더 반복 권장</li>
<li class="">Markdown은 청크 단위로 분할</li>
<li class="">JSON은 스트리밍 파싱 고려</li>
</ul>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">chunk_tsv_with_headers</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">df</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> chunk_size</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">100</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token triple-quoted-string string" style="color:#e3116c">"""TSV를 헤더 반복하며 청크로 분할"""</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    chunks </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> i </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> </span><span class="token builtin">range</span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token builtin">len</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">df</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> chunk_size</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        chunk </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> df</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">iloc</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">i</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">i</span><span class="token operator" style="color:#393A34">+</span><span class="token plain">chunk_size</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        chunks</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">append</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">chunk</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">to_csv</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">sep</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">'\t'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> index</span><span class="token operator" style="color:#393A34">=</span><span class="token boolean" style="color:#36acaa">False</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> chunks</span><br></span></code></pre></div></div>
<p><strong>다른 연구 결과와의 비교</strong></p>
<p>improvingagents.com의 연구(1000개 직원 레코드, 8개 속성)에서도 유사한 결과:</p>
<ul>
<li class="">Markdown Key-Value: 60.7% 정확도</li>
<li class="">INI: 55.7%</li>
<li class="">YAML: 54.5%</li>
<li class="">Markdown Table: 51.8%</li>
</ul>
<p>차이점:</p>
<ul>
<li class="">본 실험은 재무 데이터로 정확도가 더 높음 (대부분 100%)</li>
<li class="">도메인 특성상 테이블 구조가 더 명확해 LLM이 잘 이해</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="결론-및-제언">결론 및 제언<a href="https://teddynote-lab.github.io/brain-cache/lab/llm%E1%84%8B%E1%85%B5-%E1%84%80%E1%85%A1%E1%84%8C%E1%85%A1%E1%86%BC-%E1%84%8C%E1%85%A1%E1%86%AF-%E1%84%8B%E1%85%B5%E1%84%92%E1%85%A2%E1%84%92%E1%85%A1%E1%84%82%E1%85%B3%E1%86%AB-table-format%E1%84%8B%E1%85%A6-%E1%84%83%E1%85%A2%E1%84%92%E1%85%A1%E1%86%AB-%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7#%EA%B2%B0%EB%A1%A0-%EB%B0%8F-%EC%A0%9C%EC%96%B8" class="hash-link" aria-label="결론 및 제언에 대한 직접 링크" title="결론 및 제언에 대한 직접 링크" translate="no">​</a></h3>
<p><strong>핵심 요약</strong></p>
<ol>
<li class=""><strong>TSV가 이론적 최적</strong>: 토큰 57% 절감, 속도 3.8배 향상</li>
<li class=""><strong>Markdown Key-Value가 실무 최적</strong>: 효율성과 실용성의 균형</li>
<li class=""><strong>간결함이 핵심</strong>: 메타 문자 최소화가 성능과 비용에 직결</li>
<li class=""><strong>구조화는 필수</strong>: STRING 같은 비구조화 포맷은 정확도 25% 하락</li>
</ol>
<p><strong>실무 적용 체크리스트</strong></p>
<ul class="contains-task-list containsTaskList_mC6p">
<li class="task-list-item"><input type="checkbox" disabled=""> 비용이 최우선이면 TSV 사용</li>
<li class="task-list-item"><input type="checkbox" disabled=""> 팀 협업과 유지보수 고려 시 Markdown Key-Value</li>
<li class="task-list-item"><input type="checkbox" disabled=""> STRING, XML, DICT는 피하기</li>
<li class="task-list-item"><input type="checkbox" disabled=""> 대용량 데이터는 청크 분할 및 헤더 반복</li>
<li class="task-list-item"><input type="checkbox" disabled=""> 포맷 변경만으로 연간 수천~수만 달러 절감 가능</li>
</ul>
<p><strong>향후 연구 방향</strong></p>
<ul>
<li class="">다양한 LLM 모델(GPT-4, Claude, Llama 등)에서 재현성 검증</li>
<li class="">비정형 데이터(텍스트 포함 테이블)에서의 포맷 영향 분석</li>
<li class="">멀티모달 환경(이미지 + 테이블)에서의 최적 포맷 연구</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="references">References<a href="https://teddynote-lab.github.io/brain-cache/lab/llm%E1%84%8B%E1%85%B5-%E1%84%80%E1%85%A1%E1%84%8C%E1%85%A1%E1%86%BC-%E1%84%8C%E1%85%A1%E1%86%AF-%E1%84%8B%E1%85%B5%E1%84%92%E1%85%A2%E1%84%92%E1%85%A1%E1%84%82%E1%85%B3%E1%86%AB-table-format%E1%84%8B%E1%85%A6-%E1%84%83%E1%85%A2%E1%84%92%E1%85%A1%E1%86%AB-%E1%84%91%E1%85%A7%E1%86%BC%E1%84%80%E1%85%A1%E1%84%89%E1%85%B5%E1%86%AF%E1%84%92%E1%85%A5%E1%86%B7#references" class="hash-link" aria-label="References에 대한 직접 링크" title="References에 대한 직접 링크" translate="no">​</a></h2>
<ul>
<li class=""><a href="https://www.improvingagents.com/blog/best-input-data-format-for-llms?cmid=34b34591-5868-43e7-8016-bbb6fbea4bf1" target="_blank" rel="noopener noreferrer" class="">Which Table Format Do LLMs Understand Best?</a> - improvingagents.com의 11가지 포맷 비교 연구</li>
<li class=""><a href="https://aws.amazon.com/bedrock/claude/" target="_blank" rel="noopener noreferrer" class="">AWS Bedrock Claude Models</a> - 실험에 사용된 Claude Sonnet 4.5 모델 정보</li>
<li class=""><a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html" target="_blank" rel="noopener noreferrer" class="">Pandas DataFrame Formatting</a> - TSV/CSV 변환 공식 문서</li>
<li class=""><a href="https://www.markdownguide.org/extended-syntax/#tables" target="_blank" rel="noopener noreferrer" class="">Markdown Tables Specification</a> - Markdown 테이블 형식 가이드</li>
</ul>]]></content:encoded>
            <category>Evaluation</category>
        </item>
    </channel>
</rss>