Token Counting - flight505/ContextCraft GitHub Wiki

Token Counting

Token counting is a fundamental feature in ContextCraft that helps you optimize your AI interactions by monitoring and managing token usage. This page explains what tokens are, how they're counted, and how to use token metrics effectively.

What Are Tokens?

Tokens are the basic units that AI models like GPT-4 process. A token can be:

  • A word
  • Part of a word
  • A character
  • A punctuation mark
  • A whitespace character

For English text, a token is roughly equivalent to 4 characters or 3/4 of a word on average. Code tends to tokenize differently than natural language, with symbols and operators often counting as individual tokens.
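The 4-characters-per-token rule of thumb above can be turned into a quick estimator. This is an approximation only — real model tokenizers depend on a learned vocabulary, and code usually produces more tokens per character than prose — but it is useful for ballpark figures:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb.

    Approximation only: actual tokenizers (e.g. BPE-based ones) depend on
    the model's vocabulary, and code tokenizes more densely than English.
    """
    return max(1, round(len(text) / 4))

estimate_tokens("Token counting is a fundamental feature.")  # 10 (estimated)
```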

How ContextCraft Counts Tokens

ContextCraft provides real-time token counting for:

  1. Individual Files: See token counts for each file in your project
  2. Selected Files: Track token usage for your current selection
  3. Processed Output: Monitor token counts after compression/comment removal
  4. Total Context Usage: View the total tokens that will be sent to the AI

Token Counting Algorithm

ContextCraft uses advanced tokenization algorithms similar to those used by AI models to provide accurate estimates of token usage:

  1. The tokenizer analyzes each character sequence in your code
  2. It applies language-specific rules to identify token boundaries
  3. It maintains a running count of identified tokens
  4. It displays this information in real-time as you select files
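The steps above can be sketched as a toy tokenizer. This is not ContextCraft's actual implementation (real model tokenizers use learned byte-pair-encoding merges); the regex rules here are simple stand-ins for the language-specific boundary rules the steps describe:

```python
import re

# Toy boundary rules: words, numbers, whitespace runs, single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]+|\d+|\s+|[^\w\s]")

def count_tokens(source: str) -> int:
    count = 0
    for _ in TOKEN_RE.finditer(source):  # steps 1-2: scan and apply boundary rules
        count += 1                       # step 3: maintain a running count
    return count

count_tokens("total = price * 1.2")  # 11 tokens under these toy rules
```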

Token Counter Interface

The token counter appears in multiple locations within ContextCraft:

  1. Status Bar: Shows the total token count of your current selection
  2. File Tree: Displays token counts next to each file (when enabled)
  3. Control Container: Shows detailed token metrics for selected files
  4. Output Preview: Provides token counts for processed output

Understanding Token Metrics

ContextCraft provides several token metrics to help you manage context usage:

Raw Token Count

This is the unprocessed token count of your selected files before any optimizations.

Processed Token Count

This is the token count after applying optimizations like:

  • Code compression
  • Comment removal
  • Whitespace reduction
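As a rough illustration of how comment removal and whitespace reduction shrink the processed count, here is a deliberately naive sketch for Python-style source. It is not ContextCraft's real pipeline (it would mishandle `#` inside string literals, for example):

```python
import re

def processed_source(code: str) -> str:
    """Strip # comments, collapse runs of spaces, and drop blank lines."""
    no_comments = re.sub(r"#[^\n]*", "", code)       # comment removal
    collapsed = re.sub(r"[ \t]+", " ", no_comments)  # whitespace reduction
    return "\n".join(
        line.rstrip() for line in collapsed.splitlines() if line.strip()
    )

processed_source("x = 1  # set x\n\n# a comment\ny = 2\n")  # "x = 1\ny = 2"
```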

Token Savings

This metric shows how many tokens you're saving through optimization:

  • Displayed as a number and percentage
  • Updates in real-time as you adjust settings
  • Helps quantify the effectiveness of your optimization strategies
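The savings metric itself is simple arithmetic; a sketch of the number-plus-percentage form described above:

```python
def token_savings(raw: int, processed: int) -> tuple[int, float]:
    """Return (tokens saved, percentage of the raw count saved)."""
    saved = raw - processed
    pct = 100.0 * saved / raw if raw else 0.0
    return saved, pct

token_savings(raw=12_000, processed=9_000)  # (3000, 25.0)
```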

Model Context Limits

ContextCraft displays the context limit for your selected AI model:

  • Shows how much of the available context you're using
  • Provides visual indicators when approaching limits
  • Helps you stay within the model's processing capabilities
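Under the hood this is a simple ratio. The window size below is a hypothetical figure for the example; use whatever context limit ContextCraft reports for your selected model:

```python
def context_usage_pct(tokens: int, limit: int) -> float:
    """Percentage of the model's context window consumed by the selection."""
    return 100.0 * tokens / limit

context_usage_pct(96_000, 128_000)  # 75.0 (% of a hypothetical 128k window)
```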

Managing Token Usage

Token Budget Planning

  1. Set a Target Budget:

    • Determine how many tokens you want to allocate to code context
    • Leave room for the AI's response and your prompts
    • ContextCraft helps visualize your budget allocation
  2. Prioritize Files:

    • Select the most important files first
    • Use token counts to guide your selection process
    • Balance coverage with token efficiency
  3. Apply Optimizations Strategically:

    • Use compression for large files with repetitive patterns
    • Remove comments from heavily-documented files
    • Preserve critical files in their original form
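Steps 1 and 2 amount to a greedy selection under a budget. A sketch, assuming files are already ranked by importance (the paths and token counts below are made up for the example):

```python
def select_files(ranked: list[tuple[str, int]], budget: int) -> list[str]:
    """Take the highest-priority files that fit within the token budget.

    `ranked` is a list of (path, token_count) pairs, most important first.
    """
    selected: list[str] = []
    used = 0
    for path, tokens in ranked:
        if used + tokens <= budget:  # skip files that would blow the budget
            selected.append(path)
            used += tokens
    return selected

ranked = [("src/core.py", 4000), ("src/api.py", 3500), ("src/utils.py", 2500)]
select_files(ranked, budget=8000)  # ['src/core.py', 'src/api.py']
```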

Visual Token Indicators

ContextCraft provides visual cues to help manage tokens:

  1. Color-Coded Status:

    • Green: Well within token limits
    • Yellow: Approaching token limits
    • Red: Exceeding token limits
  2. Progress Bars:

    • Shows context usage relative to model limits
    • Updates dynamically as you adjust your selection
  3. Warning Notifications:

    • Alerts when you exceed recommended token limits
    • Provides suggestions for optimization
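The color-coded status maps naturally onto usage thresholds. The 75% and 100% cutoffs below are illustrative assumptions, not ContextCraft's documented values:

```python
def token_status(used: int, limit: int) -> str:
    """Map context usage to a color-coded status (thresholds assumed)."""
    ratio = used / limit
    if ratio > 1.0:
        return "red"     # exceeding token limits
    if ratio > 0.75:
        return "yellow"  # approaching token limits
    return "green"       # well within token limits
```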

Token Efficiency Best Practices

  1. Focus on Relevance:

    • Include only files directly relevant to your query
    • Exclude test files, generated code, and dependencies when possible
  2. Leverage Optimizations:

    • Use code compression for large files
    • Remove comments when they're not essential
    • Apply whitespace reduction for minor additional savings
  3. Balance Context and Detail:

    • For architectural questions: More files with compression
    • For implementation details: Fewer files without compression
  4. Monitor Token Usage Patterns:

    • Track which files consistently use the most tokens
    • Look for opportunities to refactor token-heavy files
    • Consider breaking large files into smaller modules
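Step 4 (monitoring usage patterns) can start as simply as ranking files by token count to spot refactoring candidates; the file names below are hypothetical:

```python
def heaviest_files(counts: dict[str, int], top: int = 3) -> list[tuple[str, int]]:
    """Rank files by token count, heaviest first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:top]

heaviest_files({"a.py": 500, "b.py": 2000, "c.py": 1200, "d.py": 100}, top=2)
# [('b.py', 2000), ('c.py', 1200)]
```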

Language-Specific Token Considerations

Different programming languages have different tokenization characteristics:

Language                Token Efficiency   Notes
JavaScript/TypeScript   Medium             Symbols and operators count as separate tokens
Python                  High               Whitespace-significant, relatively token-efficient
Java                    Low                Verbose syntax uses more tokens
HTML/CSS                Low                Tags and attributes consume many tokens
JSON                    Medium             Structure overhead but predictable

Advanced Token Management

Custom Tokenization Rules

In some ContextCraft versions, you can define custom tokenization rules:

  1. Token Weight Adjustments:

    • Prioritize certain file types over others
    • Weight tokens by file importance
  2. Token Budgeting:

    • Allocate specific token budgets to different parts of your codebase
    • Get warnings when individual sections exceed their budget
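A per-section budget check of the kind described above might look like this sketch (the section names and budget figures are hypothetical):

```python
def over_budget(section_tokens: dict[str, int],
                budgets: dict[str, int]) -> list[str]:
    """Return sections whose token usage exceeds their allocated budget."""
    return [
        name for name, used in section_tokens.items()
        if used > budgets.get(name, float("inf"))  # no budget = no warning
    ]

over_budget({"frontend": 9000, "backend": 4000},
            {"frontend": 8000, "backend": 6000})  # ['frontend']
```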

Token Analytics

ContextCraft may provide token usage analytics:

  1. Historical Usage:

    • Track token usage over time
    • Identify optimization opportunities
  2. Project-Wide Analysis:

    • Get insights into token distribution across your project
    • Find token-heavy files and patterns
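Project-wide distribution can be approximated by aggregating per-file counts, for example by extension; a sketch assuming you already have per-file token counts:

```python
import os
from collections import defaultdict

def tokens_by_extension(counts: dict[str, int]) -> dict[str, int]:
    """Aggregate per-file token counts by file extension."""
    totals: dict[str, int] = defaultdict(int)
    for path, n in counts.items():
        ext = os.path.splitext(path)[1] or "(none)"
        totals[ext] += n
    return dict(totals)

tokens_by_extension({"a.py": 100, "b.py": 50, "c.md": 20})
# {'.py': 150, '.md': 20}
```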
