Token Counting - flight505/ContextCraft GitHub Wiki
Token counting is a fundamental feature in ContextCraft that helps you optimize your AI interactions by monitoring and managing token usage. This page explains what tokens are, how they're counted, and how to use token metrics effectively.
What Are Tokens?
Tokens are the basic units that AI models like GPT-4 process. A token can be:
- A word
- Part of a word
- A character
- A punctuation mark
- A whitespace
For English text, one token corresponds to roughly 4 characters, or about three-quarters of a word, on average. Code tends to tokenize differently from natural language, with symbols and operators often counting as individual tokens.
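The rule of thumb above can be turned into a quick estimator. This is an illustrative heuristic only, not ContextCraft's actual tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Approximate token count using the ~4-characters-per-token heuristic."""
    return max(1, round(len(text) / 4)) if text else 0

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # 11
```

Heuristics like this are fine for rough budgeting, but real tokenizers can diverge significantly on code, non-English text, and unusual symbols.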
How ContextCraft Counts Tokens
ContextCraft provides real-time token counting for:
- Individual Files: See token counts for each file in your project
- Selected Files: Track token usage for your current selection
- Processed Output: Monitor token counts after compression/comment removal
- Total Context Usage: View the total tokens that will be sent to the AI
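Conceptually, the selection total is just the sum of the per-file counts. A minimal sketch, with hypothetical file names and counts:

```python
def total_tokens(file_token_counts: dict[str, int], selected: list[str]) -> int:
    """Sum the token counts of the currently selected files."""
    return sum(file_token_counts[name] for name in selected)

# Invented example data -- not real ContextCraft output.
counts = {"src/app.ts": 1200, "src/utils.ts": 450, "README.md": 300}
print(total_tokens(counts, ["src/app.ts", "src/utils.ts"]))  # 1650
```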
Token Counting Algorithm
ContextCraft uses advanced tokenization algorithms similar to those used by AI models to provide accurate estimates of token usage:
- The tokenizer analyzes each character sequence in your code
- It applies language-specific rules to identify token boundaries
- It maintains a running count of identified tokens
- It displays this information in real-time as you select files
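The steps above can be sketched with a toy tokenizer: scan character sequences, apply simple boundary rules (words vs. individual symbols), and keep a running count. Real model tokenizers (e.g. BPE-based ones) are more sophisticated than this:

```python
import re

# Words become one token each; every non-word, non-space symbol is its own token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def count_tokens(code: str) -> int:
    return len(TOKEN_RE.findall(code))

print(count_tokens("x = a + b;"))  # 6 tokens: x, =, a, +, b, ;
```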
Token Counter Interface
The token counter appears in multiple locations within ContextCraft:
- Status Bar: Shows the total token count of your current selection
- File Tree: Displays token counts next to each file (when enabled)
- Control Container: Shows detailed token metrics for selected files
- Output Preview: Provides token counts for processed output
Understanding Token Metrics
ContextCraft provides several token metrics to help you manage context usage:
Raw Token Count
This is the unprocessed token count of your selected files before any optimizations.
Processed Token Count
This is the token count after applying optimizations like:
- Code compression
- Comment removal
- Whitespace reduction
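To see why these optimizations shrink the processed count, here is a minimal sketch: it strips only `#` line comments and trailing whitespace, and uses the rough 4-characters-per-token estimate in place of the real tokenizer:

```python
import re

def strip_comments(code: str) -> str:
    """Remove '#' line comments and trailing whitespace (toy example only)."""
    return "\n".join(re.sub(r"#.*", "", line).rstrip() for line in code.splitlines())

def estimate_tokens(text: str) -> int:
    return round(len(text) / 4)

source = "total = 0  # running sum\nfor x in xs:  # iterate\n    total += x"
print(estimate_tokens(source), "->", estimate_tokens(strip_comments(source)))  # 16 -> 9
```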
Token Savings
This metric shows how many tokens you're saving through optimization:
- Displayed as a number and percentage
- Updates in real-time as you adjust settings
- Helps quantify the effectiveness of your optimization strategies
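The savings metric is simple arithmetic: the absolute number of tokens removed, and that number as a percentage of the raw count. A hypothetical helper:

```python
def token_savings(raw: int, processed: int) -> tuple[int, float]:
    """Return (tokens saved, savings as a percentage of the raw count)."""
    saved = raw - processed
    pct = 100.0 * saved / raw if raw else 0.0
    return saved, pct

saved, pct = token_savings(8000, 5200)
print(f"{saved} tokens saved ({pct:.1f}%)")  # 2800 tokens saved (35.0%)
```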
Model Context Limits
ContextCraft displays the context limit for your selected AI model:
- Shows how much of the available context you're using
- Provides visual indicators when approaching limits
- Helps you stay within the model's processing capabilities
Managing Token Usage
Token Budget Planning
- Set a Target Budget:
  - Determine how many tokens you want to allocate to code context
  - Leave room for the AI's response and your prompts
  - ContextCraft helps visualize your budget allocation
- Prioritize Files:
  - Select the most important files first
  - Use token counts to guide your selection process
  - Balance coverage with token efficiency
- Apply Optimizations Strategically:
  - Use compression for large files with repetitive patterns
  - Remove comments from heavily documented files
  - Preserve critical files in their original form
</gr-replace>
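The budgeting and prioritization steps above can be sketched as a reserve-then-fill calculation. The limit and reserve figures are arbitrary examples, not ContextCraft defaults:

```python
MODEL_LIMIT = 128_000      # assumed context window
RESPONSE_RESERVE = 4_096   # room for the AI's answer
PROMPT_RESERVE = 1_000     # room for your own question

code_budget = MODEL_LIMIT - RESPONSE_RESERVE - PROMPT_RESERVE  # 122904

def select_files(files: list[tuple[str, int]], budget: int) -> list[str]:
    """Greedily select files (ordered by importance) until the budget runs out."""
    chosen, used = [], 0
    for name, tokens in files:
        if used + tokens <= budget:
            chosen.append(name)
            used += tokens
    return chosen

# Hypothetical files, most important first.
print(select_files([("core.py", 50), ("api.py", 60), ("cli.py", 30)], 100))
```

Ordering the input by importance is what makes the greedy pass reasonable: a lower-priority file is only included if it still fits after the higher-priority ones.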
Visual Token Indicators
ContextCraft provides visual cues to help manage tokens:
- Color-Coded Status:
  - Green: well within token limits
  - Yellow: approaching token limits
  - Red: exceeding token limits
- Progress Bars:
  - Show context usage relative to model limits
  - Update dynamically as you adjust your selection
- Warning Notifications:
  - Alert you when you exceed recommended token limits
  - Provide suggestions for optimization
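The color-coded status boils down to comparing usage against thresholds. A sketch assuming example thresholds of 75% (yellow) and 100% (red), which may not match ContextCraft's actual values:

```python
def context_status(used: int, limit: int) -> str:
    """Map context usage to a traffic-light status (example thresholds)."""
    ratio = used / limit
    if ratio > 1.0:
        return "red"      # exceeding the limit
    if ratio >= 0.75:
        return "yellow"   # approaching the limit
    return "green"        # well within the limit

print(context_status(50_000, 128_000))   # green
print(context_status(120_000, 128_000))  # yellow
print(context_status(130_000, 128_000))  # red
```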
Token Efficiency Best Practices
- Focus on Relevance:
  - Include only files directly relevant to your query
  - Exclude test files, generated code, and dependencies when possible
- Leverage Optimizations:
  - Use code compression for large files
  - Remove comments when they're not essential
  - Apply whitespace reduction for minor additional savings
- Balance Context and Detail:
  - For architectural questions: more files with compression
  - For implementation details: fewer files without compression
- Monitor Token Usage Patterns:
  - Track which files consistently use the most tokens
  - Look for opportunities to refactor token-heavy files
  - Consider breaking large files into smaller modules
Language-Specific Token Considerations
Different programming languages have different tokenization characteristics:
| Language | Token Efficiency | Notes |
|---|---|---|
| JavaScript/TypeScript | Medium | Symbols and operators count as separate tokens |
| Python | High | Whitespace-significant, relatively token-efficient |
| Java | Low | Verbose syntax uses more tokens |
| HTML/CSS | Low | Tags and attributes consume many tokens |
| JSON | Medium | Structure overhead but predictable |
Advanced Token Management
Custom Tokenization Rules
In some ContextCraft versions, you can define custom tokenization rules:
- Token Weight Adjustments:
  - Prioritize certain file types over others
  - Weight tokens by file importance
- Token Budgeting:
  - Allocate specific token budgets to different parts of your codebase
  - Get warnings when individual sections exceed their budget
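Per-section budgeting amounts to comparing each section's usage against its allocation. A hypothetical check, with invented section names and numbers:

```python
# Invented per-section budgets and usage, for illustration only.
budgets = {"src/core": 40_000, "src/ui": 20_000, "docs": 5_000}
usage = {"src/core": 38_500, "src/ui": 23_000, "docs": 1_200}

over_budget = [section for section, budget in budgets.items()
               if usage.get(section, 0) > budget]
print(over_budget)  # ['src/ui']
```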
Token Analytics
ContextCraft may provide token usage analytics:
- Historical Usage:
  - Track token usage over time
  - Identify optimization opportunities
- Project-Wide Analysis:
  - Get insights into token distribution across your project
  - Find token-heavy files and patterns