Performance Benchmarks#
SousChef includes comprehensive performance benchmarks to ensure fast and efficient cookbook conversions. This guide explains the benchmarking framework, current performance characteristics, and optimization strategies.
Benchmark Suite#
SousChef uses pytest-benchmark to measure performance across key operations. Benchmarks run automatically in CI/CD and can be executed locally for performance validation.
Running Benchmarks#
```bash
# Run all benchmarks with detailed output
poetry run pytest tests/ -v --benchmark-only

# Run benchmarks with comparison
poetry run pytest tests/ --benchmark-only --benchmark-compare

# Save benchmark results
poetry run pytest tests/ --benchmark-only --benchmark-save=baseline

# Compare against saved baseline
poetry run pytest tests/ --benchmark-only --benchmark-compare=baseline

# Generate histogram visualizations
poetry run pytest tests/ --benchmark-only --benchmark-histogram
```
Current Performance Metrics#
All benchmarks measured on: Linux, Python 3.13, single-threaded execution.
Parsing Operations#
| Operation | Mean Time | Min Time | Max Time | Throughput |
|---|---|---|---|---|
| Recipe parsing | 174.5 µs | 126.9 µs | 302.8 µs | 5,732 ops/sec |
| Attribute parsing | 150.8 µs | 107.7 µs | 384.2 µs | 6,633 ops/sec |
| Template parsing | 149.9 µs | 107.2 µs | 719.8 µs | 6,672 ops/sec |
| Metadata parsing | 182.3 µs | 122.2 µs | 2.0 ms | 5,486 ops/sec |
| Custom resource | 144.3 µs | 96.0 µs | 565.7 µs | 7,442 ops/sec |
Key Insights:

- Consistent sub-200 µs performance for standard Chef artifacts
- Over 5,000 operations per second for all parsing tasks
- Metadata parsing has occasional outliers (max ~2 ms) due to file I/O
Conversion Operations#
| Operation | Mean Time | Min Time | Max Time | Throughput |
|---|---|---|---|---|
| Basic conversion | 906 ns | 791 ns | 42.5 µs | 1,103,644 ops/sec |
| Resource conversion | 3.2 µs | 1.9 µs | 1.1 ms | 312,166 ops/sec |
| InSpec conversion | 170.5 µs | 119.0 µs | 4.7 ms | 5,864 ops/sec |
| Playbook generation | 1.4 ms | 721.1 µs | 15.8 ms | 717 ops/sec |
Key Insights:

- Nanosecond-level performance for basic conversions
- Resource conversion is highly parallelizable
- Playbook generation is the most expensive operation (millisecond range)
Structure Analysis#
| Operation | Mean Time | Min Time | Max Time | Throughput |
|---|---|---|---|---|
| Cookbook structure | 1.5 ms | 926.0 µs | 20.7 ms | 672 ops/sec |
| Large cookbook | 1.4 ms | 1.2 ms | 2.9 ms | 708 ops/sec |
| InSpec profiles | 632.6 µs | 443.0 µs | 3.3 ms | 1,581 ops/sec |
Key Insights:

- Consistent low-millisecond performance for structure analysis
- Large cookbooks (100+ resources) maintain similar performance
- File I/O dominates structure analysis time
Performance Characteristics#
Scalability#
SousChef scales roughly linearly with cookbook size. Up to around 100 resources, fixed per-cookbook overhead dominates, so measured times stay nearly flat:

```text
Small cookbook (10 resources):        ~1.4 ms
Medium cookbook (50 resources):       ~1.5 ms
Large cookbook (100 resources):       ~1.4 ms
Very large cookbook (500 resources):  ~7-10 ms (estimated)
```
Why it scales well:

- Parsing is single-pass with minimal backtracking
- No recursive tree transformations
- Lazy evaluation of optional fields
- Efficient path normalization caching
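The last point can be sketched with a small memoized helper. Note that `normalize_path` below is an illustrative stand-in, not SousChef's actual internal API:

```python
import os.path
from functools import lru_cache


@lru_cache(maxsize=4096)
def normalize_path(raw: str) -> str:
    """Normalize a cookbook-relative path once; repeat lookups hit the cache."""
    return os.path.normpath(raw)


# The same recipe paths recur constantly during parsing, so after the
# first call for a given path every lookup is a cache hit.
normalize_path("recipes//default/./setup.rb")  # computed once, then cached
```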
Memory Usage#
Typical memory footprint:
- Small cookbook (10 recipes): ~5-10 MB
- Medium cookbook (50 recipes): ~15-30 MB
- Large cookbook (100+ recipes): ~50-100 MB
Memory usage grows linearly with:

- Number of resources parsed
- Template file sizes
- Attribute complexity
Memory optimization tips:

- Process cookbooks in batches for large migrations
- Use streaming for template conversion
- Clear cached results between cookbook conversions
CPU Utilization#
SousChef is primarily CPU-bound during:

1. Ruby parsing (30-40% of time)
2. YAML generation (20-30% of time)
3. Validation (15-25% of time)
4. AI API calls (varies, network-bound)
I/O-bound operations:

- Reading Chef files from disk
- Writing Ansible playbooks
- Network requests to AI providers
Optimization Strategies#
For Large Cookbooks#
```python
# Process cookbooks in parallel
from concurrent.futures import ProcessPoolExecutor

from souschef.assessment import assess_cookbook

cookbooks = ["cookbook1", "cookbook2", "cookbook3"]

with ProcessPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(assess_cookbook, cookbooks))
```
Benefits:

- 3-4x speedup on multi-core systems
- Reduced wall-clock time for batch conversions
- Each cookbook is isolated in a separate process
For Incremental Conversions#
```python
# Only convert changed recipes
from souschef.converters.playbook import convert_recipe_to_playbook

changed_recipes = get_changed_files()  # Your VCS diff logic

for recipe in changed_recipes:
    playbook = convert_recipe_to_playbook(recipe)
    save_playbook(playbook)
```
Benefits:

- Only process what changed
- Ideal for CI/CD pipelines
- Maintains conversion consistency
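The `get_changed_files()` placeholder above is your own VCS logic. One possible sketch, backed by `git diff` (this helper is not part of SousChef):

```python
import subprocess


def changed_recipes(diff_output: str) -> list[str]:
    """Keep only Ruby recipe files from `git diff --name-only` output."""
    return [p for p in diff_output.splitlines() if p.endswith(".rb")]


def get_changed_files(base_ref: str = "main") -> list[str]:
    """Recipes changed relative to base_ref (assumes a git checkout)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base_ref],
        capture_output=True, text=True, check=True,
    ).stdout
    return changed_recipes(out)
```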
For Template-Heavy Cookbooks#
```python
# Pre-compile templates in batch
from souschef.parsers.template import parse_template

templates = find_all_templates(cookbook_path)
parsed_templates = {t: parse_template(t) for t in templates}

# Reuse parsed templates during conversion
for recipe in recipes:
    playbook = convert_with_templates(recipe, parsed_templates)
```
Benefits:

- Avoids re-parsing templates
- Better cache locality
- 30-50% faster for template-heavy cookbooks
For AI-Powered Conversions#
```python
# Batch AI requests to reduce latency
import asyncio

from souschef.converters.playbook import generate_playbook_with_ai

# Collect all conversion requests
conversion_requests = prepare_batch(recipes)

# Send in batches of 10
for batch in chunks(conversion_requests, size=10):
    results = asyncio.run(batch_convert_with_ai(batch))
```
Benefits:

- Amortizes network overhead
- Respects API rate limits
- Better error handling
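The `chunks()` and `batch_convert_with_ai()` helpers used above are not shown in the snippet. A minimal version might look like this, with the AI call itself mocked out (both names here are illustrative stand-ins):

```python
import asyncio
from itertools import islice


def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


async def batch_convert_with_ai(batch):
    """Hypothetical stand-in: run one conversion request per task concurrently."""
    async def convert_one(request):
        await asyncio.sleep(0)  # placeholder for the real network call
        return {"request": request, "playbook": "---"}
    return await asyncio.gather(*(convert_one(r) for r in batch))


for batch in chunks(["r1", "r2", "r3", "r4", "r5"], size=3):
    results = asyncio.run(batch_convert_with_ai(batch))
```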
Performance Tuning#
Environment Variables#
```bash
# Disable AI-assisted conversion for speed
export SOUSCHEF_DISABLE_AI=1

# Increase parser timeout for complex files
export SOUSCHEF_PARSER_TIMEOUT=60

# Enable performance profiling
export SOUSCHEF_PROFILE=1
```
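A conversion script might consume these variables along these lines. The parsing below, including the 30-second fallback timeout, is an illustrative assumption rather than SousChef's documented behavior:

```python
import os


def load_perf_settings(env=None):
    """Read the performance toggles; the defaults here are assumptions."""
    env = os.environ if env is None else env
    return {
        "disable_ai": env.get("SOUSCHEF_DISABLE_AI") == "1",
        "parser_timeout": int(env.get("SOUSCHEF_PARSER_TIMEOUT", "30")),
        "profile": env.get("SOUSCHEF_PROFILE") == "1",
    }
```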
Configuration Options#
```python
# In your conversion script
config = {
    "max_workers": 4,          # Parallel workers
    "cache_templates": True,   # Cache parsed templates
    "validate_output": False,  # Skip validation for speed
    "ai_batch_size": 10,       # AI request batch size
}

convert_cookbook(path, **config)
```
Profiling and Diagnostics#
Built-in Profiler#
```bash
# Profile a conversion
poetry run souschef convert --profile cookbook/ output/

# View profiling results (saved to souschef_profile.prof)
poetry run python -m pstats souschef_profile.prof
```
Custom Profiling#
```python
import cProfile
import pstats

from souschef.assessment import assess_cookbook

# Profile cookbook assessment
profiler = cProfile.Profile()
profiler.enable()
result = assess_cookbook("path/to/cookbook")
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats("cumulative")
stats.print_stats(20)  # Top 20 functions
```
Memory Profiling#
```bash
# Install memory profiler
poetry add --dev memory-profiler

# Profile memory usage
poetry run python -m memory_profiler souschef/cli.py convert cookbook/
```
Performance Regression Testing#
Benchmarks run automatically in CI/CD to detect performance regressions:
```yaml
# .github/workflows/ci.yml
- name: Run benchmarks
  run: |
    poetry run pytest --benchmark-only \
      --benchmark-compare=main \
      --benchmark-compare-fail=mean:10%
```
Regression thresholds:

- Fail if mean time increases by >10%
- Warn if max time increases by >25%
- Alert if throughput drops by >15%
Troubleshooting Slow Conversions#
Symptom: Parsing taking >1 second per file#
Possible causes:

- Very complex Chef Ruby code
- Large template files (>100 KB)
- Deep attribute nesting (>10 levels)

Solutions:

- Simplify Chef code before conversion
- Split large templates
- Flatten attribute structures
Symptom: High memory usage (>500MB)#
Possible causes:

- Processing too many cookbooks simultaneously
- Not clearing caches between conversions
- Very large attribute files
Solutions:

```python
import gc

for cookbook in large_cookbook_list:
    result = convert_cookbook(cookbook)
    process_result(result)
    # Manual garbage collection between cookbooks
    gc.collect()
```
Symptom: Slow AI-powered conversions#
Possible causes:

- Network latency to the AI provider
- API rate limits
- Large prompts (>4,000 tokens)
Solutions:

- Use regional AI endpoints
- Implement request caching
- Batch API calls
- Reduce prompt size
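Request caching can be as simple as memoizing responses by a hash of the prompt. This sketch is generic and independent of any particular AI provider or SousChef API:

```python
import hashlib


class PromptCache:
    """Memoize AI responses keyed by a SHA-256 of the prompt (in-memory sketch)."""

    def __init__(self):
        self._store = {}

    def get_or_call(self, prompt, call):
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = call(prompt)  # only hit the API on a cache miss
        return self._store[key]
```

Identical recipes that recur across cookbooks then cost one API call instead of many.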
Best Practices#
- Measure first: Profile before optimizing
- Batch operations: Process cookbooks in logical groups
- Cache aggressively: Reuse parsed templates and attributes
- Parallel processing: Use multiple workers for independent cookbooks
- Monitor memory: Clear caches for long-running processes
- Set timeouts: Prevent hanging on complex files
- Use benchmarks: Compare before/after optimization
Future Performance Improvements#
Planned optimizations:
- Incremental parsing: Only re-parse changed sections
- Native parsing: Cython extensions for hot paths
- Streaming output: Write playbooks as they're generated
- Distributed processing: Celery task queue for large migrations
- Smart caching: Persistent cache across runs
- GPU acceleration: For AI-powered conversions
Contributing Benchmarks#
When adding new features, include benchmarks:
```python
def test_benchmark_my_new_feature(benchmark):
    """Benchmark the new feature."""
    result = benchmark(my_new_feature, input_data)

    # Assert performance requirements
    assert benchmark.stats["mean"] < 0.001  # <1 ms mean
    assert benchmark.stats["max"] < 0.010  # <10 ms max
```
See the integration test examples in the repository: https://github.com/kpeacocke/souschef/blob/main/tests/integration/test_integration.py
Resources#
- pytest-benchmark docs: https://pytest-benchmark.readthedocs.io/
- Python profiling: https://docs.python.org/3/library/profile.html
- Performance best practices: https://wiki.python.org/moin/PythonSpeed
For questions about performance, open an issue: https://github.com/kpeacocke/souschef/issues