Performance Benchmarks#
SousChef includes comprehensive performance benchmarks to ensure fast and efficient cookbook conversions. This guide explains the benchmarking framework, current performance characteristics, and optimization strategies.
Benchmark Suite#
SousChef uses pytest-benchmark to measure performance across key operations. Benchmarks run automatically in CI/CD and can be executed locally for performance validation.
Running Benchmarks#
```bash
# Run all benchmarks with detailed output
poetry run pytest tests/ -v --benchmark-only

# Run benchmarks with comparison
poetry run pytest tests/ --benchmark-only --benchmark-compare

# Save benchmark results
poetry run pytest tests/ --benchmark-only --benchmark-save=baseline

# Compare against saved baseline
poetry run pytest tests/ --benchmark-only --benchmark-compare=baseline

# Generate histogram visualizations
poetry run pytest tests/ --benchmark-only --benchmark-histogram
```
Current Performance Metrics#
All benchmarks measured on: Linux, Python 3.13, single-threaded execution.
Parsing Operations#
| Operation | Mean Time | Min Time | Max Time | Throughput |
|---|---|---|---|---|
| Recipe parsing | 174.5 µs | 126.9 µs | 302.8 µs | 5,732 ops/sec |
| Attribute parsing | 150.8 µs | 107.7 µs | 384.2 µs | 6,633 ops/sec |
| Template parsing | 149.9 µs | 107.2 µs | 719.8 µs | 6,672 ops/sec |
| Metadata parsing | 182.3 µs | 122.2 µs | 2.0 ms | 5,486 ops/sec |
| Custom resource | 144.3 µs | 96.0 µs | 565.7 µs | 7,442 ops/sec |
Key Insights:

- Consistent sub-200 µs performance for standard Chef artifacts
- Over 5,000 operations per second for all parsing tasks
- Metadata parsing has occasional outliers (max ~2 ms) due to file I/O
Conversion Operations#
| Operation | Mean Time | Min Time | Max Time | Throughput |
|---|---|---|---|---|
| Basic conversion | 906 ns | 791 ns | 42.5 µs | 1,103,644 ops/sec |
| Resource conversion | 3.2 µs | 1.9 µs | 1.1 ms | 312,166 ops/sec |
| InSpec conversion | 170.5 µs | 119.0 µs | 4.7 ms | 5,864 ops/sec |
| Playbook generation | 1.4 ms | 721.1 µs | 15.8 ms | 717 ops/sec |
Key Insights:

- Nanosecond-level performance for basic conversions
- Resource conversion is highly parallelizable
- Playbook generation is the most expensive operation (millisecond range)
Structure Analysis#
| Operation | Mean Time | Min Time | Max Time | Throughput |
|---|---|---|---|---|
| Cookbook structure | 1.5 ms | 926.0 µs | 20.7 ms | 672 ops/sec |
| Large cookbook | 1.4 ms | 1.2 ms | 2.9 ms | 708 ops/sec |
| InSpec profiles | 632.6 µs | 443.0 µs | 3.3 ms | 1,581 ops/sec |
Key Insights:

- Consistent low-millisecond performance for structure analysis
- Large cookbooks (100+ resources) maintain similar performance
- File I/O dominates structure analysis time
Performance Characteristics#
Scalability#
SousChef scales roughly linearly with cookbook size. Up to around 100 resources, fixed per-cookbook overhead dominates, so measured times stay nearly flat:

```text
Small cookbook (10 resources):        ~1.4 ms
Medium cookbook (50 resources):       ~1.5 ms
Large cookbook (100 resources):       ~1.4 ms
Very large cookbook (500 resources):  ~7-10 ms (estimated)
```
Why it scales well:

- Parsing is single-pass with minimal backtracking
- No recursive tree transformations
- Lazy evaluation of optional fields
- Efficient path normalization caching
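The last point can be sketched with a small memoized helper. Note that `normalize_path` below is an illustrative stand-in, not SousChef's actual internal API:

```python
import os.path
from functools import lru_cache


@lru_cache(maxsize=4096)
def normalize_path(raw: str) -> str:
    """Normalize a cookbook-relative path once; repeat lookups hit the cache."""
    return os.path.normpath(raw)


# The same recipe paths recur constantly during parsing, so after the
# first call for a given path every lookup is a cache hit.
normalize_path("recipes//default/./setup.rb")  # computed once, then cached
```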
Memory Usage#
Typical memory footprint:
- Small cookbook (10 recipes): ~5-10 MB
- Medium cookbook (50 recipes): ~15-30 MB
- Large cookbook (100+ recipes): ~50-100 MB
Memory usage grows linearly with:

- Number of resources parsed
- Template file sizes
- Attribute complexity
Memory optimization tips:

- Process cookbooks in batches for large migrations
- Use streaming for template conversion
- Clear cached results between cookbook conversions
CPU Utilization#
SousChef is primarily CPU-bound during:

1. Ruby parsing (30-40% of time)
2. YAML generation (20-30% of time)
3. Validation (15-25% of time)
4. AI API calls (varies, network-bound)
I/O-bound operations:

- Reading Chef files from disk
- Writing Ansible playbooks
- Network requests to AI providers
Optimization Strategies#
For Large Cookbooks#
```python
# Process cookbooks in parallel
from concurrent.futures import ProcessPoolExecutor

from souschef.assessment import assess_cookbook

cookbooks = ["cookbook1", "cookbook2", "cookbook3"]

with ProcessPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(assess_cookbook, cookbooks))
```
Benefits:

- 3-4x speedup on multi-core systems
- Reduced wall-clock time for batch conversions
- Each cookbook is isolated in a separate process
For Incremental Conversions#
```python
# Only convert changed recipes
from souschef.converters.playbook import convert_recipe_to_playbook

changed_recipes = get_changed_files()  # Your VCS diff logic

for recipe in changed_recipes:
    playbook = convert_recipe_to_playbook(recipe)
    save_playbook(playbook)
```
Benefits:

- Only process what changed
- Ideal for CI/CD pipelines
- Maintains conversion consistency
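The `get_changed_files()` placeholder above is your own VCS logic. One possible sketch, backed by `git diff` (this helper is not part of SousChef):

```python
import subprocess


def changed_recipes(diff_output: str) -> list[str]:
    """Keep only Ruby recipe files from `git diff --name-only` output."""
    return [p for p in diff_output.splitlines() if p.endswith(".rb")]


def get_changed_files(base_ref: str = "main") -> list[str]:
    """Recipes changed relative to base_ref (assumes a git checkout)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base_ref],
        capture_output=True, text=True, check=True,
    ).stdout
    return changed_recipes(out)
```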
For Template-Heavy Cookbooks#
```python
# Pre-compile templates in batch
from souschef.parsers.template import parse_template

templates = find_all_templates(cookbook_path)
parsed_templates = {t: parse_template(t) for t in templates}

# Reuse parsed templates during conversion
for recipe in recipes:
    playbook = convert_with_templates(recipe, parsed_templates)
```
Benefits:

- Avoids re-parsing templates
- Better cache locality
- 30-50% faster for template-heavy cookbooks
For AI-Powered Conversions#
```python
# Batch AI requests to reduce latency
import asyncio

from souschef.converters.playbook import generate_playbook_with_ai

# Collect all conversion requests
conversion_requests = prepare_batch(recipes)

# Send in batches of 10
for batch in chunks(conversion_requests, size=10):
    results = asyncio.run(batch_convert_with_ai(batch))
```
Benefits:

- Amortizes network overhead
- Respects API rate limits
- Better error handling
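The `chunks()` and `batch_convert_with_ai()` helpers used above are not shown in the snippet. A minimal version might look like this, with the AI call itself mocked out (both names here are illustrative stand-ins):

```python
import asyncio
from itertools import islice


def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


async def batch_convert_with_ai(batch):
    """Hypothetical stand-in: run one conversion request per task concurrently."""
    async def convert_one(request):
        await asyncio.sleep(0)  # placeholder for the real network call
        return {"request": request, "playbook": "---"}
    return await asyncio.gather(*(convert_one(r) for r in batch))


for batch in chunks(["r1", "r2", "r3", "r4", "r5"], size=3):
    results = asyncio.run(batch_convert_with_ai(batch))
```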
Performance Tuning#
Environment Variables#
```bash
# Disable AI-assisted conversion for speed
export SOUSCHEF_DISABLE_AI=1

# Increase parser timeout for complex files
export SOUSCHEF_PARSER_TIMEOUT=60

# Enable performance profiling
export SOUSCHEF_PROFILE=1
```
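A conversion script might consume these variables along these lines. The parsing below, including the 30-second fallback timeout, is an illustrative assumption rather than SousChef's documented behavior:

```python
import os


def load_perf_settings(env=None):
    """Read the performance toggles; the defaults here are assumptions."""
    env = os.environ if env is None else env
    return {
        "disable_ai": env.get("SOUSCHEF_DISABLE_AI") == "1",
        "parser_timeout": int(env.get("SOUSCHEF_PARSER_TIMEOUT", "30")),
        "profile": env.get("SOUSCHEF_PROFILE") == "1",
    }
```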
Configuration Options#
```python
# In your conversion script
config = {
    "max_workers": 4,          # Parallel workers
    "cache_templates": True,   # Cache parsed templates
    "validate_output": False,  # Skip validation for speed
    "ai_batch_size": 10,       # AI request batch size
}

convert_cookbook(path, **config)
```
Profiling and Diagnostics#
Built-in Profiler#
```bash
# Profile a conversion
poetry run souschef convert --profile cookbook/ output/

# View profiling results (saved to souschef_profile.prof)
poetry run python -m pstats souschef_profile.prof
```
Custom Profiling#
```python
import cProfile
import pstats

from souschef.assessment import assess_cookbook

# Profile cookbook assessment
profiler = cProfile.Profile()
profiler.enable()
result = assess_cookbook("path/to/cookbook")
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats("cumulative")
stats.print_stats(20)  # Top 20 functions
```
Memory Profiling#
```bash
# Install memory profiler
poetry add --dev memory-profiler

# Profile memory usage
poetry run python -m memory_profiler souschef/cli.py convert cookbook/
```
Performance Regression Testing#
Benchmarks run automatically in CI/CD to detect performance regressions:
```yaml
# .github/workflows/ci.yml
- name: Run benchmarks
  run: |
    poetry run pytest --benchmark-only \
      --benchmark-compare=main \
      --benchmark-compare-fail=mean:10%
```
Regression thresholds:

- Fail if mean time increases by >10%
- Warn if max time increases by >25%
- Alert if throughput drops by >15%
Troubleshooting Slow Conversions#
Symptom: Parsing taking >1 second per file#
Possible causes:

- Very complex Chef Ruby code
- Large template files (>100 KB)
- Deep attribute nesting (>10 levels)

Solutions:

- Simplify Chef code before conversion
- Split large templates
- Flatten attribute structures
Symptom: High memory usage (>500MB)#
Possible causes:

- Processing too many cookbooks simultaneously
- Not clearing caches between conversions
- Very large attribute files
Solutions:

```python
import gc

for cookbook in large_cookbook_list:
    result = convert_cookbook(cookbook)
    process_result(result)
    # Manual garbage collection between cookbooks
    gc.collect()
```
Symptom: Slow AI-powered conversions#
Possible causes:

- Network latency to the AI provider
- API rate limits
- Large prompts (>4,000 tokens)
Solutions:

- Use regional AI endpoints
- Implement request caching
- Batch API calls
- Reduce prompt size
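Request caching can be as simple as memoizing responses by a hash of the prompt. This sketch is generic and independent of any particular AI provider or SousChef API:

```python
import hashlib


class PromptCache:
    """Memoize AI responses keyed by a SHA-256 of the prompt (in-memory sketch)."""

    def __init__(self):
        self._store = {}

    def get_or_call(self, prompt, call):
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = call(prompt)  # only hit the API on a cache miss
        return self._store[key]
```

Identical recipes that recur across cookbooks then cost one API call instead of many.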
Best Practices#
- Measure first: Profile before optimizing
- Batch operations: Process cookbooks in logical groups
- Cache aggressively: Reuse parsed templates and attributes
- Parallel processing: Use multiple workers for independent cookbooks
- Monitor memory: Clear caches for long-running processes
- Set timeouts: Prevent hanging on complex files
- Use benchmarks: Compare before/after optimization
Future Performance Improvements#
Planned optimizations:
- Incremental parsing: Only re-parse changed sections
- Native parsing: Cython extensions for hot paths
- Streaming output: Write playbooks as they're generated
- Distributed processing: Celery task queue for large migrations
- Smart caching: Persistent cache across runs
- GPU acceleration: For AI-powered conversions
Contributing Benchmarks#
When adding new features, include benchmarks:
```python
def test_benchmark_my_new_feature(benchmark):
    """Benchmark the new feature."""
    result = benchmark(my_new_feature, input_data)

    # Assert performance requirements
    assert benchmark.stats["mean"] < 0.001  # <1 ms mean
    assert benchmark.stats["max"] < 0.010  # <10 ms max
```
See the integration test examples in the repository: https://github.com/kpeacocke/souschef/blob/main/tests/integration/test_integration.py
Resources#
- pytest-benchmark docs: https://pytest-benchmark.readthedocs.io/
- Python profiling: https://docs.python.org/3/library/profile.html
- Performance best practices: https://wiki.python.org/moin/PythonSpeed
For questions about performance, open an issue: https://github.com/kpeacocke/souschef/issues