Performance Benchmarks#

SousChef includes comprehensive performance benchmarks to ensure fast and efficient cookbook conversions. This guide explains the benchmarking framework, current performance characteristics, and optimization strategies.

Benchmark Suite#

SousChef uses pytest-benchmark to measure performance across key operations. Benchmarks run automatically in CI/CD and can be executed locally for performance validation.

Running Benchmarks#

# Run all benchmarks with detailed output
poetry run pytest tests/ -v --benchmark-only

# Run benchmarks with comparison
poetry run pytest tests/ --benchmark-only --benchmark-compare

# Save benchmark results
poetry run pytest tests/ --benchmark-only --benchmark-save=baseline

# Compare against saved baseline
poetry run pytest tests/ --benchmark-only --benchmark-compare=baseline

# Generate histogram visualizations
poetry run pytest tests/ --benchmark-only --benchmark-histogram

Current Performance Metrics#

All benchmarks measured on: Linux, Python 3.13, single-threaded execution.

Parsing Operations#

| Operation | Mean Time | Min Time | Max Time | Throughput |
|-----------|-----------|----------|----------|------------|
| Recipe parsing | 174.5 µs | 126.9 µs | 302.8 µs | 5,732 ops/sec |
| Attribute parsing | 150.8 µs | 107.7 µs | 384.2 µs | 6,633 ops/sec |
| Template parsing | 149.9 µs | 107.2 µs | 719.8 µs | 6,672 ops/sec |
| Metadata parsing | 182.3 µs | 122.2 µs | 2.025 ms | 5,486 ops/sec |
| Custom resource | 144.3 µs | 96.0 µs | 565.7 µs | 7,442 ops/sec |

Key Insights:

  • Consistent sub-200 µs performance for standard Chef artifacts
  • Over 5,000 operations per second for all parsing tasks
  • Metadata parsing has occasional outliers (max ~2 ms) due to file I/O

Conversion Operations#

| Operation | Mean Time | Min Time | Max Time | Throughput |
|-----------|-----------|----------|----------|------------|
| Basic conversion | 906 ns | 791 ns | 42.5 µs | 1,103,644 ops/sec |
| Resource conversion | 3.2 µs | 1.9 µs | 1.118 ms | 312,166 ops/sec |
| InSpec conversion | 170.5 µs | 119.0 µs | 4.663 ms | 5,864 ops/sec |
| Playbook generation | 1.4 ms | 721.1 µs | 15.8 ms | 717 ops/sec |

Key Insights:

  • Sub-microsecond performance for basic conversions (≈900 ns mean)
  • Resource conversion is highly parallelizable
  • Playbook generation is the most expensive operation (millisecond range)

Structure Analysis#

| Operation | Mean Time | Min Time | Max Time | Throughput |
|-----------|-----------|----------|----------|------------|
| Cookbook structure | 1.5 ms | 926.0 µs | 20.7 ms | 672 ops/sec |
| Large cookbook | 1.4 ms | 1.2 ms | 2.9 ms | 708 ops/sec |
| InSpec profiles | 632.6 µs | 443.0 µs | 3.256 ms | 1,581 ops/sec |

Key Insights:

  • Consistent low-millisecond performance for structure analysis
  • Large cookbooks (100+ resources) maintain similar performance
  • File I/O dominates structure analysis time

Performance Characteristics#

Scalability#

SousChef scales close to linearly with cookbook size; up to around 100 resources, fixed per-cookbook overhead dominates, so measured times stay nearly flat:

# Small cookbook (10 resources): ~1.4ms
# Medium cookbook (50 resources): ~1.5ms
# Large cookbook (100 resources): ~1.4ms
# Very large cookbook (500 resources): ~7-10ms (estimated)

Why it scales well:

  • Parsing is single-pass with minimal backtracking
  • No recursive tree transformations
  • Lazy evaluation of optional fields
  • Efficient path normalization caching
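The path-normalization caching mentioned above can be illustrated with `functools.lru_cache`; `normalize_path` here is a hypothetical helper for illustration, not SousChef's internal API:

```python
import os.path
from functools import lru_cache


@lru_cache(maxsize=4096)
def normalize_path(path: str) -> str:
    """Normalize a path once and memoize the result for repeat lookups."""
    return os.path.normpath(os.path.abspath(path))
```

Because cookbooks reference the same recipe, template, and attribute files repeatedly, memoizing normalization turns most lookups into dictionary hits.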

Memory Usage#

Typical memory footprint:

  • Small cookbook (10 recipes): ~5-10 MB
  • Medium cookbook (50 recipes): ~15-30 MB
  • Large cookbook (100+ recipes): ~50-100 MB

Memory is linear with:

  • Number of resources parsed
  • Template file sizes
  • Attribute complexity

Memory optimization tips:

  • Process cookbooks in batches for large migrations
  • Use streaming for template conversion
  • Clear cached results between cookbook conversions
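The batching tip can be sketched as a simple loop that releases memory between batches; `convert_in_batches` is an illustrative helper, and `convert` stands in for whatever conversion callable you use:

```python
import gc


def convert_in_batches(cookbooks, convert, batch_size=10):
    """Run `convert` over cookbooks in fixed-size batches, freeing memory between them."""
    results = []
    for start in range(0, len(cookbooks), batch_size):
        for cookbook in cookbooks[start:start + batch_size]:
            results.append(convert(cookbook))
        gc.collect()  # release parser/template caches the batch no longer needs
    return results
```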

CPU Utilization#

SousChef is primarily CPU-bound during:

  1. Ruby parsing (30-40% of time)
  2. YAML generation (20-30% of time)
  3. Validation (15-25% of time)

AI API call time varies and is network-bound rather than CPU-bound.

I/O-bound operations:

  • Reading Chef files from disk
  • Writing Ansible playbooks
  • Network requests to AI providers

Optimization Strategies#

For Large Cookbooks#

# Process cookbooks in parallel
from concurrent.futures import ProcessPoolExecutor
from souschef.assessment import assess_cookbook

cookbooks = ["cookbook1", "cookbook2", "cookbook3"]

with ProcessPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(assess_cookbook, cookbooks))

Benefits:

  • 3-4x speedup on multi-core systems
  • Reduced wall-clock time for batch conversions
  • Each cookbook isolated in a separate process

For Incremental Conversions#

# Only convert changed recipes
from souschef.converters.playbook import convert_recipe_to_playbook

changed_recipes = get_changed_files()  # Your VCS diff logic
for recipe in changed_recipes:
    playbook = convert_recipe_to_playbook(recipe)
    save_playbook(playbook)

Benefits:

  • Only process what changed
  • Ideal for CI/CD pipelines
  • Maintains conversion consistency
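The `get_changed_files()` call above is left to your VCS tooling; one possible sketch shells out to `git diff` (the `base_ref` default and the `.rb` filter are assumptions, not SousChef behaviour):

```python
import subprocess


def filter_recipe_paths(paths):
    """Keep only Ruby recipe files from a list of changed paths."""
    return [p for p in paths if p.endswith(".rb")]


def get_changed_files(base_ref="main"):
    """Changed recipe files relative to base_ref, via `git diff --name-only`."""
    diff = subprocess.run(
        ["git", "diff", "--name-only", base_ref],
        capture_output=True, text=True, check=True,
    )
    return filter_recipe_paths(diff.stdout.splitlines())
```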

For Template-Heavy Cookbooks#

# Pre-compile templates in batch
from souschef.parsers.template import parse_template

templates = find_all_templates(cookbook_path)
parsed_templates = {
    t: parse_template(t) for t in templates
}

# Reuse parsed templates during conversion
for recipe in recipes:
    playbook = convert_with_templates(recipe, parsed_templates)

Benefits:

  • Avoid re-parsing templates
  • Better cache locality
  • 30-50% faster for template-heavy cookbooks

For AI-Powered Conversions#

# Batch AI requests to reduce latency
import asyncio

from souschef.converters.playbook import generate_playbook_with_ai

# Collect all conversion requests (prepare_batch, chunks and
# batch_convert_with_ai are your own helpers)
conversion_requests = prepare_batch(recipes)

# Send in batches of 10
for batch in chunks(conversion_requests, size=10):
    results = asyncio.run(batch_convert_with_ai(batch))

Benefits:

  • Amortize network overhead
  • Respect API rate limits
  • Better error handling
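The `chunks` helper used in the snippet above is not defined there; a minimal version might look like this:

```python
from itertools import islice


def chunks(items, size=10):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch
```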

Performance Tuning#

Environment Variables#

# Disable AI-assisted conversion for speed
export SOUSCHEF_DISABLE_AI=1

# Increase parser timeout for complex files
export SOUSCHEF_PARSER_TIMEOUT=60

# Enable performance profiling
export SOUSCHEF_PROFILE=1
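If you script around these variables yourself, they can be read with the standard library. The variable names match those above, but the parsing and default values shown here are illustrative, not SousChef's internal logic:

```python
import os


def read_perf_settings(env=None):
    """Parse SousChef performance environment variables, with illustrative defaults."""
    env = os.environ if env is None else env
    return {
        "disable_ai": env.get("SOUSCHEF_DISABLE_AI", "0") == "1",
        "parser_timeout": int(env.get("SOUSCHEF_PARSER_TIMEOUT", "30")),
        "profile": env.get("SOUSCHEF_PROFILE", "0") == "1",
    }
```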

Configuration Options#

# In your conversion script
config = {
    "max_workers": 4,           # Parallel workers
    "cache_templates": True,    # Cache parsed templates
    "validate_output": False,   # Skip validation for speed
    "ai_batch_size": 10,       # AI request batch size
}

convert_cookbook(path, **config)

Profiling and Diagnostics#

Built-in Profiler#

# Profile a conversion
poetry run souschef convert --profile cookbook/ output/

# View profiling results
# Results saved to souschef_profile.prof
poetry run python -m pstats souschef_profile.prof

Custom Profiling#

import cProfile
import pstats
from souschef.assessment import assess_cookbook

# Profile cookbook assessment
profiler = cProfile.Profile()
profiler.enable()

result = assess_cookbook("path/to/cookbook")

profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(20)  # Top 20 functions

Memory Profiling#

# Install memory profiler
poetry add --dev memory-profiler

# Profile memory usage
poetry run python -m memory_profiler souschef/cli.py convert cookbook/

Performance Regression Testing#

Benchmarks run automatically in CI/CD to detect performance regressions:

# .github/workflows/ci.yml
- name: Run benchmarks
  run: |
    poetry run pytest --benchmark-only \
      --benchmark-compare=main \
      --benchmark-compare-fail=mean:10%

Regression thresholds:

  • Fail if mean time increases by >10%
  • Warn if max time increases by >25%
  • Alert if throughput drops by >15%
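`--benchmark-save` writes JSON files containing per-benchmark stats, so you can also run a local regression check against a saved baseline. A sketch, assuming pytest-benchmark's saved-data layout (a top-level `benchmarks` list with `name` and `stats` entries) and mirroring the 10% mean threshold used in CI:

```python
import json


def load_means(path):
    """Mean times keyed by benchmark name, from a pytest-benchmark JSON file."""
    with open(path) as fh:
        data = json.load(fh)
    return {b["name"]: b["stats"]["mean"] for b in data["benchmarks"]}


def regressions(baseline, current, threshold=0.10):
    """Benchmarks whose mean time grew by more than `threshold` vs. the baseline."""
    return {
        name: (baseline[name], mean)
        for name, mean in current.items()
        if name in baseline and mean > baseline[name] * (1 + threshold)
    }
```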

Troubleshooting Slow Conversions#

Symptom: Parsing taking >1 second per file#

Possible causes:

  • Very complex Chef Ruby code
  • Large template files (>100 KB)
  • Deep attribute nesting (>10 levels)

Solutions:

  • Simplify Chef code before conversion
  • Split large templates
  • Flatten attribute structures
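Flattening deep attribute structures, as suggested above, can be done with a small recursive helper (illustrative, not part of SousChef's API):

```python
def flatten_attributes(attrs, parent="", sep="."):
    """Flatten nested attribute dicts into dotted keys, e.g. {"nginx.port": 80}."""
    flat = {}
    for key, value in attrs.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten_attributes(value, name, sep))
        else:
            flat[name] = value
    return flat
```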

Symptom: High memory usage (>500MB)#

Possible causes:

  • Processing too many cookbooks simultaneously
  • Not clearing caches between conversions
  • Very large attribute files

Solutions:

import gc

for cookbook in large_cookbook_list:
    result = convert_cookbook(cookbook)
    process_result(result)

    # Manual garbage collection
    gc.collect()

Symptom: Slow AI-powered conversions#

Possible causes:

  • Network latency to AI provider
  • API rate limits
  • Large prompts (>4,000 tokens)

Solutions:

  • Use regional AI endpoints
  • Implement request caching
  • Batch API calls
  • Reduce prompt size
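Request caching can be as simple as keying responses by a hash of the prompt. A sketch, where `call` stands in for your AI provider client:

```python
import hashlib


class PromptCache:
    """Cache AI responses keyed by a hash of the prompt, avoiding repeat calls."""

    def __init__(self):
        self._store = {}

    def get_or_call(self, prompt, call):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key not in self._store:
            self._store[key] = call(prompt)
        return self._store[key]
```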

Best Practices#

  1. Measure first: Profile before optimizing
  2. Batch operations: Process cookbooks in logical groups
  3. Cache aggressively: Reuse parsed templates and attributes
  4. Parallel processing: Use multiple workers for independent cookbooks
  5. Monitor memory: Clear caches for long-running processes
  6. Set timeouts: Prevent hanging on complex files
  7. Use benchmarks: Compare before/after optimization

Future Performance Improvements#

Planned optimizations:

  • Incremental parsing: Only re-parse changed sections
  • Native parsing: Cython extensions for hot paths
  • Streaming output: Write playbooks as they're generated
  • Distributed processing: Celery task queue for large migrations
  • Smart caching: Persistent cache across runs
  • GPU acceleration: For AI-powered conversions

Contributing Benchmarks#

When adding new features, include benchmarks:

def test_benchmark_my_new_feature(benchmark):
    """Benchmark the new feature."""
    result = benchmark(my_new_feature, input_data)

    # Assert performance requirements
    assert benchmark.stats.stats.mean < 0.001  # <1 ms mean
    assert benchmark.stats.stats.max < 0.010   # <10 ms max

See the integration test examples in the repository: https://github.com/kpeacocke/souschef/blob/main/tests/integration/test_integration.py

Resources#

  • pytest-benchmark docs: https://pytest-benchmark.readthedocs.io/
  • Python profiling: https://docs.python.org/3/library/profile.html
  • Performance best practices: https://wiki.python.org/moin/PythonSpeed

For questions about performance, open an issue: https://github.com/kpeacocke/souschef/issues