
Data Persistence and Storage Guide#

SousChef v3.6+ includes comprehensive data persistence capabilities to store analysis history, conversion results, and generated assets across container restarts.

Features#

Persistent Storage#

  • SQLite Database: Stores analysis results, conversion history, and metadata
  • PostgreSQL Database: Production-grade relational database for shared deployments
  • Blob Storage: Stores generated Ansible playbooks, roles, and supporting files
  • Cache Layer: Automatically caches AI analysis results to save costs and time
  • Historical Tracking: View and download past analyses and conversions

Database Schema#

The persistence layer uses SQLite or PostgreSQL with the following tables:

Analysis Results#

  • Cookbook name, version, and metadata
  • Complexity ratings and effort estimates
  • AI provider and model used
  • Full analysis data (JSON)
  • Cache keys for efficient lookups
  • Source archive reference (v3.7+): Blob storage key for uploaded cookbook
  • Timestamps

Conversion Results#

  • Output type (playbook/role/collection)
  • Status (success/partial/failed)
  • Files generated count
  • Blob storage reference
  • Full conversion data (JSON)
  • Links to original analysis
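
For ad-hoc inspection, the default SQLite database can be opened directly. A minimal sketch that only lists table names, so it makes no assumptions about the exact schema (the path is the default described under Local Storage below):

import sqlite3

# Default SQLite location; adjust if you have relocated the database
conn = sqlite3.connect("/tmp/.souschef/data/souschef.db")

# Enumerate the persistence tables without assuming their names
for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type='table'"):
    print(name)

conn.close()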

Storage Backends#

Local Storage (Default)#

By default, SousChef stores data under a .souschef directory inside the system temporary directory (for example /tmp/.souschef on Linux/macOS):

  • <system-temp>/.souschef/data/souschef.db - SQLite database
  • <system-temp>/.souschef/storage/ - Generated artefacts

from souschef.storage import get_storage_manager, get_blob_storage

# Uses default local storage (system temporary directory, e.g. /tmp/.souschef)
storage = get_storage_manager()
blob = get_blob_storage("local")

S3-Compatible Storage#

Use AWS S3, MinIO, or any S3-compatible service:

from souschef.storage import get_blob_storage

# AWS S3
blob = get_blob_storage(
    backend="s3",
    bucket_name="souschef-artefacts",
    region="us-east-1"
    # access_key and secret_key can come from the environment or be passed explicitly
)

# MinIO
blob = get_blob_storage(
    backend="s3",
    bucket_name="souschef",
    endpoint_url="http://minio:9000",
    access_key="admin",
    secret_key="password"
)

Using the History Page#

The History page in the UI provides comprehensive access to all past analyses and conversions.

Analysis History#

  1. Navigate to History in the main navigation
  2. Click the Analysis History tab
  3. Filter by cookbook name (optional)
  4. View detailed metrics:
     • Complexity ratings
     • Manual vs AI-assisted effort estimates
     • Time saved
     • AI provider/model used
  5. Click on individual analyses to see full details and recommendations

Source Archive Storage (v3.7+):

  • When you upload a cookbook archive (tar/zip), it's automatically stored alongside the analysis
  • Stored archives enable re-conversion without re-uploading (see the retrieval sketch below)
  • Archives are referenced by cookbook_blob_key in the database
  • Future versions will enable one-click re-conversion from stored archives
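In the meantime, a stored archive can be pulled back out of blob storage by its key. A minimal sketch, assuming you have read cookbook_blob_key from the analysis record (the key below is illustrative):

from pathlib import Path
from souschef.storage import get_blob_storage

# Illustrative key; in practice, read cookbook_blob_key from the analysis record
cookbook_blob_key = "cookbooks/nginx/nginx-1.0.0.tar.gz"

blob = get_blob_storage("local")

# Restore the uploaded cookbook archive, e.g. ahead of a manual re-conversion
blob.download(cookbook_blob_key, Path("/tmp/nginx_cookbook"))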

Conversion History#

  1. Navigate to History in the main navigation
  2. Click the Conversion History tab
  3. Filter by cookbook name or status
  4. View conversion details:
     • Success/failure status
     • Files generated
     • Output types (playbook/role/collection)
  5. Download previously generated artefacts with one click
  8. Download previously generated artefacts with one click

Automatic Archive Creation (v3.6+):

  • Conversion artefacts are automatically stored as tar/zip archives during conversion
  • Both roles and repository archives are saved to blob storage
  • Archives are immediately available for download from the History page
  • No need to regenerate - just click to download pre-created archives

Statistics Dashboard#

View overall metrics:

  • Total analyses performed
  • Unique cookbooks analysed
  • Total conversions and success rate
  • Average time saved with AI assistance
  • Total files generated

Automated Caching#

SousChef automatically caches AI analysis results to:

  • Reduce costs: Avoid re-running expensive AI analyses
  • Improve speed: Serve cached results instantly
  • Maintain consistency: Same cookbook + AI settings = cached result

Cache Invalidation#

The cache is automatically invalidated when:

  • Cookbook content changes (detected via a content hash)
  • A different AI provider or model is used
  • The cache is cleared manually (future feature)

Cache Keys#

Cache keys are generated from:

SHA256(cookbook_path + ai_provider + ai_model + content_hash)
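A minimal sketch of this scheme, assuming the content hash is itself a SHA-256 digest over the cookbook's files (the real implementation may combine the components differently):

import hashlib
from pathlib import Path

def content_hash(cookbook_path: str) -> str:
    """Digest every file so any content change yields a different hash."""
    digest = hashlib.sha256()
    for path in sorted(Path(cookbook_path).rglob("*")):
        if path.is_file():
            digest.update(path.read_bytes())
    return digest.hexdigest()

def cache_key(cookbook_path: str, ai_provider: str, ai_model: str) -> str:
    combined = cookbook_path + ai_provider + ai_model + content_hash(cookbook_path)
    return hashlib.sha256(combined.encode()).hexdigest()

Because the content hash feeds into the key, editing any cookbook file produces a new key, and the stale cache entry is simply never looked up again.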

Mass Analysis and Conversion#

Batch Analysis#

For entire Chef estates:

  1. Prepare Archive:

    # Create archive of all cookbooks
    tar czf chef_estate.tar.gz /path/to/cookbooks/*

  2. Upload to SousChef UI:
     • Navigate to Cookbook Analysis
     • Upload the archive
     • Click "Analyse ALL Cookbooks"

  3. View Results:
     • Results are automatically saved to the database
     • Access them anytime from the History page
     • Download comprehensive reports

Parallel Processing#

For large estates, SousChef processes cookbooks in parallel:

  • Checks cache before analysis (instant for cached cookbooks; see the sketch below)
  • Analyses new cookbooks in batches
  • Saves results incrementally
  • Shows progress for long-running operations
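This cache-first flow can also be driven from Python. A minimal sketch, assuming cookbooks sit in sibling directories under one root and using the storage API shown in the next section; analyse_cookbook is a placeholder for your own analysis call:

from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

from souschef.storage import get_storage_manager

storage = get_storage_manager()
AI_PROVIDER = "anthropic"
AI_MODEL = "claude-3-5-sonnet-20241022"

def analyse_cookbook(path: Path):
    """Placeholder: run your real AI analysis here and persist it via storage.save_analysis()."""
    raise NotImplementedError

def analyse_if_needed(path: Path):
    # Serve a cached result instantly when one exists
    cached = storage.get_cached_analysis(str(path), ai_provider=AI_PROVIDER, ai_model=AI_MODEL)
    return cached if cached is not None else analyse_cookbook(path)

cookbooks = [p for p in Path("/path/to/cookbooks").iterdir() if p.is_dir()]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(analyse_if_needed, cookbooks))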

API Usage#

Saving Analysis Results#

from souschef.storage import get_storage_manager

storage = get_storage_manager()

# Save analysis
analysis_id = storage.save_analysis(
    cookbook_name="nginx",
    cookbook_path="/cookbooks/nginx",
    cookbook_version="1.0.0",
    complexity="Medium",
    estimated_hours=16.0,
    estimated_hours_with_souschef=8.0,
    recommendations="Use Ansible Galaxy nginx role...",
    analysis_data={"details": "..."},
    ai_provider="anthropic",
    ai_model="claude-3-5-sonnet-20241022",
    cookbook_blob_key="cookbooks/nginx/nginx-1.0.0.tar.gz"  # Optional: v3.7+
)

Retrieving Cached Results#

# Check for cached analysis
cached = storage.get_cached_analysis(
    "/cookbooks/nginx",
    ai_provider="anthropic",
    ai_model="claude-3-5-sonnet-20241022"
)

if cached:
    print(f"Found cached analysis from {cached.created_at}")
    print(f"Complexity: {cached.complexity}")
    print(f"Recommendations: {cached.recommendations}")

Saving Conversion Results#

# Save conversion with blob storage reference
# Note: In UI mode (v3.6+), this happens automatically during conversion
conversion_id = storage.save_conversion(
    cookbook_name="nginx",
    output_type="role",
    status="success",
    files_generated=12,
    conversion_data={
        "parsed_result": {...},
        "roles_blob_key": "conversions/nginx/roles_20250202_123456",
        "repo_blob_key": "conversions/nginx/repo_20250202_123500",
        "timestamp": "20250202_123456"
    },
    analysis_id=analysis_id,
    blob_storage_key="conversions/nginx/roles_20250202_123456"
)

Automatic Conversion Storage (UI):

When you convert cookbooks in the UI, artefacts are automatically:

  1. Uploaded to blob storage as tar/zip archives
  2. Saved to the database with blob storage keys
  3. Made available for download from the History page

Both roles and repository archives are stored separately, and archives include timestamps for version tracking.

Uploading to Blob Storage#

from pathlib import Path
from souschef.storage import get_blob_storage

blob = get_blob_storage("local")

# Upload directory (automatically zipped)
storage_key = blob.upload(
    Path("/tmp/nginx_role"),
    "conversions/nginx_20250202_123456"
)
print(f"Uploaded to: {storage_key}")

# Download later
blob.download(
    storage_key,
    Path("/tmp/restored_nginx_role")
)

Environment Configuration#

Configure storage via environment variables:

# Default local storage (no config needed)

# Default SQLite database (no config needed)

# PostgreSQL database
export SOUSCHEF_DB_BACKEND=postgres
export SOUSCHEF_DB_HOST=postgres
export SOUSCHEF_DB_PORT=5432
export SOUSCHEF_DB_NAME=souschef
export SOUSCHEF_DB_USER=souschef
export SOUSCHEF_DB_PASSWORD=souschef
export SOUSCHEF_DB_SSLMODE=require
# For local development against a PostgreSQL instance without TLS, you may override:
# export SOUSCHEF_DB_SSLMODE=disable

# For S3 storage
export SOUSCHEF_STORAGE_BACKEND=s3
export SOUSCHEF_S3_BUCKET=souschef-artefacts
export SOUSCHEF_S3_REGION=us-east-1
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret

# For MinIO
export SOUSCHEF_STORAGE_BACKEND=s3
export SOUSCHEF_S3_BUCKET=souschef
export SOUSCHEF_S3_ENDPOINT=http://minio:9000
export SOUSCHEF_S3_ACCESS_KEY=admin
export SOUSCHEF_S3_SECRET_KEY=password

Docker Deployment#

Persistent Volumes#

Mount volumes to persist data across container restarts:

# docker-compose.yml
version: '3.8'

services:
  souschef-ui:
    image: souschef-ui:latest
    volumes:
      # Persist database and local storage
      - souschef-data:/tmp/.souschef/data
      - souschef-storage:/tmp/.souschef/storage
      # Optional: mount S3 credentials
      - ~/.aws:/root/.aws:ro
    environment:
      - SOUSCHEF_DB_BACKEND=sqlite
      - SOUSCHEF_STORAGE_BACKEND=local
    ports:
      - "8501:8501"

volumes:
  souschef-data:
    driver: local
  souschef-storage:
    driver: local

With PostgreSQL and MinIO#

version: '3.8'

services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: souschef
      POSTGRES_USER: souschef
      POSTGRES_PASSWORD: souschef
    volumes:
      - postgres-data:/var/lib/postgresql/data

  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    volumes:
      - minio-data:/data
    ports:
      - "9000:9000"
      - "9001:9001"

  souschef-ui:
    image: souschef-ui:latest
    depends_on:
      - postgres
      - minio
    environment:
      - SOUSCHEF_DB_BACKEND=postgres
      - SOUSCHEF_DB_HOST=postgres
      - SOUSCHEF_DB_PORT=5432
      - SOUSCHEF_DB_NAME=souschef
      - SOUSCHEF_DB_USER=souschef
      - SOUSCHEF_DB_PASSWORD=souschef
      - SOUSCHEF_STORAGE_BACKEND=s3
      - SOUSCHEF_S3_BUCKET=souschef
      - SOUSCHEF_S3_ENDPOINT=http://minio:9000
      - SOUSCHEF_S3_ACCESS_KEY=minioadmin
      - SOUSCHEF_S3_SECRET_KEY=minioadmin
    volumes:
      - souschef-data:/tmp/.souschef/data
      - souschef-storage:/tmp/.souschef/storage
    ports:
      - "8501:8501"

volumes:
  minio-data:
  postgres-data:
  souschef-data:
  souschef-storage:

Best Practices#

1. Regular Backups#

# Backup SQLite database (default location; adjust if you have relocated it)
cp /tmp/.souschef/data/souschef.db ~/backups/souschef_$(date +%Y%m%d).db

# For S3, enable versioning on the bucket
aws s3api put-bucket-versioning \
  --bucket souschef-artefacts \
  --versioning-configuration Status=Enabled

2. Cache Management#

  • Cache grows over time - monitor disk usage
  • Cached analyses are automatically reused
  • Different AI models create separate cache entries

3. Large Estates#

  • Upload cookbooks in batches if possible
  • Use S3 storage for large numbers of conversions
  • Enable parallel processing for faster analysis

4. Cost Optimisation#

  • Caching dramatically reduces AI API costs
  • Analyse similar cookbooks in the same session
  • Review cached results before re-analysing

Troubleshooting#

Database Locked Errors#

SQLite may raise "database is locked" errors under heavy concurrent use. Configurable locking behaviour in the storage manager is a planned enhancement; for now, retry the failed operation or switch to the PostgreSQL backend.
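A minimal retry wrapper for this situation, assuming the lock surfaces as sqlite3.OperationalError (with_retry is an illustrative helper, not part of the SousChef API):

import sqlite3
import time

def with_retry(operation, attempts=5, delay=0.5):
    """Retry a storage call that may hit a transient SQLite lock."""
    for attempt in range(attempts):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            if "database is locked" not in str(exc) or attempt == attempts - 1:
                raise
            time.sleep(delay * (attempt + 1))  # simple linear backoff

# Example: analysis_id = with_retry(lambda: storage.save_analysis(...))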

Storage Path Errors#

Ensure the storage directory is writable:

# Check permissions on the storage directory (default location shown)
ls -la /tmp/.souschef/
chmod 700 /tmp/.souschef/

S3 Connection Issues#

Verify credentials and connectivity:

from souschef.storage import get_blob_storage

try:
    blob = get_blob_storage("s3", bucket_name="test-bucket")
    keys = blob.list_keys()
    print(f"Connected successfully, found {len(keys)} objects")
except Exception as e:
    print(f"Connection failed: {e}")

Cache Not Working#

Verify cache key generation:

from souschef.storage import get_storage_manager

storage = get_storage_manager()
cache_key = storage.generate_cache_key(
    "/path/to/cookbook",
    ai_provider="anthropic",
    ai_model="claude-3-5-sonnet-20241022"
)
print(f"Cache key: {cache_key}")

# Check if cached entry exists
cached = storage.get_cached_analysis(
    "/path/to/cookbook",
    ai_provider="anthropic",
    ai_model="claude-3-5-sonnet-20241022"
)
print(f"Cached: {cached is not None}")

Future Enhancements#

Planned features for upcoming releases:

  • Manual cache invalidation controls
  • Export/import database snapshots
  • Azure Blob Storage support
  • Google Cloud Storage support
  • Retention policies for old analyses
  • Data deduplication for similar cookbooks
  • Full-text search across analysis history
  • Collaborative analysis sharing
  • Enhanced PostgreSQL features (connection pooling, replication)

Migration from Previous Versions#

If you're upgrading from SousChef v3.5 or earlier:

  1. No migration needed: The database is created automatically
  2. Existing data: Previous session state data is lost (by design)
  3. New analyses: Will be automatically persisted going forward
  4. Re-analyse: Consider re-analysing important cookbooks to populate history

For more information, see:

  • Architecture Documentation
  • API Reference
  • Docker Deployment Guide