Data Persistence and Storage Guide#
SousChef v3.6+ includes comprehensive data persistence, storing analysis history, conversion results, and generated assets so they survive container restarts.
Features#
Persistent Storage#
- SQLite Database: Stores analysis results, conversion history, and metadata
- PostgreSQL Database: Production-grade relational database for shared deployments
- Blob Storage: Stores generated Ansible playbooks, roles, and supporting files
- Cache Layer: Automatically caches AI analysis results to save costs and time
- Historical Tracking: View and download past analyses and conversions
Database Schema#
The persistence layer uses SQLite or PostgreSQL with the following tables:
Analysis Results#
- Cookbook name, version, and metadata
- Complexity ratings and effort estimates
- AI provider and model used
- Full analysis data (JSON)
- Cache keys for efficient lookups
- Source archive reference (v3.7+): Blob storage key for uploaded cookbook
- Timestamps
Conversion Results#
- Output type (playbook/role/collection)
- Status (success/partial/failed)
- Files generated count
- Blob storage reference
- Full conversion data (JSON)
- Links to original analysis
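SousChef creates these tables automatically, so no DDL is needed. For orientation only, here is an illustrative SQLite sketch of the two tables; the column names are inferred from the field lists above, not taken from SousChef's actual schema:

import sqlite3

# Illustrative sketch only: column names are inferred from the documented
# fields above, not from SousChef's real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE analysis_results (
    id INTEGER PRIMARY KEY,
    cookbook_name TEXT NOT NULL,
    cookbook_version TEXT,
    complexity TEXT,
    estimated_hours REAL,
    estimated_hours_with_souschef REAL,
    ai_provider TEXT,
    ai_model TEXT,
    analysis_data TEXT,       -- full analysis as JSON
    cache_key TEXT,           -- for efficient lookups
    cookbook_blob_key TEXT,   -- v3.7+: uploaded archive reference
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE conversion_results (
    id INTEGER PRIMARY KEY,
    analysis_id INTEGER REFERENCES analysis_results(id),
    output_type TEXT,         -- playbook/role/collection
    status TEXT,              -- success/partial/failed
    files_generated INTEGER,
    blob_storage_key TEXT,
    conversion_data TEXT,     -- full conversion as JSON
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
""")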
Storage Backends#
Local Storage (Default)#
By default, SousChef stores data under a .souschef directory inside the system temporary directory (for example /tmp/.souschef on Linux/macOS):
- <system-temp>/.souschef/data/souschef.db - SQLite database
- <system-temp>/.souschef/storage/ - Generated artefacts
from souschef.storage import get_storage_manager, get_blob_storage
# Uses default local storage (system temporary directory, e.g. /tmp/.souschef)
storage = get_storage_manager()
blob = get_blob_storage("local")
S3-Compatible Storage#
Use AWS S3, MinIO, or any S3-compatible service:
from souschef.storage import get_blob_storage
# AWS S3
blob = get_blob_storage(
    backend="s3",
    bucket_name="souschef-artefacts",
    region="us-east-1",
    # access_key and secret_key from environment or explicit
)
# MinIO
blob = get_blob_storage(
    backend="s3",
    bucket_name="souschef",
    endpoint_url="http://minio:9000",
    access_key="admin",
    secret_key="password",
)
Using the History Page#
The History page in the UI provides comprehensive access to all past analyses and conversions.
Analysis History#
- Navigate to History in the main navigation
- Click the Analysis History tab
- Filter by cookbook name (optional)
- View detailed metrics:
  - Complexity ratings
  - Manual vs AI-assisted effort estimates
  - Time saved
  - AI provider/model used
- Click on individual analyses to see full details and recommendations
Source Archive Storage (v3.7+):
- When you upload a cookbook archive (tar/zip), it's automatically stored alongside the analysis
- Stored archives enable re-conversion without re-uploading
- Archives are referenced by cookbook_blob_key in the database
- Future versions will enable one-click re-conversion from stored archives
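Until one-click re-conversion lands, a stored archive can already be pulled back out through the storage API. A sketch, assuming the analysis record exposes its cookbook_blob_key as an attribute:

from pathlib import Path
from souschef.storage import get_blob_storage, get_storage_manager

storage = get_storage_manager()
blob = get_blob_storage("local")

# Look up the analysis, then fetch its stored source archive.
# Attribute access on the record (cookbook_blob_key) is assumed here.
cached = storage.get_cached_analysis(
    "/cookbooks/nginx",
    ai_provider="anthropic",
    ai_model="claude-3-5-sonnet-20241022",
)
if cached and cached.cookbook_blob_key:
    blob.download(cached.cookbook_blob_key, Path("/tmp/nginx_source"))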
Conversion History#
- Navigate to History in the main navigation
- Click the Conversion History tab
- Filter by cookbook name or status
- View conversion details:
  - Success/failure status
  - Files generated
  - Output types (playbook/role/collection)
- Download previously generated artefacts with one click
Automatic Archive Creation (v3.6+):
- Conversion artefacts are automatically stored as tar/zip archives during conversion
- Both roles and repository archives are saved to blob storage
- Archives are immediately available for download from the History page
- No need to regenerate - just click to download pre-created archives
Statistics Dashboard#
View overall metrics:
- Total analyses performed
- Unique cookbooks analysed
- Total conversions and success rate
- Average time saved with AI assistance
- Total files generated
Automated Caching#
SousChef automatically caches AI analysis results to:
- Reduce costs: Avoid re-running expensive AI analyses
- Improve speed: Serve cached results instantly
- Maintain consistency: Same cookbook + AI settings = cached result
Cache Invalidation#
Cache is automatically invalidated when:
- Cookbook content changes (detected via content hash)
- A different AI provider or model is used
- The cache is cleared manually (future feature)
Cache Keys#
Cache keys are generated from:
- The cookbook's content hash
- The AI provider
- The AI model
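Conceptually, the key is a stable hash over the cookbook's content plus the AI settings. A rough sketch of the idea (not SousChef's actual implementation):

import hashlib
from pathlib import Path

def illustrative_cache_key(cookbook_path: str, ai_provider: str, ai_model: str) -> str:
    """Sketch only: hash cookbook contents plus AI settings into one key."""
    digest = hashlib.sha256()
    # Walk files in a deterministic order so the key is stable.
    for file in sorted(Path(cookbook_path).rglob("*")):
        if file.is_file():
            digest.update(file.read_bytes())
    digest.update(f"{ai_provider}:{ai_model}".encode())
    return digest.hexdigest()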
Mass Analysis and Conversion#
Batch Analysis#
For entire Chef estates:
1. Prepare Archive: bundle all cookbooks into a single tar/zip archive (a sketch follows this list)
2. Upload to SousChef UI:
   - Navigate to Cookbook Analysis
   - Upload the archive
3. Click "Analyse ALL Cookbooks"
4. View Results:
   - Results are automatically saved to the database
   - Access them anytime from the History page
   - Download comprehensive reports
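For step 1, any archiver works; here is a minimal Python sketch using the standard library (the paths are illustrative, not SousChef defaults):

import shutil

# Bundle the cookbooks/ directory of a Chef repo into /tmp/cookbooks.tar.gz.
# Paths are illustrative; point root_dir at your own Chef repository.
archive = shutil.make_archive(
    base_name="/tmp/cookbooks",  # output path without extension
    format="gztar",
    root_dir="/path/to/chef-repo",
    base_dir="cookbooks",
)
print(f"Created {archive}")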
Parallel Processing#
For large estates, SousChef processes cookbooks in parallel:
- Checks cache before analysis (instant for cached cookbooks)
- Analyses new cookbooks in batches
- Saves results incrementally
- Shows progress for long-running operations
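The batch pipeline is built into SousChef, but its cache-first shape can be sketched with the documented storage API; run_ai_analysis below is a hypothetical stand-in for the actual AI call:

from concurrent.futures import ThreadPoolExecutor

from souschef.storage import get_storage_manager

storage = get_storage_manager()

def analyse_or_reuse(path: str) -> None:
    # Cache check first: previously analysed cookbooks return instantly.
    cached = storage.get_cached_analysis(
        path, ai_provider="anthropic", ai_model="claude-3-5-sonnet-20241022"
    )
    if cached:
        print(f"{path}: cached ({cached.complexity})")
        return
    # run_ai_analysis() would go here (hypothetical stand-in); its result
    # would then be persisted via storage.save_analysis(...).
    ...

cookbooks = ["/cookbooks/nginx", "/cookbooks/postgres", "/cookbooks/redis"]
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(analyse_or_reuse, cookbooks))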
API Usage#
Saving Analysis Results#
from souschef.storage import get_storage_manager
storage = get_storage_manager()
# Save analysis
analysis_id = storage.save_analysis(
    cookbook_name="nginx",
    cookbook_path="/cookbooks/nginx",
    cookbook_version="1.0.0",
    complexity="Medium",
    estimated_hours=16.0,
    estimated_hours_with_souschef=8.0,
    recommendations="Use Ansible Galaxy nginx role...",
    analysis_data={"details": "..."},
    ai_provider="anthropic",
    ai_model="claude-3-5-sonnet-20241022",
    cookbook_blob_key="cookbooks/nginx/nginx-1.0.0.tar.gz",  # Optional: v3.7+
)
Retrieving Cached Results#
# Check for cached analysis
cached = storage.get_cached_analysis(
    "/cookbooks/nginx",
    ai_provider="anthropic",
    ai_model="claude-3-5-sonnet-20241022",
)

if cached:
    print(f"Found cached analysis from {cached.created_at}")
    print(f"Complexity: {cached.complexity}")
    print(f"Recommendations: {cached.recommendations}")
Saving Conversion Results#
# Save conversion with blob storage reference
# Note: In UI mode (v3.6+), this happens automatically during conversion
conversion_id = storage.save_conversion(
    cookbook_name="nginx",
    output_type="role",
    status="success",
    files_generated=12,
    conversion_data={
        "parsed_result": {...},
        "roles_blob_key": "conversions/nginx/roles_20250202_123456",
        "repo_blob_key": "conversions/nginx/repo_20250202_123500",
        "timestamp": "20250202_123456",
    },
    analysis_id=analysis_id,
    blob_storage_key="conversions/nginx/roles_20250202_123456",
)
Automatic Conversion Storage (UI):
- When you convert cookbooks in the UI, artefacts are automatically:
  1. Uploaded to blob storage as tar/zip archives
  2. Saved to database with blob storage keys
  3. Made available for download from the History page
- Both roles and repository archives are stored separately
- Archives include timestamps for version tracking
Uploading to Blob Storage#
from pathlib import Path
from souschef.storage import get_blob_storage
blob = get_blob_storage("local")
# Upload directory (automatically zipped)
storage_key = blob.upload(
    Path("/tmp/nginx_role"),
    "conversions/nginx_20250202_123456",
)
print(f"Uploaded to: {storage_key}")
# Download later
blob.download(
    storage_key,
    Path("/tmp/restored_nginx_role"),
)
Environment Configuration#
Configure storage via environment variables:
# Default local storage (no config needed)
# Default SQLite database (no config needed)
# PostgreSQL database
export SOUSCHEF_DB_BACKEND=postgres
export SOUSCHEF_DB_HOST=postgres
export SOUSCHEF_DB_PORT=5432
export SOUSCHEF_DB_NAME=souschef
export SOUSCHEF_DB_USER=souschef
export SOUSCHEF_DB_PASSWORD=souschef
export SOUSCHEF_DB_SSLMODE=require
# For local development against a PostgreSQL instance without TLS, you may override:
# export SOUSCHEF_DB_SSLMODE=disable
# For S3 storage
export SOUSCHEF_STORAGE_BACKEND=s3
export SOUSCHEF_S3_BUCKET=souschef-artefacts
export SOUSCHEF_S3_REGION=us-east-1
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
# For MinIO
export SOUSCHEF_STORAGE_BACKEND=s3
export SOUSCHEF_S3_BUCKET=souschef
export SOUSCHEF_S3_ENDPOINT=http://minio:9000
export SOUSCHEF_S3_ACCESS_KEY=admin
export SOUSCHEF_S3_SECRET_KEY=password
Docker Deployment#
Persistent Volumes#
Mount volumes to persist data across container restarts:
# docker-compose.yml
version: '3.8'

services:
  souschef-ui:
    image: souschef-ui:latest
    volumes:
      # Persist database and local storage
      - souschef-data:/tmp/.souschef/data
      - souschef-storage:/tmp/.souschef/storage
      # Optional: mount S3 credentials
      - ~/.aws:/root/.aws:ro
    environment:
      - SOUSCHEF_DB_BACKEND=sqlite
      - SOUSCHEF_STORAGE_BACKEND=local
    ports:
      - "8501:8501"

volumes:
  souschef-data:
    driver: local
  souschef-storage:
    driver: local
With PostgreSQL and MinIO#
version: '3.8'

services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: souschef
      POSTGRES_USER: souschef
      POSTGRES_PASSWORD: souschef
    volumes:
      - postgres-data:/var/lib/postgresql/data

  minio:
    image: minio/minio
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    volumes:
      - minio-data:/data
    ports:
      - "9000:9000"
      - "9001:9001"

  souschef-ui:
    image: souschef-ui:latest
    depends_on:
      - postgres
      - minio
    environment:
      - SOUSCHEF_DB_BACKEND=postgres
      - SOUSCHEF_DB_HOST=postgres
      - SOUSCHEF_DB_PORT=5432
      - SOUSCHEF_DB_NAME=souschef
      - SOUSCHEF_DB_USER=souschef
      - SOUSCHEF_DB_PASSWORD=souschef
      # This Postgres container has no TLS, so override the SSL mode (see note above)
      - SOUSCHEF_DB_SSLMODE=disable
      - SOUSCHEF_STORAGE_BACKEND=s3
      - SOUSCHEF_S3_BUCKET=souschef
      - SOUSCHEF_S3_ENDPOINT=http://minio:9000
      - SOUSCHEF_S3_ACCESS_KEY=minioadmin
      - SOUSCHEF_S3_SECRET_KEY=minioadmin
    volumes:
      - souschef-data:/tmp/.souschef/data
      - souschef-storage:/tmp/.souschef/storage
    ports:
      - "8501:8501"

volumes:
  minio-data:
  postgres-data:
  souschef-data:
  souschef-storage:
Best Practices#
1. Regular Backups#
# Backup the SQLite database (default location under the system temp dir; adjust if yours differs)
cp /tmp/.souschef/data/souschef.db ~/backups/souschef_$(date +%Y%m%d).db
# For S3, enable versioning on the bucket
aws s3api put-bucket-versioning \
  --bucket souschef-artefacts \
  --versioning-configuration Status=Enabled
2. Cache Management#
- Cache grows over time - monitor disk usage
- Cached analyses are automatically reused
- Different AI models create separate cache entries
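A quick way to watch cache growth is to total up the local store; a small sketch, assuming the default local path:

from pathlib import Path

# Sum the size of everything under the default local store (path assumed;
# adjust if your system temp dir differs or you use another backend).
root = Path("/tmp/.souschef")
total = sum(f.stat().st_size for f in root.rglob("*") if f.is_file())
print(f"SousChef local storage: {total / 1024**2:.1f} MiB")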
3. Large Estates#
- Upload cookbooks in batches if possible
- Use S3 storage for large numbers of conversions
- Enable parallel processing for faster analysis
4. Cost Optimisation#
- Caching dramatically reduces AI API costs
- Analyse similar cookbooks in the same session
- Review cached results before re-analysing
Troubleshooting#
Database Locked Errors#
SQLite may show "database is locked" under heavy concurrent use:
# Configure in storage manager (future enhancement)
# For now, retry operations or use PostgreSQL backend
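Until then, wrapping writes in a small retry helper usually suffices; a sketch, assuming lock contention surfaces as sqlite3.OperationalError:

import sqlite3
import time

def with_retry(operation, attempts: int = 5, delay: float = 0.5):
    """Retry a callable when SQLite reports the database is locked."""
    for attempt in range(attempts):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            # Assumption: locked-database errors bubble up as OperationalError.
            if "database is locked" not in str(exc) or attempt == attempts - 1:
                raise
            time.sleep(delay * (attempt + 1))  # simple linear backoff

# Usage (save_analysis arguments elided for brevity):
# analysis_id = with_retry(lambda: storage.save_analysis(...))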
Storage Path Errors#
Ensure the storage directory is writable:
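For example, with the default local path assumed:

import os
from pathlib import Path

# Default local path assumed; adjust if you've relocated the store.
storage_dir = Path("/tmp/.souschef")
storage_dir.mkdir(parents=True, exist_ok=True)
print(f"Writable: {os.access(storage_dir, os.W_OK)}")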
S3 Connection Issues#
Verify credentials and connectivity:
from souschef.storage import get_blob_storage
try:
    blob = get_blob_storage("s3", bucket_name="test-bucket")
    keys = blob.list_keys()
    print(f"Connected successfully, found {len(keys)} objects")
except Exception as e:
    print(f"Connection failed: {e}")
Cache Not Working#
Verify cache key generation:
from souschef.storage import get_storage_manager
storage = get_storage_manager()
cache_key = storage.generate_cache_key(
    "/path/to/cookbook",
    ai_provider="anthropic",
    ai_model="claude-3-5-sonnet-20241022",
)
print(f"Cache key: {cache_key}")

# Check if cached entry exists
cached = storage.get_cached_analysis(
    "/path/to/cookbook",
    ai_provider="anthropic",
    ai_model="claude-3-5-sonnet-20241022",
)
print(f"Cached: {cached is not None}")
Future Enhancements#
Planned features for upcoming releases:
- Manual cache invalidation controls
- Export/import database snapshots
- Azure Blob Storage support
- Google Cloud Storage support
- Retention policies for old analyses
- Data deduplication for similar cookbooks
- Full-text search across analysis history
- Collaborative analysis sharing
- Enhanced PostgreSQL features (connection pooling, replication)
Migration from Previous Versions#
If you're upgrading from SousChef v3.5 or earlier:
- No migration needed: The database is created automatically
- Existing data: Previous session state data is lost (by design)
- New analyses: Will be automatically persisted going forward
- Re-analyse: Consider re-analysing important cookbooks to populate history
For more information, see:
- Architecture Documentation
- API Reference
- Docker Deployment Guide