Introduction
March 2023: Growain's AI behaves differently in staging than production. Nobody knows why.
April 2023: Urgent fix needed for Securitain. Deployment blocked because configuration is hard-coded. Takes 3 hours to release.
May 2023: MindAgain's tone changed after deployment. Rollback required. Team discovers someone manually edited prompt in Lambda environment variable. No audit trail exists.
June 2023: We decided enough was enough.
This article shares how we overhauled Cloudain's operations by treating configuration as data: a Single Source of Truth (SSOT) that took our AI platform from chaos to predictability.
Drift and Chaos in Multi-AI Systems
The Problem
When configuration lives in code, environment variables, or (worse) manual edits, drift is inevitable:
// securitain-handler.ts
const MODEL_CONFIG = {
  provider: 'bedrock',
  model: 'claude-v2',
  temperature: 0.0,
  maxTokens: 500
}
Problems:
- Changing model requires code deployment
- Can't A/B test configurations
- No rollback without redeploying code
- Different values in dev/staging/prod (manual copying)
# .env.production
GROWAIN_MODEL=gpt-4
GROWAIN_TEMPERATURE=0.7
GROWAIN_MAX_TOKENS=800
# .env.staging
GROWAIN_MODEL=gpt-3.5-turbo # ??? Why different?
GROWAIN_TEMPERATURE=0.5 # ??? Who changed this?
GROWAIN_MAX_TOKENS=600 # ??? Not documented
Problems:
- Drift between environments
- No change history
- No validation before deployment
- Manual synchronization required
-- Configuration table
INSERT INTO config (key, value) VALUES
  ('growain_model', 'gpt-4'),
  ('growain_temperature', '0.7');
-- Updated by different team member
UPDATE config SET value='0.8'
WHERE key='growain_temperature';
-- WHY? WHO? WHEN? Unknown.
Problems:
- No version control
- No code review process
- Difficult to replicate environments
- Audit trail limited
The Cost of Drift
Real incident:
Production Growain: Creative, engaging responses (temp: 0.7)
Staging Growain: Robotic, repetitive responses (temp: 0.1)
Testing in staging looked bad.
Team spent 2 days "debugging" code.
Issue was configuration drift.
Cost: 16 engineering hours wasted
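Drift of this kind can be caught mechanically by diffing the effective configuration of two environments. The following is a minimal sketch, assuming configs are plain JSON objects; `flattenConfig` and `findDrift` are hypothetical helper names used for illustration, not part of Cloudain's codebase.

```typescript
// Hypothetical helpers for illustration only.
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue }

// Flatten nested objects into dotted paths, e.g. { ai: { temperature: 0.7 } } -> "ai.temperature".
// Arrays and primitives are leaves, stored as their JSON text so they compare by value.
function flattenConfig(value: JsonValue, prefix = '', out: Record<string, string> = {}): Record<string, string> {
  if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
    for (const key of Object.keys(value)) {
      flattenConfig(value[key], prefix ? `${prefix}.${key}` : key, out)
    }
  } else {
    out[prefix] = JSON.stringify(value)
  }
  return out
}

interface DriftEntry {
  path: string
  left: string | undefined
  right: string | undefined
}

// Report every path whose value differs or that exists on only one side.
function findDrift(left: JsonValue, right: JsonValue, ignore: string[] = []): DriftEntry[] {
  const a = flattenConfig(left)
  const b = flattenConfig(right)
  const allPaths = Object.keys(a).concat(Object.keys(b).filter(path => !(path in a)))
  return allPaths
    .filter(path => ignore.indexOf(path) === -1 && a[path] !== b[path])
    .sort()
    .map(path => ({ path, left: a[path], right: b[path] }))
}

// Values taken from the incident above: temperature 0.7 in production, 0.1 in staging.
const productionGrowain = { environment: 'production', ai: { model: 'gpt-4', temperature: 0.7 } }
const stagingGrowain = { environment: 'staging', ai: { model: 'gpt-4', temperature: 0.1 } }

// "environment" is expected to differ, so it is ignored.
const growainDrift = findDrift(productionGrowain, stagingGrowain, ['environment'])
console.log(growainDrift)
```

Run against the two Growain configs, a check like this would have pointed at `ai.temperature` directly instead of leaving the team to debug application code.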
JSON-Based Config Pipelines
The SSOT Principle
Single Source of Truth:
- All configuration lives in version-controlled JSON files
- S3 is the authoritative source for runtime config
- Git provides change history and review process
- Deployments promote config, not copy it
The File Structure
cloudain-config/
├── brands/
│ ├── growain/
│ │ ├── production.json
│ │ ├── staging.json
│ │ └── development.json
│ ├── securitain/
│ │ ├── production.json
│ │ ├── staging.json
│ │ └── development.json
│ └── mindagain/
│ ├── production.json
│ ├── staging.json
│ └── development.json
├── models/
│ ├── bedrock-claude.json
│ ├── gpt-4.json
│ └── gpt-3.5.json
├── policies/
│ ├── refund-approval.json
│ ├── data-retention.json
│ └── rate-limits.json
└── locales/
├── tone-packs/
│ ├── growain-en.json
│ ├── growain-es.json
│ └── mindagain-ja.json
└── translations/
└── common.json
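Because the layout is regular, the repository path of a brand config (and the matching S3 key used later in this article) can be derived from the brand and environment rather than typed by hand. A small sketch, assuming only the three brands and three environments shown in the tree; `brandConfigKey` and `parseBrandConfigKey` are hypothetical names.

```typescript
const BRAND_IDS = ['growain', 'securitain', 'mindagain'] as const
const ENV_NAMES = ['development', 'staging', 'production'] as const

type BrandId = typeof BRAND_IDS[number]
type EnvName = typeof ENV_NAMES[number]

// Build the repo-relative path (also usable as the S3 object key) for a brand config.
function brandConfigKey(brand: BrandId, environment: EnvName): string {
  return `brands/${brand}/${environment}.json`
}

// Parse a key back into its parts; returns null for anything that is not a known brand config.
function parseBrandConfigKey(key: string): { brand: BrandId; environment: EnvName } | null {
  const match = /^brands\/([^/]+)\/([^/]+)\.json$/.exec(key)
  if (!match) return null
  const brand = BRAND_IDS.find(candidate => candidate === match[1])
  const environment = ENV_NAMES.find(candidate => candidate === match[2])
  return brand && environment ? { brand, environment } : null
}

console.log(brandConfigKey('growain', 'production'))
console.log(parseBrandConfigKey('brands/mindagain/staging.json'))
```

Keeping the key format in one function means the Git layout and the runtime lookup cannot silently disagree.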
Example Configuration File
brands/growain/production.json:
{
  "brandId": "growain",
  "environment": "production",
  "version": "2.4.1",
  "lastUpdated": "2025-01-21T10:30:00Z",
  "updatedBy": "platform-team@cloudain.com",
  "ai": {
    "primaryModel": {
      "provider": "bedrock",
      "modelId": "anthropic.claude-v2:1",
      "parameters": {
        "temperature": 0.7,
        "maxTokens": 800,
        "topP": 0.9
      }
    },
    "fallbackModel": {
      "provider": "openai",
      "modelId": "gpt-3.5-turbo",
      "parameters": {
        "temperature": 0.7,
        "maxTokens": 800
      }
    }
  },
  "features": {
    "chat": true,
    "campaignAnalysis": true,
    "automatedReporting": true,
    "multiLanguage": ["en", "es", "fr"]
  },
  "rateLimit": {
    "messagesPerMinute": 100,
    "messagesPerHour": 1000,
    "tokensPerDay": 500000
  },
  "security": {
    "turnstileEnabled": true,
    "piiRedaction": true,
    "auditLogging": true
  },
  "integrations": {
    "analytics": {
      "enabled": true,
      "provider": "mixpanel"
    },
    "crm": {
      "enabled": true,
      "provider": "salesforce"
    }
  }
}
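The code samples later in this article refer to a `BrandConfig` type without showing it. The sketch below infers one from the example file; which fields are optional is an assumption, and `selectModel` is a hypothetical helper added only to show the type in use.

```typescript
// Shape inferred from the example file; optionality of topP and fallbackModel is an assumption.
interface ModelSettings {
  provider: string
  modelId: string
  parameters: { temperature: number; maxTokens: number; topP?: number }
}

interface BrandConfig {
  brandId: string
  environment: 'development' | 'staging' | 'production'
  version: string
  lastUpdated: string
  updatedBy: string
  ai: { primaryModel: ModelSettings; fallbackModel?: ModelSettings }
  features: Record<string, boolean | string[]>
  rateLimit: { messagesPerMinute: number; messagesPerHour: number; tokensPerDay: number }
  security: { turnstileEnabled: boolean; piiRedaction: boolean; auditLogging: boolean }
  integrations: Record<string, { enabled: boolean; provider: string }>
}

// Hypothetical helper: use the fallback model only when the primary provider is unavailable.
function selectModel(config: BrandConfig, primaryAvailable: boolean): ModelSettings {
  if (primaryAvailable || !config.ai.fallbackModel) return config.ai.primaryModel
  return config.ai.fallbackModel
}

// Trimmed copy of the example file, checked against the interface by the compiler.
const exampleGrowainConfig: BrandConfig = {
  brandId: 'growain',
  environment: 'production',
  version: '2.4.1',
  lastUpdated: '2025-01-21T10:30:00Z',
  updatedBy: 'platform-team@cloudain.com',
  ai: {
    primaryModel: {
      provider: 'bedrock',
      modelId: 'anthropic.claude-v2:1',
      parameters: { temperature: 0.7, maxTokens: 800, topP: 0.9 }
    },
    fallbackModel: {
      provider: 'openai',
      modelId: 'gpt-3.5-turbo',
      parameters: { temperature: 0.7, maxTokens: 800 }
    }
  },
  features: { chat: true, multiLanguage: ['en', 'es', 'fr'] },
  rateLimit: { messagesPerMinute: 100, messagesPerHour: 1000, tokensPerDay: 500000 },
  security: { turnstileEnabled: true, piiRedaction: true, auditLogging: true },
  integrations: { analytics: { enabled: true, provider: 'mixpanel' } }
}

console.log(selectModel(exampleGrowainConfig, false).modelId) // gpt-3.5-turbo
```

A typed shape gives editors and the compiler a first line of defence before schema validation runs.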
CoreCloud-Managed Configuration
Secure Storage in S3
// Upload configuration to S3 with encryption
async function uploadConfig(
  brand: string,
  environment: string,
  config: BrandConfig
) {
  // Validate config against schema
  const validation = await validateConfig(config)
  if (!validation.valid) {
    throw new ValidationError(validation.errors)
  }
  // Load the config being replaced so the audit log can record a diff
  // (null on the first upload for a brand/environment)
  const currentConfig = await loadBrandConfig(brand, environment).catch(() => null)
  // Encrypt using KMS
  // (KMS Encrypt accepts at most 4 KB of plaintext, so larger configs need envelope encryption)
  const encrypted = await KMS.encrypt({
    KeyId: CORECLOUD_KMS_KEY,
    Plaintext: JSON.stringify(config)
  })
  // Upload to S3
  await S3.putObject({
    Bucket: 'cloudain-config',
    Key: `brands/${brand}/${environment}.json`,
    Body: encrypted.CiphertextBlob,
    ServerSideEncryption: 'aws:kms',
    SSEKMSKeyId: CORECLOUD_KMS_KEY,
    Metadata: {
      version: config.version,
      updatedBy: config.updatedBy,
      lastUpdated: config.lastUpdated
    }
  })
  // Invalidate cache
  await Redis.del(`config:${brand}:${environment}`)
  // Log change to audit trail
  await CoreCloud.logConfigChange({
    brand: brand,
    environment: environment,
    version: config.version,
    changedBy: config.updatedBy,
    changes: diffConfig(currentConfig, config)
  })
}
Loading Configuration at Runtime
// AgenticCloud loads config from CoreCloud
async function loadBrandConfig(
  brand: string,
  environment: string
): Promise<BrandConfig> {
  // Check cache first
  const cached = await Redis.get(`config:${brand}:${environment}`)
  if (cached) {
    return JSON.parse(cached)
  }
  // Load from S3
  const encrypted = await S3.getObject({
    Bucket: 'cloudain-config',
    Key: `brands/${brand}/${environment}.json`
  })
  // Decrypt using KMS
  const decrypted = await KMS.decrypt({
    CiphertextBlob: encrypted.Body
  })
  const config = JSON.parse(decrypted.Plaintext.toString())
  // Cache for 5 minutes
  await Redis.setex(
    `config:${brand}:${environment}`,
    300,
    JSON.stringify(config)
  )
  return config
}
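The loader above follows the cache-aside pattern: read the cache, fall back to the store on a miss, then cache the result for five minutes. A dependency-free sketch of the same pattern, using an in-memory map as a stand-in for Redis; `TtlCache` and `loadWithCache` are hypothetical names, and the injectable clock exists only so expiry can be tested without waiting.

```typescript
// In-memory stand-in for the Redis get / setex / del calls.
class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>()
  private ttlMs: number
  private now: () => number

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs
    this.now = now
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key) // expired: behave like a cache miss
      return undefined
    }
    return entry.value
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs })
  }

  delete(key: string): void {
    this.entries.delete(key)
  }
}

// Cache-aside: return the cached value if still fresh, otherwise fetch it and store it.
async function loadWithCache<T>(
  cache: TtlCache<T>,
  key: string,
  fetchFresh: () => Promise<T>
): Promise<T> {
  const cached = cache.get(key)
  if (cached !== undefined) return cached
  const fresh = await fetchFresh()
  cache.set(key, fresh)
  return fresh
}
```

The TTL bounds how long a stale config can survive if an explicit invalidation is missed, which is why the upload path deletes the key and the loader still sets an expiry.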
Secret Management
// Secrets separate from config
const config = await loadBrandConfig('growain', 'production')
// Secrets loaded from AWS Secrets Manager
const secrets = await SecretsManager.getSecretValue({
  SecretId: `cloudain/${config.brandId}/${config.environment}`
})
const apiKeys = JSON.parse(secrets.SecretString)
// Combine for runtime use
const runtimeConfig = {
  ...config,
  secrets: apiKeys
}
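Once config and secrets are merged into one runtime object, logging that object verbatim would leak API keys. One way to guard against this is a redaction helper applied before anything is logged. This is a sketch, assuming secrets sit under a `secrets` key as in the snippet above; `redactForLogging` is a hypothetical name and the key value is a placeholder.

```typescript
// Return a copy that is safe to log: every value under `secrets` is masked.
// The input object is not modified.
function redactForLogging<T extends { secrets?: Record<string, string> }>(runtimeConfig: T): T {
  if (!runtimeConfig.secrets) return runtimeConfig
  const masked: Record<string, string> = {}
  for (const key of Object.keys(runtimeConfig.secrets)) {
    masked[key] = '***'
  }
  return { ...runtimeConfig, secrets: masked }
}

// Placeholder values for illustration only.
const sampleRuntimeConfig = {
  brandId: 'growain',
  environment: 'production',
  secrets: { openaiApiKey: 'example-not-a-real-key' }
}

console.log(JSON.stringify(redactForLogging(sampleRuntimeConfig)))
```

Key names stay visible, so logs still show which secrets were loaded without exposing their values.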
Promotion API Ensures Consistency
The Workflow
┌────────────────────────────────────────┐
│ 1. Developer Updates Config Locally │
│ Edit JSON file in Git repo │
└─────────────────┬──────────────────────┘
│
▼
┌────────────────────────────────────────┐
│ 2. Create Pull Request │
│ Automated validation runs │
│ Team reviews changes │
└─────────────────┬──────────────────────┘
│
▼
┌────────────────────────────────────────┐
│ 3. Merge to Main │
│ CI/CD deploys to development │
└─────────────────┬──────────────────────┘
│
▼
┌────────────────────────────────────────┐
│ 4. Promotion API Call │
│ POST /api/promote-config │
│ from: development │
│ to: staging │
└─────────────────┬──────────────────────┘
│
▼
┌────────────────────────────────────────┐
│ 5. Automated Tests Run on Staging │
│ Validate config works │
└─────────────────┬──────────────────────┘
│
▼
┌────────────────────────────────────────┐
│ 6. Promote to Production │
│ POST /api/promote-config │
│ from: staging │
│ to: production │
│ requiresApproval: true │
└────────────────────────────────────────┘
Promotion API Implementation
// Promotion endpoint
app.post('/api/promote-config', async (req, res) => {
  const { brand, fromEnv, toEnv } = req.body
  // 1. Validate promotion path
  const validPromotions = {
    'development': ['staging'],
    'staging': ['production']
  }
  if (!validPromotions[fromEnv]?.includes(toEnv)) {
    return res.status(400).json({
      error: 'Invalid promotion path'
    })
  }
  // 2. Check approval requirement
  if (toEnv === 'production') {
    const approval = await CoreCloud.getApproval({
      type: 'config_promotion',
      brand: brand,
      requester: req.user.id
    })
    if (!approval.approved) {
      return res.status(403).json({
        error: 'Production promotion requires approval'
      })
    }
  }
  // 3. Load source config
  const sourceConfig = await loadBrandConfig(brand, fromEnv)
  // 4. Validate config
  const validation = await validateConfig(sourceConfig)
  if (!validation.valid) {
    return res.status(400).json({
      error: 'Config validation failed',
      errors: validation.errors
    })
  }
  // 5. Run smoke tests
  const smokeTests = await runSmokeTests(brand, sourceConfig, toEnv)
  if (!smokeTests.passed) {
    return res.status(400).json({
      error: 'Smoke tests failed',
      failures: smokeTests.failures
    })
  }
  // 6. Create backup of current production config
  if (toEnv === 'production') {
    await backupConfig(brand, toEnv)
  }
  // 7. Promote config. The new version is computed once so that the upload,
  //    the audit log, and the response all report the same value.
  const newVersion = incrementVersion(sourceConfig.version)
  await uploadConfig(brand, toEnv, {
    ...sourceConfig,
    environment: toEnv,
    version: newVersion,
    promotedFrom: fromEnv,
    promotedBy: req.user.email,
    promotedAt: new Date().toISOString()
  })
  // 8. Trigger cache refresh
  await refreshConfigCache(brand, toEnv)
  // 9. Log promotion
  await CoreCloud.logConfigPromotion({
    brand: brand,
    from: fromEnv,
    to: toEnv,
    version: newVersion,
    promotedBy: req.user.email
  })
  res.json({
    success: true,
    brand: brand,
    environment: toEnv,
    version: newVersion
  })
})
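The endpoint relies on an `incrementVersion` helper and a promotion-path check that the article does not spell out. A minimal sketch of both follows. It assumes versions use the MAJOR.MINOR.PATCH format seen in the example config and that a promotion bumps the patch component; the article does not say which component is actually bumped, and `isValidPromotion` is a hypothetical name.

```typescript
// Bump the patch component of a MAJOR.MINOR.PATCH version string.
// Assumption: promotions count as patch-level changes.
function incrementVersion(version: string): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version)
  if (!match) {
    throw new Error(`Invalid version "${version}", expected MAJOR.MINOR.PATCH`)
  }
  const [, major, minor, patch] = match
  return `${major}.${minor}.${Number(patch) + 1}`
}

// Allowed promotion paths, mirroring the endpoint: config can only move one stage forward.
const PROMOTION_PATHS: Record<string, string[]> = {
  development: ['staging'],
  staging: ['production']
}

function isValidPromotion(fromEnv: string, toEnv: string): boolean {
  return (PROMOTION_PATHS[fromEnv] ?? []).indexOf(toEnv) !== -1
}

console.log(incrementVersion('2.4.1')) // 2.4.2
console.log(isValidPromotion('development', 'production')) // false
```

Rejecting `development` to `production` at this level is what forces every change through staging tests before it can reach users.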
Automated Validation
// Validation schema
import Ajv from 'ajv'

const configSchema = {
  type: 'object',
  required: ['brandId', 'environment', 'version', 'ai', 'features'],
  properties: {
    brandId: {
      type: 'string',
      enum: ['growain', 'securitain', 'mindagain', 'corefinops']
    },
    environment: {
      type: 'string',
      enum: ['development', 'staging', 'production']
    },
    version: {
      type: 'string',
      pattern: '^\\d+\\.\\d+\\.\\d+$'
    },
    ai: {
      type: 'object',
      required: ['primaryModel'],
      properties: {
        primaryModel: {
          type: 'object',
          required: ['provider', 'modelId', 'parameters'],
          properties: {
            // temperature and maxTokens live under "parameters" in the config
            // files, so the range checks must be nested there to take effect
            parameters: {
              type: 'object',
              properties: {
                temperature: {
                  type: 'number',
                  minimum: 0,
                  maximum: 1
                },
                maxTokens: {
                  type: 'integer',
                  minimum: 100,
                  maximum: 32000
                }
              }
            }
          }
        }
      }
    }
  }
}
// Validate function
async function validateConfig(config: any): Promise<ValidationResult> {
  const ajv = new Ajv()
  const valid = ajv.validate(configSchema, config)
  if (!valid) {
    return {
      valid: false,
      errors: ajv.errors
    }
  }
  // Additional business logic validation
  if (config.environment === 'production') {
    if (config.ai.primaryModel.parameters.temperature > 0.8) {
      return {
        valid: false,
        errors: ['Production temperature should not exceed 0.8']
      }
    }
  }
  return { valid: true }
}
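The business-logic check above stops at the first failure. As more rules accumulate, it can help to express them as data and report every violation at once. A sketch under stated assumptions: only the temperature rule comes from the validator above, the two security rules mirror the unit tests later in this article, and `checkBusinessRules` and its types are hypothetical names.

```typescript
// Only the fields the rules need; a subset of the full config shape.
interface RuleInput {
  environment: string
  ai: { primaryModel: { parameters: { temperature: number } } }
  security?: { piiRedaction?: boolean; auditLogging?: boolean }
}

interface BusinessRule {
  description: string
  appliesTo: (config: RuleInput) => boolean
  passes: (config: RuleInput) => boolean
}

const isProductionConfig = (config: RuleInput): boolean => config.environment === 'production'

const businessRules: BusinessRule[] = [
  {
    description: 'Production temperature should not exceed 0.8',
    appliesTo: isProductionConfig,
    passes: config => config.ai.primaryModel.parameters.temperature <= 0.8
  },
  {
    description: 'Production must enable PII redaction',
    appliesTo: isProductionConfig,
    passes: config => config.security?.piiRedaction === true
  },
  {
    description: 'Production must enable audit logging',
    appliesTo: isProductionConfig,
    passes: config => config.security?.auditLogging === true
  }
]

// Evaluate every applicable rule and collect all failures instead of returning on the first.
function checkBusinessRules(config: RuleInput): { valid: boolean; errors: string[] } {
  const errors = businessRules
    .filter(rule => rule.appliesTo(config) && !rule.passes(config))
    .map(rule => rule.description)
  return { valid: errors.length === 0, errors }
}

console.log(checkBusinessRules({
  environment: 'production',
  ai: { primaryModel: { parameters: { temperature: 0.9 } } },
  security: { piiRedaction: false, auditLogging: true }
}))
```

Reporting all violations in one pass saves a review round trip when a pull request breaks more than one rule.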
Faster Launches, Fewer Manual Errors
Before SSOT: Traditional Deployment
1. Developer writes code [2 hours]
2. Hard-codes configuration [30 min]
3. Creates pull request [15 min]
4. Code review [1 day]
5. Merge to main [5 min]
6. Deploy to staging [20 min]
7. Manually copy config to staging env vars [15 min]
8. Test in staging [2 hours]
9. Find config issue (temp wrong) [1 hour debugging]
10. Fix and redeploy [40 min]
11. Deploy to production [30 min]
12. Manually copy config to prod env vars [20 min]
13. Verify production [30 min]
Total: ~3 days, 2 manual steps, high error risk
After SSOT: Config-as-Data
1. Developer updates config JSON [15 min]
2. Creates pull request [5 min]
3. Automated validation runs [2 min]
4. Team reviews config changes [30 min]
5. Merge to main [2 min]
6. Auto-deploy to development [3 min]
7. Promotion API: dev → staging [2 min]
8. Automated tests run [5 min]
9. Promotion API: staging → prod [2 min]
10. Verify production [10 min]
Total: ~1 hour, 0 manual steps, low error risk
Error Rate Reduction
6 months before SSOT:
- 23 production incidents related to config
- 47 staging/production drift issues
- 156 hours spent debugging config problems
6 months after SSOT:
- 2 production incidents (both network, not config)
- 0 drift issues
- 8 hours total on config troubleshooting
Impact: 95% reduction in config-related issues
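For readers who want to check the headline figure, the reduction can be computed per metric from the numbers listed above. A small sketch; `percentReduction` is a hypothetical helper name.

```typescript
// Percentage drop from `before` to `after`.
function percentReduction(before: number, after: number): number {
  if (before <= 0) throw new Error('before must be positive')
  return ((before - after) / before) * 100
}

console.log(percentReduction(156, 8).toFixed(1)) // debugging hours: 94.9
console.log(percentReduction(47, 0).toFixed(1))  // drift issues: 100.0
console.log(percentReduction(23, 2).toFixed(1))  // production incidents: 91.3
```

The 95% figure lines up with the drop in debugging hours. Drift issues fell by 100%, and raw incident counts fell by about 91%, although neither of the two remaining incidents was caused by configuration.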
Version Control and Rollback
Git as Change History
# View config change history
git log --oneline brands/growain/production.json
# Output:
a3f8c91 Increase token limit for enterprise users
b2e7d85 Enable campaign analysis feature
c1d6f73 Switch to Claude v2.1
d0c5e62 Add Spanish language support
Instant Rollback
// Rollback to previous version
async function rollbackConfig(
  brand: string,
  environment: string,
  targetVersion: string,
  requestedBy: string // email of the user performing the rollback
) {
  // Get previous config from Git
  const previousConfig = await getConfigFromGit(
    brand,
    environment,
    targetVersion
  )
  // Validate it still works
  const validation = await validateConfig(previousConfig)
  if (!validation.valid) {
    throw new Error('Previous config no longer valid')
  }
  // Create backup of current config
  await backupConfig(brand, environment)
  // Upload previous config under a new version number. Incrementing the
  // current version (not the old one) avoids reusing the version being rolled back.
  const currentVersion = await getCurrentVersion(brand, environment)
  await uploadConfig(brand, environment, {
    ...previousConfig,
    version: incrementVersion(currentVersion),
    rolledBackFrom: currentVersion,
    rolledBackBy: requestedBy,
    rolledBackAt: new Date().toISOString()
  })
  // Refresh cache
  await refreshConfigCache(brand, environment)
  return {
    success: true,
    rolledBackTo: targetVersion
  }
}
Real Incident:
12:15 PM: Growain production updated
12:22 PM: User reports strange behavior
12:23 PM: Team investigates
12:26 PM: Decision to rollback
12:27 PM: Rollback executed
12:28 PM: Service restored
Time from first user report to restored service: 6 minutes (vs. 45+ min with code rollback)
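During an incident nobody wants to work out by hand which version to roll back to. The sketch below picks the most recent version older than the current one from a list of known versions. It assumes MAJOR.MINOR.PATCH strings as in the example config; `compareVersions` and `previousVersion` are hypothetical helpers, and how the version list is obtained (for example from Git history) is left open.

```typescript
// Compare two MAJOR.MINOR.PATCH strings numerically.
// Returns a negative number if a < b, positive if a > b, and 0 if equal.
function compareVersions(a: string, b: string): number {
  const partsA = a.split('.').map(Number)
  const partsB = b.split('.').map(Number)
  for (let i = 0; i < 3; i++) {
    const diff = (partsA[i] ?? 0) - (partsB[i] ?? 0)
    if (diff !== 0) return diff
  }
  return 0
}

// Most recent version strictly older than `current`, or undefined if there is none.
function previousVersion(history: string[], current: string): string | undefined {
  return history
    .filter(version => compareVersions(version, current) < 0)
    .sort(compareVersions)
    .pop()
}

console.log(previousVersion(['2.3.9', '2.4.0', '2.4.1', '2.10.0'], '2.4.1')) // 2.4.0
```

Comparing components numerically matters here: a plain string sort would place `2.10.0` before `2.4.0`.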
Configuration Testing
Unit Tests for Config
// Test configuration files
describe('Growain Production Config', () => {
  let config: BrandConfig
  beforeAll(async () => {
    config = await loadConfig('brands/growain/production.json')
  })
  test('should have valid schema', async () => {
    // validateConfig is async, so the result must be awaited
    const validation = await validateConfig(config)
    expect(validation.valid).toBe(true)
  })
  test('should use production-approved models', () => {
    // Includes the exact modelId used in production.json ("anthropic.claude-v2:1")
    const approvedModels = ['anthropic.claude-v2', 'anthropic.claude-v2:1', 'gpt-4']
    expect(approvedModels).toContain(config.ai.primaryModel.modelId)
  })
  test('should have conservative temperature for production', () => {
    expect(config.ai.primaryModel.parameters.temperature).toBeLessThanOrEqual(0.8)
  })
  test('should have security features enabled', () => {
    expect(config.security.turnstileEnabled).toBe(true)
    expect(config.security.piiRedaction).toBe(true)
    expect(config.security.auditLogging).toBe(true)
  })
  test('should have rate limits configured', () => {
    expect(config.rateLimit).toBeDefined()
    expect(config.rateLimit.messagesPerMinute).toBeGreaterThan(0)
  })
})
Integration Tests
// Test config works with actual services
describe('Config Integration Tests', () => {
  test('should successfully load and use config', async () => {
    const config = await loadBrandConfig('growain', 'staging')
    // Test AI model access
    const response = await AgenticCloud.generate({
      brand: 'growain',
      config: config,
      input: 'Test message'
    })
    expect(response).toBeDefined()
    expect(response.error).toBeUndefined()
  })
  test('should respect rate limits', async () => {
    const config = await loadBrandConfig('growain', 'staging')
    const limit = config.rateLimit.messagesPerMinute
    // Send limit + 1 messages
    const requests = Array(limit + 1).fill(null).map(() =>
      sendMessage('Test')
    )
    const results = await Promise.allSettled(requests)
    const rejected = results.filter(r => r.status === 'rejected')
    expect(rejected.length).toBeGreaterThan(0)
  })
})
CI/CD Integration
GitHub Actions Workflow
name: Deploy Config
on:
  push:
    branches: [main]
    paths:
      - 'brands/**/*.json'
      - 'policies/**/*.json'
      - 'models/**/*.json'
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Validate JSON syntax
        # json_verify (from yajl-tools) reads JSON from stdin. Using "-exec ... +"
        # makes find exit non-zero when any file fails, so the step actually fails.
        run: |
          find brands -name "*.json" -exec sh -c 'for f; do json_verify < "$f" || exit 1; done' _ {} +
      - name: Validate config schema
        run: |
          npm run validate:config
      - name: Run config tests
        run: |
          npm test -- config.test.ts
  deploy-dev:
    needs: validate
    runs-on: ubuntu-latest
    steps:
      # Each job starts from a clean runner, so the repo must be checked out again.
      # AWS credentials are assumed to be configured for the runner.
      - uses: actions/checkout@v3
      - name: Deploy to development
        run: |
          aws s3 sync brands/ s3://cloudain-config/brands/ \
            --exclude "*" \
            --include "*/development.json" \
            --sse aws:kms \
            --sse-kms-key-id ${{ secrets.KMS_KEY_ID }}
      - name: Invalidate cache
        # AWS CLI v2 expects a base64 payload unless told the input is raw JSON
        run: |
          aws lambda invoke \
            --function-name invalidate-config-cache \
            --cli-binary-format raw-in-base64-out \
            --payload '{"environment":"development"}' \
            response.json
  promote-staging:
    needs: deploy-dev
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run smoke tests on dev
        run: npm run test:smoke:dev
      - name: Promote to staging
        # --fail makes the step fail on HTTP 4xx/5xx responses.
        # Note: the promotion endpoint shown earlier also reads "brand" from the body.
        run: |
          curl --fail -X POST https://api.cloudain.com/promote-config \
            -H "Authorization: Bearer ${{ secrets.API_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{"fromEnv":"development","toEnv":"staging"}'
  # Production promotion requires manual approval
  promote-production:
    needs: promote-staging
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Promote to production
        run: |
          curl --fail -X POST https://api.cloudain.com/promote-config \
            -H "Authorization: Bearer ${{ secrets.API_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{"fromEnv":"staging","toEnv":"production"}'
Real-World Impact
Deployment Velocity
Measured improvements after adopting Configuration-as-Data (SSOT results):
Metric                      | Before             | After   | Change
Config change time          | 3 days             | 1 hour  | 24x faster
Manual steps                | 2                  | 0       | 100% elimination
Error rate                  | 8%                 | <0.5%   | 94% reduction
Rollback time               | 45+ min            | <5 min  | 9x faster
Environment drift incidents | 47 cases / quarter | 0 cases | Perfect sync
Developer Experience
Before:
"I need to change the AI temperature"
→ Find where it's configured (code? env var? database?)
→ Update in development
→ Test locally
→ Update in staging (manually)
→ Test in staging
→ Update in production (manually)
→ Hope they match
→ 3 days later, discover they don't match
After:
"I need to change the AI temperature"
→ Edit brands/growain/production.json
→ Create PR (auto-validated)
→ Merge
→ Auto-promoted through environments
→ Done in 1 hour
→ Perfect consistency guaranteed
Conclusion
Configuration-as-data transformed Cloudain's operations from chaotic to predictable. By treating config as a first-class citizen (version controlled, validated, tested, and promoted), we achieved:
Consistency:
- 0 drift between environments
- Identical behavior in staging and production
- Predictable deployments
Speed:
- 24x faster config changes
- <5 min rollback vs. 45+ min
- Deploy multiple times per day safely
Quality:
- 94% reduction in config errors
- Automated validation prevents mistakes
- Git history provides audit trail
Key principles:
- Config is data, not code
- Git is source of truth
- S3 is runtime authority
- Promote, don't copy
- Test everything
CoreCloud manages it all: encryption, access control, promotion workflows, and audit trails.
The result: Cloudain deploys faster, with confidence, and without the constant fear of configuration drift.

Cloudain Editorial Team
