The Source of Truth Revolution: Why Config-as-Data Is the Future of AI Ops

Posted by

Cloudain Editorial Team

DevOps & AI

How Cloudain eliminated configuration drift, accelerated deployments, and unified 6 AI products using encrypted JSON as the Single Source of Truth.

Published

2025-01-21

Introduction

March 2023: Growain's AI behaves differently in staging than production. Nobody knows why.

April 2023: Urgent fix needed for Securitain. Deployment blocked because configuration is hard-coded. Takes 3 hours to release.

May 2023: MindAgain's tone changes after a deployment. Rollback required. The team discovers that someone manually edited the prompt in a Lambda environment variable. No audit trail exists.

June 2023: We decided enough was enough.

This article shares how we overhauled Cloudain's operations by treating configuration as data: a Single Source of Truth (SSOT) that took our AI platform from chaos to predictability.

Drift and Chaos in Multi-AI Systems

The Problem

When configuration lives in code, environment variables, or (worse) manual edits, drift is inevitable:

TYPESCRIPT
// securitain-handler.ts
const MODEL_CONFIG = {
  provider: 'bedrock',
  model: 'claude-v2',
  temperature: 0.0,
  maxTokens: 500
}

Problems:

  • Changing model requires code deployment
  • Can't A/B test configurations
  • No rollback without redeploying code
  • Different values in dev/staging/prod (manual copying)

BASH
# .env.production
GROWAIN_MODEL=gpt-4
GROWAIN_TEMPERATURE=0.7
GROWAIN_MAX_TOKENS=800

# .env.staging
GROWAIN_MODEL=gpt-3.5-turbo  # ??? Why different?
GROWAIN_TEMPERATURE=0.5      # ??? Who changed this?
GROWAIN_MAX_TOKENS=600       # ??? Not documented

Problems:

  • Drift between environments
  • No change history
  • No validation before deployment
  • Manual synchronization required

SQL
-- Configuration table
INSERT INTO config (key, value) VALUES
  ('growain_model', 'gpt-4'),
  ('growain_temperature', '0.7');

-- Updated by different team member
UPDATE config SET value='0.8'
WHERE key='growain_temperature';
-- WHY? WHO? WHEN? Unknown.

Problems:

  • No version control
  • No code review process
  • Difficult to replicate environments
  • Audit trail limited

The Cost of Drift

Real incident:

CODE
Production Growain: Creative, engaging responses (temp: 0.7)
Staging Growain: Robotic, repetitive responses (temp: 0.1)

Testing in staging looked bad.
Team spent 2 days "debugging" code.
Issue was configuration drift.
Cost: 16 engineering hours wasted
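
Drift of this kind is mechanical to detect once both environments' settings are available as data. The following is a minimal sketch, assuming flat `.env`-style text like the earlier example; `parseEnv` and `findDrift` are hypothetical helper names and are not part of Cloudain's tooling.

```typescript
// Hypothetical helpers for illustration; not taken from Cloudain's tooling.
type EnvMap = Record<string, string>

interface DriftEntry {
  key: string
  left?: string
  right?: string
}

// Parse KEY=VALUE lines; skips blank lines, full-line comments, and trailing "# ..." notes.
function parseEnv(text: string): EnvMap {
  const result: EnvMap = {}
  for (const rawLine of text.split('\n')) {
    const line = rawLine.replace(/\s+#.*$/, '').trim()
    const eq = line.indexOf('=')
    if (line === '' || line.charAt(0) === '#' || eq === -1) continue
    result[line.slice(0, eq).trim()] = line.slice(eq + 1).trim()
  }
  return result
}

// List every key whose value differs, or is missing, between two environments.
function findDrift(left: EnvMap, right: EnvMap): DriftEntry[] {
  const seen: Record<string, boolean> = {}
  Object.keys(left).concat(Object.keys(right)).forEach(k => { seen[k] = true })
  return Object.keys(seen)
    .sort()
    .filter(k => left[k] !== right[k])
    .map(k => ({ key: k, left: left[k], right: right[k] }))
}

const productionEnv = parseEnv('GROWAIN_MODEL=gpt-4\nGROWAIN_TEMPERATURE=0.7\nGROWAIN_MAX_TOKENS=800')
const stagingEnv = parseEnv('GROWAIN_MODEL=gpt-3.5-turbo  # ??? Why different?\nGROWAIN_TEMPERATURE=0.5\nGROWAIN_MAX_TOKENS=600')

for (const entry of findDrift(productionEnv, stagingEnv)) {
  console.log(`${entry.key}: production=${entry.left} staging=${entry.right}`)
}
```

Run against the two `.env` files shown earlier, a check like this would have flagged all three keys before anyone started debugging application code.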

JSON-Based Config Pipelines

The SSOT Principle

Single Source of Truth:

  • All configuration lives in version-controlled JSON files
  • S3 is the authoritative source for runtime config
  • Git provides change history and review process
  • Deployments promote config between environments rather than copying it by hand

The File Structure

CODE
cloudain-config/
├── brands/
│   ├── growain/
│   │   ├── production.json
│   │   ├── staging.json
│   │   └── development.json
│   ├── securitain/
│   │   ├── production.json
│   │   ├── staging.json
│   │   └── development.json
│   └── mindagain/
│       ├── production.json
│       ├── staging.json
│       └── development.json
├── models/
│   ├── bedrock-claude.json
│   ├── gpt-4.json
│   └── gpt-3.5.json
├── policies/
│   ├── refund-approval.json
│   ├── data-retention.json
│   └── rate-limits.json
└── locales/
    ├── tone-packs/
    │   ├── growain-en.json
    │   ├── growain-es.json
    │   └── mindagain-ja.json
    └── translations/
        └── common.json
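
Because every brand follows the same `brands/<brand>/<environment>.json` layout, paths can be derived rather than typed by hand. A small sketch of that convention follows; the helper names `configKey` and `parseConfigKey` are assumptions made for illustration.

```typescript
// Path convention taken from the repository layout above; helper names are hypothetical.
const ENVIRONMENTS = ['development', 'staging', 'production']

// Build the repository path (and S3 key) for a brand/environment pair.
function configKey(brand: string, environment: string): string {
  if (ENVIRONMENTS.indexOf(environment) === -1) {
    throw new Error(`Unknown environment: ${environment}`)
  }
  return `brands/${brand}/${environment}.json`
}

// Recover brand and environment from a path, or null if the path is not a brand config.
function parseConfigKey(key: string): { brand: string; environment: string } | null {
  const match = /^brands\/([a-z0-9-]+)\/(development|staging|production)\.json$/.exec(key)
  return match ? { brand: match[1], environment: match[2] } : null
}

console.log(configKey('growain', 'staging'))
console.log(parseConfigKey('brands/mindagain/production.json'))
```

Deriving keys in one place means a typo in an environment name fails loudly instead of silently creating a fourth environment.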

Example Configuration File

growain/production.json:

JSON
{
  "brandId": "growain",
  "environment": "production",
  "version": "2.4.1",
  "lastUpdated": "2025-01-21T10:30:00Z",
  "updatedBy": "platform-team@cloudain.com",

  "ai": {
    "primaryModel": {
      "provider": "bedrock",
      "modelId": "anthropic.claude-v2:1",
      "parameters": {
        "temperature": 0.7,
        "maxTokens": 800,
        "topP": 0.9
      }
    },
    "fallbackModel": {
      "provider": "openai",
      "modelId": "gpt-3.5-turbo",
      "parameters": {
        "temperature": 0.7,
        "maxTokens": 800
      }
    }
  },

  "features": {
    "chat": true,
    "campaignAnalysis": true,
    "automatedReporting": true,
    "multiLanguage": ["en", "es", "fr"]
  },

  "rateLimit": {
    "messagesPerMinute": 100,
    "messagesPerHour": 1000,
    "tokensPerDay": 500000
  },

  "security": {
    "turnstileEnabled": true,
    "piiRedaction": true,
    "auditLogging": true
  },

  "integrations": {
    "analytics": {
      "enabled": true,
      "provider": "mixpanel"
    },
    "crm": {
      "enabled": true,
      "provider": "salesforce"
    }
  }
}
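
The code in the following sections refers to a `BrandConfig` type that the article never shows. One plausible shape, inferred only from the example file above, is sketched here; the real type may differ, and `parseBrandConfig` is a hypothetical helper.

```typescript
// Inferred from the example JSON; the real BrandConfig type is not shown in the article.
interface ModelConfig {
  provider: string
  modelId: string
  parameters: { temperature: number; maxTokens: number; topP?: number }
}

interface BrandConfig {
  brandId: string
  environment: 'development' | 'staging' | 'production'
  version: string
  lastUpdated: string
  updatedBy: string
  ai: { primaryModel: ModelConfig; fallbackModel?: ModelConfig }
  features: { chat: boolean; campaignAnalysis: boolean; automatedReporting: boolean; multiLanguage: string[] }
  rateLimit: { messagesPerMinute: number; messagesPerHour: number; tokensPerDay: number }
  security: { turnstileEnabled: boolean; piiRedaction: boolean; auditLogging: boolean }
  integrations: Record<string, { enabled: boolean; provider: string }>
}

// Parsing is where untyped JSON enters the program, so check the basics before casting.
// Full validation is left to the schema check described later.
function parseBrandConfig(json: string): BrandConfig {
  const data = JSON.parse(json)
  if (typeof data !== 'object' || data === null) {
    throw new Error('Config must be a JSON object')
  }
  for (const field of ['brandId', 'environment', 'version', 'ai']) {
    if (!(field in data)) throw new Error(`Missing required field: ${field}`)
  }
  return data as BrandConfig
}
```

A typed shape gives editors and the compiler something to check against, while the JSON file itself stays the source of truth.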

CoreCloud-Managed Configuration

Secure Storage in S3

TYPESCRIPT
// Upload configuration to S3 with encryption
// (KMS, S3, Redis, and CoreCloud are client wrappers defined elsewhere)
async function uploadConfig(
  brand: string,
  environment: string,
  config: BrandConfig
) {
  // Validate config against schema
  const validation = await validateConfig(config)
  if (!validation.valid) {
    throw new ValidationError(validation.errors)
  }

  // Capture the currently deployed config so the audit entry can record a diff
  // (null on the first upload for this brand/environment)
  const currentConfig = await loadBrandConfig(brand, environment).catch(() => null)

  // Encrypt using KMS
  // Note: a direct KMS Encrypt call accepts at most 4 KB of plaintext;
  // larger configs need envelope encryption with a data key
  const encrypted = await KMS.encrypt({
    KeyId: CORECLOUD_KMS_KEY,
    Plaintext: JSON.stringify(config)
  })

  // Upload to S3
  await S3.putObject({
    Bucket: 'cloudain-config',
    Key: `brands/${brand}/${environment}.json`,
    Body: encrypted.CiphertextBlob,
    ServerSideEncryption: 'aws:kms',
    SSEKMSKeyId: CORECLOUD_KMS_KEY,
    Metadata: {
      version: config.version,
      updatedBy: config.updatedBy,
      lastUpdated: config.lastUpdated
    }
  })

  // Invalidate cache
  await Redis.del(`config:${brand}:${environment}`)

  // Log change to audit trail
  await CoreCloud.logConfigChange({
    brand: brand,
    environment: environment,
    version: config.version,
    changedBy: config.updatedBy,
    changes: diffConfig(currentConfig, config)
  })
}
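
The upload function calls a `diffConfig` helper that the article does not show. A minimal sketch of what such a helper could look like is given below, assuming the audit trail wants a flat list of changed paths; the `ConfigChange` shape is an assumption.

```typescript
// Hypothetical implementation; the article's real diffConfig is not shown.
interface ConfigChange {
  path: string
  before: unknown
  after: unknown
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

// Walk both configs in parallel and report each leaf whose value changed.
function diffConfig(before: unknown, after: unknown, prefix = ''): ConfigChange[] {
  if (isPlainObject(before) && isPlainObject(after)) {
    const seen: Record<string, boolean> = {}
    Object.keys(before).concat(Object.keys(after)).forEach(k => { seen[k] = true })
    let changes: ConfigChange[] = []
    for (const key of Object.keys(seen).sort()) {
      const path = prefix ? `${prefix}.${key}` : key
      changes = changes.concat(diffConfig(before[key], after[key], path))
    }
    return changes
  }
  // Leaves (and arrays) are compared by their JSON representation.
  return JSON.stringify(before) === JSON.stringify(after)
    ? []
    : [{ path: prefix, before, after }]
}

const exampleChanges = diffConfig(
  { ai: { primaryModel: { parameters: { temperature: 0.7, maxTokens: 800 } } } },
  { ai: { primaryModel: { parameters: { temperature: 0.8, maxTokens: 800 } } } }
)
console.log(exampleChanges)
```

Recording dotted paths such as `ai.primaryModel.parameters.temperature` keeps audit entries small and answers the "what changed?" question directly.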

Loading Configuration at Runtime

TYPESCRIPT
// AgenticCloud loads config from CoreCloud
async function loadBrandConfig(
  brand: string,
  environment: string
): Promise<BrandConfig> {
  // Check cache first
  const cached = await Redis.get(`config:${brand}:${environment}`)
  if (cached) {
    return JSON.parse(cached)
  }

  // Load from S3
  const encrypted = await S3.getObject({
    Bucket: 'cloudain-config',
    Key: `brands/${brand}/${environment}.json`
  })

  // Decrypt using KMS
  const decrypted = await KMS.decrypt({
    CiphertextBlob: encrypted.Body
  })

  const config = JSON.parse(decrypted.Plaintext.toString())

  // Cache for 5 minutes
  await Redis.setex(
    `config:${brand}:${environment}`,
    300,
    JSON.stringify(config)
  )

  return config
}
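
The loader above follows the cache-aside pattern: read the cache, fall back to the source on a miss, then store the result with a time-to-live. The sketch below isolates that flow using an in-memory stand-in for Redis. It is synchronous for brevity (the real loader is async), and the class name `TtlConfigCache` is an assumption.

```typescript
// In-memory stand-in for Redis; synchronous for brevity, whereas real loads are async.
interface CachedValue<T> {
  value: T
  expiresAt: number
}

class TtlConfigCache<T> {
  private entries: Record<string, CachedValue<T>> = {}
  private ttlMs: number
  private now: () => number

  // The clock is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs
    this.now = now
  }

  // Cache-aside read: serve a fresh entry, otherwise load from the source and store it.
  getOrLoad(key: string, load: () => T): T {
    const hit = this.entries[key]
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value
    }
    const value = load()
    this.entries[key] = { value, expiresAt: this.now() + this.ttlMs }
    return value
  }

  // Called after an upload so readers do not keep serving the previous version.
  invalidate(key: string): void {
    delete this.entries[key]
  }
}
```

With a 5-minute TTL, a config change that skips explicit invalidation can still take up to 5 minutes to reach every reader, which is why the upload path deletes the cache key.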

Secret Management

TYPESCRIPT
// Secrets separate from config
const config = await loadBrandConfig('growain', 'production')

// Secrets loaded from AWS Secrets Manager
const secrets = await SecretsManager.getSecretValue({
  SecretId: `cloudain/${config.brandId}/${config.environment}`
})

const apiKeys = JSON.parse(secrets.SecretString)

// Combine for runtime use
const runtimeConfig = {
  ...config,
  secrets: apiKeys
}
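
Keeping secrets out of config files only holds if something enforces it. One possible guard, sketched below, scans a config for secret-looking key names before it is committed or uploaded. The key-name pattern and the `findSecretLikeKeys` helper are assumptions, not Cloudain's actual policy.

```typescript
// Hypothetical guard; the pattern is an illustrative guess at "secret-looking" key names.
// It deliberately does not match plain "token", since the config uses maxTokens and tokensPerDay.
const SECRET_KEY_PATTERN = /(secret|password|api[_-]?key|access[_-]?token|private[_-]?key)/i

// Return the dotted path of every key whose name looks like it holds a secret.
function findSecretLikeKeys(value: unknown, prefix = ''): string[] {
  if (typeof value !== 'object' || value === null) return []
  const record = value as Record<string, unknown>
  let found: string[] = []
  for (const key of Object.keys(record)) {
    const path = prefix ? `${prefix}.${key}` : key
    if (SECRET_KEY_PATTERN.test(key)) found.push(path)
    found = found.concat(findSecretLikeKeys(record[key], path))
  }
  return found
}

console.log(findSecretLikeKeys({ integrations: { crm: { provider: 'salesforce', apiKey: 'oops' } } }))
```

A name-based scan is a coarse filter, so it complements code review rather than replacing it.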

Promotion API Ensures Consistency

The Workflow

CODE
┌────────────────────────────────────────┐
│  1. Developer Updates Config Locally   │
│     Edit JSON file in Git repo         │
└─────────────────┬──────────────────────┘
                  │
                  ▼
┌────────────────────────────────────────┐
│  2. Create Pull Request                │
│     Automated validation runs          │
│     Team reviews changes               │
└─────────────────┬──────────────────────┘
                  │
                  ▼
┌────────────────────────────────────────┐
│  3. Merge to Main                      │
│     CI/CD deploys to development       │
└─────────────────┬──────────────────────┘
                  │
                  ▼
┌────────────────────────────────────────┐
│  4. Promotion API Call                 │
│     POST /api/promote-config           │
│     from: development                  │
│     to: staging                        │
└─────────────────┬──────────────────────┘
                  │
                  ▼
┌────────────────────────────────────────┐
│  5. Automated Tests Run on Staging     │
│     Validate config works              │
└─────────────────┬──────────────────────┘
                  │
                  ▼
┌────────────────────────────────────────┐
│  6. Promote to Production              │
│     POST /api/promote-config           │
│     from: staging                      │
│     to: production                     │
│     requiresApproval: true             │
└────────────────────────────────────────┘

Promotion API Implementation

TYPESCRIPT
// Promotion endpoint
// (assumes JSON body parsing and auth middleware that populates req.user)
app.post('/api/promote-config', async (req, res) => {
  const { brand, fromEnv, toEnv } = req.body

  // 1. Validate promotion path
  const validPromotions: Record<string, string[]> = {
    'development': ['staging'],
    'staging': ['production']
  }

  if (!validPromotions[fromEnv]?.includes(toEnv)) {
    return res.status(400).json({
      error: 'Invalid promotion path'
    })
  }

  // 2. Check approval requirement
  if (toEnv === 'production') {
    const approval = await CoreCloud.getApproval({
      type: 'config_promotion',
      brand: brand,
      requester: req.user.id
    })

    if (!approval.approved) {
      return res.status(403).json({
        error: 'Production promotion requires approval'
      })
    }
  }

  // 3. Load source config
  const sourceConfig = await loadBrandConfig(brand, fromEnv)

  // 4. Validate config
  const validation = await validateConfig(sourceConfig)
  if (!validation.valid) {
    return res.status(400).json({
      error: 'Config validation failed',
      errors: validation.errors
    })
  }

  // 5. Run smoke tests
  const smokeTests = await runSmokeTests(brand, sourceConfig, toEnv)
  if (!smokeTests.passed) {
    return res.status(400).json({
      error: 'Smoke tests failed',
      failures: smokeTests.failures
    })
  }

  // 6. Create backup of current production config
  if (toEnv === 'production') {
    await backupConfig(brand, toEnv)
  }

  // 7. Promote config under a new version number (computed once and reused below)
  const newVersion = incrementVersion(sourceConfig.version)
  await uploadConfig(brand, toEnv, {
    ...sourceConfig,
    environment: toEnv,
    version: newVersion,
    promotedFrom: fromEnv,
    promotedBy: req.user.email,
    promotedAt: new Date().toISOString()
  })

  // 8. Trigger cache refresh
  await refreshConfigCache(brand, toEnv)

  // 9. Log promotion with the version that was actually published
  await CoreCloud.logConfigPromotion({
    brand: brand,
    from: fromEnv,
    to: toEnv,
    version: newVersion,
    promotedBy: req.user.email
  })

  res.json({
    success: true,
    brand: brand,
    environment: toEnv,
    version: newVersion
  })
})
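
The endpoint relies on an `incrementVersion` helper that is not shown. A minimal sketch follows, assuming versions use the `MAJOR.MINOR.PATCH` pattern enforced by the schema and that a promotion bumps the patch component; the article does not say which component is bumped.

```typescript
// Hypothetical implementation: bumps the PATCH component of a MAJOR.MINOR.PATCH version.
function incrementVersion(version: string): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version)
  if (!match) {
    throw new Error(`Not a MAJOR.MINOR.PATCH version: ${version}`)
  }
  // Parse as a number so "9" becomes "10" rather than being handled as text.
  const patch = parseInt(match[3], 10) + 1
  return `${match[1]}.${match[2]}.${patch}`
}

console.log(incrementVersion('2.4.1'))
```

Rejecting malformed input here keeps a bad version string from being written into the target environment.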

Automated Validation

TYPESCRIPT
import Ajv from 'ajv'

// Validation schema
const configSchema = {
  type: 'object',
  required: ['brandId', 'environment', 'version', 'ai', 'features'],
  properties: {
    brandId: {
      type: 'string',
      enum: ['growain', 'securitain', 'mindagain', 'corefinops']
    },
    environment: {
      type: 'string',
      enum: ['development', 'staging', 'production']
    },
    version: {
      type: 'string',
      pattern: '^\\d+\\.\\d+\\.\\d+$'
    },
    ai: {
      type: 'object',
      required: ['primaryModel'],
      properties: {
        primaryModel: {
          type: 'object',
          required: ['provider', 'modelId', 'parameters'],
          properties: {
            // temperature and maxTokens live under "parameters" in the config
            // files, so the constraints must be nested there to take effect
            parameters: {
              type: 'object',
              properties: {
                temperature: {
                  type: 'number',
                  minimum: 0,
                  maximum: 1
                },
                maxTokens: {
                  type: 'integer',
                  minimum: 100,
                  maximum: 32000
                }
              }
            }
          }
        }
      }
    }
  }
}

// Validate function
async function validateConfig(config: any): Promise<ValidationResult> {
  const ajv = new Ajv()
  const valid = ajv.validate(configSchema, config)

  if (!valid) {
    return {
      valid: false,
      errors: ajv.errors
    }
  }

  // Additional business logic validation
  if (config.environment === 'production') {
    if (config.ai.primaryModel.parameters.temperature > 0.8) {
      return {
        valid: false,
        errors: ['Production temperature should not exceed 0.8']
      }
    }
  }

  return { valid: true }
}
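
The business-rule check above stops at the first failure. When several rules apply, it can be friendlier to report every violation in one pass. The sketch below collects all of them. The temperature cap comes from the validator above, the security and rate-limit rules mirror the unit tests later in this article, and the function name and input shape are assumptions.

```typescript
// Hypothetical rule collector; only the fields the rules read are typed here.
interface RuleInput {
  environment: string
  ai: { primaryModel: { parameters: { temperature: number } } }
  security?: { turnstileEnabled?: boolean; piiRedaction?: boolean; auditLogging?: boolean }
  rateLimit?: { messagesPerMinute?: number }
}

// Return every production rule violation instead of stopping at the first one.
function checkProductionRules(config: RuleInput): string[] {
  const errors: string[] = []
  if (config.environment !== 'production') return errors

  if (config.ai.primaryModel.parameters.temperature > 0.8) {
    errors.push('Production temperature should not exceed 0.8')
  }

  const flags = ['turnstileEnabled', 'piiRedaction', 'auditLogging'] as const
  for (const flag of flags) {
    if (!(config.security && config.security[flag] === true)) {
      errors.push(`security.${flag} must be enabled in production`)
    }
  }

  if (!(config.rateLimit && (config.rateLimit.messagesPerMinute || 0) > 0)) {
    errors.push('rateLimit.messagesPerMinute must be greater than 0')
  }
  return errors
}
```

Returning a list lets a pull-request check show all problems at once, which shortens the review loop.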

Faster Launches, Fewer Manual Errors

Before SSOT: Traditional Deployment

CODE
1. Developer writes code                    [2 hours]
2. Hard-codes configuration                 [30 min]
3. Creates pull request                     [15 min]
4. Code review                              [1 day]
5. Merge to main                            [5 min]
6. Deploy to staging                        [20 min]
7. Manually copy config to staging env vars [15 min]
8. Test in staging                          [2 hours]
9. Find config issue (temp wrong)           [1 hour debugging]
10. Fix and redeploy                        [40 min]
11. Deploy to production                    [30 min]
12. Manually copy config to prod env vars   [20 min]
13. Verify production                       [30 min]

Total: ~3 days, 2 manual steps, high error risk

After SSOT: Config-as-Data

CODE
1. Developer updates config JSON            [15 min]
2. Creates pull request                     [5 min]
3. Automated validation runs                [2 min]
4. Team reviews config changes              [30 min]
5. Merge to main                            [2 min]
6. Auto-deploy to development               [3 min]
7. Promotion API: dev → staging             [2 min]
8. Automated tests run                      [5 min]
9. Promotion API: staging → prod            [2 min]
10. Verify production                       [10 min]

Total: ~1 hour, 0 manual steps, low error risk

Error Rate Reduction

6 months before SSOT:

  • 23 production incidents related to config
  • 47 staging/production drift issues
  • 156 hours spent debugging config problems

6 months after SSOT:

  • 2 production incidents (both network, not config)
  • 0 drift issues
  • 8 hours total on config troubleshooting

Impact: 95% reduction in config-related issues
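
The percentages follow from the before and after counts. As a worked check, the sketch below applies the usual reduction formula to the figures above; the 95% headline appears to correspond to the troubleshooting-hours figure, which is an inference rather than something the article states. `percentReduction` is a hypothetical helper.

```typescript
// Percentage reduction from a "before" value to an "after" value.
function percentReduction(before: number, after: number): number {
  if (before <= 0) throw new Error('before must be positive')
  return ((before - after) / before) * 100
}

// Troubleshooting hours: 156 before, 8 after.
console.log(percentReduction(156, 8).toFixed(1))  // "94.9"

// Drift issues: 47 before, 0 after.
console.log(percentReduction(47, 0).toFixed(1))   // "100.0"
```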

Version Control and Rollback

Git as Change History

BASH
# View config change history
git log --oneline brands/growain/production.json

# Output:
a3f8c91 Increase token limit for enterprise users
b2e7d85 Enable campaign analysis feature
c1d6f73 Switch to Claude v2.1
d0c5e62 Add Spanish language support

Instant Rollback

TYPESCRIPT
// Rollback to previous version
async function rollbackConfig(
  brand: string,
  environment: string,
  targetVersion: string,
  requestedBy: string // email of the engineer triggering the rollback
) {
  // Get previous config from Git
  const previousConfig = await getConfigFromGit(
    brand,
    environment,
    targetVersion
  )

  // Validate it still works
  const validation = await validateConfig(previousConfig)
  if (!validation.valid) {
    throw new Error('Previous config no longer valid')
  }

  // Create backup of current config
  const currentVersion = await getCurrentVersion(brand, environment)
  await backupConfig(brand, environment)

  // Upload previous config as a new version, numbered after the current one
  // so the rolled-back config never reuses an existing version number
  await uploadConfig(brand, environment, {
    ...previousConfig,
    version: incrementVersion(currentVersion),
    rolledBackFrom: currentVersion,
    rolledBackBy: requestedBy,
    rolledBackAt: new Date().toISOString()
  })

  // Refresh cache
  await refreshConfigCache(brand, environment)

  return {
    success: true,
    rolledBackTo: targetVersion
  }
}
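
A rollback in this model does not rewrite history: the old settings are published again under a new version number, with a pointer to the version being replaced. The in-memory sketch below illustrates that roll-forward idea. `ConfigHistory` and `VersionedConfig` are hypothetical names, and real storage would be Git and S3 as described above.

```typescript
// Hypothetical in-memory model of version history, for illustration only.
interface VersionedConfig {
  version: string
  rolledBackFrom?: string
  settings: Record<string, unknown>
}

class ConfigHistory {
  private versions: VersionedConfig[] = []

  publish(config: VersionedConfig): void {
    this.versions.push(config)
  }

  current(): VersionedConfig {
    if (this.versions.length === 0) throw new Error('No config published yet')
    return this.versions[this.versions.length - 1]
  }

  // Re-publish an earlier version's settings under a new version number.
  rollbackTo(targetVersion: string, nextVersion: string): VersionedConfig {
    const target = this.versions.filter(v => v.version === targetVersion)[0]
    if (!target) throw new Error(`Unknown version: ${targetVersion}`)
    const restored: VersionedConfig = {
      version: nextVersion,
      rolledBackFrom: this.current().version,
      settings: target.settings
    }
    this.publish(restored)
    return restored
  }
}
```

Because nothing is deleted, the faulty version stays available for a post-incident review.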

Real Incident:

CODE
12:15 PM: Growain production updated
12:22 PM: User reports strange behavior
12:23 PM: Team investigates
12:26 PM: Decision to rollback
12:27 PM: Rollback executed
12:28 PM: Service restored

Total downtime: 6 minutes (vs. 45+ min with code rollback)

Configuration Testing

Unit Tests for Config

TYPESCRIPT
// Test configuration files
describe('Growain Production Config', () => {
  let config: BrandConfig

  beforeAll(async () => {
    config = await loadConfig('brands/growain/production.json')
  })

  test('should have valid schema', async () => {
    // validateConfig is async, so the result must be awaited
    const validation = await validateConfig(config)
    expect(validation.valid).toBe(true)
  })

  test('should use production-approved models', () => {
    // Entries must match modelId values exactly, including any version suffix
    const approvedModels = ['anthropic.claude-v2:1', 'gpt-4']
    expect(approvedModels).toContain(config.ai.primaryModel.modelId)
  })

  test('should have conservative temperature for production', () => {
    expect(config.ai.primaryModel.parameters.temperature).toBeLessThanOrEqual(0.8)
  })

  test('should have security features enabled', () => {
    expect(config.security.turnstileEnabled).toBe(true)
    expect(config.security.piiRedaction).toBe(true)
    expect(config.security.auditLogging).toBe(true)
  })

  test('should have rate limits configured', () => {
    expect(config.rateLimit).toBeDefined()
    expect(config.rateLimit.messagesPerMinute).toBeGreaterThan(0)
  })
})

Integration Tests

TYPESCRIPT
// Test config works with actual services
describe('Config Integration Tests', () => {
  test('should successfully load and use config', async () => {
    const config = await loadBrandConfig('growain', 'staging')

    // Test AI model access
    const response = await AgenticCloud.generate({
      brand: 'growain',
      config: config,
      input: 'Test message'
    })

    expect(response).toBeDefined()
    expect(response.error).toBeUndefined()
  })

  test('should respect rate limits', async () => {
    const config = await loadBrandConfig('growain', 'staging')
    const limit = config.rateLimit.messagesPerMinute

    // Send limit + 1 messages
    const requests = Array(limit + 1).fill(null).map(() =>
      sendMessage('Test')
    )

    const results = await Promise.allSettled(requests)
    const rejected = results.filter(r => r.status === 'rejected')

    expect(rejected.length).toBeGreaterThan(0)
  })
})
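
The rate-limit test expects message number `limit + 1` within a minute to be rejected. How Cloudain enforces `messagesPerMinute` is not described; one simple way to get that behavior is a fixed-window counter, sketched below with an injectable clock. The class name and the fixed-window choice are assumptions.

```typescript
// Hypothetical fixed-window limiter; the article does not describe the real enforcement mechanism.
class FixedWindowLimiter {
  private windowStart = 0
  private count = 0
  private limit: number
  private windowMs: number
  private now: () => number

  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    this.limit = limit
    this.windowMs = windowMs
    this.now = now
  }

  // Returns true if the message is allowed, false once the window's budget is spent.
  tryAcquire(): boolean {
    const t = this.now()
    if (t - this.windowStart >= this.windowMs) {
      // A new window begins: reset the counter.
      this.windowStart = t
      this.count = 0
    }
    if (this.count >= this.limit) return false
    this.count += 1
    return true
  }
}
```

A fixed window is easy to reason about but allows bursts at window boundaries. Sliding-window or token-bucket limiters smooth that out at the cost of more state.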

CI/CD Integration

GitHub Actions Workflow

YAML
name: Deploy Config

on:
  push:
    branches: [main]
    paths:
      - 'brands/**/*.json'
      - 'policies/**/*.json'
      - 'models/**/*.json'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Validate JSON syntax
        run: |
          find brands -name "*.json" -exec json_verify {} \;

      - name: Validate config schema
        run: |
          npm run validate:config

      - name: Run config tests
        run: |
          npm test -- config.test.ts

  deploy-dev:
    needs: validate
    runs-on: ubuntu-latest
    steps:
      # Each job starts on a fresh runner, so the repository must be checked out again
      - uses: actions/checkout@v3

      - name: Deploy to development
        run: |
          aws s3 sync brands/ s3://cloudain-config/brands/ \
            --exclude "*" \
            --include "*/development.json" \
            --sse aws:kms \
            --sse-kms-key-id ${{ secrets.KMS_KEY_ID }}

      - name: Invalidate cache
        run: |
          aws lambda invoke \
            --function-name invalidate-config-cache \
            --cli-binary-format raw-in-base64-out \
            --payload '{"environment":"development"}' \
            response.json

  promote-staging:
    needs: deploy-dev
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Run smoke tests on dev
        run: npm run test:smoke:dev

      # The promotion endpoint also reads a "brand" field from the request body;
      # supply it for each brand being promoted
      - name: Promote to staging
        run: |
          curl -X POST https://api.cloudain.com/promote-config \
            -H "Authorization: Bearer ${{ secrets.API_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{"fromEnv":"development","toEnv":"staging"}'

  # Production promotion requires manual approval
  promote-production:
    needs: promote-staging
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Promote to production
        run: |
          curl -X POST https://api.cloudain.com/promote-config \
            -H "Authorization: Bearer ${{ secrets.API_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{"fromEnv":"staging","toEnv":"production"}'

Real-World Impact

Deployment Velocity

Measured improvements after adopting Configuration-as-Data (SSOT results):

Metric                        Before               After      Improvement
Config change time            3 days               1 hour     24x faster
Manual steps                  2                    0          100% elimination
Error rate                    8%                   <0.5%      94% reduction
Rollback time                 45+ min              <5 min     9x faster
Environment drift incidents   47 cases / quarter   0 cases    Perfect sync

Developer Experience

Before:

CODE
"I need to change the AI temperature"
→ Find where it's configured (code? env var? database?)
→ Update in development
→ Test locally
→ Update in staging (manually)
→ Test in staging
→ Update in production (manually)
→ Hope they match
→ 3 days later, discover they don't match

After:

CODE
"I need to change the AI temperature"
→ Edit brands/growain/production.json
→ Create PR (auto-validated)
→ Merge
→ Auto-promoted through environments
→ Done in 1 hour
→ Perfect consistency guaranteed

Conclusion

Configuration-as-data transformed Cloudain's operations from chaotic to predictable. By treating config as a first-class citizen (version controlled, validated, tested, and promoted), we achieved:

Consistency:

  • 0 drift between environments
  • Identical behavior in staging and production
  • Predictable deployments

Speed:

  • 24x faster config changes
  • <5 min rollback vs. 45+ min
  • Deploy multiple times per day safely

Quality:

  • 94% reduction in config errors
  • Automated validation prevents mistakes
  • Git history provides audit trail

Key principles:

  • Config is data, not code
  • Git is source of truth
  • S3 is runtime authority
  • Promote, don't copy
  • Test everything

CoreCloud manages it all: encryption, access control, promotion workflows, and audit trails.

The result: Cloudain deploys faster, with confidence, and without the constant fear of configuration drift.
