API Key & Token Management: Scope, Rotation, and Incident Recovery
PART 1: The Fundamental Distinction
Most teams treat API keys like passwords. That is the foundational mistake.
In practice, many API keys function as shared secrets with no expiration and no scoping—effectively making them passwords by another name. But the design intent of API keys is different, and that distinction matters for how you manage them.
A credential that authenticates who you are grants access based on identity. Modern database credentials use RBAC, read-only roles, and schema-level restrictions—they don’t necessarily grant broad access, but they authenticate a persistent, human-tied identity.
An API key is designed to authenticate what a specific service is allowed to do: action X, on resource Y, until time Z. Compromise of a well-scoped API key means an attacker can only do what that key permits—which should be narrow. Compromise of an unscoped key with full permissions is as dangerous as a stolen master credential.
This is why scoping and lifecycle management matter. An API key treated like a password—one key, full permissions, used everywhere—is as risky as a shared admin account. The goal is to make API keys nothing like passwords: temporary, scoped, and replaceable.
The Analogy
- A master key to your house opens every lock. That’s how most teams use API keys today.
- A valet key starts the car and opens the trunk, but not the safe inside. That’s how API keys should work.
The valet key is meant to be temporary, meant to be replaceable, and meant to be limited in scope. If the valet key is stolen, you revoke it without revoking all your keys. The damage is bounded.
Why This Matters for Your Architecture
Database passwords are typically entered by humans or stored in tightly controlled application config. API keys are routinely embedded in source code, automation scripts, CI/CD pipelines, environment files, and third-party integrations—dramatically increasing their exposure surface. They leak in Git history, error messages, logs, browser consoles, monitoring dashboards, and third-party libraries.
Design for two goals simultaneously: minimize the probability of exposure, and minimize the blast radius when exposure occurs anyway. Treating keys as inviolable leads to under-engineered recovery processes. Assuming leakage is inevitable can justify sloppy prevention practices. Both disciplines matter.
PART 2: The Threat Model—Where API Keys Actually Leak
Understanding real exposure vectors is critical. These aren’t theoretical—they happen constantly.
Exposure Vector 1: Git History
A developer commits a .env file containing an API key. The file is deleted in the next commit, but the key lives forever in Git history. GitHub, GitLab, and third-party security scanners (such as TruffleHog and GitGuardian) find it within minutes. If the repository is public, assume attackers' scanners found it just as quickly and that copies now exist in forks.
Recovery from the exposure itself is difficult. Even after revocation, anyone who cloned the repo before deletion retains a copy. You can use git-filter-repo to scrub history and force-push, but this rewrites history and requires coordination across all contributors. Regardless -- revoke the key immediately and treat it as compromised. Do not rely on scrubbing history as your primary mitigation.
Prevention: Pre-Commit Secret Scanning
The most effective defense against Git history exposure is blocking the commit before it happens. Several tools enforce this at the developer workstation:
- gitleaks -- scans staged files and Git history for secrets; runs as a pre-commit hook
- detect-secrets (Yelp) -- generates a baseline of known safe values and blocks new detections
- GitHub Push Protection -- server-side scanning that blocks pushes containing known secret patterns (GitHub Advanced Security)
- git-secrets (AWS Labs) -- focused on AWS credentials
A minimal pre-commit hook using gitleaks:
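#!/bin/sh
# .git/hooks/pre-commit (must be executable: chmod +x .git/hooks/pre-commit)
# Newer gitleaks releases deprecate `protect`; check `gitleaks --help` for the current equivalent.
if ! gitleaks protect --staged --redact -v; then
    echo "Secret detected. Commit blocked."
    exit 1
fi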
Install this in every repository. A blocked commit is far less expensive than an incident response.
Exposure Vector 2: Error Messages and Logs
Your application makes an API request and receives a 401 Unauthorized error. The error handling code logs the full request—including the Authorization header with the API key. This log entry goes to your centralized logging service (CloudWatch, Datadog, New Relic, Splunk). Now anyone with access to your logging service can see the key.
Worse: the error message is sent to Slack, forwarded to a colleague, and documented in a Jira ticket. It’s no longer localized.
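One way to reduce this exposure is to scrub credentials before a log record leaves the process. The following is a minimal sketch using Python's standard logging module. The regular expression, the `redact` helper, and the `RedactSecretsFilter` class are illustrative assumptions, not part of any library, and the pattern would need tuning for the key formats you actually use.

```python
import logging
import re

# Hypothetical pattern: matches "Bearer <token>" and "sk-..." style keys.
_SECRET_PATTERN = re.compile(r"(Bearer\s+)[A-Za-z0-9._\-]+|sk-[A-Za-z0-9_\-]{8,}")


def redact(text: str) -> str:
    """Replace anything that looks like a credential with a placeholder."""
    def _sub(match: re.Match) -> str:
        prefix = match.group(1) or ""  # keep the word "Bearer", drop the token
        return f"{prefix}[REDACTED]"
    return _SECRET_PATTERN.sub(_sub, text)


class RedactSecretsFilter(logging.Filter):
    """Scrub credentials from the fully formatted log message."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = None  # the message is already formatted
        return True


logger = logging.getLogger("api_client")
logger.addFilter(RedactSecretsFilter())
```

A filter attached to a logger only sees records created on that logger. Attaching the same filter to your handlers covers records from other loggers that reach those handlers. Pattern-based redaction is a safety net; not logging the header in the first place is still the primary fix.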
Exposure Vector 3: API Keys in Frontend / Browser Code
The fundamental rule: Secret API keys must never exist in browser-side code. If a secret credential is in your JavaScript bundle, HTML, or any file served to the browser, it is public. DevTools, view-source, and network inspection expose everything sent to the client.
Not all client-side keys are secret. Public or restricted keys—such as a Google Maps embed key, a Firebase config key, or a client-scoped key with domain restrictions and quotas—may be acceptable in the browser if properly constrained. The distinction is between a credential that grants privileged server-side access versus a key that is intentionally limited in capability and scope.
For privileged keys, the correct architecture is:
Browser -> Your Backend Proxy -> External API (with secret key)
The backend holds the secret. The browser authenticates with your backend only; your backend calls the external API on its behalf.
A developer seeing a secret API key in the Network tab is a symptom that the architecture is wrong—not just a logging problem.
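The core of the proxy pattern can be sketched as a function that runs only on your backend: it takes the browser's payload, forwards an allowlisted subset, and attaches the secret itself. This is a sketch under stated assumptions: the upstream URL, the `UPSTREAM_API_KEY` variable name, and the `prompt` field are all hypothetical, and the request is built but not sent.

```python
import json
import os
import urllib.request

UPSTREAM_URL = "https://api.example.com/v1/generate"  # hypothetical external API


def build_upstream_request(client_payload: dict) -> urllib.request.Request:
    """Build the server-side request; the browser never sees the secret."""
    secret = os.environ["UPSTREAM_API_KEY"]  # hypothetical variable name
    # Forward only the fields you expect, never the raw client payload.
    allowed = {"prompt": str(client_payload.get("prompt", ""))[:2000]}
    return urllib.request.Request(
        UPSTREAM_URL,
        data=json.dumps(allowed).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {secret}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Your web framework's route handler would authenticate the browser session first, then pass the parsed body to a function like this and send the result with `urllib.request.urlopen` or your HTTP client of choice.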
Exposure Vector 4: Observability and Monitoring Tools
You’re logging HTTP requests to Datadog or New Relic to debug performance. The request object includes headers. The API key is logged. Your observability vendor now has it. If their platform is breached, your key is exfiltrated.
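A simple guard is to sanitize headers before they are attached to any trace, span, or log event. The sketch below assumes a hypothetical set of sensitive header names; extend it to match the headers your services actually use.

```python
from typing import Mapping

# Illustrative list of header names to treat as sensitive (an assumption).
SENSITIVE_HEADERS = {"authorization", "x-api-key", "cookie", "proxy-authorization"}


def safe_headers(headers: Mapping[str, str]) -> dict:
    """Return a copy of the headers that is safe to send to an observability tool."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }
```

Calling `safe_headers(request_headers)` at the single point where request metadata is handed to the vendor SDK keeps the rule in one place instead of relying on every call site to remember it.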
Exposure Vector 5: Dependency Vulnerabilities
A third-party library you use (like a custom HTTP client) has verbose logging for debugging. That library gets compromised or has a vulnerability. The compromised version logs all network requests, including API keys. Your keys are exfiltrated when developers install the compromised version.
Exposure Vector 6: Memory Dumps and Process Inspection
Your application loads an API key at startup and keeps it in memory for the entire lifetime. If a developer memory-dumps the process for debugging, or if an attacker gains access to the running process, the key is readable from memory.
Exposure Vector 7: Webhook and Callback URLs
You configure a webhook to notify your service of events. The URL includes an API key as a query parameter: https://myservice.com/webhook?token=<YOUR_KEY>. That URL is logged in the third-party service’s access logs, visible to their ops team, possibly exposed in a breach.
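A common alternative to putting a token in the URL is for the sender to sign the request body and send the signature in a header, so the shared secret itself never travels. The following is a minimal sketch of that idea using HMAC-SHA256 from the standard library. The function names and the hex-encoded signature format are assumptions; real providers define their own header names and signing schemes.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Compute the hex HMAC-SHA256 signature the sender would attach as a header."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, payload: bytes, signature_header: str) -> bool:
    """Verify the signature in constant time; the secret never appears in the URL."""
    expected = sign_payload(secret, payload)
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

Because only the signature appears in the request, access logs on either side record nothing an attacker can replay against a different payload.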
The Cumulative Threat
Each of these vectors is independently realistic, and a team of any size should expect to hit more than one of them. The compounding effect is that API keys are fundamentally difficult to keep secret once they’re in circulation.
PART 3: Token Types and Their Properties
The landscape of API authentication has evolved to address different threat models. Understanding which type solves which problem is critical for architecture decisions.
Static API Keys
These are long-lived credentials issued once and valid until manually revoked or rotated. Examples: OpenAI API keys, Stripe keys, basic Google Cloud API keys, Anthropic API keys.
Advantages: Simple to implement. No infrastructure required. Works for simple use cases.
Disadvantages: No automatic expiration. If compromised, there’s a window of vulnerability until detected and rotated. No scoping in many platforms. Requires manual rotation discipline.
Use case: External service calls where you control both ends (your backend calling OpenAI).
Bearer Tokens
These are typically short-lived tokens issued by an OAuth provider or your own service. They expire automatically after a set duration. They’re often refreshed without requiring the user to re-authenticate.
Advantages: Automatic expiration limits blast radius. Tokens can be instantly revoked. Often support scoping.
Disadvantages: Requires infrastructure to issue and refresh tokens. More complex than static keys.
Use case: User authentication in web applications. Service-to-service communication with built-in expiration.
OAuth 2.0 Tokens
Note: OAuth access tokens are typically bearer tokens, so the categories above are not mutually exclusive. "Bearer" describes how a token is used (whoever presents it is granted access); OAuth is the framework that issues and manages such tokens.
OAuth separates authentication (who you are) from authorization (what you can do). You authenticate once, receive a token, and use that token for subsequent requests. The token can be revoked without affecting other tokens. Speed of revocation depends on the token type (opaque tokens revoke immediately; JWT-based OAuth tokens depend on TTL or introspection) and provider implementation.
Advantages: Granular control. Per-token revocation (how quickly it takes effect depends on the token type, as noted above). User consent model. Integrates with identity providers (Google, GitHub, etc.).
Disadvantages: More complex to implement. Requires an authorization server.
Use case: Third-party integrations. User authentication. Enterprise identity federation.
JWT (JSON Web Tokens)
These are self-contained tokens. The token itself contains claims about what you can do (issuer, expiration time, scopes, user ID), and it’s cryptographically signed by the issuer (with a private key, or with a shared secret for HMAC algorithms). No database lookup is needed to validate the token.
Advantages: Stateless validation. Fast. No database calls required. Works well for distributed systems.
Disadvantages: Revocation requires additional mechanisms. Options include: short token lifetimes (commonly 15 minutes to 1 hour), signing key rotation (invalidates all outstanding tokens), token introspection endpoints, or an explicit revocation list. Each adds operational complexity. Token size is larger than opaque tokens. Key management is more involved.
Use case: Microservices authentication. Mobile app backends. Stateless API servers.
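To make stateless validation concrete, here is a minimal sketch of issuing and verifying a JWT-shaped token with only the standard library. It uses HS256 (a shared secret) rather than a private key, purely to stay dependency-free; the function names and claim layout are assumptions. In production you would use a maintained JWT library instead of hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(secret: bytes, subject: str, scopes: list,
                ttl_seconds: int, now: Optional[float] = None) -> str:
    """Create a signed token carrying subject, scopes, and an expiry."""
    now = time.time() if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": subject, "scope": scopes, "exp": int(now) + ttl_seconds}
    signing_input = (
        f"{_b64url(json.dumps(header).encode())}."
        f"{_b64url(json.dumps(claims).encode())}"
    )
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(secret: bytes, token: str, now: Optional[float] = None) -> dict:
    """Validate signature and expiry with no database lookup; return the claims."""
    now = time.time() if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, claims_b64, sig_b64 = parts
    expected = hmac.new(
        secret, f"{header_b64}.{claims_b64}".encode(), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(claims_b64))
    if claims["exp"] <= now:
        raise ValueError("token expired")
    return claims
```

Note what is missing: nothing here can revoke a token before `exp`. That is the trade-off described above, and why short lifetimes matter for this token type.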
Temporary Credentials
Cloud platforms (AWS, Google Cloud, Azure) issue temporary credentials that are valid for minutes or hours. AWS STS (Security Token Service) is the most common example. These credentials auto-expire and are typically refreshed automatically by the SDK.
Advantages: Minimal blast radius. Automatic rotation. No manual key management.
Disadvantages: Requires cloud platform integration. More infrastructure.
Use case: AWS Lambda, EC2 instances, Google Cloud Run. Any workload where you want automatic credential rotation.
Kubernetes Secrets
Kubernetes Secrets are commonly misunderstood as encrypted. They are not. By default, Kubernetes stores secrets as base64-encoded values in etcd -- base64 is a reversible encoding, not encryption. Anyone with access to etcd or the Kubernetes API can read your secrets in plaintext.
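The point about base64 is easy to demonstrate. In this short sketch, the sample value is made up; the decoding step is exactly what anyone with read access to a Secret object can do, with no key required.

```python
import base64

# A value as it might appear under `data:` in a Secret manifest (illustrative).
stored_value = base64.b64encode(b"sk-test-12345abcdef").decode("ascii")

# Reversing the encoding needs no key or password at all.
recovered = base64.b64decode(stored_value).decode("ascii")
```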
To make Kubernetes Secrets actually secure:
- Enable etcd encryption at rest using a KMS provider (AWS KMS, Google Cloud KMS, Azure Key Vault)
- Use an External Secrets Operator to pull from a real secrets manager (AWS Secrets Manager, Google Secret Manager) at runtime -- the secret never lives in etcd
- Restrict the Kubernetes Secrets API via RBAC -- not every pod should be able to list or read secrets
- Never commit API keys to manifest files or Helm values files in Git
Two patterns are common for Kubernetes. An External Secrets Operator (ESO) syncs values from your secrets manager, but by default it writes a copy to a Kubernetes Secret object, which is stored in etcd. To avoid etcd entirely, use the Secrets Store CSI Driver, which mounts secrets into the pod as an in-memory volume without creating a Secret object (unless you enable its optional sync feature). Regardless of pattern, enable etcd encryption at rest and restrict the Secrets API via RBAC.
The Decision Matrix
| Scenario | Type | Why |
|---|---|---|
| Calling external API (OpenAI, Stripe) | Static API Key | Simple, one-time setup |
| Your backend, multiple services | Temporary Credentials (AWS STS) | Auto-rotating, no manual work |
| Mobile app to your backend | Bearer Token or OAuth | Expiration, revocation, user consent |
| Microservice to microservice | JWT or OAuth Client Credentials | Stateless, fast validation; use Client Credentials flow, not Authorization Code |
| User authentication | OAuth 2.0 | Industry standard, integrates with identity providers |
| Instant revocation required | Opaque OAuth bearer token (validated by lookup or introspection) | Revoked at the authorization server, no redeployment |
PART 4: Scope Limiting—The Core Security Practice
Scope limiting is where most teams fail. They create an API key with full permissions, use it everywhere, and hope no one finds it.
Real security is the opposite: Create an API key with only the permissions needed for that specific use case.
Resource-Level Restrictions
Your monitoring service needs to read CloudWatch logs. Don’t give it access to all AWS services. Restrict it to CloudWatch. Better: restrict it to specific log groups, not all logs.
Your backup service needs to read from S3 bucket-A. Don’t give it access to all buckets. Restrict it to bucket-A only.
Your frontend needs to upload images to S3. Don’t give it full S3 access. Create a separate key that can only PutObject to a specific prefix in a specific bucket.
Action-Level Restrictions
Your analytics service needs to read your database. It doesn’t need to write or delete. Restrict it to SELECT queries only.
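Your deployment pipeline needs to push Docker images to your registry. It doesn’t need to delete images. Restrict it to push-only permissions (in Amazon ECR, for example, ecr:PutImage and the layer-upload actions, but not ecr:BatchDeleteImage).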
Your monitoring service needs to list and read. It doesn’t need to create, modify, or delete. Restrict to ReadOnly actions.
Time-Based Restrictions
Create API keys that auto-expire. AWS temporary credentials expire in hours. You can create custom API keys that expire in days or weeks.
A contractor needs access to your API for a month. Create a key that’s valid until the end of the month, then it automatically becomes invalid.
IP-Based Restrictions
Restrict an API key to specific source IPs. Your backend service calls an external API only from a known egress IP range. Restrict the key to those IPs.
The same idea applies inbound: your webhook handler receives callbacks only from Stripe’s published IP addresses, so restrict the endpoint to accept traffic only from those ranges.
Caveat: IP-based restrictions are a supplementary control, not a primary defense. They are fragile in cloud environments (NAT gateways, elastic IPs, multi-region deployments), bypassable if an attacker compromises an allowed host, and incompatible with zero-trust architectures that do not treat network location as a trust signal. Use them as one layer of defense-in-depth, not as the primary mitigation.
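As one layer among several, a source-IP check can be sketched with the standard `ipaddress` module. The networks below are documentation-reserved example ranges, and `is_allowed_source` is a hypothetical helper name.

```python
import ipaddress

# Hypothetical egress ranges (documentation-reserved example networks).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_allowed_source(ip: str) -> bool:
    """Return True only if the address parses and falls inside an allowed range."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        return False  # unparseable input is rejected, not trusted
    return any(address in network for network in ALLOWED_NETWORKS)
```

If your service sits behind a load balancer or proxy, the address you check must come from a trusted source (for example, the proxy's own connection metadata), not from a client-supplied header.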
Rate Limiting
Cap each API key at a fixed request rate, for example 1000 requests per hour rather than unlimited. If the key is compromised and an attacker starts abusing it, the limit bounds the damage, and hitting the limit is a signal worth alerting on.
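If you issue your own keys, a per-key limit can be sketched as a fixed-window counter. This is a deliberately simple in-memory illustration; the class name and the injectable `clock` parameter are assumptions, and a real deployment would typically keep counters in a shared store and expire old windows.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each fixed time window."""

    def __init__(self, limit: int, window_seconds: int, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behavior can be tested
        # (key_id, window_index) -> request count; old windows are never
        # pruned in this sketch, so memory grows over time.
        self._counts = defaultdict(int)

    def allow(self, key_id: str) -> bool:
        window = int(self.clock() // self.window_seconds)
        bucket = (key_id, window)
        if self._counts[bucket] >= self.limit:
            return False  # a good place to emit an alert
        self._counts[bucket] += 1
        return True
```

With `FixedWindowLimiter(limit=1000, window_seconds=3600)`, the 1001st call to `allow("key-a")` inside the same hour returns False while other keys are unaffected.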
Implementation Example: AWS IAM Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-backup-bucket",
        "arn:aws:s3:::my-backup-bucket/*"
      ]
    }
  ]
}
This policy allows GetObject and ListBucket actions only on my-backup-bucket. Not on any other bucket. Not DeleteObject. Not CreateBucket.
The Analogy (Revisited)
You hire a contractor to paint your bedroom. You give them a key. What should that key open?
- Insecure: A master key that opens your entire house, your car, and your safe.
- Secure: A key that opens only the bedroom door, only during business hours, and expires when the job is done.
PART 5: Rotation Strategy—Building for Inevitable Compromise
Assume your API key will be compromised. Design rotation into your architecture from day one.
Pattern 1: Dual-Key Rotation
Your provider and your services support two valid API keys at once, so the old and new keys can overlap during a rotation. Every 30 days, you:
- Create a new key.
- Update your services to use the new key.
- Monitor for 24-48 hours to ensure all services are using the new key.
- Delete the old key.
During the transition window, both keys are valid. Requests using the old key still work. This prevents downtime during rotation.
Implementation:
- Create a key management function that returns the “current” key.
- Store both the active and pending keys in your secrets manager.
- When you rotate, you update the secrets manager, and your service picks up the new key on next fetch.
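The implementation steps above can be sketched as a small two-slot store. This is an in-memory stand-in for a secrets manager entry, with hypothetical names throughout; the point is the sequence of stage, promote, and then delete the old key at the provider once monitoring is clean.

```python
from typing import Optional


class RotatingKeyStore:
    """In-memory stand-in for a secrets manager entry with two key slots."""

    def __init__(self, active: str):
        self.active = active
        self.pending: Optional[str] = None

    def stage(self, new_key: str) -> None:
        """Step 1: record the newly created key without using it yet."""
        self.pending = new_key

    def promote(self) -> str:
        """Step 2: make the pending key current; return the old key so the
        caller can delete it at the provider after the monitoring window."""
        if self.pending is None:
            raise RuntimeError("no pending key staged")
        old_key = self.active
        self.active, self.pending = self.pending, None
        return old_key

    def current_key(self) -> str:
        """What services call on each fetch to get the key to use."""
        return self.active
```

Because callers only ever go through `current_key()`, the rotation itself never requires a code change in the services that use the key.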
Pattern 2: Versioned Keys with Metadata
Each key has a version number, creation timestamp, and last-rotated timestamp. Instead of managing individual keys, you manage key versions.
If a key from version 3 is compromised, you can instantly invalidate all version-3 keys without knowing which ones are active.
Example metadata:
Key ID: api-key-prod-001
Version: 3
Created: 2026-06-01
LastRotated: 2026-06-15
ExpiresAt: 2026-07-15
Scope: [s3:GetObject, s3:ListBucket]
Resource: arn:aws:s3:::my-bucket
Status: active
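The metadata above maps naturally onto a small record type, and version-wide invalidation becomes a simple filter. The sketch below is one possible shape; the field names mirror the example metadata, while `KeyRecord` and `revoke_version` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class KeyRecord:
    key_id: str
    version: int
    created: date
    expires_at: date
    scope: list = field(default_factory=list)
    status: str = "active"

    def is_usable(self, today: date) -> bool:
        """A key is usable only while active and not yet expired."""
        return self.status == "active" and today < self.expires_at


def revoke_version(records: list, version: int) -> int:
    """Mark every active record of the given version as revoked; return the count."""
    revoked = 0
    for record in records:
        if record.version == version and record.status == "active":
            record.status = "revoked"
            revoked += 1
    return revoked
```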
Pattern 3: Keyless Authentication
This is the modern approach. Instead of storing an API key, you use temporary credentials that are automatically refreshed. AWS IAM roles, Google Cloud service accounts, and HashiCorp Vault all support this.
Your code doesn’t store a key at all. It asks the platform: “Who am I?” The platform responds: “You’re service-X, here’s a temporary token valid for one hour.” After one hour, the code asks again and gets a new token. The old one is worthless.
This approach eliminates manual rotation entirely. The platform handles it invisibly.
The Critical Point: Runtime Loading, Not Startup Caching
Here’s where most teams get it wrong: They load the API key once when the application starts and cache it in memory.
When you rotate the key, the running application still has the old one in memory. You have to restart all instances. During restart, requests fail.
One approach: fetch the key from a secrets manager (AWS Secrets Manager, HashiCorp Vault) on first use, cache it locally for 60-300 seconds, and re-fetch once the TTL expires. The secrets manager is the single source of truth: when you rotate a key there, every instance picks it up within one cache window, with no restarts and no downtime as long as the old key stays valid for that window.
This is one of several valid patterns. Alternatives include: sidecar injection (the secret is mounted into the container at deploy time), process restart on rotation (standard in Kubernetes and operationally safe), and long-lived in-memory caching with a rotation signal or webhook. In multi-instance systems, be aware that a rolling cache expiry means different instances may briefly use different key versions. Design your rotation to tolerate a short overlap window.
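A TTL cache of this kind can be sketched in a few lines. The `fetch` callable stands in for whatever call retrieves the secret from your secrets manager, and the class name and `clock` parameter are assumptions made so the behavior can be shown without a real backend.

```python
import time
from typing import Callable, Optional


class CachedSecret:
    """Cache a secret locally and re-fetch it after a TTL."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float = 120,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch      # e.g. a function that calls your secrets manager
        self._ttl = ttl_seconds
        self._clock = clock      # injectable for testing
        self._value: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

Call `get()` on every request rather than storing its result. A shorter TTL means faster pickup of rotated keys at the cost of more calls to the secrets manager.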
Incident-Driven Rotation
If you discover a key is compromised:
- Immediately create a new key.
- Deploy the new key to all services (near-instant if services poll a secrets manager).
- Revoke the old key. If the key is being actively abused, revoke it first and accept brief downtime.
- Check logs to assess damage (who accessed what, when).
- Monitor aggressively for 72 hours for any continued unauthorized access.
The key difference from scheduled rotation: you don’t wait for the next rotation window. You rotate immediately.
PART 6: Platform-Specific Implementations
Google Cloud API Keys
Google issues static API keys with no built-in expiration. You must implement rotation manually.
Features:
- Restrict by API (for example, only the Maps JavaScript API or only the Cloud Translation API)
- Restrict by application: HTTP referrers, server IP addresses, Android apps, or iOS apps
- Each key belongs to a single project
- No automatic expiration
Rotation Pattern:
- Use Google Cloud Functions to automate rotation.
- Create a new key.
- Update your application configuration (via Secret Manager or environment variables).
- Wait 24 hours for old key to be fully phased out.
- Delete the old key.
Best Practice: Set a calendar reminder to rotate keys every 30 days. Better: use a Cloud Function to automate this.
OpenAI API Keys
Static API keys with no expiration, no scoping, and no IP restrictions. Compromise means an attacker can use your full quota.
Features:
- View API usage in the dashboard
- Set spending limits at the organization level (not per-key)
- Create separate keys for different projects
- Revoke immediately without downtime
Recovery Process:
- In the OpenAI dashboard, revoke the compromised key immediately.
- Check your API usage and billing to assess damage.
- Create a new key.
- Update the key in your secrets manager or deployment configuration (not in source code) and redeploy.
- Monitor for unusual activity (unexpected usage spikes, different regions, different models).
Limitation: You can’t restrict a key to specific models or specific IP addresses. All keys have equal power. The only mitigation is spending limits and usage monitoring.
Best Practice:
- Create separate keys for production and development.
- Create separate keys for different applications.
- Use the organization-level spending limit as a backstop. Because the limit is not per-key, one leaked key can consume the whole budget.
- Log API requests with timestamps so you can audit usage.
AWS Credentials
AWS has multiple authentication methods, each with different security properties.
Static Access Keys (Access Key ID + Secret Access Key):
- Long-lived by default.
- Require manual rotation.
- Can be restricted via IAM policies.
- A key pair works from anywhere (any region, any machine) unless policy conditions restrict it.
Better: Temporary Credentials via STS:
- Auto-expire after a configurable duration: 15 minutes to 12 hours for AssumeRole (default 1 hour).
- Automatically refreshed by boto3.
- Can be restricted via IAM policies.
- No manual rotation needed.
Best: IAM Roles:
- EC2 instances, Lambda, ECS containers, and other services automatically get credentials injected.
- Credentials rotate automatically, well before they expire (typically every few hours).
- Your code just uses boto3 without any configuration.
- No keys stored anywhere.
Implementation Example: Using IAM Roles with EC2
Attach an IAM role to your EC2 instance. Your code uses boto3:
import boto3
# No credentials in code. boto3 automatically finds them via the EC2 metadata service.
s3 = boto3.client('s3')
response = s3.list_buckets()
The EC2 metadata service provides temporary credentials. boto3 automatically refreshes them before expiration. You never see or manage keys.
Stripe API Keys
Stripe issues separate keys for different environments (test and live). Each environment has restricted and unrestricted keys.
Features:
- Test and live keys are completely separate.
- Restricted API keys: Can be limited to specific resources and actions.
- Webhook signing secrets: Used to verify webhook authenticity.
Security Pattern:
Create separate restricted keys for different parts of your application:
- Charges key: Can only create charges, not refund.
- Webhook signing secret: Not an API key; used only to verify webhook signatures.
- Dashboard key: Can only read billing information.
Each key has minimal permissions for its specific use case.
Recovery:
- Revoke the compromised key in the Stripe dashboard.
- Create a new key with the same restrictions.
- Update the key in your secrets manager or deployment configuration.
- Check recent API activity to see what was accessed.
Anthropic API Keys
Static API keys with no automatic expiration. No scoping or IP restrictions.
Features:
- Simple bearer token authentication.
- View usage on your account dashboard.
- Set spending limits.
- Revoke and recreate easily.
Best Practice:
- Create separate keys for different applications or environments.
- Monitor usage and costs.
- Rotate every 30 days.
- Set spending limits to detect abuse.
Recovery:
- Revoke the compromised key immediately.
- Create a new key.
- Update your secrets manager or deployment configuration with the new key.
- Monitor for any charges from unusual locations or high request volumes.
PART 7: Code Examples—Secure Patterns in Practice
Python: Secure API Key Usage with Requests
The Problem (Insecure Pattern):
import os
import requests
api_key = os.getenv('OPENAI_API_KEY')
headers = {'Authorization': f'Bearer {api_key}'}
response = requests.post(
'https://api.openai.com/v1/chat/completions',
headers=headers,
json={'model': 'gpt-4', 'messages': [{'role': 'user', 'content': 'Hello'}]}
)
The Issues:
Note: os.getenv() is not the security issue here — loading from environment variables is an acceptable pattern for many contexts. Be aware of their limitations: env vars are visible to all processes running under the same user, accessible via /proc/[pid]/environ on Linux, often logged by process managers (systemd, PM2, Heroku), and inherited by child processes by default. For high-sensitivity keys, prefer a secrets manager or OS keyring. The vulnerabilities in this example are in error handling and logging:
- If an exception occurs, the full request object (including the Authorization header) gets logged.
- The headers dictionary is printed in error messages.
- If you serialize the request for debugging, the API key is exposed.
- If a library logs HTTP requests, the key is visible.
The Solution (Secure Pattern):
import logging
import os

import requests
from requests.auth import AuthBase

# Configure logging to exclude sensitive data
logging.basicConfig(
    format='%(asctime)s - %(levelname)s - %(message)s',
    level=logging.INFO
)
logger = logging.getLogger(__name__)


class HTTPBearerAuth(AuthBase):
    """Attach a bearer token to each request.

    requests does not ship a bearer auth class, so we define a small one.
    """

    def __init__(self, token):
        self.token = token

    def __call__(self, request):
        request.headers['Authorization'] = f'Bearer {self.token}'
        return request


def get_api_key():
    """Load API key at runtime, not startup."""
    return os.getenv('OPENAI_API_KEY')


def call_openai_safe(prompt):
    """Make API call with secure error handling."""
    api_key = get_api_key()
    if not api_key:
        raise ValueError("OPENAI_API_KEY not set")
    auth = HTTPBearerAuth(api_key)
    try:
        response = requests.post(
            'https://api.openai.com/v1/chat/completions',
            auth=auth,  # Keep the key out of hand-built header dicts
            json={
                'model': 'gpt-4',
                'messages': [{'role': 'user', 'content': prompt}]
            },
            timeout=30  # Prevent hanging requests
        )
        response.raise_for_status()
        logger.info(f"API call succeeded with status {response.status_code}")
        return response.json()
    except requests.exceptions.Timeout:
        logger.error("API call timed out after 30 seconds")
        raise
    except requests.exceptions.HTTPError as e:
        # Log status, NOT the full response with headers
        logger.error(f"API returned status {e.response.status_code}")
        raise
    except requests.exceptions.RequestException as e:
        logger.error(f"API call failed: {type(e).__name__}")
        raise


# Usage
if __name__ == '__main__':
    result = call_openai_safe("What is API key security?")
    print(result)
Why This Is Better:
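- A dedicated auth object (HTTPBearerAuth) keeps the key out of any hand-built headers dictionary that could be printed or logged.
- The exception handler logs only status codes and error types, never the full request.
- The key is read on each call through get_api_key(), so replacing that function with a secrets manager lookup later enables rotation without restarts. An environment variable by itself does not change inside a running process.
- The timeout prevents requests from hanging indefinitely.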
Python: API Key Manager with Rotation
import json
import os
from datetime import datetime, timedelta, timezone
from typing import Optional

# WARNING: This class manages key metadata and rotation tracking.
# Do NOT store actual key values in the JSON file in production.
# Store key values in a secrets manager (AWS Secrets Manager, Google Secret Manager,
# HashiCorp Vault) or OS keyring. Use this class for metadata and rotation
# orchestration only, not as a key vault.


def _utcnow():
    """Current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


def _parse(timestamp: str) -> datetime:
    """Parse an ISO timestamp, treating naive values as UTC."""
    parsed = datetime.fromisoformat(timestamp)
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def _redact(key: str) -> str:
    """Show only the last four characters of a key."""
    return f"...{key[-4:]}"


class APIKeyManager:
    """Manage API key metadata with versioning and expiration tracking."""

    def __init__(self, key_file='api_keys.json'):
        self.key_file = key_file
        self.keys = self._load_keys()

    def _load_keys(self):
        """Load keys from storage."""
        if os.path.exists(self.key_file):
            with open(self.key_file, 'r') as f:
                return json.load(f)
        return {}

    def _save_keys(self):
        """Persist keys to storage."""
        with open(self.key_file, 'w') as f:
            json.dump(self.keys, f, indent=2)

    def add_key(
        self,
        service: str,
        key: str,
        expires_in_days: int = 30,
        scope: Optional[list] = None
    ):
        """Add a new API key with optional expiration and scope."""
        if service not in self.keys:
            self.keys[service] = []
        now = _utcnow()
        key_entry = {
            'key': key,
            'created': now.isoformat(),
            'expires': (now + timedelta(days=expires_in_days)).isoformat(),
            'active': True,
            'version': len(self.keys[service]) + 1,
            'scope': scope or []
        }
        self.keys[service].append(key_entry)
        self._save_keys()
        print(f"Added {service} key version {key_entry['version']}")

    def get_active_key(self, service: str) -> str:
        """Get the currently active key for a service."""
        if service not in self.keys:
            raise ValueError(f"No keys found for {service}")
        for key_entry in self.keys[service]:
            # Check if key is active and not expired
            if key_entry['active'] and _parse(key_entry['expires']) > _utcnow():
                return key_entry['key']
        raise ValueError(f"No active keys for {service}")

    def revoke_key(self, service: str, key: str):
        """Revoke a specific key."""
        if service not in self.keys:
            return
        for key_entry in self.keys[service]:
            if key_entry['key'] == key:
                key_entry['active'] = False
                self._save_keys()
                print(f"Revoked {service} key")
                return
        print(f"Key not found for {service}")

    def revoke_all_before(self, service: str, cutoff_date: str):
        """Revoke all keys created before a date (incident response)."""
        if service not in self.keys:
            return
        cutoff = _parse(cutoff_date)
        revoked_count = 0
        for key_entry in self.keys[service]:
            if _parse(key_entry['created']) < cutoff:
                key_entry['active'] = False
                revoked_count += 1
        self._save_keys()
        print(f"Revoked {revoked_count} {service} keys created before {cutoff_date}")

    def list_keys(self, service: str):
        """List all keys for a service (redacted)."""
        if service not in self.keys:
            return []
        return [
            {
                'version': key_entry['version'],
                'created': key_entry['created'],
                'expires': key_entry['expires'],
                'active': key_entry['active'],
                'scope': key_entry['scope'],
                'key': _redact(key_entry['key'])  # Redacted
            }
            for key_entry in self.keys[service]
        ]


# Usage Example
if __name__ == '__main__':
    manager = APIKeyManager()
    # Add a new OpenAI key valid for 30 days
    manager.add_key('openai', 'sk-test-12345abcdef', expires_in_days=30, scope=['chat.completions'])
    # Get the active key
    current_key = manager.get_active_key('openai')
    print(f"Using key: {_redact(current_key)}")
    # List all keys (redacted)
    keys = manager.list_keys('openai')
    for key in keys:
        print(key)
    # Simulate key compromise: revoke all keys created before a date
    manager.revoke_all_before('openai', _utcnow().isoformat())
Node.js: Secure API Key Usage with Axios
The Problem (Insecure Pattern):
const axios = require('axios');
const apiKey = process.env.OPENAI_API_KEY;
axios.post('https://api.openai.com/v1/chat/completions',
{ model: 'gpt-4', messages: [...] },
{ headers: { 'Authorization': `Bearer ${apiKey}` } }
).catch(err => console.log(err));
The Issues:
- The error handler logs the entire error object, which includes headers.
- API key is cached in memory at startup.
- No timeout configuration.
- Full request is visible if debugging tools are enabled.
The Solution (Secure Pattern):
const axios = require('axios');

// Load API key at runtime
function getApiKey() {
  const key = process.env.OPENAI_API_KEY;
  if (!key) {
    throw new Error('OPENAI_API_KEY not set');
  }
  return key;
}

// Create axios instance with sensible defaults
const createOpenAIClient = () => {
  return axios.create({
    baseURL: 'https://api.openai.com/v1',
    timeout: 30000, // 30 second timeout
    headers: {
      'Authorization': `Bearer ${getApiKey()}`,
      'Content-Type': 'application/json'
    }
  });
};

// Safe API call with proper error handling
async function callOpenAISafe(prompt) {
  const client = createOpenAIClient();
  try {
    const response = await client.post('/chat/completions', {
      model: 'gpt-4',
      messages: [
        { role: 'user', content: prompt }
      ]
    });
    console.log(`API call succeeded with status ${response.status}`);
    return response.data;
  } catch (err) {
    // Log only safe information, never the full error object
    if (err.response) {
      // Server responded with error status
      console.error(`API returned status ${err.response.status}`);
    } else if (err.request) {
      // Request made but no response
      console.error('No response from API');
    } else {
      // Error in request setup
      console.error(`Request failed: ${err.message}`);
    }
    throw err;
  }
}

// Usage
callOpenAISafe('What is API key security?')
  .then(result => console.log(result))
  .catch(err => console.error('Failed to call API'));
Why This Is Better:
- API key is read on each call via getApiKey(), not cached once at startup.
- Axios instance separates auth from request body.
- Timeout prevents hanging requests.
- Error handler logs only status codes, not full response or headers.
- Easy to swap getApiKey() with a secrets manager call later.
Node.js: API Key Manager with Rotation
const fs = require('fs');
// WARNING: This class manages key metadata and rotation tracking.
// Do NOT store actual key values in the JSON file in production.
// Store key values in a secrets manager (AWS Secrets Manager, Google Secret Manager,
// HashiCorp Vault) or environment variables. Use this class for metadata and rotation
// orchestration only — not as a key vault.
class APIKeyManager {
  constructor(keyFile = 'api_keys.json') {
    this.keyFile = keyFile;
    this.keys = this.loadKeys();
  }
  loadKeys() {
    try {
      const data = fs.readFileSync(this.keyFile, 'utf8');
      return JSON.parse(data);
    } catch (err) {
      // A missing file means no keys yet. Anything else (corrupt JSON,
      // permission errors) must surface, or the next save would overwrite the file.
      if (err.code === 'ENOENT') {
        return {};
      }
      throw err;
    }
  }
  saveKeys() {
    fs.writeFileSync(this.keyFile, JSON.stringify(this.keys, null, 2));
  }
  addKey(service, key, expiresInDays = 30, scope = []) {
    if (!this.keys[service]) {
      this.keys[service] = [];
    }
    const expirationDate = new Date();
    expirationDate.setDate(expirationDate.getDate() + expiresInDays);
    const keyEntry = {
      key: key,
      created: new Date().toISOString(),
      expires: expirationDate.toISOString(),
      active: true,
      version: this.keys[service].length + 1,
      scope: scope
    };
    this.keys[service].push(keyEntry);
    this.saveKeys();
    console.log(`Added ${service} key version ${keyEntry.version}`);
  }
  // Return the first key that is both active and unexpired
  getActiveKey(service) {
    if (!this.keys[service]) {
      throw new Error(`No keys found for ${service}`);
    }
    for (const keyEntry of this.keys[service]) {
      if (keyEntry.active) {
        const expiration = new Date(keyEntry.expires);
        if (expiration > new Date()) {
          return keyEntry.key;
        }
      }
    }
    throw new Error(`No active keys for ${service}`);
  }
  revokeKey(service, key) {
    if (!this.keys[service]) return;
    for (const keyEntry of this.keys[service]) {
      if (keyEntry.key === key) {
        keyEntry.active = false;
        this.saveKeys();
        console.log(`Revoked ${service} key`);
        return;
      }
    }
    console.log(`Key not found for ${service}`);
  }
  // Mark every key created before cutoffDate as inactive
  revokeAllBefore(service, cutoffDate) {
    if (!this.keys[service]) return;
    const cutoff = new Date(cutoffDate);
    let revokedCount = 0;
    for (const keyEntry of this.keys[service]) {
      const created = new Date(keyEntry.created);
      if (created < cutoff) {
        keyEntry.active = false;
        revokedCount++;
      }
    }
    this.saveKeys();
    console.log(`Revoked ${revokedCount} ${service} keys created before ${cutoffDate}`);
  }
  // Return key metadata with the key value redacted
  listKeys(service) {
    if (!this.keys[service]) {
      return [];
    }
    return this.keys[service].map(keyEntry => ({
      version: keyEntry.version,
      created: keyEntry.created,
      expires: keyEntry.expires,
      active: keyEntry.active,
      scope: keyEntry.scope,
      key: `${keyEntry.key.substring(0, 10)}...${keyEntry.key.substring(keyEntry.key.length - 4)}`
    }));
  }
}
// Usage Example
if (require.main === module) {
  const manager = new APIKeyManager();
  // Add a new OpenAI key
  manager.addKey('openai', 'sk-test-12345abcdef', 30, ['chat.completions']);
  // Get the active key
  const currentKey = manager.getActiveKey('openai');
  console.log(`Using key: ${currentKey.substring(0, 10)}...${currentKey.substring(currentKey.length - 4)}`);
  // List all keys (redacted)
  const keys = manager.listKeys('openai');
  keys.forEach(key => console.log(key));
  // Simulate incident: revoke all keys before today
  manager.revokeAllBefore('openai', new Date().toISOString());
}
module.exports = APIKeyManager;
Monitoring: What to Log and What NOT to Log
import logging
import os
import time
from functools import wraps
import requests
# Configure logging to NEVER include sensitive data
logging.basicConfig(
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    level=logging.INFO
)
logger = logging.getLogger(__name__)
def safe_api_call(func):
    """Decorator to log API calls safely."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        # perf_counter is monotonic, so durations are unaffected by wall-clock changes
        start_time = time.perf_counter()
        try:
            result = func(*args, **kwargs)
            duration = time.perf_counter() - start_time
            # Log: success and duration only
            logger.info(f"API call succeeded in {duration:.2f}s")
            return result
        except Exception as e:
            duration = time.perf_counter() - start_time
            # Log: Error type, duration
            # DO NOT log here: exception message (may contain request details) or the full traceback
            logger.error(f"API call failed after {duration:.2f}s: {type(e).__name__}")
            raise
    return wrapper
@safe_api_call
def fetch_data_from_api(endpoint):
    """Example safe API call."""
    api_key = os.getenv('API_KEY')
    response = requests.get(
        f'https://api.example.com/{endpoint}',
        headers={'Authorization': f'Bearer {api_key}'},
        timeout=30,  # seconds; prevents hanging requests
    )
    response.raise_for_status()
    return response.json()
# What to log:
# ✓ Timestamps
# ✓ Status codes (200, 401, 500)
# ✓ Response size
# ✓ Request duration
# ✓ Error types (TimeoutError, HTTPError)
# ✓ User IDs (non-sensitive identifiers)
# ✓ API endpoint called (no query parameters)
# What NOT to log:
# ✗ API keys or tokens
# ✗ Full request/response bodies
# ✗ Headers (contain Authorization)
# ✗ Query parameters containing sensitive values (redact; don't omit entirely)
# ✗ Unfiltered exception messages (may contain request details)
#
# On stack traces: do not suppress them entirely -- they are critical for debugging
# and forensics. Route them to a secure, access-controlled log destination and
# scrub any lines matching credential patterns. Suppressing traces outright
# reduces observability and slows incident response.
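One way to scrub credential patterns from log output, traces included, is a custom formatter. The following is a minimal sketch, not a complete solution: the two regular expressions are assumptions about what your credentials look like (a Bearer token and a key with an "sk-" prefix), and the names scrub and ScrubbingFormatter are illustrative.

```python
import logging
import re

# Assumed credential shapes; adjust these to the providers you actually use.
CREDENTIAL_PATTERNS = [
    re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]+"),
    re.compile(r"sk-[A-Za-z0-9_-]{8,}"),
]


def scrub(text):
    """Replace anything matching a credential pattern with a placeholder."""
    for pattern in CREDENTIAL_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


class ScrubbingFormatter(logging.Formatter):
    """Scrub the fully rendered record, which includes any traceback text."""

    def format(self, record):
        return scrub(super().format(record))


# Attach the formatter to the handler that feeds the secure log destination.
handler = logging.StreamHandler()
handler.setFormatter(ScrubbingFormatter("%(asctime)s - %(levelname)s - %(message)s"))
secure_logger = logging.getLogger("secure")
secure_logger.addHandler(handler)
```

Scrubbing in the formatter rather than at each call site means a stack trace rendered by logger.exception() passes through the same patterns as ordinary messages.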
PART 8: Recovery Process—What to Do When a Key is Compromised
Step 1: Assess Damage and Begin Key Replacement in Parallel
Start log analysis while simultaneously creating the replacement key. Do not delay revocation to complete the assessment — run both in parallel.
Check your logs to understand what was accessed:
- What API calls were made?
- What data was retrieved or modified?
- When did the unauthorized access occur?
- From what IP addresses?
- Using what rate of requests?
This tells you the blast radius. Log access patterns may change once the key is revoked, so begin your assessment immediately — but do not let it delay Steps 2 and 3.
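The questions above can be answered with a quick pass over the access logs for the affected key. This is a rough sketch that assumes each log entry is a dict with timestamp (ISO 8601), ip, endpoint, and method fields; those field names are hypothetical and will differ by provider.

```python
from collections import Counter
from datetime import datetime

READ_ONLY_METHODS = {"GET", "HEAD", "OPTIONS"}


def summarize_access(entries):
    """Summarize log entries for one key: time window, endpoints, IPs, peak rate."""
    if not entries:
        return None
    times = sorted(datetime.fromisoformat(e["timestamp"]) for e in entries)
    # Bucket requests by minute to estimate the peak request rate
    per_minute = Counter(t.replace(second=0, microsecond=0) for t in times)
    return {
        "first_seen": times[0].isoformat(),
        "last_seen": times[-1].isoformat(),
        "calls_by_endpoint": dict(Counter(e["endpoint"] for e in entries)),
        "calls_by_ip": dict(Counter(e["ip"] for e in entries)),
        "write_calls": sum(
            1 for e in entries if e["method"].upper() not in READ_ONLY_METHODS
        ),
        "peak_requests_per_minute": max(per_minute.values()),
    }
```

Comparing calls_by_ip against the addresses your own services use is usually the fastest way to separate attacker traffic from legitimate calls.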
Step 2: Create and Deploy a New Key
While you’re analyzing logs, simultaneously:
- Create a new API key with identical permissions (same scope, same resources).
- Update your code or secrets manager to use the new key.
- Deploy the new key to all services.
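If you’re using runtime key loading (as recommended), services pick up the new key on the next request without a redeploy, provided the key is read from a source that changes under the running process, such as a secrets manager or a mounted secret file. Environment variables are fixed when a process starts, so a key supplied that way still needs a restart.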
If you’re using static configuration, you need to redeploy services. Coordinate this to minimize downtime.
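Runtime key loading can be sketched as follows. This assumes the key is delivered as a file that your secrets tooling rewrites on rotation; the path and the function names are hypothetical.

```python
from pathlib import Path

# Hypothetical location; assumes your secrets tooling rewrites this file on rotation.
DEFAULT_KEY_FILE = "/run/secrets/api_key"


def load_api_key(key_file=DEFAULT_KEY_FILE):
    """Read the key at call time so a rotated value is used on the next request."""
    key = Path(key_file).read_text(encoding="utf-8").strip()
    if not key:
        raise RuntimeError(f"Key file {key_file} is empty")
    return key


def auth_headers(key_file=DEFAULT_KEY_FILE):
    """Build request headers from the current key; nothing is cached."""
    return {"Authorization": f"Bearer {load_api_key(key_file)}"}
```

Because nothing is cached in the process, replacing the file contents is the whole deployment step. If reading the file on every request is too expensive, a short cache lifetime is a common compromise.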
Step 3: Revoke the Old Key
Only after the new key is deployed and confirmed working:
- Revoke the old key in the provider’s console.
- Confirm revocation is complete.
This prevents the attacker from continuing to use the old key.
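The ordering in Steps 2 and 3 can be enforced in a small script. In this sketch all three arguments are hypothetical callables that you supply; each wraps whatever your provider's console or API offers for verifying and revoking keys.

```python
def rotate_then_revoke(verify_new_key, revoke_old_key, confirm_revoked):
    """Revoke the old key only after the new one is confirmed working.

    verify_new_key:  returns True if a test request with the new key succeeds
    revoke_old_key:  asks the provider to revoke the old key
    confirm_revoked: returns True once the provider rejects the old key
    """
    if not verify_new_key():
        # Stop here so services are not left without a working key
        raise RuntimeError("New key failed verification; old key left active")
    revoke_old_key()
    if not confirm_revoked():
        raise RuntimeError("Old key still accepted after revocation request")
    return "old key revoked"
```

Failing loudly at either check keeps a half-finished rotation from passing unnoticed.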
Step 4: Monitor Aggressively for 72 Hours
Attackers often hold on to stolen keys and retry them later, so keep watching after revocation:
- Monitor API usage from the compromised service.
- Alert on any requests using the old key.
- Watch for unusual geographic locations or request patterns.
- Check for unexpected data access or modifications.
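Alerting on use of the old key does not require keeping the key itself around. The sketch below matches on a short fingerprint instead, and assumes your request logs record a key_fingerprint field computed the same way; that field name and both function names are hypothetical.

```python
import hashlib
from datetime import datetime, timedelta


def fingerprint(key):
    """Short SHA-256 fingerprint, so logs and alerts never hold the raw key."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def find_revoked_key_use(entries, revoked_fingerprint, revoked_at, window_hours=72):
    """Return log entries that used the revoked key inside the watch window."""
    window_end = revoked_at + timedelta(hours=window_hours)
    return [
        e for e in entries
        if e["key_fingerprint"] == revoked_fingerprint
        and revoked_at <= datetime.fromisoformat(e["timestamp"]) <= window_end
    ]
```

Any entry this returns is worth an alert: the request itself should have been rejected, but it shows that someone still holds the old key and is trying it.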
Step 5: Notify Affected Parties
Depending on what data was accessed:
- Notify customers if their data was accessed.
- Notify compliance teams (may be a reportable incident).
- Notify your security team.
- Document the incident for audit purposes.