DevOps Interview Questions – Terraform
Table of Contents
- 1) Difference between terraform import and terraform taint
- 2) How do you manage secrets in Terraform without hardcoding them?
- 3) Difference between count and for_each + real-world use case
- 4) How do you handle drift detection in Terraform?
- 5) What is a Terraform remote backend, and why is it important?
- 6) How do you manage multiple environments (dev, staging, prod) in Terraform?
- 7) Difference between local-exec and remote-exec provisioners
- 8) How do you safely roll back infrastructure changes after a failed deployment?
- 9) terraform refresh vs terraform plan
- 10) How do you write reusable Terraform modules?
- 11) How do you troubleshoot slow Terraform apply?
- 12) How do you ensure Docker image immutability?
- 13) How do you troubleshoot Azure Kubernetes Service (AKS) scaling issues?
- 14) How do you handle Jenkins credentials securely?
- 15) How do you manage stateful applications in Kubernetes?
- 16) How do you monitor CI/CD pipeline performance?
- 17) How do you troubleshoot GCP Cloud Build quota exceeded error?
- 18) How do you enforce policy-as-code in Terraform?
- 19) How do you handle Kubernetes secret exposure in logs?
- 20) How do you implement CI/CD rollbacks automatically?
1) Difference between terraform import and terraform taint

terraform import
- Brings an existing infrastructure resource (created outside Terraform) under Terraform management.
- Example: If you already created an AWS S3 bucket manually, you can run:

    terraform import aws_s3_bucket.my_bucket bucket-name

terraform taint
- Marks a resource as "tainted" so Terraform will destroy and recreate it during the next apply.
- Example:

    terraform taint aws_instance.my_ec2

- Used when a resource is unhealthy or needs replacement.

Key difference: import is for bringing external resources into Terraform; taint is for forcing recreation of resources Terraform already manages.
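Note that on recent Terraform versions (0.15.2 and later) terraform taint is deprecated in favor of the -replace option, which forces recreation in a single step; the address below is the same example resource as above:

    terraform apply -replace="aws_instance.my_ec2"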
2) How do you manage secrets in Terraform without hardcoding them?
- Options:
  - Use Terraform variables with sensitive = true (see the sketch after this list).
  - Store secrets in environment variables and reference them via the TF_VAR_ prefix.
  - Integrate with secret managers:
    - AWS Secrets Manager (data "aws_secretsmanager_secret_version")
    - HashiCorp Vault provider
    - Azure Key Vault / GCP Secret Manager
  - Use .tfvars files but never commit them to Git.
  - For CI/CD, inject secrets from pipeline variables instead of storing them in code.
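A minimal sketch of the first and third options, assuming the AWS provider is configured and a secret named my-app/db-password already exists in Secrets Manager (both names are illustrative):

    # Option 1: sensitive input variable, set via TF_VAR_db_password or a CI/CD secret
    variable "db_password" {
      type      = string
      sensitive = true
    }

    # Option 3: read an existing secret from AWS Secrets Manager at plan time
    data "aws_secretsmanager_secret_version" "db" {
      secret_id = "my-app/db-password"   # illustrative secret name
    }

    # Reference it as data.aws_secretsmanager_secret_version.db.secret_string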
3) Difference between count and for_each + real-world use case

count
- Creates multiple instances of a resource based on an index.
- Example:

    resource "aws_instance" "servers" {
      count         = 3
      ami           = "ami-123"
      instance_type = "t2.micro"
    }

- Creates 3 EC2 instances, referenced as servers[0], servers[1], etc.

for_each
- Creates resources from a map or set and identifies each resource by key.
- Example:

    resource "aws_s3_bucket" "buckets" {
      for_each = toset(["dev", "staging", "prod"])
      bucket   = "my-app-${each.key}"
    }

- Creates 3 uniquely named S3 buckets (my-app-dev, etc.), referenced by buckets["dev"].

Use case:
- Use count when creating identical resources.
- Use for_each when resources must be uniquely identifiable (see the keyed, per-environment sketch after this list).
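A common real-world pattern is for_each over a map, so each environment gets different settings; the instance sizes and the AMI ID below are illustrative:

    locals {
      envs = {
        dev     = "t3.micro"
        staging = "t3.small"
        prod    = "t3.large"
      }
    }

    resource "aws_instance" "app" {
      for_each      = local.envs     # keyed by environment name
      ami           = "ami-123"      # placeholder AMI ID
      instance_type = each.value
      tags          = { Environment = each.key }
    }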
4) How do you handle drift detection in Terraform?
- Drift = when infrastructure changes outside Terraform.
- Methods:
  - Run terraform plan regularly → shows differences.
  - Use terraform refresh → updates state to match actual infra.
  - Integrate with CI/CD → automated drift detection reports (see the sketch after this list).
  - Use Terraform Cloud/Enterprise → built-in drift detection and notifications.
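A minimal CI-style drift check, assuming credentials and the backend are already configured; -detailed-exitcode makes terraform plan exit with code 2 when a difference (drift) is found:

    terraform init -input=false
    terraform plan -detailed-exitcode -input=false
    # exit code 0 = no drift, 2 = drift detected, 1 = error
    # in a pipeline, treat exit code 2 as "alert the team / open a ticket"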
5) What is a Terraform remote backend, and why is it important?
- Remote backend = Stores Terraform state file in a shared, centralized location instead of locally.
- Examples: S3 + DynamoDB (AWS), Azure Blob Storage, GCS, Terraform Cloud.
- Benefits:
- State consistency across teams.
- Enables state locking (prevents two users from applying changes simultaneously).
- Secure storage (no local accidental leaks).
- Better collaboration in teams.
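A minimal sketch of an S3 remote backend with DynamoDB state locking; the bucket, key, table, and region values are placeholders:

    terraform {
      backend "s3" {
        bucket         = "my-terraform-state"      # pre-created S3 bucket
        key            = "prod/terraform.tfstate"  # path of the state file in the bucket
        region         = "us-east-1"
        dynamodb_table = "terraform-locks"         # pre-created table with a LockID hash key
        encrypt        = true
      }
    }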
6) How do you manage multiple environments (dev, staging, prod) in Terraform?
- Best practices:
  - Use separate workspaces (terraform workspace new dev).
  - Use a directory structure:

        /terraform
          /dev
          /staging
          /prod

  - Parameterize using variables, e.g. instance size, region (see the sketch after this list).
  - Use remote backends per environment to isolate state.
  - Optionally use Terragrunt to simplify multi-environment management.
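One common way to parameterize per environment is a small .tfvars file per environment; the file name and variables below are illustrative:

    # environments/dev.tfvars (one file per environment)
    instance_type = "t3.micro"
    region        = "us-east-1"

    # apply with the matching file:
    #   terraform apply -var-file=environments/dev.tfvars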
7) Difference between local-exec and remote-exec provisioners
- local-exec
  - Runs a command on the machine where Terraform is executed.
  - Example: run a script in your CI/CD pipeline.
- remote-exec
  - Runs a command inside the remote resource (e.g., VM/EC2) after creation.
  - Example: installing Apache on a new EC2 instance.

Key difference: local-exec = local machine, remote-exec = target resource.
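A rough sketch showing both provisioners on one EC2 instance; the AMI ID, key pair, SSH user, and key path are placeholders, and provisioners should generally be a last resort:

    resource "aws_instance" "web" {
      ami           = "ami-123"      # placeholder AMI
      instance_type = "t2.micro"
      key_name      = "my-keypair"   # placeholder key pair

      # Runs on the machine executing terraform apply
      provisioner "local-exec" {
        command = "echo ${self.public_ip} >> inventory.txt"
      }

      # Runs on the new instance over SSH
      provisioner "remote-exec" {
        inline = [
          "sudo apt-get update -y",
          "sudo apt-get install -y apache2",
        ]

        connection {
          type        = "ssh"
          user        = "ubuntu"              # assumes an Ubuntu AMI
          private_key = file("~/.ssh/id_rsa") # placeholder key path
          host        = self.public_ip
        }
      }
    }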
8) How do you safely roll back infrastructure changes after a failed deployment?
- Options:
  - Use version control (Git) to revert .tf code and re-apply (see the sketch after this list).
  - Use state snapshots (remote backends often store history).
  - Use terraform plan before every apply to minimize surprises.
  - Use blue-green deployment → switch traffic between old and new infra.
  - In critical systems, rely on manual approval steps in CI/CD.
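A minimal Git-based rollback sketch, assuming the bad change was a single commit and the remote state backend is shared; <bad-commit-sha> is a placeholder:

    git revert <bad-commit-sha>         # restore the previous .tf code
    terraform plan -out=rollback.plan   # review exactly what will change
    terraform apply rollback.plan       # re-apply the known-good configuration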
9) terraform refresh vs terraform plan

terraform refresh
- Updates the state file to reflect real-world infra but does not show planned changes.
- Example: If an EC2 instance was stopped manually, the state will update to show its stopped status.

terraform plan
- Shows the execution plan: what changes Terraform will make to reach the desired state.
- Detects drift and proposes corrective actions.

Difference: refresh just syncs state; plan compares desired vs actual and shows changes.
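On current Terraform versions the standalone terraform refresh command is deprecated; the same behavior is available as a flag on plan and apply:

    terraform plan -refresh-only    # show how the state would be updated, without proposing changes
    terraform apply -refresh-only   # write those state updates after approval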
10) How do you write reusable Terraform modules?
- Steps:
  - Create a module directory with .tf files (main.tf, variables.tf, outputs.tf).
  - Define input variables in variables.tf.
  - Expose useful values using outputs.tf.
  - Call the module from the root config (the module side is sketched after this list):

        module "vpc" {
          source     = "./modules/vpc"
          cidr_block = "10.0.0.0/16"
        }

  - Version modules when using remote sources (Git, Terraform Registry).
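A rough sketch of the module side of the ./modules/vpc example above; the resource and output are illustrative:

    # modules/vpc/variables.tf
    variable "cidr_block" {
      type        = string
      description = "CIDR range for the VPC"
    }

    # modules/vpc/main.tf
    resource "aws_vpc" "this" {
      cidr_block = var.cidr_block
    }

    # modules/vpc/outputs.tf
    output "vpc_id" {
      value = aws_vpc.this.id
    }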
11) How do you troubleshoot slow Terraform apply?
- Check if terraform plan is also slow (state refresh issue).
- Use the parallelism flag:

    terraform apply -parallelism=20

- Check if the backend (e.g., S3/DynamoDB) is slow → enable state locking optimizations.
- Large modules? → Break infra into smaller modules.
- Check provider API rate limits (AWS, Azure, GCP).
- Use TF_LOG=DEBUG to see delays (see the sketch after this list).
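A minimal debug-logging sketch: TF_LOG sets verbosity and TF_LOG_PATH writes the log to a file, so long gaps between provider API calls become visible:

    export TF_LOG=DEBUG
    export TF_LOG_PATH=./terraform-debug.log
    terraform apply
    # inspect terraform-debug.log for long pauses between provider requests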
12) How do you ensure Docker image immutability?
- Use unique tags (commit SHA, build ID) instead of latest.
  - Example: myapp:1.0.0 or myapp:gitsha1234.
- Store images in immutable registries (ECR, GCR, ACR) with tag immutability enabled (see the sketch after this list).
- Use content digest (sha256) references when deploying to Kubernetes.
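Since this is a Terraform-focused set, here is a sketch of enforcing tag immutability on an AWS ECR repository; the repository name is illustrative:

    resource "aws_ecr_repository" "app" {
      name                 = "myapp"
      image_tag_mutability = "IMMUTABLE"   # pushing an already-existing tag is rejected

      image_scanning_configuration {
        scan_on_push = true
      }
    }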
13) How do you troubleshoot Azure Kubernetes Service (AKS) scaling issues?
- Check Cluster Autoscaler events and status (kubectl get events -n kube-system); on AKS the autoscaler is managed by Azure, so its logs come from Azure diagnostics rather than kubectl logs deployment/cluster-autoscaler.
- Ensure VM quotas in Azure are not exhausted.
- Check node pool limits (max nodes reached).
- Verify pod scheduling constraints (taints, affinity, resource requests).
- Use kubectl describe pod to see why pods are pending.
- Check Azure Monitor / Metrics for scaling events.
14) How do you handle Jenkins credentials securely?
- Store credentials in the Jenkins Credentials Store, not in jobs/pipelines.
- Use a withCredentials {} block in the pipeline:

    withCredentials([string(credentialsId: 'my-secret', variable: 'TOKEN')]) {
        sh 'echo "$TOKEN"'   // single quotes: let the shell expand it, avoiding Groovy interpolation of the secret
    }

- Integrate with Vault / AWS Secrets Manager / Azure Key Vault plugins.
- Use RBAC to limit credential access.
15) How do you manage stateful applications in Kubernetes?
- Use PersistentVolume (PV) and PersistentVolumeClaim (PVC).
- Use StatefulSets for stable network identity & ordered pod management.
- Ensure storage class supports dynamic provisioning.
- Plan for backup & disaster recovery (Velero, snapshots).
- Use anti-affinity rules to spread pods across nodes.
16) How do you monitor CI/CD pipeline performance?
- Measure build & deployment times (Jenkins, GitLab, GitHub Actions dashboards).
- Track failure rates, retries, queue wait time.
- Use APM tools (Datadog, New Relic, Prometheus + Grafana) for build agents.
- Set SLIs/SLOs for pipeline execution time.
- Profile pipelines → optimize caching (Docker layers, dependencies).
17) How do you troubleshoot GCP Cloud Build quota exceeded error?
- Check the GCP quota page (gcloud compute regions describe).
- Common quotas: concurrent builds, API requests, storage usage.
- Increase quota via a GCP console request.
- Optimize builds → use smaller machine types or parallel builds.
- Cache dependencies with Cloud Build caching or an external registry.
18) How do you enforce policy-as-code in Terraform?
- Use Open Policy Agent (OPA) or HashiCorp Sentinel.
- Example: Enforce only specific instance types:

    deny[msg] {
      input.resource.type == "aws_instance"
      input.resource.values.instance_type != "t3.micro"
      msg = "Only t3.micro instances allowed"
    }

- Integrate with the CI/CD pipeline (run policies before apply).
- Use tools like Terraform Cloud + Sentinel or OPA + Conftest (see the sketch after this list).
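A typical pipeline wiring is to render the plan as JSON and evaluate it with Conftest; the policy directory is illustrative, and the rule above is schematic (a real policy must match the resource_changes structure of the plan JSON):

    terraform plan -out=tfplan.binary
    terraform show -json tfplan.binary > tfplan.json
    conftest test tfplan.json --policy policy/   # non-zero exit if any deny rule fires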
19) How do you handle Kubernetes secret exposure in logs?
- Avoid printing secret values in pod logs or scripts; note that kubectl get secret -o yaml still shows base64-encoded values, so restrict it with RBAC.
- Set automountServiceAccountToken: false when the token is not needed.
- Enable audit logging to detect exposures.
- Use Sealed Secrets / External Secrets Operator for safer management.
- Mask sensitive env vars in CI/CD pipelines.
20) How do you implement CI/CD rollbacks automatically?
- Use blue-green or canary deployments → roll back by switching traffic.
- Configure health checks → if the deployment fails, the pipeline triggers a rollback.
- Example in Kubernetes: use ArgoCD / Flux with GitOps rollback.
- Jenkins/GitLab CI: keep previous artifact versions for redeployment.
- Automate rollback in the pipeline, e.g. a GitHub Actions-style step that only runs when an earlier step failed:

    - name: Roll back
      if: failure()
      run: ./deploy.sh previous_version
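For Kubernetes-based deployments, an automated rollback step can simply revert the Deployment to its previous ReplicaSet; the deployment name is illustrative:

    kubectl rollout undo deployment/myapp
    kubectl rollout status deployment/myapp --timeout=120s   # verify the rollback completed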