Top DevOps Interview Questions for KPMG Job 2025

DevOps Interview Questions – Terraform, Docker, Kubernetes and CI/CD

1) Difference between terraform import and terraform taint

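  • terraform import
    • Brings an existing infrastructure resource (created outside Terraform) under Terraform management by recording it in the state.
    • Example: if you created an AWS S3 bucket manually, first declare a matching resource "aws_s3_bucket" "my_bucket" block, then run: terraform import aws_s3_bucket.my_bucket bucket-name
  • terraform taint
    • Marks a resource as "tainted" so Terraform will destroy and recreate it during the next apply.
    • Example: terraform taint aws_instance.my_ec2
    • Used when a resource is unhealthy or needs replacement.
    • Since Terraform v0.15.2, taint is deprecated in favor of terraform apply -replace=aws_instance.my_ec2, which shows the replacement in the plan before anything changes.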

✅ Key difference: import is for bringing external resources into Terraform; taint is for forcing recreation of existing ones.


2) How do you manage secrets in Terraform without hardcoding them?

  • Options:
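    • Declare secret variables with sensitive = true. This hides their values in plan/apply output, but they are still stored in plain text in the state file, so the state itself must be protected.
    • Store secrets in environment variables named TF_VAR_<variable_name>; Terraform reads these as input variable values (e.g., TF_VAR_db_password sets var.db_password).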
    • Integrate with secret managers:
      • AWS Secrets Manager (data "aws_secretsmanager_secret_version")
      • HashiCorp Vault provider
      • Azure Key Vault / GCP Secret Manager
    • Use .tfvars files but never commit them to Git.
    • For CI/CD, inject secrets from pipeline variables instead of storing them in code.
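The TF_VAR_ approach can be wired into a CI step with a small wrapper. This is a minimal Python sketch, assuming the terraform CLI is on PATH and the pipeline has already read the secret from its own store. The function names are hypothetical. The point is that secrets reach Terraform only through the child process's environment, never through .tf files or command-line arguments.

```python
import os
import subprocess
from typing import Dict, Mapping


def build_terraform_env(base_env: Mapping[str, str], secrets: Mapping[str, str]) -> Dict[str, str]:
    """Copy base_env and expose each secret as TF_VAR_<name>."""
    env = dict(base_env)  # copy, so the caller's environment is not modified
    for name, value in secrets.items():
        # Terraform reads TF_VAR_<name> as the value of input variable <name>
        env[f"TF_VAR_{name}"] = value
    return env


def terraform_plan_with_secrets(secrets: Mapping[str, str], workdir: str = ".") -> int:
    """Run terraform plan with secrets passed only through the environment."""
    env = build_terraform_env(os.environ, secrets)
    # Secrets stay off the command line, where they could appear in
    # process listings or CI logs.
    result = subprocess.run(
        ["terraform", "plan", "-input=false"], cwd=workdir, env=env, check=False
    )
    return result.returncode
```

For example, terraform_plan_with_secrets({"db_password": value_from_pipeline}) makes var.db_password available without the value ever being written to disk.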

3) Difference between count and for_each + real-world use case

  • count
    • Creates multiple instances of a resource based on an index.
    • Example:
        resource "aws_instance" "servers" {
          count         = 3
          ami           = "ami-123"
          instance_type = "t2.micro"
        }
    • Creates 3 EC2 instances, referenced as aws_instance.servers[0], aws_instance.servers[1], and so on.
  • for_each
    • Creates one instance per element of a map or set, identified by key.
    • Example:
        resource "aws_s3_bucket" "buckets" {
          for_each = toset(["dev", "staging", "prod"])
          bucket   = "my-app-${each.key}"
        }
    • Creates 3 uniquely named S3 buckets (my-app-dev, my-app-staging, my-app-prod), referenced as aws_s3_bucket.buckets["dev"], and so on.

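✅ Use case:

  • Use count when creating identical resources.
  • Use for_each when resources must be uniquely identifiable, or when items may be added to or removed from the middle of the collection. With count, removing an element shifts the index of every later resource, and Terraform plans to change or replace them.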
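The following rough Python simulation shows why the addressing scheme matters. It makes two assumptions: a hypothetical aws_instance.web is configured from a list of names, and each instance's settings come from its element (e.g., a Name tag set from var.names[count.index]). It models only how Terraform matches addresses between runs, not the real planner.

```python
from typing import Dict, List


def count_addresses(names: List[str]) -> Dict[str, str]:
    # count: each instance is addressed by its position in the list
    return {f"aws_instance.web[{i}]": name for i, name in enumerate(names)}


def for_each_addresses(names: List[str]) -> Dict[str, str]:
    # for_each: each instance is addressed by its key
    return {f'aws_instance.web["{name}"]': name for name in names}


def plan_changes(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, str]:
    """Rough model of the action Terraform plans for each address."""
    changes = {}
    for addr in old.keys() | new.keys():
        if addr not in new:
            changes[addr] = "destroy"
        elif addr not in old:
            changes[addr] = "create"
        elif old[addr] != new[addr]:
            changes[addr] = "update/replace"  # same address, different settings
    return changes


before = ["dev", "staging", "prod"]
after = ["dev", "prod"]  # "staging" removed from the middle

print(sorted(plan_changes(count_addresses(before), count_addresses(after)).items()))
# [('aws_instance.web[1]', 'update/replace'), ('aws_instance.web[2]', 'destroy')]
print(sorted(plan_changes(for_each_addresses(before), for_each_addresses(after)).items()))
# [('aws_instance.web["staging"]', 'destroy')]
```

With count, the "prod" instance slides from index 2 to index 1, so two addresses are affected. With for_each, only the removed key is destroyed.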

4) How do you handle drift detection in Terraform?

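  • Drift = changes made to infrastructure outside Terraform (console edits, scripts), so real resources no longer match the configuration and state.
  • Methods:
    • Run terraform plan regularly → shows differences between the code and the real infrastructure.
    • Use terraform plan -refresh-only to review drift, and terraform apply -refresh-only to accept it into state. This replaces the deprecated terraform refresh, which updated state without a review step.
    • Integrate with CI/CD → scheduled plans produce automated drift reports.
    • Use Terraform Cloud/Enterprise → built-in drift detection and notifications.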
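A scheduled CI drift check usually relies on terraform plan -detailed-exitcode. Its documented exit codes are 0 for no changes, 1 for an error and 2 for changes present. This is a minimal Python sketch, assuming the terraform CLI is on PATH and the directory is already initialized. It also assumes the check runs against code that is already applied, so any remaining diff indicates drift. The function names are hypothetical.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`
EXIT_STATUS = {0: "no-drift", 1: "error", 2: "drift"}


def interpret_plan_exit_code(code: int) -> str:
    """Map the plan exit code to a drift status for reporting."""
    return EXIT_STATUS.get(code, "unknown")


def check_drift(workdir: str) -> str:
    # -input=false stops Terraform from prompting in CI; -no-color keeps logs clean
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=False,
    )
    return interpret_plan_exit_code(result.returncode)
```

A nightly job could call check_drift("infra/prod") and open a ticket or send an alert whenever the result is "drift".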

5) What is a Terraform remote backend, and why is it important?

  • Remote backend = Stores Terraform state file in a shared, centralized location instead of locally.
  • Examples: S3 + DynamoDB (AWS), Azure Blob Storage, GCS, Terraform Cloud.
  • Benefits:
    • State consistency across teams.
    • Enables state locking (prevents two users from applying changes simultaneously).
    • Secure storage: state (which can contain secrets) can be encrypted at rest and access-controlled instead of living on individual laptops.
    • Better collaboration in teams.
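State locking works like a "create only if absent" write keyed by the state path. The first run to take the lock proceeds, and every other run fails until the lock is released. Below is a toy in-memory Python sketch of that idea. The class and paths are hypothetical. Real backends such as S3 + DynamoDB perform this as an atomic conditional write on the server side, not inside your process.

```python
import threading
from typing import Dict, Optional


class InMemoryLockTable:
    """Toy stand-in for a backend lock table (illustration only)."""

    def __init__(self) -> None:
        self._locks: Dict[str, str] = {}
        self._mutex = threading.Lock()  # plays the role of the atomic conditional write

    def acquire(self, state_path: str, owner: str) -> bool:
        with self._mutex:
            if state_path in self._locks:
                return False  # another plan/apply already holds the lock
            self._locks[state_path] = owner
            return True

    def release(self, state_path: str, owner: str) -> None:
        with self._mutex:
            # Only the current holder may release the lock
            if self._locks.get(state_path) == owner:
                del self._locks[state_path]

    def holder(self, state_path: str) -> Optional[str]:
        with self._mutex:
            return self._locks.get(state_path)
```

If alice is applying to prod/terraform.tfstate, bob's acquire call returns False. The equivalent real-world outcome is Terraform's "Error acquiring the state lock" instead of two concurrent writes to the same state.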

6) How do you manage multiple environments (dev, staging, prod) in Terraform?

  • Best practices:
    • Use separate workspaces (terraform workspace new dev), or
    • Use a directory per environment (terraform/dev, terraform/staging, terraform/prod), each with its own root module and variables.
    • Parameterize using variables (e.g., instance size, region).
    • Use remote backends per environment to isolate state.
    • Optionally use Terragrunt to simplify multi-environment management.
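A directory-per-environment layout can be driven from a single CI script. This small sketch assumes a hypothetical layout where terraform/<env>/ contains that environment's root module, its backend configuration and an <env>.tfvars file. It uses Terraform's global -chdir option (available since Terraform 0.14), so the script can target any environment without changing directory itself.

```python
from typing import List

ENVIRONMENTS = ("dev", "staging", "prod")


def terraform_commands(env: str, root: str = "terraform") -> List[List[str]]:
    """Build init/plan commands for one environment directory."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    # -chdir switches Terraform's working directory before the subcommand runs,
    # so relative paths such as the var file resolve inside terraform/<env>/
    chdir = f"-chdir={root}/{env}"
    return [
        ["terraform", chdir, "init", "-input=false"],
        ["terraform", chdir, "plan", "-input=false", f"-var-file={env}.tfvars"],
    ]
```

Each environment keeps its own backend configuration and therefore its own isolated state. The allow-list rejects typos such as "prd" before any command runs.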

7) Difference between local-exec and remote-exec provisioners

  • local-exec
    • Runs a command on the machine where Terraform is executed.
    • Example: run a script in your CI/CD pipeline.
  • remote-exec
    • Runs commands inside the remote resource (e.g., VM/EC2) after creation, over the SSH or WinRM connection defined in a connection block.
    • Example: installing Apache on a new EC2 instance.

✅ Key difference: local-exec = local machine, remote-exec = target resource.


8) How do you safely roll back infrastructure changes after a failed deployment?

  • Options:
    • Use version control (Git) to revert .tf code and re-apply.
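    • Use state history (e.g., S3 bucket versioning, Terraform Cloud state versions) to recover a damaged state file; restoring old state does not by itself roll back real resources.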
    • Use terraform plan before every apply to minimize surprises.
    • Use blue-green deployment → switch traffic between old and new infra.
    • In critical systems, rely on manual approval steps in CI/CD.

9) terraform refresh vs terraform plan

  • terraform refresh
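    • Updates the state file to reflect real-world infra but does not show planned changes. Deprecated since Terraform v0.15.4 in favor of terraform apply -refresh-only, which shows the detected changes and asks for approval before updating state.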
    • Example: If an EC2 instance was stopped manually, state will update to show its stopped status.
  • terraform plan
    • Shows the execution plan: what changes Terraform will make to reach desired state.
    • Detects drift and proposes corrective actions.

✅ Difference: refresh just syncs state; plan compares desired vs actual and shows changes.


10) How do you write reusable Terraform modules?

  • Steps:
    1. Create a module directory with .tf files (main.tf, variables.tf, outputs.tf).
    2. Define input variables in variables.tf.
    3. Expose useful values using outputs.tf.
    4. Call the module from the root configuration:
         module "vpc" {
           source     = "./modules/vpc"
           cidr_block = "10.0.0.0/16"
         }
    5. Pin module versions when using remote sources (e.g., ?ref=v1.2.0 in a Git source URL, or version = "~> 1.2" for a Terraform Registry module).

11) How do you troubleshoot slow Terraform apply?

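  • Check whether terraform plan is also slow; if so, the time is likely spent refreshing state, i.e., reading every resource from the provider APIs.
  • Raise parallelism (the default is 10): terraform apply -parallelism=20
  • Check whether the backend (e.g., S3 with DynamoDB locking) is slow to read, write or lock state, or whether the state file has grown very large.
  • Large configurations? → Split them into smaller root modules with separate state, so each run refreshes fewer resources.
  • Check provider API rate limits (AWS, Azure, GCP); throttled and retried calls can dominate run time.
  • Use TF_LOG=DEBUG (optionally with TF_LOG_PATH=tf.log) to see where the delays occur.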
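A debug log can be thousands of lines long, so a quick way to find the slow spot is to locate the longest pause between consecutive log lines. This is a minimal Python sketch. It assumes the log was written with TF_LOG=DEBUG TF_LOG_PATH=tf.log, and that each line starts with an ISO-8601 timestamp that datetime.fromisoformat can parse. The exact timestamp format varies by Terraform version, so adjust the parsing to match your logs. The function name is hypothetical.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def slowest_gap(log_lines: List[str]) -> Optional[Tuple[float, str]]:
    """Return (seconds, line after the pause) for the longest gap, or None."""
    best: Optional[Tuple[float, str]] = None
    prev: Optional[datetime] = None
    for line in log_lines:
        # Assumed format: "<ISO-8601 timestamp> [LEVEL] message"
        stamp = line.split(" ", 1)[0].replace("Z", "+00:00")
        try:
            ts = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip continuation lines and anything without a timestamp
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, line)
        prev = ts
    return best
```

Usage: slowest_gap(open("tf.log").read().splitlines()). The returned line usually names the resource or API call that Terraform was waiting on.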

12) How do you ensure Docker image immutability?

  • Use unique tags (commit SHA, build ID) instead of latest.
  • Example: myapp:1.0.0 or myapp:gitsha1234.
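  • Store images in registries with tag immutability enabled, so a pushed tag cannot be overwritten (e.g., Amazon ECR immutable tags, Artifact Registry immutable tags, ACR image locking).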
  • Use content digest (sha256) references when deploying to Kubernetes.
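Both practices can be enforced in the build script by generating the tag from the commit and refusing anything but a well-formed digest at deploy time. This is a minimal Python sketch. The function names and the registry.example.com host are hypothetical. A tag like myapp:<sha> is unique per commit, while myapp@sha256:<digest> always resolves to the same image bytes.

```python
import re

DIGEST_RE = re.compile(r"^sha256:[0-9a-f]{64}$")


def commit_tag(repo: str, git_sha: str, length: int = 12) -> str:
    """Unique, never-reused tag derived from the git commit SHA."""
    if not re.fullmatch(r"[0-9a-f]{7,40}", git_sha):
        raise ValueError("expected a hex git commit SHA")
    return f"{repo}:{git_sha[:length]}"


def pinned_ref(repo: str, digest: str) -> str:
    """Reference an image by content digest, e.g. for a Kubernetes image: field."""
    if not DIGEST_RE.match(digest):
        raise ValueError("expected sha256:<64 hex chars>")
    return f"{repo}@{digest}"
```

Rejecting anything that is not a digest at deploy time keeps mutable references such as latest out of manifests.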

13) How do you troubleshoot Azure Kubernetes Service (AKS) scaling issues?

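  • Check cluster autoscaler activity: AKS runs the autoscaler as a managed component, so read its status from the ConfigMap (kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml) or send the cluster-autoscaler log category to Azure Monitor via diagnostic settings.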
  • Ensure VM quotas in Azure are not exhausted.
  • Check node pool limits (max nodes reached).
  • Verify pod scheduling constraints (taints, affinity, resource requests).
  • Use kubectl describe pod to see why pods are pending.
  • Check Azure Monitor / Metrics for scaling events.
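Running kubectl describe pod on pods one at a time does not scale well. Instead, the scheduler's verdict for every pending pod can be read from each pod's PodScheduled condition. This is a minimal Python sketch. It assumes the input is the parsed JSON output of kubectl get pods -A -o json. The function name is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def unschedulable_pods(pod_list: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (namespace/name, scheduler message) for Pending pods that failed scheduling."""
    results = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") != "Pending":
            continue
        for cond in status.get("conditions", []):
            # PodScheduled=False carries the scheduler's reason (e.g. Unschedulable)
            if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
                meta = pod.get("metadata", {})
                name = f"{meta.get('namespace', 'default')}/{meta.get('name', '?')}"
                results.append((name, cond.get("message", cond.get("reason", ""))))
    return results
```

Messages that cite insufficient CPU or memory point at autoscaler limits or quotas. Messages about taints or node affinity point at scheduling constraints that adding nodes will not fix.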

14) How do you handle Jenkins credentials securely?

  • Store credentials in Jenkins Credentials Store, not in jobs/pipelines.
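  • Use a withCredentials block in the pipeline, and pass the secret to sh in single quotes so the shell expands $TOKEN instead of Groovy interpolating it into the command string (Jenkins warns about the latter): withCredentials([string(credentialsId: 'my-secret', variable: 'TOKEN')]) { sh 'echo $TOKEN' }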
  • Integrate with Vault / AWS Secrets Manager / Azure Key Vault plugins.
  • Use RBAC to limit credential access.

15) How do you manage stateful applications in Kubernetes?

  • Use PersistentVolume (PV) and PersistentVolumeClaim (PVC).
  • Use StatefulSets for stable network identity & ordered pod management.
  • Ensure storage class supports dynamic provisioning.
  • Plan for backup & disaster recovery (Velero, snapshots).
  • Use anti-affinity rules to spread pods across nodes.

16) How do you monitor CI/CD pipeline performance?

  • Measure build & deployment times (Jenkins, GitLab, GitHub Actions dashboards).
  • Track failure rates, retries, queue wait time.
  • Use monitoring tools (Datadog, New Relic, Prometheus + Grafana) to track build-agent CPU, memory and disk usage.
  • Set SLIs/SLOs for pipeline execution time.
  • Profile pipelines → optimize caching (Docker layers, dependencies).
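The metrics above can be turned into SLIs from exported run records. This is a minimal Python sketch. It assumes your CI system can export each run's duration, queue wait and result. The PipelineRun record and the SLO thresholds are hypothetical, and percentiles use the nearest-rank method.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PipelineRun:
    duration_s: float    # wall-clock build + deploy time
    queue_wait_s: float  # time spent waiting for a free agent
    succeeded: bool


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile, pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def pipeline_slis(runs: List[PipelineRun]) -> Dict[str, float]:
    if not runs:
        raise ValueError("no runs to summarize")
    return {
        "failure_rate": sum(not r.succeeded for r in runs) / len(runs),
        "p95_duration_s": percentile([r.duration_s for r in runs], 95),
        "p95_queue_wait_s": percentile([r.queue_wait_s for r in runs], 95),
    }


def meets_slo(slis: Dict[str, float], max_p95_duration_s: float, max_failure_rate: float) -> bool:
    return slis["p95_duration_s"] <= max_p95_duration_s and slis["failure_rate"] <= max_failure_rate
```

Tracking queue wait separately from duration shows whether a slowdown comes from the build itself or from a shortage of agents.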

17) How do you troubleshoot GCP Cloud Build quota exceeded error?

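  • Check the Quotas page in the Cloud console (IAM & Admin → Quotas), filtered to Cloud Build; gcloud compute regions describe only lists Compute Engine quotas.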
  • Common quotas: concurrent builds, API requests, storage usage.
  • Increase quota via GCP console request.
  • Reduce pressure on the quota → run fewer builds concurrently (batch or queue triggers) and use smaller machine types where builds allow.
  • Cache dependencies with Cloud Build caching or external registry.
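Triggers can fire faster than the concurrent-build quota allows. In that case, retrying on the client side with capped exponential backoff and jitter avoids hammering the API. This is a minimal Python sketch. It assumes your own submission code (for example a wrapper around gcloud builds submit or the Cloud Build client library) raises the hypothetical QuotaExceeded exception on a quota error. Nothing here is a Cloud Build API.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Hypothetical error raised by the caller's submit code on a quota/429 response."""


def submit_with_backoff(
    submit: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 2.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call submit(), retrying quota errors with capped exponential backoff + full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except QuotaExceeded:
            if attempt == max_attempts:
                raise  # give up and surface the quota error
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            # Jitter spreads out retries from pipelines that hit the quota together
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Backoff only smooths short bursts. If builds are throttled constantly, the quota increase or reduced concurrency listed above is the real fix.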

18) How do you enforce policy-as-code in Terraform?

  • Use Open Policy Agent (OPA) or HashiCorp Sentinel.
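  • Example (Rego for OPA/Conftest, evaluated against terraform show -json plan output): allow only t3.micro instances:
        package main

        import rego.v1

        deny contains msg if {
          rc := input.resource_changes[_]
          rc.type == "aws_instance"
          rc.change.after.instance_type != "t3.micro"
          msg := sprintf("%s: only t3.micro instances allowed", [rc.address])
        }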
  • Integrate with CI/CD pipeline (run policies before apply).
  • Use tools like Terraform Cloud + Sentinel or OPA + Conftest.
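It can help to see exactly what such a policy checks. Below is a Python equivalent of the "only t3.micro" rule, applied to the JSON plan that Terraform produces (terraform plan -out=tfplan, then terraform show -json tfplan > plan.json). It is a sketch for understanding and testing the rule, not a replacement for OPA or Sentinel. The function name and allow-list are hypothetical.

```python
from typing import Any, Dict, List

ALLOWED_INSTANCE_TYPES = {"t3.micro"}


def policy_violations(plan: Dict[str, Any]) -> List[str]:
    """Return one message per aws_instance whose planned instance_type is not allowed."""
    msgs = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_instance":
            continue
        # change.after is null for resources being deleted
        after = (rc.get("change") or {}).get("after") or {}
        itype = after.get("instance_type")
        if itype is not None and itype not in ALLOWED_INSTANCE_TYPES:
            msgs.append(f"{rc.get('address')}: only t3.micro instances allowed (got {itype})")
    return msgs
```

In a pipeline, a non-empty result would fail the job before terraform apply runs, which is the same gate Conftest provides.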

19) How do you handle Kubernetes secret exposure in logs?

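  • Inspect secrets with kubectl describe secret, which shows only key names and sizes; kubectl get secret -o yaml prints the values base64-encoded (trivially reversible), so keep its output out of logs and scripts.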
  • Set automountServiceAccountToken: false when not needed.
  • Enable audit logging to detect exposures.
  • Use Sealed Secrets / External Secrets Operator for safer management.
  • Mask sensitive env vars in CI/CD pipelines.
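Masking can also happen inside your own application or pipeline scripts, before anything reaches the log stream. This is a minimal Python sketch using a standard logging.Filter. It assumes the secret values are known to the process (for example read from a mounted Secret). The class name is hypothetical, and masking is a safety net, not a substitute for not logging secrets.

```python
import logging
from typing import Iterable


class RedactSecrets(logging.Filter):
    """Replace known secret values in log messages before they are emitted."""

    def __init__(self, secrets: Iterable[str], mask: str = "****") -> None:
        super().__init__()
        # Ignore empty values; mask longer secrets first so overlaps are fully hidden
        self._secrets = sorted({s for s in secrets if s}, key=len, reverse=True)
        self._mask = mask

    def filter(self, record: logging.LogRecord) -> bool:
        # Merge msg % args first so secrets passed as arguments are caught too
        message = record.getMessage()
        for secret in self._secrets:
            message = message.replace(secret, self._mask)
        record.msg, record.args = message, ()
        return True  # keep the record, now redacted
```

Attach the filter to handlers rather than loggers. A handler filter sees every record routed through that handler, including records from child loggers.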

20) How do you implement CI/CD rollbacks automatically?

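  • Use blue-green or canary deployments → roll back by switching traffic back to the previous version.
  • Configure health checks → if the new deployment fails them, the pipeline triggers a rollback.
  • Example in Kubernetes: use Argo CD / Flux with GitOps rollback (revert the commit), or kubectl rollout undo deployment/<name>.
  • Jenkins/GitLab CI: keep previous artifact versions for redeployment.
  • Automate rollback in the pipeline: in GitHub Actions, add a step with if: failure() and run: ./deploy.sh previous_version; in GitLab CI, add a rollback job with when: on_failure and script: ./deploy.sh previous_version.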
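A health-check-driven rollback can be sketched as a small controller loop. This assumes hypothetical deploy() and is_healthy() callables: deploy might shell out to ./deploy.sh, and is_healthy might poll a readiness endpoint. The check count, interval and failure threshold are illustrative values, not recommendations.

```python
import time
from typing import Callable


def deploy_with_auto_rollback(
    deploy: Callable[[str], None],
    is_healthy: Callable[[], bool],
    new_version: str,
    previous_version: str,
    checks: int = 5,
    max_failures: int = 2,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Deploy new_version; roll back to previous_version after max_failures failed checks.

    Returns the version left running.
    """
    deploy(new_version)
    failures = 0
    for _ in range(checks):
        sleep(interval_s)  # give the new version time to start serving
        if not is_healthy():
            failures += 1
            if failures >= max_failures:
                deploy(previous_version)  # e.g. ./deploy.sh previous_version
                return previous_version
    return new_version
```

Allowing more than one failed check tolerates a single transient blip. Setting max_failures=1 rolls back on the first failure.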
