Job Description
US Work Authorization Requirement:
Candidates must be legally authorized to work in the United States without employer sponsorship. This includes, but is not limited to, U.S. Citizens, Permanent Residents, and other individuals with valid U.S. work authorization.
Role Overview:
We are seeking a Senior DevOps / Cloud Operations & Security Monitoring Engineer with 10+ years of experience in cloud infrastructure, operations, and security monitoring. This role is responsible for ensuring the reliability, performance, security, and cost-efficiency of AWS cloud environments while driving automation, observability, and incident response excellence.
The ideal candidate will bring deep hands-on experience in AWS operations, cloud security monitoring, infrastructure-as-code, and automation practices.
Key Responsibilities
Cloud Operations
- Administer, configure, and support AWS infrastructure (EC2, S3, VPC, IAM, RDS, Lambda, etc.).
- Monitor system health, performance, and availability across cloud workloads.
- Implement and manage Infrastructure-as-Code (Terraform, CloudFormation).
- Troubleshoot outages, service disruptions, and performance issues.
- Optimize cloud resource utilization and manage cost-efficiency initiatives.
- Maintain backup strategies, disaster recovery planning, and high-availability architecture.
- Support CI/CD pipelines and automated deployment workflows.
Security Monitoring & Incident Response
- Monitor AWS security logs, telemetry, and alerts (CloudTrail, GuardDuty, Security Hub, etc.).
- Investigate security incidents and coordinate remediation efforts.
- Maintain and tune detection rules, monitoring dashboards, and alert thresholds.
- Support vulnerability management, patching, and remediation processes.
- Assist in audit readiness and compliance evidence collection.
- Strengthen cloud security posture through collaboration with security teams.
Automation & Observability
- Develop automation scripts using Python, PowerShell, or Bash.
- Implement monitoring and alerting solutions (CloudWatch, Datadog, Splunk, etc.).
- Build dashboards for operational health and security reporting.
- Create runbooks and incident response documentation.
- Participate in on-call rotations and post-incident reviews.
- Identify recurring issues and implement long-term corrective solutions.
Required Qualifications
- 10+ years of IT experience with a strong cloud operations background.
- 5+ years of hands-on experience in AWS environments.
- Experience with cloud monitoring, logging, and alerting tools.
- Strong understanding of networking, IAM, and system administration.
- Experience handling production incidents and security alerts.
- Hands-on scripting experience (Python, PowerShell, or Bash).
- Strong analytical and documentation skills.
Preferred Qualifications
- Experience with SIEM or threat detection platforms.
- Hands-on experience with Infrastructure-as-Code (Terraform, CloudFormation).
- Knowledge of vulnerability scanning and remediation.
- Familiarity with compliance frameworks (NIST, CIS, ISO 27001).
- AWS Certifications (Solutions Architect, Security Specialty).
- Security+ or equivalent certification.
- Experience hardening AWS environments for enterprise standards.