
Site Reliability Operations Engineer

Job Code:
  • IT
Applying for this job will take you to an external site

Job Details

Job Brief

If you're looking for opportunities to work on challenging problems, want to surround yourself with other hardworking, talented engineers, and would rather spend your time getting things done than sitting in meetings, come join the orange revolution.

The Site Reliability Engineer (SRE) will be responsible for the full system lifecycle, including configuration and code deployment in production environments. The SRE uses technical analysis to assess the availability, latency, scalability and efficiency of a product or infrastructure, and engineers reliability into software and systems. The SRE will work closely with all development teams and relevant functional operations teams, such as the Tier 1 support team, network engineers and database administrators.

The successful candidate will also need to guide incident response effectively where the root cause is unknown, helping cross-functional operations and development scrum teams troubleshoot and resolve database, OS, application, network and other issues. Identifying underlying root causes and recommending long-term, permanent fixes will be paramount for success in the role. You will also provide recommendations for building and improving our expanding infrastructure and library of applications.

For this position, exceptional critical thinking, problem-solving and troubleshooting skills are necessary. A good balance of process-oriented thinking and experience in application high availability is a must. The successful candidate must be able to function at a high level in critical situations.

Essential Duties and Responsibilities
  • Participate in software and system performance analysis and tuning, service capacity planning and demand forecasting.
  • Manage the availability, scalability, security and performance of our platforms and applications.
  • Identify business needs and, where needed, evaluate new technologies and their relevance to those needs.
  • Diagnose bottlenecks across the full stack and recommend interim workarounds while a long-term solution is investigated.
  • Ensure all monitoring requirements are met, and periodically review the checks in place so that the service meets or exceeds customer expectations.
  • Proactively review and recommend changes to the live infrastructure after ensuring the right validation has been carried out.
  • Prepare and deploy application releases.
  • Perform periodic on-call duty as part of a global team.

Qualifications

Education: BS in Computer Science or equivalent, with 5-8 years of relevant work experience.

Technical Skills

Must-Have
  • At least 5 years' experience working with Microsoft SQL Server.
  • At least 5 years' experience operating a five-nines software service.
  • Deep expertise in the mentality, processes and tools needed to deliver five nines.
  • At least 5 years' experience in a SaaS operating environment. Experience with .NET and AWS is a strong plus.
  • Strong working knowledge of Windows and its underlying components, system statistics, performance tuning, filesystems and I/O.
  • Solid scripting skills in PowerShell, Perl or Python.
  • Experience with production deployment, monitoring and operational support for enterprise-class applications.
  • Experience in performance diagnostics, capacity planning, performance architecture design, performance tuning and performance monitoring.
  • Experience working with high-traffic solutions/services.
  • Hands-on experience with Python, SQL, load balancers and firewalls.

Good-to-Have
  • A can-do attitude: no problem is too big or too small.
  • A desire to delight customers.
  • A systematic problem solver, with the ability to think outside the box.
  • Good data analysis skills to pick up trends before they become major problems.
  • Previous experience as an enterprise-class Site Reliability Engineer.
  • A strong mix of software engineering and operations support skills.
  • Eagerness to learn new technologies and programming languages.

Technologies you are likely to be working with: Git and GitHub, Python, C#, SolarWinds, Splunk, Logentries, Cloudability, AWS, RabbitMQ, SQL, Cassandra.

Soft Skills

Must-Have
  • Good verbal and written communication skills.
  • Ability to manage business communication in English.
  • Ability to work collaboratively, with a willingness to listen, in a fast-paced, challenging environment.
Additional Degree: B.Sc. (Science)

Experience: 5-8 years


Application Programming | Database Administration (DBA) | Quality Assurance/Testing | System Analyst/Tech Architect


© Copyright 2015 Al Nisr Publishing LLC - powered by Gulf News