Sterling OMS Developer

Mumbai, Maharashtra
  • Responsible for the availability, performance, scaling, monitoring and incident response of the E-Commerce platform and services.
  • Ensure the E-commerce sites are up 24x7 by building in site reliability engineering practices.
  • Provide engineering services support and application support for the open source and enterprise software stack of the E-commerce platform.
  • Troubleshooting of exceptions, performance issues and latencies / errors across multiple technologies.
  • Debug code issues based on web service and API responses, errors, events, logs, etc.
  • Work on and triage the daily tickets related to application support.
  • Automate the critical jobs across the entire platform to minimize manual errors and human intervention.
  • Work closely with technology stakeholders (Product, Application Development, QA, etc.) and offer the right feedback on the Java stack or enterprise E-commerce stack from a production engineering perspective.
  • Implement effective monitoring for all events and logs, with the right alerting and escalations for critical alerts.
  • Plan capacity and carry out timely infrastructure upgrades to keep the site reliable.
  • Ensure proper reviews are built in to minimize the Mean Time to Recover (MTTR) and maximize the Mean Time to Failure (MTTF).
  • Implement ITIL processes such as incident management, problem management and change management.
  • Documentation of runbooks, incident response and post-mortem reports, etc.
  • Understand the business flow and map the technology problems to get the right solutions out.
  • Ability to understand the end-to-end product life cycle and map it to production engineering.
  • Support the Engineering services of the entire technology platforms from the scaling and performance perspective.
  • Manage the uptime of each of the microservices by building and implementing the right monitoring and alerts.
  • Ability to automate any repeatable job.
  • Strong incident management with short response and resolution times to keep the site up at all times.
  • Build in redundancy and proactively avoid downtime.
  • Strong problem management abilities: automate repeatable jobs and work with the stakeholders to ensure incidents do not recur.
Technical Skills
  • BS degree in Computer Science or a related engineering discipline.
  • 4-14 years of relevant work experience at online companies such as media, E-commerce or cloud-based product companies.
  • Strong understanding of the business flow and software design.
  • Experience in at least one programming or scripting language; Python preferred.
  • Experience in optimizing the routine tasks through automation.
  • Experience in monitoring / alerting tools, both Infrastructure and application monitoring.
  • Strong experience with the open source and Java stack.
  • Experience managing varied stakeholders like Business, Development, QA and Product.
  • Strong troubleshooting and debugging skills, and proven experience in site reliability engineering of highly scalable, high-performance technology platforms.
  • Experience in open source technologies like Elasticsearch, Solr, etc.
  • Experience in building a monitoring and alerting framework for infrastructure, applications, databases and NoSQL stores is a plus.
  • Working knowledge of commercial technologies like Hybris, Sterling, etc.
Good to Have
  • Linux and Database Administration at intermediate level.
  • Ability to review the code and suggest inputs to the Development teams.
  • Ability to use commercial APM tools and troubleshoot the issues.