Are you ready to make an impact at DTCC?
Do you want to work on innovative projects, collaborate with a dynamic and supportive team, and receive investment in your professional development? At DTCC, we are at the forefront of innovation in the financial markets. We are committed to helping our employees grow and succeed. We believe that you have the skills and drive to make a real impact. We foster a thriving internal community and are committed to creating a workplace that looks like the world that we serve.
Pay and Benefits:
- Competitive compensation, including base pay and annual incentive
- Comprehensive health and life insurance and well-being benefits, based on location
- Pension / Retirement benefits
- Paid Time Off and Personal/Family Care, and other leaves of absence when needed to support your physical, financial, and emotional well-being.
- DTCC offers a flexible/hybrid model of 3 days onsite and 2 days remote (onsite Tuesdays, Wednesdays and a third day unique to each team or employee).
The Impact you will have in this role:
At DTCC, the Observability team is at the forefront of ensuring the health, performance, and reliability of our critical systems and applications. We empower the organization with real-time visibility into infrastructure and business applications by leveraging cutting-edge monitoring, reporting, and visualization tools.
Our team collects and analyzes metrics, logs, and traces using platforms like Splunk and other telemetry solutions. This data is essential for assessing application health and availability, and for enabling rapid root cause analysis when issues arise, helping us maintain resilience in a fast-paced, high-volume trading environment.
If you're passionate about observability, data-driven problem solving, and building systems that make a real-world impact, we'd love to have you on our team.
Primary Responsibilities:
As a member of DTCC's Observability team, you will play a pivotal role in enhancing our monitoring and telemetry capabilities across critical infrastructure and business applications. Your responsibilities will include:
- Lead the migration from OpenText monitoring tools to Grafana and other open-source platforms.
- Design and deploy monitoring rules for infrastructure and business applications.
- Develop and manage alerting rules and notification workflows.
- Build real-time dashboards to visualize system health and performance.
- Configure and manage OpenTelemetry Collectors and Pipelines.
- Integrate observability tools with CI/CD, incident management, and cloud platforms.
- Deploy and manage observability agents across diverse environments.
- Perform upgrades and maintenance of observability platforms.
Qualifications:
- Minimum of 6-8 years of related experience.
- Bachelor's degree preferred or equivalent experience.
Talent needed for success:
- Proven experience designing intuitive, real-time dashboards (e.g., in Grafana) that effectively communicate system health, performance trends, and business KPIs.
- Expertise in defining and tuning monitoring rules, thresholds, and alerting logic to ensure accurate and actionable incident detection.
- Strong understanding of both application-level and operating system-level metrics, including CPU, memory, disk I/O, network, and custom business metrics.
- Experience with structured log ingestion, parsing, and analysis using tools like Splunk, Fluentd, or OpenTelemetry.
- Familiarity with implementing and analyzing synthetic transactions and real user monitoring to assess end-user experience and application responsiveness.
- Hands-on experience with application tracing tools and frameworks (e.g., OpenTelemetry, Jaeger, Zipkin) to diagnose performance bottlenecks and service dependencies.
- Proficiency in configuring and using AWS CloudWatch for collecting and visualizing cloud-native metrics, logs, and events.
- Understanding of containerized environments (e.g., Docker, Kubernetes) and how to monitor container health, resource usage, and orchestration metrics.
- Ability to write scripts or small applications in languages such as Python, Java, or Bash to automate observability tasks and data processing.
- Experience with automation and configuration management tools such as Ansible, Terraform, Chef, or SCCM to deploy and manage observability components at scale.
Actual salary is determined based on the role, location, individual experience, skills, and other considerations. Please contact us to request accommodation.