Jan 13, 2025
ML for Handling Trading Platform System Outage
Summary
A large financial technology company faced frequent application outages affecting its trading platforms and order management systems. These outages led to financial losses and required immediate support from the IT operations team, especially during trading hours. To resolve these critical issues, the IT ops team had to manually sift through log files, syslogs, and telemetry data, a time-consuming process. Furthermore, fixes could not be implemented during trading hours, leading to significant customer dissatisfaction.
To address this challenge, TenJumps developed an ETL system to ingest telemetry data, including CPU usage, memory usage, and process logs, from all servers. This data was then made available to experts for analysis, enabling them to identify potential issues at an early stage. Using this telemetry data, a machine learning model was trained to recognize patterns and suggest potential fixes. Before any suggested fix was applied, it was sent for approval to guard against unintended consequences.
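The flow described above (ingest telemetry, flag an unusual pattern, suggest a fix, and hold it for approval) can be illustrated with a minimal Python sketch. This is not the system TenJumps built: a simple z-score check on CPU usage stands in for the trained machine learning model, and every name here (`TelemetrySample`, `ProposedFix`, `detect_anomaly`, `propose_fix`, `apply_fix`, and the `restart_worker_pool` action) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional


@dataclass
class TelemetrySample:
    """One reading from a server (hypothetical schema)."""
    host: str
    cpu_pct: float
    mem_pct: float


@dataclass
class ProposedFix:
    """A suggested remediation that must be approved before it runs."""
    host: str
    reason: str
    action: str
    approved: bool = False


def detect_anomaly(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Return True if `latest` is more than `threshold` standard deviations
    above the mean of `history`. A statistical stand-in for a trained model."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    sigma = stdev(history)
    if sigma == 0:
        return latest > history[0]  # flat baseline: any increase stands out
    return (latest - mean(history)) / sigma > threshold


def propose_fix(sample: TelemetrySample, cpu_history: list[float]) -> Optional[ProposedFix]:
    """Suggest a fix when CPU usage is anomalous; only CPU is checked here."""
    if detect_anomaly(cpu_history, sample.cpu_pct):
        return ProposedFix(
            host=sample.host,
            reason=f"CPU at {sample.cpu_pct:.0f}% is far above the recent baseline",
            action="restart_worker_pool",  # hypothetical remediation name
        )
    return None


def apply_fix(fix: ProposedFix) -> str:
    """Refuse to act on any fix that has not been explicitly approved."""
    if not fix.approved:
        raise PermissionError("fix has not been approved")
    return f"applied {fix.action} on {fix.host}"
```

In this sketch the approval gate lives inside `apply_fix`, so a suggested fix cannot run until a reviewer sets `approved` to `True`. A production system would more likely route the approval through a ticketing or change-management workflow.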
As a result, outages were detected earlier, allowing for timely fixes or proactive resolutions through pattern matching. This significantly improved system resilience and reduced downtime, enhancing overall customer satisfaction.