Self-Healing AI-Native Real-Time Data Pipelines: Autonomous Resilience For Large-Scale Streaming Systems

Authors

  • Yogesh Pugazhendhi Duraisamy Rajamani

DOI:

https://doi.org/10.63278/jicrcr.vi.3653

Abstract

Large streaming platforms commonly face operational issues such as data drift, throughput degradation, partition imbalance, and cascading failures, all of which affect availability and performance. Existing monitoring tools and rule-based automated remediation are ill-suited to workloads with millisecond-level latency and high-availability requirements. This article introduces a fully self-healing, AI-native real-time data pipeline that integrates machine learning into the control plane of the streaming platform. It presents an end-to-end architecture that combines graph neural networks and transformers for hybrid anomaly detection, LSTM-based predictive fault modeling, and reinforcement learning agents that autonomously select the best remediation policy (e.g., dynamic resource scaling, partition rebalancing, or dataflow rerouting). The framework performs continuous healing through a detect-diagnose-predict-decide-act-verify-learn loop. Evaluation on synthetic and real-world high-throughput streaming workloads shows improvements in downtime, latency, fault domains, and resource utilization, establishing a new model for autonomous stream-processing infrastructure that can keep mission-critical workloads running in cloud, hybrid, and edge environments.
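To make the detect-diagnose-predict-decide-act-verify-learn loop concrete, here is a minimal Python sketch of a single healing pass. It is not the article's implementation, and it deliberately swaps in much simpler techniques: a rolling z-score detector stands in for the GNN/transformer anomaly detector, a two-point linear extrapolation stands in for the LSTM fault model, and an epsilon-greedy contextual bandit stands in for the reinforcement learning agent. The action names, the lag-skew diagnosis heuristic, the 100 ms SLO, `heal_step`, and the `simulated_cluster` environment are all hypothetical.

```python
import random
import statistics
from collections import deque

# Hypothetical remediation actions; a real platform would invoke orchestrator or broker APIs.
ACTIONS = ("scale_out", "rebalance_partitions", "reroute_dataflow")


class RollingZScoreDetector:
    """Detect: flag samples far from a rolling baseline (stand-in for GNN/transformer detection)."""

    def __init__(self, window=20, threshold=3.0, warmup=5):
        self.baseline = deque(maxlen=window)
        self.threshold = threshold
        self.warmup = warmup

    def is_anomalous(self, value):
        anomalous = False
        if len(self.baseline) >= self.warmup:
            mean = statistics.fmean(self.baseline)
            std = statistics.pstdev(self.baseline) or 1e-9  # avoid division by zero
            anomalous = abs(value - mean) / std > self.threshold
        if not anomalous:
            self.baseline.append(value)  # only normal samples update the baseline
        return anomalous


def predict_next(recent):
    """Predict: two-point linear extrapolation (stand-in for the LSTM fault model)."""
    if len(recent) < 2:
        return recent[-1]
    return recent[-1] + (recent[-1] - recent[-2])


def diagnose(partition_lags):
    """Diagnose: hypothetical heuristic separating partition skew from general overload."""
    mean_lag = statistics.fmean(partition_lags)
    return "partition_imbalance" if max(partition_lags) > 2 * mean_lag else "overload"


class EpsilonGreedyPolicy:
    """Decide/learn: epsilon-greedy bandit keyed by diagnosis (stand-in for the RL agent)."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = {}  # (diagnosis, action) -> running mean reward
        self.count = {}

    def choose(self, diagnosis):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)  # explore
        return max(self.actions, key=lambda a: self.value.get((diagnosis, a), 0.0))  # exploit

    def learn(self, diagnosis, action, reward):
        key = (diagnosis, action)
        self.count[key] = self.count.get(key, 0) + 1
        old = self.value.get(key, 0.0)
        self.value[key] = old + (reward - old) / self.count[key]  # incremental mean


def heal_step(latency_ms, recent, partition_lags, detector, policy, apply_action, slo_ms=100.0):
    """One healing pass (hypothetical helper); returns None when the pipeline looks healthy."""
    recent.append(latency_ms)
    anomalous = detector.is_anomalous(latency_ms)              # detect
    forecast_breach = predict_next(recent) > slo_ms            # predict
    if not (anomalous or forecast_breach):
        return None
    diagnosis = diagnose(partition_lags)                       # diagnose
    action = policy.choose(diagnosis)                          # decide
    new_latency = apply_action(diagnosis, action)              # act
    recovered = new_latency <= slo_ms                          # verify
    policy.learn(diagnosis, action, 1.0 if recovered else -1.0)  # learn
    return diagnosis, action, recovered


def simulated_cluster(diagnosis, action):
    """Hypothetical environment: only the matching remediation restores latency."""
    fixes = {"partition_imbalance": "rebalance_partitions", "overload": "scale_out"}
    return 40.0 if fixes[diagnosis] == action else 250.0


if __name__ == "__main__":
    detector, policy, recent = RollingZScoreDetector(), EpsilonGreedyPolicy(ACTIONS), deque(maxlen=10)
    for sample in [50.0, 52.0] * 5:  # healthy warm-up traffic
        heal_step(sample, recent, [10, 10, 10, 10], detector, policy, simulated_cluster)
    for _ in range(30):  # repeated latency spikes on a skewed topic
        heal_step(300.0, recent, [10, 10, 10, 200], detector, policy, simulated_cluster)
    print(policy.value)
```

The bandit is the simplest learner that still closes the verify-learn feedback: its reward comes from whether post-action latency meets the SLO, so across repeated incidents it converges on the remediation that actually resolves each diagnosis. A full reinforcement learning agent, as described in the abstract, would additionally model state transitions and delayed rewards.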

Published

2026-01-05

How to Cite

Rajamani, Y. P. D. (2026). Self-Healing AI-Native Real-Time Data Pipelines: Autonomous Resilience For Large-Scale Streaming Systems. Journal of International Crisis and Risk Communication Research, 440–449. https://doi.org/10.63278/jicrcr.vi.3653

Section

Articles