Consumer Insights At Scale: ML-Driven Data Engineering For Distributed Cloud Analytics
DOI: https://doi.org/10.63278/jicrcr.vi.3146
Abstract
In today’s data-driven economy, extracting timely and actionable consumer insights is vital for businesses that want to improve competitiveness and customer engagement. This study presents an integrated framework that combines machine learning (ML)-driven data engineering with distributed cloud analytics to process large-scale consumer data and derive predictive insights. Using real-world datasets from e-commerce, digital platforms, and customer interactions, the research applies supervised learning models (Random Forest, Gradient Boosting, and Neural Networks) for behavior prediction, alongside K-Means clustering for market segmentation. Random Forest achieved the highest classification performance, with 96.4% accuracy and an F1-score of 0.949. Segmentation revealed distinct consumer profiles, enabling targeted marketing strategies. The distributed cloud setup, evaluated across AWS and GCP regions and a hybrid mesh network, demonstrated high throughput and low latency, indicating its suitability for scalable real-time analytics. Statistical validation, including fairness metrics and data drift assessments, supported the ethical integrity and stability of the deployed models. The study concludes that the proposed architecture provides a robust, interpretable, and scalable solution for organizations seeking to operationalize consumer intelligence at scale through cloud-native, ML-powered infrastructures.
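The abstract reports classification performance as both accuracy and F1-score. As a minimal sketch of how these two metrics relate, the snippet below computes them from predicted and true labels. It assumes a binary task with 1 as the positive class; the function name `accuracy_and_f1` and the toy labels are hypothetical illustrations, not the study's data or code, and the study may use a multi-class or averaged F1 variant.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, treating 1 as the positive class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)

    # F1 is the harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical toy labels (assumption, not from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.3f}, f1={f1:.3f}")
```

Accuracy counts all correct predictions, while F1 ignores true negatives, so the two can diverge when classes are imbalanced. Reporting both, as the abstract does, gives a fuller picture than either alone.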