Data-Driven Predictive Modelling of Employee Absenteeism Using Workflow Automation Platforms
DOI:
https://doi.org/10.15157/ijitis.2026.9.1.115-136
Keywords:
Employee Absenteeism, Agent, LightGBM, XGBoost, CatBoost, Predictive Modelling, Human Resources, Machine Learning
Abstract
Employee absence is a critical factor affecting organizational productivity and employee well-being. This study presents a data-driven predictive framework for employee absenteeism using a newly collected enterprise dataset comprising 8,336 employees. Absenteeism is formulated as a binary classification task that distinguishes employees with more than 80 hours of annual absence from those with lower absence levels, based on demographic and occupational characteristics. The proposed approach applies gradient-boosted decision tree models (LightGBM, XGBoost, and CatBoost), evaluated through a stratified train–test split at the employee level to approximate temporal separation between training and prediction. Feature engineering procedures are detailed, including categorical encoding and the construction of a commuting-related indicator. All models demonstrate strong predictive performance, achieving accuracy between 85% and 87%, precision between 78% and 80%, recall between 76% and 79%, and AUC–ROC values of 0.92–0.93. Model interpretability is addressed using SHAP-based feature attribution, which identifies age, gender, occupational role, and location as key predictors of absenteeism risk. Furthermore, a practical system architecture is outlined that integrates the predictive models within an automated workflow using the n8n orchestration platform for deployment in human resource information systems. This enables proactive identification of high-risk absenteeism cases and supports early intervention strategies with minimal human oversight. The study contributes by addressing data leakage concerns, improving feature transparency, and demonstrating a deployable and interpretable predictive system. Future research directions include multi-organizational validation, temporal modelling using sequential data, and evaluation of system-level effectiveness in real-world HR settings.
License
Copyright (c) 2025 Mohammed Alars, Abbas Albakry

This work is licensed under a Creative Commons Attribution 4.0 International License.


