A Contextual Approach to Detecting Synonymous and Polluted Activity Labels in Process Event Logs


Abstract

Process mining is a well-established research area that uses algorithms for process-oriented data analysis. As with other types of data analysis, quality issues in the input data lead to unreliable results (garbage in, garbage out). An important input for process mining is an event log, which records the events of a business process as it is performed through an information system. Addressing quality issues in event logs is necessary, but it is usually an ad hoc and tedious task. In this paper, we propose an automatic approach for detecting two types of data quality issues related to activities, both critical for the success of process mining studies: synonymous labels (same semantics with different syntax) and polluted labels (same semantics and same label structures). We propose using activity context, i.e., control flow, resource, time, and data attributes, to detect semantically identical activity labels. We have implemented our approach and validated it on real-life logs from two hospitals and an insurance company, achieving promising results in detecting frequent imperfect activity labels.

Venue

International Conference on Cooperative Information Systems (CoopIS)

Year

2019

Cite as

Sadeghianasl, S., ter Hofstede, A.H.M., Wynn, M.T., Suriadi, S. (2019). “A Contextual Approach to Detecting Synonymous and Polluted Activity Labels in Process Event Logs”. In: Panetto, H., Debruyne, C., Hepp, M., Lewis, D., Ardagna, C., Meersman, R. (eds) On the Move to Meaningful Internet Systems: OTM 2019 Conferences. OTM 2019. Lecture Notes in Computer Science, vol. 11877. Springer, Cham. https://doi.org/10.1007/978-3-030-33246-4_5