Process Mining - Part 1

Big Processes are a Big Topic in data analytics. This is particularly true in today’s enterprises, which place a heavy emphasis on operational efficiency and are looking to instill a culture of continuous improvement.

“Process mining” is a set of analytical techniques we can use to answer questions such as:

  • Process discovery
    • What is really happening in the workforce?
    • How can we understand the current process, especially where no formal model artefacts exist?
  • How often are activities performed?
  • Who is fastest at performing certain tasks?
  • What is the “cycle” time? What is the “waiting” time?
  • How often do “handoffs” occur?

… and many other questions!

Tools like R provide an ideal platform for getting started with process mining.
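As a rough illustration, two of the questions above (how often activities are performed, and cycle time) can be answered with a few lines of base R. The toy log below is invented for this sketch, and "cycle time" is assumed here to mean the elapsed time between the first and last event of a case; your organisation may define it differently.

```r
# Hypothetical toy event log: one row per completed activity.
toy_log <- data.frame(
  case      = c("A", "A", "A", "B", "B"),
  activity  = c("Created", "Assigned", "Complete", "Created", "Complete"),
  timestamp = as.POSIXct(c("2020-01-06 09:00", "2020-01-06 11:00",
                           "2020-01-07 09:00", "2020-01-06 10:00",
                           "2020-01-06 16:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# How often is each activity performed?
activity_counts <- table(toy_log$activity)

# Cycle time per case (assumed definition: last event minus first event), in hours.
cycle_hours <- tapply(as.numeric(toy_log$timestamp), toy_log$case,
                      function(t) (max(t) - min(t)) / 3600)

print(activity_counts)
print(cycle_hours)
```

Dedicated process mining packages wrap this kind of calculation in ready-made functions, which is where the library below comes in.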

There is a brilliant R library for process mining, bupaR:

Gert Janssenswillen (2020). bupaR: Business Process Analysis in R. R package version 0.4.4. https://CRAN.R-project.org/package=bupaR

I’m going to use bupaR and its suite of related packages to explore a transactional event log dataset and learn a bit about the processes behind the data.

“Workflow” type applications are ubiquitous in corporate settings and are a common source of this kind of data. You may well find good candidates in your own workplace. For demonstration purposes here, I’ve synthesised some data to resemble the transaction log of loan origination operations at a financial services organisation.

First up, get the data from an Excel file I created:

library(readxl)
auditextract <- read_excel("G:/OneDrive/Documents/bupaR/AuditExtract1.xlsx")

Then, load up the required bupaR libraries and create an event log object from the source, mapping the relevant columns into bupaR’s event log construct. I won’t go into all the details of the event data model here. It’s relatively simple: an event belongs to a case, and a case is an instance of the process. You can read about the model in more detail on the bupaR site.

library(dplyr)            # mutate() and the %>% pipe
library(bupaR)
library(processanimateR)

el <- auditextract %>%
  # Treat every row as one completed activity instance with its own id
  mutate(status = "complete",
         activity_instance = 1:nrow(.)) %>%
  eventlog(
    case_id = "AuditCaseID",
    activity_id = "StatusDescription",
    activity_instance_id = "activity_instance",  # the per-row id created in mutate() above
    lifecycle_id = "status",
    timestamp = "CreatedTimestamp",
    resource_id = "ResourceName2"
  )

# animate_process(el)
# nrow(el)
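To make the case/event relationship concrete, here is a minimal base R sketch that does not use bupaR at all. The column names are borrowed from the mapping above, but the rows are invented for illustration. Each row is one event, each event carries the id of the case it belongs to, and ordering a case’s events by timestamp gives its trace.

```r
# Hypothetical events: the values are made up, only the column names come from the article.
events <- data.frame(
  AuditCaseID       = c(101, 102, 101, 102, 101),
  StatusDescription = c("Created", "Created", "Assigned", "Complete", "Complete"),
  CreatedTimestamp  = as.POSIXct(c("2020-01-06 09:00", "2020-01-06 09:30",
                                   "2020-01-06 10:00", "2020-01-06 11:00",
                                   "2020-01-06 12:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Order events within each case by time, then give every event a unique id.
events <- events[order(events$AuditCaseID, events$CreatedTimestamp), ]
events$activity_instance <- seq_len(nrow(events))

# A trace is the ordered sequence of activities seen for one case.
traces <- tapply(events$StatusDescription, events$AuditCaseID,
                 paste, collapse = " -> ")
print(traces)
```

Two cases here produce two different traces, which is exactly the kind of variation a process map summarises.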

So, what does the raw process consist of? With bupaR, we can easily visualise a process by plotting a process map. Let’s have a look:

process_map(el)
[Figure: process map of the raw event log. It contains 59 distinct activities (statuses) plus Start and End nodes, with every node and arrow labelled by its frequency. The busiest nodes are “Settlement 1st Authorisation” (1157), “Settlement Assigned” (1131) and “Upload Created” (985).]

Well, wait right there! The obvious comes to mind straight away: this ain’t no simple map. To break it down and get some insight into what’s really going on in the workflow, we can group the statuses into stages.

The process consists of a set of 6 stages, and each stage has a set of associated statuses. For example, for the stage “Start Application”, the status moves through “Moved into queue”, “Assigned to James”, “Started application processing” and so on.

The following R code will collapse the sets of statuses for each stage, so that we can get a higher level picture of the workflow.

library(dplyr)
qs <- auditextract %>% distinct(objectTypeDesc, StatusDescription)

el <- el %>%
  act_collapse(StartApplication = filter(qs, objectTypeDesc=='Start Application')$StatusDescription) %>%
  act_collapse(UploadDocuments = filter(qs, objectTypeDesc=='Upload Documents')$StatusDescription) %>%
  act_collapse(Settlement = filter(qs, objectTypeDesc=='Settlement')$StatusDescription) %>%
  act_collapse(ProcessDocuments = filter(qs, objectTypeDesc=='Process Documents')$StatusDescription) %>%
  act_collapse(Disburse = filter(qs, objectTypeDesc=='Disburse')$StatusDescription) %>%
  act_collapse(StartDocuments = filter(qs, objectTypeDesc=='Start Documents')$StatusDescription) 
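The idea behind collapsing can be sketched in base R on a single case. This is a simplification for intuition only, not a description of how act_collapse works internally: each status is relabelled with its stage through a lookup table, and consecutive repeats of the same stage are merged. The lookup rows and the trace are hypothetical.

```r
# Hypothetical lookup table with the same two columns as `qs` above.
qs_toy <- data.frame(
  objectTypeDesc    = c("Upload Documents", "Upload Documents",
                        "Settlement", "Settlement"),
  StatusDescription = c("Upload Created", "Upload Complete",
                        "Settlement Assigned", "Settlement Complete"),
  stringsAsFactors = FALSE
)

# Statuses seen for one made-up case, in time order.
statuses <- c("Upload Created", "Upload Complete",
              "Settlement Assigned", "Settlement Complete")

# Relabel each status with its stage via a named-vector lookup.
stage_of <- setNames(qs_toy$objectTypeDesc, qs_toy$StatusDescription)
stages   <- unname(stage_of[statuses])

# Merge consecutive events that belong to the same stage.
collapsed <- rle(stages)$values
print(collapsed)
```

Four status-level events shrink to two stage-level steps, which is why the collapsed map is so much easier to read.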

Let’s have a look at the process map now:

process_map(el)
[Figure: process map after collapsing statuses into stages. Nodes: Disburse (467), ProcessDocuments (610), Settlement (3542), StartApplication (133), StartDocuments (467), UploadDocuments (2740), plus Start and End.]

Now we can start to get a picture of what’s going on in there!

By default, the process map is annotated with the frequencies of activities and of the flows between them.
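The flow frequencies on a map like this are, roughly, counts of how often one activity directly follows another within a case. A minimal base R sketch of that count, on an invented two-case log whose rows are assumed to be already in time order within each case:

```r
# Hypothetical stage-level log; activity names mirror the collapsed map, rows are made up.
toy_log <- data.frame(
  case     = c(1, 1, 1, 2, 2, 2),
  activity = c("UploadDocuments", "Settlement", "Disburse",
               "UploadDocuments", "Settlement", "Settlement"),
  stringsAsFactors = FALSE
)

# For each case, pair every activity with the one that follows it,
# adding artificial Start and End markers as a process map does.
pairs <- do.call(rbind, lapply(split(toy_log$activity, toy_log$case), function(a) {
  a <- c("Start", a, "End")
  data.frame(from = head(a, -1), to = tail(a, -1), stringsAsFactors = FALSE)
}))

# Count how often each directly-follows pair occurs across all cases.
flow_counts <- table(paste(pairs$from, "->", pairs$to))
print(flow_counts)
```

Self-loops such as "Settlement -> Settlement" show up whenever a case records the same activity twice in a row, which explains the large loop counts in the map above.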

In Part 2 of Process Mining I’ll show more of the process analysis capabilities, including how to animate the process map!

David Perry
Senior MIS Specialist

process efficiency, data viz, graph theory