For almost two years now, I’ve been playing around with different ideas about BPM engine architectures, in my “free” time of course, which there wasn’t much of.
The initial idea that I started with was separating execution from implementation in the BPM engine. However, that idea isn’t self-explanatory, so let me explain more about what I mean.
If we look at the movement (migration) of an application from a single (internal) server to a cloud instance, this idea is evident: the application code is our implementation while the execution of the application is moving from a controlled environment to a remote environment (the cloud) that we have no control over.
A BPM engine can conceptually be split into the following ideas:
- The interpretation of a workflow representation and the subsequent construction of an internal representation
- The execution, node for node, of the workflow based on the internal representation
- The maintenance and persistence of the internal workflow representation
A BPM engine is in this sense very similar to an interpreted language compiler and processor, except for the added responsibility of being able to “restart” the process instance after having paused for human tasks, timers or other asynchronous actions.
The separation of execution from implementation then means that the code to execute a certain node, whether it’s a fork, join, timer or task node, is totally separate and independent of the code used to parse, interpret and maintain the internal workflow representation. In other words, the engine shouldn’t have to know where it is in the process in order to move forward: it only needs to know what the current node is.
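As a minimal sketch of what I mean (all names here are mine, purely illustrative, not any existing engine’s API): the engine dispatches on the current node’s type and asks the handler for the next node, while the node-executing code knows nothing about parsing or state maintenance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the code that executes a node is fully separate
// from the code that interprets and maintains the workflow representation.
interface NodeHandler {
    // Executes one node; returns the id of the next node, or null when done.
    String execute(String nodeId, Map<String, Object> processVars);
}

class Engine {
    private final Map<String, NodeHandler> handlersByType = new HashMap<>();
    private final Map<String, String> nodeTypes = new HashMap<>(); // nodeId -> type

    void register(String nodeType, NodeHandler handler) {
        handlersByType.put(nodeType, handler);
    }

    void defineNode(String nodeId, String nodeType) {
        nodeTypes.put(nodeId, nodeType);
    }

    // The engine never needs to know "where it is" in the process as a whole:
    // the current node id is enough to move forward.
    void run(String startNodeId, Map<String, Object> vars) {
        String current = startNodeId;
        while (current != null) {
            NodeHandler handler = handlersByType.get(nodeTypes.get(current));
            current = handler.execute(current, vars);
        }
    }
}
```

New node types (fork, join, timer, and so on) are then just new `NodeHandler` registrations; the `run` loop itself never changes.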
While this is to some degree self-evident, the idea also lends itself easily to distributed systems.
Recently, I came across an interesting talk by Fred George on Microservices at Baruco 2012. He describes the general idea behind Microservices: in short, microservices are 100-line services that are meant to be disposable and (enormously) loosely coupled. Part of the idea behind microservices is that the procedural “god” class disappears, as does what we typically think of as the control flow of a program. Instead, the application becomes an Event Driven Cloud (did I just coin that term? ;D ). I specifically used the term cloud because the idea of a defined application disappears: microservices appear and disappear depending on demand and usage.
A BPM engine based on this architecture can then be used to provide an overview of, or otherwise a translation between, a very varied and populated landscape of microservices and the business knowledge possessed by actual people.
But what we don’t want is a “god” service: in other words, we don’t want a single instance or thread that dictates what happens. In some sense, we’re coming back to one of the core ideas behind BPM: the control flow should not be hidden in the code; it should be decoupled from the code and made visible and highly modifiable.
At this point, we come back to the separation of execution from implementation. What does this translate to in this landscape?
One example is the following. In this example, I’m using the word “event” very liberally: in essence, the “event” here is a message or packet of information to be submitted to the following service in the example. Of course, I’m ignoring the issue of routing here, but I will come back to that after the example:
- A “process starter” service parses a workflow representation and produces an event for the first node.
- The first workflow node is executed by the appropriate service, which returns an event asking for the next node in the process.
- The “process state” service checks off the first node as completed, and submits an event for the following node.
- Steps 2 and 3 repeat until the process is completed.
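The loop above can be sketched in plain Java (a hedged, single-process approximation with made-up names; in reality each “service” would be a separate microservice reacting to messages on a bus):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// The "event" here is just a message naming the node to execute next.
class ProcessEvent {
    final String nodeId;
    ProcessEvent(String nodeId) { this.nodeId = nodeId; }
}

// The "process state" service: checks off completed nodes and emits an
// event for the following node in the definition.
class ProcessState {
    final List<String> completed = new ArrayList<>();
    final List<String> definition; // ordered node ids from the parsed workflow
    ProcessState(List<String> definition) { this.definition = definition; }

    ProcessEvent complete(String nodeId) {
        completed.add(nodeId);
        int next = definition.indexOf(nodeId) + 1;
        return next < definition.size() ? new ProcessEvent(definition.get(next)) : null;
    }
}

class MicroBpm {
    static List<String> run(List<String> definition) {
        ProcessState state = new ProcessState(definition);
        Queue<ProcessEvent> bus = new ArrayDeque<>();
        // The "process starter" service emits the event for the first node.
        bus.add(new ProcessEvent(definition.get(0)));
        while (!bus.isEmpty()) {
            ProcessEvent event = bus.poll();
            // A node-executor service would do the actual work here, then
            // ask the state service for the next node (steps 2 and 3).
            ProcessEvent next = state.complete(event.nodeId);
            if (next != null) bus.add(next);
        }
        return state.completed;
    }
}
```

Note that no single class here “owns” the control flow: each participant only reacts to the event in front of it, which is exactly the property that lets the services be replaced or scaled independently.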
There are a couple of advantages that spring to mind here:
- Extending the process engine is as simple as
- introducing new services for new node types
- introducing new versions of the “process starter” and “process state” services
- Introducing new versions of the process engine becomes much easier
- Modifying the process definition of an ongoing process instance becomes a possibility!
However, there are drawbacks and challenges:
- As I mention above, we do run into the added overhead of routing the messages to the correct service.
- Performance is a little slower here, but then again, we’re doing BPM: performance penalties for BPM are typically found in the execution of specific nodes, not in the BPM engine itself.
- This is similar to databases, where the choice between a robust database and a cache solution depends on your needs.
In particular, reactive programming (vert.x) seems to be a paradigm that would lend itself to this approach on a smaller scale (within the same server instance, so to speak), while still allowing it to scale.
Right, so I’ve been thinking about this for a little bit:
What exactly do you want from a BPM engine if it’s in the Cloud?
[Yes, here, we capitalize the word Cloud. Good, that’s settled.]
First, a minor diversion from the topic at hand:
At one of my previous jobs/assignments/workplaces, the developer machines were all virtual machines. It was fantastic for the developers (well, almost) and particularly so for the system administrators and management. Far less inventory: machines that simply weren’t worth anything after 5 years, and no more walking around doing physical maintenance on them.
But the biggest advantage was that my (virtual) workstation could be moved to whichever node the system administrators wanted it moved to, and that creating a new workstation was as simple as a couple of clicks and ticks. Furthermore, I could simply log in wherever I wanted, even from home via a VPN, and presto magico, have my work environment available to me exactly as I had left it!
System administrators no longer spent as much time figuring out where a machine was or when it could be turned off for maintenance; they simply became more efficient at their job.
Why do you move your (j)BPM process to the cloud? Because you don’t care (and don’t want to care) about where your BPM process is running.
Sure, to be clear, you want your BPM process instance to be secure, safe and dependable: but the hardware underneath, in short the server and everything except for your process, just doesn’t matter to you.
And when we’re talking about BPM processes, that means that the only thing you care about is that somewhere out there, in the wild blue yonder, your process is running.
This has some consequences.
This means that, well, a lot like my virtual workstation, my BPM process should be able to be moved on the fly.
As an ’80s valley girl would have put it: like, the process is just stopped and then like, it’s over here, like, and then it’s running again and like, if you didn’t know, like, you wouldn’t know! Oh.. my.. god!
BPM processes in the cloud:
What I’m imagining is being able to move, in this case, a business process and the entire related Java stack just like that <snaps fingers>.
There are several things that the infrastructure needs to be able to keep consistent in this case:
- Encapsulated (persistence) context.
- Java stack.
That’s actually it, as far as I can tell. We need the data and exactly where we were in the process.
A BPM process needs a connection to a database in most cases: usually in order to persist (save) process information at critical junctures, like the end of a process. With regards to a transportable process, that means that a process needs to be able to encapsulate its (not yet committed) persistence context and transfer it along with itself.
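One naive way to do this, sketched below with made-up class and field names (this is not jBPM API): make the in-flight state serializable and ship the bytes to the new cloud node.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical sketch: the two things we said we must keep consistent,
// the (uncommitted) persistence context and exactly where we were.
class PortableProcessState implements Serializable {
    private static final long serialVersionUID = 1L;
    String currentNodeId;                                      // where we were, exactly
    HashMap<String, Object> uncommittedVars = new HashMap<>(); // pending persistence context

    byte[] toBytes() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(this);
        }
        return bos.toByteArray();
    }

    static PortableProcessState fromBytes(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (PortableProcessState) in.readObject();
        }
    }
}
```

Of course, a real persistence context (an open JPA session, in-doubt transactions) is much hairier than a map of variables; this only illustrates the shape of the idea.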
This doesn’t seem that hard: sure, it’s a bit of hacking but on the other hand, it’s pretty clear how we could do this.
The Java stack is the more interesting one, because there are a couple of cases:
- BPMN 2 step.
- Java code that’s associated with a BPMN 2 node.
The first case is easy enough: in short, it’s a step in a BPMN 2 workflow with no associated Java code, so we pause the process at the end of the step, move everything, and restart the process where we had paused it on the new cloud node.
The second case is more interesting. Maybe we have a long-running step, a human task or some such, and we can’t wait for the step to finish. Things to think about are then the following:
- There are already mechanisms for asynchronous execution in jBPM and other frameworks. We can probably use these to our advantage here.
- Input from, say, a returning webservice: it might need to be “cached” and then “forwarded” to the node that jBPM is running on.
- Alternatively, we can mark steps in the BPMN 2 process as “atomic/unmoveable”, in the sense that during those steps the process cannot be transported.
- We need to save the stack: we can already retrieve the stack using Thread.currentThread().getStackTrace(). But how can we “recreate” that stack (with all of its data) on the new cloud node? There will be cases where we will want to do that!
- Related to saving the stack: what kind of mechanisms or constraints do we give the process creator/user in order to ensure that the process is saveable and transportable? Do we define “save points” in our XML? Do we force the user to call a “saveProcessStack” method, or otherwise build the automatic/automagic calling of that into processing methods?
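To make the stack question concrete: `Thread.currentThread().getStackTrace()` gives us read-only `StackTraceElement`s, so we cannot rebuild a live Java stack from it on another machine. A hedged alternative, with names I am inventing for illustration, is the explicit save-point idea from the last bullet: the process code (or generated processing methods) maintains its own serializable “stack”, and that is what we transport.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the "save point" approach: instead of trying to
// recreate the real JVM stack, the process maintains its own portable one.
class SavePointStack {
    private final Deque<String> frames = new ArrayDeque<>();

    // Could be called explicitly by the user, or automagically injected
    // into processing methods.
    void savePoint(String frame) { frames.push(frame); }

    void leave() { frames.pop(); }

    // The portable snapshot we'd ship to the new cloud node with the data.
    Deque<String> snapshot() { return new ArrayDeque<>(frames); }
}
```

The trade-off is exactly the question above: this only works if the process creator (or the engine’s code generation) is disciplined about calling it at every point where the process may be moved.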
Cool, more questions than answers! And lots to build on.
‘Till next time!