µServices and BPM engine architecture

4 July 2014 Leave a comment

For almost 2 years now, I've been playing around with different ideas about BPM engine architectures — in my "free" time, of course, of which there hasn't been much.

The initial idea that I started with was separating execution from implementation in the BPM engine. However, that idea isn’t self-explanatory, so let me explain more about what I mean.

If we look at the movement (migration) of an application from a single (internal) server to a cloud instance, this idea becomes evident: the application code is our implementation, while the execution of the application moves from a controlled environment to a remote environment (the cloud) that we have no control over.

A BPM engine, conceptually, can be split into the following ideas:

  1. The interpretation of a workflow representation and the subsequent construction of an internal representation
  2. The execution, node by node, of the workflow based on the internal representation
  3. The maintenance and persistence of the internal workflow representation

A BPM engine is in this sense very similar to an interpreted language compiler and processor, except for the added responsibility of being able to “restart” the process instance after having paused for human tasks, timers or other asynchronous actions.

The separation of execution from implementation then ends up meaning that the code to execute a certain node, whether it's a fork, join, timer or task node, is totally separate and independent of the code used to parse, interpret and maintain the internal workflow representation. In other words, the engine shouldn't have to know where it is in the process in order to move forward: it only needs to know what the current node is.
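To make that a bit more concrete, here is a minimal sketch, in plain Java, of what such a split could look like. None of these types exist in jBPM; all of the names are invented purely to illustrate the boundary between the two responsibilities.

```java
// A minimal sketch of separating execution from implementation.
// None of these types are jBPM API; the names are made up for illustration.

/** Executes a single node type (task, fork, join, timer, ...). */
interface NodeExecutor {
    /** Runs the given node and reports what comes next; no knowledge of the full process is needed. */
    NodeResult execute(String processInstanceId, String nodeId);
}

/** Parses, maintains and persists the internal workflow representation. */
interface ProcessStateStore {
    String currentNode(String processInstanceId);
    void markCompleted(String processInstanceId, String nodeId);
    void moveTo(String processInstanceId, String nextNodeId);
}

/** What an executor hands back: just enough information to move forward. */
record NodeResult(String completedNodeId, String nextNodeId) {}
```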

While this is to some degree self-evident, this idea also lends itself easily to distributed systems.


Recently, I came across an interesting talk by Fred George on Microservices at Baruco 2012. He describes the general idea behind microservices: in short, microservices are 100-line services that are meant to be disposable and (enormously) loosely coupled. Part of the idea behind microservices is that the procedural "god" class disappears, as does what we typically think of as the control flow of a program. Instead, the application becomes an Event Driven Cloud (did I just coin that term? ;D ). I specifically use the term cloud because the idea of a defined application disappears: microservices appear and disappear depending on demand and usage.

A BPM engine based on this architecture can then be used to help provide an overview of, or otherwise a translation between, a very varied and populated landscape of microservices and the business knowledge possessed by actual people.

But what we don't want is a "god" service: in other words, we don't want a single instance or thread that dictates what happens. In some sense, we're coming back to one of the core ideas behind BPM: the control flow should not be hidden in the code, it should be decoupled from the code and made visible and highly modifiable.

At this point, we come back to the separation of execution from implementation. What does this translate to in this landscape?

One example is the following. In this example, I’m using the word “event” very liberally: in essence, the “event” here is a message or packet of information to be submitted to the following service in the example. Of course, I’m ignoring the issue of routing here, but I will come back to that after the example:

  1. A "process starter" service parses a workflow representation and produces an event for the first node.
  2. The first workflow node is executed by the appropriate service, which returns an event asking for the next node in the process.
  3. The “process state” service checks off the first node as completed, and submits an event for the following node.
  4. Steps 2 and 3 repeat until the process is completed.
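As a rough illustration of what those "events" and the "process state" service could look like, here's a small sketch in plain Java. Everything here is invented for illustration (the MessageBus in particular stands in for whatever routing/transport you'd actually use); it is not jBPM or any particular messaging API.

```java
import java.util.Optional;

// Sketch of the messages flowing between the services in the example above,
// plus the "process state" service reacting to them. All names are made up.

/** "Please execute this node": produced by the process starter and process state services. */
record ExecuteNode(String processInstanceId, String nodeId) {}

/** "This node is done": produced by whichever service executed the node. */
record NodeCompleted(String processInstanceId, String nodeId) {}

/** Stand-in for the routing/transport layer (the hard part glossed over in the text). */
interface MessageBus {
    void publish(Object event);
}

/** Stand-in for the persisted internal workflow representation. */
interface ProcessStateStore {
    void markCompleted(String processInstanceId, String nodeId);
    Optional<String> nextNode(String processInstanceId); // empty when the process is finished
}

class ProcessStateService {
    private final ProcessStateStore store;
    private final MessageBus bus;

    ProcessStateService(ProcessStateStore store, MessageBus bus) {
        this.store = store;
        this.bus = bus;
    }

    /** Step 3 in the example: check off the completed node and submit an event for the next one. */
    void on(NodeCompleted event) {
        store.markCompleted(event.processInstanceId(), event.nodeId());
        store.nextNode(event.processInstanceId())
             .ifPresent(next -> bus.publish(new ExecuteNode(event.processInstanceId(), next)));
    }
}
```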

There are a couple of advantages that spring to mind here:

  • Extending the process engine is as simple as
    • introducing new services for new node types
    • introducing new versions of the “process starter” and “process state” services
  • Introducing new versions of the process engine becomes much easier
  • Modifying the process definition of an ongoing process instance becomes a possibility!

However, there are drawbacks and challenges:

  • As I mention above, we do run into the added overhead of routing the messages to the correct service.
  • Performance takes a hit here, but then again, we're doing BPM: performance penalties in BPM are typically found in the execution of specific nodes, not in the BPM engine itself.
    • This is similar to databases, where the choice between a robust database and a cache solution depends on your needs.

In particular, reactive programming (vert.x) seems to be a paradigm that would lend itself to this approach on a smaller scale (within the same server instance, so to speak), while still allowing this approach to scale.
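For example, a single "task node executor" service sketched against the Vert.x event bus could be as small as the following. I'm assuming the Vert.x 3 style event-bus API here, and the addresses and message fields are invented; it's only meant to show the shape of such a service.

```java
import io.vertx.core.AbstractVerticle;
import io.vertx.core.Vertx;
import io.vertx.core.json.JsonObject;

// A small sketch of the idea on a single JVM, using the Vert.x event bus.
// The addresses and message fields are made up; a "node executor" verticle reacts to
// "execute this node" messages and reports completion, without knowing the rest of the process.
public class TaskNodeExecutorVerticle extends AbstractVerticle {

    @Override
    public void start() {
        vertx.eventBus().consumer("node.execute.task", message -> {
            JsonObject event = (JsonObject) message.body();
            String processInstanceId = event.getString("processInstanceId");
            String nodeId = event.getString("nodeId");

            // ... do the actual work for this task node ...

            // Tell the "process state" service this node is done; it decides what's next.
            vertx.eventBus().send("process.state.completed",
                    new JsonObject()
                            .put("processInstanceId", processInstanceId)
                            .put("nodeId", nodeId));
        });
    }

    public static void main(String[] args) {
        Vertx.vertx().deployVerticle(new TaskNodeExecutorVerticle());
    }
}
```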


Categories: Cloud, jBPM, Other

New REST API for jBPM 6

24 October 2013 5 comments

Most of the team has been working very hard for the last couple of months on the Drools/jBPM/Optaplanner 6.0 release. One thing that's changed with this release is that the umbrella project has been given a new name: KIE. You can find more about that elsewhere.

I wanted to quickly introduce the new REST API — and ask for feedback, should you be so inclined. Some of the community have already been kind enough to submit jiras with suggestions, which is great!

I’ve been documenting the REST API here: https://github.com/droolsjbpm/droolsjbpm-integration/wiki/Rest-API

If you use the (new) jbpm-console war or the kie-wb war, the REST API is available via those wars.

Again, if you have any ideas or suggestions, please feel free to leave a comment or submit a jira.

Thanks!

Categories: jBPM

An update (perl) script for updating github repositories.

14 August 2013 Leave a comment

One of the challenges of the drools/jbpm/kie project is that there are lots of repositories — even more so with the 6.x branch. Of course, there are the core repositories (drools, jbpm), the shared API repositories, the integration repositories, and then once you start looking at the applications, most of which are based on the new UberFire framework, the list just keeps on growing.

There's also the fact that I regularly dive into the code of, say, Hibernate, HornetQ, RESTEasy or WildFly. Of course, then there are git repositories with examples, Arquillian container repositories..

You get the point.

What I wanted to share with you is a script that I use to regularly update all of these repositories. You can find it here, where I’ve pasted it into a github gist.

First off, the script assumes that it's in the parent directory of your repositories. That is to say, if all of your github repositories are in /home/me/workspace, then the script assumes that it's been started there.

This script does the following for every repository that you give it:

  1. Changes directory into the repository.
  2. Runs git remote update
  3. Checks to see that the current (git) branch is master.
  4. Checks to see if there are any changes to the repository (any file changes that are unstaged, in the index or otherwise not committed).
  5. Calls git merge --ff origin/master
  6. Prints the output of git status
  7. Calls git prune
  8. Calls git gc

The list of repositories can be found at the end of the script, underneath the __DATA__ tag in the perl script. You can also add comments in the list.
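For illustration only, here's roughly what that per-repository loop boils down to, sketched in Java rather than Perl (the real logic, including the branch and working-tree checks and the __DATA__ parsing, lives in the linked gist):

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

// Rough sketch of what the linked Perl script does per repository.
// This is only meant to show the sequence of git commands; use the gist for real work.
public class UpdateRepos {

    public static void main(String[] args) throws Exception {
        File workspace = new File(System.getProperty("user.dir"));
        List<String> repos = List.of("drools", "jbpm"); // read from __DATA__ in the real script

        for (String repo : repos) {
            File dir = new File(workspace, repo);
            run(dir, "git", "remote", "update");
            // The real script also checks that the current branch is master and that
            // there are no unstaged, staged or otherwise uncommitted changes before merging.
            run(dir, "git", "merge", "--ff", "origin/master");
            run(dir, "git", "status");
            run(dir, "git", "prune");
            run(dir, "git", "gc");
        }
    }

    private static void run(File dir, String... cmd) throws IOException, InterruptedException {
        new ProcessBuilder(cmd).directory(dir).inheritIO().start().waitFor();
    }
}
```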

Of course, you can restart the script halfway if you want to by calling ./update.pl -f <repo> (which restarts “from” that repo) or ./update.pl -a <repo> (which restarts “after” that repo). Enjoy!

Categories: Other

Why it doesn’t pay to write unit tests

21 November 2012 5 comments

I've unfortunately had to do a lot of traveling in the last couple of days and I've been reading No one makes you shop at Wal-Mart on the plane. Among other things, the book describes the economic model underlying the idea of a 'public good'.

A public good, as opposed to a private good, is basically a resource that is enjoyed freely by a group. A clean environment or a quiet neighborhood is a good example of this.

A private good, in this example, might be the right to play your music as loud as you want to. However, if everyone starts doing that, the public good of a quiet neighborhood will soon disappear. In this scenario, everyone in the neighborhood has to pay the cost of setting a limit on how loud and when you can play your music in order to preserve the public good of a quiet neighborhood. This idea is related to the Prisoner's Dilemma, for those of you curious about that.

With regards to software development, a set of well-running unit tests is a public good, while the act of writing a unit test is actually a private cost. I think this is self-explanatory to most of the developers reading this, but I'll explain it just to be sure.

Writing a unit test that is of good quality is not advantageous to a developer writing or modifying code, for the following reasons.

The productivity of a developer is measured based on 2 things:

  1. The number and quality of the features she produces
  2. The number of bugs she fixes

The first measurement, the number of features produced, is weighted more heavily: that is to say that creating a feature is, in general, seen as more productive than fixing a bug.

However, writing a good unit test does not directly contribute to either the creation of features or bug fixes.

While writing a unit test might be helpful when creating or modifying a feature, it’s not necessary. Every decent developer out there is perfectly capable of writing software without having to write a unit test. In fact, writing a unit test costs time which a developer might otherwise have spent on writing more features or fixing bugs! The better the unit test is, the more time that a developer will have needed to spend on it, making good unit tests more “expensive” than lower quality unit tests.

Thus, from an individual developer’s perspective, it does not pay to write good unit tests, especially in the short term.

Furthermore, the unfortunate thing about "quality" is that the quality of a feature (or any piece of code) is something that can only be measured in relation to how long the code has existed. In other words, the quality of code is never immediately apparent and frequently only apparent after a significant period of time and use. Often, by the time the (lesser) quality of the code becomes apparent, no one can remember or determine exactly who created the feature, and it's not productive to search for that information.

But it always pays to have a high-quality suite of unit tests. A well-written suite of unit tests does 3 things:

  • Most importantly, it will alert the developer to any problems caused by new or changed code.
  • Because an existing unit test will obviously use the existing API, it will alert the developer to problems with backwards compatibility if the developer changes the existing API.
  • Lastly, unit tests are functional examples of code use: they document how existing code should and can be used.

All of these benefits help a developer to write better quality features (in less time) and help not only with fixing bugs, but also with preventing bugs!

But, in a situation in which there are no external pressures on how a developer writes his or her code, there are no immediate reasons for a developer to write unit tests. This is especially true in situations in which the developer will only be working on a project for a relatively short period of time.

Of course, some developers might feel that writing tests helped them develop features more quickly — or that it might help them fix bugs more quickly. However, if at a certain point they have to justify their use of time to a superior (the project lead, project manager, etc.) and they explain that they were writing unit tests instead of writing new features or fixing bugs, they will get in trouble, especially if there’s less value placed on unit tests or refactoring.

At a company, obviously, this is where a project manager, project lead or even a CTO comes in. While it may be in the interest of the developer to create new features and fix bugs as quickly as possible, it's probably equally important to the CTO and other managers that the quality of the software created meets certain standards. Otherwise, users might become so disgruntled with the software that they'll complain, giving it a negative reputation, which may in turn lead to the company going bankrupt!

It's in the interests of the CTO and other managers to require software developers to write unit tests that are of a certain level of quality: namely, the unit tests should be good enough to assure that the software retains a positive reputation among its customer base. This is often a difficult limit to quantify, but luckily often easier to qualify, I think.

Open source software is, however, a different story. There is no CTO and the highest authority in an open source project is often the lead of the project.

The question, then, is what determines the quality of a suite of unit tests in an open source project? To a large degree, the answer is obviously the attitude of the lead of the project. Attitude is often very hard to measure, unfortunately: it’s easy enough to say one thing and do another. If you ask the lead what she thinks, it’s hard to say if the answer she gives you represents the attitude that she communicates to the rest of her project.

Realistically, one of the most decisive factors determining the quality of the features of an open source project is simply the example set by the lead in the code and tests that he or she writes.

Categories: Other

Openshift at Devoxx 2012

14 November 2012 1 comment

Good morning! If you’re at Devoxx today, feel free to come by and visit me at 14.00: I’m giving a talk on OpenShift. I’ve got to say this: I’ve rarely had so much fun preparing a talk.

The thing is, OpenShift just works. Sure, now and then you have to figure a few things out, but given all of the work it does for you — or rather, given all of the work you no longer need to do, it.. it rocks!

The talk is called “Openshift: State of the Union” and it’s a quick primer followed by a couple of demos. Now, I just need to hope they don’t randomly decide to do maintenance during my talk. ;D

I’ve just uploaded my slides for those curious: you can find them on slideshare.

See you at the talk!

Categories: Other

Persistence testing in jBPM

27 October 2012 Leave a comment

In the last year or so, I’ve started 3 different persistence-related testing initiatives for jBPM. This is a quick summary of what’s already been created and what the plans are going forward.

Maven property injection cross-database testing

 
I've described the basic mechanisms of this testing infrastructure here:
- jBPM 5 Database testing wiki page

The main jBPM project pom contains maven properties that are injected into resource files that are placed in (test) resource directories for which maven (property) filtering has been turned on.

In turn, when maven processes the test resources, the properties in the filtered files are replaced with the values placed in the pom. The resource files filtered are mostly a persistence.xml file and a datasource.properties file used to create data sources.

Unfortunately, there are still a couple of problems:

  • To start with, this is only really completely implemented in the drools-persistence-jpa and jbpm-persistence-jpa modules. It needs to be fully implemented or “turned on” in a number of other important modules, such as jbpm-bam, jbpm-bpmn, jbpm-human-task-core and jbpm-test.
  • Some developers have encountered problems in their environments due to the fact that the persistence.xml file (in src/test/filtered-resources) contains properties (like ${maven.jdbc.url}) instead of real values. I’m not sure what’s going on there, but I’d like to fix that problem if I can.
  • Lastly, this infrastructure doesn’t help us test with different ORM frameworks. The problem here is that it’s practically impossible to test using a specific ORM framework (for example Hibernate 3.3) while the other ORM frameworks (Hibernate 4.1, OpenJPA) are in the classpath. But if you want to test with a specific ORM framework, the first thing you need to do is to have it in the classpath. So how do I test against multiple (“specific”) ORM frameworks?
    • While maven profiles are an answer to this, they add lots of complexity to a setup. I don't want to make a maven profile for every different ORM framework that we need to test against; instead, I'd like to just make a maven profile that turns on the cross-database and cross-ORM framework testing.
In the coming months, it looks like we'll try to make sure that this framework is turned on and executed in all of the modules where it's applicable. While I expect that I'll eventually remove the maven filtering being used here, I think I'll probably try to keep the property-based control: being able to run the test suite on a different database simply by injecting (settings.xml) or otherwise changing (in the pom.xml) properties is valuable.

Backwards compatible BLOB testing

One new thing that I encountered when learning the jBPM 5 (and Drools) code is the use of BLOBs in the database to store session and process instance state (and work item info). One of the great advantages of storing process instance state in a BLOB is that it avoids the complicated nest of tables that a BPM engine would otherwise bring with it — see the jBPM 3 database schema for a good example. :/

Using a BLOB also has the advantage that changes can be made to the underlying data model without having an impact on the database schema that the (end) user will use while working with jBPM 5. This, more than any other reason, is really _the_ reason to use BLOBs. I still like to think about how much easier that makes the work of developers working on Drools and jBPM. (Considering the complexity of the Drools RETE network, it makes even more sense to use a BLOB.)

However, when the serialization (or "marshalling") code was originally developed, the choice was made for a hand-crafted serialization algorithm instead of relying on pure Java serialization. This was also with good reason, given that Java serialization is slow and the hand-crafted algorithm was a lot faster.

At the beginning of this year, for reasons having to do with the evolution of both Drools and jBPM 5, Edson (Tirelli) replaced all of the serialization code for Drools and jBPM 5 with protobuf. That was definitely an improvement, mostly because protobuf makes forward and backwards compatibility way easier.

However, it made some of the "marshalling testing framework" work I had done up until that moment unusable: the marshalling testing framework relies on databases generated by previous versions to be able to test for backwards and forwards compatibility. Switching to protobuf unfortunately broke all existing backwards compatibility, but with the benefit that backwards compatibility from then on would be ensured.

At the moment, what needs to be done is the following:

  • The databases for the different branches (5.2, 5.3, 5.4 and now 6.0) of jBPM need to be regenerated.
  • The code for the marshalling framework (as well as other persistence testing classes) needs to be cleaned up a little bit.

Multiple ORM framework (ShrinkWrap based) testing

I ran into a specific Hibernate 4 problem last month, and it was unfortunately something that could also have affected compatibility with Hibernate 3. This meant that I needed to run the test suite multiple times, with Hibernate 3 and then with Hibernate 4, to check certain issues.

As a result, I ended up building the framework in the org.jbpm.dependencies [github link] package. This framework creates a jar containing the tests to be run, with the specified database settings (persistence.xml) as well as the specific ORM framework that the tests should be run with. It then runs the tests in the jar.

This is eventually the framework that I want to have used within all persistence-related jbpm modules. The code itself should probably be moved to the jbpm-persistence-jpa module.
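To give an idea of the approach, here is a sketch using the ShrinkWrap API. This is not the actual org.jbpm.dependencies code, and the persistence.xml paths are made up; it only shows the idea of packaging the tests together with one specific persistence configuration.

```java
import java.io.File;

import org.jboss.shrinkwrap.api.ShrinkWrap;
import org.jboss.shrinkwrap.api.exporter.ZipExporter;
import org.jboss.shrinkwrap.api.spec.JavaArchive;

// Sketch of the idea: package the persistence tests together with a specific
// persistence.xml, so they can be run against one particular ORM framework
// on an isolated classpath.
public class OrmTestJarBuilder {

    public static File buildTestJar(Class<?> testClass, String persistenceXmlForOrm) {
        JavaArchive jar = ShrinkWrap.create(JavaArchive.class, "persistence-tests.jar")
                .addClass(testClass)
                // e.g. "hibernate3/persistence.xml" or "hibernate4/persistence.xml" (made-up paths)
                .addAsManifestResource(persistenceXmlForOrm, "persistence.xml");

        File file = new File("target", jar.getName());
        jar.as(ZipExporter.class).exportTo(file, true);
        return file;
        // Running the jar with only (say) Hibernate 3 on the classpath is the part
        // the real framework takes care of; it's omitted here.
    }
}
```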

Categories: jBPM, Persistence

How Guvnor and Designer talk to each other

4 October 2012 5 comments

I just spent a good hour talking with Tihomir about Designer.

Specifically, he explained how the interaction between Designer and Guvnor works.

In order to finish my work on a persistence layer for Designer, one of the things I need to understand is how Designer interacts with Guvnor. At the moment, Designer basically uses Guvnor for persistence: everything that you modify, create and save in Designer is saved in the Guvnor instance that Designer runs in.

However, there's been more and more interest in being able to run Designer standalone: running Designer independently and without a 'backing' Guvnor instance. What I'm working on is inserting a persistence "layer" in the Designer architecture so that users can choose whether to use Guvnor for this persistence or whether to use a standalone persistence layer for this (such as a database) — or some combination of the two.

But in order to do that work, it’s important to understand exactly how Guvnor interacts with Designer: what does Guvnor tell Designer and what does Designer store in Guvnor, and how and when does the communication about this happen?

And now I can explain that interaction to you.

Let’s start at the very beginning: you’ve installed Guvnor and Designer on your application server instance and everything is running. You have this brilliant idea for a new BPMN2 process and so you click, click around to create a new BPMN2 asset and in doing so, open Designer in a new IFrame within Guvnor.

Now, when you open Designer, if I understand correctly, Designer takes a bunch of the standard, default “assets” that it will use later and goes ahead and stores them in Guvnor. Some of these are stored in the global area and others are stored in the particular package that your asset will belong to. But how does Designer actually know what the package and asset name is of what it’s editing?

Let me focus on a detail here: when you actually open Designer in Guvnor, you'll be opening a specific URL that will look something like this:

http://localhost:8080/drools-guvnor/org.drools.guvnor.Guvnor/Guvnor.jsp?#AssetEditorPlace:972fc35a-a3cc-430c-b460-71477931ca5b

This results in the UUID that you see after AssetEditorPlace: being sent to Designer (on the server side). Unfortunately, Guvnor doesn't have a method for looking up an asset based on its UUID, so Designer needs to figure out the package and asset name of the asset it's working with by itself. Designer needs this information in order to interact with Guvnor. That means Designer does the following:

List of assets for the defaultPackage package

  • Designer first requests the list of all packages available in Guvnor.
  • After that, Designer requests the list of all assets in each package.
  • Designer then keeps searching the lists of assets until it finds the UUID it's been given.
  • This way, it can figure out the package name and asset name (title) of the asset (process) it's editing.
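In code terms, that lookup boils down to something like the sketch below. The client methods here are hypothetical (Guvnor's real interface is different); the point is just the shape of the search.

```java
import java.util.List;

// Sketch of the lookup Designer currently has to do (the client methods are hypothetical,
// not Guvnor's actual API): walk every package and every asset until the UUID matches,
// just to recover the package name and asset title.
class GuvnorAssetLocator {

    record Asset(String uuid, String title) {}
    record AssetRef(String packageName, String assetTitle) {}

    /** Hypothetical wrapper around Guvnor's package/asset listings. */
    interface GuvnorCatalog {
        List<String> listPackages();
        List<Asset> listAssets(String packageName);
    }

    private final GuvnorCatalog catalog;

    GuvnorAssetLocator(GuvnorCatalog catalog) {
        this.catalog = catalog;
    }

    AssetRef findByUuid(String uuid) {
        for (String pkg : catalog.listPackages()) {       // one request per package...
            for (Asset asset : catalog.listAssets(pkg)) { // ...plus one asset listing each
                if (uuid.equals(asset.uuid())) {
                    return new AssetRef(pkg, asset.title());
                }
            }
        }
        return null; // UUID not found anywhere
    }
}
```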

Naturally, I didn't believe Tiho at first when he said this. My second reaction was to submit a Jira to fix this (GUVNOR-1951). I'll make this better as soon as I finish this persistence stuff (or maybe as part of it.. hmm. Depends on how quickly Guvnor adds the needed feature.)

In any case, at this point, you have your blank canvas in Designer and Designer knows where it can store its stuff in Guvnor.

You go ahead and create your brilliant BPMN2 process with the help of all of Designer’s awesomely helpful features.

And then you need to save the process. What next?

Saving the process in Guvnor

Right, you click on a menu in Guvnor, and then it gets complicated. To start with, clicking on “Save Changes” or “Save and Close” in the Guvnor menu calls some client-side JavaScript in Guvnor that then calls some client-side JavaScript from Designer.

The ORYX.EDITOR.getSerializedJSON() call

This here on the right is a screen print of a search in Tiho's IntelliJ IDE, showing the Guvnor (GWT) Java class where the code for this call lives. The call shown is ORYX.EDITOR.getSerializedJSON(), in case you don't have superhuman vision and can't read the text in the picture.

This calls JavaScript in Designer that retrieves the JSON model of the BPMN2 process in the canvas. (Designer actually stores the BPMN2 process information on the client side in a JSON data structure, which gets translated to BPMN2 XML on the server side.)

Once the Guvnor JavaScript (client-side) has gotten this JSON representation of the BPMN2 process back, it then sends a request to a Designer servlet that translates the JSON to BPMN2. Guvnor doesn’t really care about JSON — it certainly can’t read it and doesn’t know what to do with it, so it relies on Designer to translate this JSON to BPMN2 that it can store in its repository.

The Guvnor to Designer JSON to BPMN 2 "Translator" call

Again, for those of you without superhuman vision, the screen print fragment above is the Guvnor GWT code that makes sure that the retrieved JSON is translated to BPMN2.

Once the Guvnor client-side JavaScript (derived from Guvnor GWT code) has gotten the BPMN2 back, it then sends that XML back to the Guvnor server-side to be stored in the repository (under the correct package name and asset title).
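Put together, the client-side part of that save flow looks roughly like the sketch below. This is not the actual Guvnor source (the servlet path in particular is made up); only the ORYX.EDITOR.getSerializedJSON() call is taken from the description above.

```java
import com.google.gwt.http.client.Request;
import com.google.gwt.http.client.RequestBuilder;
import com.google.gwt.http.client.RequestCallback;
import com.google.gwt.http.client.RequestException;
import com.google.gwt.http.client.Response;

// Simplified sketch (not the actual Guvnor source) of the client-side save flow:
// 1. pull the JSON out of Designer via JSNI, 2. post it to a Designer servlet
// that translates it to BPMN2 XML (the servlet path below is made up).
public class DesignerSaveBridge {

    /** JSNI bridge to Designer's client-side ORYX.EDITOR.getSerializedJSON(). */
    public static native String getSerializedJson() /*-{
        return $wnd.ORYX.EDITOR.getSerializedJSON();
    }-*/;

    public static void saveProcess() throws RequestException {
        String json = getSerializedJson();
        // Hypothetical translation endpoint; the real servlet path is different.
        RequestBuilder rb = new RequestBuilder(RequestBuilder.POST, "/designer/jsonToBpmn2");
        rb.sendRequest(json, new RequestCallback() {
            @Override
            public void onResponseReceived(Request request, Response response) {
                String bpmn2Xml = response.getText();
                // Guvnor then sends this XML back to its server side to store it
                // under the correct package name and asset title.
            }

            @Override
            public void onError(Request request, Throwable exception) {
                // handle the error (omitted)
            }
        });
    }
}
```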

And that’s how it all works!

Of course, I’m no GWT expert, so I might have glossed over or incorrectly reported some details — please do let me know what you don’t understand or if I made any mistakes (that means you too, Tiho :) ).

Regardless, the above summarizes most of the interaction between Guvnor and Designer. The idea with the persistence layer I’m adding to Designer is that much of this logic in Designer will be centralized. Having the logic in a central place in the code in Designer will then allow us to expose a choice to the user (details pending) on where and how he or she wants to store the process and associated asset data.

Lastly, there are definitely opportunities to improve the performance of this logic. For example, using Infinispan as a server-side cache, even when Designer is used with Guvnor, is an idea that occurred to me, although I haven't thought enough yet about whether or not it's a good idea. We'll see what else I run into as I get further into this..

Categories: jBPM