
Friday, December 18, 2015

INNER JOIN vs WHERE EXISTS Sub-Queries in MSSQL

Currently working at WiseTech, I've learnt a great deal about the pros and cons of monolithic applications, and even more about why microservices are such a massive talking point in the industry today. There are many specialists across the technology stack here, and recently I got into a discussion about SQL performance tuning.

For a long time I thought there was a standard golden rule when it came to arguing against sub-queries, treating them as almost as sacrilegious as cursors. However, there are a few circumstances where the difference actually varies very little.

Here is a very simple statement that contains one INNER JOIN:

SELECT *
FROM StmNote
INNER JOIN JobDeclaration ON JE_PK = ST_ParentID
WHERE ST_Description = 'Justification Note'
       AND ST_Table <> 'JobDeclaration';

And below is a query that returns the same results, the only difference being that it uses a WHERE EXISTS sub-query clause instead of an INNER JOIN:

SELECT *
FROM StmNote
WHERE ST_Description = 'Justification Note'
       AND ST_Table <> 'JobDeclaration'
       AND EXISTS (SELECT 1 FROM JobDeclaration WHERE JE_PK = ST_ParentID);

The execution plan below shows there is absolutely no difference in the decision path that the query optimizer takes.

[Image: the identical execution plan produced by both queries]

This could easily lead one to argue that choosing between the two statements simply comes down to a matter of preference.
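
If you want to sanity-check this on your own tables, one quick way (a sketch; it assumes you are running both versions in SSMS with the actual execution plan enabled) is to compare the I/O and timing statistics reported for each:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run both versions of the query here. Identical plans should report
-- identical logical reads and near-identical CPU times in the Messages tab.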

However, as we know, the query optimizer behaves very differently depending on the number of rows involved in an index seek or table scan, and on the statistics it has to work with in determining an efficient execution path.
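
Those statistics can be inspected and refreshed directly. A small sketch (IX_StmNote_ParentID is a hypothetical index name used purely for illustration; substitute a real index or statistics name):

-- Refresh the optimizer's statistics for the table involved.
UPDATE STATISTICS StmNote;

-- Inspect the histogram the optimizer uses for its cardinality estimates.
DBCC SHOW_STATISTICS ('StmNote', IX_StmNote_ParentID);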

So what if there were other joins, or larger result sets, which could ultimately mean more variations in the execution path for the query optimizer to consider?

Below is a more complex scenario involving multiple joins. The first case uses the following query:

SELECT bna.*
FROM BmncnAttachment bna
INNER JOIN BmncnShape bns
       ON bns.BNS_PK = bna.BNA_BNS_FromShape OR bns.BNS_PK = bna.BNA_BNS_ToShape OR bns.BNS_PK = bna.BNA_BNS_Owner
INNER JOIN ProcessHeader fh
       ON bns.BNS_FH_ProcessHeader = fh.FH_PK
INNER JOIN WorkItem wki
       ON wki.WKI_PK IS NOT NULL
WHERE wki.WKI_PK = fh.FH_ParentId
       AND wki.WKI_Summary LIKE 'Issue %'
       AND wki.WKI_Status IN ('ASN', 'OPN', '');

While the second case uses the following query with a WHERE EXISTS sub-query clause:

SELECT bna.*
FROM BmncnAttachment bna
INNER JOIN BmncnShape bns
       ON bns.BNS_PK = bna.BNA_BNS_FromShape OR bns.BNS_PK = bna.BNA_BNS_ToShape OR bns.BNS_PK = bna.BNA_BNS_Owner
INNER JOIN WorkItem wki
       ON wki.WKI_PK IS NOT NULL
WHERE wki.WKI_Summary LIKE 'Issue %'
       AND wki.WKI_Status IN ('ASN', 'OPN', '')
       AND EXISTS (SELECT 1 FROM ProcessHeader fh WHERE bns.BNS_FH_ProcessHeader = fh.FH_PK AND wki.WKI_PK = fh.FH_ParentId);

And here are the differing execution plans for both:

[Image: the differing execution plans for the two versions]

It seems they differ significantly at the final Hash Match, where probing the hash key fh.FH_ParentId accounts for 62% of the total time, making it slower than the standard inner join. This is mainly because you lose parallelism when using the sub-query.
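
One way to test whether the loss of parallelism alone explains the gap is to force a serial plan for both versions and compare the plans again. Below is a sketch using the standard MAXDOP query hint on the sub-query version; the same hint would be appended to the join version:

SELECT bna.*
FROM BmncnAttachment bna
INNER JOIN BmncnShape bns
       ON bns.BNS_PK = bna.BNA_BNS_FromShape OR bns.BNS_PK = bna.BNA_BNS_ToShape OR bns.BNS_PK = bna.BNA_BNS_Owner
INNER JOIN WorkItem wki
       ON wki.WKI_PK IS NOT NULL
WHERE wki.WKI_Summary LIKE 'Issue %'
       AND wki.WKI_Status IN ('ASN', 'OPN', '')
       AND EXISTS (SELECT 1 FROM ProcessHeader fh WHERE bns.BNS_FH_ProcessHeader = fh.FH_PK AND wki.WKI_PK = fh.FH_ParentId)
-- MAXDOP 1 forces a serial plan, removing parallelism from the comparison.
OPTION (MAXDOP 1);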

So, to sum it all up: even though the two statements may return identical results, performance issues would exist once multiple joins are involved, because the optimizer could choose the sub-query over another join that would otherwise have been more efficient.

Conversely, performance can also be affected by joins that return duplicate rows. That is another variable to consider when choosing between WHERE EXISTS sub-query clauses and JOINs, and is definitely something to keep in mind when making the final decision on which type of statement to use.
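
To illustrate why, suppose JobDeclaration could ever hold two rows with the same JE_PK value (say, if the key were non-unique); the join version would then return each matching StmNote row twice, whereas EXISTS behaves as a semi-join and returns it at most once:

-- Join: one output row per matching JobDeclaration row, so duplicates are possible.
SELECT s.*
FROM StmNote s
INNER JOIN JobDeclaration j ON j.JE_PK = s.ST_ParentID;

-- EXISTS: at most one output row per StmNote row, however many matches exist.
SELECT s.*
FROM StmNote s
WHERE EXISTS (SELECT 1 FROM JobDeclaration j WHERE j.JE_PK = s.ST_ParentID);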

Wednesday, September 16, 2015

Is Server-Side Javascript the Flavor of the Month?

As many experienced developers will tell you, "choose the right tool for the job". While many may have a strong bias towards one programming style over another, it is always good to encounter those who are willing to present an argument objectively.

These days there are a lot of advocates for Node.js, and they all begin their marketing campaigns with the same opening statements, "I know Javascript is crap but..."

And therein lies the dilemma. It's not an easy task to justify a scripting language as the solution for web application development in an era of staleness within the open-source community. Let's face it: PHP has one of the largest online communities on the planet, particularly around CMS solutions like WordPress and Drupal, yet the language hasn't evolved or adapted to an ever-changing environment. Ruby's flagship framework, Rails, has a reputation for scaling poorly, and Python has a very poor reputation for performance.

Node.js seems like a breath of fresh air.

In all honesty, when Node.js came out a few years back with this "new" concept of server-side Javascript, I questioned why we were getting back into the world of Classic ASP, which was prevalent in the late 1990s. Back then we had the power of inline server-side code with a choice of either VBScript or JScript, so the concept wasn't "new" unless you think relatively in terms of the nearly 20 years the idea has been around.

It was only a little later that I realised it had nothing to do with "Javascript being server-side", but rather that Node.js was able to tame the beast of a scripting language and impose discipline using proven concepts such as modularisation and encapsulation.
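
A minimal sketch of what that discipline looks like in practice (the counter module is hypothetical; module.exports and require are the CommonJS mechanisms Node.js is built on):

// counter.js - count is encapsulated in the module's closure
var count = 0;
module.exports = {
    increment: function () { return ++count; }
};

// app.js - consumers only see what the module chooses to export
var counter = require('./counter');
console.log(counter.increment()); // 1; count itself is never exposed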

These concepts are very familiar in the object-oriented world and have been around for decades. So why create another server-side framework using Javascript? The easiest answer is most likely that it is cross-platform and was standardized through Ecma International as ECMAScript, with adoption from Microsoft, Mozilla and others.

Javascript has long been dominant in web development and, as Scott Hanselman put it, is "the assembly language of the web". It is uncontested as a front-end solution for interactive browser-based development, which is why the adoption of frameworks like angular.js has been so widespread, with react.js fiercely becoming a dominant framework too. Bundlers like require.js and webpack are becoming household names, and templating tools such as handlebars.js and jade have shown that Javascript has the power to carry an ever-growing community. Together with the power of HTML5 and CSS3, it has killed off both Flash and Silverlight in one single stroke.

But let's be clear: we are talking about front-end development here, not server-side development.

While searching the online community to find any significant advantages of using Javascript for back-end development using Node.js, there have only really been two common themes:
  • "Front-end developers can work on the full stack of an application using one language"
    This argument is great, as we have seen with the emergence of the MEAN stack. But the separation of front-end and back-end development has never been one of language barriers; rather, the difference is one of mind-set. Let's face it: you won't see front-end developers picking up tasks related to data access or API security. The real question is whether or not a functional scripting language is appropriate for server-side application programming.

  • "You can start a web application in literally 5 minutes!"
    While that might be great for those who want a pet project to build "hello world" applications, professional programmers will find obvious challenges once complexity is inevitably added. The headaches around peer dependencies is just one, however the topic of discussion that has been the most entertaining is: to bundle or not to bundle. An issue so primitive it is almost comical to consider that other frameworks solved compilation concerns with output libraries more than 15 years ago. It is a non-issue now and yet somehow it seemed like we have gone back to re-inventing the wheel all over again with Node.js. Those who work on other frameworks with rich environments that provide debugging, diagnostics and compilation already had the wheels they needed to concentrate on real programming to provide the vehicles for those wheels.
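
For reference, the kind of five-minute application being boasted about typically looks something like the sketch below (it assumes the popular express package; the route and port are arbitrary):

var express = require('express');
var app = express();

// A single route that streams a string of HTML back to the browser.
app.get('/', function (req, res) {
    res.send('<h1>hello world</h1>');
});

app.listen(3000);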

Stability in the community is not something Node.js can boast about either, especially when recalling how easily the codebase was forked to create an entirely new community with io.js, simply because its members were "dissatisfied" with the Advisory Board. The two have recently reunited, but how long will it be before another subgroup becomes fickle and creates a community of its own? Angular.js is already facing similar problems, which is why aurelia.io has emerged as a potential competitor. A unified community generally gives confidence in a framework's future.

But let's not get into politics.

Without getting too deep into the limitations of Node's single-threaded event-loop model, or of having only two basic data structures (objects and arrays) compared with platforms that provide hash-maps, dictionaries, enumerables, sets, vectors and so on, there is one more concerning factor.

If we take a deeper look into programming principles, then we can start comparing Node.js with enterprise platforms such as Java and .NET and their object-oriented programming edicts.

Software engineering principles like OOP embrace concepts such as abstraction and polymorphism, along with mathematical concepts like covariance and contravariance.
They define the difference between software engineering and simple web development, because this is the philosophy that engineers and computer scientists like Alan Kay prescribed for application programming in order to help mimic the real world.

The universe is made up of objects that interact with other objects by performing specific functions. Each object has its own attributes and shares general similarities with other objects, which can be abstracted. Since the universe is infinitely scalable, the very choice of OOP over functional and procedural programming can determine the best fit for scalability in complex web applications, particularly in the space of implementing business functions.

One may argue that even between different object-oriented programming languages, the health of the community is equally important. Unfortunately the Java community falls short here, and nothing made that clearer than Java 8 arriving with the "new" concept of lambda expressions, which C# 3.0 on the .NET platform had introduced some seven years prior.

But if "new" is any measure for Node.js being the better option today, is the mere reasoning for a cool new hip way of doing things good enough? What happens when Node.js is 10 years old? Will Node.js outlive its hipster status? And then there are those who are already talking about Golang being the better option for performance and scalability, which definitely raises alarm bells for its longevity.

If the fact that Node.js is lightweight and uses a dynamically typed scripting language has anything to do with it, then a simpler question arises: what would stop us from building a web application using, say, PowerShell scripts? Why not? It's possible to write primitive code in PowerShell that opens an HTTP port, binds through IIS and streams strings of HTML! Given enough time, someone might write a wrapper and call it express.ps1 to make it easier!

But that would be silly, wouldn't it...

Going back to the initial statement to "choose the right tool for the job" might help us maintain focus in an ever-changing web space.

Being language or framework agnostic is key to determining the right tools and it could help us realise the natural strengths of the technologies at our disposal. Javascript is excellent for front-end development, PowerShell is suitable for systems programming, and frameworks like .NET are great for scalable server-side application programming.

Tuesday, July 14, 2015

Resetting CSS3 Transitions and Animations

Everyone gets excited when they get the opportunity to work with CSS3, and particularly its new transition and animation features. These run smoothly compared to Javascript animations because the visual interactions are managed by the browser itself, which decides how to interpret the movement of the elements it has rendered from markup. This is significantly more efficient than Javascript manipulating rendered DOM elements frame by frame.

With every great thing, however, there are always shortfalls. Let's look specifically at animations and their keyframes.

.notify {
    animation: notify 5s;
}
@keyframes notify {
    from { height: 0; }
    20% { height: 60px; }
    80% { height: 60px; }
    to { height: 0; }
}

This is a very simple implementation of a keyframe animation that slides an element down for notifications. All that is needed is a div element that initially doesn't have the "notify" class.

<div id="tip" class="tip">message</div>

With a little Javascript we can add the "notify" class to the element so that the animation is triggered. We can do this with jQuery.

$(".tip").addClass("notify");

There is one problem with this, however. Once the animation has run, it will not run again unless you refresh the page.

One approach that should logically solve this problem is to remove the class and then re-add it.

$(".tip").removeClass("notify");
$(".tip").addClass("notify");

That would be logical; unfortunately, it doesn't work!
Amazingly, what is required is to touch a layout property of the element between the two calls, even if only to assign it the same value.

$(".tip").removeClass("notify");

$(".tip").position().top = $(".tip").position().top;

$(".tip").addClass("notify");


The reason this works is that reading the element's position forces the browser to perform a "reflow". After the reflow, the browser once again watches for the "notify" class being added to the element, and will do its magic all over again!
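
For what it's worth, a common variation of the same trick (an alternative sketch, not the only way) is to read the element's offsetWidth between removing and re-adding the class, since reading any layout property forces the reflow:

$(".tip").removeClass("notify");

// Reading a layout property forces the browser to recalculate layout;
// void simply discards the value being read.
void document.getElementById("tip").offsetWidth;

$(".tip").addClass("notify");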