Styles

Friday, November 14, 2014

Regex for Unicode Variations

Ever had a spammer try to get around a spam rule for a subject line like "Account Verification", which would otherwise be picked up by your regular spam filter? They usually try something sneaky like "Acc0unt Verification", replacing o's with zeros (0), or they insert colons between letters so your pattern matching won't pick up subject lines like "A:ccount Verification".

Worse, they may even substitute vowels with Unicode variations, such as é for e, which makes pattern matching extremely difficult: "Accøünt Vérificatiøn".

Regular expressions can definitely make your life easier, but Unicode makes things more interesting. The first thing you will need is a basic pattern that tolerates the stray characters inserted between letters, which usually throw off whole-word pattern matches:

^(?i).*(a.?c.?c.?o.?u.?n.?t.?[\s]*v.?e.?r.?i.?f.?i.?c.?a.?t.?i.?o.?n)+.*$

You will probably want to add some basic variations to the vowels which would catch any spammer trying to replace o's with zeros (0) or i's with ones (1) or a's with at symbols (@).
^(?i).*([a@].?c.?c.?[o0].?u.?n.?t.?[\s]*v.?[e3].?r.?[il1].?f.?[il1].?c.?[a@].?t.?[il1].?[o0].?n)+.*$

And if you really want to be pedantic, I found a good table listing the Unicode look-alike characters and have provided them below, so you can replace the vowel matching with the following Unicode variations:

[a@À-Åà-åĀ-ąǍǎǞ-ǡǺǻȀ-ȃȦȧȺɑΆΑάαаӐ-ӓᗅᶏᶐḀḁẚẠ-ặἈ-ἏÅᾈ-ᾏᾸ-ᾼ₳]
[e3È-Ëè-ëĒ-ěȄ-ȇȨȩɆɇΈΕеѐёҼ-ҿӖӗᴇḔ-ḝẸ-ệⴹἘ-ἝῈΈ]
[il1Ì-Ïì-ïĨ-ıĺļľŀłƖƗǏǐȈ-ȋɨ-ɭΊΙіїӀḬ-ḯḷ-ḽỈ-ịἰ-Ἷὶίῐ-Ί]
[o0Ò-Øð-øŌ-őǑǒǪ-ǭǾǿȌ-ȏȪ-ȱɸɵΌΟθоѲѳӦ-ӫ০੦௦ᴏṌ-ṓỌ-ợὀ-Ὅὸό]
[uµÙ-Üù-üŨ-ųǓ-ǜȔ-ȗɄᴜṲ-ṻỤ-ựὐ-ὗὺύῠ-ΰ]

A few common consonants also get used by spammers to evade pattern matching:
[nŃ-ŋƝƞǸǹɳΝᶇṄ-ṋἠ-ἧ]
[tŢ-ŧƫ-ƮȚțȾᴛṪ-ṱ]
[y¥ÝýÿŶ-ŸƳƴȲȳɎɏΫγϒ-ϔўҮ-ұӮ-ӳẎẏẙỲ-ỹỾỿὙ-ὟⲨⲩ]
The resulting regex pattern may seem long and complicated, but it can prevent many spam emails from slipping through the cracks.
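Writing the full pattern by hand is error-prone, so it can help to generate it. Below is a minimal sketch in JavaScript, assuming only the Latin-1 part of the vowel classes above for brevity. The `lookalikes` table and the `buildSpamPattern` helper are hypothetical names for illustration. JavaScript has no leading `(?i)` syntax, so the sketch passes the `i` flag instead, and it leaves out the `^.*` and `.*$` anchors because `test()` already searches anywhere in the string.

```javascript
// Subset of the look-alike classes listed above (Latin-1 ranges only).
const lookalikes = {
  a: "[a@À-Åà-å]",
  e: "[e3È-Ëè-ë]",
  i: "[il1Ì-Ïì-ï]",
  o: "[o0Ò-Øð-ø]",
  u: "[uµÙ-Üù-ü]",
};

// Hypothetical helper: turns a plain phrase into a tolerant pattern.
// Each letter becomes its look-alike class (or itself), letters are joined
// with ".?" to allow one stray character, and words with ".?\s*".
function buildSpamPattern(phrase) {
  const escape = (ch) => ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const words = phrase
    .toLowerCase()
    .split(/\s+/)
    .map((word) =>
      [...word].map((ch) => lookalikes[ch] || escape(ch)).join(".?")
    );
  return new RegExp(words.join(".?\\s*"), "iu");
}

const pattern = buildSpamPattern("Account Verification");

console.log(pattern.test("Acc0unt Verification"));      // true
console.log(pattern.test("A:ccount Verification"));     // true
console.log(pattern.test("RE: Accøünt Vérificatiøn"));  // true
console.log(pattern.test("Weekly newsletter"));         // false
```

Extending the sketch to the full classes, including the consonants, is a matter of adding entries to the table.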

Tuesday, May 27, 2014

Javascript's Decorator Pattern using Require.js

While working at Mi9 I stumbled across a very interesting way to use the JavaScript Decorator Pattern with the require.js framework to separate the concerns of the HTML from the way it is populated with information.

For those of you who are unfamiliar with require.js, I have provided an extremely simple example below. It is a lightweight framework that helps you load JavaScript files on demand. Each of these files provides a module that can add features to your JavaScript functionality.

A simple example of require.js is shown below: you list the modules you depend on in the require construct, and it uses dependency injection to pass them (here jQuery and the helper service with its members) into your callback:

    require(['jquery', './helper'], function ($, helper) {
        var callSingleStoryService = function (url) {
            $.get(url, function (data) {
                helper.evaluateElapsedTime(data);
            });
        };
        return {
            decorate: function (data) {
                callSingleStoryService(data);
            }
        };
    });

This will locate the helper file at the relative path ./helper. The helper module itself provides the functionality for evaluateElapsedTime(data), as shown below:

    define(['jquery'], function ($) {
        return {
            evaluateElapsedTime: function (element) {
                //do stuff...
            }
        };
    });

This is just a simple example of how to use require.js.

Now, to apply this to an HTML element so that a decorator can work on its inner elements, add the following attributes in your view:

    <div class="module" data-module="./feedDecorator">
        <div class="block">
            <div class="grid">
                Some content...
            </div>
        </div>
    </div>


The above HTML tells the application to associate this specific element, which has the "module" class name, with the feedDecorator.js file, whose decorate function applies functionality to it. Your feedDecorator.js file will look like the following:

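    (function () {
        define(['jquery', './helper'], function ($, helper) {
            return {
                // Called by main.js with the element and its data-* values.
                decorate: function ($target, data) {
                    helper.evaluateElapsedTime($target, data);
                }
            };
        });
    }()); // invoke the wrapper, otherwise define() never runs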

How does it all work? Your main.js (the only js file that you need to load initially) should implement the following functionality, which goes through each element with the class name "module" and runs the "decorate" function of the module named in its "data-module" HTML attribute.

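    (function () {
      define(['require', 'jquery'], function (require, $) {
        // Load the module named in data-module and let it decorate the element.
        var decorator = function ($target) {
          // $target is a jQuery object, so read the attribute through .data()
          var moduleName = $target.data('module');
          require([moduleName], function (module) {
            var data;
            if ($.isFunction(module.decorate)) {
              // Pass the remaining data-* values on as options.
              data = $.extend({}, $target.data());
              delete data.module;
              return module.decorate($target, data);
            }
          });
        };
        // Once the DOM is ready, decorate every element with the "module" class.
        $(function () {
          $(".module").each(function () {
            return decorator($(this));
          });
        });
      });
    }());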

And it's as simple as that! Now you can associate specific JavaScript functionality with any specific HTML element without having to load large amounts of JavaScript files unnecessarily.
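To see the dispatch logic on its own, here is a minimal sketch that runs without require.js or a browser. It assumes a plain object (`modules`) standing in for the require.js loader and plain objects standing in for DOM elements. `decorateAll` and `./notADecorator` are hypothetical names used only for illustration.

```javascript
// Stand-in for the require.js registry (assumption: synchronous lookup;
// require.js would load each file on demand instead).
const modules = {
  "./feedDecorator": {
    decorate(target, data) {
      target.text = `feed from ${data.url}`;
    },
  },
  // A module without a decorate function is simply skipped.
  "./notADecorator": {},
};

// Hypothetical dispatcher mirroring main.js: read data-module, strip it
// from the options, and call decorate() only if the module provides one.
function decorateAll(elements) {
  for (const el of elements) {
    const { module: moduleName, ...data } = el.dataset;
    const mod = modules[moduleName];
    if (mod && typeof mod.decorate === "function") {
      mod.decorate(el, data);
    }
  }
}

// Plain objects standing in for <div class="module" data-module="...">.
const elements = [
  { dataset: { module: "./feedDecorator", url: "/stories" }, text: "" },
  { dataset: { module: "./notADecorator" }, text: "untouched" },
];

decorateAll(elements);
console.log(elements[0].text); // "feed from /stories"
console.log(elements[1].text); // "untouched"
```

The element only names its decorator. Everything else stays in the module, which is what keeps the markup and the behaviour separate.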



Monday, March 10, 2014

Lazy Loading in C# - Performance vs Memory

After a long discussion with a couple of work colleagues about the pros and cons of lazy loading compared to eager loading, a few interesting conclusions arose.

Many programmers fall into the trap of instantiating all their objects during the initialization of a parent object (either in the constructor or otherwise):


    public class AccountController : BaseContoller
    {
        UserBusinessFunctions _userFunctions = null;
        RoleBusinessFunctions _roleFunctions = null;
        PermissionBusinessFunctions _permissionsFunctions = null;
        MailoutBusinessFunctions _mailoutFunctions = null;
        public AccountController()
        {
            _userFunctions = new UserBusinessFunctions();
            _roleFunctions = new RoleBusinessFunctions();
            _permissionsFunctions = new PermissionBusinessFunctions();
            _mailoutFunctions = new MailoutBusinessFunctions();
        }
        public ActionResult Index()
        {
            return View();
        }
    }

For Windows applications this might be beneficial: the initial load takes longer, but the rest of the user's experience is very smooth because everything has already been loaded.

However, for much larger projects this can seriously affect an application's memory use and initial load time. Imagine you had to create a new instance of AccountController just to call one function (e.g. Index()), and all that function did was return an ActionResult object, just as MVC does on the initial load of the default page. Moreover, web pages decouple the server-side code from the client-side rendered HTML page, so memory is eventually released after a Response is complete. However, that also means the AccountController class will be instantiated every time a request is made to the Index() function. Imagine 15,000 users making a request from their browsers to the same function at the same time: the Garbage Collector may not have enough time to clean up the unused objects.

Some developers write that off quite easily, under the impression that production servers are enormous beasts with an infinite amount of RAM. That may be true to an extent, but when you use only one production server to host multiple sites, each of which can have many thousands of users, memory can easily be chewed up. You may have a cloud solution for this, and it could be as simple as "upping" the memory in cloud management, but that could get very expensive very fast.

Lazy loading is a good way to solve this problem:

    public class AccountController : BaseContoller
    {
        UserBusinessFunctions _userFunctions = null;
        RoleBusinessFunctions _roleFunctions = null;
        PermissionBusinessFunctions _permissionsFunctions = null;
        MailoutBusinessFunctions _mailoutFunctions = null;
        public AccountController()
        {
        }
        public ActionResult Index()
        {
            return View();
        }
        public UserBusinessFunctions UserFunctions
        {
            get
            {
                if (_userFunctions == null)
                {
                    _userFunctions = new UserBusinessFunctions();
                }
                return _userFunctions;
            }
        }
        public RoleBusinessFunctions RoleFunctions
        {
            get
            {
                if (_roleFunctions == null)
                {
                    _roleFunctions = new RoleBusinessFunctions();
                }
                return _roleFunctions;
            }
        }
        public PermissionBusinessFunctions PermissionsFunctions
        {
            get
            {
                if (_permissionsFunctions == null)
                {
                    _permissionsFunctions = new PermissionBusinessFunctions();
                }
                return _permissionsFunctions;
            }
        }
        public MailoutBusinessFunctions MailoutFunctions
        {
            get
            {
                if (_mailoutFunctions == null)
                {
                    _mailoutFunctions = new MailoutBusinessFunctions();
                }
                return _mailoutFunctions;
            }
        }
    }


As shown above, there is no longer any instantiation made in the constructor. This means that for every call to the Index() function, there will no longer be any unnecessary allocations in memory to unused objects just to return an ActionResult object.

What that also means is, if you actually want to use one of the business function objects in the example above, then rather than using the _userFunctions variable, you would have to access the property instead:

    User user = this.UserFunctions.GetUser(id);
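The effect is easy to observe by counting constructions. Below is a minimal sketch of the same lazy property, written in JavaScript only so that it can be run anywhere. The `constructed` counter and the body of the stand-in `UserBusinessFunctions` class are assumptions for illustration, not part of the controller above.

```javascript
let constructed = 0;

// Hypothetical stand-in for an expensive business-functions class.
class UserBusinessFunctions {
  constructor() {
    constructed += 1;
  }
  getUser(id) {
    return { id, name: `user-${id}` };
  }
}

class AccountController {
  constructor() {
    // Nothing is instantiated here, mirroring the empty C# constructor.
    this._userFunctions = null;
  }

  // Created on first access only, then cached for later calls.
  get userFunctions() {
    if (this._userFunctions === null) {
      this._userFunctions = new UserBusinessFunctions();
    }
    return this._userFunctions;
  }

  index() {
    return "view";
  }
}

const controller = new AccountController();
controller.index();
const countAfterIndex = constructed; // 0: index() never touched the property

const user = controller.userFunctions.getUser(7); // first access constructs
controller.userFunctions.getUser(8);              // reuses the cached instance
console.log(countAfterIndex, constructed, user.id); // 0 1 7
```

A request that only calls index() pays nothing for the business objects, while a request that needs them pays the construction cost once.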

There is also another page that elaborates on "Lazy Initialization" (the Lazy<T> class introduced in .NET Framework 4.0), which is slightly different from the standard "Lazy Loading" pattern mentioned above.


I can see advantages in all three of the scenarios mentioned above:
  • Initializing all in the constructor - good for Silverlight apps where performance is vital over memory usage on the client side. What that means is that there will be more time loading the Silverlight progress bar at the beginning while the application loads the objects in its memory resources, but the rest of the experience is seamless from then on.
  • Lazy loading in the property - good for servers that have multiple web applications/web services/win services running, where trading a little performance for valuable memory is worthwhile.
  • .NET 4.0 "Lazy Initializing" (Lazy<T>) - good for cases where you set up a bunch of objects before entering a large loop, but want the loop to start as soon as possible.



Tuesday, January 14, 2014

DateTime.Parse vs DateTime.ParseExact for culture issues


I'm sure most of us savvy .NET developers have come across the dreaded issues caused by inconsistent standards when working with multilingual / multicultural applications, namely sites that let users change their language and/or culture for their convenience.

Dealing with different languages is one thing, but that in itself isn't so bad because it mainly deals with a resource file of some sort that allows mapping of one word (or a sentence) to a different word of a different language. The site can then replace those words where it finds them.

Dealing with data, and specifically dates on the other hand, is a completely different story.

Seriously, why did the Americans have to come up with a date format that isn't even internally consistent!? Generally speaking, numbering systems put the largest unit at the front, followed by the next largest, and so on (e.g. $15,432.67). Naturally you would expect the date structure to be the same, so that the military format (yyyy/MM/dd) would be found wherever we deal with dates.

Even though the Australian format doesn't conform to this, it at least has consistency by having the reverse, whereby the smallest metric is found at the beginning of the sequence (dd/MM/yyyy).

The U.S. format, however, is influenced by their own language. It comes purely from a cultural perspective where they are naturally inclined to say "February the 15th" rather than "the 15th of February". That has led to the demise of their date format, which logically makes no sense (MM/dd/yyyy).

Regardless of the format, we are still faced with the dilemma of having to interpret this in code so that our data isn't corrupted by the front-end client machine's culture, and the server's culture, when the server-side code deals with the data.

You may find that you have a jQuery date picker that can help you "set the culture" as shown below:


$(".datepicker").datepicker({ dateFormat: "dd/mm/yy" });


The problem would arise once the user clicks submit and hits a server with the "en-US" culture.
The moment you write the following code, you will face some issues:


var date = DateTime.Parse(Request.Form["date"]);

One of two problems will arise:
  • the string parses without complaint because the "day" is not greater than 12, so it is valid in both en-AU and en-US, but the server silently treats the day as the month (03/04/2014 becomes March 4 instead of 3 April).
  • a FormatException occurs because the day is treated as the month and is invalid if greater than 12.
The way to work around this is to parse the date using the format you expect the string to arrive in from the client side, rather than the server's current culture.

var date = DateTime.ParseExact(Request.Form["date"], "dd/MM/yyyy", CultureInfo.CurrentCulture);

What this does is define the structure of the string passed in the first parameter using the format in the second parameter. Once the string matches that format exactly, it is converted to a DateTime. The third parameter supplies culture-specific details such as the date separator and the month and day names; it does not define a time zone. Note that "/" in a custom format string stands for the culture's date separator, so if the server's culture might not use "/", CultureInfo.InvariantCulture is the safer choice for the third parameter.
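The principle of stating the expected format instead of letting the runtime guess applies outside .NET too. Below is a minimal sketch in JavaScript of a strict dd/MM/yyyy parser. `parseDayMonthYear` is a hypothetical helper and not a built-in, and building the date in UTC is an assumption made here so that the host's time zone cannot shift the day.

```javascript
// Hypothetical helper: accepts only "dd/MM/yyyy" and never guesses the order.
function parseDayMonthYear(text) {
  const match = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec(text);
  if (!match) {
    throw new RangeError(`Expected dd/MM/yyyy, got "${text}"`);
  }
  const [day, month, year] = match.slice(1).map(Number);

  // Build the date in UTC so the host's time zone cannot shift the day.
  const date = new Date(Date.UTC(year, month - 1, day));

  // Date.UTC rolls impossible values over (31/02 becomes 3 March), so
  // reject anything that does not round-trip to the same day and month.
  if (
    date.getUTCFullYear() !== year ||
    date.getUTCMonth() !== month - 1 ||
    date.getUTCDate() !== day
  ) {
    throw new RangeError(`Not a real calendar date: "${text}"`);
  }
  return date;
}

console.log(parseDayMonthYear("15/02/2014").toISOString());
// "2014-02-15T00:00:00.000Z"
console.log(parseDayMonthYear("03/04/2014").getUTCMonth());
// 3 (April, zero-based), never March
```

An ambiguous value like 03/04/2014 is always read as 3 April, and anything that does not fit the format fails loudly instead of being silently reinterpreted.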