Regular expressions are a very handy tool for processing strings. They save you a lot of code and deeply nested if-else statements. But when your application runs on a resource-sensitive system, you may want to choose your tools carefully. Initially I thought a regular expression would take the same or less time than a handcrafted method for string processing. In the end I decided to measure rather than guess. I used Google's Caliper tool to benchmark the code. I wrote two methods, one using a regular expression and one using simple string operations. Both methods support the following phone number formats.

  • 1-234-567-8901
  • 1-234-567-8901 x1234
  • 1-234-567-8901 ext1234
  • 1 (234) 567-8901
  • 1.234.567.8901
  • 1/234/567/8901
  • 12345678901
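
To make the comparison concrete, here is a minimal sketch of what the two approaches can look like for the formats above. It is not the code that was benchmarked; the class name, method names and the pattern itself are assumptions made for illustration, and the handcrafted check is deliberately a little more permissive than the regex on malformed input.

```java
import java.util.regex.Pattern;

class PhoneFormatSketch {
    // Hypothetical pattern for the listed formats; not the benchmarked regex.
    static final Pattern PHONE = Pattern.compile(
        "1[-./ ]?\\(?\\d{3}\\)?[-./ ]?\\d{3}[-./]?\\d{4}(?: (?:x|ext)\\d+)?");

    // Regex approach: the whole input must match the compiled pattern.
    static boolean checkWithRegex(String s) {
        return PHONE.matcher(s).matches();
    }

    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Handcrafted approach: strip an optional " x1234" / " ext1234" tail,
    // then require exactly 11 digits starting with '1', with only the
    // known separator characters in between.
    static boolean checkByHand(String s) {
        int end = s.length();
        int sp = s.lastIndexOf(' ');
        if (sp >= 0) {
            String tail = s.substring(sp + 1);
            int from = tail.startsWith("ext") ? 3 : tail.startsWith("x") ? 1 : -1;
            if (from >= 0) {
                if (from == tail.length()) return false; // marker without digits
                for (int i = from; i < tail.length(); i++) {
                    if (!isAsciiDigit(tail.charAt(i))) return false;
                }
                end = sp; // validate only the part before the extension
            }
        }
        int digits = 0;
        for (int i = 0; i < end; i++) {
            char c = s.charAt(i);
            if (isAsciiDigit(c)) {
                if (digits == 0 && c != '1') return false; // must start with 1
                digits++;
            } else if ("-./ ()".indexOf(c) < 0) {
                return false; // unknown character
            }
        }
        return digits == 11;
    }

    public static void main(String[] args) {
        String[] samples = {
            "1-234-567-8901", "1-234-567-8901 x1234", "1 (234) 567-8901",
            "12345678901", "123-456-789"
        };
        for (String s : samples) {
            System.out.println(s + " regex=" + checkWithRegex(s)
                + " byHand=" + checkByHand(s));
        }
    }
}
```

The handcrafted version only counts digits and rejects unknown characters, which is exactly the kind of narrowed scope that a general-purpose regex engine cannot assume.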

Here is how the MDN strings used to test these methods were generated.

mdn[0][i] = String.format("%04d", random.nextInt(10000));
mdn[1][i] = String.format("%10d", random.nextInt((int) 1e10));
mdn[2][i] = String.format("-%10d", random.nextInt((int) 1e10));
mdn[3][i] = String.format("%03ddsfasdf00000", random.nextInt(1000));
mdn[4][i] = String.format("%10d-", random.nextInt((int) 1e10));
mdn[5][i] = String.format("%03d-%03d-%03d", random.nextInt(1000), random.nextInt(1000), random.nextInt(1000));
mdn[6][i] = String.format("-%03d-%03d-%03d-", random.nextInt(1000), random.nextInt(1000), random.nextInt(1000));
mdn[7][i] = String.format("%03d-%03d-%03d-", random.nextInt(1000), random.nextInt(1000), random.nextInt(1000));
mdn[8][i] = String.format("%03d-%03d-%03d ext %04d", random.nextInt(1000), random.nextInt(1000), random.nextInt(1000), random.nextInt(10000));
mdn[9][i] = String.format("%03d-%03d-%03d ext %04d-", random.nextInt(1000), random.nextInt(1000), random.nextInt(1000), random.nextInt(10000));
mdn[10][i] = "123456789012345677890";
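
Two details of the generators above are easy to miss when reading them: in Java, casting 1e10 to int does not overflow but saturates at Integer.MAX_VALUE, and "%10d" pads with spaces rather than zeros. The small sketch below demonstrates both; the class and method names are made up for this illustration.

```java
class MdnInputDemo {
    // (int) 1e10 is out of int range, so the cast saturates at Integer.MAX_VALUE.
    static final int BOUND = (int) 1e10;

    // Same width-10 format as the generators: pads on the left with spaces.
    static String spacePadded(int n) {
        return String.format("%10d", n);
    }

    // For comparison: the 0 flag pads with zeros instead.
    static String zeroPadded(int n) {
        return String.format("%010d", n);
    }

    public static void main(String[] args) {
        System.out.println(BOUND == Integer.MAX_VALUE);  // prints true
        System.out.println("[" + spacePadded(42) + "]"); // prints [        42]
        System.out.println("[" + zeroPadded(42) + "]");  // prints [0000000042]
    }
}
```

So the "%10d" inputs are always ten characters wide, but values with fewer than ten digits start with spaces, and no value can exceed 2147483646.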

I was really surprised by the result. The handcrafted method outperformed the regular expression method on every input. In the results below it is anywhere from about 2 times to about 14 times faster, depending on the input.

              benchmark index  ns  linear runtime
ExtensiveSimpleMDNCheck     0   961 =
ExtensiveSimpleMDNCheck     1  1742 ==
ExtensiveSimpleMDNCheck     2  1852 ==
ExtensiveSimpleMDNCheck     3  1874 ==
ExtensiveSimpleMDNCheck     4  1921 ==
ExtensiveSimpleMDNCheck     5  1545 ==
ExtensiveSimpleMDNCheck     6  1387 ==
ExtensiveSimpleMDNCheck     7  1666 ==
ExtensiveSimpleMDNCheck     8  2237 ===
ExtensiveSimpleMDNCheck     9  2547 ===
ExtensiveSimpleMDNCheck    10  2636 ===
 ExtensiveMDNRegexCheck     0 13583 ===================
 ExtensiveMDNRegexCheck     1 12820 ==================
 ExtensiveMDNRegexCheck     2  3466 =====
 ExtensiveMDNRegexCheck     3 17135 ========================
 ExtensiveMDNRegexCheck     4 12350 =================
 ExtensiveMDNRegexCheck     5 15963 =======================
 ExtensiveMDNRegexCheck     6  3341 ====
 ExtensiveMDNRegexCheck     7 18588 ==========================
 ExtensiveMDNRegexCheck     8 19107 ===========================
 ExtensiveMDNRegexCheck     9 20786 ==============================
 ExtensiveMDNRegexCheck    10 19556 ============================

Here ExtensiveSimpleMDNCheck is the simple string manipulation method and ExtensiveMDNRegexCheck is the regular expression method.

I later realized that a regular expression engine is itself an interpreter, so it inevitably involves a lot of processing. In Java, regular expressions are implemented by two classes, Pattern (with its internal static class Node) and Matcher. Using them involves allocating arrays and evaluating the compiled expression. The engine is general-purpose and therefore has to consider every possible scenario, whereas the logic in my handcrafted method was limited in scope to the specific phone number formats. The point of this post is that the choice between the two depends on your scenario and is at your discretion.
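
Part of the regex cost can come from where the Pattern is built. The sketch below shows three ways of calling the same expression; it is only an illustration with a simplified, made-up pattern, and it makes no claim about which form the benchmarked method used or how much time each one saves.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexReuseDemo {
    // Simplified, hypothetical pattern used only for this illustration.
    static final String REGEX = "\\d{3}-\\d{3}-\\d{4}";

    // Compiled once and shared; Pattern instances are immutable and thread-safe.
    static final Pattern COMPILED = Pattern.compile(REGEX);

    // String.matches compiles the expression again on every call.
    static boolean compileEveryCall(String s) {
        return s.matches(REGEX);
    }

    // Reuses the compiled Pattern but allocates a new Matcher per call.
    static boolean compileOnce(String s) {
        return COMPILED.matcher(s).matches();
    }

    // Reuses one Matcher as well; a Matcher is not thread-safe,
    // so the caller must not share it across threads.
    static boolean reuseMatcher(Matcher m, String s) {
        return m.reset(s).matches();
    }

    public static void main(String[] args) {
        Matcher shared = COMPILED.matcher("");
        String[] inputs = { "234-567-8901", "234-567-890" };
        for (String s : inputs) {
            System.out.println(s + " " + compileEveryCall(s) + " "
                + compileOnce(s) + " " + reuseMatcher(shared, s));
        }
    }
}
```

All three return the same answers; they differ only in how much setup work is repeated per call, which is worth checking before comparing a regex against handwritten code.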

Regular expression                                 | Normal string operation
Less code to write, very clean way of writing code | Nested if-else statements
Hard to understand if you don’t know RE            | Easier to understand
Takes comparatively more time to execute           | Takes less time to execute
Debugging is comparatively difficult               | Debugging is easier

If your thoughts differ, please share them in the comment section.

If you are interested in the source code and the benchmarking code, you can find them on GitHub.

I hope this post helped you in some way. If you liked it, please share it.
