Understanding Bit Operators

In my latest post about optimizing js code I've built a function that takes advantage of bit operations to increase the performance of a common function 200 fold. I'm beginning to fall in love with these operators and the performance gain they seem to bring. So, to understand them better, I'll break down the isIsogram function and explain some other concepts here.

function isIsogram(str) {
let chars = 0;
for(let i = 0; i < str.length; i++) {
const bit = 1 << ((str.charCodeAt(i) | 32) - 97);
if (chars & bit) return false;
chars |= bit;
}
return true;
}

Lets break this code into sections, but first:

Overview

I want to be able to store the letters we've already seen in order to compare them. We can group the letters with a boolean value for 'seen' like this:

[[a,0][b,0][c,0].....[x,0][y,0][z,0]]

This works, but notice that alphabet, just as this array, is ordered and the order of the element can represent its essence (first element 'a', last is 'z'). While the value can represent the 'seen' boolean!

[0,0,0.....0,0,0]
=
[[a,0][b,0][c,0].....[x,0][y,0][z,0]]

Thus if we want to know weather 'a' has been seen we would check the value at the index 0, for 'b' at index 1 and for 'z' at index 25.

better way?

array of switches

Is there a better way to store this data then array? Yes. Binary code!
lets say we have a binary like this:

25 - 00000000000000000000000000 - 0

We can represent the characters we have already seen using the corresponding place in the binary. Imagine having switches in a line, the switch position represents weather it has been seen, and its index represents the letter.

example

say we want to set a, b and x to seen:

a is 0
b is 1
x is 22

Then we would change the corresponding bits from 0 to 1:

25 - 00100000000000000000000011 - 0
^                     ^^

(H)ASCII

All characters can be represented using ascii code. str.charCodeAt(i) gives us the ascii code for character in str in i'th position.

  • a = 97
  • b = 98
  • c = 99
  • A = 65
  • B = 66

Notice that removing 97 from the ascii code of the lower case letters would give us exactly the representation we've seen:

  • a = 97 -97 = 0
  • b = 98 -97 = 1
  • c = 99 -97 = 2

Remember the difference between the lower and upper case is always 32! This is an important observation that would allow us to convert characters to lower and upper case without using maps, or functions, simply add or subtract 32 to change the case.

  • A = 97 -32 = 65 = A
  • b = 98 -32 = 66 = B
  • c = 99 -32 = 67 = C

Binary or (|)

How but how do we change case sing binary?
Take a look at this:

a = 97 = 01100001
A = 65 = 01000001
^

Notice anything? the only difference is in the 6th bit, and 6th bit is 32! Thus if we want to convert upper to lower case we need to remove only the 6th bit.

_ = 32 = 00100000

How OR works

combines true values
ORtruefalse
truetruetrue
falsetruefalse

Let 32 in binary represent the bit we want to add. A good metaphor for binary OR is that it keeps the true value if it exists. Since 32 has only one true value at 6th bit, it would be added if it doesn't already exits.

A or 32 = a
1 | 0 = 1
0 | 0 = 0
0 | 0 = 0
0 | 0 = 0
0 | 0 = 0
0 | 1 = 1
1 | 0 = 1
0 | 0 = 0

Notice how this is much better then just adding 32, since we don't need to bother with checking weather were adding 32 to lower case letter

In short

We add 32 using binary or and removing 97. Thus each letter becomes its numerical representation 0 through 25!

Binary shift (<<)

Since our seen is represented as a binary, we need to convert our letter index 0-25 to a binary. To do that we shift 1 to the corresponding index!

Let's say we see the letters 'a', 'c', 'd':
'a' (position 0):
1 << 0 = 00000001

'c' (position 2):
1 << 2 = 00000100

'd' (position 3):
1 << 3 = 00001000

So, if we have seen both a, c and d we would do:
00000001
00000100
00001000
--------
00001101

Binary and (&)

Now lets check if the indexed letter have already been set in seen. To do that we need to check if at any given position, both the indexed letter and the seen 'array' both have 1:

Lets use the seen binary for a, c and d:

00001101

and test for letters b and a using AND:

ANDtruefalse
truetruefalse
falsefalsefalse

And returns true if both compared values are true. In case of seen and 'b':

s b result
1 0 0
0 1 0
1 0 0
1 0 0
0 0 0
0 0 0
0 0 0
0 0 0

since there are no common bits between the binary representation of seen and 'b' the result is 00000000 which is 0.

Now, lets add b to our seen list using OR that combines the truth values, and test for a:

s a result
1 1 1
1 0 0
1 0 0
1 0 0
0 0 0
0 0 0
0 0 0
0 0 0

Notice the result now, is 00000001 which is 1!


Leave a Reply

Your email address will not be published. Required fields are marked *