Search This Blog

Sunday 30 June 2013

Introduction to Data Representation in Binary

1 Introduction

The Tsang Dynasty (June 18, 618 – June 1, 907) was one of the most influential empires of the Chinese civilization. From “The Thousand Golden Prescriptions” book of medicine to the invention of the gun powder the empire has a lot be proud about. Here we shall look at one of the less talked about inventions of the Tsang Dynasty – the playing cards.

After their origin in China, the playing cards went all around the world before finally taking up the format in which we know them today:

clip_image002

There are 52 cards in total, divided into 4 suits – Clubs, Hearts, Spades and Diamonds – each suit holding 13 cards with values ranging from Ace(1) to King(13). Having understood a bit about their history and appearance let us try to represent these playing cards as a series of bits (0s and 1s).


2 Representation (Encoding)

When you look at one of the cards in the card deck your eyes will tell you the suit of the card and its value. Similarly, the binary representation should also make it easy for you to identify the suit and the value.

So in how many ways can you represent these 52 cards in binary? Well, there are as many ways as your imagination will allow. Each way of representing is also called as an Encoding. Here we shall look at 4 types of encodings:

1. One hot encoding

2. Two hot encoding

3. Binary value encoding

4. Hybrid encoding


2.1 One-hot Encoding

This would probably be the most obvious representation. There are 52 cards, so I can use 52 bits to represent each card. Each card will have only one bit set to one and the remaining bits would be zero. The representation would look as below. Here the first bit might represent the Ace of Clubs and the next one the Two of Clubs and so on.

Note that:

1. Large number of bits are required (see how cramped the image looks, no room for the 0s and 1s to breathe)

2. Comparison of values and suits – our basic objective – is quite hard to meet

clip_image004


2.2 Two-hot Encoding

As an alternative to the highly dense one hot encoding we can try a slightly more intelligent representation. Since we have 4 suits and 13 values, we can use 4 bits to represent the suit type and 13 bits to represent the value of the card. This is called “Two Hot Encoding”. For any card: Two bits will be set to 1. You will notice that this would make it easier to compare the suit and the values. However, it still seems to use a large number of bits.

clip_image006


2.3 Binary Value Encoding

Let us now try to eliminate the problem identified with the previous approach – too many bits. If you think in binary, to represent numbers from 1 to 52, we need only 6 bits (26 is 64 which means we can represent up to 64 decimal numbers). Using this 6 bit representation (low order bits), a single card can be neatly fit into a byte. This uses much less bits than one or two hot encoding but in comparing two cards it is as crippled as one hot encoding.

clip_image008

2.4 Approach 4 - Hybrid Encoding

Considering the limitations of binary value encoding and the two hot encoding methods, let us try to create a hybrid approach for representation- two hot encoding but with reduced number of bits - 2 bits to represent the suit (00, 01, 10, and 11) and 4 bits for the card value (4 bits can represent 24 or 16 values).

As an aside, the card value bits 1101, 1110, and 1111 will not representing anything (so a total of 3*4 or 12 values will be “undefined” in this representation).

clip_image009


2.5 Comparing Representation Approaches

Type of Encoding

Number of bits/Card

Total Number of bits

% of Maximum

Ease of comparing two cards (Interpretation)

One Hot Encoding

52

2704

100%

HARD

Two Hot Encoding

17

884

32.69%

EASY

Binary Value Encoding

6

312

11.54%

HARD

Hybrid Encoding

6

312

11.54%

EASY

We started out by using 52 bits for a card and steadily evolved our representation to use only 6 bits per card (11.5% of the original)! The “Hybrid Encoding” proves to be the most effective method in terms of number of bits used and the ease of comparing the cards. In the coming section we shall see how hybrid encoding is implemented in C code.