Search This Blog

Wednesday 18 September 2013

test page hamming distance

1.1 Understanding Hamming Distance



1.1.1 What is the Hamming distance between two binary strings?


The Hamming distance of two code words can be calculated as the number of 1 bits in the bitwise exclusive-or of the two code words. For example, the Hamming distance between the code words 1001 and 0101 is:


1001


0101 (XOR)


----------------


1100


----------------


Since there are two 1's in the output the Hamming Distance is 2.


1.1.2 How does error detection map to Hamming Distance?




This is what I understand:









Number of errors that can be detected = Minimum Hamming Distance between any two code words





For calculating minimum Hamming distance you can take any two adjacent code words.


For calculating maximum Hamming distance you should take the first and last code word in the list.


1.1.2.1 Example with 3 bit code words

































Data Word



Even Parity



Code Word



00



0



000



01



1



011



10



1



101



11



0



110




Minimum Hamming Distance = 2


Therefore number of errors that can be detected = 2


Maximum Hamming Distance = 2


Given any two codes in the above list, you need two at least two bit flips to convert one code to another. Therefore this coding approach can detect up to two errors.


1.1.2.2 Example with 4 bit code words




















































Data Word



Even Parity Bit



Code Word



000



0



0000



001



1



0011



010



1



0101



011



0



0110



100



1



1001



101



0



1010



110



0



1100



111



1



1111





Minimum Hamming Distance = 2


Maximum Hamming Distance = 4


This coding scheme can detect up to 2 errors


We can generalize saying that when using Parity, the minimum Hamming distance will always be equal to 2.


To detect d single-bit errors, a code must have a minimum Hamming distance of at least d + 1.


To correct d single-bit errors, a code must have a minimum Hamming distance of at least 2d + 1.



1.2 Working out the Hamming Code





1.3 The Internet Checksum




























RFC



Title



Description



1071



Computing the Internet Checksum





1141



Incremental Updating of the Internet Checksum





1624



Computation of Internet Checksum via Incremental Update






Let us say we want to send the following 64 bit data stream:














Binary



0000000000000001 1111001000000011 1111010011110101 1111011011110111



Hexadecimal



0001 F203 F4F5 F6F7




Sender - Checksum computation






































































Step



Description



Variable Name



Binary Representation



Hexadecimal Representation



1



Arrange the 64 bit data as 16 bit words - one below the other



First Byte



0000000000000001



0001



Second Byte



1111001000000011



F203



Third Byte



1111010011110101



F4F5



Fourth Byte



1111011011110111



F6F7



2



Put all zeroes in the checksum position



Checksum Byte



0000000000000000



0000



3



Add all the above numbers



Sum with carryover



101101110111110000 (The 10 on the left is the carryover of the addition)



2DDF0



4



Take the carryover bits from the number and add them back



16 bit Sum



1101110111110000


10 (+)


------------------------------


1101110111110010



DDF2



5



Negate the 16 bit Sum



Checksum



0010001000001101



220D



6



Append the Checksum to the original data and transmit.


The length of the data transmitted is 64+16 = 80 bits



Final Code



0000000000000001


1111001000000011


1111010011110101


1111011011110111


0010001000001101



0001


F203


F4F5


F6F7


220D




Receiver - Checksum Validation





























































Step



Description



Variable Name



Binary Representation



Hexadecimal Representation



1



Arrange the received data into 16 bit words - one below the other



First Byte



0000000000000001



0001



Second Byte



1111001000000011



F203



Third Byte



1111010011110101



F4F5



Fourth Byte



1111011011110111



F6F7



Checksum Byte



0010001000001101



220D



2



Add all the above numbers



Sum with carryover



101111111111111101



0000



3



Take the carryover bits from the number and add them back



16 bit Sum



1111111111111101


10 (+)


------------------------------


1111111111111111



DDF2



4



Negate the 16 bit Sum



Checksum Result



0000000000000000



0000




The same algorithm is applied at the receiver as well.


It can correct burst errors (errors in many consecutive bits rather than occurring independently of each other) up to 16 bits.


For 16 bit checksum there are 216 possible values. So the error detection is going to be much more efficient that Parity.


It is easy for the programmers to see what is going on inside a computer


http://www.youtube.com/user/numberphile/




















































Octal Representation



Binary Representation



UNIX file permissions



0



000



No Permission



1



001



Execute Only



2



010



Write Only



3



011



Write and Execute Only



4



100



Read Only



5



101



Read and Execute Only



6



110



Read and Write Only



7



111



Read, Write and Execute







2 Cyclic Redundancy Check



CRC is based on the mathematics of finite fields. It is difficult for us to work out the generator to be used because it has been done by people for years!


What are the problems of using a checksum?



1. If the checksum itself is corrupted, a correctly transmitted message might be incorrectly identified as a corrupted one. However, this is a safe-side failure.



2. A dangerous-side failure occurs where the message and/or checksum is corrupted in a manner that results in a transmission that is internally consistent.


Unfortunately, this possibility is completely unavoidable and the best that can be done is to minimize its probability by increasing the amount of information in the checksum (e.g. widening the checksum from one byte to two bytes).




What is a CRC algorithm?


CRC algorithms, which fall into the class of error detection algorithms that leave the data intact and append a checksum on the end.



The basic idea of CRC algorithms is simply to treat the message as an


enormous binary number, to divide it by another fixed binary number,


and to make the remainder from this division the checksum. Upon


receipt of the message, the receiver can perform the same division and


compare the remainder with the "checksum" (transmitted remainder).



Why division is better than addition?



Although the effect of each bit of the input message on the quotient


is not all that significant, the 4-bit remainder gets kicked about


quite a lot during the calculation, and if more bytes were added to


the message (dividend) it's value could change radically again very


quickly. This is why division works where addition doesn't.













CRC-1



most hardware; also known as parity bit



Sunday 30 June 2013

Introduction to Data Representation in Binary

1 Introduction

The Tsang Dynasty (June 18, 618 – June 1, 907) was one of the most influential empires of the Chinese civilization. From “The Thousand Golden Prescriptions” book of medicine to the invention of the gun powder the empire has a lot be proud about. Here we shall look at one of the less talked about inventions of the Tsang Dynasty – the playing cards.

After their origin in China, the playing cards went all around the world before finally taking up the format in which we know them today:

clip_image002

There are 52 cards in total, divided into 4 suits – Clubs, Hearts, Spades and Diamonds – each suit holding 13 cards with values ranging from Ace(1) to King(13). Having understood a bit about their history and appearance let us try to represent these playing cards as a series of bits (0s and 1s).


2 Representation (Encoding)

When you look at one of the cards in the card deck your eyes will tell you the suit of the card and its value. Similarly, the binary representation should also make it easy for you to identify the suit and the value.

So in how many ways can you represent these 52 cards in binary? Well, there are as many ways as your imagination will allow. Each way of representing is also called as an Encoding. Here we shall look at 4 types of encodings:

1. One hot encoding

2. Two hot encoding

3. Binary value encoding

4. Hybrid encoding


2.1 One-hot Encoding

This would probably be the most obvious representation. There are 52 cards, so I can use 52 bits to represent each card. Each card will have only one bit set to one and the remaining bits would be zero. The representation would look as below. Here the first bit might represent the Ace of Clubs and the next one the Two of Clubs and so on.

Note that:

1. Large number of bits are required (see how cramped the image looks, no room for the 0s and 1s to breathe)

2. Comparison of values and suits – our basic objective – is quite hard to meet

clip_image004


2.2 Two-hot Encoding

As an alternative to the highly dense one hot encoding we can try a slightly more intelligent representation. Since we have 4 suits and 13 values, we can use 4 bits to represent the suit type and 13 bits to represent the value of the card. This is called “Two Hot Encoding”. For any card: Two bits will be set to 1. You will notice that this would make it easier to compare the suit and the values. However, it still seems to use a large number of bits.

clip_image006


2.3 Binary Value Encoding

Let us now try to eliminate the problem identified with the previous approach – too many bits. If you think in binary, to represent numbers from 1 to 52, we need only 6 bits (26 is 64 which means we can represent up to 64 decimal numbers). Using this 6 bit representation (low order bits), a single card can be neatly fit into a byte. This uses much less bits than one or two hot encoding but in comparing two cards it is as crippled as one hot encoding.

clip_image008

2.4 Approach 4 - Hybrid Encoding

Considering the limitations of binary value encoding and the two hot encoding methods, let us try to create a hybrid approach for representation- two hot encoding but with reduced number of bits - 2 bits to represent the suit (00, 01, 10, and 11) and 4 bits for the card value (4 bits can represent 24 or 16 values).

As an aside, the card value bits 1101, 1110, and 1111 will not representing anything (so a total of 3*4 or 12 values will be “undefined” in this representation).

clip_image009


2.5 Comparing Representation Approaches

Type of Encoding

Number of bits/Card

Total Number of bits

% of Maximum

Ease of comparing two cards (Interpretation)

One Hot Encoding

52

2704

100%

HARD

Two Hot Encoding

17

884

32.69%

EASY

Binary Value Encoding

6

312

11.54%

HARD

Hybrid Encoding

6

312

11.54%

EASY

We started out by using 52 bits for a card and steadily evolved our representation to use only 6 bits per card (11.5% of the original)! The “Hybrid Encoding” proves to be the most effective method in terms of number of bits used and the ease of comparing the cards. In the coming section we shall see how hybrid encoding is implemented in C code.