Well, now let us get back to the topic at hand. A couple of days back, I was cribbing as to “Why I will Never Have a Girlfriend” after reading this thought provoking article. The author used pure maths, statistics and proper census data to conclude that there are exactly 18,726 non-committed girls who are beautiful, intelligent and might also like me (yeah, the last point is debatable, but a safe 50% chance of liking me is taken into consideration… this is my blog… sue me). Though this number doesn’t appear to be very small, statistically the search for 1 in 18,726 would take 67 years to complete. Hence the title of the article was well justified.

However, just when I was trying to overcome my disappointment I got to know that the beautiful girls are not in a much good position either! However, the analysis isn’t as simple as the guys. It involves Game Theory, Nash Equilibrium and the famous NP-complete *Interactive Decision Problem*! (Yes, women are _really_ hard to understand!). To give a brief idea, let me remind you of the movie **The Beautiful Mind**. It dealt with a situation as follows: Suppose you are hanging out in a bar with a couple of your friends and there is a group of beautiful girls all of whom are brunettes except one who is a blonde. Now, all the boys would like to approach the blonde first, however this doesnt quite seem to be a good strategy. Here is how Nash argued:

If we all go for the blonde, we block each other and not a single one of us is going to get her. So then we go for her friends, but they will all give us the cold shoulder because nobody likes to be second choice. But what if no one goes to the blonde? We don’t get in each other’s way and we don’ t insult the other girls. That’s the only way we win“.

Things are turning interesting now! This means that the prettiest girl in a group will always be available for potential guys to ask. But there is a catch! Consider a guy who is attracted to our beautiful blonde girl Carol and would like to talk to her. Realizing that Carol is shy, he is also a bit skeptical about his decision. But there are 3 possible outcomes as highlighted by Dr. José-Manuel Rey:

- He talks to Carol and she responds in a friendly manner and he gets her phone number. (‘a’ reward points)
- He does not approach Carol. Instead enjoys doing something else little less rewarding. (‘b’ reward points)
- He talks to Carol and she proves uninterested and he feels miserable for a week. (‘0’ reward points)

Let us associate reward points with each of these actions. First outcome has reward points ‘a’, second has b reward points and third has zero reward points. Clearly, a > b > 0. Now, if there are N guys taking independent decisions, we will have possible set of actions (aah, NP-complete!). However, for a particular guy, success means that guy approaches Carol while all others donot. By symmetry, let us assume that the proabability of each guy approaching Carol is p, so for our man to approach Carol, the award associated with it is (He approaches Carol, while other N-1 guys don’t!). Compared to this, the reward he gets doing something else is b. So, for so called pareto optimal solution/ Nash Equilibrium,

i.e.

This result is pretty interesting. Some of the few points that it highlights is:

1. As N increases, p tends to zero! Which implies no one will go for the most beautiful girl around as everyone will be too afraid to ask her out. Waah!

2. If you are a beautiful girl (with a large value of N) reading this , don’t despair, make a move with any one out of those N guys

3. If you are a guy reading this, don’t despair if the girl has a large fan following! In fact, the more the better are your chances!

4. If you are neither, then stop pointing out Mathematical flaws in this article and get a life you loser! (yeah, look who is talking)

PS: On a serious note, Game Theory is a really really interesting topic! If anyone of you is interested, I recommend an excellent book by **Roger B. Myerson** titled **Game Theory: Analysis of Conflict**.

PS 2: The analysis is still incomplete due to an error pointed out by an ingenious personality (read point 4). I am still not working on it because I don’t give a damn!

]]>

**Configuring Pidgin (in Linux):**

For this you will need libpurple-bin to be installed on your system. You can find this in all common linux repositories. For eg. in Ubuntu, this can be done by a simple command:

$ sudo apt-get install libpurple-bin

Once this is done, we need a simple shell script to change our status message. For eg. the script below, makes the current date/time being set up in the status message in case you are “available” in the format :

Wed Jul 2 13:46:53 EDT 2008

and updates it every 30 seconds.

sleep_time=30 while test 1 = 1; do msg=`date` if test "`purple-remote getstatus`" = "available"; then purple-remote "setstatus?status=available&message=${msg}" oldmsg="$msg" fi sleep "${sleep_time}s" done

This file can be downloaded here.

Let us call it status.sh

Now what remains is just to execute this status.sh with pidgin running and you are done! The status messages will keep on changing every 30 seconds and I guarantee it will be fun! Please don’t tamper much with the sleep time because while you are chatting, your status message is displayed on your friend’s side whenever it is changed. So you can imagine that it becomes quite irksome for your friend to continuously notice your status message change.

**Configuring GTalk (in Windows):**

Configrung GTalk messages programatically in Windows is a little tricky (another reason to switch to open source linux ) . For this, you need to have a Java Development environment in your system as we need to write XMPP application code using a well known API called Smack API which is in Java.

So, first of all, let us start with installing Java on the system (if you donot already have it) which you can get here. Once this is done, we need to get the Smack API which is an open source, pure Java library for working with XMPP (clients only). The API can be downloaded from http://www.igniterealtime.org/downloads/index.jsp#smack. The source code for this can be obtained from http://www.igniterealtime.org/downloads/source.jsp.Just download the appropriate libraries and put the JAR files in your classpath. If you can’t figure out how it is done, please Google your way around or refer this.

Now, we get down on writing the code!

The java file will look like:

// Import appropriate headers and packages XMPPConnection connection = new XMPPConnection("gmail.com"); try { connection.connect(); connection.login("username", "password"); Presence presence = new Presence(Presence.Type.available); presence.setStatus("Hello Friends!"); presence.setPriority(24); presence.setMode(Presence.Mode.available); connection.sendPacket(presence); Thread.sleep(30000); // Sleeps for 30 seconds } catch (XMPPException e) { System.out.println(e.toString()); }

The thing to notice here is that the argument of setStatus is a String. So, to rotate your status message, just use the following function:

private static String rotate(String input) { return input.substring(1) + input.charAt(0); }

which takes in a simple string as input and rotates it one character at a time:

eg. CORNELL

ORNELLC

RNELLCO etc.

So, get down to it and Enjoy!

]]>

Ever wondered how the browser/text editor spell checkers work? They tend to take some bare minimum space and are very time efficient. You guessed it right : Bloom Filters is the key! I have designed a Spellchecker in C language using bloom filters and would like to explain how it works, thereby explaining bloom filters in general.

Now, the data structure of the spell checker is basically a m-bit array. Moreover we have k hash functions which act on strings and return a number between 1 to m-1. For inserting each word in the dictionary, we calculate its hash and set the corresponding bit it generated to be 1. So for each word, it is acted upon by k hash functions and correspondingly k bits are set to 1. Similarly, other words are also added in a similar fashion. Now, for testing membership, we calculate the k hashes of the word to be tested and if either of the bit corresponding to the generated position is found to be zero, we reject it. Clearly, false positives are possible (the bit being set to 1 by some other string) but false negatives can never be possible.

And now, the coolest part. Let us mathematically play around with the probability of false positives :

Assume that a hash function selects each array position with equal probability. The probability that a certain bit is not set to one by a certain hash function during the insertion of an element is then

- .

The probability that it is not set by any of the hash functions is

- .

If we have inserted *n* elements, the probability that a certain bit is still 0 is

- ;

the probability that it is 1 is therefore

- .

Now test membership of an element that is not in the set. Each of the *k* array positions computed by the hash functions is 1 with a probability as above. The probability of all of them being 1, which would cause the algorithm to erroneously claim that the element is in the set, is then

- .

Obviously, the probability of false positives decreases as *m* (the number of bits in the array) increases, and increases as *n* (the number of inserted elements) increases. For a given *m* and *n*, the value of *k* (the number of hash functions) that minimizes the probability is

- ,

which gives the false positive probability of

- .

OK, there was quite a bit of catch in assuming that the “hash function selects each array position with equal probability”. Theoretically this is not quite possible. Randomness is not to random in the world of computer science, but it is nevertheless a good approximation (or at least I would LIKE to believe that)

**Implementation**

Source code of the spell checker can be obtained here

The spell checker presently has 3 options :

**1. Add a new word** : This basically requires making the word go through k hash functions (k = 1 1 in my implementation) and the setting of corresponding bits in the bit array. The standard hash functions I used are:

- RSHash
- JSHash
- PJWHash
- ELFHash
- BKDRHash
- SDBMHash
- DJBHash
- DEKHash
- BPHash
- FNVHash
- APHash

These functions are a beauty in themselves, I will will also blog about them soon

**2. Query a word :** Requires the word to be operated on by k hash functions are returned true if all the corresponding bits are set.

**3. Recreate optimized bit array :** Now, as shown above, the value of k that minimizes the probability of false positives

So, here we have k fixed and n depends on the number of words in the dictionary. So, we can only edit m, i.e. the number of bits in the bit array. Therefore, whenever we needs to recreate the optimized bit array, we simply re-calculate m based on this equation and re-hash the words.

Operation 1 and 2 are constant time operations while operation 3 is quite expensive and must be used once in a while. Moreover, just an interesting fact is that a Bloom filter with 1% error and an optimal value of *k*, on the other hand, requires only about 9.6 bits per element — regardless of the size of the elements! This advantage comes partly from its compactness, inherited from arrays, and partly from its probabilistic nature. If a 1% false positive rate seems too high, each time we add about 4.8 bits per element we decrease it by ten times.

**References :**

[1] http://en.wikipedia.org/wiki/Bloom_filter

[2] http://michael.dipperstein.com/bitlibs/

[3] http://www.partow.net/programming/hashfunctions/index.html

]]>