# 1 Introduction:

The infinite monkey theorem states that a monkey hitting keys at random on a keyboard for an infinite amount of time will almost surely type any given text. Based on this idea, we developed a series of computer programs and routines that simulate a monkey typing random keys and producing text output. The goal was to explore the properties and possible uses of character frequencies in natural-language text.

In this assignment I developed different programs to cover the requirements. Each question was implemented in a separate file named after the problem, so it is easy to distinguish them. All the problems need to use the matrix, so I developed a separate file to add books and then generate the orders (first, second, third and fourth order). In the first three questions we need a dictionary to match the words generated by the monkey problem against it and find meaningful words. The program was built on the .NET Framework using Telerik tools.

# 2 Generate Orders Program:

In Generate Orders, I generate the first, second, third and fourth orders using one-, two-, three- and four-dimensional arrays respectively.

For the third order, I first generated it using a two-dimensional array, where I loop through the text, take every pair of characters, store them in the array, and count the occurrences of the third character for each pair; however, it took a long time to build the matrix. So I decided to try it with a three-dimensional array, and it was much faster.

I completed the rest of the problems using the three-dimensional array method, but I displayed the two-dimensional array for the third-order matrix in Problem 1e.

When I use the three-dimensional array for the third-order monkey problem, I smooth the array by adding one to all the elements. Without the smoothing, the generated text starts by producing some words and then ends up typing aaaaaaaaa, because the sum of the third-character occurrences can come out to zero, and since zero is the index of "a", it prints "a".
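The effect of add-one smoothing can be sketched in Python (the actual program is VB.NET, so the names below are illustrative, not the report's code): every transition gets at least weight 1, so the row total can never be zero and sampling never collapses to index 0 ("a").

```python
import random

def sample_third_order(counts, first, second, alphabet, rng=random.random):
    """Pick the next character given the two previous ones.

    counts[i][j][k] holds how often character k followed the pair (i, j).
    Add-one smoothing: every entry gets at least weight 1, so the row
    total is always positive and we never fall back to index 0 ("a").
    """
    row = [counts[first][second][k] + 1 for k in range(len(alphabet))]
    total = sum(row)
    idx = rng() * total          # random point in the cumulative range
    s = 0
    for k, weight in enumerate(row):
        s += weight
        if idx < s:
            return alphabet[k]
    return alphabet[-1]
```

Even with an all-zero row, the smoothed sampler returns each character with equal probability instead of always printing "a".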

In the fourth-order matrix I did not use smoothing, because it did not give word yields as good as the three-dimensional array, and the generated text contained all the symbols that should not occur that often. So, to avoid the problem of typing aaaaaaaa in the generated text, I added an extra condition: if the sum over the fourth character is zero, print nothing, so we do not end up with aaaaaaaaaa.

# 3 Problems/Output:

Generate the simple monkey problem, then compare the result with the dictionary file; the program runs to type 100,000 characters.

To do this, I used a random-number generator function, and the output of this function was treated as the index into the alphabet.

```vb
Dim validchars As String = "abcdefghijklmnopqrstuvwxyz,.;:?!()-'@""# "
Dim idx As Integer = rand.Next(0, validchars.Length)
Dim randomChar As Char = validchars(idx)
```

To match the text with the dictionary I used the arrayFind() function, where every word from the generated text is compared with the dictionary using a binary search for faster output. This function is used in the first three problems.
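A minimal Python sketch of these two steps, random typing followed by a binary-search dictionary lookup (the report's arrayFind() is a VB.NET routine; all names here are illustrative):

```python
import random
import re
from bisect import bisect_left

VALID_CHARS = "abcdefghijklmnopqrstuvwxyz,.;:?!()-'@\"# "

def monkey_type(n_chars, rng=random.Random(0)):
    """Simple monkey: every valid character is equally likely."""
    return "".join(VALID_CHARS[rng.randrange(len(VALID_CHARS))]
                   for _ in range(n_chars))

def find_meaningful(text, dictionary):
    """Keep the words of the generated text that appear in the
    dictionary, using binary search on the sorted word list
    (the role arrayFind() plays in the report)."""
    words = sorted(dictionary)
    found = []
    for w in re.findall(r"[a-z]+", text):
        i = bisect_left(words, w)
        if i < len(words) and words[i] == w:
            found.append(w)
    return found
```

Binary search makes each lookup O(log n) in the dictionary size, which matters when checking every token of a 100,000-character text.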

As can be clearly seen, the matching words are short, and most of them are single letters such as "a".

## 3.1 Problem 1a:

## 3.2 Problem 1b:

Generate the first-order monkey problem from the character distribution provided in the assignment for Act III of Hamlet; the program runs to type 100,000 characters.

To do this, I built an array that contains the character distribution for Act III of Hamlet, then generated a random number between zero and the total number of occurrences over all the characters.

```vb
Dim idx As Integer = rand.Next(0, total - 1)

For j As Integer = 0 To 27
    S = S + Dist(j)
    If idx < S And flag = False Then
        randomChar = validchars(j)
        sb.Append(randomChar)
        flag = True
    End If
Next j
```

As we can see, the word count for the first-order monkey is higher than for the simple monkey problem, with more variety in the words as well, though most of the words are short, two to three characters.

The meaningful word count was 2,700 words, and the proportion of correct words to the total number of typed words is 13.68%.

## 3.3 Problem 1c:

Generate the first-, second- and third-order monkey problems; the program runs to type 100,000 characters and then compares the typed characters with the dictionary.

The user can select the book for which he/she wants to generate the first, second and third order from the drop-down list that contains all the books that have a matrix. Then the user can generate the text and compare it with the dictionary.

For first order, the same algorithm as for Problem 1b was used; the function for this part is RadFirst_G_Click(). For second order, the algorithm from the handout was used to generate the text from the second-order correlation matrix; the function for this part is RadSecond_G_Click().

```vb
Dim idx As Integer = rand.Next(0, FO_intArray(firstCh))

For j As Integer = 0 To 39
    S = S + ResultsArray(firstCh, j)
    If idx < S And flag = False Then
        randomChar = validchars(j)
        sb.Append(randomChar)
        temp = j
        flag = True
    End If
Next j

firstCh = temp
```

At the start, I generate a random number between zero and FO_intArray at the first-character index. The FO_intArray array holds the total of all the occurrences of the second characters given the first character. This information was stored while generating the second-order matrix.

Then I loop to find the second character according to the book's algorithm. For third order, the algorithm from Bennett was used to generate the text from the third-order matrix; the function for this part is RadThird_G_Click().

```vb
For x As Integer = 0 To ResultArray.GetUpperBound(2)
    sum += ResultArray(firstCh, SecondCh, x)
Next

Dim idx As Integer = rand.Next(0, sum)

For j As Integer = 0 To 39
    S = S + ResultArray(firstCh, SecondCh, j)
    If idx <= S Then
        randomChar = validchars(j)
        sb.Append(randomChar)
        temp = j
        Exit For
    End If
Next j

firstCh = SecondCh
SecondCh = temp
```

In the first loop, I sum all the occurrences of the third character given the first and second characters; then I generate a random number between 0 and the sum I just calculated.

The last loop is where I find the third character that the monkey will type, according to the book's algorithm.

For fun and curiosity, I generated a fourth-order monkey. The program runs to type 1,000,000 characters, but I added a condition that if the sum is 0, which happens a lot, nothing is printed.

With this condition, the output text contains few words compared with the other orders, but most of those words are meaningful. The algorithm used for this part is the same as for the third-order monkey, but with a four-dimensional array and with the condition that if the sum is zero, nothing is printed. The function for this part is RadFourth_G_Click().
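The zero-sum guard can be sketched in Python (illustrative names; the report's implementation is the VB.NET RadFourth_G_Click()):

```python
import random

def next_char_fourth_order(counts, c1, c2, c3, alphabet, rng=random.random):
    """Return the next character from fourth-order counts, or None.

    No smoothing here: if no fourth character ever followed the triple
    (c1, c2, c3) in the training text, the row sums to zero and we
    print nothing, instead of degenerating to "a" as the unsmoothed
    third order did.
    """
    row = [counts[c1][c2][c3][k] for k in range(len(alphabet))]
    total = sum(row)
    if total == 0:
        return None  # the "don't print anything" condition
    idx = rng() * total
    s = 0
    for k, weight in enumerate(row):
        s += weight
        if idx < s:
            return alphabet[k]
    return alphabet[-1]
```

Returning None on an empty row is what keeps the fourth-order output free of the aaaaaaaa runs without inflating rare transitions the way add-one smoothing does.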

## 3.4 Problem 1d:

To change the resolution of the matrix, we divide all entries in the frequency matrix by a constant factor.

To do so, the user first chooses the book whose resolution he/she wants to change, then enters a constant factor to divide the matrix by and presses Generate New Matrix; a new matrix is generated, and the user can then generate the text and compare it with the dictionary.

The matrix used in this problem is the second-order matrix.

The function used to divide the matrix by the factor is RadMatrix_Click().
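Assuming the division is integer division (an assumption; that is what would actually discard low-frequency detail and lower the resolution), the reduction can be sketched in Python:

```python
def reduce_resolution(matrix, factor):
    """Divide every entry of a second-order frequency matrix by a
    constant factor using integer division. Counts smaller than the
    factor drop to zero, which is what lowers the resolution of the
    table and changes which transitions survive."""
    return [[count // factor for count in row] for row in matrix]
```

For example, with a factor of 5 a transition seen 3 times disappears entirely, while one seen 10 times keeps a weight of 2.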

## 3.5 Problem 1e:

Routines to calculate the matrices were already implemented in "Generate Orders" in order to solve the previous problems. To display a matrix, the user has to choose a book and then one of the first-order, second-order, third-order or 2D third-order matrices.

The functions used to display the first-order, second-order, third-order and 2D third-order matrices are RadFirst_G_Click(), RadSecond_G_Click(), RadThird_G_Click() and RadThird_2D_Click() respectively.

## 3.6 Problem 1f

This problem asks for computing the most probable digraph path that starts with the letter T. I generated digraph paths from the first- and second-order matrices. For the first-order matrix I did not specify the first letter, but for the second-order matrix I specified that the digraph should start with the letter T, as required in the assignment.

For the first-order digraph path I used the function RadPath1_Click():

```vb
For j = 0 To 39
    flag = False
    For i = 0 To 39
        If FO_intArray(i) > FO_intArray(max) Then
            max = i
            flag = True
        End If
    Next
    If flag = True Then
        G_text1.Text = G_text1.Text & validchars(max)
        FO_intArray(max) = 0
    End If
Next
```

## 3.7 Problem 1g:

For author attribution I used two methods, to see which one gives better results. I used both methods on the first- and second-order matrices. The two methods are Euclidean distance and the scalar (inner) product.

Euclidean distance: we take two frequency tables M and N and compute the distance as D = sqrt(Σᵢⱼ (M(i,j) − N(i,j))²) for the second order, or D = sqrt(Σᵢ (M(i) − N(i))²) for the first order.

Before computing the distance I normalize the matrices so that the sum of all elements is one, to give all matrices the same weight. The functions used to calculate this are RadED_Click() for first order and RadED2_Click() for second order.
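Both steps, the normalization and the first-order distance, can be sketched in Python (illustrative only; the report's versions are the VB.NET RadED_Click() and RadED2_Click()):

```python
import math

def normalize(freqs):
    """Scale a frequency table so its entries sum to 1, giving every
    book the same weight regardless of its length."""
    total = sum(freqs)
    return [f / total for f in freqs]

def euclidean_distance(m, n):
    """First-order distance: square root of the summed squared
    differences between the two normalized tables."""
    return math.sqrt(sum((a - b) ** 2
                         for a, b in zip(normalize(m), normalize(n))))
```

Identical tables give a distance of exactly zero, which is the property the conclusion relies on when preferring this metric over the inner product.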

## 3.8 Problem 1h:

The same technique used for author attribution was used in this problem: Euclidean distance on the first-order matrix. To see whether the metric can classify genre, every book was compared with books from different genres and with books from the same genre; then we compare the results to see whether the distance between books of the same genre is smaller than between books of different genres.

Books written by the same author were not compared, so author-based correlation will not affect our genre-based correlation.

To do so, I used a function Compare_Click() that adds up the distances between the main book (NewArray1) that I want to compare and the other books I select (ReturnArray). The count used in the loop is the number of books selected for comparison; so if I select 2 books to compare with my main book, there will be 2 loop iterations, and each time the matrix of the selected book is fetched from the ReadFile() function.

```vb
For x = 0 To count - 1
    Dim ReturnArray() As Double
    ReturnArray = ReadFile(BooksName(x))
    sum = 0
    For f = 0 To 39
        sum = sum + Math.Pow(NewArray1(f) - ReturnArray(f), 2)
    Next
    distance = distance + Math.Sqrt(sum)
Next
```

## 3.9 Problem 1i:

To make an author profile, I combine all the books of one author into one document, then generate the first-order matrix for that author. I compare different authors using two methods on the first-order matrix: Euclidean distance and the inner product.
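A Python sketch of the profile construction and the inner-product comparison (illustrative names, not the program's actual code):

```python
def author_profile(texts, alphabet):
    """Build a first-order profile for one author by pooling all of
    the author's books into one character-count vector."""
    index = {c: i for i, c in enumerate(alphabet)}
    counts = [0] * len(alphabet)
    for text in texts:
        for c in text:
            if c in index:
                counts[index[c]] += 1
    return counts

def inner_product(m, n):
    """Scalar product of two profiles; a larger value means the two
    character distributions point in more similar directions."""
    return sum(a * b for a, b in zip(m, n))
```

Note that, unlike the Euclidean distance, the inner product of a profile with itself is not zero, which is the drawback discussed in the conclusion.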

# 4 Conclusion:

In this document I discussed the different algorithms implemented in my program. The correlation matrices implemented for the monkey problem were used in all of the later problems.

For the monkey problem we saw that the word count increases with the order of the frequency table. Also, when we change the resolution of the table, the word count increases.

Most probable paths were generated for first and second order; similarity between the paths of different authors was observed, which makes it difficult to use the most probable paths for author attribution.

Different methods were implemented for author attribution and author profiling: Euclidean distance and the inner product. In my opinion Euclidean distance gives more accurate results because it does not depend on the training set the way the inner product does. Also, if we compute the distance between the same book or the same author we get a value of zero, which we do not get with the inner product.

Euclidean distance was also used to classify stories by genre: stories of the same genre gave smaller distances than stories of different genres, which implies that Euclidean distance can classify stories.