Bingel [31]
2 years ago
8

When an author produces an index for his or her book, the first step is to decide which words should go into the index; the second is to produce a list of the pages where each word occurs. Instead of trying to choose words out of our heads, we decided to let the computer produce a list of all the unique words used in the manuscript and their frequency of occurrence. We could then go over the list and choose which words to put into the index.
The main object in this problem is a "word" with an associated frequency. The tentative definition of "word" here is a string of alphanumeric characters between markers, where the markers are white space and all punctuation marks; anything non-alphanumeric stops the reading. If we skip all disallowed characters before reading the string, we should have exactly what we want. Ignoring words of fewer than three letters removes from consideration words such as "a", "is", "to", "do", and "by" that do not belong in an index.

In this project, you are asked to write a program that reads any text file and then lists all the "words" in alphabetical order, together with the frequency with which each appears in the text. A "word" is as defined above and has at least three letters.

Computers and Technology
1 answer:
Igoryamba 2 years ago
7 0

Answer:

import string

dic = {}
book = open("book.txt", "r")

# Iterate over each line in the book
for line in book.readlines():
    tex = line
    tex = tex.lower()
    tex = tex.translate(str.maketrans('', '', string.punctuation))
    new = tex.split()
    for word in new:
        if len(word) > 2:
            if word not in dic.keys():
                dic[word] = 1
            else:
                dic[word] = dic[word] + 1

for word in sorted(dic):
    print(word, dic[word], '\n')

book.close()

Explanation:

The code above is written in Python 3.

import string

First, import the modules the program needs. The string module is imported for string.punctuation, the collection of punctuation characters that the program strips from each line.

dic = {}
book = open("book.txt", "r")

# Iterate over each line in the book
for line in book.readlines():
    tex = line
    tex = tex.lower()
    tex = tex.translate(str.maketrans('', '', string.punctuation))
    new = tex.split()

An empty dictionary is created to store each word together with its number of occurrences: the word is the key and the count is the value.

Next, the file to read is opened and the code iterates over it line by line. Each line is converted to lowercase, its punctuation characters are removed, and it is split on whitespace into a list of words that can be iterated over.
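To see what this cleaning step does to a single line, here is a small sketch; the sample sentence is made up for illustration. Note that deleting punctuation joins the pieces of a hyphenated or apostrophized word, whereas the question's definition treats any non-alphanumeric character as a separator. The regular-expression variant shown second is one possible way to follow that definition more closely, not part of the original answer.

```python
import re
import string

sample = "Well-known words, like don't, are tricky!"  # made-up sample line

# Approach used in the answer: lowercase, delete punctuation, split on whitespace
table = str.maketrans('', '', string.punctuation)
stripped = sample.lower().translate(table).split()

# Alternative (an assumption, not in the answer): every run of
# ASCII letters or digits is a word, anything else is a separator
tokens = re.findall(r"[a-z0-9]+", sample.lower())

print(stripped)
print(tokens)
```

With the first approach "well-known" becomes the single word "wellknown"; with the second it becomes "well" and "known".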

    for word in new:
        if len(word) > 2:
            if word not in dic.keys():
                dic[word] = 1
            else:
                dic[word] = dic[word] + 1

For every word in the list, the code first checks that the word is longer than two characters. If the word is not yet in the dictionary, it is added with a value of 1; if it is already there, its value is increased by 1.

for word in sorted(dic):
    print(word, dic[word], '\n')

book.close()

The keys (words) of the dictionary are sorted alphabetically, and each word is printed with its count. Finally, the file is closed.

Check the attachment to see the code in action.
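For comparison, the same task can be sketched more compactly with collections.Counter and a regular expression. This is a minimal sketch under stated assumptions, not the answer's method: the function names word_frequencies and print_index, the min_length parameter, and the UTF-8 encoding are all choices made here for illustration, and the pattern only recognizes ASCII letters and digits.

```python
import re
from collections import Counter


def word_frequencies(text, min_length=3):
    """Count words of at least min_length alphanumeric characters."""
    # Any non-alphanumeric character ends a word, as the question specifies
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if len(w) >= min_length)


def print_index(path):
    """Print each word in the file at path with its count, alphabetically."""
    # 'with' closes the file even if reading raises an error
    with open(path, "r", encoding="utf-8") as book:
        counts = word_frequencies(book.read())
    for word in sorted(counts):
        print(word, counts[word])


# Small in-memory demonstration with a made-up sentence
counts = word_frequencies("The cat and the hat. A cat is by the mat!")
for word in sorted(counts):
    print(word, counts[word])
```

Calling print_index("book.txt") would process the same file as the answer above. Reading the whole file at once keeps the sketch short; for very large files, updating the Counter line by line would use less memory.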

You might be interested in
Which command suppresses the visibility of a particular Row or column in a worksheet?
leonid [27]

Answer:

The Hide command suppresses the visibility of a row or column. To hide one, first select it: Ctrl+Space is the keyboard shortcut to select an entire column.

Explanation:

When you press the Shift+Space shortcut the first time it will select the entire row within the Table.  Press Shift+Space a second time and it will select the entire row in the worksheet.

The same works for columns.  Ctrl+Space will select the column of data in the Table.  Pressing the keyboard shortcut a second time will include the column header of the Table in the selection.  Pressing Ctrl+Space a third time will select the entire column in the worksheet.

You can select multiple rows or columns by holding Shift and pressing the Arrow Keys multiple times.

4 0
2 years ago
Widget Corp. wants to shift its list of inventory to a cloud so that its different branches can access it easily. The company ne
____ [38]

Answer:

The best cloud option for Widget Corp., given that it must be cost-effective and must not expose mission-critical applications and data to the outside world, is a hybrid cloud.

Explanation:

First, the cloud is not managed entirely by a third party; Widget Corp.'s own IT department manages part of it, which lets the company control its security. Second, the cloud can still draw on public or mainstream sources, resources, and platforms, which makes it user friendly and reduces the specialized knowledge needed to use it.

5 0
2 years ago
given:an int variable k,an int array currentMembers that has been declared and initialized,an int variable memberID that has bee
Oksi-84 [34.3K]

Answer:

// The code segment is written in the C++ programming language

// Assume memberID is absent until it is found
isAMember = false;

for (k = 0; k < nMembers; k++)
{
    // Check whether memberID can be found in currentMembers
    if (currentMembers[k] == memberID)
    {
        // If yes, assign true to isAMember and stop scanning
        isAMember = true;
        break;
    }
}

// End of segment

The following assumptions were made in the code segment above.

There exist:

1. An already declared and initialised int array currentMembers.

2. An already initialised int variable memberID.

3. A variable nMembers, used as the loop bound, and a variable isAMember, which receives the result.

isAMember is first set to false, so it has a defined value even if the array is empty. The for loop then scans through the array, and the if statement checks whether the current element equals memberID.

If it does, the code assigns true to isAMember and leaves the loop.

If no element matches, isAMember stays false.

7 0
2 years ago
A team of students is collaborating on a program to obtain local weather data from a website, and predict weather-related school
love history [14]

Answer:

1. Use meaningful names for all variables and functions.

2. Use shorter blocks of code wherever possible.

Explanation:

Their code should be easy to read. This can be achieved by avoiding complex syntax; in short, the team should adopt the two practices stated in the answer.
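As a rough illustration of both practices, here is a short Python sketch. Everything in it is hypothetical: the function names, the thresholds, and the rule itself are invented for this example and are not taken from the students' program.

```python
# Hypothetical thresholds, chosen only for illustration
FREEZING_POINT_F = 32
HEAVY_SNOW_INCHES = 6


def is_heavy_snow(snowfall_inches):
    """Return True when the snowfall reaches the heavy-snow threshold."""
    return snowfall_inches >= HEAVY_SNOW_INCHES


def is_freezing(temperature_f):
    """Return True when the temperature is at or below freezing."""
    return temperature_f <= FREEZING_POINT_F


def is_severe_weather(snowfall_inches, temperature_f):
    # Short function built from well-named helpers: it reads like the rule
    return is_heavy_snow(snowfall_inches) and is_freezing(temperature_f)


print(is_severe_weather(8, 20))
print(is_severe_weather(1, 40))
```

Each function does one small thing and its name says what that is, so a teammate can follow the logic without reading every line.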

7 0
2 years ago
When researching which keywords should be included in a résumé, what four sources are valuable resources?
zimovet [89]
I would suggest the following sources:

1) Job postings: these include the phrases that the employers themselves want to see.

2) Keyword lists: they are made for finding keywords.

3) The skills sections of professional sites. You could use LinkedIn or any other professional site.

4) Software that suggests keywords for you, which you can sometimes find.
4 0
2 years ago