Entropy of English text: Experiments with humans and a machine learning system based on rough sets

Moradi, Hamid; Grzymala-Busse, Jerzy W.; Roberts, James A.

Issue Date

1998-01

Author

Moradi, Hamid

Grzymala-Busse, Jerzy W.

Roberts, James A.

Publisher

ELSEVIER SCIENCE INC

Format

60449 bytes

Type

Article

Abstract

The goal of this paper is to show the dependency of the entropy of English text on the human subject taking part in the experiment, the type of English text, and the methodology used to estimate the entropy. Claude Shannon first described the technique for estimating the entropy of English text by a human subject guessing the next letter after viewing a string of characters taken from actual text. We show how this result is affected by using different humans in the experiment (Shannon used only his wife) and by using different types of text material (Shannon used only a single book). We also show how the results are affected when we replace the human subjects with a machine learning system based on rough sets. Automating the play of the guessing game with this system, called LERS, gives rise to a lossless data compression scheme. © Elsevier Science Inc. 1998.
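The guessing game described in the abstract can be illustrated with a small sketch. The code below is not LERS and is not the authors' method: it swaps in a simple adaptive frequency-count guesser (the hypothetical functions `play_game` and `replay_game`) purely to show the mechanics. Each character is replaced by the number of guesses a deterministic predictor needed to find it. Because the same predictor can be rerun on the guess counts, the text is recoverable exactly, which is why an automated guesser amounts to a lossless coding scheme. The function `shannon_bounds` applies the upper and lower entropy bounds from Shannon's 1951 analysis to the guess-count frequencies; the lower bound is derived for an ideal predictor, so for a crude guesser like this one it should be read as indicative only.

```python
import math
from collections import Counter


def _ordering(alphabet, ctx_counts, global_counts, context):
    """Rank candidate characters: most frequent after this context first,
    ties broken by overall frequency, then alphabetically."""
    local = ctx_counts.get(context, {})
    return sorted(
        alphabet,
        key=lambda c: (-local.get(c, 0), -global_counts.get(c, 0), c),
    )


def play_game(text, alphabet, order=2):
    """Return, for each character, how many guesses the predictor needed."""
    ctx_counts, global_counts, ranks = {}, Counter(), []
    for i, ch in enumerate(text):
        context = text[max(0, i - order):i]
        guesses = _ordering(alphabet, ctx_counts, global_counts, context)
        ranks.append(guesses.index(ch) + 1)
        # Learn from the revealed character (assumed adaptive scheme).
        ctx_counts.setdefault(context, Counter())[ch] += 1
        global_counts[ch] += 1
    return ranks


def replay_game(ranks, alphabet, order=2):
    """Rebuild the text from guess counts by rerunning the same predictor."""
    ctx_counts, global_counts, out = {}, Counter(), []
    for r in ranks:
        context = "".join(out[max(0, len(out) - order):])
        guesses = _ordering(alphabet, ctx_counts, global_counts, context)
        ch = guesses[r - 1]
        out.append(ch)
        ctx_counts.setdefault(context, Counter())[ch] += 1
        global_counts[ch] += 1
    return "".join(out)


def shannon_bounds(ranks, alphabet_size):
    """Shannon's bounds (bits per character) from guess-count frequencies.

    q[i-1] is the fraction of characters found on guess i.
    upper = -sum_i q_i log2 q_i
    lower = sum_i i * (q_i - q_{i+1}) * log2 i
    """
    n = len(ranks)
    freq = Counter(ranks)
    q = [freq.get(i, 0) / n for i in range(1, alphabet_size + 1)]
    upper = -sum(p * math.log2(p) for p in q if p > 0)
    q_ext = q + [0.0]  # q_{N+1} = 0
    lower = sum(
        i * (q_ext[i - 1] - q_ext[i]) * math.log2(i)
        for i in range(1, alphabet_size + 1)
    )
    return lower, upper


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog the end"
    alphabet = sorted(set(sample))
    ranks = play_game(sample, alphabet)
    assert replay_game(ranks, alphabet) == sample  # round trip is lossless
    low, high = shannon_bounds(ranks, len(alphabet))
    print(f"lower bound: {low:.3f} bits/char, upper bound: {high:.3f} bits/char")
```

On a sample this short the estimates say little about English; meaningful figures need long passages and a far better predictor, whether a human subject or a learned rule set.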

Citation

Moradi, H.; Grzymala-Busse, J. W.; Roberts, J. A. Entropy of English text: Experiments with humans and a machine learning system based on rough sets. Information Sciences. Jan 1998. 104:31-47.
