From msuinfo!agate!howland.reston.ans.net!europa.eng.gtefsd.com!usenet Fri Sep 17 20:13:20 1993
Path: msuinfo!agate!howland.reston.ans.net!europa.eng.gtefsd.com!usenet
From: Sig@Seuss.Vantage.GTE.COM
Newsgroups: comp.ai,comp.ai.nat-lang,comp.compression,comp.compression.research,sci.crypt
Subject: American language standardized dictionary for text compression
Date: 17 Sep 1993 13:58:26 GMT
Organization: GTE
Lines: 62
Message-ID: <27cfq2$nhj@europa.eng.gtefsd.com>
NNTP-Posting-Host: seuss.vantage.gte.com
Xref: msuinfo comp.ai:18911 comp.ai.nat-lang:706 comp.compression:8845 comp.compression.research:1087 sci.crypt:19365

As an aid to those involved in natural language parsing, dictionary
compression, or textual encryption, I have been collecting and compiling
a lengthy list of words. It is expected that a comprehensive standardized
dictionary will eventually result. This dictionary should contain most
common American words, abbreviations, hyphenations, and even incorrect
spellings.

An anonymous ftp server has been set up on wocket.vantage.gte.com. It
contains the following files in the pub/standard_dictionary directory:

-r--r--r--  1 ftp  ftp  1269760 Aug 16 08:36 dic-0893.tar
-r--r--r--  1 ftp  ftp   523393 Aug 16 08:43 dic-0893.tar.Z
-r--r--r--  1 ftp  ftp   421239 Aug 16 08:39 dic-0893.zip
-r--r--r--  1 ftp  ftp  3186688 Sep 17 08:26 dic-0993.tar
-r--r--r--  1 ftp  ftp  1503561 Sep 17 09:27 dic-0993.tar.Z
-r--r--r--  1 ftp  ftp     3052 Sep 17 08:26 length02.txt
-r--r--r--  1 ftp  ftp    37805 Sep 17 08:26 length03.txt
-r--r--r--  1 ftp  ftp    99996 Sep 17 08:26 length04.txt
-r--r--r--  1 ftp  ftp   212723 Sep 17 08:26 length05.txt
-r--r--r--  1 ftp  ftp   361496 Sep 17 08:26 length06.txt
-r--r--r--  1 ftp  ftp   456741 Sep 17 08:26 length07.txt
-r--r--r--  1 ftp  ftp   609880 Sep 17 08:26 length08.txt
-r--r--r--  1 ftp  ftp   388586 Sep 17 08:26 length09.txt
-r--r--r--  1 ftp  ftp   305936 Sep 17 08:26 length10.txt
-r--r--r--  1 ftp  ftp   228787 Sep 17 08:26 length11.txt
-r--r--r--  1 ftp  ftp   170744 Sep 17 08:26 length12.txt
-r--r--r--  1 ftp  ftp   108060 Sep 17 08:26 length13.txt
-r--r--r--  1 ftp  ftp    70864 Sep 17 08:26 length14.txt
-r--r--r--  1 ftp  ftp    43384 Sep 17 08:26 length15.txt
-r--r--r--  1 ftp  ftp    26478 Sep 17 08:26 length16.txt
-r--r--r--  1 ftp  ftp    14953 Sep 17 08:26 length17.txt
-r--r--r--  1 ftp  ftp     7980 Sep 17 08:26 length18.txt
-r--r--r--  1 ftp  ftp     5397 Sep 17 08:26 length19.txt
-r--r--r--  1 ftp  ftp     2948 Sep 17 08:26 length20.txt
-r--r--r--  1 ftp  ftp     1978 Sep 17 08:26 length21.txt
-r--r--r--  1 ftp  ftp     1440 Sep 17 08:26 length22.txt
-r--r--r--  1 ftp  ftp      825 Sep 17 08:26 length23.txt
-r--r--r--  1 ftp  ftp      650 Sep 17 08:26 length24.txt
-r--r--r--  1 ftp  ftp      297 Sep 17 08:26 length25.txt
-r--r--r--  1 ftp  ftp      140 Sep 17 08:26 length26.txt
-r--r--r--  1 ftp  ftp      116 Sep 17 08:26 length27.txt
-r--r--r--  1 ftp  ftp       30 Sep 17 08:26 length28.txt
-r--r--r--  1 ftp  ftp        0 Sep 17 08:26 length29.txt
-r--r--r--  1 ftp  ftp        0 Sep 17 08:26 length30.txt
-r--r--r--  1 ftp  ftp        0 Sep 17 08:26 length31.txt
-r--r--r--  1 ftp  ftp       34 Sep 17 08:26 length32.txt
-r--r--r--  1 ftp  ftp    11521 Aug 13 16:35 tarread.com

The most recent compilation, dic-0993.tar, is composed of the 31 text
files and may be restored on an MS-DOS computer using the tarread.com
utility program.

Any words for inclusion in future dictionaries should be submitted to my
E-Mail address directly or placed in the /pub/incoming directory. Please
compare your dictionaries with standard Unix 'words' and submit only the
differences.

Many thanks to those who have submitted the 200,000 words during the
last month.

Take care.

- Sig

Sigurd P. Crossland
Advanced Technology Lab          Telephone: (703) 818-8504
GTE                              Facsimile: (703) 802-3110
15000 Conference Center Drive    Internet:  sig@seuss.vantage.gte.com
Chantilly, VA 22021              Home:      (703) 818-8942

--------

check out: ftp.funet.fi:/pub/unix/security/dictionaries

They have dictionaries of many languages there, and if you have about
3 megs to spare you can even get Webster's!
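
On the request in the post to compare a word list against the standard Unix 'words' file and submit only the differences: the following is a minimal sketch of one way to do that, not the method used to build the dictionary. It assumes one word per line and exact, case-sensitive matching. The function names, the latin-1 encoding, and the default path /usr/share/dict/words are all assumptions (older systems often kept the file at /usr/dict/words).

```python
def load_words(lines):
    """Return the set of non-empty, whitespace-stripped words from an iterable of lines."""
    return {line.strip() for line in lines if line.strip()}


def new_words(candidate_lines, reference_lines):
    """Return, sorted, the candidate words that do not appear in the reference list."""
    return sorted(load_words(candidate_lines) - load_words(reference_lines))


def new_words_from_files(candidate_path, reference_path="/usr/share/dict/words"):
    """File-based wrapper; the default reference path is an assumption and varies by system."""
    # latin-1 is assumed so that any byte value decodes without error.
    with open(candidate_path, encoding="latin-1") as cand, \
         open(reference_path, encoding="latin-1") as ref:
        return new_words(cand, ref)


# Small in-memory demonstration with made-up data.
mine = ["apple\n", "wocket\n", "zebra\n", "\n", "wocket\n"]
reference = ["apple\n", "zebra\n"]
print(new_words(mine, reference))  # prints ['wocket']
```

Because matching is case-sensitive, "Apple" and "apple" count as different words. Whether a submission should fold case first is a judgment call the post does not address.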