Problems of Assembling, Describing, and Computerizing Corpora. Research Techniques and Prospects. Papers in Southwest English, No. 1 [electronic resource] / W. N. Francis.

The paper investigates the problems of assembling, describing and computerizing corpora, defined as collections of "texts assumed to be representative of a given language, dialect or other subset of a language, to be used for linguistic analysis." Specific reference is made to the formati...

Bibliographic Details
Online Access: Full Text (via ERIC)
Main Author: Francis, W. N.
Corporate Author: Trinity University (San Antonio, Tex.)
Format: Electronic eBook
Language: English
Published: [S.l.] : Distributed by ERIC Clearinghouse, 1975.
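Subjects: Comparative Analysis; Computational Linguistics; Contrastive Linguistics; Data Collection; Descriptive Linguistics; Language Research; Linguistic Competence; Linguistic Performance; Research Tools; Semantics; Word Frequency; Word Lists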

MARC

LEADER 00000cam a22000002u 4500
001 b6533087
003 CoU
005 20080306155037.6
006 m d f
007 cr un
008 750101s1975 xx |||| ot ||| | eng d
035 |a (ERIC)ed111204 
040 |a ericd  |c ericd  |d MvI 
099 |f ERIC DOC #  |a ED111204 
100 1 |a Francis, W. N. 
245 1 0 |a Problems of Assembling, Describing, and Computerizing Corpora. Research Techniques and Prospects. Papers in Southwest English, No. 1  |h [electronic resource] /  |c W. N. Francis. 
260 |a [S.l.] :  |b Distributed by ERIC Clearinghouse,  |c 1975. 
300 |a 25 p. 
500 |a ERIC Document Number: ED111204. 
500 |a Availability: Trinity University, San Antonio, Texas 78222 ($2.00).  |5 ericd. 
520 |a The paper investigates the problems of assembling, describing and computerizing corpora, defined as collections of "texts assumed to be representative of a given language, dialect or other subset of a language, to be used for linguistic analysis." Specific reference is made to the formation of the Brown Standard Corpus. The formation of a corpus is justified in terms of saving effort and of providing a compilation of data that will serve as a research tool in comparative studies. Important questions in the process concern the body of language from which the sample will be drawn, the size of the sample, and its structure. These, in turn, depend on the purpose for which the corpus is assembled: graphic analysis, for example, will require a different corpus than phonological or grammatical analysis, the latter presenting the most problems. Practical constraints on the size of the corpus, including time, energy, and money, are mentioned. The organization of the corpus is discussed, with emphasis on such factors as the size of the base units, the mode of selection and collection, the assembly of the corpus, and computerization. The question of how much additional explanatory material should accompany the corpus is raised, with particular reference to lexical and semantic analyses. (CLK) 
650 1 7 |a Comparative Analysis.  |2 ericd. 
650 1 7 |a Computational Linguistics.  |2 ericd. 
650 0 7 |a Contrastive Linguistics.  |2 ericd. 
650 1 7 |a Data Collection.  |2 ericd. 
650 0 7 |a Descriptive Linguistics.  |2 ericd. 
650 0 7 |a Language Research.  |2 ericd. 
650 0 7 |a Linguistic Competence.  |2 ericd. 
650 0 7 |a Linguistic Performance.  |2 ericd. 
650 1 7 |a Research Tools.  |2 ericd. 
650 0 7 |a Semantics.  |2 ericd. 
650 1 7 |a Word Frequency.  |2 ericd. 
650 0 7 |a Word Lists.  |2 ericd. 
710 2 |a Trinity University (San Antonio, Tex.) 
856 4 0 |u http://files.eric.ed.gov/fulltext/ED111204.pdf  |z Full Text (via ERIC) 
907 |a .b65330870  |b 07-06-22  |c 10-19-10 
998 |a web  |b 10-19-12  |c f  |d m   |e -  |f eng  |g xx   |h 0  |i 1 
956 |a ERIC 
999 f f |i 59a687f1-162c-5fcc-965f-fdf7e3291f57  |s 12e938a5-543e-51ca-8360-bd948dfccca4 
952 f f |p Can circulate  |a University of Colorado Boulder  |b Online  |c Online  |d Online  |e ED111204  |h Other scheme  |i web  |n 1