BNC2 POS-tagging Manual

GUIDELINES TO WORDCLASS TAGGING

[ Related documents: Introduction to the Manual | Automatic tagging of the BNC | Error rates | Acknowledgments]


CONTENTS

  1. Preliminaries: The Tagset, Ambiguity Tags and Appearance of tags and citations in the guide

  2. INTRODUCTION TO WORD CLASSES

  3. DISAMBIGUATION GUIDE, by TAG PAIR

  4. DISAMBIGUATION GUIDE, by WORD

  5. Features of spoken corpus tagging


      Section 1. Preliminaries

      THE BNC BASIC TAGSET

      (also known as the "C5" tagset; followed by Ambiguity tag list)

      Tag  Description

      AJ0  Adjective (general or positive) (e.g. good, old, beautiful)
      AJC  Comparative adjective (e.g. better, older)
      AJS  Superlative adjective (e.g. best, oldest)
      AT0  Article (e.g. the, a, an, no)
      AV0  General adverb: an adverb not subclassified as AVP or AVQ (see below) (e.g. often, well, longer (adv.), furthest)
      AVP  Adverb particle (e.g. up, off, out)
      AVQ  Wh-adverb (e.g. when, where, how, why, wherever)
      CJC  Coordinating conjunction (e.g. and, or, but)
      CJS  Subordinating conjunction (e.g. although, when)
      CJT  The subordinating conjunction that
      CRD  Cardinal number (e.g. one, 3, fifty-five, 3609)
      DPS  Possessive determiner-pronoun (e.g. your, their, his)
      DT0  General determiner-pronoun: i.e. a determiner-pronoun which is not a DTQ or an AT0.
      DTQ  Wh-determiner-pronoun (e.g. which, what, whose, whichever)
      EX0  Existential there, i.e. there occurring in the there is ... or there are ... construction
      ITJ  Interjection or other isolate (e.g. oh, yes, mhm, wow)
      NN0  Common noun, neutral for number (e.g. aircraft, data, committee)
      NN1  Singular common noun (e.g. pencil, goose, time, revelation)
      NN2  Plural common noun (e.g. pencils, geese, times, revelations)
      NP0  Proper noun (e.g. London, Michael, Mars, IBM)
      ORD  Ordinal numeral (e.g. first, sixth, 77th, last)
      PNI  Indefinite pronoun (e.g. none, everything, one [as pronoun], nobody)
      PNP  Personal pronoun (e.g. I, you, them, ours)
      PNQ  Wh-pronoun (e.g. who, whoever, whom)
      PNX  Reflexive pronoun (e.g. myself, yourself, itself, ourselves)
      POS  The possessive or genitive marker 's or '
      PRF  The preposition of
      PRP  Preposition (except for of) (e.g. about, at, in, on, on behalf of, with)
      PUL  Punctuation: left bracket - i.e. ( or [
      PUN  Punctuation: general separating mark - i.e. . , ! , : ; - or ?
      PUQ  Punctuation: quotation mark - i.e. ' or "
      PUR  Punctuation: right bracket - i.e. ) or ]
      TO0  Infinitive marker to
      UNC  Unclassified items which are not appropriately considered as items of the English lexicon.
      VBB  The present tense forms of the verb BE, except for is, 's: i.e. am, are, 'm, 're and be [subjunctive or imperative]
      VBD  The past tense forms of the verb BE: was and were
      VBG  The -ing form of the verb BE: being
      VBI  The infinitive form of the verb BE: be
      VBN  The past participle form of the verb BE: been
      VBZ  The -s form of the verb BE: is, 's
      VDB  The finite base form of the verb DO: do
      VDD  The past tense form of the verb DO: did
      VDG  The -ing form of the verb DO: doing
      VDI  The infinitive form of the verb DO: do
      VDN  The past participle form of the verb DO: done
      VDZ  The -s form of the verb DO: does, 's
      VHB  The finite base form of the verb HAVE: have, 've
      VHD  The past tense form of the verb HAVE: had, 'd
      VHG  The -ing form of the verb HAVE: having
      VHI  The infinitive form of the verb HAVE: have
      VHN  The past participle form of the verb HAVE: had
      VHZ  The -s form of the verb HAVE: has, 's
      VM0  Modal auxiliary verb (e.g. will, would, can, could, 'll, 'd)
      VVB  The finite base form of lexical verbs (e.g. forget, send, live, return) [Including the imperative and present subjunctive]
      VVD  The past tense form of lexical verbs (e.g. forgot, sent, lived, returned)
      VVG  The -ing form of lexical verbs (e.g. forgetting, sending, living, returning)
      VVI  The infinitive form of lexical verbs (e.g. forget, send, live, return)
      VVN  The past participle form of lexical verbs (e.g. forgotten, sent, lived, returned)
      VVZ  The -s form of lexical verbs (e.g. forgets, sends, lives, returns)
      XX0  The negative particle not or n't
      ZZ0  Alphabetical symbols (e.g. A, a, B, b, c, d)

      Total number of wordclass tags in the BNC basic tagset = 57, plus 4 punctuation tags

      THE AMBIGUITY TAG LIST

      In addition, there are 30 "Ambiguity Tags". These are applied wherever the probabilities assigned by the CLAWS automatic tagger to its first and second choice tags were considered too low for reliable disambiguation. So, for example, the ambiguity tag AJ0-AV0 indicates that the choice between adjective (AJ0) and adverb (AV0) is left open, although the tagger has a preference for an adjective reading. The mirror tag, AV0-AJ0, again shows adjective-adverb ambiguity, but this time the more likely reading is the adverb.

      Ambiguity tag   Ambiguous between   More probable tag

      AJ0-AV0         AJ0 or AV0          AJ0
      AJ0-NN1         AJ0 or NN1          AJ0
      AJ0-VVD         AJ0 or VVD          AJ0
      AJ0-VVG         AJ0 or VVG          AJ0
      AJ0-VVN         AJ0 or VVN          AJ0
      AV0-AJ0         AV0 or AJ0          AV0
      AVP-PRP         AVP or PRP          AVP
      AVQ-CJS         AVQ or CJS          AVQ
      CJS-AVQ         CJS or AVQ          CJS
      CJS-PRP         CJS or PRP          CJS
      CJT-DT0         CJT or DT0          CJT
      CRD-PNI         CRD or PNI          CRD
      DT0-CJT         DT0 or CJT          DT0
      NN1-AJ0         NN1 or AJ0          NN1
      NN1-NP0         NN1 or NP0          NN1
      NN1-VVB         NN1 or VVB          NN1
      NN1-VVG         NN1 or VVG          NN1
      NN2-VVZ         NN2 or VVZ          NN2
      NP0-NN1         NP0 or NN1          NP0
      PNI-CRD         PNI or CRD          PNI
      PRP-AVP         PRP or AVP          PRP
      PRP-CJS         PRP or CJS          PRP
      VVB-NN1         VVB or NN1          VVB
      VVD-AJ0         VVD or AJ0          VVD
      VVD-VVN         VVD or VVN          VVD
      VVG-AJ0         VVG or AJ0          VVG
      VVG-NN1         VVG or NN1          VVG
      VVN-AJ0         VVN or AJ0          VVN
      VVN-VVD         VVN or VVD          VVN
      VVZ-NN2         VVZ or NN2          VVZ

      Total number of wordclass tags including punctuation and ambiguity tags = 91.
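The naming convention for ambiguity tags can be handled mechanically. The following is a minimal Python sketch, not part of the BNC tools; the function names `split_ambiguity_tag` and `mirror` are hypothetical. It assumes only what is stated above: the tag before the hyphen is the more probable reading, and the tag after it is the less probable one.

```python
from typing import Optional, Tuple


def split_ambiguity_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Return (more_probable, less_probable).

    For a basic tag such as NN1 the second element is None.
    """
    first, sep, second = tag.partition("-")
    return (first, second) if sep else (first, None)


def mirror(tag: str) -> str:
    """Swap the two readings of an ambiguity tag, e.g. AJ0-AV0 -> AV0-AJ0."""
    first, second = split_ambiguity_tag(tag)
    return tag if second is None else f"{second}-{first}"


preferred, alternative = split_ambiguity_tag("AJ0-AV0")
print(preferred, alternative)
print(mirror("AJ0-AV0"))
```

A search for every adjective-adverb ambiguity, regardless of which reading the tagger preferred, would need to look for both a tag and its mirror.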

      Contents | Tagset | Nouns | Verbs | Adjectives | Adverbs | Articles/Determiners/Pronouns | Prepositions | Conjunctions | Numerals | Miscellaneous | Disambiguate tags | Disambiguate words


      Appearance of wordclass tags and citations

      To illustrate BNC wordclass tagging, we will show text examples in a format similar to the SGML contained in the corpus. The underlying grammatical tags and other structural markup (for example, paragraph and pause markers) are generally invisible when using concordancing software such as SARA, BNCWeb, and WordSmith.

      Each orthographic word in the corpus generally has its own wordclass tag, which appears in the form

      <w TAG>word

      followed by a single space. There are important exceptions in the case of contracted forms and multiword sequences. Spacing may also vary around punctuation tags (eg <c PUN>. ). The following excerpt from a file in the spoken part of the corpus illustrates the general format:

      <s n=1000> <w PNP>I <w VVB>mean <w DTQ>what <w PRP>about <w AV0>apparently <w PNP>we <w VVB>eat <w DT0>more <w NN1>chocolate <w CJS>than <w DT0>any <w AJ0>other <w NN1>country<c PUN>.

      For examples in this guide, we will retain just the POS-tag of the word (or words) in question. Under subordinating conjunctions, for instance, the citation above has been reduced to:

      ...apparently we eat more chocolate <w CJS>than any other country
      [G3U.1000]
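The `<w TAG>word` format described above can be read with a short regular expression. This is an illustrative Python sketch rather than an official BNC reader; the names `Token`, `TOKEN_RE` and `parse_tokens` are hypothetical. It assumes that every tag is three characters long (or two such tags joined by a hyphen for ambiguity tags) and that the text of a token runs up to the next `<`.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str  # "w" for a word, "c" for a punctuation mark
    tag: str   # basic tag, or an ambiguity tag such as "AJ0-AV0"
    text: str  # text after the tag, including any trailing space


# Matches <w TAG>text or <c TAG>text; other elements such as <s n=1000> are skipped.
TOKEN_RE = re.compile(r"<([wc]) ([A-Z0-9]{3}(?:-[A-Z0-9]{3})?)>([^<]*)")


def parse_tokens(line: str) -> List[Token]:
    return [Token(*m.groups()) for m in TOKEN_RE.finditer(line)]


excerpt = (
    "<s n=1000> <w PNP>I <w VVB>mean <w DTQ>what <w PRP>about "
    "<w AV0>apparently <w PNP>we <w VVB>eat <w DT0>more <w NN1>chocolate "
    "<w CJS>than <w DT0>any <w AJ0>other <w NN1>country<c PUN>."
)
tokens = parse_tokens(excerpt)
for tok in tokens:
    print(tok.kind, tok.tag, repr(tok.text))
```

Keeping the trailing space in `text` is deliberate: the presence or absence of that space is what distinguishes separate words from the parts of a contracted form.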

      Citations: Examples taken from the BNC have the filename and line number appended in square brackets; [G3U.1000] means line 1000 of file G3U. In the Disambiguation Guide (Sections 3 and 4), we also cite cases where the POS-tagging in the corpus does not match the tag given in the citation, because the corpus tag is either an error or an ambiguity tag. This is to give an idea of the contexts in which the resolution of ambiguities has been less reliable. We list the tag found in the corpus next to the file reference, marked with an asterisk. For example, under well in Section 4 we give the ideal tag as VVB, but the actual corpus tag is AV0:

      Tears <w VVB>well up in my eyes.
      [BN3.5 *AV0]

      Note also that we occasionally use invented examples, rather than corpus citations, especially where a contrast between categories is being made.
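The citation references described above have a regular shape, so they can be unpacked automatically. The sketch below is a hypothetical Python helper (`parse_citation` is not part of any BNC software). It assumes that file names are three alphanumeric characters, as in the examples in this guide, and it does not handle line ranges.

```python
import re
from typing import NamedTuple, Optional


class Citation(NamedTuple):
    file_id: str               # e.g. "G3U"
    line: int                  # line number within the file
    corpus_tag: Optional[str]  # tag actually found in the corpus, if marked with *


# [FILE.LINE] optionally followed by " *TAG" before the closing bracket.
CITATION_RE = re.compile(r"\[([A-Z0-9]{3})\.(\d+)(?:\s+\*([A-Z0-9-]+))?\s*\]")


def parse_citation(ref: str) -> Citation:
    m = CITATION_RE.fullmatch(ref.strip())
    if m is None:
        raise ValueError(f"not a recognised citation: {ref!r}")
    file_id, line, corpus_tag = m.groups()
    return Citation(file_id, int(line), corpus_tag)


print(parse_citation("[G3U.1000]"))
print(parse_citation("[BN3.5 *AV0]"))
```

A non-empty `corpus_tag` signals that the tag shown in the example is the ideal tag, not the one stored in the corpus.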

      Appearance and tagging of contracted forms

      Contracted forms -- including enclitics, eg he's, she'll, negatives eg don't and can't, and 'fused words', eg wanna and gimme -- are broken down by the tagger into their component parts, with each part being assigned its own tag. No spaces are introduced in POS-tagged contracted words:

      could've = <w VM0>could<w VHI>'ve
      doesn't = <w VDZ>does<w XX0>n't
      dunno = <w VDB>du<w XX0>n<w VVI>no
      wanna = <w VVB>wan<w TO0>na --or-- <w VVB>wan<w AT0>na
      gimme = <w VVB>gim<w PNP>me

      This procedure sometimes results in strange-looking word divisions, particularly with the fused words. However, they do provide a ready means of comparison with the full forms, such as <w VVB>want <w TO0>to and <w VVB>give <w PNP>me.
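Because no space is introduced between the parts of a contracted form, the original orthographic words can be recovered by joining adjacent parts until a space or a punctuation element is reached. The following Python sketch illustrates the idea; `orthographic_words` is a hypothetical helper, and the regular expression assumes the `<w TAG>` and `<c TAG>` element shapes shown in this guide.

```python
import re
from typing import List, Tuple

TOKEN_RE = re.compile(r"<([wc]) ([A-Z0-9]{3}(?:-[A-Z0-9]{3})?)>([^<]*)")


def orthographic_words(tagged: str) -> List[Tuple[str, List[str]]]:
    """Group tagged parts into orthographic words with their list of tags."""
    words: List[Tuple[str, List[str]]] = []
    text, tags = "", []
    for kind, tag, part in TOKEN_RE.findall(tagged):
        if kind == "w":
            text += part.rstrip()
            tags.append(tag)
        # A punctuation element or trailing whitespace ends the current word.
        if tags and (kind == "c" or part != part.rstrip()):
            words.append((text, tags))
            text, tags = "", []
    if tags:
        words.append((text, tags))
    return words


print(orthographic_words("<w VM0>could<w VHI>'ve"))
print(orthographic_words("<w PNP>I <w VDB>du<w XX0>n<w VVI>no"))
```

Punctuation marks themselves are dropped from the result in this sketch; a fuller reader would probably keep them as separate items.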

      [ View list of contracted forms and associated tags ]

      Note that in the case of ain't it has been tricky to resolve the tag of the first part ( ai ) satisfactorily. Therefore in all contexts we have tagged this as an unclassified word, followed by the negative particle. Eg

      <w UNC>Ai<w XX0>n't got yours yet
      [KCT.1282]


      Appearance and tagging of multiwords

      The term `multiwords' denotes multiple-word combinations which function as one wordclass - for example, a complex preposition, an adverbial, or a foreign expression naturalised into English as a compound noun. To clarify which words form part of the multiword sequence, we highlight them in the guide in bold typeface, although in the BNC they will appear in normal typeface.

      <w AV0>of course (adverb)
      <w PRP>according to (preposition)
      <w NN1>persona non grata ('naturalised' compound noun)
      <w CJS>except that (conjunction)

      Because they function as one unit, only one tag is assigned to the multiword (at the beginning), and not to the individual component parts (contrast contracted forms, above). The spaces between the parts of the multiword are retained (again, compare contracted forms).
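Since a multiword carries one tag and keeps its internal spaces, it can be spotted as a tagged unit whose text contains a space. The Python sketch below shows this under stated assumptions: `find_multiwords` is a hypothetical helper, and the fully tagged sample line is invented for illustration (the guide's own citation shows only the tag for in between).

```python
import re
from typing import List, Tuple

WORD_RE = re.compile(r"<w ([A-Z0-9]{3}(?:-[A-Z0-9]{3})?)>([^<]*)")


def find_multiwords(tagged: str) -> List[Tuple[str, str]]:
    """Return (tag, text) for each tagged unit that spans more than one word."""
    result = []
    for tag, text in WORD_RE.findall(tagged):
        unit = text.strip()
        if " " in unit:  # internal space: one tag covers several words
            result.append((tag, unit))
    return result


# Invented full tagging of the example "The truth lies somewhere in between".
sample = "<w AT0>The <w NN1>truth <w VVZ>lies <w AV0>somewhere <w AV0>in between"
print(find_multiwords(sample))
```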

      [ View list of multiword forms and their associated tags ]

      Note that some multiwords can represent different categories according to context, e.g. in between in:

      The stage <w PRP>in between the original negative and the dupe is called an interpositive
      [FB8.295]

      The truth lies somewhere <w AV0>in between
      [ABK.2834]

      Moreover, sometimes it is more appropriate to tag a word combination as consisting of ordinary words than as a multiword sequence, as in the case of but for below:

      <w CJC>But <w PRP>for years now darkness has been growing
      [F99.2027]

      cf. which they would not have done <w PRP>but for the presence of the police.
      [H81.766]


      Words joined by the slash character

      Words which are joined together by a slash ( / ) but no whitespace, such as and/or, are not split up in tagged versions of the text.

      Examples

      A title <w CJC>and/or an author's name
      [H0S.358]

      You should be a graduate in <w AJ0>Electrical/Electronic Engineering, Physics , Mathematics , Computing or a related discipline .
      [CJU.1049]

      A time-space matrix for each <w UNC>rural/social/age group.
      [FR2.346]


      SECTION 2. INTRODUCTION TO WORD CLASSES


      NOUNS

      Basic tags:
      NN1, NN2, NN0 = common nouns
      NP0 = proper nouns

      Ambiguity tags:
      NN1-NP0, NN1-VVB, NN1-VVG, NN2-VVZ
      NP0-NN1, VVB-NN1, VVG-NN1, VVZ-NN2

      Common nouns

      Singular common nouns are tagged NN1, while plurals take NN2:

      A <w NN1>child.
      Several <w NN2>children
      An <w NN1>air of <w NN1>distinction
      Fifteen <w NN2>miles away

      Nouns such as fish, which is morphologically invariant for number, and government, which can take either a singular or a plural verb, are so-called 'neutral for number' nouns; their tag ends in zero: NN0.

      Now the <w NN0>government is considering new warnings on steroids ...
      [K24.3057]

      ... the <w NN0>Government are putting people's lives in jeopardy.
      [A7W.518]

      I caught a <w NN0>fish.
      [KBW.316]

      I had caught four <w NN0>fish with hardly any effort
      [B0P.1387]

      We make no special distinction between common nouns that can be mass (or 'non-count') nouns (eg water, cheese), and other common nouns. All are tagged NN1 when singular and NN2 when plural:

      <w NN1>Cheese is a protein of high biological value.
      [ABB.1950]

      three <w NN2>cheeses.
      [CH6.7834]

      A <w NN1>car glistens in the <w NN1>distance.
      [HH0.1035]

      Three <w NN2>cars, two <w NN2>lorries and a <w NN1>motorbike!
      [CHR.290]

      Abbreviations

      In general we try to tag abbreviations for common nouns (and other word classes) as if they were written as full forms. Abbreviations for measurement nouns are generally tagged NN0 as they are invariant for number.

      Crewe are top of <w NN1>div 3 by 8 points
      [J1C.961]
      (where div = division)

      1 <w NN0>km
      400 <w NN0>km
      (km = 'kilometre' or 'kilometres')

      1 <w NN0>oz.
      6 <w NN0>oz
      (oz = 'ounce' or 'ounces')

      Numeral nouns

      Nouns such as hundred, hundreds, dozens, gross, are all tagged as numbers, CRD, rather than nouns.

       

      Proper nouns

      The tag NP0 ideally should denote any kind of proper noun, but in practice the open-endedness of naming expressions makes it difficult to capture all possible types consistently. We have confined its coverage mainly to personal and geographical names, and even within these, somewhat arbitrary borderlines have had to be drawn. Users of version 1 of the corpus should be aware of a few small but important changes in BNC2.

      (a) Personal names

      <w NP0>Sally
      <w NP0>Joe <w NP0>Bloggs
      <w NP0>Madame <w NP0>Pompadour
      <w NP0>Leonardo <w NP0>da <w NP0>Vinci

      (b) Geographical names

      <w NP0>London
      <w NP0>Lake <w NP0>Tanganyika
      <w NP0>New <w NP0>York

      (c) Also: days of the week; months of the year

      <w NP0>April
      <w NP0>Sunday

      Notes

      1. The distinction between singular and plural proper nouns is not indicated in the tagset, plural proper nouns being a comparative rarity.

        <w NP0>John <w NP0>Smith.
        All of the <w NP0>Smiths.

      2. Multiwords. As the examples in (a) and (b) above show, proper nouns are not processed as multiwords (even though there may be good linguistic reasons for doing so). Each word in such a sequence gets its own tag.

      3. Initials in names

        A person's initials preceding a surname are tagged NP0, just as the surname itself. The choice whether to use a space and/or full-stop between initials (eg J.F. or J. F. or J F or JF) is determined in the original source text; the tagged version follows the same format.

        John F. Kennedy = <w NP0>John <w NP0>F. <w NP0>Kennedy

        J. F. Kennedy = <w NP0>J. <w NP0>F. <w NP0>Kennedy

        J.F. Kennedy = <w NP0>J.F. <w NP0>Kennedy

        IMPORTANT NOTE: In the spoken part of the BNC, however, the components of names -- and, in fact, most words -- that are spelt aloud as individual letters, such as I B M, and J R in J R Hartley, are not tagged NP0 but ZZ0 (letter of the alphabet). See below

      4. Nouns of style

        Preceding a proper noun, or sequence of proper nouns, style (or title) nouns with an initial capital are tagged NP0.


        <w NP0>Pastor <w NP0>Tokes
        <w NP0>Chairman <w NP0>Mao
        <w NP0>Sub-Lieutenant <w NP0>R <w NP0>C <w NP0>V <w NP0>Wynn
        <w NP0>Sister <w NP0>Wendy

        Contrast: You remember your <w NN1>sister <w NP0>Wendy... [HGJ.800]
        where Wendy is in apposition to a common noun sister, in lowercase letters.

      5. Geographical names

        For names of towns, streets, countries and states, seas, oceans, lakes, rivers, mountains and other geographical placenames, the general rule is to tag as NP0. (If the word the precedes, it is tagged AT0, as normally.)

        <w NP0>East <w NP0>Timor
        <w NP0>South <w NP0>Carolina
        <w NP0>Baker <w NP0>Street
        <w NP0>West <w NP0>Harbour <w NP0>Lane
        <w AT0>the <w NP0>United <w NP0>States
        <w AT0>the <w NP0>United <w NP0>Kingdom
        <w AT0>the <w NP0>Baltic
        <w AT0>the <w NP0>Indian <w NP0>Ocean
        <w NP0>Mount <w NP0>St <w NP0>Helens
        <w AT0>the <w NP0>Alps

        Ordinary (non-NP0) tags are applied to more verbose (especially political) descriptions of placenames, or those that are not typically marked on maps. (As above, the preceding word the is optional.)

        <w AJ0>Latin <w NP0>America
        <w AJ0>Western <w NP0>Europe
        <w AT0>the <w AJ0>Western <w NN1>Region
        <w AT0>the <w AJ0>Soviet <w NN1>Union
        <w AT0>the <w NN0>People<w POS>'s <w NN1>Republic <w PRF>of <w NP0>China
        <w AT0>the <w AJ0>Dominican <w NN1>Republic
        <w AT0>the <w NN1>Sultanate <w PRF>of <w NP0>Oman

        The examples show a little arbitrariness in application, for example with United States counting, and Soviet Union not counting as proper nouns. (Also: <w AT0>the <w AJ0>ex-Soviet <w NN1>Union [KJS.28])

        NB. Multiple-word names containing a compass point, ie. those beginning North, South, East, West, North East, South-west etc. nearly always become NP0, whereas those with Northern, Southern, Eastern, Western follow the non-NP0 pattern. Rare exceptions are:

        <w NP0>Northern <w NP0>Ireland
        <w NP0>Western <w NP0>Samoa
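The compass-point rule in the NB above can be expressed as a small heuristic. This Python sketch is illustrative only and is not how the tagger decides; `first_word_tag` and the three constant sets are hypothetical names. It encodes just what the note states: names beginning with a compass point take NP0, names beginning with Northern, Southern, Eastern or Western take an ordinary adjective tag, and the two listed exceptions take NP0. Since the note says "nearly always", real corpus data may contain further exceptions.

```python
from typing import Optional

COMPASS_POINTS = {"north", "south", "east", "west"}
COMPASS_ADJECTIVES = {"northern", "southern", "eastern", "western"}
# Exceptions listed in the note above.
EXCEPTIONS = {"northern ireland", "western samoa"}


def first_word_tag(name: str) -> Optional[str]:
    """Suggest a tag for the first word of a multiple-word placename.

    Returns None when the rule does not apply.
    """
    words = name.lower().split()
    if len(words) < 2:
        return None
    head = words[0].split("-")[0]  # "south-west" counts as "south"
    if head in COMPASS_POINTS:
        return "NP0"
    if head in COMPASS_ADJECTIVES:
        return "NP0" if " ".join(words) in EXCEPTIONS else "AJ0"
    return None


for name in ["East Timor", "Western Europe", "Northern Ireland"]:
    print(name, first_word_tag(name))
```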

      6. Non-personal and non-geographical names

        -- including eg names of organisations, sports teams, commercial products (incl newspapers), shops, restaurants, horses, ships etc.

        When such names consist of ordinary words (common nouns, adjectives etc.), they receive ordinary tags (NN1, AJ0 etc.)

        Where a word as part of a name is an existing NP0 (typically a personal or geographical name), or a specially-coined name, it is tagged NP0. Examples:

        1. Organisations, sports teams etc.
          Ordinary tags | Tagged NP0
          <w NN1>Cable <w CJC>and <w NN1>Wireless | <w NP0>Procter <w CJC>and <w NP0>Gamble
          <w NN1>Acorn <w NN1>Marketing <w AJ0>Limited | <w NP0>Minolta; <w NP0>IBM; <w NP0>NATO
          <w NP0>Wolverhampton <w NN2>Wanderers ( <w NN1>football <w NN1>club ) | <w NP0>Tottenham <w NP0>Hotspur ( <w NN1>football <w NN1>club )
          <w AT0>The <w NP0>Chicago <w NN2>Bears | <w NP0>Spartak <w NP0>Moscow
          <w NN1>World <w NN1>Health <w NN1>Organisation | <w NP0>Oxfam

          There is a slight inconsistency here, in that acronyms of organisation names (WHO, NATO, IBM etc.) take NP0, whereas the expanded forms of these names take regular tags.

        2. Products (including newspapers and magazines).
          Ordinary tags | Tagged NP0
          <w NN2>Windows <w NN1>software | <w NP0>Weetabix
          <w NP0>Lancashire <w NN1>Evening <w NN1>Post | <w NP0>Mars <w NN2>bars
          <w NN1>Time <w NN1>Magazine | <w NP0>Scotchgard
          <w AT0>The <w NN1>Reader<w POS>'s <w NN1>Digest | <w NP0>Perrier <w NN1>water

          Company names may sometimes be used to represent product names; in such cases the same tags apply. For example:

          John drives a <w NP0>Volkswagen <w NN1>Golf.

          John drives a <w NP0>Volkswagen.

        3. Shops, pubs, restaurants, hotels, horses, ships etc.
          Ordinary tags | Tagged NP0
          <w NN1>Body <w NN1>Shop | <w NP0>Mothercare
          <w AT0>The <w AJ0>Grand <w NN1>Theatre | <w NP0>Sainsburys <w NN1>supermarket
          <w AT0>The <w NN1>King<w POS>'s <w NN2>Arms | <w AT0>The <w NP0>Ritz
          <w AJ0>Red <w NN1>Rum | <w NP0>Aldaniti
          <w AT0>The <w NN1>Bounty | <w AT0>The <w NP0>Titanic

          Here again NP0 is reserved for parts of names that are specially coined, or derived from existing personal/geographical proper nouns.

      7. Changes in NP0 assignment since BNC1

        In the first release of the BNC, NP0 tags were applied a little more widely. The geographical category tagged NP0 used to include names of buildings and other institutions. Names of newspapers and magazines used to be treated separately from other products and tagged NP0.

        NOTE THAT IN BNC2 BOTH THESE TYPES NOW TAKE ORDINARY (non-NP0) TAGS:

        • Buildings and institutions

          BNC1:
          <w NP0>Blackpool <w NP0>Tower
          <w NP0>Prospect <w NP0>Theatre <w NN0>Company
          <w NP0>Austro-Hungarian <w NP0>Empire

          BNC2:
          <w NP0>Blackpool <w NN1>Tower
          [B22.1633]
          <w NN1>Prospect <w NN1>Theatre <w NN1>Company
          [A06.1962]
          <w AJ0>Austro-Hungarian <w NN1>Empire
          [G3B.617]

        • Newspapers and magazines

          BNC1:
          <w AT0>the <w NP0>Daily <w NP0>Mail
          <w NP0>Railway <w NP0>Gazette

          BNC2:
          <w AT0>the <w AJ0>Daily <w NN1>Mail
          [D95.334]
          <w NN1>Railway <w NN1>Gazette
          [HWM.1860]


      VERBS

      Basic tags:
      VBB VBD VBG VBI VBN VBZ = forms of be
      VDB VDD VDG VDI VDN VDZ = forms of do
      VHB VHD VHG VHI VHN VHZ = forms of have
      VM0 = modal verbs
      VVB VVD VVG VVI VVN VVZ = lexical verbs

      Ambiguity tags:
      VVB-NN1 VVD-VVN VVD-AJ0 VVG-AJ0 VVG-NN1 VVZ-NN2 = verb more probable
      NN1-VVB VVN-VVD AJ0-VVD AJ0-VVG NN1-VVG NN2-VVZ = verb less probable

      1. Inflection is marked by the third character in the tag.

        --B base form finite

        --D past tense

        --Z 3rd person sing present

        --N past participle

        --I infinitive

        --G present participle

      2. All forms of BE, HAVE and DO receive tags beginning VB-, VH- and VD- respectively.
        Auxiliary and main uses of these verbs are not distinguished.

        she <w VBZ>is playing her best tennis for six years.
        [CH3.1383]

        she <w VBZ>is just a star.
        [CH3.6940]

        John <w VHZ>has built a set of bookshelves.
        [C9X.121]

        John <w VHZ>has great courage.
        [CA9.1941]

        We <w VDD>did<w XX0>n't see anybody.
        [KB2.702]

        They <w VDB>do nice work.
        [ANY.514]

        Note the variant form of have in non-standard English:

        they shouldn't <w VHI>of left it the last minute
        [KD8.7289]

        That could <w VHI>of been 'bout us
        [B38.322]

      3. Lexical verbs

        Tags beginning VV- apply to all other (lexical) verbs.

        She <w VVZ>travels in every Saturday morning.
        [KRH.4013]

        The young kids <w VVB>want to <w VVI>dance and have fun
        [CHA.1600]

        I <w VVD>thought he <w VVD>looked a sad sort of a boy.
        [CDY.2831]

        ...after <w VVG>running out of coal, the crew were <w VVN>forced to <w VVI>burn timber and resin
        [HPS.270]

      4. Modals

        All modals are tagged VM0. We do not differentiate between so-called past and present forms:

        We <w VM0>can go there.

        We <w VM0>could go there.

        We <w VM0>used <w TO0>to go there every year.

        The form let's is treated as one verb:

        <w VM0>Let's <w VVI>go!
        [A61.1443]

      5. Contracted forms (can't, won't, gimme, dunno etc) are split into their component parts, which are tagged individually.

        <w VBB>Are<w XX0>n't you coming?
        [A0R.2215]

        I <w VDB>du<w XX0>n<w VVI>no
        [KR0.23]

        It is not always clear if and where they should be divided. Please refer to the list of contracted forms. (See also above on appearance of contracted forms)

      6. Note, in addition, that no special tags apply for the following:

        • Subjunctives and Imperatives. (Both take V-B tags)

          She suggested that they <w VVB>get married.
          [CBC.12107]

          Please <w VBB>be patient.
          [CHJ.901]

          <w VDB>Do<w XX0>n't just stand there watching!
          [ACB.3470]

        • Catenative or semi-auxiliary verbs
          such as going to, ought to, and used to + infinitive

          we're <w VVG>going <w TO0>to get killed.
          [HNN.445]

          you <w VM0>ought <w TO0>to let them know.
          [KCT.6117]
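The verb tag scheme set out in points 1 and 2 above is regular enough to decode character by character. The Python sketch below is an illustration, with hypothetical names (`VERB_TYPE`, `INFLECTION`, `describe_verb_tag`): the second character selects the verb (BE, DO, HAVE or a lexical verb), the third selects the inflection, and VM0 is treated as a special case because modals carry no inflection marker.

```python
from typing import Optional, Tuple

# Second character of the tag: which verb.
VERB_TYPE = {"B": "BE", "D": "DO", "H": "HAVE", "V": "lexical verb"}

# Third character of the tag: which inflection (wording as in point 1).
INFLECTION = {
    "B": "base form finite",
    "D": "past tense",
    "Z": "3rd person sing present",
    "N": "past participle",
    "I": "infinitive",
    "G": "present participle",
}


def describe_verb_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Return (verb type, inflection) for a basic verb tag."""
    if tag == "VM0":
        return ("modal", None)  # modals are not marked for inflection
    if (
        len(tag) != 3
        or tag[0] != "V"
        or tag[1] not in VERB_TYPE
        or tag[2] not in INFLECTION
    ):
        raise ValueError(f"not a basic verb tag: {tag!r}")
    return (VERB_TYPE[tag[1]], INFLECTION[tag[2]])


for tag in ["VBZ", "VHZ", "VDD", "VVG", "VM0"]:
    print(tag, describe_verb_tag(tag))
```

Ambiguity tags such as VVD-VVN are deliberately rejected here; they would first need to be split into their two component tags.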

      See further - Disambiguation Guide
      Section 3 Adjective vs. Participle (AJ0 vs. VVG and AJ0 vs. VVN)
      Section 4 's


      ADJECTIVES

      Basic tags:
      AJ0 AJC AJS
      Ambiguity tags:
      AJ0-NN1, AJ0-VVG, AJ0-VVN = adjective more probable
      NN1-AJ0, VVG-AJ0, VVN-AJ0 = adjective less probable

      1. General adjectives (AJ0)

        AJ0 is the general tag for adjectives. It subsumes:

        • Predicative and attributive uses:

          The ground was <w AJ0>dry and <w AJ0>dusty
          [GWA.118]

          The dust from the <w AJ0>dry ground
          [GWA.121]

        • Quasi-comparatives and quasi-superlatives

          Adjectives which have a heightening or downtoning effect rather like that of comparatives and superlatives, but which do not behave syntactically like comparatives or superlatives, are treated as ordinary adjectives. Examples include utter, upper and uppermost, which are acceptable in these examples

          Events in Eastern Europe were still <w AJ0>uppermost in Mr Li's mind.
          [A95.366]

          Family contacts were very important in uniting the <w AJ0>upper classes
          [FB6.1495]

          BUT not these:

          * It was an utter shambles than I have ever seen.
          (cf It was a worse shambles than I have ever seen.)

          * The salmon pool is upper than the dam.
          (cf The salmon pool is lower than the dam.)

        • Adjectives used catenatively, namely able and unable

          Will you be <w AJ0>able to manage? (catenative)

          Your son is very <w AJ0>able (non-catenative)

      2. Comparative adjectives receive the tag AJC; superlatives take AJS.

        A <w AJC>faster car.

        The <w AJS>best in its class.

      Ambiguities frequently arise between adjectives and other wordclasses, in particular adverbs, nouns and participles.

      See further - Disambiguation Guide
      Section 3 ADJECTIVE vs. PARTICIPLE (AJ0 vs. VVG, AJ0 vs. VVN)
      Section 3 ADJECTIVE vs. NOUN (AJ0 vs. NN1)
      Section 3 ADJECTIVE vs. ADVERB (AJ0 vs. AV0, AJC vs. AV0)
      Section 4 well, right


      ADVERBS

      Basic tags:
      AV0, AVQ, AVP
      Ambiguity tags:
      AV0-AJ0 = adverb more probable; AJ0-AV0 = adverb less probable

      1. AV0 is the default tag for adverbs. It incorporates a very mixed bag, including

        • adverbs of time, manner, place etc. Eg slowly; here; soon

        • degree adverbs. Eg very and rather in

          <w AV0>very <w AJ0>tall

          <w AV0>rather <w AV0>painfully

        • sentence adverbs. Eg

          <w AV0>However,

          <w AV0>In addition

        • postnominal adverbs. Eg

          during 1986-91 <w AV0>inclusive
          [FT0.1400]

          Diamonds <w AV0>galore
          [FPH.900]

        • discourse markers, notably well, right, like.

          you know <w AV0>like, it's worthwhile opening a cinema at 4 o'clock...
          [F7A.358]

      2. Note that adverbs, unlike adjectives, are not tagged as positive, comparative, or superlative. This is because of the relative rarity of comparative and superlative adverbs.

      3. Ordinal-type adverbs are treated separately with the ORD tag

      4. Prepositional Adverb (also known as "Adverbial Particle") AVP - see Prepositions

      5. Interrogative and relative wh-adverbs (when, where, how, why, wherever)

        The same tag, AVQ, is applied to these adverbs, whether the word occurs in interrogative or relative use.

        "<w AVQ>When do your courses start?"
        [A0F.3117]

        "...if you let me know <w AVQ>when the police are called in."
        [BMU.2291]

        Yet <w AVQ>why is that so?
        [CR7.3089]

      See further - Disambiguation Guide
      Section 3 ADVERB vs. ADJECTIVE (AV0 vs. AJ0, AV0 vs. AJC )
      Section 3 DETERMINER-PRONOUN vs. ADVERB
      Section 3 ADVERB vs. PREPOSITION
      Section 4 about, as, but, like, little, much, no, right, so, well, when


      ARTICLES, DETERMINERS & PRONOUNS

      Basic tags:
      AT0 = article
      DPS DT0 DTQ = determiner-pronoun
      PNP PNI PNQ PNX = pronoun only
      Ambiguity tags: none

      1. Articles, tagged AT0

        Articles are defined here as determiner words which typically begin a noun phrase, but which cannot occur as the head of a noun phrase. Examples: a/an, the, no and every

        Have <w AT0>a break

        <w AT0>Every year

        There's <w AT0>no time

      2. Determiner-Pronoun: DT0

        Recognising that there is a high degree of formal and functional overlap between determiners and pronouns, we have conflated under the D-- heading words that are capable of either function, such as that, few, both, another. Examples:

        at <w DT0>all times of the day
        [A7P.1196]

        free secondary education for <w DT0>all
        [ECB.1610]

        <w DT0>Few diseases are incurable
        [GV1.1130]

        for the benefit of the <w DT0>few
        [HHX.10188]

        DTQ is the wh- (interrogative) determiner-pronoun (and also relative pronoun - see below). Which and what are always tagged DTQ

        <w DTQ>Which country do you live in?
        [A7N.979]

        And she didn't say <w DTQ>which?
        [KCF.352]

        <w DTQ>What time is it?
        [A0N.406]

        DPS is the prenominal possessive pronoun (my, your, etc). Eg

        <w DPS>my hat

        Compare the nominal use:

        That is your way. This is <w PNP>mine
        [A0N.726-7]

        [ View list of Determiner-Pronoun tagged words and compounds. ]

      3. `Pronoun-only' words

        Tags beginning P-- indicate pronouns which do not share the determiner function, for example I, it, anyone. Pronouns are differentiated according to whether they are:

        • personal (PNP), eg I, him, they, us. Note also: it is included here.

        • reflexive personal (PNX), eg herself, themselves

        • indefinite (PNI), eg anyone, everything, nobody

        • interrogative (PNQ), eg who, whoever

        [ View list of Pronoun-only words and compounds ]

      4. Relative pronouns

        Which as a relative (or interrogative) pronoun is grouped with the other determiner-pronouns, and tagged DTQ.

        Give 4 details <w DTQ>which should appear on an order form
        [HBP.417]

        Meanwhile, that as a relative clause complementizer is treated with that as a complement clause complementizer, and tagged CJT

        I got some currants <w CJT>that are left over
        [KST.3734]

        this girl <w CJT>that Claire knows
        [KC7.1101]

        He dismissed reports <w CJT>that his party was divided over tactics
        [A28.11]

        We both knew <w CJT>that enough was enough.
        [FEX.268]

        Note, however, that that takes the tag DT0 when it functions as a demonstrative pronoun or determiner:

        Look at <w DT0>that bear!
        [KP8.1547]

        I guess I was sad about <w DT0>that.
        [BMM.239]

      For D-- tagged words, the main source of ambiguity is between determiners and adverbs. See Disambiguation Guide
      Section 3: DT0 vs. AV0 (illustrated by more and less) and
      Section 4: much; no; that



      PREPOSITIONS AND PREPOSITIONAL ADVERBS

      Basic tags:
      PRP PRF AVP
      Ambiguity tags:
      PRP-AVP = Prep more probable; AVP-PRP = Prep less probable
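The ordering convention for ambiguity tags can be made concrete with a small sketch. The helper name `split_ambiguity_tag` is hypothetical and not part of any BNC tool. It assumes only what is stated above: the first tag in a hyphenated pair is the more probable one.

```python
from typing import Optional, Tuple

def split_ambiguity_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Split an ambiguity tag into (more probable, less probable).

    A basic tag such as "PRP" has no second element, so None is returned for it.
    """
    first, sep, second = tag.partition("-")
    return first, (second if sep else None)
```

For example, `split_ambiguity_tag("PRP-AVP")` puts PRP first, while `split_ambiguity_tag("AVP-PRP")` puts AVP first.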

      1. Prepositions

        Most prepositions are tagged PRP, including a large number of complex prepositions (shown in bold here). Examples

        <w PRP>at the Pompidou Centre <w PRP>in Paris
        [A04.325]

        I use humour <w PRP>as a protection
        [FBL.363]

        Heard <w PRP>about this have you?
        [KE6.9557]

        <w PRP>According to ancient tradition, ...
        [A04.784]

        Many disputes are dealt with by bodies <w PRP>other than courts.
        [F9B.4]

        Nice walls and a big sky to look <w PRP>at.
        [A25.122]

      2. The preposition of is assigned a special tag PRF because of its frequency and its almost exclusively postnominal function. Examples

        a couple <w PRF>of cans <w PRF>of Coke
        [AJN.283]

        DNA consists <w PRF>of a string <w PRF>of four kinds <w PRF>of bases
        [AE7.107]

        NB. Numerous multiwords contain of, eg in front of, in light of, by means of, etc.

        [ View list of Preposition-tagged words and compounds ]

      3. Prepositional adverbs/particles

        Preposition-type words which have no complement are tagged AVP. Typical uses of AVP are in phrasal verb constructions, or when it functions as a place adjunct. e.g.

        We gave <w AVP>up after two hours.
        [KSV.1029]

        there were a lot of horses <w AVP>around.
        [HR7.3105]

        The following is a list of possible AVP words:

        'bout about along around back by down in off
        on out over round through thru to under up

        Of the above list, all except back allow also a prepositional reading. Thus there are many instances of ambiguity between PRP and AVP. See further - Disambiguation Guide:

      Section 3 Preposition vs. Prepositional Adverb vs. Locative Adverb (PRP vs. AVP vs. AV0)
      Section 4 but, about
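The AVP word list and the remark about back can be expressed as a simple lookup. This is only an illustrative sketch: the function name `avp_prp_candidates` is an assumption, and it covers only the AVP/PRP choice, ignoring other tags some of these words can take (for example TO0 for to, or AV0 for about).

```python
# Possible AVP words, as listed in the guidelines above.
AVP_WORDS = {
    "'bout", "about", "along", "around", "back", "by", "down", "in", "off",
    "on", "out", "over", "round", "through", "thru", "to", "under", "up",
}

# All of them except "back" also allow a prepositional reading.
PRP_READING = AVP_WORDS - {"back"}

def avp_prp_candidates(word: str) -> set:
    """Return the subset of {"AVP", "PRP"} that the word could take."""
    w = word.lower()
    if w not in AVP_WORDS:
        return set()
    return {"AVP", "PRP"} if w in PRP_READING else {"AVP"}
```

A word that comes back with both candidates still has to be disambiguated in context, using the tests in Section 3.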



      CONJUNCTIONS

      Basic tags: CJC CJS CJT
      Ambiguity tags: CJS-PRP PRP-CJS

      1. The tag CJC (and, or, but, nor) denotes coordinators.

        Fish <w CJC>and chips

        James laughed <w CJC>and spilled wine.
        [A0N.136]

        She was paralysed <w CJC>but she could still feel the pain.
        [FLY.536]

      2. CJS denotes subordinators in:

        • Adverbial clauses (of time, reason, condition etc.)

          "<w CJS>When you 've done it , you should go home,"
          [CRE.949]

          I still stayed there <w CJS>after I heard the shooting
          [HW8.3264]

          <w CJS>As you may know Scorton will again enter the Best Kept Village competition in 1992
          [HPK.768]

          Do send me an interim copy <w CJS>as soon as you can
          [HD3.69]

          <w CJS>If it's wet just take your time.
          [KCL.554]

        • Comparative clauses, introduced by than or as. These can occur with or without ellipsis.

          It was worse <w CJS>than she could have imagined.
          [CH0.1315]

          ...apparently we eat more chocolate <w CJS>than any other country.
          [G3U.1000]

          "it's as good <w CJS>as it's going to get."
          [K9K.199]

          make the transporter as light <w CJS>as possible.
          [CA1.1114]

        • Nominal wh-clauses containing whether or if

          Can you tell me <w CJS>whether ivies do damage trees.
          [C9C.726]

      3. CJT applies to that-clauses, introducing reported speech and thought, and also relative clauses

        Historians knew <w CJT>that this was nonsense.
        [G3C.363]

        China announced <w CJT>that it was ending martial law in the Tibetan capital Lhasa .
        [KRU.95]

        The problem <w CJT>that he was having was <w CJT>that she was his legal wife 's sister
        [HE3.210]

        [ View list of Conjunction words and compounds ]

      See further - Disambiguation Guide
      Section 4 as, so, that



      NUMERALS

      Basic tags:
      CRD ORD
      Ambiguity tags:
      CRD-PNI, PNI-CRD

      1. Cardinal numbers, numeral nouns, fractions and so on
        take the tag CRD, whether they are written as words or numerals, and whether functioning nominally or prenominally. Examples:

        <w CRD>5 out of <w CRD>10
        [CGM.525]

        <w CRD>one striking feature of the years <w CRD>1929-31
        [A6G.134]

        his <w ORD>first innings, when he scored <w CRD>forty-two, with <w CRD>seven <w CRD>fours

        <w CRD>Hundreds of people audition each year
        [K1S.2241]

        About a <w CRD>dozen.
        [H2U.5182]

      2. Ordinal numbers are assigned ORD in all syntactic positions, including adverbial positions, as in

        We only came <w ORD>fourth in the county championship <w ORD>last year
        [EDT.1629]

        NOTE: ORD is also assigned to the less overtly numeric words like next and last, even in clear adverbial, adjectival or nominal contexts. This is because next and last function like ordinals both syntactically and semantically.

      3. Currency and measurement expressions

        Currency and measurement expressions, consisting of a number and a currency symbol or unit of measurement (together as one word), are assigned a noun tag, usually NN0 (neutral for number) or NN2 (plural):

        <w NN0>6kg

        <w NN0>£600

        <w NN0>12.5%

        <w NN2>12&ins; ( = 12 inches)

      4. Other mixtures of numeric and alphabetic characters are assigned UNC (formulaic) tags

        Figure <w UNC>2b
        [FTC.248]

        Serial no. <w UNC>S835508
        [C9H.2284]

        <w UNC>A4 sheet of paper
        [CN4.296]

        Mark drove home along the <w UNC>M1
        [AC2.2210]
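Rules 1, 3 and 4 above can be sketched for tokens that contain digits. This is a rough illustration under stated assumptions, not the tagger's actual logic. The unit list is hypothetical, since the guidelines do not enumerate units. Currency symbols, NN2 cases, number words such as fifty-five or dozen, and ordinals are all left out.

```python
import re
from typing import Optional

# Hypothetical unit list for illustration; the guidelines do not enumerate units.
UNIT_PATTERN = r"(?:kg|km|cm|mm|%)"

def digit_token_tag(token: str) -> Optional[str]:
    """Suggest a tag for a token containing digits, or None if no rule applies."""
    # Rule 1: plain numerals, including ranges such as 1929-31 -> CRD
    if re.fullmatch(r"\d+(?:[.,]\d+)*(?:-\d+)?", token):
        return "CRD"
    # Rule 3: a number fused with a unit of measurement -> NN0
    if re.fullmatch(r"\d+(?:\.\d+)?" + UNIT_PATTERN, token):
        return "NN0"
    # Rule 4: any other mixture of digits and letters -> UNC
    if re.search(r"\d", token) and re.search(r"[A-Za-z]", token):
        return "UNC"
    return None
```

The unit check runs before the UNC check because a token like 6kg also mixes digits and letters.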

      The main ambiguity in this category is between one functioning as a cardinal number (CRD) and as a pronoun (PNI).



      MISCELLANEOUS OTHER TAGS

      The following tags are included here: EX0 | ITJ | POS | TO0 | XX0 | ZZ0 | UNC

      1. EX0 = existential there

        In its existential use there does not carry any real meaning: it merely states that something exists or existed. It occurs at the beginning of a clause and is usually followed by the verb be and an indefinite noun phrase; for example

        <w EX0>There was a long long pause in which nothing at all happened
        [H80.3991]

        Waiter! Waiter! <w EX0>There's an awful film on my soup!
        [CHR.657-9]

        <w EX0>There appears to be little alternative
        [ECE.2139]

        Compare this with there when it has a clear locative meaning ('in/to that place'):

        Don't stand <w AV0>there grinning like a stuck pig
        [C85.1553]

      2. ITJ = interjection.

        <w ITJ>Hello, Nell.

        <w ITJ>Oi - come here!

        <w ITJ>Yes , <w AV0>please do

        <w ITJ>No <w XX0>not <w AV0>yet

        For the distinction between ITJ and the unclassified tag, UNC, see section 3: INTERJECTION vs. UNCLASSIFIED.

      3. POS = genitive morpheme 's (singular) or ' (plural after an s), eg

        <w NN1>teacher<w POS>'s pet
        <w NN2>teachers<w POS>' pet

        Note the lack of space between the noun and the following POS, as 's is tokenized in the same way whether it represents a genitive or a contracted verb. See further on tagging of 's in Section 4.
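To show how the citation markup used in this guide attaches 's directly to the preceding word, here is a minimal parsing sketch. The function name `tagged_words` is hypothetical. It assumes that a tagged word runs from the `<w TAG>` marker up to the next whitespace or the next marker, so for multiword units it returns only the first orthographic word.

```python
import re
from typing import List, Tuple

# Matches "<w TAG>" followed by the word it tags.
# Assumption: the word ends at the next whitespace or the next "<".
W_TAG = re.compile(r"<w ([A-Z0-9-]+)>([^\s<]+)")

def tagged_words(citation: str) -> List[Tuple[str, str]]:
    """Return (word, tag) pairs for every <w TAG> marker in a citation."""
    return [(word, tag) for tag, word in W_TAG.findall(citation)]
```

Applied to the teacher's pet example, this yields teacher with NN1 and 's with POS as two separate tagged units, even though no space separates them in the text.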

      4. TO0 = the infinitive marker. This includes elliptical uses.

        "Do you want <w TO0>to talk about it?"
        [EFG.1935]

        In the summer holidays I can , I can get up early if I want <w TO0>to .
        [KPG.4204]

        Note the morphological variation of to in the following colloquial forms:

        We <w VVN>got<w TO0>ta go

        We <w VVB>wan<w TO0>na stay.

      5. UNC is the tag for unclassified words. It is applied in contexts where no other wordclass tag seems appropriate, including

        For the distinction between UNC and ITJ see section 3, INTERJECTION vs. UNCLASSIFIED.
        See also features of spoken corpus tagging.

      6. XX0 is the tag for the negative particle not, and also for its contracted or fused form, eg

        Brown <w VDD>did<w XX0>n't see it that way.
        [A6W.338]

        no, that is <w XX0>not correct.
        [JK0.257]

      7. ZZ0 = letter of the alphabet: A, X, x, p, r

        ZZ0 vs. NP0 vs. CRD.
        ZZ0 is the default tag for a single letter of the alphabet.

        If however, the letter clearly represents a separate word, or an abbreviation of a separate word, we have tried to assign the appropriate POS-tag for the full form of that word, rather than ZZ0.

        Examples:



      DISAMBIGUATION GUIDE

      The following is a guide to resolution of the most common tagging ambiguities. It states the principles by which we have drawn the line between the "correct" and the "incorrect" assignment of a tag in particular contexts (as applied in the report on tagging error rates). Note that in the next section and in Section 4, Disambiguation by word, we also cite examples where the POS-tagging in the corpus is less reliable and does not match the tag given in the citation. In such cases we append the actual corpus tag, marked with an asterisk, to the file reference. E.g. under Adjective vs. Adverb (next section), the preferred tag for long is AV0, but the actual tag in the corpus is the ambiguity tag AV0-AJ0.

      You're not supposed to keep medicine that <w AV0>long.
      [H8Y.1976 *AV0-AJ0]

      Note also that in this section we use a number of invented examples (in addition to corpus citations) to clarify the distinction between categories.
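The bracketed file references, with their optional asterisked corpus tag, follow a regular shape that can be read mechanically. The sketch below is an assumption-laden illustration: the names `FileRef`, `text_id`, `unit` and `corpus_tag` are hypothetical labels, and the pattern is inferred only from the references that appear in this guide (three-character identifier, a dot, a number or number range, then an optional `*TAG`).

```python
import re
from typing import NamedTuple, Optional

class FileRef(NamedTuple):
    text_id: str                 # e.g. "H8Y"
    unit: str                    # e.g. "1976" or "657-9"
    corpus_tag: Optional[str]    # tag actually found in the corpus, if cited

REF = re.compile(
    r"\[\s*([A-Z0-9]{3})\.(\d+(?:-\d+)?)\s*(?:\*([A-Z0-9-]+))?\s*\]"
)

def parse_file_ref(ref: str) -> Optional[FileRef]:
    """Parse a reference like "[H8Y.1976 *AV0-AJ0]"; return None if it does not fit."""
    m = REF.fullmatch(ref.strip())
    return FileRef(*m.groups()) if m else None
```

When `corpus_tag` is present, the tag shown in the citation is the preferred one and `corpus_tag` is what the corpus actually contains.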

      SECTION 3. DISAMBIGUATION BY TAG PAIR

      Contents | Tagset | Intro to Wordclasses | Adjective vs. Adverb | Adjective vs. Noun | Determiner-Pronoun vs. Adverb | Adjective vs. Participle | Prep vs. Prep Adverb | Interjection vs. Unclassified | Disambiguate words

      ADJECTIVE vs. ADVERB

      After a verb or an object, there is sometimes a difficult choice between AJ0 and AV0, or between AJC and AV0. e.g.:

      We arrived <w AJ0>tired, but <w AJ0>safe
      [CCP.530]

      Here, both tired and safe are AJ0. The main test is to see whether one can express the relation between these words and their logical subjects using the verb be: We arrived tired but safe implies 'We were tired but safe'. The word tagged AJ0 refers to a property of a noun, rather than to a property of an event or situation. Contrast:

      Peter sang out <w AV0>loud and <w AV0>clear.

      This sentence does not imply that Peter was loud and clear, but is more or less equivalent to Peter sang out loudly and clearly. It means that his singing was loud and clear.

      It follows that when, in colloquial English, a word which we normally expect to be an adjective is used as an adverb, we should tag it AV0; e.g:

      You did <w AV0>great though.
      [HH0.3248 *AV0-AJ0]

      Here is another pair of examples, where the AJ0/AV0 word follows an object:

      everyone below 25 grew their hair too <w AJ0>long.
      [ARP.592 *AV0-AJ0]
      (i.e. 'their hair was too long'.)

      Try not to keep her too <w AV0>long.
      [FAB.3620 *AV0-AJ0]
      (i.e. NOT 'she will be too long.')

      Also note the similar distinction between AJC and AV0:

      We can make this piece <w AJC>higher if you want to.
      [BNG.2270]

      You should aim <w AV0>higher
      [ACN.985 *AJC]

      and between AJS and AV0:

      Delgard thought it <w AJS>best to leave the subject alone.
      [FS4.1559]

      BUT: I liked the cartoons <w AV0>best.
      [CAM.194]



      ADJECTIVE vs. NOUN

      There are many words in English which can be tagged either adjective (AJ0) or noun (NN1). Colour words like black, white and red are fairly consistent in allowing the two tags, and may be used to illustrate the difference. In attributive (premodifying) or predicative (complementing) positions without further modification these words are normally adjectives: a <w AJ0>white screen, The screen is <w AJ0>white. When the word is the head of a noun phrase, on the other hand, it is a noun: <w NN1>Red is my favourite colour. They painted the wall a brilliant <w NN1>white.

      Sometimes a word cannot be used predicatively as an adjective, but can occur attributively in a way which suggests adjectival use. For example, past and present are adjectives in

      (1) All <w AJ0>past and <w AJ0>present employees of the branch are invited.
      [K99.217]

      We do not find present, etc. being used as predicative adjectives, however:

      (2) *These needs are past, present, and future.

      (Note that present can be used as a predicative adjective meaning the opposite of 'absent'; but this meaning is not comparable to the temporal meanings of past, present and future above.)

      Contrast (1) above with cases where past, present etc. are heads of noun phrases, e.g. following the definite article, and are clearly nouns:

      "You're living in the <w NN1>past."
      [HGS.1045]

      "I don't even want to think about the <w NN1>future."
      [JY4.2864]

      The only reason for treating past and present in example (1) above as adjectives is that they have an institutionalized meaning as modifiers, which is rather different from the meaning they have as nouns. Further examples of this type are words such as model in model behaviour, giant in a giant caterpillar and vintage in vintage cars.

      Words ending in -ing are a particular problem: when they premodify a noun, they can be tagged either NN1 (noun) or AJ0 (adjective). Contrast:

      new <w NN1>spending plans
      [CEN.5922]

      a <w AJ0>working mother
      [ED4.153]

      his <w NN1>reading ability
      [CFV.1903]

      in the <w AJ0>coming weeks
      [HKU.1341]

      The guideline is as follows. If X-ing + Noun is equivalent in meaning to 'Noun who/which X-es / X-ed / BE + X-ing', then X-ing is an adjective (AJ0). That is, a word ending -ing is an adjective when it is the notional subject of the noun it premodifies. E.g.:

      two <w AJ0>smiling children ('two children who are/were smiling')
      [HTT.743]

      In other cases, X-ing is generally a noun (NN1). In such cases, it is often possible to paraphrase X-ing + Noun by a more explicit phrase in which X-ing is clearly a noun:

      new <w NN1>spending plans ('new plans for spending')

      his <w NN1>reading ability ('his ability in reading')

      Further examples:

      a <w AJ0>mating animal
      [G08.2142]

      the <w NN1>mating game
      [ECG.336 *AJ0-NN1]

      a <w AJ0>falling rate of unemployment
      [KR2.2129]

      <w NN1>slimming tablets.
      [KCA.941 *NN1-VVG]



      DETERMINER-PRONOUN vs. ADVERB (DT0 vs. AV0)

      More and less can be assigned to either of the tags DT0 or AV0. The difference between them is that DT0 is for noun-phrase-like (and determiner-like) uses of the word in question, whereas AV0 is for adverbial uses. The two can be hard to distinguish, particularly after a verb:

      (a) You should relax <w AV0>more.

      (b) You should spend <w DT0>more.

      Since relax is an intransitive verb in (a), more cannot be a noun phrase following it. Instead, more can be paraphrased roughly as 'to a greater extent' or 'to a greater degree'. On the other hand, spend in (b) is a transitive verb, and so more is a determiner-pronoun form following it. As confirmation of this, note that sentence (b) could be turned into a passive with more as subject: More should be spent.... There are unfortunately some verbs for which the distinction is less clear than in the above examples, e.g.:

      You should eat more. You should read more. You should smoke less.

      In these cases, the verb may be used transitively or intransitively with almost identical meanings, so that the syntactic structures of the immediate and/or surrounding context are the only clues as to which is the case:

      Do you smoke? (Intransitive)

      How many do you smoke in a week? (Transitive)

      Contrast (c) and (d) below:

      (c) At the moment we have 23 fixtures per season. Personally, I would rather play <w DT0>more.

      (d) You should work less and play <w AV0>more.

      (In (d) the adverb more has roughly the meaning of 'more often'.)

      Note. The automatic disambiguation of determiners and adverbs is not reliable, because transitivity has not been encoded in the tagger. Sentences like (c) and (d), where more follows the verb at the end of a sentence, are invariably tagged AV0.
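The transitivity principle behind this section can be sketched as a lookup. This is purely illustrative and is not how the BNC tagger works. The mini-lexicon is hypothetical and contains only the two verbs this section classifies outright. Verbs that may be used either way (eat, read, smoke, play) and unknown verbs return None, since only the wider context can decide.

```python
from typing import Optional

# Hypothetical mini-lexicon built only from verbs classified in this section.
INTRANSITIVE_ONLY = {"relax"}   # a following more/less cannot be an object
TRANSITIVE_ONLY = {"spend"}     # a following more/less is the object

def more_less_tag(verb: str) -> Optional[str]:
    """Suggest a tag for more/less directly after the given verb."""
    v = verb.lower()
    if v in INTRANSITIVE_ONLY:
        return "AV0"
    if v in TRANSITIVE_ONLY:
        return "DT0"
    return None  # ambiguous or unknown verb: context must decide
```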



      ADJECTIVE vs. PARTICIPLE (AJ0 vs. VVG and AJ0 vs. VVN)

      Another area of borderline cases is the tagging of words as adjectives (AJ0) or as participles (VVG or VVN).

      1. In both cases, the word can be an AJ0. One test is to see whether a degree adverb like very can be inserted in front of the word: e.g. in We were very surprised, surprised is an AJ0.

      2. Another test, having the opposite effect, is to see whether there is an agent by-phrase following the word in -ed or -en. If so it is a VVN: e.g. We were <w VVN>surprised by pirates. Even where it is not present, the possibility of adding the by-phrase, without changing the meaning of the word, is evidence in favour of VVN. (However, this criterion can clash with the preceding one - since it occasionally happens that an -ed word is both preceded by an adverb like very AND followed by a by-phrase: E.g. I was so irritated by his behaviour that I put the phone down. When these do occur, we give preference to AJ0.)

      3. A third test is negative: to see whether the word in question can be placed before a noun. e.g.:

        The effect is <w AJ0>lasting (compare a <w AJ0>lasting effect).

        The door is <w AJ0>locked (compare the <w AJ0>locked door.)

        This shows that lasting or locked can easily be (but need not be) an AJ0. If the word could not be placed (with the same meaning) before the noun, this would be evidence that the word is a participle.

      4. Even though an -ing word is normally a VVG after the verb be, it is generally treated as an AJ0 before a noun:

        The man was <w VVG>dying.
        [HTM.1494 *VVG-AJ0]

        BUT: the <w AJ0>dying man.
        [FSH.606]

      5. However, when the -ing or -ed forms part of a premodifying phrase, the VVG or VVN tag is preferred:

        an <w NN1>interest <w VVG>earning account

        a <w NN1>hypothesis <w VVN>driven approach

        In these examples the NN1+VVG/VVN sequence has the character of a premodifying adjective compound. We can therefore imagine the two words bracketed together forming an adjective: an [AJ0 interest-earning AJ0] account. But within the adjective, the VVG and VVN tags retain their verbal character, with the initial noun acting as object of the verb (cf. the account earns interest).

        The same applies when the premodifying compound phrase is noun-like:

        a [ <w NN1>shanty <w VVG>singing ] competition
        [K4W.2952]

      6. If the verb be can be replaced by another verb such as seem or become, without changing the meaning of the following AJ0 / VVN word, this is a strong indication that the construction is not properly a passive, and that the word is an AJ0:

        The building was <w AJ0>infested with cockroaches

        (cf.: The building seemed/became infested with cockroaches)

      7. A further distinction which can be used to test with 'event' verbs is that the AJ0 refers to a 'resultant state', whereas the VVN refers to an event:

        Bill was <w AJ0>married. (i.e. he was not single)

        Bill was <w VVN>married to Sarah on the 15th May. (i.e. the actual event)

        This is a manifestation of the general semantic character of adjectives (which typically refer to states or qualities) and verbs (which typically refer to events or actions).

        However, this criterion is not definitive, as VVG and VVN can also sometimes refer to states, when the meaning of the verb is stative:

        She is not <w VVN>disturbed by that sort of threat.

        The tourists were <w VVG>standing around a map of the city.

      8. Finally, here is a test which clearly identifies an -ing form as a verb.

        A verb can be followed by complements such as a noun phrase, an adjective or an adverbial. These cannot follow the same word when it is an adjective. E.g.:

        Are you <w VVG>expecting someone?

        The arithmetic is <w VVG>looking good.

        <w VVG>Turning suddenly, she ran for the safety of the car

        Contrast:

        His manner was <w AJ0>insulting.

        where insulting could not normally be followed by an object:

        * insulting us.



      PREPOSITION vs. PREPOSITIONAL ADVERB vs. GENERAL ADVERB
      (PRP vs. AV0, and PRP vs. AVP)

      This kind of ambiguity occurs frequently, particularly in spoken texts. Compare:

      (a) She ran <w PRP>down the hill.

      (b) She ran <w AVP>down her best friends.

      In (a), down is a preposition, because:

      In (b), down is an adverbial particle because:

      Notice that the syntactic distinction between (for example) down as an adverbial particle and down as a preposition is independent of the semantic distinction between locative and non-locative interpretations of down.

      When the verb is simply followed by down or out, etc., without a following noun phrase, it is normally an AVP:

      Income tax is coming <w AVP>down.

      The decorations are put <w AVP>up on Christmas Eve.

      However, it is important to recognize 'stranded' prepositions, which have been deprived of the company of their noun phrase, the prepositional complement, because it has been fronted or omitted through ellipsis (e.g. in relative clauses, with passives, in questions, etc.):

      This is the hill (which) she ran <w PRP>down.
      (Cf. This is the hill down which she ran.)

      The poor were looked down <w PRP>on by the rich.
      (Here on is the stranded preposition)

      Which car did she arrive <w PRP>in?

      The same tests apply to words which are tagged either as prepositions or as general adverbs (AV0), such as across, past and behind.

      Note, additionally, the use of about as a degree adverb.



      INTERJECTION vs. UNCLASSIFIED (ITJ vs. UNC)

      The borderline between interjections or exclamatory particles (tagged ITJ) and unclassified 'noise' words (tagged UNC) is drawn as follows:

      ITJ is used for 'institutionalized' interjections or discourse particles such as

      good-bye oh no oops hallelujah whoa wow

      Well, right and like functioning as discourse markers are tagged AV0.

      UNC is used in contexts where no other wordclass tag seems appropriate:



      SECTION 4 : DISAMBIGUATION BY WORD

      In this section we discuss some common words which belong to more than one word class, and are among the most problematic for disambiguation. As in section 3, if the tag stated in the example differs from the actual tag in the corpus, we append the latter to the file reference number in the next line. Eg *AV0 in

      Tears <w VVB>well up in my eyes.
      [BN3.5 *AV0]

      The words covered are:

      apostrophe 'S | ABOUT | AS | BUT | HOME | LIKE | LITTLE | MUCH | MORE | NO | ONE | RIGHT | SO | THAT | THEN | TO | WELL | WHEN | WORTH |

      Contents | Tagset | Intro to Wordclasses | Disambiguate tag pairs


      apostrophe 'S

      Choice of tags: VBZ VHZ VDZ POS
      [in fused words: VM0 ZZ0 CRD ]

      In the BNC the apostrophe 's is generally tagged as a separate wordform (that is <w TAG>'s ), attached without a space to the immediately preceding word.

      1. Contracted forms. When 's represents a shortened form of is, has or (rarely) does, it has the appropriate verb tag. Occasionally, for example with auxiliaries followed by past participles, it is difficult to determine what the full form of the verb should be. Examples:

        <w DT0>That<w VBZ>'s perfect is that one... (= That is...)
        [KCX.1254]

        <w PNP>She<w VHZ>'s got tickets. (= She has...)
        [KPV.6481]

        well, <w DTQ>what<w VDZ>'s he do?, is he a plumber? (= What does...)
        [KD6.310]

      2. Genitives

        <w NP0>Britain<w POS>'s small businesses
        [HMH.67]

        After <w AV0>today<w POS>'s announcement
        [K6F.39]

      3. However, when 's acts as a marker of the -s plural, or as part of the verb form let's, it is part of a single word, and is not assigned its own tag. E.g.:

        success in the three <w ZZ0>R's
        [EVY.59]

        in the <w CRD>1980's
        [HJ1.22024]

        <w VM0>Let's <w VVI>go.
        [A61.1443]

        Note that let's is not considered a contraction of let us, but is treated as a single 'verbal particle', tagged VM0, on the grounds that it is closely analogous to modal auxiliaries.
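The three cases above can be summarised in a short sketch. It assumes the reading of 's (is, has, does, genitive or plural) has already been decided by a human or by some other component. The function name and the `reading` labels are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Tags for a separately tagged 's, keyed by the reading already chosen.
S_TAGS = {"is": "VBZ", "has": "VHZ", "does": "VDZ", "genitive": "POS"}

def split_apostrophe_s(word: str, host_tag: str, reading: str) -> List[Tuple[str, str]]:
    """Return (form, tag) pairs for a word ending in 's.

    reading is one of the S_TAGS keys, or "plural" for forms like R's or 1980's.
    """
    if word.lower() == "let's":
        return [(word, "VM0")]       # treated as a single 'verbal particle'
    if reading == "plural":
        return [(word, host_tag)]    # 's stays inside the word
    return [(word[:-2], host_tag), ("'s", S_TAGS[reading])]
```

For instance, Britain's with a genitive reading splits into Britain (NP0) and 's (POS), whereas 1980's with a plural reading stays a single CRD token.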



      ABOUT

      Choice of tags: PRP, AV0 and AVP



      AS

      Choice of tags: PRP, AV0 and CJS (also multiword tags)

      1. Comparative constructions:

        As is a degree adverb (AV0) when it occurs before an adjective, adverb or determiner (and sometimes other words) in phrases of the type as X as Y, or simply as X (where the comparative clause or phrase as Y is omitted but understood):

        I go to see them <w AV0>as often as I can .
        [AC7.1192]

        and they employ ninety people, twice <w AV0>as many as last year.
        [K1C.3542]

        And every bit <w AV0>as good.
        [EEW.1132 *CJS]

        In the first and second examples above, the second as introduces a comparative construction which expresses 'equal comparison', as contrasted with the unequal comparison of more X than Y. When as is a word introducing such a comparative construction, it is tagged CJS:

        Capitalism is not <w AV0>as good <w CJS>as it claims.
        [CFT.2051]

        Linked together, they can crunch numbers <w AV0>as fast <w CJS>as any mainframe.
        [CRB.271]

        She will deposit <w AV0>as many <w CJS>as a dozen eggs there.
        [F9F.424]

        Notice that as in this comparative use is tagged CJS whether or not it introduces a clause. Often it introduces a noun phrase. In the following example, it introduces an adjective:

        always reply <w AV0>as quickly <w CJS>as possible.
        [C9R.989]

      2. Introducing other clauses:

        The tag CJS is also used when introducing other subordinate clauses, such as adverbial clauses of time or reason:

        Mr Phelps arrived just <w CJS>as I was leaving.
        [G1K.1685]

        <w CJS>As you've gone to so much trouble , it would seem discourteous to refuse
        [G1K.1685]

      3. Preposition:

        The tag PRP is used for as functioning clearly as a preposition:

        Consider it <w PRP>as a kind of insurance
        [AD0.1641]

        <w PRP>As head of information, Christina will lead a team of four TEC staff...
        [BM4.2830]

        Usually the meaning is related to the equative meaning of the verb be. However, the guideline restricts PRP to cases where as is followed by a noun phrase or nominal, as is normal for prepositions. Where as is followed by an adjective or a past participle clause, it is tagged CJS, even though it may retain the equative type of meaning:

        We regard these results <w CJS>as encouraging.
        [B1G.184]

        I very much hope that you will in fact support the motion <w CJS>as originally intended.
        [KGX.93]

      4. Multiwords:

        As is part of many multiwords that receive a single tag: e.g. as soon as, such as, in so far as, as long as, as well as. The sequence as well as, for example, is tagged as a preposition (PRP) in examples such as:

        Sometimes <w PRP>as well as going this way we actually need to go in this was too.
        [G5N.31]

        Note that this is different from the multiword adverb as well (meaning 'also'); it is also different from as well as analysed as a sequence of three separate words, e.g. in:

        She's <w AV0>as <w AJ0>well <w CJS>as can be expected.
        [F9X.2095]
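
        The inline markup used in these citations can be read mechanically. The following is a minimal Python sketch, not part of any BNC tooling: the name tagged_words is hypothetical, and the sketch assumes that each marker applies only to the single word that follows it (so a marker covering a multiword such as as well as yields only its first word).

```python
import re

# "<w TAG>" followed by the word it marks (up to whitespace or the next "<").
W_MARKER = re.compile(r"<w ([A-Z0-9-]+)>([^\s<]+)")

def tagged_words(citation):
    """Return (word, tag) pairs for the explicitly tagged words in a citation."""
    pairs = []
    for tag, word in W_MARKER.findall(citation):
        # Drop sentence punctuation that is attached to the word in the citation.
        pairs.append((word.rstrip('.,;:?!"'), tag))
    return pairs

print(tagged_words("She's <w AV0>as <w AJ0>well <w CJS>as can be expected."))
```

        Applied to the example above, the sketch yields the three separate tags AV0, AJ0 and CJS for as, well and as.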


      BUT

      Choice of tags: CJC, CJS, PRP, AV0
      The coordinating conjunction CJC is overwhelmingly the most common use of but.

      1. Adverb:

        But is an adverb when its meaning is similar to 'only':

        She can spare you <w AV0>but a few minutes
        [CCD.82 *CJC]

        There is <w AV0>but one penalty.
        [ALS.185 *CJC]

      2. Subordinating conjunction or preposition:

        But is either a conjunction (CJS) or a preposition (PRP) if it has the meaning of 'except (for)', 'other than' or 'apart from'. CJS is used when it introduces a clause, and PRP is used when it introduces a phrase:

        ...mediocre albums that do nothing <w CJS>but take up shelf space
        [C9M.1014]

        I couldn't help <w CJS>but notice.
        [JY0.5323 *CJC]

        I always feel they are open meetings in everything <w PRP>but name.
        [HJ4.5520]

        No one had guessed she was anything <w PRP>but a boy.
        [C85.517]

      3. Coordinating conjunction:

        Otherwise but is a coordinating conjunction, tagged CJC, linking units of the same kind (e.g. clauses or adjective/adverb phrases). Its function is to express contrastive or 'adversative' meaning:

        God and minds do exist , <w CJC>but materially so .
        [ABM.1260]

        And that's it for another week <w CJC>but don't forget the late news at eleven thirty.
        [J1M.2520]

        Hares ( <w CJC>but not rabbits ) are particularly vulnerable...
        [B72.892]

      4. Multiwords

        Note also multiwords such as but for (PRP):

        The fare increases would have been bigger <w PRP>but for the governments last minute intervention.
        [K6D.124]
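
        The choice among the four tags for but can be summarised as a small decision function. This is an illustrative Python sketch only: the function tag_but and its sense labels are hypothetical, and the sense and clause status are assumed to be supplied by a human analyst, since the sketch does not attempt to detect them.

```python
def tag_but(sense, introduces_clause=False):
    """Choose a C5 tag for 'but' from a hand-supplied sense label.

    sense: 'only', 'except', or any other value (treated as contrastive).
    introduces_clause: relevant only to the 'except' sense.
    """
    if sense == "only":
        return "AV0"  # adverb: "but a few minutes"
    if sense == "except":
        # CJS before a clause, PRP before a phrase.
        return "CJS" if introduces_clause else "PRP"
    return "CJC"  # default: coordinating conjunction

print(tag_but("except", introduces_clause=True))
```

        The default branch reflects the statement above that CJC is overwhelmingly the most common use.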



      HOME

      Choice of tags: AV0 and NN1

      As a locative adverb (AV0), home has no preceding determiner or article; where a determiner precedes it, home is a noun (NN1):

      We stayed <w AV0>home.
      [FAP.313]

      This place is my <w NN1>home.
      [AMB.1805]
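
      The determiner test can be sketched as a simple heuristic. The Python fragment below is an assumption-laden illustration rather than the tagger's actual rule: the name tag_home is hypothetical, and it looks only at the tag of the immediately preceding word, so it would miss a determiner separated from home by an adjective.

```python
# C5 tags treated here as "determiner or article": article, possessive
# determiner-pronoun, and general determiner-pronoun.
DETERMINER_TAGS = {"AT0", "DPS", "DT0"}

def tag_home(previous_tag):
    """Return NN1 if the preceding word is a determiner or article, else AV0."""
    return "NN1" if previous_tag in DETERMINER_TAGS else "AV0"

print(tag_home("DPS"))  # as in "my home"
```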



      LIKE

      Choice of tags: PRP AV0 CJS VVB VVI NN1 AJ0

      1. Discoursal function:

        In speech, when like has a discoursal function as a 'hedge', we tag it AV0:

        well she says <w AV0>like, I won't be a minute
        [KCY.1518]

        I'm driving along, you know <w AV0>like <trunc> wha </trunc> when you're in the car by yourself and everything's turning over in your head
        [KBU.1096]

      2. Other functions:

        Like very frequently occurs as a preposition or as a verb. The noun and adjective uses are fairly rare:

        ...but I <w VVB>like Monday best.
        [FU4.1089]

        He didn't look <w PRP>like a goodie.
        [H0M.1353]

        ... fuel, weapons, ground crew and the <w NN1>like.
        [J1N.105 *AJ0-NN1]

        Churchill and Eden were not of <w AJ0>like minds...
        [ACH.1299]



      LITTLE

      Choice of tags: AJ0, DT0, AV0, (also multiwords)

      1. Adjective:

        The meaning of little (AJ0) is the opposite of big:

        Bless their dear <w AJ0>little faces.
        [HRB.722]

        <w AJ0>Little green shoots of recovery are stirring.
        [CEL.968]

      2. Determiner-pronoun:

        The meaning of little (DT0) is 'not much':

        I have <w DT0>little to say.
        [G1Y.1137]

        ...there was <w DT0>little food left.
        [FSJ.721]

      3. Adverb:

        As an adverb (AV0), too, little has the meaning 'not much':

        I care very <w AV0>little about petty-minded, selfish "rules".
        [B0P.211]

      4. A little

        Note that a little can also be a multiword adverb (AV0):

        They are all <w AV0>a little drunk.
        [G0F.2118]

        However, the quantifier a little meaning 'a small amount' is not tagged as a multiword (see footnote 1) but as AT0 + DT0:

        You couldn't let me have <w AT0>a <w DT0>little milk?
        [GUM.1661]

        [See DETERMINER-PRONOUN vs. ADVERB]



      MUCH

      Choice of tags: DT0 AV0

      1. Determiner-pronoun:

        <w DT0>Much of this work has to be done on the spot.
        [C8R.24]

        I've spent too <w DT0>much money.
        [KPV.6261]

      2. Adverb:

        Thanks very <w AV0>much.
        [A73.5]

        I didn't sleep <w AV0>much last night
        [ALH.1495]

        See also DETERMINER-PRONOUN vs. ADVERB



      MORE and LESS

      Choice of tags: DT0 AV0



      NO

      Choice of tags: AT0 NN1 AV0 ITJ

      1. Article

        <w AT0>No <w NN1>problem.
        [H4H.227]

      2. Noun

        As a noun, no is usually an abbreviation for number:

        quoting <w NN1>Ref <w NN1>No <w UNC>BCE90/10/4(NS)
        [CJU.673]

      3. Adverb

        but the matter was taken <w AV0>no <w AV0>further.
        [ARF.183 no: *AT0]

        To put it <w AV0>no <w AV0>more <w AV0>strongly, it has not been proved beyond doubt that....
        [EW7.125]

      4. Interjection:

        No is tagged as an interjection (ITJ) where it functions as the opposite of Yes.

        "...See how easy my job can be?"
        "Frankly, <w ITJ>no".
        [HR4.2329]


      ONE

      Choice of tags: PNI, CRD

      1. Numeral:

        The clearest cases of CRD are:

        In a quantifying noun phrase, typically allowing the substitution of another numerical expression (e.g. one chip contrasts with two chips) or of the digit 1 (1 chip):

        Can I have <w CRD>one chip, please?
        [KDB.1417]

        So are there criticisms? Just <w CRD>one.
        [CG2.1490]

        ... <w CRD>one in five sufferers never tells their partners.
        [CF5.8 *PNI]

        Orford Ness is <w CRD>one of Britain's most unusual coastal features.
        [CF8.86]

        In such noun phrases, one functions like a determiner-pronoun such as some.

      2. Indefinite Pronoun:

        The clearest cases of PNI are:

        (a) As a substitute form, standing for an understood noun or noun phrase:

        The channel was not a broad <w PNI>one
        [AEA.1461]

        In this use, one has a plural form ones.

        (b) As a generic personal pronoun, meaning 'people in general':

        And I think <w PNI>one might go on to argue that far from saving labour it creates it.
        [J17.1915]

      Note that the reliability of the ambiguity tag PNI-CRD (in which the pronoun is rated more likely) is somewhat low. See Error rates, table 2.
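
      An ambiguity tag such as PNI-CRD can be split into its two component tags. The Python sketch below is illustrative only: the name split_ambiguity_tag is hypothetical, and it assumes, as the note on PNI-CRD indicates, that the first-named tag is the one rated more likely.

```python
def split_ambiguity_tag(tag):
    """Return (preferred, alternative); alternative is None for a simple tag."""
    first, separator, second = tag.partition("-")
    return (first, second if separator else None)

print(split_ambiguity_tag("PNI-CRD"))
```

      A simple tag such as CRD passes through unchanged, with no alternative.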


      RIGHT

      Choice of tags: AV0 AJ0 VVB VVI NN1

      As both an adverb (AV0) and an adjective (AJ0) right means the opposite of 'wrong' and also the opposite of 'left'. As a noun, it generally means 'entitlements': e.g. I have a <w NN1>right to know. The uses of right as a verb are very rare.

      Less obvious points:

      1. Discoursal function:

        As a discourse marker, right is tagged AV0:

        <w AV0>Right, how you doing there?
        [KBL.4671]

        <w AV0>Right, er, members, any questions to <pause> the speakers?
        [F7V.139]

      2. Degree adverb (intensifier):

        In dialectal usage, right can be an intensifier, and is tagged AV0:

        it's a ... it's a <w AV0>right soft carpet.
        [KB2.1242-4]


      SO

      Choice of tags: AV0 CJS

      1. In most cases so is tagged as an adverb (AV0):

        "<w AV0>So this is where you work..."
        [H8M.2964]

        Right, <w AV0>so what's fifty three per cent as a decimal?
        [JP4.354]

        They waited but nothing happened <w AV0>so they made a fuss.
        [FU1.2484]

      2. As a pro-form meaning 'thus' or standing for a clause/predicate, so is tagged AV0:

        <w AV0>So say I and <w AV0>so say the folk.
        [G11.230]

        "Yes, I think <w AV0>so."
        [CCM.151]

      3. As a degree adverb or intensifier, so is tagged AV0:

        tough and long lasting -- that's why they're <w AV0>so popular.
        [BN4.940]

        ... there are <w AV0>so many lonely people in hospitals
        [FPS.2227]

      4. Introducing purpose clauses, so is tagged CJS (subordinating conjunction):

        Drink your tea <w CJS>so they can have your cup.
        [KB2.1767]

      5. Note that so is frequently part of a multiword: so that, so far, so as to, (in) so far as, etc. See the list of multiwords.


      THAT

      Choice of tags: DT0 CJT AV0

      1. As a demonstrative (pronoun or determiner), that is tagged DT0:

        <w DT0>That<w VBZ>'s my coat yeah.
        [KBS.1310]

        he's getting hooked on the taste of vaseline, <w DT0>that dog.
        [KCL.197]

      2. As a clause-initiating conjunction, that is tagged CJT:

        This applies to that as a complementizer:

        Many experts claim <w CJT>that it is good for your growing baby, too.
        [G2T.1091]

        and also to that as a relativizer (introducing a relative clause):

        A ship <w CJT>that never enters harbour.
        [BPA.1326]

        This is different from the more traditional analysis which treats that introducing a relative clause as a relative pronoun.

      3. As a degree adverb (intensifier):

        It wasn't all <w AV0>that bad.
        [KPP.322]

      4. In multiwords: That occurs commonly in multiwords such as so that, in that, in order that.



      THEN

      Choice of tags: AV0 AJ0

      In all functions except clear adjectival usage (AJ0, usually following the), then receives the tag AV0:

      And <w AV0>then she spoke.
      [H8T.2675]

      "Come on, <w AV0>then."
      [K8V.1722]

      Mr Willi Brandt, the <w AJ0>then Mayor of West Berlin.
      [A87.84]

      ...the <w AJ0>then state governor , who wasn't <w AV0>then Bill Clinton
      [JSM.131]
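
      The guideline for then can be approximated by checking the preceding word. This Python sketch is a rough heuristic under a stated assumption, namely that adjectival then always directly follows the; the guideline itself says only that it usually does. The name tag_then is hypothetical.

```python
def tag_then(previous_word):
    """Return AJ0 when 'then' directly follows 'the', otherwise AV0."""
    if previous_word is not None and previous_word.lower() == "the":
        return "AJ0"  # "the then Mayor of West Berlin"
    return "AV0"      # "And then she spoke."

print(tag_then("the"))
```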



      TO

      Choice of tags: TO0 PRP AVP

      1. Infinitive marker (TO0):

        Note elliptical uses of the pre-infinitival to, especially in informal spoken texts:

        In the summer holidays, I can, I can get up early if I want <w TO0>to.
        [KPG.4204]

        Note also the common colloquial spelling of want to, got to, and going to as fused words:

        wanna = <w VVB>wan<w TO0>na
        gotta = <w VVN>got<w TO0>ta
        gonna = <w VVG>gon<w TO0>na
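
        The three fused forms listed above can be held in a lookup table. The following Python sketch is illustrative: the names FUSED_FORMS and split_fused are hypothetical, the table covers only the three forms given here, and the pieces are returned in lower case.

```python
# Each fused spelling maps to its two tagged parts, as listed in the guide.
FUSED_FORMS = {
    "wanna": [("wan", "VVB"), ("na", "TO0")],
    "gotta": [("got", "VVN"), ("ta", "TO0")],
    "gonna": [("gon", "VVG"), ("na", "TO0")],
}

def split_fused(token):
    """Return tagged parts for a fused form, or the token itself with no tag."""
    return FUSED_FORMS.get(token.lower(), [(token, None)])

print(split_fused("gonna"))
```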

      2. Preposition (PRP): Prepositions are normally followed by a noun phrase or nominal clause. Where the preposition is 'stranded' (i.e. where the noun phrase associated with the preposition has been moved or ellipted) it can be confused with an adverbial particle:

        That 's the school that Terry goes <w PRP>to.
        [KB8.2443]

        ...what you're entitled <w PRP>to by law is money back
        [FUT.360]

        "Where <w PRP>to?"
        "<w AT0>The moon."
        [FNW.240-1]

      3. The adverbial particle to is extremely rare: it occurs in come to meaning 'regain consciousness'.



      WELL

      Choice of tags: AV0 VVB VVI AJ0 NN1

      1. By far the most common function for well is as an adverb: e.g. She's playing <w AV0>well.

      2. Discoursal function:

        When well has the function of a discourse marker, it is treated as an adverb (AV0):

        Oh <w AV0>well. That'll be the finish.
        [FX6.196-7]

        I bet he doesn't get up till about, <w AV0>well, it's eleven now.
        [KBL.3808]

      3. Degree adverb:

        Well is tagged AV0, too, where it has an intensifying function: e.g.

        It was dark outside and <w AV0>well past your bedtime.
        [ASS.898]

      4. Adjective (AJ0):

        Well is tagged as an adjective where it means 'in good health':

        You don't look <w AJ0>well.
        [HPR.107]

      5. As a verb, well is very rare, and occurs in the phrasal verb well up. NB. This use has not been accurately tagged in the corpus:

        Tears <w VVB>well up in my eyes.
        [BN3.5 *AV0]
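
        The bracketed source references that follow each example have a regular shape and can be parsed with a regular expression. The Python sketch below is an illustration only: the names CITATION and parse_citation are hypothetical, and it assumes that a starred tag (as in *AV0 above) records the tag actually found in the corpus where that tag differs from the one recommended here.

```python
import re

# Text identifier, sentence number, optional range end, optional starred tag
# (optionally preceded by "word:" naming the word the starred tag applies to).
CITATION = re.compile(
    r"\[(?P<text>[A-Z0-9]{3})\.(?P<start>\d+)(?:-(?P<end>\d+))?"
    r"(?:\s+(?:(?P<word>\S+):\s*)?\*(?P<corpus_tag>[A-Z0-9-]+))?\]"
)

def parse_citation(citation):
    """Return the citation's parts as a dict, or None if it does not match."""
    match = CITATION.fullmatch(citation.strip())
    return match.groupdict() if match else None

print(parse_citation("[BN3.5 *AV0]"))
```

        Range ends are kept exactly as written, so the abbreviated end in a reference such as [KB2.1242-4] is returned as the string "4" rather than expanded.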



      WHEN

      Choice of tags: AVQ CJS

      When can introduce three types of clauses: an adverbial clause, a nominal clause, or a relative clause. Where it introduces an adverbial clause, it is tagged CJS. Otherwise it is tagged AVQ. The AVQ tag is also used for when introducing a question. Examples:

      1. Adverbial clause:

        <w CJS>When I got back to my flat, I decided to ring Toby.
        [CS4.1265]

        the crowd left quietly <w CJS>when the police arrived.
        [APP.1017]
        (when = at the time at which)

        If you smoke <w CJS>when you're pregnant...
        [A0J.1600]
        (when = whenever)


        Note that when is also a conjunction (CJS) in abbreviated adverbial clauses which lack a subject and finite verb, such as when in doubt, when ready, when completed.

      2. Nominal clause

        I can't remember <w AVQ>when we last had a frost.
        [KBF.11728]

        "Do you remember <w AVQ>when we used to go with Daddy in the boat on Saturdays?"
        [A6N.2022]

        You never know <w AVQ>when the next big story will break.
        [HJ6.101]
        (when = at what time)

        Before an infinitive, when is also tagged AVQ:

        Otto knew <w AVQ>when to change the subject.
        [FAT.1606]

        Also when the rest of the infinitive clause is understood: Tell me <w AVQ>when.

      3. Relative clause

        in the year <w AVQ>when I was born (when = in which)

        the moment <w AVQ>when he arrived (when = at which)

        Note that when can often be omitted in relative clauses: the moment he arrived.

      4. Direct questions

        <w AVQ>When did you find out?



      WHERE

      Choice of tags: AVQ CJS

      Where is like when in that it can be a wh- adverb (AVQ) or a subordinating conjunction (CJS). However, with where the CJS tag is much less likely than the AVQ tag. Examples:

      1. In adverbial clauses (CJS):

        ...to hit him <w CJS>where it hurts.
        [CEN.2816]

      2. In other contexts it is tagged AVQ:

        • Nominal clause:

          I don't know <w AVQ>where she picked them up.
          [G1D.1163]

        • Relative clauses

          It was the house <w AVQ>where the poor woodcutter lived with Hansel and Gretel
          [CH4.2635]

        • Direct questions:

          <w AVQ>Where are you going?
          [KB9.363]
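
      The guidelines for when and where share one pattern: CJS in adverbial clauses, AVQ elsewhere. The Python sketch below restates that pattern as a lookup table. It is illustrative only: the names and context labels are hypothetical, and the clause type is assumed to have been identified already by an analyst.

```python
# Context labels are invented for this sketch; the tags follow the guidelines
# given for WHEN and WHERE.
WH_TAG_BY_CONTEXT = {
    "adverbial_clause": "CJS",
    "nominal_clause": "AVQ",
    "relative_clause": "AVQ",
    "direct_question": "AVQ",
}

def tag_when_or_where(context):
    """Return the C5 tag for 'when' or 'where' in the given clause context."""
    if context not in WH_TAG_BY_CONTEXT:
        raise ValueError(f"unknown context: {context!r}")
    return WH_TAG_BY_CONTEXT[context]

print(tag_when_or_where("adverbial_clause"))
```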



      WORTH

      Choice of tags: PRP NN1

      1. Preposition

        Worth is tagged PRP where it could answer the question 'How much is ... worth?' or 'What is ... worth?'

        these pictures are <w PRP>worth a small fortune.
        [FNT.1060]

        That makes him <w PRP>worth about $60m.
        [CT3.479]

        'Darling, it's not <w PRP>worth getting upset.
        [HH9.2310]

        Worth also occurs as a 'stranded preposition' in questions used to elicit such responses, and in some other common constructions:

        how much d'ya think it's <w PRP>worth?
        [KCX.1344]

        share prices say nothing about what a company is <w PRP>worth.
        [A9U.305 *NN1]

        Please go ahead and push Grapevine for all you are <w PRP>worth.
        [AP1.575]

      2. Noun

        Worth is tagged NN1 when it is an obvious noun (meaning 'value'). Typically this occurs following expressions of quantity, whether or not the quantity is expressed by a possessive or genitive (e.g. its, 's).

        Baker showed his <w NN1>worth for Ipswich in the 20th minute
        [CF9.103]

        hundreds of pounds' <w NN1>worth of damage.
        [A0H.15]

        2,500 <w NN1>WORTH OF PRIZES
        [ECJ.1147]


      SECTION 5: Features of spoken corpus tagging

      The spoken and written texts of the BNC have been tagged in the same way, except that the following phenomena occur almost entirely in the spoken part of the corpus.

      Footnotes

      1 In BNC version 1, the quantifier a little meaning 'a small amount' was sometimes (but not reliably) tagged as a multiword DT0. See multiword list for differences in multiword tag assignment from the earlier tagging of the corpus.

      2 In our experience, human analysts too sometimes have difficulty resolving ambiguities such as these, especially when using the plain orthographic transcriptions of the BNC, and with no direct access to the original sound recordings.

      Date: 17 March 2000