News from the LDC: Membership Year 1999

Membership Year 1999

28 August, 1998

Dear Member,

The LDC is pleased to announce the opening of the 1999 membership year, beginning on September 1. To date, the LDC has released more than 140 corpora in more than 20 languages to more than 750 organizations worldwide. The 1998 membership year saw the release of 22 new corpora. We have a very ambitious publications schedule for 1999 as well, and I have provided a partial list below.

Past membership years ran from September 1 through August 31. We are discontinuing that practice this year. The 1999 membership year will run sixteen months, from September 1, 1998 through December 31, 1999. Beginning January 1, 2000, the membership year will match the calendar year.

Once your organization joins the LDC for a specific membership year, it has rights to one copy of each corpus released in that membership year. You may exercise those rights at any time in the current membership year or thereafter. Current members have the added benefit of being able to purchase additional copies of current releases and releases from closed membership years for $100 per CD-ROM. Current members may also use LDC On-Line, which provides World Wide Web access to the LDC's speech and text corpora.

The publications planned for release in membership year 1999 include:

  • Enhanced pronouncing lexicons in Egyptian Arabic, English, Mandarin, Manding, Spanish, and Yoruba.

  • Newswire collections in French, Japanese, Korean, Mandarin, Portuguese, Spanish, and Thai.

  • The SUSAS (Speech Under Simulated and Actual Stress) database was collected by Duke University in support of research and development in speech recognition under conditions of noise and stress.

  • Switchboard-2 Phase II, collected by the LDC in early 1997, contains nearly 4,000 five-minute conversations among college students from the Midwestern United States and should benefit research in speech recognition and speaker identification.

  • TDT2 Text was collected by the LDC from January through June 1998 in support of a DARPA-sponsored research project on Topic Detection and Tracking. TDT2 Text contains approximately 60,000 English news stories from six different television, radio, and newswire sources, with each story tagged for its relevance to each of 100 news topics. TDT2 is the largest corpus the LDC has collected in support of information retrieval research.

  • UCSB Corpus of Spoken American English, collected under the direction of John DuBois at UC Santa Barbara, contains speech and transcripts from a variety of conversational settings, including, for example, a telephone conversation between boyfriend and girlfriend, a conversation between mother and daughter in their car, and instruction in a judo class.

  • USC Marketplace Speech and Transcripts, recorded in 1996, contains 50 hours of broadcast news programs and their transcripts. Marketplace should be useful for research in speech recognition and spoken document retrieval.

  • Westpoint Arabic Speech Corpus is part of the US Military Academy's Project Santiago, in which researchers build acoustic models and lexicons to deliver speech recognition technology to computer-assisted language learning.

    The operations of the LDC are closely tied to the evolving needs of the research and development community that it supports. Since research programs increasingly depend upon access to shared data, LDC membership fees have been set at affordable levels and have not changed since the LDC began operations in 1992. Membership is open to researchers around the world. If you would like to receive more information or request a quote for the upcoming membership year, please do not hesitate to contact me. I can be reached by e-mail at ldc@ldc.upenn.edu or by telephone at (215) 898-0464. It has been a pleasure working with you through the past membership years, and we look forward to continuing to work with you in the future.

    Sincerely,
    Shannon Sears
    Member Relations Coordinator