Next: 2. An Introduction to Up: Fast Transformation-Based Learning Toolkit Previous: Contents   Contents

1. Preface

The TBL toolkit presented here is the result of two years of work by the NLP group at Johns Hopkins University. It includes ideas from three papers presented by the authors in 2000-2001.

The goal of this toolkit is rather broad: to handle any discrete classification task, and to offer a quick and simple way to obtain either baseline results for a particular task or a classifier that performs the task at hand. For the toolkit to be able to address a problem, the problem must first be cast as a classification problem, where the goal is to assign classifications to a set of samples drawn from some vector space.
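As a rough illustration of what casting a task as a classification problem can look like, the sketch below turns a POS-tagged sentence into one sample per word, each sample being a small feature vector plus the classification to be assigned. The `Sample` type, the `pos_samples` helper, the choice of features and the boundary markers are all hypothetical and chosen only for this example; they are not the toolkit's own data structures or file format.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    """Hypothetical sample: a point in a discrete feature space plus its class."""
    features: Tuple[str, ...]  # here: (previous word, current word, next word)
    label: str                 # the classification to be assigned


def pos_samples(tagged_sentence: List[Tuple[str, str]]) -> List[Sample]:
    """Turn a list of (word, tag) pairs into one Sample per word."""
    words = [word for word, _ in tagged_sentence]
    samples = []
    for i, (word, tag) in enumerate(tagged_sentence):
        # Assumed boundary markers for positions outside the sentence.
        prev_word = words[i - 1] if i > 0 else "<s>"
        next_word = words[i + 1] if i + 1 < len(words) else "</s>"
        samples.append(Sample((prev_word, word, next_word), tag))
    return samples


sentence = [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")]
samples = pos_samples(sentence)
for sample in samples:
    print(sample.features, "->", sample.label)
```

Any task that can be described this way, as a set of feature vectors each needing a discrete label, fits the kind of problem the toolkit is meant to address.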

1.1 How to Read this Report

The report first introduces the transformation-based learning paradigm in Section 2. Section 3 describes the structure of the system, including the file formats used and the conventions for declaring the rule types. Section 4 briefly describes how to launch the toolkit's major applications (rule list training and rule list application), and touches on the interactions that can appear between rules. Section 5 describes some test cases. Readers who are mostly interested in POS tagging, for instance, should read this section, mainly Section 5.2 and the sections that follow it. For readers with very little time who are looking for POS tagging, Section 5.2.3 describes the 2 scripts that will do most of the work for this task. Also presented are base NP chunking (Section 5.1) and word sense disambiguation (Section 5.3), the latter being presented as a case where samples are independent. Finally, the Appendix describes all the parameters that can be defined in the parameter file (Section A.1) and all the scripts that are part of the distribution, together with all their parameters (Section A.2).

If you would rather get straight down to business, go to the directory test-cases of the distribution, pick the problem closest to yours, read the short README file in that directory (which mostly lists the commands to run for the problem), copy the template files from that directory, adapt the rules to fit your problem, and take it from there. Come back to this report if you have trouble with the programs or want to find the flag that controls a particular behavior.

1.2 Contacting the Authors

If you have problems with the code (it does not compile, you found a bug, you cannot get it to work, you want to extend it, etc.), you can contact the authors at:

rflorian@cs.jhu.edu
gyn@cs.jhu.edu
fnTBLtk@nlp.cs.jhu.edu

The last address is the mailing list address: mail sent there is forwarded to everyone subscribed to the list, so it may be the fastest way to get your problem fixed. If you want to submit changes to the program, send a patch file to rflorian@cs.jhu.edu together with a description of the changes that were made, and we will include the patch (and your name) in the next release of the code.


Radu Florian 2001-09-12