Name                Last modified      Size   Description
src/                2007-04-21 05:54   -
build.xml           2006-04-05 15:11   1.9K   Ant build file
creole.xml          2006-04-05 15:11   1.4K
LICENCE.html        2006-04-05 15:12   28K
README.html         2006-04-05 15:12   22K
MtlTransducer.jar   2007-04-21 05:23   109K   Java Archive
The GATE framework comes with a basic "Jape Transducer", which is fully described in the GATE User Guide along with the JAPE grammar language it understands. There is also an "Ontology Aware Transducer", which is a wrapper around the Jape Transducer (in fact, the latter's core is already ontology aware). Finally, there is an "ANNIE Transducer", which is nothing more than a Jape Transducer that comes loaded with a named-entity recognition grammar.
The Montreal Transducer is an improved Jape Transducer. It is intended to make grammar authoring easier by providing a more flexible version of the JAPE language and it also fixes a few bugs.
If you write JAPE grammars, see section Changes to the JAPE language for all the details. Otherwise, here is a short description of the enhancements:
The Montreal Transducer sources are freely available, so user support will be very limited. You may find what you are looking for on the project homepage.
Developers will find comments on classes and methods through the javadoc pages: doc/javadoc/index.html.
Note that the directory must be accessible to the embedding application through the "file:" protocol. Unlike most GATE modules, a transducer's directory (known as a repository in GATE 2.x) cannot be a web URL ("http://www..."). This is because the transducer compiles Java code (the grammar rules) every time it is loaded, the resource jar file must be on the classpath at compile time, and only regular file URLs are allowed on the classpath. The resource will try to add the jar file to the classpath automatically.
If problems arise when loading the transducer, add the jar file to the classpath manually prior to running the application.
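The "file:" restriction can be illustrated with a small check. This is a minimal sketch, not the transducer's own code: the class `RepositoryCheck`, its method `toClasspathEntry` and the assumption that the jar sits directly inside the repository directory are all hypothetical. It accepts only `file:` URLs, since those are the only ones that can become classpath entries for the Java compiler.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

public class RepositoryCheck {
    // Hypothetical helper: turns a repository URL into a classpath entry
    // for the transducer jar, rejecting anything that is not a "file:" URL.
    static String toClasspathEntry(String repositoryUrl, String jarName) {
        URL url;
        try {
            url = new URL(repositoryUrl);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Malformed URL: " + repositoryUrl, e);
        }
        if (!"file".equals(url.getProtocol())) {
            // Web URLs cannot be put on the compiler classpath
            throw new IllegalArgumentException(
                "Repository must be a file: URL, got " + repositoryUrl);
        }
        // Assumption: the jar lives directly inside the repository directory
        return new File(url.getPath(), jarName).getPath();
    }

    public static void main(String[] args) {
        System.out.println(toClasspathEntry("file:/opt/MtlTransducer/build", "MtlTransducer.jar"));
        try {
            toClasspathEntry("http://www.example.org/MtlTransducer/", "MtlTransducer.jar");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```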
If you plan to use the transducer with the GATE GUI, see section How to use it with the GATE GUI. If you plan to use it in a standalone program, jump to section How to use it in a standalone GATE program.
GATE 3.0: In the GUI menu, click on File / Manage CREOLE plugins, find the Montreal Transducer and tick the "Load now" or "Load always" box.
Then, for all versions of GATE: Click on File / New processing resource and choose Montreal Transducer. The only mandatory field is the Grammar URL: enter the path of a main.jape file in the same manner as for a regular Jape Transducer (this URL can point to a file on the web). Add the new module to a processing pipeline. It may be necessary to run a tokeniser and gazetteer before the transducer if the grammar uses Token and Lookup annotations.
A good starting point is the example code here. The following code first registers a repository, that is, the directory where the MtlTransducer.jar and creole.xml files live (this directory cannot be a web URL; see Installation procedure). It then creates a Montreal Transducer with specific parameters: the grammarURL parameter is mandatory and should point to a main.jape file, as for a regular Jape Transducer. Finally, it adds the resource to a pipeline. It may be necessary to run a tokeniser and gazetteer before the transducer if the grammar uses Token and Lookup annotations.
import gate.Factory;
import gate.FeatureMap;
import gate.Gate;
import gate.ProcessingResource;
import gate.creole.SerialAnalyserController;
import java.net.URL;

// Initialise GATE once, before any resource is created
Gate.init();
// Create a pipeline
SerialAnalyserController annieController = (SerialAnalyserController)
    Factory.createResource("gate.creole.SerialAnalyserController",
        Factory.newFeatureMap(), Factory.newFeatureMap(),
        "ANNIE_" + Gate.genSym());
// Load a tokeniser, gazetteer, etc. here
// Register the external repository where the Montreal Transducer
// jar file lives (it must be a "file:" URL, not a web URL)
Gate.getCreoleRegister().registerDirectories(new URL("file:MtlTransducer/build"));
// Create an instance of the transducer after having set the grammar URL
FeatureMap params = Factory.newFeatureMap();
params.put("grammarURL", new URL("file:creole/NE/main.jape"));
params.put("inputASName", "Original markups");
ProcessingResource transducerPR = (ProcessingResource)
    Factory.createResource("ca.umontreal.iro.rali.gate.MtlTransducer", params);
annieController.add(transducerPR);
The Montreal Transducer offers more comparison operators for the left-hand-side constraints of a JAPE grammar. The standard ANNIE transducer only allows constraints like these:
Notes on equality operators: "==" and "!="
The "!=" operator is the negation of the "==" operator, that is to say: {Annot.attribute != value} is equivalent to {!Annot.attribute == value}.
When a constraint on an attribute cannot be evaluated because an annotation does not have a value for the attribute, the equality operator returns false (and the difference operator returns true).
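The two rules above (negation and missing attributes) can be modelled in a few lines. This is a minimal sketch, assuming an annotation's features are held in a `Map<String, Object>`; the class and method names are hypothetical, and the type-specific behaviour is reduced to `Object.equals`.

```java
import java.util.HashMap;
import java.util.Map;

public class EqualityOperators {
    // "==": false when the annotation has no value for the attribute
    static boolean eq(Map<String, Object> features, String attribute, Object constraintValue) {
        Object annotValue = features.get(attribute);
        if (annotValue == null) {
            return false;
        }
        // The constraint's value is the receiver, the annotation's value the argument
        return constraintValue.equals(annotValue);
    }

    // "!=": defined as the negation of "==", so a missing attribute yields true
    static boolean neq(Map<String, Object> features, String attribute, Object constraintValue) {
        return !eq(features, attribute, constraintValue);
    }

    public static void main(String[] args) {
        Map<String, Object> token = new HashMap<String, Object>();
        token.put("string", "dog");
        System.out.println(eq(token, "string", "dog"));   // true
        System.out.println(neq(token, "string", "cat"));  // true
        System.out.println(eq(token, "kind", "word"));    // false: no "kind" attribute
        System.out.println(neq(token, "kind", "word"));   // true
    }
}
```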
If the constraint's attribute is a string, then the String.equals method is called with the annotation's attribute as a parameter. If the constraint's attribute is an integer, then the Long.equals method is called. If the constraint's attribute is a float, then the Double.equals method is called. And if the constraint's attribute is a boolean, then the Boolean.equals method is called. The grammar parser does not allow other types of constraints.
Normally, when the types of the constraint's and the annotation's attribute differ, they cannot be equal. However, because some ANNIE processing resources (namely the tokeniser) set all attribute values as strings even when they are numbers (Token.length is set to a string value, for example), the Montreal Transducer can convert the string to a Long/Double/Boolean before testing for equality. In other words, for the token "dog":
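The type-based dispatch and the string conversion described above can be sketched as follows, assuming the annotation's value may arrive as a `String` even when the constraint's value is a `Long`, `Double` or `Boolean`. The class and method names are hypothetical, and accepting only "true"/"false" when converting to `Boolean` is an assumption rather than documented behaviour.

```java
public class TypedEquality {
    // Converts a string attribute value to the constraint's type when possible;
    // returns the value unchanged otherwise.
    static Object coerce(Object constraintValue, Object annotValue) {
        if (!(annotValue instanceof String)) {
            return annotValue;
        }
        String s = (String) annotValue;
        try {
            if (constraintValue instanceof Long) {
                return Long.valueOf(s.trim());
            }
            if (constraintValue instanceof Double) {
                return Double.valueOf(s.trim());
            }
        } catch (NumberFormatException e) {
            return annotValue; // not a number: the types differ, so never equal
        }
        if (constraintValue instanceof Boolean
                && (s.equalsIgnoreCase("true") || s.equalsIgnoreCase("false"))) {
            return Boolean.valueOf(s);
        }
        return annotValue;
    }

    // String.equals, Long.equals, Double.equals or Boolean.equals is selected
    // by the runtime type of the constraint's value.
    static boolean eq(Object constraintValue, Object annotValue) {
        if (annotValue == null) {
            return false;
        }
        return constraintValue.equals(coerce(constraintValue, annotValue));
    }

    public static void main(String[] args) {
        // The tokeniser stores Token.length for "dog" as the string "3"
        Object length = "3";
        System.out.println(eq(Long.valueOf(3), length));     // true after conversion
        System.out.println(eq("3", length));                 // true, plain String.equals
        System.out.println(eq(Long.valueOf(4), length));     // false
        System.out.println(eq(Double.valueOf(3.0), length)); // true, "3" parses as 3.0
    }
}
```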
If the constraint's attribute is a string, then the String.compareTo method is called with the annotation's attribute as a parameter (strings can be compared alphabetically). If the constraint's attribute is an integer, then the Long.compareTo method is called. If the constraint's attribute is a float, then the Double.compareTo method is called. The transducer issues a warning if an attempt is made to compare two Boolean values, because Boolean did not implement the Comparable interface before Java 5 and thus had no compareTo method.
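A sketch of this dispatch, calling `compareTo` on the constraint's value with the annotation's value as the argument. The names are hypothetical, and treating an incomparable pair as a non-match is an assumption. Since `Boolean` does implement `Comparable` on Java 5 and later, the sketch checks for it explicitly to mirror the documented warning.

```java
public class ComparisonOperators {
    // Mirrors the documented call constraint.compareTo(annotationValue).
    // Returns null, after a warning, when no compareTo can be used.
    @SuppressWarnings("unchecked")
    static Integer constraintCompareTo(Object constraintValue, Object annotValue) {
        if (constraintValue instanceof Boolean) {
            System.err.println("Warning: Boolean values cannot be compared");
            return null;
        }
        if (annotValue == null || annotValue.getClass() != constraintValue.getClass()) {
            System.err.println("Warning: cannot compare " + annotValue + " with " + constraintValue);
            return null;
        }
        // String, Long and Double each supply their own compareTo
        return Integer.valueOf(((Comparable<Object>) constraintValue).compareTo(annotValue));
    }

    // {Annot.attrib > value}: the annotation's value is greater than the constraint's
    static boolean greaterThan(Object constraintValue, Object annotValue) {
        Integer c = constraintCompareTo(constraintValue, annotValue);
        return c != null && c.intValue() < 0;
    }

    public static void main(String[] args) {
        System.out.println(greaterThan(Long.valueOf(2), Long.valueOf(5)));         // true: 5 > 2
        System.out.println(greaterThan("apple", "banana"));                        // true: alphabetical
        System.out.println(greaterThan(Double.valueOf(2.0), Double.valueOf(1.5))); // false
        System.out.println(greaterThan(Boolean.TRUE, Boolean.FALSE));              // false, with a warning
    }
}
```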
The transducer issues a warning when it encounters an annotation's attribute that cannot be compared to the constraint's attribute because the value types are different, or because one value is null. For example, given a constraint {MyAnnot.attrib > 2}, a warning is issued for any MyAnnot in the document for which attrib is not an integer, such as attrib = "dog" because we cannot evaluate "dog" > 2. Similarly, {MyAnnot.attrib > 2} cannot be compared to attrib = 2.5 because 2.5 is a float. In this case, force 2 as a float with {MyAnnot.attrib > 2.0}.
The transducer does not issue a warning when the constraint's attribute is an integer/float and the annotation's attribute is a string but can be parsed as an integer/float. Some ANNIE processing resources (namely the tokeniser) set all attribute values as strings even when they are numbers (Token.length is set to a string value, for example), and because {Token.length < "10"} would lead to an alphabetical comparison, a workaround was needed so we could write {Token.length < 10}.
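The need for this workaround is easy to demonstrate: compared as strings, "9" sorts after "10", whereas compared as numbers it is smaller. The sketch below (hypothetical names; the parsing rule is an assumption about how such a conversion could be done) parses a string attribute as the constraint's numeric type before comparing, and reports an integer constraint against a float value, as in the `{MyAnnot.attrib > 2}` versus `attrib = 2.5` case above, as incomparable.

```java
public class NumericStringComparison {
    // Parses a string annotation value as the constraint's numeric type;
    // any other value is returned unchanged.
    static Object coerce(Object constraintValue, Object annotValue) {
        if (annotValue instanceof String) {
            try {
                if (constraintValue instanceof Long) {
                    return Long.valueOf((String) annotValue);
                }
                if (constraintValue instanceof Double) {
                    return Double.valueOf((String) annotValue);
                }
            } catch (NumberFormatException e) {
                // Not numeric: leave it as a string, the types will not match
            }
        }
        return annotValue;
    }

    // {Annot.attrib < value}; null means "cannot be compared" (the warning case)
    @SuppressWarnings("unchecked")
    static Boolean lessThan(Object constraintValue, Object annotValue) {
        Object v = coerce(constraintValue, annotValue);
        if (v == null || v.getClass() != constraintValue.getClass()) {
            System.err.println("Warning: cannot compare " + v + " with " + constraintValue);
            return null;
        }
        // annotation value < constraint value  <=>  constraint.compareTo(value) > 0
        return Boolean.valueOf(((Comparable<Object>) constraintValue).compareTo(v) > 0);
    }

    public static void main(String[] args) {
        System.out.println(lessThan("10", "9"));                          // false: alphabetical order
        System.out.println(lessThan(Long.valueOf(10), "9"));              // true: numeric after parsing
        System.out.println(lessThan(Long.valueOf(3), Double.valueOf(2.5))); // null: Long vs Double
        System.out.println(lessThan(Double.valueOf(3.0), Double.valueOf(2.5))); // true
    }
}
```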
Notes on pattern matching operators: "=~" and "!~"
The "!~" operator is the negation of the "=~" operator, that is to say: {Annot.attribute !~ "value"} is equivalent to {!Annot.attribute =~ "value"}.
When a constraint on an attribute cannot be evaluated because an annotation does not have a value for the attribute, the value defaults to an empty string ("").
The regular expression must be enclosed in double quotes, otherwise the transducer issues a warning:
To have a match, the regular expression must cover the entire attribute string, not only a part of it. For example:
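Assuming the patterns follow java.util.regex syntax (this document does not say which engine is used), the whole-string rule corresponds to `Matcher.matches()` rather than `Matcher.find()`. The hypothetical helper below also applies the empty-string default for missing attributes and defines "!~" as the negation of "=~".

```java
import java.util.regex.Pattern;

public class PatternOperators {
    // "=~": the regular expression must cover the entire attribute value;
    // a missing value is treated as the empty string.
    static boolean matches(String annotValue, String regex) {
        String value = (annotValue == null) ? "" : annotValue;
        return Pattern.compile(regex).matcher(value).matches();
    }

    // "!~": negation of "=~"
    static boolean notMatches(String annotValue, String regex) {
        return !matches(annotValue, regex);
    }

    public static void main(String[] args) {
        System.out.println(matches("dog", "do"));                        // false: only a prefix matches
        System.out.println(Pattern.compile("do").matcher("dog").find()); // true: a partial match
        System.out.println(matches("dog", "do.*"));                      // true: whole string covered
        System.out.println(matches(null, "[a-z]*"));                     // true: missing value is ""
        System.out.println(notMatches(null, "[a-z]+"));                  // true: "" has no letters
    }
}
```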
Bindings: when a constraint contains both negated and regular elements, the negated elements do not affect the bindings of the regular elements. Thus, {Person, !Organization} binds to the same annotations (among those that start at the current node in the annotation graph) as {Person}; the difference between the two is that the first simply does not match if one of the annotations starting at the current node is an Organization. On the other hand, when a constraint contains only negated elements, such as {!Organization}, it binds to all annotations starting at the current node. It is important to keep this in mind, especially when a rule ends with a constraint that has only negated elements: the longest annotation at the current node will be preferred.
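The binding rules above can be modelled in a simplified form. This is a sketch under stated assumptions, not the transducer's algorithm: annotations starting at the current node are reduced to a type and an end offset, the class and method names are hypothetical, and only the negation and "longest annotation" rules are represented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NegatedBindings {
    // Hypothetical, simplified annotation starting at the current node
    static final class Annot {
        final String type;
        final int end;
        Annot(String type, int end) { this.type = type; this.end = end; }
        @Override public String toString() { return type + "(end=" + end + ")"; }
    }

    // Returns the annotations a constraint binds to at the current node;
    // an empty list means the constraint does not match.
    static List<Annot> bind(List<Annot> atNode, List<String> required, List<String> negated) {
        for (Annot a : atNode) {
            if (negated.contains(a.type)) {
                return new ArrayList<Annot>(); // a negated element fails the whole constraint
            }
        }
        List<Annot> bound = new ArrayList<Annot>();
        for (Annot a : atNode) {
            // With only negated elements, every annotation at the node is bound
            if (required.isEmpty() || required.contains(a.type)) {
                bound.add(a);
            }
        }
        return bound;
    }

    // Among the bound annotations, the longest one is preferred
    static Annot longest(List<Annot> bound) {
        Annot best = null;
        for (Annot a : bound) {
            if (best == null || a.end > best.end) best = a;
        }
        return best;
    }

    public static void main(String[] args) {
        List<Annot> atNode = Arrays.asList(
            new Annot("Token", 5), new Annot("Person", 12), new Annot("Lookup", 20));
        List<String> none = new ArrayList<String>();
        // {Person, !Organization} binds exactly what {Person} binds
        System.out.println(bind(atNode, Arrays.asList("Person"), Arrays.asList("Organization"))); // [Person(end=12)]
        // {!Organization} binds every annotation; the longest one wins
        System.out.println(longest(bind(atNode, none, Arrays.asList("Organization"))));          // Lookup(end=20)
    }
}
```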
Conjunctions of constraints on different types of annotation
The Montreal Transducer allows constraints on different types of annotation. Although the JAPE implementation described in the GATE 2.1 User Guide details an algorithm that would allow such constraints, the ANNIE transducer does not implement it; this transducer does. These examples do not work as expected with the ANNIE transducer but do work with this transducer:
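A simplified sketch of what such a conjunction means, assuming that each element of the constraint must be satisfied by some annotation starting at the current node, whatever its type. The classes, the `matches` method and the feature values used in `main` (ANNIE-style `Token.orth` and `Lookup.majorType`) are illustrative only and do not reproduce the algorithm from the GATE User Guide.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TypeConjunction {
    // Hypothetical annotation: a type plus its feature map
    static final class Annot {
        final String type;
        final Map<String, Object> features;
        Annot(String type, Map<String, Object> features) { this.type = type; this.features = features; }
    }

    // Hypothetical constraint element: Type.attribute == value
    static final class Element {
        final String type, attribute;
        final Object value;
        Element(String type, String attribute, Object value) {
            this.type = type; this.attribute = attribute; this.value = value;
        }
        boolean accepts(Annot a) {
            return a.type.equals(type) && value.equals(a.features.get(attribute));
        }
    }

    // The conjunction holds when every element is satisfied by some annotation
    // starting at the current node; the elements may name different types.
    static boolean matches(List<Annot> atNode, List<Element> conjunction) {
        for (Element e : conjunction) {
            boolean satisfied = false;
            for (Annot a : atNode) {
                if (e.accepts(a)) { satisfied = true; break; }
            }
            if (!satisfied) return false;
        }
        return true;
    }

    static Map<String, Object> feature(String name, Object value) {
        Map<String, Object> m = new HashMap<String, Object>();
        m.put(name, value);
        return m;
    }

    public static void main(String[] args) {
        List<Annot> atNode = Arrays.asList(
            new Annot("Token", feature("orth", "upperInitial")),
            new Annot("Lookup", feature("majorType", "location")));
        // {Token.orth == upperInitial, Lookup.majorType == location}
        System.out.println(matches(atNode, Arrays.asList(
            new Element("Token", "orth", "upperInitial"),
            new Element("Lookup", "majorType", "location"))));  // true
        System.out.println(matches(atNode, Arrays.asList(
            new Element("Token", "orth", "upperInitial"),
            new Element("Lookup", "majorType", "person"))));    // false
    }
}
```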
Greedy Kleene operators: "*" and "+"
The ANNIE transducer does not behave consistently regarding the "*" and "+" Kleene operators. Suppose we have the following rule with 2 bindings:
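As an analogy only (this is java.util.regex, not JAPE, and the helper name is hypothetical), here is what "greedy" usually means when two bindings follow each other: the first operator takes as many elements as it can while still letting the whole pattern match, and the second gets what is left. A reluctant operator is shown for contrast.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyBindings {
    // Returns the text captured by the two groups, or null when the
    // pattern does not cover the whole input.
    static String[] split(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Greedy "*": the first binding takes everything
        String[] star = split("(a*)(a*)", "aaa");
        System.out.println(star[0] + "|" + star[1]);         // aaa|
        // Greedy "+": the first binding leaves only the minimum for the second
        String[] plus = split("(a+)(a+)", "aaa");
        System.out.println(plus[0] + "|" + plus[1]);         // aa|a
        // Reluctant "*?" for contrast: the first binding takes nothing
        String[] reluctant = split("(a*?)(a*)", "aaa");
        System.out.println(reluctant[0] + "|" + reluctant[1]); // |aaa
    }
}
```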
Basically, the Montreal Transducer source code and binaries are free. A work that would be a modification of it should also be free. However, a work that would only USE the Montreal Transducer would be exempt from the terms of the licence, provided that the GATE and Montreal Transducer binaries, source code and licence are distributed with the embedding work, and provided that the use of this software is acknowledged. For additional help on the interpretation of the GATE licence, see http://www.gate.ac.uk/gate/doc/index.html.
1.1:
- Bug fixed: a constraint with multiple negated tests on the same attribute
of a given annotation type would match when at least one test succeeds,
but it should match only when ALL negated tests succeed.
1.0:
- Initial release.