GATE Version 3.1 release (April 2006)
1. Major new features
1.1. Support for UIMA
UIMA (http://www.research.ibm.com/UIMA/) is a language processing framework
developed by IBM. UIMA and GATE share some functionality but are
complementary in most respects. GATE now provides an interoperability layer
to allow UIMA applications to include GATE components in their processing
and vice-versa. For full information, see chapter 14 of the User Guide.
1.2. New Ontology API
The ontology layer has been rewritten in order to provide an abstraction
layer between the model representation and the tools used for input and
output of the various representation formats. An implementation that uses
Jena 2 (http://jena.sourceforge.net/ontology) for reading and writing OWL
and RDF(S) is provided.
1.3. Ontotext Japec Compiler
Japec is a compiler for JAPE grammars developed by Ontotext Lab. It has some
limitations compared to the standard JAPE transducer implementation, but can
run JAPE grammars up to five times as fast. By default, GATE still uses the
stable JAPE implementation, but if you want to experiment with Japec, see
section 9.27 of the User Guide.
2. Other new features and improvements
-
Addition of a new JAPE matching style ”all”. This is similar to Brill, but
once all rules from a given start point have matched, the matching will
continue from the next offset to the current one, rather than from the
position in the document where the longest match finishes. More details can
be found in Section 7.2.
-
Limited support for loading PDF and Microsoft Word document formats. Only
the text is extracted from the documents, no formatting information is
preserved.
-
The Buchart parser has been deprecated and replaced by a new plugin called
SUPPLE - the Sheffield University Prolog Parser for Language Engineering.
Full details, including information on how to move your application from
Buchart to SUPPLE, is in section 9.12.
-
The Hepple POS Tagger is now open-source. The source code has been
included in the GATE distribution, under src/hepple/postag. More
information about the POS Tagger can be found in Section 8.4.
-
Minipar is now supported on Windows. minipar-windows.exe, a modified
version of pdemo.cpp is added under the gate/plugins/minipar directory to
allow users to run Minipar on windows platform. While using Minipar on
Windows, this binary should be provided as a value for miniparBinary
parameter. For full information on Minipar in GATE, see section 9.10 of
the User Guide.
-
The XmlGateFormat writer(Save As Xml from GATE GUI, gate.Document.toXml()
from GATE API) and reader have been modified to write and read GATE
annotation IDs. For backward compatibility reasons the old reader has been
kept. This change fixes a bug which manifested in the following situation:
If a GATE document had annotations carrying features of which values were
numbers representing other GATE annotation IDs, after a save and a reload
of the document to and from XML, the former values of the features could
have become invalid by pointing to other annotations. By saving and
restoring the GATE annotation ID, the former consistency of the GATE
document is maintained. For more information, see Section 6.5.2 of the
User Guide.
-
The NP chunker and chemistry tagger plugins have been updated. Mark
Greenwood has relicenced them under the LGPL, so their source code has
been moved into the GATE distribution. See sections 9.3 and 9.15 for
details.
-
The Tree Tagger wrapper has been updated with an option to be less strict
when characters that cannot be represented in the tagger’s encoding are
encountered in the document. Details are in section 9.7.
-
JAPE Transducers can be serialized into binary files. The option to load
serialized version of JAPE Transducer (an init-time parameter
binaryGrammarURL) is also implemented which can be used as an alternative
to the parameter grammarURL. More information can be found in Section 7.7.
-
On Mac OS, GATE now behaves more ‘naturally’. The application menu items
and keyboard shortcuts for About and Preferences now do what you would
expect, and exiting GATE with command-Q or the Quit menu item properly
saves your options and current session.
-
Updated versions of Weka (3.4.6) and Maxent (2.4.0).
-
Optimisation in gate.creole.ml: the conversion of AnnotationSet into ML
examples is now faster.
-
It is now possible to create your own implementation of Annotation, and
have GATE use this instead of the default implementation. See
AnnotationFactory and AnnotationSetImpl in the gate.annotation package for
details.
3. Bug fixes
-
The Tree Tagger wrapper has been updated in order to run under Windows.
See 9.7.
-
The SUPPLE parser has been made more user-friendly. It now produces more
helpful error messages if things go wrong. Note that you will need to
update any saved applications that include SUPPLE to work with this
version - see section 9.12 of the User Guide for details.
-
Miscellaneous fixes in the Ontotext JapeC compiler.
-
Optimization : the creation of a Document is much faster.
-
Google plugin: The optional pagesToExclude parameter was causing a
NullPointerException when left empty at run time. Full details about the
plugin functionality can be found in section 9.20.
-
Minipar, SUPPLE, TreeTagger: These plugins that call external processes
have been fixed to cope better with path names that contain spaces. Note
that some of the external tools themselves still have problems handling
spaces in file names, but these are beyond our control to fix. If you want
to use any of these plugins, be sure to read the documentation to see if
they have any such restrictions.
-
When using a non-default location for GATE configuration files, the
configuration data is saved back to the correct location when GATE exits.
Previously the default locations were always used.
-
Jape Debugger: ConcurrentModificationException in JAPE debugger. The JAPE
debugger was generating a ConcurrentModificationException during an attempt
to run ANNIE. There is no exception when running without the debugger
enabled. As result of fixing one unnesesary and incorrect callback to
debugger was removed from SinglePhaseTransducer class.
-
Plus many other small bugfixes...