MaltParser
Alex Rudnick edited this page Nov 22, 2013
Not so hard:
- Download it here: http://maltparser.org/download.html
- The fastest way to get going is to grab a pre-trained model from here: http://maltparser.org/mco/mco.html
- The user guide is pretty good: http://maltparser.org/userguide.html
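The command below assumes that you're in the directory containing malt.jar, and that engmalt.linear.mco (from the pre-trained models page above) is in the same directory. Note that -c takes the model name without the .mco extension.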
java -jar malt.jar -c engmalt.linear -m parse -i pierre.conll
Optionally, add "-o outfile.conll" to write the output to a file instead of stdout.
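If you'd rather drive MaltParser from Python without going through NLTK, you can shell out to the same java -jar command. This is a minimal sketch, assuming `java` is on your PATH and the jar and .mco file sit in the working directory; `build_malt_command` and `run_malt` are hypothetical helpers, and the flags are exactly the ones used above (-c, -m parse, -i, -o).

```python
import subprocess
from typing import List, Optional


def build_malt_command(model: str, infile: str, outfile: Optional[str] = None,
                       jar: str = "malt.jar") -> List[str]:
    """Build the argument list for the java -jar MaltParser command.

    `model` is the configuration name without the .mco extension,
    e.g. "engmalt.linear".
    """
    cmd = ["java", "-jar", jar, "-c", model, "-m", "parse", "-i", infile]
    if outfile is not None:
        # -o writes the parse to a file instead of stdout
        cmd += ["-o", outfile]
    return cmd


def run_malt(model: str, infile: str, outfile: Optional[str] = None,
             jar: str = "malt.jar", cwd: Optional[str] = None) -> str:
    """Run MaltParser and return whatever it printed to stdout.

    Assumes the jar and the .mco file are in `cwd` (or the current
    directory when cwd is None). Raises CalledProcessError on failure.
    """
    result = subprocess.run(
        build_malt_command(model, infile, outfile, jar),
        cwd=cwd, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `run_malt("engmalt.linear", "pierre.conll", "out.conll")` runs the same parse as the command above and writes it to out.conll. Passing an output file is the safer choice if your MaltParser version prints progress messages to stdout alongside the parse.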
Here pierre.conll contains the following, one token per line, with tab-separated columns:
1	Pierre	_	NNP	NNP	_
2	Vinken	_	NNP	NNP	_
3	,	_	,	,	_
4	61	_	CD	CD	_
5	years	_	NNS	NNS	_
6	old	_	JJ	JJ	_
7	,	_	,	,	_
8	will	_	MD	MD	_
9	join	_	VB	VB	_
10	the	_	DT	DT	_
11	board	_	NN	NN	_
12	as	_	IN	IN	_
13	a	_	DT	DT	_
14	nonexecutive	_	JJ	JJ	_
15	director	_	NN	NN	_
16	Nov.	_	NNP	NNP	_
17	29	_	CD	CD	_
18	.	_	.	.	_
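If your text is already tokenized and tagged, you can produce a file like pierre.conll yourself. This is a minimal sketch, assuming your tokens come as (word, tag) pairs; `to_conll` is a hypothetical helper, not part of MaltParser or NLTK. It writes the same six columns as the example (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS), fills unknown fields with "_", and reuses the tag for both POS columns.

```python
from typing import Iterable, List, Tuple


def to_conll(tagged_sents: Iterable[List[Tuple[str, str]]]) -> str:
    """Format pre-tokenized, pre-tagged sentences as MaltParser input.

    One tab-separated line per token; a blank line ends each sentence.
    """
    lines = []
    for sent in tagged_sents:
        for i, (word, tag) in enumerate(sent, start=1):
            # ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS
            lines.append("\t".join([str(i), word, "_", tag, tag, "_"]))
        lines.append("")  # blank line separates sentences
    return "\n".join(lines) + "\n"
```

To create the input file: `open("pierre.conll", "w").write(to_conll([[("Pierre", "NNP"), ("Vinken", "NNP")]]))`.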
The module you want from NLTK is nltk.parse.malt. Before you can use it, you have to tell it where the MaltParser jar lives by setting the MALTPARSERHOME environment variable, which works fine from inside Python via os.environ.
- Issue: if you already have a pre-trained model, there doesn't seem to be a good way to tell it which .mco file to use.
- It should also be possible to pass the MaltParser location (MALTPARSERHOME) as an argument instead of through an environment variable.
- Also: if your text is already tagged, there seems to be no way to say so; the NLTK code as it stands wants to run a tagger on your text.
- It also does something dumb like tokenizing by plain string splitting. There should be a way to pass in pre-tokenized and pre-tagged text.
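One way around the tagging and tokenizing issues is to skip nltk.parse.malt, call the jar directly with the java -jar command from the top of this page on your own pre-tagged input, and read the result back yourself. This is a minimal sketch, assuming MaltParser's default CoNLL-X output (tab-separated, column 7 is HEAD with 0 meaning the root, column 8 is DEPREL, blank lines between sentences); `read_malt_output` is a hypothetical helper.

```python
from typing import List, Tuple


def read_malt_output(text: str) -> List[List[Tuple[int, str, int, str]]]:
    """Parse CoNLL-X output into (id, form, head, deprel) tuples per sentence."""
    sents, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # blank line: close the current sentence, if any
            if current:
                sents.append(current)
                current = []
            continue
        cols = line.split("\t")
        current.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))
    if current:  # file may not end with a blank line
        sents.append(current)
    return sents
```

For example, `read_malt_output(open("outfile.conll").read())` gives one list per sentence, with each token's head index and dependency label.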
import os
import nltk

# Point NLTK at the unpacked MaltParser directory (adjust the path).
os.environ["MALTPARSERHOME"] = "/path/to/malt-1.4.1"
nltk.parse.malt.demo()
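If MALTPARSERHOME is wrong, the failure only shows up once NLTK tries to launch the parser. A quick sanity check beforehand can help. This is a minimal sketch, assuming the jar sits directly in that directory under a name matching malt*.jar (the commands above use malt.jar; other releases may name it differently); `find_malt_jar` is a hypothetical helper.

```python
import glob
import os


def find_malt_jar(home: str) -> str:
    """Return the first malt*.jar found directly inside `home`.

    Raises FileNotFoundError if there is none, so a bad MALTPARSERHOME
    is reported before any parsing is attempted.
    """
    jars = sorted(glob.glob(os.path.join(home, "malt*.jar")))
    if not jars:
        raise FileNotFoundError("no malt*.jar in %r; check MALTPARSERHOME" % home)
    return jars[0]
```

For example, call `find_malt_jar(os.environ["MALTPARSERHOME"])` right after setting the variable.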