Alex Rudnick edited this page Nov 22, 2013 · 1 revision

Setting up MaltParser

Not so hard:

Parsing sentences in CoNLL format

This assumes that you're in the directory with malt.jar, and that engmalt.linear.mco (available above) is also in the cwd.

java -jar malt.jar -c engmalt.linear -m parse -i pierre.conll

(Optionally, you can also specify "-o outfile.conll" to write the output to a file instead of stdout.)

Assuming that pierre.conll contains this (tab-separated):

1	Pierre	_	NNP	NNP	_
2	Vinken	_	NNP	NNP	_
3	,	_	,	,	_
4	61	_	CD	CD	_
5	years	_	NNS	NNS	_
6	old	_	JJ	JJ	_
7	,	_	,	,	_
8	will	_	MD	MD	_
9	join	_	VB	VB	_
10	the	_	DT	DT	_
11	board	_	NN	NN	_
12	as	_	IN	IN	_
13	a	_	DT	DT	_
14	nonexecutive	_	JJ	JJ	_
15	director	_	NN	NN	_
16	Nov.	_	NNP	NNP	_
17	29	_	CD	CD	_
18	.	_	.	.	_
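
A file like this can also be generated from text that is already tokenized and tagged. Here's a minimal sketch, assuming the input is a list of (word, tag) pairs and that the six columns shown above are ID, FORM, LEMMA, CPOSTAG, POSTAG and FEATS, with unknown fields written as "_". The function name to_conll is made up for illustration; it is not part of MaltParser or NLTK.

```python
def to_conll(tagged_tokens):
    """Render (word, tag) pairs as tab-separated lines in the six-column layout above."""
    lines = []
    for index, (word, tag) in enumerate(tagged_tokens, start=1):
        # Columns: ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS.
        # Lemma and features are unknown here, so they are left as "_".
        lines.append("\t".join([str(index), word, "_", tag, tag, "_"]))
    # A blank line marks the end of the sentence.
    return "\n".join(lines) + "\n\n"


sentence = [("Pierre", "NNP"), ("Vinken", "NNP"), (",", ","), ("61", "CD")]
conll_text = to_conll(sentence)
```

Writing conll_text to pierre.conll gives a file you can hand to the java command above.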

Using MaltParser with nltk

The module you want from nltk is nltk.parse.malt, but before you can use it, you have to tell it where the MaltParser jar lives. This is done by setting the MALTPARSERHOME environment variable, which works fine from inside Python.

  • issue: if you've got a pre-trained model, there doesn't seem to be a good way to specify that file.
  • you should also be able to pass MALTPARSERHOME as an argument.
  • also: if your text is already tagged, there doesn't seem to be a way to say so; the code in NLTK as it stands wants to run a tagger on your text.
  • It also tokenizes naively, by just splitting the string. There should be a way to pass in pre-tokenized and pre-tagged text.

import os

import nltk

# Tell NLTK where the MaltParser installation lives before using the module.
os.environ["MALTPARSERHOME"] = "/path/to/malt-1.4.1"
nltk.parse.malt.demo()
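
One possible way around the tokenizing and tagging issues is to skip the NLTK wrapper and call the jar directly on a CoNLL file you wrote yourself. A rough sketch, assuming the same malt.jar and engmalt.linear model as in the command-line example; the helper names malt_command and run_malt are hypothetical, not part of NLTK or MaltParser.

```python
import subprocess


def malt_command(jar, model, infile, outfile=None):
    """Build the same command line used in the CoNLL example."""
    cmd = ["java", "-jar", jar, "-c", model, "-m", "parse", "-i", infile]
    if outfile is not None:
        # Without -o, MaltParser prints the parse to stdout.
        cmd += ["-o", outfile]
    return cmd


def run_malt(jar, model, infile, outfile=None, workdir="."):
    """Run MaltParser from workdir, assumed to hold the jar and the .mco model."""
    return subprocess.run(
        malt_command(jar, model, infile, outfile), cwd=workdir, check=True
    )
```

For example, run_malt("malt.jar", "engmalt.linear", "pierre.conll", "outfile.conll") would run the earlier command from the current directory, provided java is on your PATH.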

CategoryTechnicalNotes
