MaltParser

Setting up MaltParser

Setup is straightforward:

Download MaltParser here: http://maltparser.org/download.html
The fastest way to get started is to grab a pre-trained model from here: http://maltparser.org/mco/mco.html
The user guide is pretty good: http://maltparser.org/userguide.html

Parsing sentences in CoNLL format

This assumes that you're in the directory with malt.jar, and that engmalt.linear.mco (available above) is also in the cwd.

java -jar malt.jar -c engmalt.linear -m parse -i pierre.conll

Optionally, you can also specify "-o outfile.conll" to write the output to a file instead of stdout.
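If you would rather drive the command above from Python, here is a minimal sketch using the standard subprocess module. The helper names build_malt_command and run_malt are made up for illustration; the flags are only the ones shown above (-c, -m, -i, -o), and it assumes java is on your PATH and malt.jar is in the cwd.

```python
import subprocess


def build_malt_command(model, infile, outfile=None, jar="malt.jar"):
    """Build the argument list for a MaltParser parse run.

    Hypothetical helper: it only assembles the flags shown in the
    command line above and does not validate any of the paths.
    """
    cmd = ["java", "-jar", jar, "-c", model, "-m", "parse", "-i", infile]
    if outfile is not None:
        # Without -o, the parse goes to stdout.
        cmd += ["-o", outfile]
    return cmd


def run_malt(model, infile, outfile=None, jar="malt.jar"):
    """Run MaltParser; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_malt_command(model, infile, outfile, jar), check=True)
```

Calling run_malt("engmalt.linear", "pierre.conll", outfile="outfile.conll") should then be equivalent to typing the command by hand with the -o option.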

This assumes that pierre.conll contains the following (tab-separated):

1	Pierre	_	NNP	NNP	_
2	Vinken	_	NNP	NNP	_
3	,	_	,	,	_
4	61	_	CD	CD	_
5	years	_	NNS	NNS	_
6	old	_	JJ	JJ	_
7	,	_	,	,	_
8	will	_	MD	MD	_
9	join	_	VB	VB	_
10	the	_	DT	DT	_
11	board	_	NN	NN	_
12	as	_	IN	IN	_
13	a	_	DT	DT	_
14	nonexecutive	_	JJ	JJ	_
15	director	_	NN	NN	_
16	Nov.	_	NNP	NNP	_
17	29	_	CD	CD	_
18	.	_	.	.	_
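A file like this is easy to generate if you already have tokenized, tagged text. The sketch below is one way to do it, assuming the input is a list of (word, tag) pairs; to_conll is a hypothetical helper name. It fills the six columns the same way as the example above (index, word, "_", the tag twice, "_") and ends the sentence with a blank line, which is how CoNLL files separate sentences.

```python
def to_conll(tagged_tokens):
    """Render one sentence of (word, tag) pairs as tab-separated CoNLL lines.

    Hypothetical helper: the lemma and feature columns are left as "_",
    and the same tag is used for both the coarse and fine POS columns.
    """
    lines = []
    for index, (word, tag) in enumerate(tagged_tokens, start=1):
        lines.append("\t".join([str(index), word, "_", tag, tag, "_"]))
    # A trailing blank line marks the end of the sentence.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    sentence = [("Pierre", "NNP"), ("Vinken", "NNP"), (",", ",")]
    with open("pierre.conll", "w", encoding="utf-8") as handle:
        handle.write(to_conll(sentence))
```

For several sentences, write the result of to_conll for each one in turn to the same file.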

Using MaltParser with nltk

The module you want from nltk is nltk.parse.malt, but before you can use it, you have to tell it where the MaltParser jar lives. This is done by setting the MALTPARSERHOME environment variable, which works fine from inside Python.

Issue: if you've got a pre-trained model, there doesn't seem to be a good way to specify that file.
You should also be able to pass MALTPARSERHOME as an argument.
Also: if your text is already tagged, there doesn't seem to be a way to say so. The code in NLTK as it stands wants to run a tagger on your text.
It also tokenizes by plain string splitting. There should be a way to pass in pre-tokenized and pre-tagged text.

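import os

import nltk

# Tell nltk where the MaltParser installation lives before using the module.
os.environ["MALTPARSERHOME"] = "/path/to/malt-1.4.1"

# Run the module's built-in demo.
nltk.parse.malt.demo()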

CategoryTechnicalNotes
