NLP4J by emorynlp

Release Notes

Version 1.1.3 (08/01/2016)

All NLP4J projects (except for the english-models) are merged into two projects: api and cli. Instead of adding nlp4j, you should now add nlp4j-api and nlp4j-cli to your Maven dependencies (see the installation instructions).
If you are interested only in running the command-line tools, you can now simply download the assembler and run the executable files (see the installation instructions).
Features: dependency evaluator, scorer speed-up.

Version 1.1.2 (06/29/2016)

Bugfixes: tokenization-issue-7.
Features: tokenization-issue-6.
The tokenizer does not split off left/right brackets when the content inside is a single character or all digits (e.g., (a), [12]).
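
The bracket rule above can be illustrated with a small, self-contained sketch. This is not the tokenizer's actual implementation: the class name BracketRuleSketch, the method keepAsSingleToken, and the regular expression are all hypothetical, and the sketch assumes that only round and square brackets are covered.

```java
import java.util.regex.Pattern;

public class BracketRuleSketch {
    // Hypothetical pattern (an assumption, not NLP4J code): an opening bracket,
    // then either one non-bracket, non-space character or a run of digits,
    // then the matching closing bracket.
    private static final Pattern KEEP_WHOLE = Pattern.compile(
        "\\((?:[^()\\s]|\\d+)\\)|\\[(?:[^\\[\\]\\s]|\\d+)\\]");

    /** Returns true if the whole bracketed string should stay a single token. */
    public static boolean keepAsSingleToken(String s) {
        return KEEP_WHOLE.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Print the decision for a few sample strings.
        for (String s : new String[] {"(a)", "[12]", "(ab)", "[1a]"}) {
            System.out.println(s + " -> " + keepAsSingleToken(s));
        }
    }
}
```

Under these assumptions, (a) and [12] stay whole, while (ab) and [1a] would still have their brackets split off.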

Version 1.1.1 (04/29/2016)

Bugfixes: core-pull-7.
Features: issue-3, issue-6.
NLPNode comes with several useful constructors.
The segmentize method in Tokenizer takes the generic type of Token.

Version 1.1.0 (04/20/2016)

Bugfixes: tokenization-pull-5.
Features: morphology-issue-4.
All the statistical models are about half the size of the previous ones without compromising accuracy. The whole pipeline can now run in 4 GB of RAM.
Training automatically saves the best model in a single pass (there is no longer any need to run training twice to save the best model).
The nlp4j-common project is separated out from the nlp4j-core project.
GlobalLexica is no longer static, so it no longer conflicts with other processes.
NLPNode extends AbstractNLPNode, which allows you to create your own custom node. Generics were added throughout to support this change (e.g., NLPState, OnlineComponent).
The part-of-speech tagger gives the 2nd-best predictions when the best predictions have low confidence (pos2 in the extra feats).
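
The idea behind the 2nd-best prediction can be sketched as follows. This is only an illustration, not the tagger's code: the class SecondBestSketch, the method secondBestIfUncertain, and the use of a score margin with a fixed threshold as the measure of "low confidence" are all assumptions.

```java
public class SecondBestSketch {
    /**
     * Returns the index of the runner-up label when the margin between the two
     * highest scores is below the threshold, or -1 when the best label is
     * confident enough (or when there are fewer than two labels).
     * The margin-based confidence test is an assumption made for this sketch.
     */
    public static int secondBestIfUncertain(double[] scores, double threshold) {
        if (scores.length < 2) return -1;
        int best = 0, second = -1;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                // New top score: the old best becomes the runner-up.
                second = best;
                best = i;
            } else if (second < 0 || scores[i] > scores[second]) {
                second = i;
            }
        }
        return (scores[best] - scores[second] < threshold) ? second : -1;
    }

    public static void main(String[] args) {
        double[] scores = {0.50, 0.45, 0.05};
        System.out.println(secondBestIfUncertain(scores, 0.1));
    }
}
```

A caller could store the returned label as an extra feature next to the best tag, in the spirit of the pos2 entry described above.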

Version 1.0.0 (02/24/2016)

NLP4J is the successor of the widely used toolkit, ClearNLP, developed by the Emory NLP research group.
This version supports tokenization, part-of-speech tagging, morphological analysis, named entity recognition, and dependency parsing. The next release (March, 2016) will include support for semantic role labeling and sentiment analysis, and the following release (April, 2016) will include support for coreference resolution.
NLP4J makes it easy to train your own model. Please see how to train for more details about the training process.
Calling the decoding API is easier than ever. See the examples for more details.
The biggest difference between NLP4J and ClearNLP is in machine learning. NLP4J is capable of updating existing models with new training data, which is useful for domain adaptation. We also started implementing a deep learning package, although we realized that GPU support for Java is pretty limited, and without good GPU support, deep learning would make everything much slower. Please let us know if you'd like to contribute to this project.
One could consider NLP4J a more stable version of ClearNLP. I have been using this package for the NLP course I teach, and my students (including undergrads) were able to develop new NLP components without much effort using the built-in APIs in NLP4J. We are preparing a tutorial for developing NLP components using NLP4J.
We do not expect our tools to work perfectly out of the box. We now have a good team working on this project. Please let us know if you'd like to collaborate so we can make this project more robust for you.
Please visit our online demo. It parses 10K tokens within a couple of seconds and visualizes the dependency trees.