English

Lexica

All lexica can be found here:

Models

All models can be found here:

The models are trained on the following corpora.

| OntoNotes 5.0 | Sentences | Tokens | Names |
|---|---:|---:|---:|
| Broadcast conversations | 10,822 | 171,101 | 9,771 |
| Broadcast news | 10,344 | 206,029 | 19,670 |
| News magazines | 6,672 | 163,627 | 10,736 |
| Newswires | 34,438 | 875,800 | 77,496 |
| Religious texts | 21,418 | 296,432 | 0 |
| Telephone conversations | 8,963 | 85,444 | 2,021 |
| Web texts | 12,448 | 284,951 | 8,170 |
| English Web Treebank | Sentences | Tokens |
|---|---:|---:|
| Answers | 2,699 | 43,916 |
| Email | 2,983 | 44,168 |
| Newsgroup | 1,996 | 37,816 |
| Reviews | 2,915 | 44,337 |
| Weblog | 1,753 | 38,770 |
| QuestionBank | Sentences | Tokens |
|---|---:|---:|
| Questions | 3,198 | 29,704 |
| MiPACQ | Sentences | Tokens |
|---|---:|---:|
| Clinical questions | 1,600 | 30,138 |
| Medpedia articles | 2,796 | 49,922 |
| Clinical notes | 8,383 | 113,164 |
| Pathology notes | 1,205 | 21,353 |
| SHARP | Sentences | Tokens |
|---|---:|---:|
| Seattle Group Health notes | 7,204 | 94,450 |
| Clinical notes | 6,807 | 93,914 |
| Stratified | 4,320 | 43,536 |
| Stratified SGH | 13,662 | 139,403 |
| THYME | Sentences | Tokens |
|---|---:|---:|
| Clinical / pathology notes | 26,661 | 387,943 |
| Brain cancer | 18,722 | 225,899 |