Thursday, May 03, 2018

Stanford Log-linear Part-Of-Speech Tagger

Another day, another requirement. While browsing freelance projects, I came across one that asked for an integration with the Stanford NLP POS Tagger. That was four big words I clearly needed to read up on before I could understand the request in detail.

A Google search for the exact term led me to the Stanford Natural Language Processing Group's site, which had this to say:
A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word (and other token), such as noun, verb, adjective, etc., although generally computational applications use more fine-grained POS tags like 'noun-plural'.
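To picture the input and output, here is a deliberately naive toy sketch. It uses a hypothetical hand-written lexicon and whitespace tokenisation, which is not how the Stanford tagger works. It only shows the shape of the result: a list of (token, tag) pairs, the same shape NLTK's taggers return. The tags are Penn Treebank-style, where NNS is the fine-grained "noun, plural" tag mentioned above.

```python
# Hypothetical toy lexicon mapping words to Penn Treebank-style tags.
LEXICON = {
    "the": "DT",     # determiner
    "dog": "NN",     # noun, singular
    "dogs": "NNS",   # noun, plural
    "bark": "VBP",   # verb, non-3rd-person singular present
    "loudly": "RB",  # adverb
}


def toy_tag(sentence):
    # Naive whitespace tokenisation; unknown words fall back to "NN".
    return [(token, LEXICON.get(token.lower(), "NN")) for token in sentence.split()]


print(toy_tag("The dogs bark loudly"))
# [('The', 'DT'), ('dogs', 'NNS'), ('bark', 'VBP'), ('loudly', 'RB')]
```

A real tagger has to resolve words like "bark" (noun or verb) from context, which a fixed lookup table cannot do.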
Digging a step further, it turns out NLTK already ships with an interface to this tagger. From the downloads section:
Python: NLTK (2.0+) contains an interface to the Stanford POS tagger.
The tagger itself goes by the full name Stanford Log-linear Part-Of-Speech Tagger.

Why "log-linear"? Well, from Wikipedia:
A log-linear model is a mathematical model that takes the form of a function whose logarithm equals a linear combination of the parameters of the model, which makes it possible to apply (possibly multivariate) linear regression.
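To make the definition concrete for tagging, here is a toy sketch of a conditional log-linear classifier. The features and weights are hypothetical and hand-picked; a real model learns its weights from a tagged corpus. Each tag gets a score that is a weighted sum of feature values, and exponentiating and normalising over all tags gives P(tag | word) = exp(sum_i w_i * f_i(word, tag)) / Z. The log of each unnormalised probability is therefore linear in the weights, which is where the name comes from.

```python
import math

TAGS = ["NN", "NNS", "NNP", "VBZ"]

# Hypothetical hand-picked weights w_i, keyed by "feature&tag".
WEIGHTS = {
    "suffix_s&NNS": 2.0,
    "suffix_s&VBZ": 1.0,
    "capitalized&NNP": 2.5,
    "bias&NN": 0.5,
}


def features(word, tag):
    # Hypothetical binary features f_i(word, tag), each paired with a candidate tag.
    return {
        f"suffix_s&{tag}": 1.0 if word.endswith("s") else 0.0,
        f"capitalized&{tag}": 1.0 if word[:1].isupper() else 0.0,
        f"bias&{tag}": 1.0,
    }


def tag_probabilities(word):
    # Linear part: score(tag) = sum_i w_i * f_i(word, tag)
    scores = {
        tag: sum(WEIGHTS.get(name, 0.0) * value for name, value in features(word, tag).items())
        for tag in TAGS
    }
    # Exponentiate and divide by Z (the sum over all tags) to get probabilities.
    z = sum(math.exp(s) for s in scores.values())
    return {tag: math.exp(s) / z for tag, s in scores.items()}


for w in ["dog", "dogs", "Paris"]:
    probs = tag_probabilities(w)
    print(w, max(probs, key=probs.get))
# dog NN
# dogs NNS
# Paris NNP
```

This classifies each word in isolation. A practical tagger's features also draw on surrounding words and neighbouring tags, so treat the sketch only as an illustration of the log-linear form, not of the Stanford implementation.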

I think freelancing is a good idea once in a while. It exposes you to a wide range of technologies, and even basic reading on them helps you grasp the direction in which the industry is moving.
