The purpose of this lab session is to get some hands-on experience with Part-of-Speech (POS) tagging.
You may work in any programming language (as long as it produces the desired output). In case you opt for Java, you can proceed as follows:
Create a class HiddenMarkovModel, which contains two fields:
protected Map<String, Map<String,Double>> transitionProb;
protected Map<String, Map<String,Double>> emissionProb;
The first maps a tag to each successor tag and the number of times that this transition was seen. The second maps a tag to each word and the number of times that this emission was seen.
Add a toString() method that outputs both maps in a readable format.
Write the methods
public void foundTransition(String fromTag, String toTag);
public void foundEmission(String tag, String word);
which increase the respective counter by 1.
For example, if the transitionProb map is empty and we call
foundTransition("NNP", "VB");
foundTransition("NNP", "VB");
foundTransition("NNP", "ADJ");
foundTransition("VB", "ADJ");
then the map should be NNP -> {VB -> 2, ADJ -> 1}, VB -> {ADJ -> 1}.
Write a method normalize(), which turns the counts in transitionProb and emissionProb into probabilities by dividing each count by the total count for its tag.
For example, the above map should become NNP -> {VB -> 0.667, ADJ -> 0.333}, VB -> {ADJ -> 1}.
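The counting and normalization steps described above could be sketched as follows. This is only one possible approach, not a reference solution: the class name CountingHmmSketch and the private helpers increment and normalizeRows are hypothetical, and TreeMap is chosen merely so that printed output is sorted.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the counting and normalization steps; CountingHmmSketch is a hypothetical name.
class CountingHmmSketch {
    protected Map<String, Map<String, Double>> transitionProb = new TreeMap<>();
    protected Map<String, Map<String, Double>> emissionProb = new TreeMap<>();

    // Adds 1 to table[outer][inner], creating missing entries on the way (hypothetical helper).
    private static void increment(Map<String, Map<String, Double>> table, String outer, String inner) {
        table.computeIfAbsent(outer, k -> new TreeMap<>()).merge(inner, 1.0, Double::sum);
    }

    public void foundTransition(String fromTag, String toTag) {
        increment(transitionProb, fromTag, toTag);
    }

    public void foundEmission(String tag, String word) {
        increment(emissionProb, tag, word);
    }

    // Divides every count in a row by the row total, so that each row sums to 1 (hypothetical helper).
    private static void normalizeRows(Map<String, Map<String, Double>> table) {
        for (Map<String, Double> row : table.values()) {
            double total = 0.0;
            for (double count : row.values()) {
                total += count;
            }
            final double rowTotal = total;
            row.replaceAll((key, count) -> count / rowTotal);
        }
    }

    public void normalize() {
        normalizeRows(transitionProb);
        normalizeRows(emissionProb);
    }

    @Override
    public String toString() {
        return "transitions: " + transitionProb + "\nemissions: " + emissionProb;
    }
}
```

Calling normalize() twice is harmless in this sketch, because a row that already sums to 1 is divided by 1. Counts must not be added after normalization, though, since counts and probabilities share the same maps.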
Hints:
BufferedReader
myString.split(X) (where, e.g., X=" " or X="/")
toString() of the model.
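The BufferedReader and split hints could be combined roughly as in the sketch below. It assumes a corpus format that the text does not spell out: one sentence per line, with tokens of the form word/TAG separated by spaces. The class TaggedCorpusReaderSketch and its methods are hypothetical names, and the sample corpus is made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes one sentence per line and word/TAG tokens separated by spaces.
class TaggedCorpusReaderSketch {

    // Splits one token such as "sings/VB" at its last slash into {word, tag}.
    static String[] splitToken(String token) {
        int slash = token.lastIndexOf('/');
        if (slash <= 0 || slash == token.length() - 1) {
            throw new IllegalArgumentException("Not a word/TAG token: " + token);
        }
        return new String[] { token.substring(0, slash), token.substring(slash + 1) };
    }

    // Reads all sentences; each sentence is a list of {word, tag} pairs.
    static List<List<String[]>> readSentences(BufferedReader reader) throws IOException {
        List<List<String[]>> sentences = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            List<String[]> sentence = new ArrayList<>();
            for (String token : line.split("\\s+")) {
                sentence.add(splitToken(token));
            }
            sentences.add(sentence);
        }
        return sentences;
    }

    public static void main(String[] args) throws IOException {
        // Made-up two-sentence corpus; a real run would wrap a FileReader instead.
        String corpus = "Elvis/NNP sings/VB\nPriscilla/NNP smiles/VB\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(corpus))) {
            for (List<String[]> sentence : readSentences(reader)) {
                for (String[] pair : sentence) {
                    // A model would record the emission here, and the transition
                    // from the previous tag in the same sentence, if there is one.
                    System.out.println(pair[0] + " is tagged " + pair[1]);
                }
            }
        }
    }
}
```

Splitting at the last slash rather than with split("/") is a design choice of this sketch: it keeps words that themselves contain a slash in one piece.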
The goal is to implement a POS tagger using the Viterbi algorithm. You may work in any programming language (as long as it produces the desired output). In case you opt for Java, you can proceed as follows:
Make the class HiddenMarkovModel implement Serializable.
Write the methods
public double emissionProbability(String tag, String word);
public double transitionProbability(String fromTag, String toTag);
which return 0 if the input could not be found.
Store the model on disk (use ObjectOutputStream).
Create a class Viterbi, which has a field protected HiddenMarkovModel model and a constructor that takes an HMM stored on disk as argument (use ObjectInputStream).
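Writing a model to disk and reading it back could look roughly like the sketch below. The class SerializableModelSketch is a hypothetical stand-in that carries only one map, and the save and load helpers are assumptions rather than part of the assignment. The lookup method illustrates the "return 0 if not found" behavior.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the model class; only the parts needed here are shown.
class SerializableModelSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    protected Map<String, Map<String, Double>> transitionProb = new HashMap<>();

    // Returns 0 when either the tag or its successor is unknown.
    public double transitionProbability(String fromTag, String toTag) {
        Map<String, Double> row = transitionProb.get(fromTag);
        if (row == null) {
            return 0.0;
        }
        return row.getOrDefault(toTag, 0.0);
    }

    // Writes this object to the given file (hypothetical helper).
    public void save(File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(this);
        }
    }

    // Reads a previously saved object back from the given file (hypothetical helper).
    public static SerializableModelSketch load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (SerializableModelSketch) in.readObject();
        }
    }
}
```

A constructor that takes a file could simply call such a load helper and store the result in its model field.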
Write a method public List<String> parse(String sentence), which, given a sentence, returns the list of most likely tags.
Hints:
double[][] probabilities = new double[numWords][numTags];
int[][] backpointers = new int[numWords][numTags];
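The two arrays from the hint could be filled as in the following sketch of the Viterbi recursion. Several things here are assumptions rather than part of the assignment: the class ViterbiSketch takes two raw maps instead of a HiddenMarkovModel so that the example is self-contained, the lookup helper is hypothetical, words are separated by whitespace, and every tag is treated as equally likely at the start of a sentence.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical self-contained sketch; the lab's Viterbi class would read these
// probabilities from its HiddenMarkovModel field instead of from two raw maps.
class ViterbiSketch {
    final Map<String, Map<String, Double>> transitionProb;
    final Map<String, Map<String, Double>> emissionProb;
    final List<String> tags;

    ViterbiSketch(Map<String, Map<String, Double>> transitionProb,
                  Map<String, Map<String, Double>> emissionProb) {
        this.transitionProb = transitionProb;
        this.emissionProb = emissionProb;
        this.tags = new ArrayList<>(emissionProb.keySet());
        Collections.sort(this.tags); // fixed order, so that array indices are stable
    }

    // Looks up table[outer][inner], returning 0 for unseen combinations.
    static double lookup(Map<String, Map<String, Double>> table, String outer, String inner) {
        Map<String, Double> row = table.get(outer);
        return row == null ? 0.0 : row.getOrDefault(inner, 0.0);
    }

    public List<String> parse(String sentence) {
        String[] words = sentence.trim().split("\\s+");
        int numWords = words.length;
        int numTags = tags.size();
        double[][] probabilities = new double[numWords][numTags];
        int[][] backpointers = new int[numWords][numTags];

        // Assumption: every tag is equally likely to start a sentence,
        // so the first row holds only emission probabilities.
        for (int t = 0; t < numTags; t++) {
            probabilities[0][t] = lookup(emissionProb, tags.get(t), words[0]);
        }
        // Each cell keeps the probability of the best path ending in tag t at word w.
        for (int w = 1; w < numWords; w++) {
            for (int t = 0; t < numTags; t++) {
                double emission = lookup(emissionProb, tags.get(t), words[w]);
                for (int prev = 0; prev < numTags; prev++) {
                    double p = probabilities[w - 1][prev]
                            * lookup(transitionProb, tags.get(prev), tags.get(t)) * emission;
                    if (p > probabilities[w][t]) {
                        probabilities[w][t] = p;
                        backpointers[w][t] = prev;
                    }
                }
            }
        }
        // Pick the best final tag, then follow the backpointers to the start.
        int best = 0;
        for (int t = 1; t < numTags; t++) {
            if (probabilities[numWords - 1][t] > probabilities[numWords - 1][best]) {
                best = t;
            }
        }
        LinkedList<String> result = new LinkedList<>();
        for (int w = numWords - 1; w >= 0; w--) {
            result.addFirst(tags.get(best));
            best = backpointers[w][best];
        }
        return result;
    }
}
```

The sketch multiplies plain probabilities, which can underflow to 0 on long sentences; summing log probabilities is a common remedy. It also expects at least one known tag and does nothing special for unknown words, which simply receive emission probability 0 under every tag.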