Author Archives: aria42
In an earlier post, I hinted at a Yelp review summarization system demo. Well, here it is. Some background: Christy Sauper, Regina Barzilay, and I recently presented Incorporating Content Structure into Text Analysis Applications at the Empirical Methods of Natural …
A few readers of my last post asked for an accessible explanation of the margin infused relaxed algorithm (MIRA) and confidence-weighted (CW) learning, the classification algorithms I discussed. I don’t think I can easily explain CW, but I think MIRA, or …
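To give a flavor of MIRA before the full explanation, here is a minimal sketch of a single-constraint MIRA update for a binary linear classifier, assuming labels in {-1, +1}, a hinge-style margin requirement of 1, and a step size capped at a slack parameter `C` (in this binary, one-constraint form the update coincides with the PA-I passive-aggressive rule). The function names `mira_update`, `train`, and `predict` are hypothetical and are not taken from the original post.

```python
from typing import List, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def mira_update(w: List[float], x: List[float], y: int, C: float = 1.0) -> List[float]:
    """One MIRA step (sketch): move w as little as possible so that y * (w . x) >= 1.

    Closed form of  min_w' 0.5 * ||w' - w||^2  s.t.  y * (w' . x) >= 1,
    with the step size clipped at C (the "relaxed" version that tolerates noise).
    """
    loss = max(0.0, 1.0 - y * dot(w, x))   # hinge loss: how badly the margin is violated
    sq_norm = dot(x, x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w)                      # constraint already satisfied: stay passive
    tau = min(C, loss / sq_norm)            # smallest step that fixes the margin, capped at C
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


def train(data: List[Tuple[List[float], int]], n_features: int,
          epochs: int = 5, C: float = 1.0) -> List[float]:
    """Online training: apply the MIRA update to each example in turn."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for x, y in data:
            w = mira_update(w, x, y, C)
    return w


def predict(w: List[float], x: List[float]) -> int:
    return 1 if dot(w, x) >= 0 else -1
```

For example, starting from `w = [0, 0]` with `x = [1, 2]`, `y = 1`, and a large `C`, the step size is 1/5, giving `w = [0.2, 0.4]`, which puts the example exactly on the margin. The appeal of the design is that each update is the smallest change to the weights that corrects the current mistake, rather than a fixed-size perceptron step.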
Note: This post won’t make sense unless you’re steeped in recent machine learning. There’s a good chance that if you are, you already know this. During a machine learning reading group with Mike Collins, Jenny Finkel, Alexander Rush, and me …
Recently, Christy Sauper, Regina Barzilay, and I published a paper, Incorporating Content Structure into Text Analysis Applications, on how to use a document’s content structure to improve accuracy on information extraction tasks. One of the datasets we worked with …
Last week, I posted a 300-line Clojure script that implements some recent work I’ve published on unsupervised part-of-speech tagging. In this post, I’m going to describe more fully how both the model and the implementation work. The post assumes some basic background in probability and some knowledge of Clojure. It is long, so feel free to skip any section that seems too remedial; I’ve put superfluous details in footnotes or marked paragraphs.
Recently, Yoong-Keok Lee, Regina Barzilay, and I published a paper on unsupervised part-of-speech tagging, i.e., how to learn the syntactic categories of words from raw text. The model is actually pretty simple relative to other published work and actually …
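To make the task concrete, here is a toy sketch of learning word categories from raw text. It is not the model from the paper or the Clojure script: it is a generic distributional-clustering illustration, assuming that words with similar left/right neighbors share a category, and it assigns each word type to exactly one cluster via greedy agglomerative merging by cosine similarity. The names `context_vectors`, `cosine`, and `cluster_types` are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt
from typing import Dict, List


def context_vectors(sentences: List[List[str]]) -> Dict[str, Counter]:
    """Count each word type's left and right neighbors (with sentence boundary markers)."""
    vecs: Dict[str, Counter] = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 1):
            vecs[padded[i]][("L", padded[i - 1])] += 1
            vecs[padded[i]][("R", padded[i + 1])] += 1
    return dict(vecs)


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(c * v[f] for f, c in u.items() if f in v)
    nu = sqrt(sum(c * c for c in u.values()))
    nv = sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_types(sentences: List[List[str]], k: int) -> List[List[str]]:
    """Greedily merge the two most similar clusters until only k remain."""
    vecs = context_vectors(sentences)
    # Every word type starts in its own cluster: (member set, summed context counts).
    clusters = [({w}, Counter(v)) for w, v in sorted(vecs.items())]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i][1], clusters[j][1])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return [sorted(members) for members, _ in clusters]
```

On a tiny corpus such as "the dog runs / the cat runs / a dog sleeps / a cat sleeps" with `k=3`, this groups {a, the}, {cat, dog}, and {runs, sleeps}, roughly determiners, nouns, and verbs. Real unsupervised taggers use probabilistic models over much richer features; this sketch only shows the intuition that context alone can reveal syntactic categories.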
There’s a divide I’ve noticed among people lumped into a “computer science” department. In short, I think there are computer scientists and computational scientists; the knowledge bases of these groups are rapidly diverging, and CS departments should do a better job …