STOR Colloquium: Jacob Bien, USC

2 Nov @ 4:00 pm - 5:15 pm

Jacob Bien
University of Southern California

Tree-Based Aggregation of Rare Features for Prediction

It is common in modern prediction problems for many features to be counts of rarely occurring events. The challenge posed by such “rare features” has received little attention despite its prevalence in diverse areas, ranging from biology (e.g., rare species within a microbiome) to natural language processing (e.g., rare words within an online hotel review). We show, both theoretically and empirically, that not explicitly accounting for the rareness of features can greatly reduce the effectiveness of an analysis. We next propose a framework for aggregating rare features into denser features in a flexible manner that creates better predictors of the response. Applications to the microbiome and to online hotel reviews show how our methodology is useful in a wide range of contexts.

Venue

Hanes Hall
Chapel Hill, NC 27599 United States