
CSE 891 Class Project

Project Goal:


Categorize and cluster geocoded tweets in and around New York City, then identify trends within these clusters.

Data Collection:


Using a geographic bounding box around NYC (-74,40 to -73,41), we gathered public tweets through the Twitter API. Over a million tweets were collected during a sixteen-day period (2/27/13 to 3/15/13).
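
The bounding-box idea can be illustrated with a minimal sketch. It assumes the four numbers are longitude,latitude pairs for the southwest and northeast corners; the names BoundingBox and NYC_BOX are hypothetical, and this is not the project's actual collection code.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Axis-aligned box in degrees (hypothetical helper, not project code)."""
    west: float   # minimum longitude
    south: float  # minimum latitude
    east: float   # maximum longitude
    north: float  # maximum latitude

    def contains(self, lon: float, lat: float) -> bool:
        """Return True if the point lies inside or on the edge of the box."""
        return self.west <= lon <= self.east and self.south <= lat <= self.north


# Assumption: (-74,40 -73,41) means SW corner (-74, 40) and NE corner (-73, 41).
NYC_BOX = BoundingBox(west=-74.0, south=40.0, east=-73.0, north=41.0)

# A point in midtown Manhattan falls inside; a point near Philadelphia does not.
print(NYC_BOX.contains(-73.98, 40.75))
print(NYC_BOX.contains(-75.16, 39.95))
```

A check like this could also be used to discard tweets whose coordinates fall outside the box after collection.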

Data Analysis:


We picked ten categories (day, entertainment, location, love, sports, weather, work, politics, pop culture, and art) and developed a list of keywords for each category. Tweets were sorted into categories according to the keywords they contained.
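
Keyword-based sorting of this kind might look like the sketch below. The word lists are short hypothetical stand-ins for the project's lists, which are not given here, and letting a tweet match more than one category is an assumption.

```python
import re

# Hypothetical keyword lists for three of the ten categories.
CATEGORY_WORDS = {
    "weather": {"rain", "snow", "cold", "sunny"},
    "sports": {"knicks", "yankees", "game"},
    "work": {"office", "boss", "meeting"},
}


def categorize(text, category_words=CATEGORY_WORDS):
    """Return the sorted names of every category with a keyword in the tweet."""
    # Lowercase and split into word tokens so matching ignores case and punctuation.
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(cat for cat, words in category_words.items() if tokens & words)


print(categorize("Snow again... stuck at the office"))
print(categorize("Good morning NYC"))
```

Matching on whole tokens rather than substrings avoids counting "brain" as a hit for "rain", at the cost of missing plurals and hashtags unless they are added to the lists.
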
Based on its latitude and longitude, each tweet was then sorted into the New York State ZIP code containing that location.
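
One way to map a coordinate to a ZIP code is a point-in-polygon test against ZIP boundary polygons. The sketch below uses the standard ray-casting test with two toy rectangles under placeholder labels. The polygons, labels, and function names are hypothetical, and the method the project actually used is not stated.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: polygon is a list of (lon, lat) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can cross the ray.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_zip(lon, lat, zip_polygons):
    """Return the label of the first polygon containing the point, else None."""
    for zip_label, polygon in zip_polygons.items():
        if point_in_polygon(lon, lat, polygon):
            return zip_label
    return None


# Toy boundaries with placeholder labels; real ZIP shapes are far more irregular.
TOY_ZIP_POLYGONS = {
    "ZIP_A": [(-74.0, 40.0), (-73.5, 40.0), (-73.5, 41.0), (-74.0, 41.0)],
    "ZIP_B": [(-73.5, 40.0), (-73.0, 40.0), (-73.0, 41.0), (-73.5, 41.0)],
}

print(assign_zip(-73.9, 40.7, TOY_ZIP_POLYGONS))
print(assign_zip(-75.0, 40.7, TOY_ZIP_POLYGONS))
```

With over a million tweets, a spatial index over the polygons would likely be needed in practice, since this loop checks every polygon for every point.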

Computational Techniques for Large-Scale Data Analysis

All rights reserved. 2013
