Do more with Twitter data

Welcome to our new series, where our data scientists work through examples and share lessons and tips for getting the most out of Twitter data using Twitter APIs. Each post in the series will center on a real-life project and provide MIT-licensed code that you can use to bootstrap your projects with our enterprise and premium APIs.

Finding the Right Data

In our first post, Fiona Pigott (@notFromShrek) will show you how to get the Tweets most related to the question, “What do people talk about when they fly?” She will walk you through:

  • Getting Tweets using our search APIs
  • Filtering and refining rules to improve the quality of your Tweet sample
  • Working with Tweet payload elements (parsing Tweets, tokenization of text, etc.), and
  • Basic natural-language processing with Twitter data.

Please go here to see it!
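
To give a rough flavor of the tokenization and basic text-processing steps listed above, here is a minimal sketch using only the Python standard library. It is not the code from the post: the sample texts, the stopword list, and the function names `tokenize` and `top_terms` are all hypothetical, invented here for illustration, and a real project would work on text fields parsed from actual Tweet payloads.

```python
import re
from collections import Counter

# Hypothetical sample texts, invented for illustration; not real Tweets.
SAMPLE_TEXTS = [
    "Flight delayed again... stuck at the airport #travel",
    "Loving the view from my window seat! @some_airline",
    "Delayed flight, free coffee at the airport. https://example.com/x",
]

# Tiny illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "at", "my", "from", "a", "an", "and", "of", "to"}

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[#@]?\w+")  # words, #hashtags, and @mentions


def tokenize(text):
    """Lowercase the text, drop URLs, and return non-stopword tokens."""
    text = URL_RE.sub(" ", text.lower())
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOPWORDS]


def top_terms(texts, n=3):
    """Count tokens across all texts and return the n most common."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(n)


if __name__ == "__main__":
    print(top_terms(SAMPLE_TEXTS))
```

Keeping hashtags and mentions as single tokens is one possible design choice; depending on the question being asked, you may prefer to strip the `#` and `@` prefixes or drop mentions entirely.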

Clustering Users

In our next post, Josh Montague (@jrmontag) will take us through an analysis of people who Tweeted about the 2017 Cannes Film Festival. He will walk you through:

  • Getting Tweets via our Search APIs
  • Working with User-level attributes from Tweet payloads
  • Natural-language processing (NLP) and feature engineering
  • Building and refining clustering models
  • Techniques for model inspection and visualization

As in our previous post, the example is written in Python, but the techniques are language agnostic and can be implemented readily in other languages with good data and machine-learning library support.

Please go here to see it!
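
As a small illustration of the feature-engineering and clustering steps listed above, the sketch below turns short user descriptions into bag-of-words vectors and groups them with a plain k-means loop. Everything here is an assumption made for illustration: the bios are invented, k-means is only one of many possible clustering models, and the helper names (`vectorize`, `distance`, `kmeans`) are hypothetical. The post itself may use different features, models, and libraries.

```python
import math
from collections import Counter

# Hypothetical user bios, invented for illustration; not real profile data.
BIOS = [
    "film critic and festival reporter",
    "independent film director",
    "fashion blogger and stylist",
    "red carpet fashion photographer",
]


def vectorize(texts):
    """Turn each text into a bag-of-words count vector over a shared vocabulary."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    vectors = []
    for t in texts:
        counts = Counter(t.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, n_iter=10):
    """Plain k-means with deterministic farthest-point seeding."""
    # Seed with the first vector, then repeatedly add the vector that is
    # farthest from its nearest existing centroid.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(distance(v, c) for c in centroids))
        centroids.append(list(farthest))

    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    _, vectors = vectorize(BIOS)
    for bio, label in zip(BIOS, kmeans(vectors, k=2)):
        print(label, bio)
```

On this toy input the two film-related bios land in one cluster and the two fashion-related bios in the other. With real data you would typically want better text features (for example, weighting rare terms more heavily) and some way to inspect what each cluster contains.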

Time Series Analysis

People commonly use Twitter data to identify various trends. In this next example in our series, Aaron Gonzales (@binary_aaron) will give an overview of methods for working with Twitter data as a time series. We’ll begin by looking at the volume of Tweets that discuss Taylor Swift in 2017, and then cover the following:

  • Using the Search API Counts endpoint
  • Basic operations and transformations for time series
  • Detrending time series to understand local variations
  • Using threshold-based methods to detect trends or bursts
  • Contextualizing a time series
  • Quickly expanding to other pop stars and topics

As in our previous posts, the example is written in Python, but the techniques are language agnostic and can be implemented readily in other languages with good data and machine-learning library support.

Please go here to see it!
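
To make the detrending and threshold-based detection ideas listed above a little more concrete, here is a minimal standard-library sketch. It assumes a trailing moving average as the trend estimate and a simple "residual above n standard deviations" rule for flagging bursts; both are illustrative choices, not necessarily the methods used in the post. The daily counts are invented, and the function names are hypothetical.

```python
from statistics import pstdev

# Hypothetical daily Tweet counts, invented for illustration; not real data.
DAILY_COUNTS = [100, 104, 98, 110, 107, 112, 300, 118, 121, 117, 125, 123]


def rolling_mean(values, window):
    """Trailing moving average; early points average over what is available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def detrend(values, window=3):
    """Subtract the trailing moving average, leaving local variation."""
    trend = rolling_mean(values, window)
    return [v - t for v, t in zip(values, trend)]


def find_bursts(values, window=3, n_sigmas=2.0):
    """Return indices whose detrended value exceeds n_sigmas standard deviations."""
    residuals = detrend(values, window)
    threshold = n_sigmas * pstdev(residuals)
    return [i for i, r in enumerate(residuals) if r > threshold]


if __name__ == "__main__":
    print(find_bursts(DAILY_COUNTS))
```

On the toy series, only the spike at index 6 is flagged. Note that a large spike also inflates the trailing average for the next few points, which is one reason to experiment with the window size and threshold on real data.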