tweet_parser package

Submodules

tweet_parser.lazy_property module

Module to define a lazy property decorator that allows attributes to be generated dynamically and cached after creation. Original idea found via http://stevenloria.com/lazy-evaluated-properties-in-python/ and lightly modified to preserve underlying docstrings.

tweet_parser.lazy_property.lazy_property(fn)[source]

Decorator that makes a property lazy-evaluated whilst preserving docstrings.

Parameters:fn (function) – the property in question
Returns:evaluated version of the property.
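
The caching behaviour described above can be illustrated with a minimal sketch. This is not the package's source; the cache attribute name and the Example class are hypothetical, and the sketch assumes the value is stored on the instance the first time the property is read.

```python
import functools


def lazy_property(fn):
    """Sketch of a cached property that keeps the wrapped function's docstring."""
    attr_name = "_lazy_" + fn.__name__  # hypothetical name for the cached value

    @property
    @functools.wraps(fn)  # copies fn.__doc__ so the property keeps the docstring
    def _lazy(self):
        # Compute the value on first access only, then reuse the stored result.
        if not hasattr(self, attr_name):
            setattr(self, attr_name, fn(self))
        return getattr(self, attr_name)

    return _lazy


class Example:
    """Hypothetical class used only to exercise the decorator."""
    calls = 0

    @lazy_property
    def answer(self):
        """Expensive value, computed once."""
        Example.calls += 1
        return 42
```

Reading Example().answer twice on the same instance runs the wrapped function only once, and Example.answer.__doc__ still carries the original docstring.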

tweet_parser.tweet module

class tweet_parser.tweet.Tweet(tweet_dict, do_format_validation=False)[source]

Bases: dict

Tweet object created from a dictionary representing a Tweet payload

Parameters:
  • tweet_dict (dict) – A dictionary representing a Tweet payload
  • do_format_validation (bool) – If “True”, compare the keys in this dict to a superset of expected keys and to a minimum set of expected keys (as defined in tweet_parser.tweet_keys). The parser fails if unexpected keys are present or if expected keys are missing. Intended for run-time format testing, so that the user can surface unexpected format changes.
Returns:

Class “Tweet”, inherits from dict, provides properties to get various data values from the Tweet.

Return type:

Tweet

Raises:

NotATweetError – the Tweet dict is malformed, see tweet_checking.check_tweet for details

Example

>>> from tweet_parser.tweet import Tweet
>>> # python dict representing a Tweet
>>> tweet_dict = {"id": 867474613139156993,
...               "id_str": "867474613139156993",
...               "created_at": "Wed May 24 20:17:19 +0000 2017",
...               "text": "Some Tweet text",
...               "user": {
...                   "screen_name": "RobotPrincessFi",
...                   "id_str": "815279070241955840"
...                   }
...              }
>>> # create a Tweet object
>>> tweet = Tweet(tweet_dict)
>>> # use the Tweet obj to access data elements
>>> tweet.id
'867474613139156993'
>>> tweet.created_at_seconds
1495657039
all_text

All of the text of the tweet. This includes @ mentions, long links, quote-tweet contents (separated by a newline), RT contents & poll options

Returns:value returned by calling tweet_text.get_all_text on self
Return type:str
bio

The bio text of the user who posted the Tweet

Returns:the user’s bio text. value returned by calling tweet_user.get_bio on self
Return type:str
created_at_datetime

Time that a Tweet was posted as a Python datetime object

Returns:the value of tweet.created_at_seconds converted into a datetime object
Return type:datetime.datetime
created_at_seconds

Time that a Tweet was posted in seconds since the Unix epoch

Returns:seconds since the unix epoch (determined by converting Tweet.id into a timestamp using tweet_date.snowflake2utc)
Return type:int
created_at_string

Time that a Tweet was posted as a string with the format YYYY-mm-ddTHH:MM:SS.000Z

Returns:the value of tweet.created_at_seconds converted into a string (YYYY-mm-ddTHH:MM:SS.000Z)
Return type:str
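
The three created_at_* properties all derive from the snowflake id. The sketch below shows one way that conversion could work, assuming the standard snowflake layout (milliseconds since the Twitter epoch stored above the low 22 bits). The function name snowflake_to_seconds is hypothetical, and tweet_date.snowflake2utc may differ in detail, for example in how it rounds or handles time zones.

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # Twitter's snowflake epoch, in milliseconds


def snowflake_to_seconds(snowflake_id):
    """Seconds since the Unix epoch encoded in a snowflake id (sketch)."""
    # Discard the low 22 bits; what remains is milliseconds since the Twitter epoch.
    millis = (int(snowflake_id) >> 22) + TWITTER_EPOCH_MS
    return millis // 1000


seconds = snowflake_to_seconds("867474613139156993")
# Assumption: a timezone-aware UTC datetime is used for the conversion here.
as_datetime = datetime.fromtimestamp(seconds, tz=timezone.utc)
as_string = as_datetime.strftime("%Y-%m-%dT%H:%M:%S.000Z")
```

For the id used in the class example, this gives 1495657039 seconds, which matches the created_at value "Wed May 24 20:17:19 +0000 2017" shown there.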
embedded_tweet

Get the retweeted Tweet OR the quoted Tweet and return it as a Tweet object

Returns:a Tweet representing the quote Tweet or the Retweet (see tweet_embeds.get_embedded_tweet, this is that value as a Tweet)
Return type:Tweet (or None, if the Tweet is neither a quote tweet nor a Retweet)
Raises:NotATweetError – if embedded tweet is malformed
favorite_count

The number of favorites that this tweet has received at the time of retrieval. If a tweet is obtained from a live stream, this will likely be 0.

Returns:value returned by calling tweet_counts.get_favorite_count on self
Return type:int
follower_count

The number of followers that the author of the Tweet has

Returns:the number of followers. value returned by calling get_follower_count on self
Return type:int
following_count

The number of accounts that the author of the Tweet is following

Returns:the number of accounts that the author of the Tweet is following, value returned by calling get_following_count on self
Return type:int
generator

Get information about the application that generated the Tweet

Returns:keys are ‘link’ and ‘name’, the link to and name of the application that generated the Tweet. value returned by calling tweet_generator.get_generator on self
Return type:dict
geo_coordinates

The user’s geo coordinates, if they are included in the payload (otherwise return None). Dictionary with the keys “latitude” and “longitude” or None

Returns:value returned by calling tweet_geo.get_geo_coordinates on self
Return type:dict
gnip_matching_rules

Get the Gnip tagged rules that this tweet matched.

Returns:List of potential tags with the matching rule or None if no rules are defined.
hashtags

A list of hashtags in the Tweet. Note that in the case of a quote-tweet, this does not return the hashtags in the quoted status. The recommended way to get that list would be to use tweet.quoted_tweet.hashtags

Returns:list of all of the hashtags in the Tweet. Value returned by calling tweet_entities.get_hashtags on self
Return type:list (a list of strings)
id

Tweet snowflake id as a string

Returns:Twitter snowflake id, numeric only (no other text)
Return type:str

Example

>>> from tweet_parser.tweet import Tweet
>>> original_format_dict = {
...     "created_at": "Wed May 24 20:17:19 +0000 2017",
...     "id": 867474613139156993,
...     "id_str": "867474613139156993",
...     "user": {"user_keys":"user_data"},
...     "text": "some tweet text"
...     }
>>> Tweet(original_format_dict).id
'867474613139156993'
>>> activity_streams_dict = {
...     "postedTime": "2017-05-24T20:17:19.000Z",
...     "id": "tag:search.twitter.com,2005:867474613139156993",
...     "actor": {"user_keys":"user_data"},
...     "body": "some tweet text"
...     }
>>> Tweet(activity_streams_dict).id
'867474613139156993'
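
Both payload formats above yield the same numeric string. A minimal sketch of that normalization, assuming the activity-streams id always ends in ":" followed by the snowflake (normalize_id is a hypothetical helper, not part of the package):

```python
def normalize_id(raw_id):
    """Return the numeric snowflake id as a string (sketch)."""
    # Original format: an int or numeric string.
    # Activity streams: "tag:search.twitter.com,2005:<snowflake>".
    return str(raw_id).rsplit(":", 1)[-1]


original = normalize_id(867474613139156993)
activity = normalize_id("tag:search.twitter.com,2005:867474613139156993")
```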
in_reply_to_screen_name

The screen name of the user being replied to (None if the Tweet isn’t a reply)

Returns:value returned by calling tweet_reply.get_in_reply_to_screen_name on self
Return type:str
in_reply_to_status_id

The status id of the Tweet being replied to (None if the Tweet isn’t a reply)

Returns:value returned by calling tweet_reply.get_in_reply_to_status_id on self
Return type:str
in_reply_to_user_id

The user id of the user being replied to (None if the Tweet isn’t a reply). This raises a NotAvailableError for activity-streams format

Returns:value returned by calling tweet_reply.get_in_reply_to_user_id on self
Return type:str
klout_id

(DEPRECATED): The Klout ID of the user (str) (if it exists)

Returns:value returned by calling tweet_user.get_klout_id on self (if no Klout is present, this returns a None)
Return type:str
klout_influence_topics

(DEPRECATED): Get the user’s Klout influence topics (a list of dicts), if it exists. Topic dicts will have these keys: url, id, name, score

Returns:value returned by calling tweet_user.get_klout_topics(self, topic_type = ‘influence’) (if no Klout is present, this returns a None)
Return type:list
klout_interest_topics

(DEPRECATED): Get the user’s Klout interest topics (a list of dicts), if it exists. Topic dicts will have these keys: url, id, name, score

Returns:value returned by calling tweet_user.get_klout_topics(self, topic_type = ‘interest’) (if no Klout is present, this returns a None)
Return type:list
klout_profile

(DEPRECATED): The Klout profile URL of the user (str) (if it exists)

Returns:value returned by calling tweet_user.get_klout_profile on self (if no Klout is present, this returns a None)
Return type:str
klout_score

(DEPRECATED): The Klout score (int) (if it exists) of the user who posted the Tweet

Returns:value returned by calling tweet_user.get_klout_score on self (if no Klout is present, this returns a None)
Return type:int
lang

The language that the Tweet is written in.

Returns:2-letter BCP 47 language code (or None if undefined). Value returned by calling tweet_text.get_lang on self
Return type:str
media_urls

A list of all media (https) urls in the tweet, useful for grabbing photo/video urls for other purposes.

Returns:list of all of the media urls in the Tweet. Value returned by calling tweet_entities.get_media_urls on self
Return type:list (a list of strings)
most_unrolled_urls

For each url included in the Tweet “urls”, get the most unrolled version available. Only 1 url string is returned per url in tweet.tweet_links. The order of preference for “most unrolled” (keys from the dict at tweet.tweet_links) is:

  1. unwound/url
  2. expanded_url
  3. url
Returns:list of urls. Value returned by calling tweet_links.get_most_unrolled_urls on self
Return type:list (a list of strings)
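
The preference order above can be sketched as a simple fallback chain. The helper name most_unrolled and the sample link dicts are hypothetical; the key names follow the tweet_links example in this documentation.

```python
def most_unrolled(link):
    """Pick the most unrolled url from one link dict (sketch)."""
    unwound = link.get("unwound") or {}
    # Preference: unwound/url, then expanded_url, then the raw url.
    return unwound.get("url") or link.get("expanded_url") or link.get("url")


# Hypothetical link dicts with different levels of enrichment.
links = [
    {"url": "t.co/1", "expanded_url": "https://bit.ly/x",
     "unwound": {"url": "https://example.com/article"}},
    {"url": "t.co/2", "expanded_url": "https://example.com/page"},
    {"url": "t.co/3"},
]
urls = [most_unrolled(link) for link in links]
```

Each input dict produces exactly one string, so the result has the same length as the list of links.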
name

The display name of the user who posted the Tweet

Returns:value returned by calling tweet_user.get_name on self
Return type:str
poll_options

The text in the options of a poll as a list. If there is no poll in the Tweet, return an empty list. If activity-streams format, raise NotAvailableError

Returns:value returned by calling tweet_text.get_poll_options on self
Return type:list (list of strings)
profile_location

User’s derived location data from the profile location enrichment. If unavailable, returns None.

Returns:value returned by calling tweet_geo.get_profile_location on self
Return type:dict

Example

>>> result = {"country": "US",         # Two letter ISO-3166 country code
...           "locality": "Boulder",   # The locality location (~ city)
...           "region": "Colorado",    # The region location (~ state/province)
...           "sub_region": "Boulder", # The sub-region location (~ county)
...           "full_name": "Boulder, Colorado, US", # The full name (excluding sub-region)
...           "geo":  [40,-105]        # lat/long coordinate that corresponds to
...                                     # the lowest granularity location for where the user
...                                     # who created the Tweet is from
... }
quote_count

The number of tweets that this tweet has been quoted in at the time of retrieval. If a tweet is obtained from a live stream, this will likely be 0. This raises a NotAvailableError for activity-streams format

Returns:value returned by calling tweet_counts.get_quote_count on self or raises NotAvailableError
Return type:int
quote_or_rt_text

The quoted or retweeted text in a Tweet (this is not the text entered by the posting user)

  • tweet: empty string (there is no quoted or retweeted text)
  • quote: only the text of the quoted Tweet
  • retweet: the text of the retweet
Returns:value returned by calling tweet_text.get_quote_or_rt_text on self
Return type:str
quoted_tweet

The quoted Tweet as a Tweet object. If the Tweet is not a quote Tweet, return None. If the quoted Tweet payload cannot be loaded as a Tweet, this will raise a “NotATweetError”.

Returns:A Tweet representing the quoted status (or None) (see tweet_embeds.get_quote_tweet, this is that value as a Tweet)
Return type:Tweet
Raises:NotATweetError – if quoted tweet is malformed
retweet_count

The number of times this tweet has been retweeted at the time of retrieval. If a tweet is obtained from a live stream, this will likely be 0.

Returns:value returned by calling tweet_counts.get_retweet_count on self
Return type:int
retweeted_tweet

The retweeted Tweet as a Tweet object. If the Tweet is not a Retweet, return None. If the Retweet payload cannot be loaded as a Tweet, this will raise a NotATweetError.

Returns:A Tweet representing the retweeted status (or None) (see tweet_embeds.get_retweet, this is that value as a Tweet)
Return type:Tweet
Raises:NotATweetError – if retweeted tweet is malformed
screen_name

The screen name (@ handle) of the user who posted the Tweet

Returns:value returned by calling tweet_user.get_screen_name on self
Return type:str
text

The contents of “text” (original format) or “body” (activity streams format)

Returns:value returned by calling tweet_text.get_text on self
Return type:str

tweet_links

The links that are included in the Tweet as “urls” (if there are no links, this is an empty list). This includes links that are included in quoted or retweeted Tweets. Returns unrolled or expanded_url information if it is available.

Returns:A list of dictionaries containing information about urls. Each dictionary entity can have these keys; without unwound url or expanded url Twitter data enrichments many of these fields will be missing. (value returned by calling tweet_links.get_tweet_links on self)
Return type:list (list of dicts)

Example

>>> result = [
...   {
...   # url that shows up in the tweet text
...   'display_url': "https://twitter.com/RobotPrinc...",
...   # long (expanded) url
...   'expanded_url': "https://twitter.com/RobotPrincessFi",
...   # characters where the display link is
...   'indices': [55, 88],
...   'unwound': {
...      # description from the linked webpage
...      'description': "the Twitter profile of RobotPrincessFi",
...      'status': 200,
...      # title of the webpage
...      'title': "the Twitter profile of RobotPrincessFi",
...      # long (expanded) url
...      'url': "https://twitter.com/RobotPrincessFi"},
...   # the url that tweet directs to, often t.co
...   'url': "t.co/1234"}]
tweet_type

The type of Tweet this is (3 options: tweet, quote, and retweet)

Returns:(“tweet”, “quote” or “retweet” only) value returned by calling tweet_text.get_tweet_type on self
Return type:str
user_entered_text

The text that the posting user entered

tweet: untruncated (includes @-mention replies and long links) text of an original Tweet

quote tweet: untruncated poster-added content in a quote-tweet

retweet: empty string

Returns:if tweet.tweet_type == “retweet”, returns an empty string; otherwise, returns the value of tweet_text.get_full_text(self)
Return type:str
user_id

The Twitter ID of the user who posted the Tweet

Returns:value returned by calling tweet_user.get_user_id on self
Return type:str
user_mentions

The @-mentions in the Tweet as dictionaries. Note that in the case of a quote-tweet, this does not return the users mentioned in the quoted status. The recommended way to get that list would be to use ‘tweet.quoted_tweet.user_mentions’. Also note that in the case of a quote-tweet, the list of @-mentioned users does not include the user who authored the original (quoted) Tweet. You can get the author of the quoted tweet using tweet.quoted_tweet.user_id

Returns:1 item per @ mention, value returned by calling tweet_entities.get_user_mentions on self
Return type:list (list of dicts)

Example

>>> result = {
...   #characters where the @ mention appears
...   "indices": [14,26],
...   #id of @ mentioned user as a string
...   "id_str": "2382763597",
...   #screen_name of @ mentioned user
...   "screen_name": "notFromShrek",
...   #display name of @ mentioned user
...   "name": "Fiona",
...   #id of @ mentioned user as an int
...   "id": 2382763597
... }

tweet_parser.tweet_checking module

Validation and checking methods for Tweets.

Methods here are primarily used by other methods within this module but can be used for other validation code as well.

tweet_parser.tweet_checking.check_tweet(tweet, validation_checking=False)[source]

Ensures a tweet is valid and determines the type of format for the tweet.

Parameters:
  • tweet (dict/Tweet) – the tweet payload
  • validation_checking (bool) – check for valid key structure in a tweet.
tweet_parser.tweet_checking.get_all_keys(tweet, parent_key='')[source]

Takes a tweet object and recursively returns a list of all keys contained in this level and all nested levels of the tweet.

Parameters:
  • tweet (Tweet) – the tweet dict
  • parent_key (str) – key from which this process will start, e.g., you can get keys only under some key that is not the top-level key.
Returns:

list of all keys in nested dicts.

Example

>>> import tweet_parser.tweet_checking as tc
>>> tweet = {"created_at": 124125125125, "text": "just setting up my twttr",
...          "nested_field": {"nested_1": "field", "nested_2": "field2"}}
>>> tc.get_all_keys(tweet)
['created_at', 'text', 'nested_field nested_1', 'nested_field nested_2']
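
A recursive traversal that reproduces the output shown above could look like the following sketch. The function name all_keys is hypothetical, and the space-separated path format is inferred from the example output rather than taken from the package source.

```python
def all_keys(payload, parent_key=""):
    """Recursively list keys; nested keys are prefixed with their parent path."""
    keys = []
    for key, value in payload.items():
        # Join parent and child with a space, as in the example output above.
        path = (parent_key + " " + key) if parent_key else key
        if isinstance(value, dict):
            # Descend into nested dicts instead of listing the parent key itself.
            keys.extend(all_keys(value, path))
        else:
            keys.append(path)
    return keys


tweet = {"created_at": 124125125125, "text": "just setting up my twttr",
         "nested_field": {"nested_1": "field", "nested_2": "field2"}}
found = all_keys(tweet)
```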
tweet_parser.tweet_checking.is_original_format(tweet)[source]

Simple checker to flag the format of a tweet.

Parameters:tweet (Tweet) – tweet in question
Returns:Bool

Example

>>> import tweet_parser.tweet_checking as tc
>>> tweet = {"created_at": 124125125125,
...          "text": "just setting up my twttr",
...          "nested_field": {"nested_1": "field", "nested_2": "field2"}}
>>> tc.is_original_format(tweet)
True
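
One plausible way to tell the two formats apart is by their timestamp key, since the examples in this documentation use “created_at” for original format and “postedTime” for activity streams. The sketch below rests on that assumption. The function name is hypothetical, and ValueError stands in for whatever error the package raises on an unrecognized payload.

```python
def looks_like_original_format(payload):
    """Return True for original format, False for activity streams (sketch)."""
    if "created_at" in payload:
        return True
    if "postedTime" in payload:
        return False
    # Assumption: a payload with neither key is not a usable Tweet.
    raise ValueError("payload has neither 'created_at' nor 'postedTime'")


is_original = looks_like_original_format({"created_at": 124125125125, "text": "hi"})
is_activity = looks_like_original_format({"postedTime": "2017-05-24T20:17:19.000Z"})
```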
tweet_parser.tweet_checking.key_validation_check(tweet_keys_list, superset_keys, minset_keys)[source]

Validates the keys present in a Tweet.

Parameters:
  • tweet_keys_list (list) – the keys present in a tweet
  • superset_keys (set) – the set of all possible keys for a tweet
  • minset_keys (set) – the set of minimal keys expected in a tweet.
Returns:

0 if no errors

Raises:

UnexpectedFormatError on any mismatch of keys.
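
The check described above amounts to two set comparisons. The following sketch assumes that reading; validate_keys is a hypothetical name, and the exception class is a local stand-in so the example runs without the package installed.

```python
class UnexpectedFormatError(Exception):
    """Local stand-in for tweet_parser.tweet_parser_errors.UnexpectedFormatError."""


def validate_keys(tweet_keys_list, superset_keys, minset_keys):
    """Return 0 if the keys fit between the minimal set and the superset."""
    tweet_keys = set(tweet_keys_list)
    unexpected = tweet_keys - set(superset_keys)  # keys nobody expected
    missing = set(minset_keys) - tweet_keys       # required keys that are absent
    if unexpected:
        raise UnexpectedFormatError(
            "unexpected keys: " + ", ".join(sorted(unexpected)))
    if missing:
        raise UnexpectedFormatError(
            "missing keys: " + ", ".join(sorted(missing)))
    return 0


result = validate_keys(["id", "text"], {"id", "text", "lang"}, {"id", "text"})
```

A payload passes only when its keys are a subset of superset_keys and a superset of minset_keys; any other combination raises the error.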

tweet_parser.tweet_keys module

tweet_parser.tweet_parser_errors module

exception tweet_parser.tweet_parser_errors.NotATweetError[source]

Bases: Exception

exception tweet_parser.tweet_parser_errors.NotAvailableError[source]

Bases: Exception

exception tweet_parser.tweet_parser_errors.UnexpectedFormatError[source]

Bases: Exception

Module contents